About me

Brief Intro

Hi, I am Guiming Hardy Chen (I go by Hardy), a self-motivated researcher in NLP/AI. I just obtained my Bachelor's degree in Statistics-Data Science from the School of Data Science, CUHKSZ. Currently, I am a full-time RA supervised by Professor Benyou Wang, working on LLM-related topics. Since October 2022, I have been working with Professor Wang on VLMs and LLMs and have completed several projects. During the summer of 2022, I conducted a course project and a research project on word embeddings. In Fall 2021, I researched backdoor attacks with Professor Baoyuan Wu, who sparked my interest in AI security topics.

Currently, my research interests are:

  • developing effective and efficient LLMs
  • robustness and security issues of LLMs/VLMs

I love discussion and collaboration. Please drop me an email if you are interested in my work!

Education

B.S. in Statistics, The Chinese University of Hong Kong, Shenzhen, 2019-2023

Awards

Undergraduate Research Awards, selected with funding, 2023

Outstanding Contribution Award, School of Data Science, 2023

Excellent Peer Advisor Award, School of Data Science, 2023

Dean’s List, School of Data Science, AY2021-2022

Undergraduate Research Awards, selected with funding, 2022

Silver Prize, Global AI Challenge, 2022


News


[Hiring!] The NLP lab at CUHKSZ, led by Prof. Haizhou Li and Prof. Benyou Wang, is hiring! Check out the JD.

[04/2024] We release MileBench, a pioneering dataset for evaluating the long-context ability of Image/Video comprehension models.

[02/2024] We release ALLaVA, a large-scale multimodal dataset with ~1.2M samples for LVLM training! I am very fortunate to lead this project!

[11/2023] We release MLLM-Bench, the first multimodal benchmark that leverages GPT-4V as the judge for open-ended questions. The 420 questions are constructed and classified into 6 cognitive levels in the Revised Bloom’s Taxonomy.

[05/2023] My FIRST paper got accepted to the Findings of ACL 2023! We found that CLIP-style text encoders are better than BERT-style ones at associating visual and textual information. An arXiv version will be released soon. Thanks, Zhihong!

[04/2023] Undergraduate Research Awards with funding, CUHKSZ

[04/2023] Democratizing ChatGPT never stops!!! Phoenix, trained by our lab, was released, along with LLMZoo, the first benchmark for evaluating LLMs.

[01/2023] Outstanding Contribution Award, School of Data Science

[01/2023] Excellent Peer Advisor Award, School of Data Science

Publications


  • HuatuoGPT, towards Taming Language Model to Be a Doctor
    Hongbo Zhang, Junying Chen, Feng Jiang*, Fei Yu, Zhihong Chen, Jianquan Li, Guiming Chen, Xiangbo Wu, Zhiyi Zhang, Qingying Xiao, Xiang Wan, Benyou Wang, Haizhou Li
    EMNLP 2023 Findings.
    [Paper] [Code]

  • On the Difference of BERT-style and CLIP-style Text Encoders
    Zhihong Chen*, Guiming Hardy Chen*, Shizhe Diao, Xiang Wan and Benyou Wang
    ACL 2023 Findings.
    [Paper] [Code]

Preprints


  • MileBench: Benchmarking MLLMs in Long Context
    Dingjie Song, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang
    [Paper] [Code] [Project Page]

  • ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model
    Guiming Hardy Chen, Shunian Chen, Ruifei Zhang, Junying Chen, Xiangbo Wu, Zhiyi Zhang, Zhihong Chen, Jianquan Li, Xiang Wan, Benyou Wang
    [Paper] [Code] [Demo]

  • Humans or LLMs as the Judge? A Study on Judgement Biases
    Guiming Hardy Chen*, Shunian Chen*, Ziche Liu, Feng Jiang, Benyou Wang
    [Paper]

  • CMB: A Comprehensive Medical Benchmark in Chinese
    Xidong Wang*, Guiming Hardy Chen*, Dingjie Song*, Zhiyi Zhang*, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li
    [Paper] [Code] [Website]

  • Phoenix: Democratizing ChatGPT across Languages
    Zhihong Chen, Feng Jiang, Junying Chen, Tiannan Wang, Fei Yu, Guiming Chen, Hongbo Zhang, Juhao Liang, Chen Zhang, Zhiyi Zhang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li
    [Paper] [Code] [Chat with Phoenix]

Experience


10/2022 - present: I am working as an RA (research assistant) at the NLP lab of CUHKSZ, under the supervision of Zhihong and Prof. Wang Benyou. I have participated in three projects (bert-clip-synesthesia, phoenix and huatuo) and learned a lot from my buddies.

05/2023: I worked with Ziqi on an NMT project, which was part of the course DDA4220 (Deep Learning and Application). I designed experiments to reveal the laziness of the Bahdanau RNN and to investigate the impact of the teacher-forcing ratio on its training. I also trained 16 vanilla Transformers from scratch to see how an imbalance between the numbers of encoder and decoder layers affects their performance. Lastly, I prompted phoenix and chimera and compared their zero-shot translation results.
[Report] [Code]

06/2022 - 09/2022: I worked as an RA with Prof. Shen Zhiqiang@HKUST. We worked on a project on improving GloVe embeddings.

03/2022 - 05/2022: I worked (almost) individually on a course project, where I proposed an embedding framework based on the PageRank algorithm. This was the very start of my NLP journey.

02/2022 - 04/2022: I worked as an RA with Prof. Li Zhaoyuan@CUHKSZ and Prof. Liu Sibo@HKBU, focusing on collecting and analyzing financial data.

06/2021 - 09/2021: I worked as an RA with Prof. Wu Baoyuan@SCLBD@SRIBD on backdoor attack methods. This was my first taste of research. Thank you, Prof. Wu!

10/2020 - 12/2021: I worked as a student assistant@SCLBD@SRIBD, where I met awesome people and had an unforgettable experience in my first job. Thank you, Lori, for introducing me to Prof. Wu.