Boxi Cao's Homepage

Boxi Cao 曹博希

I am a Ph.D. candidate (since September 2019) in the Chinese Information Processing Laboratory at the Institute of Software, Chinese Academy of Sciences, under the supervision of Professor Xianpei Han and Professor Le Sun. I received my Bachelor's degree from Beijing University of Posts and Telecommunications in June 2019. My research interests include:

Natural Language Processing
Knowledge Lifecycle in Large Language Models
Alignment and Evaluation for LLMs

Contact: boxi2020 AT iscas dot ac dot cn

Google Scholar / Semantic Scholar / ACL Anthology / GitHub / Blog / Zhihu / Douban

News

08/2024	Proud to announce that our paper "Spiral of Silences" has won the Area Chair Award at ACL 2024.
08/2024	Glad to introduce StructEval, which provides more reliable and consistent evaluation for LLMs.
07/2024	We released the RACE benchmark, a multi-dimensional benchmark for code generation.
06/2024	We released the first survey on "Automated Alignment of LLMs"!
05/2024	Three papers were accepted to ACL 2024.
10/2023	One first-authored paper was accepted to the EMNLP 2023 main conference.
08/2023	We presented a tutorial at CCKS 2023 on the life cycle of knowledge in big language models.

Publications

2024

Towards Scalable Automated Alignment of LLMs: A Survey
Boxi Cao*, Keming Lu*, Xinyu Lu*, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu
arXiv preprint arXiv:2406.01252 (2024)
Preprint / Paperlist

StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation
Boxi Cao, Mengjie Ren, Hongyu Lin, Xianpei Han, Feng Zhang, Junfeng Zhan, Le Sun
Findings of Association for Computational Linguistics (Findings of ACL 2024)
Preprint / Benchmark / Leaderboard

The Life Cycle of Knowledge in Big Language Models: A Survey
Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun
Machine Intelligence Research (2024)
Tutorial Slides / Paperlist / Paper / Preprint

Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
Boxi Cao, Qiaoyu Tang, Hongyu Lin, Xianpei Han, Jiawei Chen, Tianshu Wang, Le Sun
The 2024 International Conference on Computational Linguistics (COLING 2024)
Preprint

Learning or Self-aligning? Rethinking Instruction Fine-tuning
Mengjie Ren, Boxi Cao, Hongyu Lin, Cao Liu, Xianpei Han, Ke Zeng, Guanglu Wan, Xunliang Cai, Le Sun
Annual Meeting of the Association for Computational Linguistics (ACL 2024)
Preprint

Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models
Jiasheng Zheng, Boxi Cao, Zhengzhao Ma, Ruotong Pan, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun
arXiv preprint arXiv:2407.11470 (2024)
Preprint / Benchmark / Leaderboard

Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation
Ruotong Pan, Boxi Cao, Hongyu Lin, Xianpei Han, Jia Zheng, Sirui Wang, Xunliang Cai, Le Sun
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)
Preprint / Code / Models

Spiral of Silences: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering
Xiaoyang Chen, Ben He, Hongyu Lin, Xianpei Han, Tianshu Wang, Boxi Cao, Le Sun, Yingfei Sun
Annual Meeting of the Association for Computational Linguistics (ACL 2024)
Area Chair Award
Preprint

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan, Yanjiang Liu, Xinyu Lu, Boxi Cao, Ben He, Xianpei Han, Le Sun, Jie Lou, Bowen Yu, Yaojie Lu, Hongyu Lin
arXiv preprint arXiv:2411.11504 (2024)
Preprint / Paperlist

Towards Universal Dense Blocking for Entity Resolution
Tianshu Wang, Hongyu Lin, Xianpei Han, Xiaoyang Chen, Boxi Cao, Le Sun
arXiv preprint arXiv:2404.14831 (2024)
Preprint

URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression
Zhuoqun Li, Hongyu Lin, Tianshu Wang, Boxi Cao, Yaojie Lu, Weixiang Zhou, Hao Wang, Zhenyu Zeng, Le Sun, Xianpei Han
arXiv preprint arXiv:2404.16248 (2024)
Preprint

2023

Does the Correctness of Factual Knowledge Matter for Factual Knowledge-Enhanced Pre-trained Language Models?
Boxi Cao*, Qiaoyu Tang*, Hongyu Lin, Xianpei Han, Le Sun
The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
Paper

Learning In-context Learning for Named Entity Recognition
Jiawei Chen, Yaojie Lu, Hongyu Lin, Jie Lou, Wei Jia, Dai Dai, Hua Wu, Boxi Cao, Xianpei Han, Le Sun
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
Preprint / Paper

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
Qiaoyu Tang, Ziliang Deng, Hongyu Lin, Xianpei Han, Qiao Liang, Boxi Cao, Le Sun
arXiv preprint arXiv:2306.05301 (2023)
Preprint

2022

Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View
Boxi Cao, Hongyu Lin, Xianpei Han, Fangchao Liu, Le Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)
Paper / Preprint / Code / Slides / Poster

Pre-training to Match for Unified Low-shot Relation Extraction
Fangchao Liu, Hongyu Lin, Xianpei Han, Boxi Cao, Le Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)
Paper / Preprint / Code

2021

Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases
Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun, Lingyong Yan, Meng Liao, Tong Xue, Jin Xu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021)
Paper / Preprint / Code / Slides / Poster

CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark
Yuan Yao, Qingxiu Dong, Jian Guan, Boxi Cao, Zhengyan Zhang, Chaojun Xiao, Xiaozhi Wang, et al.
arXiv preprint arXiv:2112.13610 (2021)
Website / Preprint / Github / News

Academic Services

Reviewer/PC Member

ACL 2023
EMNLP 2022, 2023
COLING 2022, 2024

Education

2019-Present	Ph.D. Candidate in Computer Software and Theory, School of Computer Science and Technology, University of Chinese Academy of Sciences
2015-2019	B.Eng. in Computer Science and Technology, School of Computer Science, Beijing University of Posts and Telecommunications
2012-2015	Nanya Middle School of Changsha

Selected Honors and Awards

2022	Pacemaker to Merit Student, University of Chinese Academy of Sciences (Top 1%)
2021	Merit Student, University of Chinese Academy of Sciences
2019	Outstanding Graduate, Beijing Municipal Commission of Education
2018, 2016	Merit Student, Beijing University of Posts and Telecommunications
2017, 2016	First-class Scholarship, Beijing University of Posts and Telecommunications
2017	Outstanding Student Cadre, Beijing University of Posts and Telecommunications
2016	Merit Student, Beijing Municipal Commission of Education

Selected Competition Awards

2022	Third Prize, Language and Intelligence Challenge - Sentiment Interpretation Task (LIC 2022)
2017, 2016	Bronze Medal, China Collegiate Programming Contest (ACM-CCPC)
2017	Second Prize, Group Programming Ladder Tournament (CCCC-GPLT)
2017	Second Prize, China Collegiate Cloud Computing Application and Innovation Competition