Boxi Cao 曹博希

I am a Ph.D. candidate (since September 2019) in the Chinese Information Processing Laboratory at the Institute of Software, Chinese Academy of Sciences, under the supervision of Professor Xianpei Han and Professor Le Sun. I received my Bachelor's degree from Beijing University of Posts and Telecommunications in June 2019. My research interests include:

  • Natural Language Processing
  • Knowledge Lifecycle in Large Language Models
  • Alignment and Evaluation for LLMs

Contact: boxi2020 AT iscas dot ac dot cn

Google Scholar    /    Semantic Scholar    /    ACL Anthology   /    GitHub    /    Blog    /    Zhihu   /    Douban

News
06/2024 We release the first survey about "Automated Alignment of LLMs"!
05/2024 Three papers got accepted by ACL 2024.
10/2023 One first-authored paper got accepted by EMNLP 2023 main conference.
08/2023 We presented a tutorial at CCKS 2023 on the life cycle of knowledge in big language models.
08/2023 Exceeded 100 citations on Google Scholar!
05/2023 One co-authored paper got accepted by ACL 2023 main conference.
12/2022 One first-authored survey paper got accepted by Machine Intelligence Research.
02/2022 One first-authored paper got accepted by ACL 2022 main conference.
02/2022 One co-authored paper got accepted by ACL 2022 main conference.
12/2021 Participated in building the benchmark CUGE as a core member.
12/2021 One paper got recommended by Michael Galkin on Towards Data Science.
05/2021 One first-authored paper got accepted by ACL 2021 main conference.
Publications
2024
Towards Scalable Automated Alignment of LLMs: A Survey
Boxi Cao*, Keming Lu*, Xinyu Lu*, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu
arXiv preprint arXiv:2406.01252 (2024)
Preprint / Paperlist
StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation
Boxi Cao, Mengjie Ren, Hongyu Lin, Xianpei Han, Feng Zhang, Junfeng Zhan, Le Sun
Findings of the Association for Computational Linguistics (Findings of ACL 2024)
The Life Cycle of Knowledge in Big Language Models: A Survey
Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun
Machine Intelligence Research (2024)
Tutorial Slides / Paperlist / Paper / Preprint
Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
Boxi Cao, Qiaoyu Tang, Hongyu Lin, Xianpei Han, Jiawei Chen, Tianshu Wang, Le Sun
The 2024 International Conference on Computational Linguistics (COLING 2024)
Preprint
Learning or Self-aligning? Rethinking Instruction Fine-tuning
Mengjie Ren, Boxi Cao, Hongyu Lin, Cao Liu, Xianpei Han, Ke Zeng, Guanglu Wan, Xunliang Cai, Le Sun
Annual Meeting of the Association for Computational Linguistics (ACL 2024)
Preprint
Spiral of Silences: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering
Xiaoyang Chen, Ben He, Hongyu Lin, Xianpei Han, Tianshu Wang, Boxi Cao, Le Sun, Yingfei Sun
Annual Meeting of the Association for Computational Linguistics (ACL 2024)
Preprint
Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation
Ruotong Pan, Boxi Cao, Hongyu Lin, Xianpei Han, Jia Zheng, Sirui Wang, Xunliang Cai, Le Sun
arXiv preprint arXiv:2404.06809 (2024)
Preprint / Code / Models
Towards Universal Dense Blocking for Entity Resolution
Tianshu Wang, Hongyu Lin, Xianpei Han, Xiaoyang Chen, Boxi Cao, Le Sun
arXiv preprint arXiv:2404.14831 (2024)
Preprint
URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression
Zhuoqun Li, Hongyu Lin, Tianshu Wang, Boxi Cao, Yaojie Lu, Weixiang Zhou, Hao Wang, Zhenyu Zeng, Le Sun, Xianpei Han
arXiv preprint arXiv:2404.16248 (2024)
Preprint
2023
Does the Correctness of Factual Knowledge Matter for Factual Knowledge-Enhanced Pre-trained Language Models?
Boxi Cao*, Qiaoyu Tang*, Hongyu Lin, Xianpei Han, Le Sun
The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
Paper
Learning In-context Learning for Named Entity Recognition
Jiawei Chen, Yaojie Lu, Hongyu Lin, Jie Lou, Wei Jia, Dai Dai, Hua Wu, Boxi Cao, Xianpei Han, Le Sun
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
Preprint / Paper
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
Qiaoyu Tang, Ziliang Deng, Hongyu Lin, Xianpei Han, Qiao Liang, Boxi Cao, Le Sun
arXiv preprint arXiv:2306.05301 (2023)
Preprint
2022
Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View
Boxi Cao, Hongyu Lin, Xianpei Han, Fangchao Liu, Le Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)
Paper / Preprint / Code / Slides / Poster
Pre-training to Match for Unified Low-shot Relation Extraction
Fangchao Liu, Hongyu Lin, Xianpei Han, Boxi Cao, Le Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)
Paper / Preprint / Code
2021
Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases
Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun, Lingyong Yan, Meng Liao, Tong Xue, Jin Xu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021)
Paper / Preprint / Code / Slides / Poster
CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark
Yuan Yao, Qingxiu Dong, Jian Guan, Boxi Cao, Zhengyan Zhang, Chaojun Xiao, Xiaozhi Wang, et al.
arXiv preprint arXiv:2112.13610 (2021)
Website / Preprint / Github / News
Academic Services
Reviewer/PC Member ACL 2023
EMNLP 2022, 2023
COLING 2022, 2024
Education
2019-Present Ph.D. in Computer Software and Theory (Candidate)
School of Computer Science and Technology, University of Chinese Academy of Sciences
2015-2019 B.Eng. in Computer Science and Technology
School of Computer Science, Beijing University of Posts and Telecommunications
2012-2015 Nanya Middle School of Changsha
Selected Honors and Awards
2022 Pacemaker to Merit Student, University of Chinese Academy of Sciences (Top 1%)
2021 Merit Student, University of Chinese Academy of Sciences
2019 Outstanding Graduate, Beijing Municipal Commission of Education
2018, 2016 Merit Student, Beijing University of Posts and Telecommunications
2017, 2016 First-class Scholarship, Beijing University of Posts and Telecommunications
2017 Outstanding Student Cadre, Beijing University of Posts and Telecommunications
2016 Merit Student, Beijing Municipal Commission of Education
Selected Competition Awards
2022 Third Prize, Language and Intelligence Challenge - Sentiment Interpretation Task (LIC 2022)
2017, 2016 Bronze Medal, China Collegiate Programming Contest (ACM-CCPC)
2017 Second Prize, Group Programming Ladder Tournament (CCCC-GPLT)
2017 Second Prize, China Collegiate Cloud Computing Application and Innovation Competition