Xi Ye (叶曦)

(preprint) Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks code

Yoonsang Lee, Howard Yen, Xi Ye, and Danqi Chen. arXiv 2026.

(preprint) Detecting and Suppressing Reward Hacking with Gradient Fingerprints code

Songtao Wang, Quang Hieu Pham, Fangcong Yin, Xinpeng Wang, Jocelyn Qiaochu Chen, Greg Durrett, and Xi Ye. arXiv 2026.

(preprint) DySCO: Dynamic Attention-Scaling Decoding for Long-Context LMs code

Xi Ye*, Wuwei Zhang*, Fangcong Yin, Howard Yen, and Danqi Chen. arXiv 2026.

(preprint) Advancing General-Purpose Reasoning Models with Modular Gradient Surgery code

Min Cai, Yu Liang, Longzheng Wang, Yan Wang, Yueyang Zhang, Long Xia, Zhiyuan Sun, Xi Ye, and Daiting Shi. arXiv 2026.

Beyond Single-shot Writing: Deep Research Agents are Unreliable at Multi-turn Report Revision code

Bingsen Chen*, Boyan Li*, Ping Nie, Yuyu Zhang, Xi Ye^#, and Chen Zhao^#. Proceedings of ACL 2026.

(preprint) Language Models that Think, Chat Better code

Adithya Bhaskar*, Xi Ye*, and Danqi Chen. arXiv 2025.

(preprint) Learning Composable Chains-of-Thought code

Fangcong Yin, Zeyu Leo Liu, Liu Leqi, Xi Ye, and Greg Durrett. arXiv 2025.

(preprint) Learning to Reason Across Parallel Samples for LLM Reasoning

Jianing Qi, Xi Ye, Hao Tang, Zhigang Zhu, and Eunsol Choi. arXiv 2025.

Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking code

Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, and Xi Ye. Proceedings of EMNLP 2025.

LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation code

Xi Ye, Fangcong Yin*, Yinghui He*, Joie Zhang*, Howard Yen*, Tianyu Gao, Greg Durrett, and Danqi Chen. Proceedings of COLM 2025.

Inter-Passage Verification for Multi-evidence Multi-answer QA code

Bingsen Chen, Shengjie Wang, Xi Ye, and Chen Zhao. Findings of ACL 2025.

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, and Greg Durrett. Proceedings of ICLR 2025.

LoFiT: Localized Fine-tuning on LLM Representations code

Fangcong Yin, Xi Ye, and Greg Durrett. Proceedings of NeurIPS 2024.

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

Xinlu Zhang, Zhiyu Chen, Xi Ye, Xianjun Yang, Lichang Chen, William Yang Wang, and Linda Ruth Petzold. Proceedings of AAAI 2025 (oral).

AmbigDocs: Reasoning across Documents on Different Entities under the Same Name website

Yoonsang Lee, Xi Ye, and Eunsol Choi. Proceedings of COLM 2024.

(preprint) CodeUpdateArena: Benchmarking Knowledge Editing on API Updates code

Zeyu Leo Liu, Shrey Pandit, Xi Ye, Eunsol Choi, and Greg Durrett. arXiv 2024.

Effective Large Language Model Adaptation for Improved Grounding and Citation Generation

Xi Ye, Ruoxi Sun, Sercan Ö. Arik, and Tomas Pfister. Proceedings of NAACL 2024.

Crafting In-context Examples according to LMs' Parametric Knowledge code

Yoonsang Lee*, Pranav Atreya*, Xi Ye, and Eunsol Choi. Findings of NAACL 2024.

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning website

Zayne Sprague, Xi Ye, Kaj Bostrom, Swarat Chaudhuri, and Greg Durrett. Proceedings of ICLR 2024 (spotlight).

SatLM: Satisfiability-Aided Language Models Using Declarative Prompting code

Xi Ye, Qiaochu Chen, Isil Dillig, and Greg Durrett. Proceedings of NeurIPS 2023.

Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting code

Xi Ye and Greg Durrett. Proceedings of EMNLP 2023.

Complementary Explanations for Effective In-Context Learning code

Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, and Ramakanth Pasunuru. Findings of ACL 2023.

EEL: Efficiently Encoding Lattices for Reranking code

Prasann Singhal, Jiacheng Xu, Xi Ye, and Greg Durrett. Proceedings of ACL 2023.

Assessing Out-of-Domain Language Model Performance from Few Examples

Prasann Singhal*, Jarad Forristal*, Xi Ye, and Greg Durrett. Proceedings of EACL 2023.

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning code

Xi Ye and Greg Durrett. Proceedings of NeurIPS 2022.

Can Explanations Be Useful for Calibrating Black Box Models? code

Xi Ye and Greg Durrett. Proceedings of ACL 2022.

RnG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering code

Xi Ye, Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou, and Caiming Xiong. Proceedings of ACL 2022.

Diagnosing Ensemble Few-Shot Classifiers demo

Weikai Yang, Xi Ye, Xingxing Zhang, Lanxi Xiao, Jiazhi Xia, Zhongyuan Wang, Jun Zhu, Hanspeter Pfister, and Shixia Liu. IEEE TVCG 2022.

Connecting Attributions and QA Model Behavior on Realistic Counterfactuals code

Xi Ye, Rohan Nair, and Greg Durrett. Proceedings of EMNLP 2021.

Optimal Neural Program Synthesis from Multimodal Specifications code

Xi Ye, Qiaochu Chen, Isil Dillig, and Greg Durrett. Findings of EMNLP 2021.

Benchmarking Multimodal Regex Synthesis with Complex Structures code data

Xi Ye, Qiaochu Chen, Isil Dillig, and Greg Durrett. Proceedings of ACL 2020.

Sketch-Driven Regular Expression Generation from Natural Language and Examples code

Xi Ye, Qiaochu Chen, Xinyu Wang, Isil Dillig, and Greg Durrett. Transactions of ACL 2020.

Multi-Modal Synthesis of Regular Expressions code

Qiaochu Chen, Xinyu Wang, Xi Ye, Greg Durrett, and Isil Dillig. Proceedings of PLDI 2020.

Interactive Correction of Mislabeled Training Data video

Shouxing Xiang*, Xi Ye*, Jiazhi Xia, Jing Wu, Yang Chen, and Shixia Liu. Proceedings of VAST 2019.
