2025.09.19 Three papers accepted by NeurIPS 2025: Generative RLHF-V (Main), InterMT (DB Track, Spotlight), and Safe RLHF-V (Main)!
2025.08.04 Our paper: Language Models Resist Alignment: Evidence From Data Compression won the ACL 2025 Best Paper Award!
2024.12.14 Our paper: Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback has been accepted by AAAI 2025 (AI Alignment Track, Oral).
2024.09.15 Our framework OmniSafe has been accepted by JMLR 2024 (the most popular open-source Safe RL framework).
2023.10.30 Big News! We released AI Alignment: A Comprehensive Survey.
2023.09.27 Our benchmark Safety-Gymnasium has been accepted by NeurIPS 2023 (DB Track) (the most popular open-source Safe RL benchmark).
As a newly enrolled Ph.D. student, I am still working toward these awards.
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou*, Jiaming Ji*, Boyuan Chen, Jiapeng Sun, Wenqi Chen, Donghai Hong, Sirui Han, Yike Guo, Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS), 2025.
InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback
Boyuan Chen*, Donghai Hong*, Jiaming Ji*, Jiacheng Zheng, Bowen Dong, Jiayi Zhou, Kaile Wang, Juntao Dai, Xuyao Wang, Wenqi Chen, Qirui Zheng, Wenxin Li, Sirui Han, Yike Guo, Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS, Datasets and Benchmarks Track), 2025, Spotlight.
Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
Jiaming Ji, Xinyu Chen, Rui Pan, Conghui Zhang, Han Zhu, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Chi-Min Chan, Yida Tang, Sirui Han, Yike Guo, Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS), 2025.
Language Models Resist Alignment: Evidence From Data Compression [GitHub]
Jiaming Ji*, Kaile Wang*, Tianyi Qiu*, Boyuan Chen*, Jiayi Zhou, Changye Li, Hantao Lou, Juntao Dai, Yunhuai Liu, Yaodong Yang.
Annual Meeting of the Association for Computational Linguistics (ACL), 2025, Best Paper Award.
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
Jiayi Zhou*, Jiaming Ji*, Juntao Dai, and Yaodong Yang.
The 39th Annual AAAI Conference on Artificial Intelligence (AAAI, AI Alignment Track), 2025, Oral, Top 6%.
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research [GitHub]
Jiaming Ji*, Jiayi Zhou*, Borong Zhang*, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, and Yaodong Yang.
Journal of Machine Learning Research (JMLR), 2024.
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark [Website] [GitHub]
Jiaming Ji*, Borong Zhang*, Jiayi Zhou*, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, and Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS), 2023.
AI Alignment: A Comprehensive Survey [Website]
Jiaming Ji*, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao.
ACM Computing Surveys (CSUR), 2025 (IF = 28.0).
Reward Generalization in RLHF: A Topological Perspective
Tianyi Qiu*, Fanzhi Zeng*, Jiaming Ji*, Dong Yan*, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang.
Findings of the Association for Computational Linguistics (ACL Findings), 2025.
Reviewer for NeurIPS and ICLR.