2025.09.19 Three papers accepted by NeurIPS 2025: Generative RLHF-V (Main), InterMT (DB Track, Spotlight), and Safe RLHF-V (Main)!
2025.08.04 Our paper: Language Models Resist Alignment: Evidence From Data Compression won the ACL 2025 Best Paper Award!
2024.12.14 Our paper: Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback has been accepted by AAAI 2025 (AI Alignment Track, Oral).
2024.09.15 Our framework OmniSafe has been accepted by JMLR 2024 (the most popular open-source Safe RL framework).
2023.10.30 Big News! We released AI Alignment: A Comprehensive Survey.
2023.09.27 Our benchmark Safety-Gymnasium has been accepted by NeurIPS 2023 (DB Track) (the most popular open-source Safe RL benchmark).
As a newly enrolled Ph.D. student, I am still working toward these awards.
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou*, Jiaming Ji*, Boyuan Chen, Jiapeng Sun, Wenqi Chen, Donghai Hong, Sirui Han, Yike Guo, Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS), 2025.
InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback
Boyuan Chen*, Donghai Hong*, Jiaming Ji*, Jiacheng Zheng, Bowen Dong, Jiayi Zhou, Kaile Wang, Juntao Dai, Xuyao Wang, Wenqi Chen, Qirui Zheng, Wenxin Li, Sirui Han, Yike Guo, Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS, Datasets and Benchmarks Track), 2025, Spotlight.
Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
Jiaming Ji, Xinyu Chen, Rui Pan, Conghui Zhang, Han Zhu, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Chi-Min Chan, Yida Tang, Sirui Han, Yike Guo, Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS), 2025.
Language Models Resist Alignment: Evidence From Data Compression [GitHub]
Jiaming Ji*, Kaile Wang*, Tianyi Qiu*, Boyuan Chen*, Jiayi Zhou, Changye Li, Hantao Lou, Juntao Dai, Yunhuai Liu, Yaodong Yang.
Annual Meeting of the Association for Computational Linguistics (ACL), 2025, Best Paper Award.
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
Jiayi Zhou*, Jiaming Ji*, Juntao Dai, and Yaodong Yang.
The 39th Annual AAAI Conference on Artificial Intelligence (AAAI, AI Alignment Track), 2025, Oral, Top 6%.
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research [GitHub]
Jiaming Ji*, Jiayi Zhou*, Borong Zhang*, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, and Yaodong Yang.
Journal of Machine Learning Research (JMLR), 2024.
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark [Website] [GitHub]
Jiaming Ji*, Borong Zhang*, Jiayi Zhou*, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, and Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS), 2023.
AI Alignment: A Comprehensive Survey [Website]
Jiaming Ji*, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao.
ACM Computing Surveys (CSUR), 2025 (IF = 28.0).
Reward Generalization in RLHF: A Topological Perspective
Tianyi Qiu*, Fanzhi Zeng*, Jiaming Ji*, Dong Yan*, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang.
Findings of the Association for Computational Linguistics (ACL Findings), 2025.
Reviewer for NeurIPS and ICLR.