AIR-ML
RLHF
GREAT: Generalizable Backdoor Attacks in RLHF via Emotion-Aware Trigger Synthesis
We develop a novel framework for crafting generalizable backdoors in RLHF through emotion-aware trigger synthesis.
Subrat Kishore Dutta, Yuelin Xu, Piyush Pant, Xiao Zhang