AIR-ML
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
We propose AutoDefense, a response-filtering-based multi-agent defense framework that filters harmful responses from LLMs.
Yifan Zeng, Yiran Wu, Xiao Zhang, Huazheng Wang, Qingyun Wu
PDF · Cite · Code · ArXiv
Transferable Availability Poisoning Attacks
We propose an availability poisoning attack that generates poisoned data transferable across different victim learners.
Yiyong Liu, Michael Backes, Xiao Zhang
PDF · Cite · Code · ArXiv