Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published 7 days ago • 39
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published 14 days ago • 37
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs Paper • 2407.10058 • Published Jul 14 • 29
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations Paper • 2311.08469 • Published Nov 14, 2023 • 10