Benchmark

Benchmark for evaluating persona-sensitive influencing in persuasive dialogues; under review at EMNLP 2026.

May 15, 2026

Benchmark suite and analysis framework for evaluating collaboration and competition among multiple LLM agents; accepted to the ACL 2025 main conference.

Feb 15, 2025

Benchmark and agent framework for evaluating creative reasoning in room-escape environments; ACL 2025 main conference paper.

Dec 15, 2024