Benchmark suite and analysis framework for evaluating collaboration and competition among multiple LLM agents; accepted to the ACL 2025 main conference.
Feb 15, 2025
Benchmark and agent framework for evaluating creative reasoning in room-escape environments; ACL 2025 main conference paper.
Dec 15, 2024