Benchmark

MultiAgentBench: Evaluating the Collaboration and Competition of LLM Agents

Benchmark suite and analysis framework for evaluating collaboration and competition among LLM-based multi-agent systems; accepted to the ACL 2025 main conference.

Feb 15, 2025

EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents

Benchmark and agent framework for evaluating creative reasoning in room-escape environments; ACL 2025 main conference paper.

Dec 15, 2024