Benchmark and agent framework for evaluating creative reasoning in room-escape environments; ACL 2025 main conference paper.
Dec 15, 2024