hendrycks
APPS: Automated Programming Progress Standard (NeurIPS 2021)
bigcode-project
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI