A new study from Google DeepMind and several US universities shows that most benchmarks for AI-generated code don't really match what developers value. Instead of only checking whether code works, the ...