When Benchmarks Grow Up: Kudos to OpenAI (and why I still reach for Claude)
OpenAI just dropped something genuinely useful: GDPval, a benchmark that measures how AI models perform on real work. Documents, slides, spreadsheets, multimedia. The messy stuff actual humans ship every day. Not math quizzes. Not coding puzzles. Actual job tasks across 44 occupations in the 9 industries that contribute most to U.S. GDP, assembled by domain pros