You're right, but I don't think we're getting an hour's worth of work out of single prompts yet. Usually it's an hour's worth of work out of 10 prompts for iteration. Now that's a day's wage for an hour of work. I'm certain the crossover will come soon, but it doesn't feel there yet.
5-10 years? The human panel cost/task is $17 with 100% score. Deep Think is $13.62 with 84.6%. 20% discount for 15% lower score. Sorry, what am I missing?
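For anyone who wants to check the arithmetic, here is a rough sketch in Python using only the four figures quoted above. The helper names and the "cost per solved task" metric are my own assumptions, not something the benchmark reports. It also assumes a failed task is worth nothing and costs nothing extra to redo, which is probably too generous.

```python
# Figures quoted in the comment above; everything else is illustrative.
HUMAN_COST, HUMAN_SCORE = 17.00, 1.000   # $/task, fraction of tasks solved
MODEL_COST, MODEL_SCORE = 13.62, 0.846   # $/task, fraction of tasks solved


def relative_discount(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` is cheaper than `baseline`."""
    return 1 - candidate / baseline


def cost_per_solved_task(cost_per_task: float, score: float) -> float:
    """Hypothetical metric: spend divided by the share of tasks solved."""
    return cost_per_task / score


discount = relative_discount(HUMAN_COST, MODEL_COST)
score_gap_points = (HUMAN_SCORE - MODEL_SCORE) * 100
human_per_solved = cost_per_solved_task(HUMAN_COST, HUMAN_SCORE)
model_per_solved = cost_per_solved_task(MODEL_COST, MODEL_SCORE)

print(f"price discount:       {discount:.1%}")
print(f"score gap:            {score_gap_points:.1f} points")
print(f"human $/solved task:  {human_per_solved:.2f}")
print(f"model $/solved task:  {model_per_solved:.2f}")
```

On those assumptions the "20% discount for 15% lower score" framing holds up, and per solved task the model still comes out slightly cheaper than the human panel.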
It’s not that I want to achieve world domination (imagine how much work that would be!), it’s just that it’s the inevitable path for AI and I’d rather it be me than the next schmuck with a Claude Max subscription.
$13.62 per task - so we need another 5-10 years for the price to run this to become reasonable?
But the real question is whether they just fit the model to the benchmark.