Unfortunately, this only shows one side of the equation, namely the internet behemoth side.
If you're Google, Yahoo! or one of their friends, you can get away with relying just on correlations extracted directly from data. After all, you have all the data you could possibly want, and if you don't have it, you can easily measure it in a straightforward way.
Everybody else, however, has to do a much better job of developing the right algorithms and insights to get the upper hand. The best way to do this, of course, is to use whatever data you manage to scrape together.
Luckily, the behemoths also seem to recognize that sometimes data just isn't enough, and they ask for help. You've seen this in the Netflix prize, the AOL search log debacle and, more recently, in Microsoft's release of search logs for WSCD09.
The Netflix prize is a contest to discover how much you can do with pure data. I don't see how you can place it on the semantic web side of things when they don't do any tagging, etc.