The title pretty much covers it. Twitter is more like an IM network or email list service than like a typical database-backed webapp; a relational database is just the wrong platform for large-scale messaging.
I agree, of course. But before the twitter bashing starts, I wanna say, I don't think that they foresaw messaging being the primary use-case. After all, you already have IM and SMS and a gillion other messaging systems. Twitter was conceived as a blogging platform, but smaller. Messaging was how you would post to twitter, not what you would do with twitter.
Of course, that was wrong. (How annoying are the twitter users who actually "answer the question: What are you doing?") But that kind of insight only comes out once your product is exposed to real humans. It's obvious, but only in hindsight.
As it turns out, when you make LiveJournal really small, you get Jabber.
these queries are not optimised at all. first of all, every page gives of MySQL warnings...and second, there are alot of SELECT count(*)...instead of SELECT count(1). This may not seem like it would increase speed alot but given the number of users and the amount of times that query is executed, it will speed up twitter by a bit.
There are a few other things that i have noticed...they should really clean up their sql
Most likely whoever wrote it cut their teeth on MySQL 4.0 or earlier which didn't have subselects.
They don't do this so much now, but back in the day (mid-late 90s) MySQL documentation was notorious for glossing over why they didn't have features. Foreign keys were "too slow". Transactions were "too slow", etc. If you need to rollback, store the previous values in memory in your own code, they told everyone (then quietly added the feature and changed the docs).
Why roundtrip to the db for the same user info again? I have seen this type of behavior in data access "frameworks"... the optimization would be to reduce the roundtrip to the db...
It's really amazing what poorly-written frameworks have done to the minds of people. When I teach classes on DBIx::Class (the Perl ORM), my students are shocked that $foo->bar->baz actually generates the SQL to follow the two relations as opposed to the "easy" "get list of results, then run query on each result, then return an array...". Actually, $foo->bar->baz, by itself, doesn't touch the database. The database is only contacted when you actually try to get information out of the result object. (And of course, the database is only queried once.)
Not AFAICS; count( * ) and count( 1 ) are trivially equivalent, and this doesn't depend on the storage engine being used. It is true that count( * ) without a WHERE clause is much faster in MyISAM than in Innodb (or Postgres), but that is a different optimization (count( * ) => metadata lookup, not count( 1 ) => count( * )).
Mentioned elsewhere in this thread, but count() and count(1) mean basically the same thing and are equivalent.
There is a nice optimisation in MYISAM table that makes count() from table <with no where clause> almost instant as its stored as meta-data on the table - in InnoDB, Oracle or Postgres this optimisation doesn't exist ...
It was basically a semi-transparent box in the top left of the screen which showed you the query as well as which DB served the data (for me they came from db007, and the friend data from some shards), and two links, one for a popdown box with info about the query, and another with the file trace infomation.
uptime was much better, and reaction was much quicker and to the point. with recent switchover to Google infra they are almost non-stop -- only Google is not that stupid to pay for your messaging ;)