See Twitter's SQL.

ojbyrne · on July 7, 2008

It seems like twitter is taking fail to new levels every day.

boredguy8 · on July 7, 2008

I hate to upmod something so derisive, but when you have truth on your side...

IsaacSchlueter · on July 7, 2008

They really weren't kidding when they said that they'd mistakenly built it as a CMS instead of as a messaging platform.

shimon · on July 7, 2008

The title pretty much covers it. Twitter is more like an IM network or email list service than like a typical database-backed webapp; a relational database is just the wrong platform for large-scale messaging.

IsaacSchlueter · on July 7, 2008

I agree, of course. But before the twitter bashing starts, I wanna say, I don't think that they foresaw messaging being the primary use-case. After all, you already have IM and SMS and a gillion other messaging systems. Twitter was conceived as a blogging platform, but smaller. Messaging was how you would post to twitter, not what you would do with twitter.

Of course, that was wrong. (How annoying are the twitter users who actually "answer the question: What are you doing?") But that kind of insight only comes out once your product is exposed to real humans. It's obvious, but only in hindsight.

As it turns out, when you make LiveJournal really small, you get Jabber.

jonknee · on July 7, 2008

And sadly it's not even close to large-scale yet. They only have a few hundred thousand users.

immad · on July 7, 2008

Did anyone get a screenshot before they disabled it?

Jax · on July 7, 2008

Yeah... screenshot please!!! And I'm also wondering how this ink was found

thomasswift · on July 7, 2008

here's a screenshot. i didn't include the table stuffs. http://skitch.com/thomasswift/xef3/twitter

dcurtis · on July 7, 2008

What are they using to analyze the queries like this? It looks pretty useful.

jzell · on July 7, 2008

They are using query reviewer

http://code.google.com/p/query-reviewer/

socksandsandals · on July 7, 2008

I dimed this out and had it disabled. Twitter doesn't need people poking through its' SQL. Imagine what we'd find if we were poking through yours...

amarcus · on July 7, 2008

these queries are not optimised at all. first of all, every page gives of MySQL warnings...and second, there are alot of SELECT count(*)...instead of SELECT count(1). This may not seem like it would increase speed alot but given the number of users and the amount of times that query is executed, it will speed up twitter by a bit. There are a few other things that i have noticed...they should really clean up their sql

sharksandwich · on July 7, 2008

The search page looks particularly sloppy. Searching 'john' yields 27 queries, including a number that look redundant

Grabbing the users

SELECT * FROM `users` WHERE (users.id in (<list of user ids>))

followed by queries for each user id in that list

SELECT * FROM `users` WHERE (`users`.`id` = <user id>)

Maybe there's a reason for doing that, but if there is, I can't think of it

gaius · on July 7, 2008

Most likely whoever wrote it cut their teeth on MySQL 4.0 or earlier which didn't have subselects.

They don't do this so much now, but back in the day (mid-late 90s) MySQL documentation was notorious for glossing over why they didn't have features. Foreign keys were "too slow". Transactions were "too slow", etc. If you need to rollback, store the previous values in memory in your own code, they told everyone (then quietly added the feature and changed the docs).

on July 7, 2008

[deleted]

senthil_rajasek · on July 7, 2008

Why roundtrip to the db for the same user info again? I have seen this type of behavior in data access "frameworks"... the optimization would be to reduce the roundtrip to the db...

jrockway · on July 7, 2008

s/frameworks/poorly-written frameworks/.

It's really amazing what poorly-written frameworks have done to the minds of people. When I teach classes on DBIx::Class (the Perl ORM), my students are shocked that $foo->bar->baz actually generates the SQL to follow the two relations as opposed to the "easy" "get list of results, then run query on each result, then return an array...". Actually, $foo->bar->baz, by itself, doesn't touch the database. The database is only contacted when you actually try to get information out of the result object. (And of course, the database is only queried once.)

there · on July 7, 2008

count(*) is optimized by mysql

http://www.mysqlperformanceblog.com/2007/04/10/count-vs-coun...

jey · on July 7, 2008

Good, because that would be really pathetic to not perform the simplest dead code elimination.

amarcus · on July 7, 2008

Only for MyISAM tables...its a different story with Innodb: http://www.mysqlperformanceblog.com/2006/12/01/count-for-inn...

neilc · on July 7, 2008

Not AFAICS; count( * ) and count( 1 ) are trivially equivalent, and this doesn't depend on the storage engine being used. It is true that count( * ) without a WHERE clause is much faster in MyISAM than in Innodb (or Postgres), but that is a different optimization (count( * ) => metadata lookup, not count( 1 ) => count( * )).

fendale · on July 7, 2008

Mentioned elsewhere in this thread, but count() and count(1) mean basically the same thing and are equivalent.

There is a nice optimisation in MYISAM table that makes count() from table <with no where clause> almost instant as its stored as meta-data on the table - in InnoDB, Oracle or Postgres this optimisation doesn't exist ...

wallflower · on July 7, 2008

Boy this demystifies the social animal that is twitter. The test database is very small.

ashleyw · on July 7, 2008

Wish I'd taken a screenshot now.

It was basically a semi-transparent box in the top left of the screen which showed you the query as well as which DB served the data (for me they came from db007, and the friend data from some shards), and two links, one for a popdown box with info about the query, and another with the file trace infomation.

Quite intresting, shame they have removed it.

dmoney · on July 7, 2008

I give up. How do I enable the analyzer?

immad · on July 7, 2008

I think you missed it. It seems to have disappeared.

jamesjyu · on July 7, 2008

Seems like this was a mistake by twitter: http://getsatisfaction.com/twitter/topics/staging_twitter_co...

Some profile pages are being linked to the staging server.

jamesjyu · on July 7, 2008

Okay it's down now. They put up the whale page on staging and probably just disabled the plugin.

bluelu · on July 7, 2008

Thanks to Twitter's chief architect for giving me a good laugh :)

st3fan · on July 7, 2008

Wouldn't a working service be more interesting than this thing? Or are they expecting us to fix their code?

wallflower · on July 7, 2008

Too bad we can't run 'explain plan' on them :)

ojbyrne · on July 7, 2008

Explain Plan? Isn't that oracle? I ask because there is an "EXPLN" link that gives you what I think you're asking for.

wallflower · on July 7, 2008

Saw that after I posted that, thanks.

newt0311 · on July 7, 2008

postgres has something similar. Pretty nice for benchmarking and optimizing queries.

wensing · on July 7, 2008

Postgres: EXPLAIN ANALYZE.

0x44 · on July 7, 2008

In MySQL it's just "EXPLAIN <YOUR QUERY>".

Jax · on July 7, 2008

Before being acuired by Google, does anyone know Jaiku's uptime and other stats? Were they too this bad as Twitter?

silpol · on July 8, 2008

uptime was much better, and reaction was much quicker and to the point. with recent switchover to Google infra they are almost non-stop -- only Google is not that stupid to pay for your messaging ;)

bdotdub · on July 7, 2008

Yeesh. This does not seem safe.

At least maybe (big maybe) some good will come of this?

bprater · on July 7, 2008

Mmmm... this should be juicy hacker fun for a few days!

jmorin007 · on July 7, 2008

Days? I think we're already into negative minutes...

immad · on July 7, 2008

I wonder how many minutes to techcrunching...

on July 7, 2008

[dead]

Jax · on July 7, 2008

Nachiket, That should have been "Hello Babloo"...

Anyway... I didnt understand that... "Kolega to bolega". What does it mean? Open the bottle and then it will talk?