Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
See Twitter's SQL. (twitter.com/dpn)
49 points by ptm on July 7, 2008 | hide | past | favorite | 43 comments


It seems like twitter is taking fail to new levels every day.


I hate to upmod something so derisive, but when you have truth on your side...


They really weren't kidding when they said that they'd mistakenly built it as a CMS instead of as a messaging platform.


The title pretty much covers it. Twitter is more like an IM network or email list service than like a typical database-backed webapp; a relational database is just the wrong platform for large-scale messaging.


I agree, of course. But before the twitter bashing starts, I wanna say, I don't think that they foresaw messaging being the primary use-case. After all, you already have IM and SMS and a gillion other messaging systems. Twitter was conceived as a blogging platform, but smaller. Messaging was how you would post to twitter, not what you would do with twitter.

Of course, that was wrong. (How annoying are the twitter users who actually "answer the question: What are you doing?") But that kind of insight only comes out once your product is exposed to real humans. It's obvious, but only in hindsight.

As it turns out, when you make LiveJournal really small, you get Jabber.


And sadly it's not even close to large-scale yet. They only have a few hundred thousand users.


Did anyone get a screenshot before they disabled it?


Yeah... screenshot please!!! And I'm also wondering how this ink was found


here's a screenshot. i didn't include the table stuffs. http://skitch.com/thomasswift/xef3/twitter


What are they using to analyze the queries like this? It looks pretty useful.


They are using query reviewer

http://code.google.com/p/query-reviewer/


I dimed this out and had it disabled. Twitter doesn't need people poking through its' SQL. Imagine what we'd find if we were poking through yours...


these queries are not optimised at all. first of all, every page gives of MySQL warnings...and second, there are alot of SELECT count(*)...instead of SELECT count(1). This may not seem like it would increase speed alot but given the number of users and the amount of times that query is executed, it will speed up twitter by a bit. There are a few other things that i have noticed...they should really clean up their sql


The search page looks particularly sloppy. Searching 'john' yields 27 queries, including a number that look redundant

Grabbing the users

SELECT * FROM `users` WHERE (users.id in (<list of user ids>))

followed by queries for each user id in that list

SELECT * FROM `users` WHERE (`users`.`id` = <user id>)

Maybe there's a reason for doing that, but if there is, I can't think of it


Most likely whoever wrote it cut their teeth on MySQL 4.0 or earlier which didn't have subselects.

They don't do this so much now, but back in the day (mid-late 90s) MySQL documentation was notorious for glossing over why they didn't have features. Foreign keys were "too slow". Transactions were "too slow", etc. If you need to rollback, store the previous values in memory in your own code, they told everyone (then quietly added the feature and changed the docs).


[deleted]


Why roundtrip to the db for the same user info again? I have seen this type of behavior in data access "frameworks"... the optimization would be to reduce the roundtrip to the db...


s/frameworks/poorly-written frameworks/.

It's really amazing what poorly-written frameworks have done to the minds of people. When I teach classes on DBIx::Class (the Perl ORM), my students are shocked that $foo->bar->baz actually generates the SQL to follow the two relations as opposed to the "easy" "get list of results, then run query on each result, then return an array...". Actually, $foo->bar->baz, by itself, doesn't touch the database. The database is only contacted when you actually try to get information out of the result object. (And of course, the database is only queried once.)



Good, because that would be really pathetic to not perform the simplest dead code elimination.


Only for MyISAM tables...its a different story with Innodb: http://www.mysqlperformanceblog.com/2006/12/01/count-for-inn...


Not AFAICS; count( * ) and count( 1 ) are trivially equivalent, and this doesn't depend on the storage engine being used. It is true that count( * ) without a WHERE clause is much faster in MyISAM than in Innodb (or Postgres), but that is a different optimization (count( * ) => metadata lookup, not count( 1 ) => count( * )).


Mentioned elsewhere in this thread, but count() and count(1) mean basically the same thing and are equivalent.

There is a nice optimisation in MYISAM table that makes count() from table <with no where clause> almost instant as its stored as meta-data on the table - in InnoDB, Oracle or Postgres this optimisation doesn't exist ...


Boy this demystifies the social animal that is twitter. The test database is very small.


Wish I'd taken a screenshot now.

It was basically a semi-transparent box in the top left of the screen which showed you the query as well as which DB served the data (for me they came from db007, and the friend data from some shards), and two links, one for a popdown box with info about the query, and another with the file trace infomation.

Quite intresting, shame they have removed it.


I give up. How do I enable the analyzer?


I think you missed it. It seems to have disappeared.


Seems like this was a mistake by twitter: http://getsatisfaction.com/twitter/topics/staging_twitter_co...

Some profile pages are being linked to the staging server.


Okay it's down now. They put up the whale page on staging and probably just disabled the plugin.


Thanks to Twitter's chief architect for giving me a good laugh :)


Wouldn't a working service be more interesting than this thing? Or are they expecting us to fix their code?


Too bad we can't run 'explain plan' on them :)


Explain Plan? Isn't that oracle? I ask because there is an "EXPLN" link that gives you what I think you're asking for.


Saw that after I posted that, thanks.


postgres has something similar. Pretty nice for benchmarking and optimizing queries.


Postgres: EXPLAIN ANALYZE.


In MySQL it's just "EXPLAIN <YOUR QUERY>".


Before being acuired by Google, does anyone know Jaiku's uptime and other stats? Were they too this bad as Twitter?


uptime was much better, and reaction was much quicker and to the point. with recent switchover to Google infra they are almost non-stop -- only Google is not that stupid to pay for your messaging ;)


Yeesh. This does not seem safe.

At least maybe (big maybe) some good will come of this?


Mmmm... this should be juicy hacker fun for a few days!


Days? I think we're already into negative minutes...


I wonder how many minutes to techcrunching...


[dead]


Nachiket, That should have been "Hello Babloo"...

Anyway... I didnt understand that... "Kolega to bolega". What does it mean? Open the bottle and then it will talk?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: