
Yes, but what happens if your JSON data size grows so large that it can't fit on a single machine? Multi-master replication or sharding is a terrible pain in any RDBMS (at least according to my research and trials).


At the end of 2013, Stack Overflow ran on one SQL server (plus a redis server for caching). The rest of Stack Exchange ran on another SQL server.[0]

For the most part, for most projects, worrying about multi-master replication is going to be pointless. You can always put some data in a distributed K/V (or document) store and point to that from your SQL if you need to.
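To make the "point to that from your SQL" idea concrete, here is a minimal sketch. It assumes an in-memory SQLite table for the relational side and a plain dict standing in for the distributed K/V or document store; the table name `documents`, the `blob_key` column, and the helper functions are all hypothetical, not any particular product's API.

```python
import hashlib
import json
import sqlite3

# Stand-in for a distributed K/V or document store (assumption: a real
# deployment would call a separate service here instead of a dict).
blob_store = {}


def put_blob(payload):
    """Serialize the payload and store it under a content-derived key."""
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(data).hexdigest()
    blob_store[key] = data
    return key


def get_blob(key):
    """Fetch and deserialize a payload by key."""
    return json.loads(blob_store[key].decode("utf-8"))


# The relational side holds only small metadata plus a pointer to the blob.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, blob_key TEXT NOT NULL)"
)


def save_document(title, body):
    """Write the large body to the blob store, then record its key in SQL."""
    key = put_blob(body)
    cur = conn.execute(
        "INSERT INTO documents (title, blob_key) VALUES (?, ?)", (title, key)
    )
    conn.commit()
    return cur.lastrowid


def load_document(doc_id):
    """Look up the row in SQL, then follow the pointer to the blob store."""
    title, key = conn.execute(
        "SELECT title, blob_key FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return title, get_blob(key)
```

The SQL rows stay small and easy to back up, while the potentially large payload lives wherever it scales best. The trade-off is that the two writes are not atomic, so a real system would need some way to clean up orphaned blobs.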

[0] http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-s...


What I meant was a data size too large for a single machine. Adding arbitrarily large JSON to your table could grow the data beyond what fits on one machine, or even in block storage. Plus you might not want it all in block storage. My point is that an RDBMS can be better served storing the relational data alone, with a separate DB engine for the potentially massive JSON data.


How many use cases really exceed a single, multi-terabyte machine? Your argument is true for any data type and format, not just JSON. If you have more than a few terabytes, then you have "big data" and most solutions, including Mongo, probably won't work.


You're storing exactly the same data whatever you're storing it in. The point is that there are rather few use cases where your primary database is going to be more than a few terabytes, which is usually easy to handle with a single machine + some caching. I pointed to Stack Overflow as an example of a large site which still manages to keep its entire database on one machine.


Why does the format the data is stored in matter more than the data itself? JSON or columns in rows - it's not fundamentally different.


It has nothing to do with the format. I'm using JSON as an example of something that could be large. I'm only talking about separating the "could become massive" columns/data/whatever from the "small, relational data".


Sure, but how many people genuinely have data that big?


Well, one thing we store is HTML content and "MS Word"-like document data. We also store hundreds of revisions for all of those documents. I wouldn't want to use an RDBMS for this because (a) it's not relational, but also (b) backup/replication/load distribution would be too painful. A system like CouchDB can be spread out over n machines, any of them write-capable.


This is exactly what SharePoint does on top of boring, battle-tested SQL Server.



