Hacker News

I think if you rely on a lot of database features, SQLite isn't really powerful enough for many use cases. Or at least that's the impression I have.


Which features would those be?


Concurrent inserts. There are applications where you don't care about that, but it excludes a lot of web apps, or anything running with multiple processes.

If that changes, I'd agree with your point, but currently that's a big constraint.


Maybe I'm using SQLite wrong, but I have a website scraper. It's written in Python, so it uses processes rather than threads for concurrency to avoid GIL issues.

On my laptop, 16 processes accessing the same SQLite db file can do something like 5,000 write transactions/second. That seems like plenty for many purposes?

A caveat: it's in WAL mode, with synchronous mode set to NORMAL. This means that it's not fsyncing the WAL on each individual transaction commit. Instead, they get synced in batches roughly every 200ms. If you pull the power plug on the server, it may lose some amount of writes that happened in that time. Terminating the process is fine.
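In case it's useful, here's roughly what that configuration looks like with Python's sqlite3 module (the path and schema are just illustrative):

```python
import sqlite3

# Open the scraper's database; "scrape.db" is just an example path.
conn = sqlite3.connect("scrape.db")

# WAL mode: readers and the writer no longer block each other.
conn.execute("PRAGMA journal_mode=WAL")

# NORMAL: don't fsync the WAL on every commit; syncs happen in batches,
# trading a sliver of durability for a lot of write throughput.
conn.execute("PRAGMA synchronous=NORMAL")

conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT OR REPLACE INTO pages VALUES (?, ?)",
    ("https://example.com", "<html>...</html>"),
)
conn.commit()
```

Both pragmas are per-connection (journal_mode=WAL persists in the file, synchronous does not), so each worker process needs to set them on connect.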

Yeah, yeah, we're giving up some amount of durability. In practice, not very much (200ms of transactions) and only very rarely (when the server dies). That seems like a fine trade-off for many backend processes and many web apps. You could tune this, as well: for the transactions that you _really_ want durability on, run a checkpoint before rendering your response to the user.
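And for the durability-critical case, the checkpoint idea looks something like this (table and values made up):

```python
import sqlite3

conn = sqlite3.connect("app.db")  # illustrative path
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")

conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO orders (total) VALUES (?)", (42.0,))
conn.commit()

# Before rendering the response for this critical write, force the WAL
# to be synced and folded back into the main database file. The pragma
# returns (busy, wal_frames, checkpointed_frames); busy == 0 on success.
busy, wal_frames, ckpt_frames = conn.execute("PRAGMA wal_checkpoint(FULL)").fetchone()
```

FULL blocks until every frame in the WAL has been checkpointed and synced, so by the time it returns, that commit survives a power cut.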


How exactly does that exclude a lot of web apps? Most web apps are very read heavy. And SQLite serializes writes to individual databases, not across all of them. It's trivial in SQLite to query across multiple databases.
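For anyone who hasn't seen it, cross-database queries work via ATTACH; a sketch with made-up filenames:

```python
import sqlite3

# Each table group lives in its own file, so each file has its own
# write lock; writes to users.db don't serialize with writes to logs.db.
conn = sqlite3.connect("users.db")
conn.execute("ATTACH DATABASE 'logs.db' AS logs")

conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS logs.events (user_id INTEGER, action TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.execute("INSERT INTO logs.events VALUES (1, 'login')")
conn.commit()

# A single query can join across both attached databases.
rows = conn.execute("""
    SELECT u.name, e.action
    FROM users u JOIN logs.events e ON e.user_id = u.id
""").fetchall()
```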

Further, some form of write serialization is common to most database clusters: the most common cluster architecture is a single write leader with multiple read replicas, which has the effect of serializing writes as well. Those clusters are all also taking advantage of the read-heaviness of applications.


I did an experiment switching an app from Postgres to SQLite. I ran into issues where one process had a long-running write transaction while another process was doing reads and writes. As soon as the second process tried to write, the queries would often run into locks and abort. Otherwise it worked great for reads, and it sped the app up a lot, since there were lots of N+1 query issues.
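For what it's worth, many of those aborts come from deferred transactions that start with reads and later try to write: SQLite can't upgrade the lock and fails fast with SQLITE_BUSY. A common mitigation, sketched here with an illustrative schema, is to open write transactions with BEGIN IMMEDIATE and set a busy_timeout so writers wait instead of aborting:

```python
import sqlite3

conn = sqlite3.connect("app.db", isolation_level=None)  # autocommit; manage txns manually
conn.execute("PRAGMA journal_mode=WAL")
# Wait up to 5 seconds for the write lock instead of failing with SQLITE_BUSY.
conn.execute("PRAGMA busy_timeout=5000")

conn.execute("CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, n INTEGER)")
conn.execute("INSERT OR IGNORE INTO counters VALUES ('hits', 0)")

# Taking the write lock up front avoids the deferred-lock upgrade that
# returns an immediate busy error when a read transaction later writes.
conn.execute("BEGIN IMMEDIATE")
conn.execute("UPDATE counters SET n = n + 1 WHERE name = 'hits'")
conn.execute("COMMIT")
```

It doesn't shorten anyone's transactions, but it turns "random aborts" into "writers queue up", which is usually what you actually wanted.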

This was also after experimenting with WAL mode and related settings.

It would not be a viable solution to split these tables into separate databases to avoid the concurrency issue. Also, most of the issues involve writes to the same tables.

Is there honestly any way to get that working well with current SQLite, or do we need to wait for hctree or BEGIN CONCURRENT to mature?

I'd be concerned that getting this working in its current state would require a large refactoring of the app, which would take a lot of effort and introduce limitations that make coding more difficult.


Many web apps also write a lot. Are you suggesting to have individual databases per process and then read from all of them and aggregate the data you want in memory? That would sound too much like I'm building a database, which I don't want to do.

I'm not talking about clusters and replication, just a single database on a single server, like WordPress for example. It can handle large numbers of concurrent users if you're using InnoDB (which has been the default for how long now?), and nobody will even notice that anyone else is there, because it's not locking the table (or the whole database) to write, and you don't have to wait or deal with frequent busy timeouts.


Most SQLite writes can be measured in microseconds. If your app is doing less than 1,000 writes a second you probably won't even notice that SQLite is serializing your writes rather than applying them concurrently.
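That's easy to sanity-check yourself. A rough single-process sketch, with synchronous=NORMAL as discussed elsewhere in the thread; absolute numbers will vary wildly with hardware:

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit: one txn per statement
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")  # no fsync on every commit
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")

n = 5_000
start = time.perf_counter()
for i in range(n):
    # In autocommit mode each INSERT is its own write transaction.
    conn.execute("INSERT INTO kv VALUES (?, ?)", (i, "x" * 32))
elapsed = time.perf_counter() - start
per_write_us = elapsed / n * 1e6
print(f"{per_write_us:.0f} microseconds per single-statement write transaction")
```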


I must've been doing something wrong then. I regularly hit long locks as soon as the DB grew, to the point where it was unusable for me with multiple processes writing to the same DB, because it was constantly locked by one of them.

Having one writer and multiple readers was fine, but even having a dozen writers who update something a few times per second quickly led to the DB becoming the bottleneck. I tried different modes, but saw no difference so I just gave up on using it concurrently.


I spent a bunch of time wrestling with this last year (and into this one), and the general solution I settled on was to serialize writes through a single writer; multiple reader threads and multiple writer threads (or processes) is not a fun configuration to put SQLite in.
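The single-writer arrangement is simple to sketch: one thread owns the writing connection and drains a queue (filenames and schema are illustrative):

```python
import queue
import sqlite3
import threading

write_q: "queue.Queue[str | None]" = queue.Queue()

def writer(path: str) -> None:
    # The connection is created inside the thread that uses it, which is
    # what the sqlite3 module expects by default.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS events (msg TEXT)")
    conn.commit()
    while True:
        item = write_q.get()
        if item is None:  # sentinel: shut down cleanly
            break
        conn.execute("INSERT INTO events VALUES (?)", (item,))
        conn.commit()
    conn.close()

t = threading.Thread(target=writer, args=("events.db",))
t.start()
for msg in ["a", "b", "c"]:
    write_q.put(msg)  # everyone else just enqueues; only one thread writes
write_q.put(None)
t.join()
```

Readers keep their own connections and never contend with each other; all write-lock contention collapses into the queue.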

These were 500 MB–40 GB databases, with probably ~1.5 updates per second at peak load?


I think "web app" is too broad a category to have this kind of discussion over.. for a lot of web apps 1.5 updates per second could be considered a reasonable limit. Many people work on web apps that do 100x that and don't consider themselves to be working at any particular level of scale. 1.5K updates/sec is not uncommon either, in different settings -- high-traffic internal web apps can hit that in a single company, and the people working in those contexts don't consider that too special either, not to mention public facing websites like GH where many people start their careers without having seen anything different.


I'm not benchmarking SQLite, I'm relating the environment in which I was tuning SQLite concurrency, because I feel like it doesn't mean much to say "here's what worked for me" without at least some basic facts about what I was working on. SQLite itself can do tens of thousands of writes per second on commodity hardware without breaking a sweat.


There's more to most applications than interactions with the database. Once your web app exceeds the capacity of a single machine, is SQLite a good choice for state?


I think he’s saying to shard writes by tenant, or something else that makes sense for your app - so you can route reads to an individual shard. Same as you’d do to shard any other database, if a little more DIY.
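i.e. something like one database file per tenant with a trivial router; the naming scheme here is made up:

```python
import sqlite3

def tenant_db(tenant_id: str) -> sqlite3.Connection:
    # One file per tenant: each tenant gets its own write lock, so writes
    # for different tenants never serialize against each other.
    conn = sqlite3.connect(f"tenant_{tenant_id}.db")  # illustrative naming scheme
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, body TEXT)")
    return conn

# All reads and writes for a tenant route to that tenant's file.
a = tenant_db("acme")
a.execute("INSERT INTO posts (body) VALUES ('hello from acme')")
a.commit()
```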

Speaking of Wordpress though, that’s a pretty great example of a web app that typically handles almost exclusively reads. I’ll bet almost any Wordpress instance would work great on SQLite.


WordPress is the archetypical application that SQLite would kill for, but for historical reasons, WordPress has always wanted MySQL. It's funny, because there's a whole movement towards static sites that is, in part, a reaction to how bad the n-tier architecture behind WordPress is!


They're working on supporting SQLite as a backend: https://make.wordpress.org/core/2022/12/20/help-us-test-the-...


It's crazy that this is only happening as of 2022, right? They could have kept a whole generation of web sites on WordPress by eliminating the MySQL dependency, and gained a performance and deployability advantage in the process. Even multitenant WordPress would have come out great with SQLite. I'm waiting for someone closer to WordPress to tell me how wrong I am about this, because I have to be missing something.


I'm not sure it'll bring a lot of sites back to WP or keep them on it, but it'll be great for testing. Testing WP code is mostly integration testing: because of how filters and actions work, your code is rarely ever isolated. SQLite will make that much easier and faster.

It's also really not a big switch for the most part. I'm not aware of that many MySQL-specific things happening in WP. Things like fulltext indexes aren't in core, so it's really just a matter of using a different driver to get the basics working. I'm sure it would break a lot of plugins, but that's fine; they have platform requirements anyhow, and adding "works with SQLite" isn't a huge step. It can likely be automated to a high degree (if the plugin never uses wpdb->query, ->get_results, etc., it's compatible).


And to be clear, this would be a single instance of the WordPress php application that all traffic would route to, and it would access the SQLite database on that node?

If that node failed, then the php application and database would be restored to a new machine and traffic would resume?


Won't long-running write transactions block each other? Not all apps can avoid needing those kinds of transactions.



