How Citus Executes Distributed Transactions on Postgres (citusdata.com)
141 points by ibotty on Nov 22, 2017 | hide | past | favorite | 15 comments


Does anyone know why they use deadlock detection rather than deadlock avoidance (the simplest avoidance algorithm being simply to always acquire locks in the same order)? I can't find a good reference right now, but I was under the impression that deadlock detection, especially distributed detection, is extremely costly and basically precludes high contention workloads, but maybe I'm confused here.
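To make the "always acquire locks in the same order" idea concrete, here's a toy Python sketch (the lock keys and helper are made up for illustration, not anything Citus does): every transaction sorts its keys before locking, so no two transactions can ever wait on each other in a cycle.

```python
import threading

# Hypothetical lock table keyed by resource name.
locks = {key: threading.Lock() for key in ("accounts:1", "accounts:2")}

def run_transaction(keys, work):
    # Sort the keys so every transaction acquires locks in one global order.
    ordered = sorted(keys)
    for k in ordered:
        locks[k].acquire()
    try:
        return work()
    finally:
        for k in reversed(ordered):
            locks[k].release()

# Both callers end up locking accounts:1 before accounts:2,
# regardless of the order they pass the keys in.
result = run_transaction(["accounts:2", "accounts:1"], lambda: "ok")
```

The catch, as discussed below, is that this only works when you know the full lock set up front.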

The only two reasons I could think of were: 1. You can't know all the locks you want in advance (because the Postgres API doesn't require it), or 2. Citus doesn't expect customers to have high-contention workloads where deadlocks are likely (or at least they think deadlocks can be reduced primarily to single-node deadlocks).

Am I overestimating the cost of deadlock detection?


It's actually both of these reasons, with reason (1) being the primary one.

PostgreSQL allows interactive transaction blocks (meaning you don't have to submit all commands within a transaction block upfront). Citus extends Postgres and needs to provide the same semantics.

That said, we regularly evaluate different techniques on distributed deadlock detection and avoidance. We have an FAQ that discusses deadlock avoidance methods in the context of Postgres. In the link below, the last question on "How can a distributed database prevent distributed deadlocks?" provides more detail:

https://www.citusdata.com/blog/2017/08/31/databases-and-dist...


Does (can?) Citus optimize the case where the whole batch is visible up front?


Not practically, except when it is a single-statement transaction.

Where deadlock prevention becomes useful/necessary is with single UPDATE/DELETE statements that span multiple nodes. When those are executed in parallel, they could deadlock against each other due to non-deterministic execution order.
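A toy illustration of that scenario (no real database here; shard names are invented): two copies of the same multi-shard statement that visit shards in opposite orders can form a wait-for cycle, while a deterministic order cannot.

```python
# Two multi-shard UPDATEs, scheduled in opposite shard orders.
stmt_a = ["shard_1", "shard_2"]
stmt_b = ["shard_2", "shard_1"]

def can_deadlock(order_a, order_b):
    # Simplified two-shard check: A holds its first shard and waits for
    # B's first shard, while B holds its first shard and waits for A's
    # first shard - a wait-for cycle.
    return order_a[0] == order_b[-1] and order_b[0] == order_a[-1]

assert can_deadlock(stmt_a, stmt_b)              # opposite orders: cycle possible
assert not can_deadlock(stmt_a, sorted(stmt_b))  # deterministic order: safe
```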

Citus currently uses predicate locks to avoid these deadlocks, but there's probably some room for improvement there. On the other hand, for Citus use cases UPDATE and DELETE across shards are mainly batch operations (e.g., delete old data), so there's not a strong need for it yet.


Reason (1) is very likely. User code can send UPDATEs within a transaction in any order. While processing an early UPDATE, you can't know what later UPDATEs are coming, so you can't acquire those locks up front.


> Am I overestimating the cost of deadlock detection?

I think you might have them backwards. Deadlock avoidance becomes costly under high-contention workloads (due to aggressive locking and retry loops). Deadlock detection only becomes costly under workloads that actually deadlock.

The only cost to the application is a very occasional deadlock error and a trace in the log of how the error occurred. You can then use this information to change the order in which you perform your transactions (e.g. always perform UPDATEs in account ID order), to make sure the deadlock does not recur.
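The application-side pattern is roughly this (a hedged sketch; the exception class and functions are stand-ins, with PostgreSQL itself signalling deadlocks via SQLSTATE 40P01): apply updates in a canonical order, and retry the whole transaction on the rare deadlock error.

```python
class DeadlockDetected(Exception):
    """Stand-in for PostgreSQL's deadlock error (SQLSTATE 40P01)."""

def transfer(updates):
    # Applying UPDATEs in account-ID order removes the opposite-order
    # pattern that causes deadlocks in the first place.
    for account_id, delta in sorted(updates):
        pass  # a real app would run: UPDATE accounts SET ... WHERE id = %s

def run_with_retry(txn, updates, attempts=3):
    # A deadlock aborts one transaction; the application simply retries it.
    for attempt in range(attempts):
        try:
            return txn(updates)
        except DeadlockDetected:
            if attempt == attempts - 1:
                raise

run_with_retry(transfer, [(42, -100), (7, +100)])
```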

Other than that, the CPU/network cost of deadlock detection is negligible, and it allows every transaction to run with as much concurrency as the system can attain.


To be honest, I'm a little confused by your answer. Aren't high contention and deadlocking workloads essentially the same thing to a deadlock-detection db? I can imagine some patterns where that wouldn't necessarily be true, but they seem to go together in general. It also seems like your suggestion in the case of deadlocks is to modify your transactions to acquire locks in the same order, which is exactly what (many) avoidance systems do for you in the first place. What's the advantage of putting this burden on the user?


> Aren't high contention and deadlocking workloads essentially the same thing to a deadlock-detection db?

High contention means that there are many concurrent transactions modifying the same rows. Deadlocking means transactions modify the same rows in opposite orders. The former definitely does not imply the latter. Most workloads are simple enough to never have any deadlocks.

> which is exactly what (many) avoidance systems do for you in the first place.

This might work for very simple database models which require the user to specify all the keys that will be modified in the transaction upfront (as in some NoSQL systems). In PostgreSQL, a transaction can contain any number of statements in any order, results are returned at the end of each statement (before the next statement is known), and modifications can have complex, non-deterministic WHERE clauses.

The database has no way of knowing which rows will be affected by an UPDATE or DELETE in advance, and therefore it does not know which row locks it needs to take in advance. Even if it first read and sorted all the data, it still wouldn't know what the next statement is going to be. The only way the database could avoid deadlocks is by taking very aggressive predicate locks on the table, which is obviously bad for concurrency. Another possibility is to use wound-wait, but this introduces retry delays under contention even when there are no deadlocks. Moreover, because results (e.g. the number of rows modified) are returned after each statement, a transaction cannot restart from the beginning (since it might modify different rows the second time), so in practice wound-wait would have to unnecessarily abort transactions if you were using it to avoid deadlocks between multi-statement transactions.
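For reference, the wound-wait rule mentioned above (not what Citus does) boils down to a timestamp comparison on every lock conflict, which is why it can abort transactions even when no cycle exists. A minimal sketch, with timestamps as plain integers:

```python
def on_conflict(requester_ts, holder_ts):
    # Each transaction gets a timestamp at start; lower = older.
    if requester_ts < holder_ts:
        return "wound"  # older requester aborts the younger lock holder
    return "wait"       # younger requester waits for the older holder

assert on_conflict(1, 2) == "wound"  # aborts holder even if no deadlock existed
assert on_conflict(2, 1) == "wait"
```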

It is generally much, much, much more efficient to just let UPDATEs run without predicate locks or retry delays. They will only get into an occasional deadlock if the user has transaction blocks that make changes in a different order. In those cases you can have deadlock detection save the day, and tell the user how to avoid running into it again.
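At its core, this is what a distributed deadlock detector does: gather lock waits from all nodes into one wait-for graph and look for a cycle; any transaction on the cycle can then be aborted to break it. A rough sketch (toy graph walk, not Citus's actual implementation):

```python
def find_cycle(wait_for):
    # wait_for: {waiting_txn: blocking_txn}, collected across all nodes.
    for start in wait_for:
        seen, node = set(), start
        while node in wait_for:
            if node in seen:
                return sorted(seen)  # a cycle exists along this path
            seen.add(node)
            node = wait_for[node]
    return None

# T1 waits on T2 (on node A), T2 waits on T1 (on node B): no single node
# sees the cycle, but the merged graph does - a distributed deadlock.
assert find_cycle({"T1": "T2", "T2": "T1"}) == ["T1", "T2"]
assert find_cycle({"T1": "T2", "T2": "T3"}) is None
```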

> What's the advantage of putting this burden on the user?

If a transaction has multiple statements that modify the same rows in different order then the two choices are: reduce concurrency, or, detect deadlocks as they occur. The former is a severe pain that all transactions have to pay, the latter is an occasional, preventable pain that only deadlocking transactions pay. Clearly, the former would be the higher burden.


> the simplest avoidance algorithm being simply to always acquire locks in the same order.

Sounds simple, but I remember being in the weeds once or twice with a custom application deadlocking (and not enjoying it much).

With ORMs, I guess they can take the reins in terms of which resources they update and in what order, but I'm curious what other people do in the absence of that? Manually worrying about the order in which tables get updated in all of your service calls seems like it would get hard the larger the app becomes or the more developers get involved.


TiDB and CockroachDB also support distributed transactions. The transaction model in TiDB (https://github.com/pingcap/tidb) is inspired by Google Percolator; it's mainly a two-phase commit protocol with some optimizations. More info on the blog: https://github.com/pingcap/blog/blob/master/_posts/2016-10-1...


Sounds like they're using the idea described in this Jim Gray, Lamport paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.159...

Azure SQL DB has had the same HA coordinator built in for a couple of years now in their distributed transactions. And so has Spanner.


And so has MarkLogic, I think since 2003 (maybe a bit later).

---

http://cdn.marklogic.com/wp-content/uploads/2016/09/ACID-Tra...


So Citus is now ACID compliant? Do you guys have a whitepaper where you describe how you proved ACID compliance?

Thanks


Are you guys hiring? Wondering what kind of experience you require. I am a backend engineer with experience using these tools but not much building them but would love to work on such things.


Yes, we certainly are: https://www.citusdata.com/jobs

When you're sharing your resume, please feel free to mention this thread.



