The reason I claim this would be simple is that PostgreSQL is *almost* already d...

krosaen · on Aug 29, 2012

Thanks for taking the time to elaborate, very interesting. I wonder if the sql db vendors or open source projects will take the next step to make querying against a transaction ID possible given the underlying implementation details bring it pretty close.

I also see Rich has made some interesting points elsewhere in this thread about consistent views being available outside of transactions and without need for coordination (within datomic) - seems more appropriate to comment directly there though.

Overall I think it's important to understand these nuances, and not view datomic as some revolutionary leap, even if I am excited about the project. I appreciate your insight into the power already within sql db engines.

saurik · on Aug 29, 2012

I am not certain whether your response comes from a reading of my comment before or after the paragraph I added that started with "for a more complete implementation", but if it was from before I encourage you to read that section: the ability to do the query is pretty much already there due to the xmin/xmax fields that PostgreSQL is already reifying.

saurik · on Aug 29, 2012

(edit: Apparently, nearly an hour ago, jeltz pointed out that PostgreSQL 9.2 actually has implemented nearly this identical functionality through the usage of exported snapshots, so I recommend people go read that comment and the linked documentation. However, my comment is still an example of the functionality working.)

(edit: Ah, but the feature as implemented actually saves a file to disk and thereby has a lot of server-side state: the way I've gone ahead and implemented it does not have this complexity; I simply take a single integer and store nothing on the server.)

http://news.ycombinator.com/item?id=4448472

> I wonder if the sql db vendors or open source projects will take the next step to make querying against a transaction ID possible given the underlying implementation details bring it pretty close.

For the hell of it, I just went ahead and implemented the "backdate a transaction" feature; I didn't solve the vacuum guarantees problem, however: I only made it so that a transaction can be backdated to another point in time.

To demonstrate, I will start with a very similar sequence of events to before. However, I am going to instead use txid_current_snapshot(), which returns the range (and an exception set that will be unused for this example) of transaction identifiers that are valid.

Connection 1:

    demo=# create table q (data int);
    CREATE TABLE
    demo=# begin; select txid_current_snapshot();
    BEGIN
    710:710:
    demo=# insert into q (data) values (0); commit;
    INSERT 0 1
    COMMIT
    demo=# begin; select txid_current_snapshot();
    BEGIN
    711:711:
    demo=# select xmin, xmax, data from q;
    710|0|0

Connection 2:

    demo=# begin; select txid_current_snapshot();
    BEGIN
    711:711:
    demo=# update q set data = 1; commit;
    UPDATE 1
    COMMIT
    demo=# select xmin, xmax, data from q;
    711|0|1

Connection 1:

    demo=# select xmin, xmax, data from q;
    710|711|0
    demo=# begin; select txid_current_snapshot();
    BEGIN
    712:712:
    demo=# select xmin, xmax, data from q;
    711|0|1

So far, this is the same scenario as before: I have two connections that are seeing different visibility to the same data, based on these snapshots. Now, however, I'd like to "go back in time": I want our first connection to be able to use the same basis for its consistency that we were using in the previous transaction.

Connection 1:

    demo=# set snapshot_txid = 711;
    SET
    demo=# select txid_current_snapshot();
    711:711:
    demo=# select xmin, xmax, data from q;
    710|711|0

This new variable, snapshot_txid, is something I created: it gets the current transaction's active snapshot and modifies it to be a range from that transaction id to that same id (I think a better version of this would take the exact same string value that is returned by txid_current_snapshot()).

From that previous basis, the row with the value 0 is visible, not the row with the value 1. I can, of course, go back to the future snapshot if I wish, in order to view the new row. (I am not yet certain what this will do to things writing to the database; this might actually be sufficient, however I feel like I might need to either mess with more things or mark the transaction read-only.)

Connection 1:

    demo=# set snapshot_txid = 712;
    SET
    demo=# select txid_current_snapshot();
    712:712:
    demo=# select xmin, xmax, data from q;
    711|0|1

campnic · on Aug 29, 2012

A couple questions about your example:

I noticed that the xmin changed from Connection 1 from 189018 -> 189028. Is that just a typo?

Is the concept of transaction in this regard a 'state of the entire db'? If a transaction included multiple modifications would they all get the same xmax? If so, I see this as a difference between the presentation and your example. The transaction is a modification to the entirety of the db and is a state of the db. In Hickey's presentation, he very clearly says that the expectation is that the transaction component of the datom is specific to an individual datom.

Since its been a while since I've worked on DBs, and even then I didn't know much, your demo has helped put it in perspective.

saurik · on Aug 29, 2012

Thank you so much for noticing that error. What happened is that I started doing it, and then made a mistake; I then redid the entire flow, but forgot I needed to re-copy/paste the first half as the transaction identifier would have changed. I have updated my comment to fix this.

Yes: the transaction is the state of the entire database. However, you can make your transactions as fine-grained as you wish: the reason to not do so, however, is that you are likely to end up with scenarios where you want to atomically roll-back many changes that other people using the database should not see in the case of a failure. You certainly then will at least want the ability to make multiple changes at once.

The tradeoff in doing so, however, is that you will need to make a new transaction, which will have a different basis. I agree: if you then have a need to be able to make tiny changes to individual items one at a time that need to be from a shared consistency basis and yet have no need to be atomic with respect to each other (which is the part I am going to be quite surprised by), then yes: you need the history query function to implement this.

I would be fascinated by a better understanding of that use case. Does he go into an explicit example of why that would be required later in the talk? (If so, I could try to translate that use case as best I can into a "traditional database" to see whether you really need that feature; if you do, it might be valuable to try to get something related to this design into PostgreSQL: I am starting to get a better understanding of the corner cases as I think more about it, and think I can come up with a proposal that wouldn't make this sound insane to the developers.)