
The Andy Pavlo CMU Crew has a great paper summarizing, comparing, and benchmarking different MVCC implementation techniques -- including garbage collection -- here: https://db.cs.cmu.edu/papers/2017/p781-wu.pdf


> YCSB: We modified the YCSB [14] benchmark ... The database contains a single table with 10 million tuples, each with one 64-bit primary key and 10 64-bit integer attributes

I wonder why they chose such an unrepresentative dataset size, ~880MB of raw tuple data.
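Back-of-the-envelope, assuming 8 bytes per 64-bit field and ignoring any per-tuple storage overhead, the raw table works out to roughly 880 MB:

```python
# Raw YCSB table size from the paper's setup:
# 10 million tuples, each with one 64-bit key and ten 64-bit attributes.
tuples = 10_000_000
fields_per_tuple = 1 + 10          # primary key + integer attributes
bytes_per_tuple = fields_per_tuple * 8  # 88 bytes of raw data per tuple

total_bytes = tuples * bytes_per_tuple
print(total_bytes / 1e6)           # 880.0 (MB)
```

Real systems add headers, indexes, and MVCC version metadata on top of this, so the in-memory footprint would be larger.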


> I wonder why they chose such an unrepresentative dataset size, ~880MB of raw tuple data.

Yo. This is me. The point of this paper was to evaluate the core MVCC algorithms under different contention scenarios. So you strip out all the internal features of the system that can influence performance but are unrelated to the experiments. This ensures you can make a true apples-to-apples comparison. And since everything is in memory, you don't need a large data set.

IIRC, we ran the same experiments with 100M tuples instead of 10M and it did not change the results.


Thanks for doing the sensitivity analysis. I struggle with that issue a lot -- how can I shorten my benchmark duration (use less data, run for less time) so I can get more work done?


Hm. Dr Pavlo comments here sometimes, maybe he can provide some illumination.



