The Andy Pavlo CMU Crew has a great paper summarizing, comparing, and benchmarki...

menaerus · on July 7, 2023

> YCSB: We modified the YCSB [14] benchmark ... The database contains a single table with 10 million tuples, each with one 64-bit primary key and 10 64-bit integer attributes

I wonder why they chose such an unrepresentative dataset size, ~84MB.

apavlo · on July 7, 2023

> I wonder why they chose such an unrepresentative dataset size, ~84MB.

Yo. This is me. The point of this paper was to evaluate the core MVCC algorithms under different contention scenarios. So you strip out all the internal features of the system that can influence performance that are unrelated to the experiments. This ensures that you can make it a true apples-to-apples comparison. And since everything is in memory, you don't need a large data set.

IIRC, we ran ran the same experiments with 100m tuples instead of 10m and it did not change the results.

mdcallag · on July 8, 2023

Thanks for doing the sensitivity analysis. I struggle with that issue a lot -- how can I shorten my benchmark duration (use less data, run for less time) so I can get more work done.

cmrdporcupine · on July 7, 2023

Hm. Dr Pavlo comments here sometimes, maybe he can provide some illumination.