
So, reading between the lines, the SYSTEM TABLESAMPLE algorithm appears to be biased toward rows with small values (small in bytes, or in whatever on-disk encoding Postgres uses for its values).

If you choose random data pages of a fixed size, each chosen page can hold more small rows than big ones.
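
To put rough numbers on the page-capacity point, here is a small Python sketch. It assumes an 8 kB page (the Postgres default block size) and ignores page headers and per-row overhead; the row sizes and the `rows_per_page` helper are made up for illustration and are not anything Postgres exposes.

```python
PAGE_SIZE = 8192  # bytes; Postgres default block size (headers and row overhead ignored)

def rows_per_page(row_size, page_size=PAGE_SIZE):
    """Return how many rows of `row_size` bytes fit on one fixed-size page."""
    return page_size // row_size

# Hypothetical row sizes, chosen only to show the spread.
for row_size in (64, 512, 2048):
    print(f"{row_size:>5}-byte rows: {rows_per_page(row_size)} per page")
```

Under these assumptions, one sampled page of 64-byte rows returns 128 rows, while one sampled page of 2048-byte rows returns only 4. The sketch only illustrates the per-page counts behind the argument; it does not model how Postgres actually lays rows out on disk.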

"In the query above, SYSTEM is the name of the chosen sampling algorithm. The SYSTEM algorithm chooses a set of pseudo-random data pages, and then returns all rows on those pages, and has the advantage in running in constant time regardless of the size of the table. PostgreSQL 9.5 will also ship with the BERNOULLI sampling method, which is more rigorously random, but will take longer the larger the table is."

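The difference between the two methods in that quote can be sketched as a toy model in Python. This is not the Postgres implementation: the table is just a list of pages, each page a list of row ids, and the function names `system_sample` and `bernoulli_sample` are my own. The assumption is that SYSTEM keeps each page with the given probability and returns everything on it, while BERNOULLI makes the same decision per row.

```python
import random

def system_sample(pages, fraction, rng):
    """Page-level sampling: keep each page with probability `fraction`,
    then return every row on the kept pages."""
    sample = []
    for page in pages:
        if rng.random() < fraction:
            sample.extend(page)  # rows are only touched on selected pages
    return sample

def bernoulli_sample(pages, fraction, rng):
    """Row-level sampling: visit every row, keep each with probability `fraction`."""
    return [row for page in pages for row in page if rng.random() < fraction]

# Hypothetical table: 100 pages holding 50 row ids each.
pages = [list(range(p * 50, (p + 1) * 50)) for p in range(100)]
rng = random.Random(0)  # fixed seed so repeated runs give the same sample

sys_rows = system_sample(pages, 0.1, rng)
ber_rows = bernoulli_sample(pages, 0.1, rng)
print(len(sys_rows), len(ber_rows))
```

In this model the SYSTEM sample always arrives in whole pages (its size is a multiple of 50 here), and rows on unselected pages are never looked at. The BERNOULLI version has to walk every row, which matches the quote's point that it takes longer as the table grows.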

