
Well, yes and no.

With spinning rust you have to wait for the sector you want to read to rotate underneath the read head. For a fast 10,000 RPM drive, a single rotation takes 6 milliseconds. This means that for random access the average rotational latency is going to be 3 milliseconds - and even that's ignoring the need to move the read head between different tracks! Sequential data doesn't suffer from this, because it'll be passing underneath the read head in the exact order you want - you can even take the track switching time into account when laying out the data to improve this further.
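To make the arithmetic above concrete, here is a minimal sketch. The function names are made up for illustration, and it only models rotational delay (no seek time, no caching), assuming the target sector is on average half a rotation away.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm  # 60,000 ms per minute divided by rotations per minute


def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a random sector: half a rotation."""
    return rotation_time_ms(rpm) / 2.0


if __name__ == "__main__":
    for rpm in (5400, 7200, 10_000):
        print(
            f"{rpm} RPM: rotation {rotation_time_ms(rpm):.2f} ms, "
            f"avg latency {avg_rotational_latency_ms(rpm):.2f} ms"
        )
```

For 10,000 RPM this gives the 6 ms rotation and 3 ms average latency mentioned above; slower drives fare proportionally worse.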

SSDs have a different problem. Due to the way NAND is physically constructed, it is only possible to read a single page at a time, and accessing a single page has a latency of tens of microseconds. This immediately places a lower limit on the random read access time. However, SSDs allow you to send read commands which span many pages, allowing the SSD to reorder the reads optimally and do multiple reads in parallel. This means that you only have to pay the random access penalty once - not to mention that you have to issue far fewer commands to the SSD.

SSDs try to make this somewhat better by having a very deep command queue: you can issue literally thousands of random reads at once, and the SSD will reorder them for faster execution. Unfortunately this doesn't gain you a lot if your random reads have dependencies, such as when traversing a tree structure, and you are still wasting a lot of effort reading entire pages when you only need a few bytes.
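A rough back-of-the-envelope model of why dependencies hurt, as a sketch only: the 50 microsecond page latency and the parallelism of 32 are illustrative assumptions, not measurements of any real drive, and the function names are hypothetical.

```python
import math


def independent_reads_us(n_reads: int, page_latency_us: float, parallelism: int) -> float:
    """Reads with no dependencies can all be queued at once; the drive is
    assumed to service up to `parallelism` of them concurrently."""
    return math.ceil(n_reads / parallelism) * page_latency_us


def dependent_reads_us(chain_length: int, page_latency_us: float) -> float:
    """Each read needs the result of the previous one (e.g. following a
    child pointer in a tree), so the latencies add up serially."""
    return chain_length * page_latency_us


if __name__ == "__main__":
    latency_us = 50.0   # assumed per-page read latency
    parallelism = 32    # assumed number of reads the drive overlaps

    print("1000 independent reads:",
          independent_reads_us(1000, latency_us, parallelism), "us")
    print("1000 dependent reads:  ",
          dependent_reads_us(1000, latency_us), "us")
```

Under these assumptions the deep queue only helps when the reads are known up front; a pointer chase pays the full page latency at every step regardless of queue depth.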



Interesting, thanks! So it sounds like it's not so much "random" I/O that's slow, but rather "unbatched" I/O or something like that?

Curious to hear your thoughts on this thread if you have time to share: https://news.ycombinator.com/item?id=33752870


> Unfortunately this doesn't gain you a lot if your random reads have dependencies, such as when traversing a tree structure,

So, does this mean B-trees suffer? What would be the optimal layout for database storage if only SSDs matter?

I'm working on a database that is WAL-only and scans everything on each operation (for now!), and I want to see what I can do to improve the situation.


You really need an NVMe interface to the SSD, though. SATA3 is the bottleneck for SATA SSDs.



