Hacker News

> Every significant element of TCP, from its stream orientation to its requirement of in-order packet delivery, is wrong for the datacenter. It is time to recognize that TCP’s problems are too fundamental and interrelated to be fixed;

This seems like a pretty bad way to start a paper. It throws an extremely strong claim into the room without backing it up with data.

Having worked in the Cloud/Datacenter space for many years, I really have a hard time recalling situations where TCP, and not anything else, limited the performance of applications. It doesn't matter much whether a slightly different networking stack could lower RTT from 50us to 10us if the P99.9 latency of the overall stack is determined by garbage collection, a process being CPU starved, or being blocked on disk IO or another major fault. Those things can all be in the 100ms to 1s region, and they are the real common sources of latency in distributed systems.

The main TCP latency problem that I've experienced over the years is SYN or SYN-ACK packets getting dropped due to overloaded links or CPU starvation, with the retry from the client only happening after 1s. Annoying, but one can work around it a bit by racing multiple connections. Besides the TCP handshake time there's also another round trip for setting up a TLS connection - sure. But both of those latencies are in practice worked around with connection pooling.
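The racing workaround described above can be sketched roughly as follows. This is a minimal illustration, not code from the comment; `connect_first` is a hypothetical helper, and the attempt count and per-attempt timeout are assumptions:

```python
import socket
import concurrent.futures

def connect_first(host, port, attempts=2, timeout=0.25):
    """Race several TCP connects; return the first socket that completes.

    A lost SYN normally stalls a single connect() until the ~1s initial
    retransmission timeout, so racing a second attempt caps the damage
    at roughly one extra RTT plus scheduling overhead.
    """
    def try_connect():
        return socket.create_connection((host, port), timeout=timeout)

    with concurrent.futures.ThreadPoolExecutor(max_workers=attempts) as pool:
        futures = [pool.submit(try_connect) for _ in range(attempts)]
        winner = None
        for fut in concurrent.futures.as_completed(futures):
            try:
                sock = fut.result()
            except OSError:
                continue  # this attempt timed out or was refused
            if winner is None:
                winner = sock
            else:
                sock.close()  # close the losing connection
        if winner is None:
            raise OSError("all connection attempts failed")
        return winner
```

In practice the racing is usually hidden inside a connection pool, so the cost is paid once per pooled connection rather than per request.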

Speaking of TLS - I can't find a single reference to it in the paper, and talking about datacenter networking without mentioning TLS seems to miss something. Pretty much every security team in a bigger company will push for TLS by default - even if the datacenter is deemed a trusted zone. What does it matter that the TCP connection state is 2kB and the Homa state is less, if my real TCP connection state is anyway at least another 32kB of TLS buffers, plus probably megabytes of send and receive buffers, plus whatever the application needs to buffer?
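The arithmetic behind that point is easy to make concrete. The 2kB and 32kB figures are the ones from the text; the socket buffer size is an illustrative assumption, not a measurement:

```python
# Rough per-connection memory budget. tcp_state and tls_buffers are the
# figures cited in the comment; socket_buffers is an assumed 2 MB for
# combined send + receive buffers.
tcp_state = 2 * 1024                # transport-protocol connection state
tls_buffers = 32 * 1024             # TLS record buffers
socket_buffers = 2 * 1024 * 1024    # assumed send + receive buffers
total = tcp_state + tls_buffers + socket_buffers

# The transport state the paper optimizes is under 1% of the whole.
assert tcp_state / total < 0.01
```

Under these assumptions, shrinking the transport-protocol state changes the per-connection footprint by well under one percent.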

The last thing I would like to mention is that datacenter workloads are not "just messaging", and the boundary between messaging and streaming is pretty fluid. What happens if an RPC call fetches the content of a 10MB file? Is that still a message? If we treat it as such, the stack needs to buffer the full message in memory at once, whereas with TCP (and e.g. HTTP on top) it can be streamed, with only the send buffer sizes held in memory. What about 1MB? We could certainly argue that some applications just transfer a few bytes here and there, but I'm seriously not sure I would label those the majority of datacenter applications. And with the typical practice of placing a lot of metadata into each RPC call (> 5kB of auth headers, logging data, etc.), even the smallest RPC calls are not that small anymore.


