(Caveat: this is a contentious opinion on Hacker News)
(Caveat #2: I'm not terribly familiar with such low level programming. Take anything I say with a grain of salt.)
It's my opinion that I/O Completion Ports (IOCP) on Windows are superior to the approach taken by Linux and the BSDs.
Instead of having the usermode application sleep, wake up, do some checks, etc., the application provides an entry point to be invoked when an event occurs or data is available. Essentially, a callback for the kernel to use. The kernel then jumps directly to this, and can manage the threads involved, using a thread pool to balance requests. This gives much better utilization of threads than poll/epoll/kqueue, but does place some other constraints on how the code is written.
The fundamental difference is that the Unix-like systems use a readiness-based model: they wake up a thread to tell it that a descriptor is ready to be read. IOCP on Windows is a completion-based model, and wakes up threads with the data (or error) already present in a data structure provided to the thread.
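To make the readiness side concrete, here is a minimal sketch in Python using the standard `selectors` module (which sits on top of epoll/kqueue/select depending on the platform). The helper names `read_ready` and `demo`, the 4096-byte buffer size, and the use of `socket.socketpair()` as a stand-in for real network connections are all my own assumptions for illustration, not anything from a real server.

```python
import selectors
import socket


def read_ready(sel, timeout=1.0, bufsize=4096):
    """Wait until some sockets are readable, then read from each one.

    The buffer is allocated only after the kernel reports readiness,
    which is the point of the readiness-based model.
    """
    results = {}
    for key, _events in sel.select(timeout):
        buf = bytearray(bufsize)  # allocated now, not while the socket sat idle
        n = key.fileobj.recv_into(buf)
        results[key.data] = bytes(buf[:n])
    return results


def demo():
    """Register three idle connections, make one readable, and read it."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (reader, _writer) in enumerate(pairs):
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=i)

    # Only connection 1 receives data; the other two stay idle and
    # never get a buffer allocated for them.
    pairs[1][1].sendall(b"hello")

    got = read_ready(sel)

    sel.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return got
```

In a completion-based design the `bytearray` would have to be handed to the kernel when the read is posted, before anyone knows which connection will receive data.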
> a completion based model, and wakes up threads with the data (or error)
Which means that in this model you have to allocate and provide a buffer for that data long before the kernel is going to fill it. It's just going to sit there waiting, wasting memory. In the Unix model, by contrast, you don't have to allocate a buffer until you know there is some data to copy from the kernel, which is easier for the user and much more memory-efficient.
A completion model makes sense if your entire networking stack lives in userspace and you can allocate memory at the lowest layer, then pass it by reference all the way up. Or if you can at least do syscall batching, to make operating on very small buffers efficient.
This exact same problem is also present in the concurrency model provided by Go. To read from the network you need to provide a buffer to read into, which means that a buffer has to be allocated for every goroutine (instead of just the goroutines that actually have data to read).
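To put rough numbers on the buffer argument, here is a back-of-the-envelope sketch. The function name `pinned_buffer_bytes` and the figures in the usage note (10,000 connections, 100 of them active, 4 KiB buffers) are made up for illustration; real servers size and pool buffers in many different ways.

```python
def pinned_buffer_bytes(connections, active, bufsize, model):
    """Estimate the read-buffer memory held at one instant.

    connections: total open connections (or goroutines blocked in a read)
    active:      connections that actually have data to read right now
    bufsize:     bytes allocated per read buffer
    model:       "completion" (buffer posted up front for every connection)
                 or "readiness" (buffer allocated only once data is ready)
    """
    if model == "completion":
        return connections * bufsize
    if model == "readiness":
        return active * bufsize
    raise ValueError("model must be 'completion' or 'readiness'")
```

Under those assumed figures, `pinned_buffer_bytes(10_000, 100, 4096, "completion")` gives 40,960,000 bytes (about 39 MiB), while the `"readiness"` variant gives 409,600 bytes (400 KiB). The gap grows with the share of idle connections.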
Hard to answer. Varnish and Nginx are similar in performance, but not similar at all in implementation. So, which one is better?
Perhaps the most comprehensive single place you can look to see different approaches and their respective strengths/weaknesses: http://www.kegel.com/c10k.html
This is far too simple of an answer. There are many different reasons to use Linux and not FreeBSD. In this one case FreeBSD might be better than Linux (I offer no opinion), but there are probably other more substantial reasons why one can't easily switch from Linux to FreeBSD.