For the past 10 years or so, I've been using an alternative to printf() that I lifted out of Hanson's _C Interfaces and Implementations_; in my tree it's "fmt.c", and it includes the family of fmt_* functions. They're just string-processing code, and so are trivially portable to WinAPI, OS X, Linux, and FreeBSD.
This is so much of a win that I don't understand why everyone doesn't do it:
* Cross-platform counted/allocated/concatenated string semantics, with no annoying nits from one platform to another.
* "Native" support for printing IP addresses (%i), which gets rid of inet_ntoa and its ilk, and for printing binary.
* A registration system for adding new format codes, which is like being able to create Object#inspect functions in C code.
All of which is a roundabout way of saying I agree that we don't need to improve printf/scanf; we need to burn them with fire.
I'm going to tend to believe PHK if he thinks explicit structure packing is a win, noting at the same time how unlikely it is that any implementation of it is going to be a performance bottleneck compared with I/O.
> I'm going to tend to believe PHK if he thinks explicit structure packing is a win, noting at the same time how unlikely it is that any implementation of it is going to be a performance bottleneck compared with I/O.
I've spent a lot of my life writing parsers for various network formats, and one thing I can say with authority is that at both my current company (Google) and my previous company (Amazon), the CPU cost of parsing bytes off the network was noticeable enough that we spent significant resources optimizing it.
I haven't done benchmarks myself on, e.g., bitvectors, but I'm pretty sure I've heard that packed bitvectors are noticeably slower than non-packed ones. I also think the cost would be comparable to 64-bit math on 32-bit processors (i.e., an overhead of 3-5 instructions per operation), which is a non-trivial cost.
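For a concrete sense of where those extra instructions come from, here's the usual packed-bitvector read. Compared with loading a plain byte or word, the index arithmetic, shift, and mask are pure overhead on every access:

```c
#include <stdint.h>
#include <stddef.h>

/* Read bit i from a packed bitvector: one load plus a shift, a mask,
   and the index arithmetic -- the few extra instructions per access
   mentioned above. */
static int get_bit(const uint8_t *bits, size_t i) {
    return (bits[i >> 3] >> (i & 7)) & 1;
}
```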
In the sense that your parsing strategy influences your I/O strategy and, in particular, may incur extra copies, I buy this.
The idea that Amazon has cycle-optimized network parsing code, and that they did it for a significant practical benefit... I have no reason to doubt you, but I'd like to hear more.
I've done a fair bit of high performance network code (not for Amazon or Google, but, for instance, for code watching most of the Internet's tier-1 backbone networks on a flow-by-flow basis) and I'm not sure I could have won much by counting the cycles it took me to pull (say) an NBO integer out of a packet.
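For the record, the idiom in question is cheap: a memcpy (to dodge alignment traps) plus ntohl, which modern compilers fold into a single load and, on little-endian machines, a bswap:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Pull a 32-bit network-byte-order integer out of a packet buffer.
   memcpy avoids unaligned-access traps; ntohl does the byte swap. */
static uint32_t get_u32_be(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return ntohl(v);
}
```

So for a single field there's not much to count; the wins, if any, are in copy avoidance and how the parse drives I/O.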
> The idea that Amazon has cycle-optimized network parsing code, and that they did it for a significant practical benefit... I have no reason to doubt you, but I'd like to hear more.
I can speak better to Google, since it's my more recent experience. Google's internal data format is Protocol Buffers (and all the code is open-sourced, as you probably know). The C++ code that is generated to parse Protocol Buffers is fast (on the order of hundreds of MB/s) as a result of a lot of optimization. This has reached a rough ceiling of what I believe is possible with this approach (generated C++). Even so, Protocol Buffer parsing code shows up in company-wide CPU profiles, and certain teams in particular have performance issues where Protocol Buffer parsing is a significant concern for them.
To address these issues, I wrote a Protocol Buffer parser that improves performance in two ways:
- it is an event-based parser (like SAX), unlike the generated protobuf classes, which take a more DOM-like approach (always parsing into a tree of data structures). With my parser you bind fields to callbacks, and you can parse into any data structure (or do pure stream processing).
- I wrote a JIT compiler that can translate a Protocol Buffer schema directly into x86-64 machine code that parses that schema. Without the intermediate C++ step, I can generate better machine code than the C++ compiler does. In an apples-to-apples test, I beat the generated C++ by 10-40%. If you do more pure stream parsing the win is even greater.
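To illustrate the difference for anyone who hasn't seen it: a DOM-style parser materializes a message object, while an event-based one just fires callbacks as fields stream past. This is my own toy sketch over the varint-only subset of the protobuf wire format, not the parser described above:

```c
#include <stdint.h>
#include <stddef.h>

/* Event-based (SAX-like) parsing of protobuf varint fields:
   no message object is built; each field fires a callback. */

typedef void (*field_cb)(uint32_t field_num, uint64_t value, void *user);

/* Decode one base-128 varint; returns bytes consumed, 0 on truncation. */
static size_t read_varint(const unsigned char *p, size_t len, uint64_t *out) {
    uint64_t v = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        v |= (uint64_t)(p[i] & 0x7F) << (7 * i);
        if (!(p[i] & 0x80)) { *out = v; return i + 1; }
    }
    return 0;
}

/* Stream over varint-typed fields (wire type 0), firing cb per field. */
static int parse_stream(const unsigned char *p, size_t len,
                        field_cb cb, void *user) {
    while (len > 0) {
        uint64_t key, val;
        size_t n = read_varint(p, len, &key);
        if (n == 0 || (key & 7) != 0) return -1;  /* only wire type 0 here */
        p += n; len -= n;
        n = read_varint(p, len, &val);
        if (n == 0) return -1;
        p += n; len -= n;
        cb((uint32_t)(key >> 3), val, user);
    }
    return 0;
}

/* Example callback: accumulate the sum of all varint field values. */
static void sum_cb(uint32_t field, uint64_t value, void *user) {
    (void)field;
    *(uint64_t *)user += value;
}
```

The real wire format also has length-delimited and fixed-width types, and the JIT approach compiles the schema's field dispatch straight into machine code instead of going through a handler table like this one.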
+1 for Hanson's book in general (which I think I might have initially read on your recommendation anyway!). I don't do a huge amount of C programming these days, but it was a huge eye-opener.
The reason struct packing would be a win is probably not performance but code readability and fewer bugs.
Making be/le/native conversion the programmer's job is both error-prone and a waste of time.
The compiler could safely optimize the byte swizzles away on non-arithmetic operations, whereas most programmers tend to convert everything before they start working on it.
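A small example of the kind of swizzle a compiler can elide: equality on network-order values doesn't need the swap at all, because bswap is a bijection, so when both sides are converted the compiler can drop both swaps. Ordered comparison and arithmetic are where they genuinely must stay:

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Both operands swapped, then compared for equality: since the swap is
   a bijection, this compiles down to a plain a == b -- no bswaps. */
static int same_addr(uint32_t a_nbo, uint32_t b_nbo) {
    return ntohl(a_nbo) == ntohl(b_nbo);
}

/* Ordered comparison is arithmetic on the host-order values, so here
   the swaps cannot be removed. */
static int addr_less(uint32_t a_nbo, uint32_t b_nbo) {
    return ntohl(a_nbo) < ntohl(b_nbo);
}
```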
Thank you for the excellent book suggestion. I've only been writing in C for a few years now, and I've been grappling for a better way to manage and abstract some of the utilities in our poorly written legacy application. Every linked list is custom, and every array is malloced and freed on the fly -- it's so fragile that sometimes I feel paralyzed. It's been terribly wrong for a long time, but I've personally lacked the experience and formal training to make it better through libraries and wrappers. About a year ago I picked up "Mastering Algorithms with C"; it was very helpful for learning complex algorithms, but it never really helped me abstract our code base into something portable and reusable.
I'm certain this book will help me tremendously. :P