For the past 10 years or so, I've been using an alternative to printf() that I lifted out of Hanson's _C Interfaces and Implementations_; in my tree it's "fmt.c", and it includes the family of fmt_* functions. They're just string-processing code, and so are trivially portable to WinAPI, OS X, Linux, and FreeBSD.
This is so much of a win that I don't understand why everyone doesn't do it:
* Cross-platform counted/allocated/concatenated string semantics, with no annoying nits from one platform to another.
* "Native" support for printing IP addresses (%i), which gets rid of inet_ntoa and its ilk, and for printing binary.
* A registration system for adding new format codes, which is like being able to create Object#inspect functions in C code.
All of which is a roundabout way of saying I agree that we don't need to improve printf/scanf; we need to burn them with fire.
I'm going to tend to believe PHK if he thinks explicit structure packing is a win, noting at the same time how unlikely it is that any implementation of it is going to be a performance bottleneck compared with I/O.
> I'm going to tend to believe PHK if he thinks explicit structure packing is a win, noting at the same time how unlikely it is that any implementation of it is going to be a performance bottleneck compared with I/O.
I've spent a lot of my life writing parsers for various network formats, and one thing I can say with authority is that at both my current company (Google) and my previous company (Amazon), the CPU cost of parsing bytes off the network was noticeable enough that we spent significant resources optimizing it.
I haven't done benchmarks myself on, e.g., bitvectors, but I'm pretty sure I've heard that packed bitvectors are noticeably slower than non-packed ones. I also think the cost would be comparable to 64-bit math on 32-bit processors (i.e., an overhead of 3-5 instructions per operation), which is a non-trivial cost.
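For a concrete sense of where those extra instructions come from, here's the usual packed-bitvector read. Compared with loading a plain byte or word, the index arithmetic, shift, and mask are pure overhead on every access:

```c
#include <stdint.h>
#include <stddef.h>

/* Read bit i from a packed bitvector: one load plus a shift, a mask,
   and the index arithmetic -- the few extra instructions per access
   mentioned above. */
static int get_bit(const uint8_t *bits, size_t i) {
    return (bits[i >> 3] >> (i & 7)) & 1;
}
```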
In the sense that your parsing strategy influences your I/O strategy and, in particular, may incur extra copies, I buy this.
The idea that Amazon has cycle-optimized network parsing code, and that they did it for a significant practical benefit... I have no reason to doubt you, but I'd like to hear more.
I've done a fair bit of high performance network code (not for Amazon or Google, but, for instance, for code watching most of the Internet's tier-1 backbone networks on a flow-by-flow basis) and I'm not sure I could have won much by counting the cycles it took me to pull (say) an NBO integer out of a packet.
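For the record, the idiom in question is cheap: a memcpy (to dodge alignment traps) plus ntohl, which modern compilers fold into a single load and, on little-endian machines, a bswap:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Pull a 32-bit network-byte-order integer out of a packet buffer.
   memcpy avoids unaligned-access traps; ntohl does the byte swap. */
static uint32_t get_u32_be(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return ntohl(v);
}
```

So for a single field there's not much to count; the wins, if any, are in copy avoidance and how the parse drives I/O.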
> The idea that Amazon has cycle-optimized network parsing code, and that they did it for a significant practical benefit... I have no reason to doubt you, but I'd like to hear more.
I can speak better to Google, since it's my more recent experience. Google's internal data format is Protocol Buffers (and all the code is open-sourced, as you probably know). The C++ code that is generated to parse Protocol Buffers is fast (on the order of hundreds of MB/s) as a result of a lot of optimization. This has reached a rough ceiling of what I believe is possible with this approach (generated C++). Even so, Protocol Buffer parsing code shows up in company-wide CPU profiles, and certain teams in particular have performance issues where Protocol Buffer parsing is a significant concern for them.
To address these issues, I wrote a Protocol Buffer parser that improves performance in two ways:
- it is an event-based parser (like SAX), unlike the generated protobuf classes, which take a more DOM-like approach (always parsing into a tree of data structures). With my parser you bind fields to callbacks, and you can parse into any data structure (or do pure stream processing).
- I wrote a JIT compiler that can translate a Protocol Buffer schema directly into x86-64 machine code that parses that schema. Without the intermediate C++ step, I can generate better machine code than the C++ compiler does. In an apples-to-apples test, I beat the generated C++ by 10-40%. If you do more pure stream parsing the win is even greater.
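To illustrate the difference for anyone who hasn't seen it: a DOM-style parser materializes a message object, while an event-based one just fires callbacks as fields stream past. This is my own toy sketch over the varint-only subset of the protobuf wire format, not the parser described above:

```c
#include <stdint.h>
#include <stddef.h>

/* Event-based (SAX-like) parsing of protobuf varint fields:
   no message object is built; each field fires a callback. */

typedef void (*field_cb)(uint32_t field_num, uint64_t value, void *user);

/* Decode one base-128 varint; returns bytes consumed, 0 on truncation. */
static size_t read_varint(const unsigned char *p, size_t len, uint64_t *out) {
    uint64_t v = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        v |= (uint64_t)(p[i] & 0x7F) << (7 * i);
        if (!(p[i] & 0x80)) { *out = v; return i + 1; }
    }
    return 0;
}

/* Stream over varint-typed fields (wire type 0), firing cb per field. */
static int parse_stream(const unsigned char *p, size_t len,
                        field_cb cb, void *user) {
    while (len > 0) {
        uint64_t key, val;
        size_t n = read_varint(p, len, &key);
        if (n == 0 || (key & 7) != 0) return -1;  /* only wire type 0 here */
        p += n; len -= n;
        n = read_varint(p, len, &val);
        if (n == 0) return -1;
        p += n; len -= n;
        cb((uint32_t)(key >> 3), val, user);
    }
    return 0;
}

/* Example callback: accumulate the sum of all varint field values. */
static void sum_cb(uint32_t field, uint64_t value, void *user) {
    (void)field;
    *(uint64_t *)user += value;
}
```

The real wire format also has length-delimited and fixed-width types, and the JIT approach compiles the schema's field dispatch straight into machine code instead of going through a handler table like this one.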
+1 for Hanson's book in general (which I think I might have initially read on your recommendation anyway!). I don't do a huge amount of C programming these days, but it was a huge eye-opener.
The reason struct packing would be a win is probably not performance but code readability and fewer bugs.
Making be/le/native conversion the programmer's job is both error-prone and a waste of time.
The compiler could safely optimize the byte swizzles away on non-arithmetic operations, whereas most programmers tend to convert everything before they start working on it.
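A small example of the kind of swizzle a compiler can elide: equality on network-order values doesn't need the swap at all, because bswap is a bijection, so when both sides are converted the compiler can drop both swaps. Ordered comparison and arithmetic are where they genuinely must stay:

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Both operands swapped, then compared for equality: since the swap is
   a bijection, this compiles down to a plain a == b -- no bswaps. */
static int same_addr(uint32_t a_nbo, uint32_t b_nbo) {
    return ntohl(a_nbo) == ntohl(b_nbo);
}

/* Ordered comparison is arithmetic on the host-order values, so here
   the swaps cannot be removed. */
static int addr_less(uint32_t a_nbo, uint32_t b_nbo) {
    return ntohl(a_nbo) < ntohl(b_nbo);
}
```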
Thank you for the excellent book suggestion. I've only been writing in C for a few years now, and I've been grappling for a better way to manage and abstract some of the utilities in our poorly written legacy application. Every linked list is custom, and every array is malloced and freed on the fly -- it's so fragile that sometimes I feel paralyzed. It's been terribly wrong for a long time, but I've personally lacked the experience and formal training to make it better through libraries and wrappers. About a year ago I picked up "Mastering Algorithms with C"; it was very helpful for learning complex algorithms, but it never really helped me abstract our code base into something portable and reusable.
I'm certain this book will help me tremendously. :P