
Minor point: sizeof(char) was defined to be 1 even in the original C89. "char" has always been a misspelling of "byte" in C.


sizeof(char) has to be 1, but char doesn't have to be 1 byte in size. It's more correct to say that sizeof(foo) returns a result relative to sizeof(char).


> but char doesn't have to be 1 byte in size

It is, but a byte is not required to be an octet. Quoting from the standard (n1570, 3.6):

byte: addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

Later, in a footnote (page 44 of the PDF, note 49): A byte contains CHAR_BIT bits
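To make the units concrete, here is a minimal C sketch. The function names are made up for illustration; only sizeof and CHAR_BIT (from <limits.h>) come from the language. It shows that sizeof counts in chars, and that CHAR_BIT says how many bits each char (byte) holds:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a size reported by sizeof (a count of
   chars, i.e. bytes) into a number of bits. */
size_t object_bits(size_t size_in_chars) {
    return size_in_chars * CHAR_BIT;
}

/* Hypothetical self-check of the two guarantees discussed above. */
void check_units(void) {
    assert(sizeof(char) == 1); /* true by definition on every implementation */
    assert(CHAR_BIT >= 8);     /* a byte has at least 8 bits, possibly more */
}
```

On a typical desktop object_bits(sizeof(char)) is 8, but a platform with CHAR_BIT == 16 would report 16 while sizeof(char) stays 1.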


char doesn't have to be 8 bits in size, true, but C basically doesn't give you any way to address memory smaller than a char.


> C basically doesn't give you any way to address memory smaller than a char

It does, via bit-fields.


Or via the bitwise operators. Still, it is not possible to take the address of a bit-field.


The difference may be 100% semantics, but when I am using a bit field, I am addressing the particular bits I am interested in via a name, and when I am shifting and masking bits, I am directly manipulating a piece of memory until its value represents the bits that I am interested in.
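A small sketch of the two styles side by side, in C. The struct, macros and function names are invented for illustration, and the layout of bit-fields inside their storage unit is implementation-defined, so the two versions are not guaranteed to use the same physical bits:

```c
#include <assert.h>

/* Bit-field style: the bits are reached through a member name. */
struct flags {
    unsigned int ready : 1;
    unsigned int mode  : 3;
};

unsigned get_mode_bitfield(struct flags f) {
    return f.mode;
    /* Note: `&f.mode` would not compile; a bit-field has no address. */
}

/* Shift-and-mask style: the same idea done by hand on a whole word.
   Here bit 0 is "ready" and bits 1..3 are "mode" (an assumed layout). */
#define MODE_SHIFT 1u
#define MODE_MASK  (0x7u << MODE_SHIFT)

unsigned set_mode_masked(unsigned word, unsigned mode) {
    /* Clear the old mode bits, then OR in the new ones. */
    return (word & ~MODE_MASK) | ((mode << MODE_SHIFT) & MODE_MASK);
}

unsigned get_mode_masked(unsigned word) {
    return (word & MODE_MASK) >> MODE_SHIFT;
}
```

In both cases the smallest thing a pointer can refer to is still the containing object, never the individual bits.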


I prefer to think of "byte" as unsigned. Although I've never quite figured out why char is signed on most implementations, it would most naturally be unsigned too.
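For what it's worth, whether plain char is signed is implementation-defined, and it can be checked through CHAR_MIN. A minimal sketch in C, with helper names that are purely illustrative:

```c
#include <assert.h>
#include <limits.h>

/* Nonzero if plain char behaves as a signed type on this implementation. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Read a char as an unsigned "byte" value in the range 0..UCHAR_MAX,
   regardless of whether plain char is signed here. */
unsigned byte_value(char c) {
    return (unsigned char)c;
}

/* Hypothetical self-check: CHAR_MIN is either 0 or SCHAR_MIN. */
void check_char_range(void) {
    assert(CHAR_MIN == 0 || CHAR_MIN == SCHAR_MIN);
}
```

Going through unsigned char, as byte_value does, is the usual way to get the "byte is unsigned" view without depending on the platform's choice.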


I suspect it's because signed has more undefined behaviour. C compilers will always choose the option that gives them the most optimization freedom, since that's what they're judged on.


What optimization opportunities does it give?


Add 128 to a signed char and the compiler is free to assume it is zero/false (because undefined behavior) OR assume the value is always greater than 127 (because undefined behavior). Or, if it compiles it into machine code, the result may depend on register width, since it may or may not store it back into memory. That results in a value either larger than 127 or mod 128 depending on register pressure, since the compiler isn't obligated to AND with 0xFF because of undefined behavior.


> Add 128 to a signed char and the compiler is free to assume it is zero/false

To be honest, I hope compilers don't do such things. I would vastly prefer to see it run a Tower of Hanoi simulation in Emacs at this point.

But evading bounds-checking of 8-bit math done in 32-bit registers is totally reasonable (by the standard of usual UB optimizations), thanks.
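On the "8-bit math done in wider registers" part, a small C sketch of what the language itself says, as far as I read the standard (function names are invented for illustration): a signed char operand is promoted to int before the addition, so the sum itself is computed in int. Only converting an out-of-range result back to signed char is a separate step, and that conversion is implementation-defined rather than undefined.

```c
#include <assert.h>
#include <limits.h>

/* c is promoted to int before the addition, so c + 128 is an int
   computation. It cannot overflow here: SCHAR_MAX + 128 fits in int. */
int add_promoted(signed char c) {
    return c + 128;
}

/* Hypothetical self-check: the sum can exceed what signed char holds,
   which only matters once it is converted back to signed char. */
void check_promotion(void) {
    assert(add_promoted(SCHAR_MAX) > SCHAR_MAX);
}
```

Storing the int result back into a signed char is where the platform-specific behavior comes in, so the sketch deliberately stops before that step.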



