A question about printf strangeness
September 5th, 2009
| Tags: C++
Someone recently presented this program on a forum:
#include <stdio.h>
int main() { signed char x = 0xaa; printf("%x", x); return 0; }
They asked why this program produces the output “ffffffaa”, whereas changing the type of ‘x’ to unsigned char makes it produce the output “aa”.
Here is my answer:
- Two’s complement, the system used by virtually all of today’s machines to represent negative numbers, represents a negative number as the result of subtracting from zero and letting the result “wrap around”. Therefore, -1 in two’s complement (assuming a 32-bit integer) is 0xffffffff, since it is 0 minus 1, wrapping around to the highest bit pattern of the data type. In fact, the reason two’s complement is generally used is that addition and subtraction need no special cases, unlike one’s complement or a separate sign bit.
- The sign is typically not shown in hexadecimal. Therefore 0x1 is 1, 0xffffffff is a 32-bit -1 (as opposed to writing -0x1), and 0xff is an 8-bit -1. Note that an 8-bit -1 (0xff) has the same bit pattern as a 32-bit signed 255 (also 0xff): you need to know the size of the type to interpret a bit pattern as a negative number.
- Because an 8-bit -1 has the same bit pattern as a 32-bit 255, when the compiler needs to “promote” a signed value to a larger data type, it can’t simply copy the low bits to the larger data type’s low bits and zero the upper bits, because this would change the value (-1 would become 255). Instead, it must use sign extension: when the compiler widens a signed integer which has the high bit set, it fills the extra high bits with 1s, and when it widens a signed integer which has the high bit clear, it fills the extra high bits with 0s.
- Widening from an unsigned type is easier – the compiler always zero-fills the extra high bits. Without signed-ness, there is no problem, since an unsigned 8-bit 0xff is the same value as an unsigned 32-bit 0xff.
- C and C++ have rules for promoting arguments to functions. When a function is variadic (like printf) or has no prototype in scope, integer arguments narrower than int, such as char and short, are promoted to int before being passed (int is traditionally the machine’s natural word size). So printf is never actually passed a char or a short; it is passed an int.
- printf() doesn’t actually know the types of its arguments; it deduces them from the format string. “%d” expects an int and “%x” expects an unsigned int of the same size, so each extracts an int-sized value from the argument list (traditionally the stack). Since narrower integral and character types are promoted to int before being passed, you can use format specifiers like “%d” with char and short types, and they will usually work as expected.
This all means that the generated machine code for the program does something like this:
1. load a value 0xaa into an 8-bit register
2. sign extend 0xaa into a 32-bit register, making it 0xffffffaa in the new register.
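3. push the 0xffffffaa onto the stack
4. push a pointer to the format string “%x” onto the stack (with the common x86 cdecl convention, arguments are pushed right to left, so the format string is pushed last)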
5. call printf
6. pop printf’s parameters off the stack
