Valgrind and uninitialised reads debugging - take twoPosted by Michael Shuldman, Inferno Nettverk A/S, Norway on Mon Aug 11 14:51:40 MEST 2014
After the time I spent on documenting my previous approach to debugging uninitialised reads errors from Valgrind, I did not expect having to spend many hours once again understanding why Valgrind was printing the warning it did. It took Valgrind three years to prove me wrong.
==27040== Conditional jump or move depends on uninitialised value(s) ==27040== at 0x4A0B152: bcmp (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==27040== by 0x4AAB3E: compareconfigs (shmemconfig.c:793) ==27040== by 0x4AE9FB: doconfigtest (shmemconfig.c:1114) ==27040== by 0x4B4F85: dotest (sockd.c:1639) ==27040== by 0x4AF141: main (sockd.c:189)
Line 793 of shmemconfig.c is this:
EQCHECK_PTR(a, b, internal.addrv, sizeof(*a->internal.addrv) * a->internal.addrc);
Basically what this does is a memcmp(3) of a->internal.addrv and b->internal.addrv. Since earlier in the code we had done the equivalent of copying a->internal.addrv over b->internal.addrv with memcpy(3), and we were now comparing the same amount of bytes we had previously copied from a to b, I did not immediately (or even within the first few hours) understand why Valgrind was complaining.
True, "addrv" is a struct with various members, and quite likely contains some padbytes, but a memcpy(3) would of course copy those padbytes too, since it was copying the whole of addrv, based on the size of the whole of addrv.
So "addrv" in object "a" and object "b" should be 100% identical, padbytes and all. Indeed, the only reason why EQCHECK_PTR() was called was to assert that "addrv" was the same in both objects.
But Valgrind complained.
After much debugging and experimenting with bzero(3) here and there, individual memcmp(3)'s, short and long memcmp(3)'s, I eventually understood the problem, and the reason for why Valgrind complained about "Conditional jump or move depends on uninitialised value(s)".
Yes, "addrv" in "a" and "b" is identical, and so are the padbytes that are not initialized in either object ("addrv" in "b" being an exact copy of "addrv" in "a"). And we never try to use the padbytes. And Valgrind is clever enough to, as they say in the faq, ignore padbytes as long as you do not use them: "As for eager reporting of copies of uninitialised memory values, this has been suggested multiple times. Unfortunately, almost all programs legitimately copy uninitialised memory values around (because compilers pad structs to preserve alignment) and eager checking leads to hundreds of false positives. Therefore Memcheck does not support eager checking at this time."
Our problem was however that when doing the memcmp(3) to assert that "addrv" in both "a" and "b" are identical, memcmp(3) of course will also read the padbytes part of "addrv" in "a", comparing it against the same padbytes part in object "b". And when we read the padbytes, uninitialised as they are in our code, Valgrind rightfully complains about us reading uninitialised values. Even if we never use them for anything but comparing against the same uninitialised values, copied to another object.
There are three simple ways to avoid this warning.