I think the point is that creating a collision when data + fileSize is hashed is...

gojomo · on April 13, 2011

...and if you took the same number of bits you were using to store the filesize, and instead stored that many bits of some independent secure hash, it'd be harder still.

Of course, every extra bit that has to be matched makes collisions 'harder', but length bits are much weaker than other options (an attacker can simply restrict the search to inputs of equal length), except insofar as they may already be available for other reasons.
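To make the comparison concrete, here is a rough Python sketch. The helper names `tag_with_length` and `tag_with_extra_hash`, the 4-byte width, and the sample inputs are all made up for illustration. It spends the same 32 extra bits two ways: on the file size, or on a slice of an independent hash (SHA-256 here, as one possible choice).

```python
import hashlib

EXTRA_BYTES = 4  # illustrative budget: 32 extra bits next to the MD5 digest

def tag_with_length(data: bytes) -> bytes:
    """MD5 digest followed by the input length (hypothetical scheme)."""
    return hashlib.md5(data).digest() + len(data).to_bytes(EXTRA_BYTES, "big")

def tag_with_extra_hash(data: bytes) -> bytes:
    """MD5 digest followed by the first bytes of an independent hash."""
    return hashlib.md5(data).digest() + hashlib.sha256(data).digest()[:EXTRA_BYTES]

# Two different inputs that happen to have the same length.
a = b"pay alice 100 dollars"
b = b"pay mallory 9 dollars"

# The length field is identical for any equal-length pair, so it adds
# nothing against an attacker who only searches equal-length inputs.
print(tag_with_length(a)[-EXTRA_BYTES:] == tag_with_length(b)[-EXTRA_BYTES:])

# The SHA-256 slice depends on the content, so an attacker would have to
# match those 32 bits as well as the MD5 digest.
print(tag_with_extra_hash(a)[-EXTRA_BYTES:] == tag_with_extra_hash(b)[-EXTRA_BYTES:])
```

The first comparison is true for every equal-length pair by construction; the second is true only if the two inputs also happen to agree on those 32 bits of SHA-256.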

justincormack · on April 12, 2011

And that was written before the MD5 collisions were discovered. And no collision has yet been discovered for MD5 for files of the same length; they are all extension attacks...

gojomo · on April 13, 2011

Absolutely false. Some MD5 collision-generators specifically find pairs of equal-length inputs with the same hash. See, for example, hit #2 for [MD5 collisions]:

http://www.mscs.dal.ca/~selinger/md5collision/

'Extension attacks' are something else, which let you turn one collision into more, or create valid hashes for combinations of unknown text plus a chosen extension – not find an initial collision. See:

http://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_con...

The 'length extension' property can be helpful, once you find a collision based on 'random' nonsense, in extending that into two documents that are each meaningful-but-different and still colliding, as was done in this 2005 MD5 collision demonstration:

http://replay.waybackmachine.org/20050612011328/http://www.c...
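The "turn one collision into more" part can be sketched with a toy Merkle–Damgård hash. This is not MD5: the 16-bit state, the 8-byte block size, and the names `compress`, `toy_hash` and `find_block_collision` are assumptions chosen so that a brute-force collision is found almost instantly. The structure is the same idea, though. Once two equal-length, block-aligned inputs reach the same internal state, appending any common suffix keeps them colliding.

```python
import hashlib
from itertools import count

BLOCK = 8         # toy block size in bytes (MD5 uses 64)
IV = b"\x00\x00"  # toy 16-bit initial state (MD5 uses 128 bits)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: SHA-256 truncated to 16 bits."""
    return hashlib.sha256(state + block).digest()[:2]

def toy_hash(msg: bytes) -> bytes:
    """Merkle-Damgard construction with length padding, on the toy state."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(msg).to_bytes(BLOCK, "big")  # length goes in its own block
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def find_block_collision():
    """Brute-force two distinct one-block messages with the same state.

    With only 65536 possible states, a repeat must occur within
    65537 attempts (pigeonhole), and usually after a few hundred.
    """
    seen = {}
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block

a, b = find_block_collision()
suffix = b" ...followed by any shared trailing content"

print(a != b and len(a) == len(b))                     # True: distinct, equal length
print(toy_hash(a) == toy_hash(b))                      # True: the initial collision
print(toy_hash(a + suffix) == toy_hash(b + suffix))    # True: it survives the suffix
```

Because `a` and `b` are exactly one block each and have the same length, everything processed after that first block (suffix, padding, length field) is identical for both, so the final hashes must match. Note that this only illustrates extending an existing collision; forging a hash for unknown text plus a chosen extension is a separate trick that needs access to the hash's internal state.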