Impact of metadata on Image Performance

sbierwagen · on Sept 6, 2016

  On an average, this kind of metadata occupies 16% of size 
  of the JPEG file.

Ho ho. You think that's bad? Back in 2011, Tumblr didn't strip metadata from avatar images. That results in some funny files, like this one: http://28.media.tumblr.com/avatar_c5ee131b70d0_40.png

That PNG has a 3325 byte IDAT chunk and a 106022 byte iCCP chunk. The metadata is 3188% of the size of the image data itself, about 32 times as large.
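For anyone who wants to check numbers like these: a PNG is a fixed signature followed by length-prefixed chunks, so tallying bytes per chunk type is straightforward. Here is a minimal Python sketch, not a validated tool. The names `png_chunk_sizes` and `_chunk` are made up for illustration, and the demo file is synthetic (zero-filled payloads using the sizes quoted above), not the actual Tumblr avatar.

```python
import struct
import zlib
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_sizes(data: bytes) -> Dict[str, int]:
    """Return the total payload bytes per chunk type in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    sizes: Dict[str, int] = {}
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        name = ctype.decode("ascii")
        sizes[name] = sizes.get(name, 0) + length
        pos += 12 + length
        if name == "IEND":
            break
    return sizes


def _chunk(ctype: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: wrap a payload as a PNG chunk (for the demo only)."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Synthetic file: structurally chunked like a PNG, but not a decodable image.
demo = (
    PNG_SIGNATURE
    + _chunk(b"IHDR", bytes(13))
    + _chunk(b"iCCP", bytes(106022))
    + _chunk(b"IDAT", bytes(3325))
    + _chunk(b"IEND", b"")
)

sizes = png_chunk_sizes(demo)
ratio = sizes["iCCP"] / sizes["IDAT"]
print(f"iCCP is {ratio:.1f}x the size of IDAT")  # iCCP is 31.9x the size of IDAT
```

To run it on a real file, pass `open(path, "rb").read()` instead of `demo`.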

Personally, I think websites should strip metadata from thumbnails and resized images, but should also let you download the original, unmodified image, complete with original filename. Why?

Instagram and others always recompress and strip metadata when you submit an image. This results in shitpics-- images so mangled by recompression that they look like visual gravel: https://theawl.com/the-triumphant-rise-of-the-shitpic-e25d8e... This is a complete own goal; there's no technical reason this has to happen. Digital files aren't supposed to decay!

And, of course, stripping authorship tags would make the dream of automated attribution impossible: https://eev.ee/blog/2016/08/15/attribution-on-the-web/

inian · on Sept 6, 2016

I only looked at JPEG files for this. I should look at PNG files too; hopefully things are much better than that image you posted, haha.

the8472 · on Sept 6, 2016

relevant xkcd: https://www.xkcd.com/1683/

soamv · on Sept 6, 2016

From my experience hosting a bunch of user-provided images:

1. Strip all metadata but provide downloads of originals somewhere

2. Keep it simple: just use ImageMagick's convert to remove profiles (but don't use ImageMagick for file type detection)

3. If the image has EXIF orientation tags, rotate the image to the right orientation (-auto-orient) before removing the EXIF profile.

4. Don't remove the image's color profile data, or else convert to sRGB first.
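The ordering in the steps above matters, so here is a rough Python sketch that only assembles the argument list for ImageMagick's convert. It covers the "convert to sRGB first" branch of step 4. `build_strip_command` is a made-up name, `sRGB.icc` is an assumed path to an sRGB ICC profile you supply yourself, and the file names are placeholders. The flags `-auto-orient`, `-profile` and `-strip` are standard ImageMagick options, but treat the exact pipeline as an assumption to test on your own images.

```python
from typing import List


def build_strip_command(src: str, dst: str, srgb_profile: str,
                        binary: str = "convert") -> List[str]:
    """Assemble an ImageMagick command that orients, converts to sRGB, then strips."""
    return [
        binary, src,
        "-auto-orient",             # step 3: bake the EXIF orientation into the pixels
        "-profile", srgb_profile,   # step 4: convert to sRGB while the embedded profile still exists
        "-strip",                   # steps 1-2: only now drop profiles and comments
        dst,
    ]


# Placeholder file names; nothing is executed here.
cmd = build_strip_command("upload.jpg", "served.jpg", "sRGB.icc")
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it. On ImageMagick 7 the binary is usually `magick`, which is why it is a parameter.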

huphtur · on Sept 6, 2016

ImageOptim is a handy little tool to strip all the metadata https://imageoptim.com/mac

laurent123456 · on Sept 6, 2016

There's some use to this metadata, for example GPS coordinates to locate where the photo was taken, author info, camera parameters, etc. It might not be needed all the time, but it probably also shouldn't be stripped off all the time.

inian · on Sept 6, 2016

Yup, this information is indeed useful in a lot of cases (for photo editing software, etc.). But for images delivered on the web, it makes sense to preprocess them to strip off the EXIF data, since it is mostly not used by browsers.
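As a rough illustration of what that preprocessing involves: in a JPEG, Exif (and typically XMP) is stored in APP1 segments ahead of the compressed image data, so those segments can be dropped without recompressing anything. A minimal Python sketch under that assumption follows; `strip_jpeg_app1` and `_segment` are made-up names, and the demo bytes are synthetic, not a decodable photo. Note that this also drops the orientation tag, so pixels would need to be rotated beforehand.

```python
import struct


def strip_jpeg_app1(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its APP1 (Exif/XMP) segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    out = bytearray(data[:2])  # keep the SOI marker
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, stop parsing
            break
        # The 2-byte length counts itself but not the 2-byte marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker != 0xE1:  # keep everything except APP1
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    out += data[pos:]  # copy the scan data and EOI untouched
    return bytes(out)


def _segment(marker: int, payload: bytes) -> bytes:
    """Hypothetical helper: wrap a payload as a JPEG segment (for the demo only)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic stand-in for a JPEG: APP0, a 1010-byte APP1, SOS, fake scan data, EOI.
demo = (
    b"\xff\xd8"
    + _segment(0xE0, b"JFIF\x00" + bytes(9))
    + _segment(0xE1, b"Exif\x00\x00" + bytes(1000))
    + _segment(0xDA, bytes(10))
    + bytes(5000)
    + b"\xff\xd9"
)

stripped = strip_jpeg_app1(demo)
print(len(demo), "->", len(stripped))  # 6046 -> 5036
```

This leaves APP0 (JFIF) and APP2 (where ICC profiles live) in place, in line with the advice elsewhere in the thread not to throw away color profiles blindly.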

tombrossman · on Sept 6, 2016

Exif data is particularly useful for preserving copyright metadata and (optionally) contact info for the photographer. Stripping too much metadata perpetuates the 'orphan works' problem and creates lots of photos floating around the internet that can never be used commercially, because the photographer cannot be identified.

More info here from the US Copyright office - choice quote: "For good faith users, orphan works are a frustration, a liability risk, and a major cause of gridlock in the digital marketplace." http://www.copyright.gov/orphan/

Also, the UK has effectively given everyone the green light to steal photos lacking metadata because it's basically too difficult to find the photographer. http://www.bbc.co.uk/news/technology-22337406

And from my perspective, I release many images as CC0 Public Domain with my email or name in the metadata, and I'm annoyed that this metadata is not preserved, because it means people may be reluctant to re-use my images due to (non-existent) copyright concerns.

inian · on Sept 7, 2016

Yup, I have covered this use case in the article

wongarsu · on Sept 6, 2016

I think the article is lacking some nuance here. Just because the image is delivered via HTTP(S) doesn't mean that its only intended use is viewing in a browser. A lot of images are downloaded and used beyond that, and quite a few websites primarily serve images for use outside the browser.

inian · on Sept 6, 2016

Agreed, in those cases preserving the metadata might be the way to go, depending on your use case. You just need to be aware of the trade-off you are making.

taternuts · on Sept 6, 2016

The orientation value is used a fair amount by browsers and stripping it willy-nilly will likely result in some images being rotated in weird ways you're not expecting.
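For context, the Exif orientation tag takes one of eight values that tell a viewer how to flip or rotate the stored pixels before display. Here is a small Python sketch of that mapping, using a list of rows as a stand-in for pixel data. `apply_orientation` is a made-up helper; a real pipeline would use an image library instead.

```python
def apply_orientation(rows, orientation):
    """Return the rows as they should be displayed for Exif orientation 1-8."""
    if orientation == 1:   # stored upright
        return [list(r) for r in rows]
    if orientation == 2:   # mirrored left-right
        return [list(r[::-1]) for r in rows]
    if orientation == 3:   # rotated 180 degrees
        return [list(r[::-1]) for r in rows[::-1]]
    if orientation == 4:   # mirrored top-bottom
        return [list(r) for r in rows[::-1]]
    if orientation == 5:   # transposed (flipped along the main diagonal)
        return [list(r) for r in zip(*rows)]
    if orientation == 6:   # display needs a 90 degree clockwise turn
        return [list(r) for r in zip(*rows[::-1])]
    if orientation == 7:   # transverse (flipped along the anti-diagonal)
        return [list(r) for r in zip(*[r[::-1] for r in rows[::-1]])]
    if orientation == 8:   # display needs a 90 degree counterclockwise turn
        return [list(r) for r in zip(*rows)][::-1]
    raise ValueError("orientation must be 1-8")


# A 2x3 "image" as stored on disk, tagged with orientation 6.
stored = [[1, 2, 3],
          [4, 5, 6]]
upright = apply_orientation(stored, 6)
print(upright)  # [[4, 1], [5, 2], [6, 3]]
```

If the tag is stripped without applying this transform first, viewers fall back to the stored layout, which is where the unexpected rotations come from.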

inian · on Sept 7, 2016

Support for this attribute in modern browsers is good, but it is only used when you visit the image directly via its URL or the like. It is of no use when you embed the image in a webpage. A lot of times, I do a right-click -> open image in a new tab, see a differently oriented image, and am actually confused by the behaviour.

Of course, if you are distributing the image URL directly, it is useful in that case.

steaminghacker · on Sept 6, 2016

Does Google index the metadata within images?

inian · on Sept 6, 2016

AFAIK Google doesn't use this data for indexing or SEO purposes.

eyelidlessness · on Sept 6, 2016

They do capture the information. I'd be shocked if it isn't considered in PageRank.

steaminghacker · on Sept 6, 2016

Thanks. For web pages, I usually clear any existing metadata within images, then insert some simple (but correct) keywords about the image. Wondering if I'm wasting my time.