Exploring Metadata

Introduction

Many file types contain metadata: data about data. You have written files containing metadata. HTML is a "markup" language. You write markup tags to define how you want a browser to display content. The markup tags can be considered a form of visible metadata. We saw other examples of where metadata is used: Word documents and PDF files. How did we see this? These files can contain more metadata than data!

Image (JPEG) and mp3 files are rich with metadata, but this metadata is not visible when the file is viewed or played.

"Exchangeable Image File (EXIF) format is a standard that specifies the formats for images, sound, and ancillary tags used by digital cameras (including smartphones), scanners and other systems handling image and sound files recorded by digital cameras."
en.wikipedia.org/wiki/EXIF

Most digital cameras and smartphones write EXIF data to every image taken. An image file may contain EXIF camera data such as: the type of camera, whether flash was used, exposure time, orientation, shutter speed, and lens size.

This data is used by applications such as Picasa and Photoshop to edit digital images. Device manufacturers, software developers, and others are free to add or remove other information.

Photoshop writes EXIF data to the files it saves, so the presence of that data is one way to tell whether a photo has been changed.

To the dismay of privacy advocates, some digital cameras and smartphones automatically add EXIF data for the current GPS location and/or unique device serial numbers. Some smartphones do not permit the user to turn off GPS tagging. This can have serious privacy implications.

GPS-tagged images often go unnoticed by users. Those who post to social media such as Facebook, Flickr, Instagr.am, twitpic, etc. directly from their mobile device may be sending their GPS location. Most social networks, like Facebook and Google+, now scrub (delete) the EXIF data before images are displayed to other users. Even so, realize that once you have uploaded a file there is no getting it back or removing it.

Device serial numbers embedded in images may seem harmless, but it can be argued that they are more dangerous than GPS data. It has become the norm for events such as protests, demonstrations, and police abuse to be documented by the masses on their smartphones. When those images are uploaded and shared, authorities or bad actors may be able to identify who took a photo through the device's serial number.

People can now be tracked through social media.

Tools to Examine Metadata

exiftool - a tool to examine EXIF data

Some EXIF data can be viewed on Windows by simply right-clicking on an image and selecting "properties". This does not show a complete list of properties. We will look at all the EXIF data by using an application called exiftool.

exiftool lets you view and modify EXIF data in JPG files. You can download it to your own computer (Windows and Mac machines). CTS doesn't support it (sigh!), so we will run exiftool on moxie.
To do this, we need to log in to moxie (using PuTTY) and run it at the "command line".

Let's use exiftool...

  1. Get an image of your choosing (it must be a jpg/jpeg). Choose a OR b:
    1. If you have your cellphone with you, turn on GPS, take a picture, then email it to yourself. If you don't, ask someone in class to take a picture and forward it to you from their email account. (Sending an MMS to an email address probably costs the same as sending an MMS to a mobile number.)
    2. Flickr does not remove all EXIF data. The site allows you to search images by the camera used to take the picture. Go to http://flickr.com/cameras/. Find a picture for some camera. Flickr lets you examine EXIF data.
  2. Use Cyberduck to upload your image file to moxie into your isc110 folder.
  3. Log into your moxie account using PuTTY.
  4. Move to your isc110 folder by typing: cd public_html/coursework/isc110
  5. Run exiftool by typing:
    exiftool the-name-of-your-jpg-file-including-the-file-extension
  6. Take the GPS information from your image file and find the location using Google Maps. The GPS coordinates need to be in decimal degrees.
    If your GPS data looks something like 74 deg 12' 34.00" N, you need to convert it to decimal degrees.
    Use this site andrew.hedges.name/experiments/convert_lat_long to convert them, or do this simple calculation:
      decimal_degree = deg + min/60 + sec/3600
    
      Use negative numbers for S and W
    
  7. Next, "scrub" (remove) the GPS information from your image file by typing:
    exiftool -gps:all= the-name-of-your-jpg-file-including-the-file-extension
  8. Run exiftool on the new file. How did the EXIF data change?
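
The conversion in step 6 can also be sketched in a few lines of Python. This is only an illustration: the function name dms_to_decimal is made up for this example, and it assumes the coordinate is written the way the sample in step 6 is written (degrees, minutes, seconds, then a hemisphere letter).

```python
import re

def dms_to_decimal(dms: str) -> float:
    """Convert text such as 74 deg 12' 34.00" N to signed decimal degrees."""
    text = dms.strip()
    # Pull out the three numbers: degrees, minutes, seconds.
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    if len(numbers) != 3 or not text or text[-1].upper() not in "NSEW":
        raise ValueError(f"unrecognized coordinate: {dms!r}")
    deg, minutes, seconds = (float(n) for n in numbers)
    decimal = deg + minutes / 60 + seconds / 3600
    # Use negative numbers for S and W.
    return -decimal if text[-1].upper() in "SW" else decimal

print(round(dms_to_decimal("74 deg 12' 34.00\" N"), 4))   # 74.2094
```

Run it once for the latitude and once for the longitude, then paste the two numbers (latitude first) into Google Maps.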

bvi - a tool to examine files in their binary form

bvi is an abbreviation for binary visual editor. bvi displays the "raw" contents of a file in a human readable format. It looks at each byte and, where it can, it displays the ASCII symbol for each byte.
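
To make the idea concrete, here is a rough Python sketch of the same kind of display. It is not how bvi itself is written; the function name hex_dump, the 16-byte row width, and the exact spacing are choices made for this example.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Show each row as: offset, bytes in hex, then the same bytes as ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII (0x20 to 0x7E) is shown as-is; anything else as "."
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part}  {text_part}")
    return "\n".join(lines)

print(hex_dump(b"ID3\x03\x00"))   # 00000000  49 44 33 03 00  ID3..
```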

You can download it for your own use from bvi.sourceforge.net.
Take a look at the Quick Tutorial

Here is part of an mp3 file. Can you tell where this mp3 file came from?

00000000  49 44 33 03 00 00 00 02 4D 58 54 43 4F 4D 00 00 00 03 00 00 01 FF FE 54 49 54 32 00 00 00 6F 00 00 01 FF FE ID3.....MXTCOM.........TIT2...o.....
00000024  47 00 6C 00 6F 00 72 00 79 00 20 00 44 00 61 00 79 00 73 00 20 00 28 00 46 00 65 00 61 00 74 00 2E 00 20 00 G.l.o.r.y. .D.a.y.s. .(.F.e.a.t... .
00000048  42 00 72 00 75 00 63 00 65 00 20 00 53 00 70 00 72 00 69 00 6E 00 67 00 73 00 74 00 65 00 65 00 6E 00 20 00 B.r.u.c.e. .S.p.r.i.n.g.s.t.e.e.n. .
0000006C  41 00 6E 00 64 00 20 00 50 00 61 00 74 00 74 00 69 00 20 00 53 00 63 00 69 00 61 00 6C 00 66 00 61 00 29 00 A.n.d. .P.a.t.t.i. .S.c.i.a.l.f.a.).
00000090  54 43 4F 50 00 00 00 29 00 00 01 FF FE 32 00 30 00 30 00 39 00 20 00 52 00 65 00 66 00 6F 00 72 00 6D 00 20 TCOP...).....2.0.0.9. .R.e.f.o.r.m.
000000B4  00 52 00 65 00 63 00 6F 00 72 00 64 00 73 00 54 43 4F 4E 00 00 00 0B 00 00 01 FF FE 4A 00 61 00 7A 00 7A 00 .R.e.c.o.r.d.s.TCON.........J.a.z.z.
000000D8  54 50 45 31 00 00 00 21 00 00 01 FF FE 42 00 65 00 72 00 6E 00 69 00 65 00 20 00 57 00 69 00 6C 00 6C 00 69 TPE1...!.....B.e.r.n.i.e. .W.i.l.l.i
000000FC  00 61 00 6D 00 73 00 54 50 45 33 00 00 00 03 00 00 01 FF FE 54 41 4C 42 00 00 00 1F 00 00 01 FF FE 4D 00 6F .a.m.s.TPE3.........TALB.........M.o
00000120  00 76 00 69 00 6E 00 67 00 20 00 46 00 6F 00 72 00 77 00 61 00 72 00 64 00 43 4F 4D 4D 00 00 00 44 00 00 01 .v.i.n.g. .F.o.r.w.a.r.d.COMM...D...
00000144  65 6E 67 FF FE 00 00 FF FE 41 00 6D 00 61 00 7A 00 6F 00 6E 00 2E 00 63 00 6F 00 6D 00 20 00 53 00 6F 00 6E eng......A.m.a.z.o.n...c.o.m. .S.o.n
00000168  00 67 00 20 00 49 00 44 00 3A 00 20 00 32 00 30 00 39 00 34 00 37 00 38 00 31 00 33 00 37 00 54 52 43 4B 00 .g. .I.D.:. .2.0.9.4.7.8.1.3.7.TRCK.
0000018C  00 00 0D 00 00 01 FF FE 31 00 34 00 2F 00 31 00 34 00 41 50 49 43 00 00 A4 79 00 00 00 69 6D 61 67 65 2F 6A ........1.4./.1.4.APIC...y...image/j
000001B0  70 65 67 00 03 00 FF D8 FF E0 00 10 4A 46 49 46 00 01 01 01 00 60 00 60 00 00 FF DB 00 43 00 08 06 06 07 06 peg.........JFIF.....`.`.....C......


bvi version 1.3.2 Copyright (C) 1996-2004 by Gerhard Buergmann
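
The dump above starts with "ID3" followed by the byte 03, which suggests an ID3v2.3 tag. As a hedged sketch, the Python below decodes pieces of it, assuming the usual ID3v2.3 layout: a 10-byte tag header whose last 4 bytes are a "syncsafe" size, then frames made of a 4-byte ID, a 4-byte size, 2 flag bytes, and the data. The function names are invented for this example, and only text frames (IDs starting with "T") are decoded.

```python
import struct

def syncsafe_to_int(raw: bytes) -> int:
    """Decode an ID3v2 'syncsafe' integer (7 useful bits per byte)."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value

def text_frames(body: bytes) -> dict[str, str]:
    """Walk ID3v2.3 frames and decode the text frames found in body."""
    frames = {}
    pos = 0
    while pos + 10 <= len(body):
        frame_id = body[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":      # padding: no more frames
            break
        (size,) = struct.unpack(">I", body[pos + 4:pos + 8])
        payload = body[pos + 10:pos + 10 + size]
        if frame_id.startswith(b"T") and payload:
            # First payload byte is the encoding: 1 = UTF-16 with a byte order mark
            encoding = "utf-16" if payload[0] == 1 else "latin-1"
            frames[frame_id.decode("ascii")] = payload[1:].decode(encoding).rstrip("\x00")
        pos += 10 + size
    return frames

# Bytes copied from the dump: the 10-byte tag header and the TCON frame.
header = bytes.fromhex("49 44 33 03 00 00 00 02 4D 58")
tcon = bytes.fromhex("54 43 4F 4E 00 00 00 0B 00 00 01 FF FE 4A 00 61 00 7A 00 7A 00")

print(syncsafe_to_int(header[6:10]))   # 42712
print(text_frames(tcon))               # {'TCON': 'Jazz'}
```

The other frames in the dump (TIT2, TPE1, TALB, COMM, and so on) follow the same pattern, which is why the text is readable even in the raw bytes.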

Let's use bvi...

  1. Use Cyberduck to upload one of your own mp3 files to your isc110 folder
  2. Log onto moxie with PuTTY.
    Run bvi by typing: bvi the-name-of-your-mp3-file-including-the-file-extension
  3. Move around with the arrow keys
  4. To quit, type :q<Enter key> (press <ESC> first if you are in the middle of an edit). If all else fails and it will not close, type <Ctrl>+C.
  5. To save a modified file, type :w name_of_the_new_(modified)_file
  6. To change the bytes in the file, type ":set memmove", then hit the Enter key.
    You need to keep the bytes in their current locations, so you should only replace a byte with another byte. To do this:
    1. use the arrow keys to move to the byte you want to replace
    2. type "r"
    3. replace the byte by typing in a new value
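
Why replace rather than insert? Replacing leaves every other byte at its original offset, so the file stays the same length. A minimal Python sketch of the idea (the function name replace_byte is made up for this example, and it works on bytes in memory rather than on a file):

```python
def replace_byte(data: bytes, offset: int, new_value: int) -> bytes:
    """Return a copy of data with the single byte at offset overwritten."""
    if not 0 <= offset < len(data):
        raise IndexError("offset is outside the data")
    patched = bytearray(data)
    patched[offset] = new_value      # overwrite only; no byte changes position
    return bytes(patched)

original = bytes.fromhex("54 43 4F 4E")      # "TCON", as seen in the dump above
patched = replace_byte(original, 3, 0x50)    # 0x50 is ASCII "P"
print(patched)                               # b'TCOP'
print(len(patched) == len(original))         # True
```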

Demonstration

Show me your work in class so that I can check off that you've used exiftool and bvi.

Written Assignment

Part 1: The Scientific Method

You will use the Scientific Method to define and solve a problem - of your choice - about metadata in a file of your choosing.

  1. Ask a Question
    Define a problem you want to solve (a question you have) about the metadata in a particular file.
  2. Background Research
    You have already done the background research: you've looked at several tools that can be used to examine metadata. Summarize the tools. State which tool(s) you will use on your problem, and the reason(s) for your selection(s).
  3. Construct a Hypothesis
    State what you think the answer to your question will be.
  4. Test with an Experiment
    1. Write the steps to your experiment.
    2. Write the results of your experiment.
  5. Analyze and Draw Conclusions
    This section should contain information such as...
    What you determined ...
    Whether or not the results are consistent with your hypothesis...
    Should you have done something differently?...
    ...etc....
Part 2

    Please answer these questions.

    1. Using exiftool vs. bvi: Can you use exiftool on an mp3 file? Can these tools be used on other files, such as Word documents or PDF files? Does this seem useful?
    2. Why would someone use EXIF?
    3. What can be done with EXIF tagged images that is much more difficult or impossible without it?
    4. Does EXIF raise any privacy concerns? If so what can you do about it?
    5. How might iTunes keep track of how many times you've shared an mp3 file?
    6. Amazon lets you store mp3 files. You are not charged for files you have purchased from them. How might they know this information?

    Use a word processor, double-spaced, for this assignment. You will write two documents: one for Part 1 (your scientific method experiment), and one for Part 2 (answers to questions). Save these Word files as PDF files (named metadata.pdf). Upload both PDF files to your isc110 page. Place a link to each of your files under in-class assignments. Name the first link "Metadata Experiment" and the second link "Metadata Questions and Answers". These links should appear in an ordered list in your second in-class assignment when you browse your page. Write a one-paragraph description of this assignment that appears before the links. Do NOT include a link to this assignment from the course page.


    Written by B. Paretzky, Spring 2012
    Modified by E. Wenderholm, Spring 2013, Fall 2014