Open any JPEG taken with a smartphone and you're looking at more than pixels. Embedded in the file header is a detailed record: camera model, lens focal length, exposure settings, the exact GPS coordinates where the photo was taken, and a timestamp accurate to the second. Most people sharing photos on the internet have no idea this data exists, let alone that it's being transmitted with every image they upload.
Metadata — data about data — is embedded in virtually every file type. Images carry EXIF and IPTC. Audio files have ID3 tags. Documents store author names, revision histories, and creation timestamps. Video containers hold dozens of tagged atoms describing codecs, chapters, and subtitles. Even archives record filesystem permissions and original file paths.
Understanding metadata matters for three reasons: privacy (what are you accidentally sharing?), workflows (what information survives when you convert between formats?), and forensics (what can a file tell you about its origin?). This guide covers all three.
EXIF: The Photo Surveillance Standard
The Exchangeable Image File Format (EXIF) was created by the Japan Electronic Industries Development Association (JEIDA) in 1995 and standardized as JEITA CP-3451. It stores camera and shooting data inside JPEG, TIFF, and some RAW formats. EXIF data sits in the APP1 marker segment of a JPEG file, structured as TIFF IFD (Image File Directory) entries.
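To make the "TIFF IFD entries" layout concrete, here is a minimal Python sketch that reads the ASCII-typed entries of the first directory (IFD0) from an APP1 Exif payload. The function name `read_ifd0_ascii` is hypothetical, and the sketch assumes a well-formed payload; it does not follow the Exif sub-IFD or GPS IFD pointers.

```python
import struct

def read_ifd0_ascii(exif_payload: bytes) -> dict[int, str]:
    """Return ASCII-typed IFD0 entries (tag number -> text) from an APP1 Exif payload."""
    if not exif_payload.startswith(b"Exif\x00\x00"):
        raise ValueError("not an Exif APP1 payload")
    tiff = exif_payload[6:]  # every offset below is relative to this TIFF header
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # little- or big-endian
    if order is None:
        raise ValueError("unknown TIFF byte order")
    magic, ifd0_offset = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (count,) = struct.unpack(order + "H", tiff[ifd0_offset:ifd0_offset + 2])
    found = {}
    for i in range(count):
        start = ifd0_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, typ, n = struct.unpack(order + "HHI", tiff[start:start + 8])
        if typ != 2:  # type 2 = ASCII string
            continue
        if n <= 4:  # short values are stored inline in the entry
            raw = tiff[start + 8:start + 8 + n]
        else:  # longer values live at an offset into the TIFF block
            (off,) = struct.unpack(order + "I", tiff[start + 8:start + 12])
            raw = tiff[off:off + n]
        found[tag] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return found
```

In the result, tag 0x010F is Make and 0x0110 is Model, so a phone photo would typically yield the manufacturer and device name under those keys.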
What EXIF Actually Stores
A typical smartphone photo's EXIF data includes:
| Category | Fields | Example Values |
|---|---|---|
| Camera | Make, Model, Software | Apple, iPhone 15 Pro, 18.3.2 |
| Lens | FocalLength, FNumber, MaxAperture | 6.86mm, f/1.78, f/1.78 |
| Exposure | ExposureTime, ISO, ExposureBias | 1/120s, ISO 64, 0 EV |
| Location | GPSLatitude, GPSLongitude, GPSAltitude | 37.7749°N, 122.4194°W, 12m |
| Time | DateTimeOriginal, DateTimeDigitized | 2026:03:15 14:30:22 |
| Image | ImageWidth, ImageHeight, Orientation | 4032, 3024, Rotate 90 CW |
| Thumbnail | Embedded JPEG thumbnail | 160x120 preview image |
Smartphone GPS is typically accurate to within a few meters under open sky, which is enough to identify a specific building and sometimes a specific room. The timestamp combined with GPS creates an exact record of where you were and when. The embedded thumbnail can be particularly insidious: some image editors modify the main image but leave the original thumbnail intact, meaning the pre-edit version persists in the file.
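EXIF does not store coordinates as decimal degrees. Each of GPSLatitude and GPSLongitude is three rational numbers (degrees, minutes, seconds) plus a reference letter (N/S or E/W). A small sketch of the conversion, with the hypothetical helper name `gps_to_decimal` and rationals passed as (numerator, denominator) pairs:

```python
from fractions import Fraction

def gps_to_decimal(dms, ref: str) -> float:
    """Convert EXIF-style (degrees, minutes, seconds) rationals to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return float(-value if ref in ("S", "W") else value)

# Sample values chosen to match the San Francisco coordinates in the table above.
latitude = gps_to_decimal([(37, 1), (46, 1), (2964, 100)], "N")
longitude = gps_to_decimal([(122, 1), (25, 1), (984, 100)], "W")
```

Seconds stored to two decimal places resolve to roughly 0.3 m of latitude, so the stored precision can exceed what the receiver actually measured.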
Privacy Implications
Posting an EXIF-intact photo of your home reveals your address. Posting photos from your workplace reveals your employer's location. A series of geotagged photos over time reveals your daily routine, commute, and frequented locations.
Major platforms strip EXIF on upload: Facebook, Instagram, Twitter/X, and WhatsApp all remove GPS data. But email attachments, direct file sharing, personal blogs, forums, and many CMS platforms do not. If you're uploading a JPEG directly to a web server, the EXIF data goes with it.
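To strip EXIF: use exiftool -all= photo.jpg (Phil Harvey's ExifTool, the gold standard), or convert to a format that doesn't carry it. By default ExifTool keeps the untouched original as photo.jpg_original, so share the stripped file and delete the backup, or pass -overwrite_original. Converting JPEG to PNG via most tools will strip EXIF, since PNG traditionally stores metadata in tEXt/iTXt chunks (an eXIf chunk was only added to the PNG specification in 2017) and most converters don't map EXIF fields to PNG equivalents.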
IPTC and XMP: Editorial and Universal Metadata
IPTC-IIM (International Press Telecommunications Council, 1991) was designed for news photography workflows. It stores editorial information: caption, headline, keywords, copyright notice, credit line, source, and contact info. News agencies like AP and Reuters require IPTC metadata on every transmitted photo. IPTC data lives in the APP13 marker segment of JPEG files.
XMP (Extensible Metadata Platform, Adobe, 2001) is an XML-based metadata framework that can embed in nearly any file format. It's namespace-extensible, meaning anyone can define custom fields. Adobe applications use XMP extensively: Lightroom edit history, Photoshop layer information, Illustrator document settings. XMP lives in APP1 (alongside EXIF) for JPEG, in an XML packet for PDF, and as a sidecar .xmp file for formats that can't embed it.
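Because EXIF and XMP share the APP1 marker, a reader tells them apart by the identifier string at the start of each segment payload; IPTC sits inside a Photoshop resource block in APP13. The following Python sketch walks the segments before the image data and reports which metadata blocks are present. The function name `metadata_segments` is hypothetical, and the walker is simplified: it assumes every segment before SOS carries a length field and ignores fill bytes.

```python
import struct

# (marker byte, payload prefix) -> label
SIGNATURES = {
    (0xE1, b"Exif\x00\x00"): "EXIF",
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00"): "XMP",
    (0xED, b"Photoshop 3.0\x00"): "IPTC (Photoshop IRB)",
}

def metadata_segments(jpeg: bytes) -> list[str]:
    """List the metadata blocks found in a JPEG's header segments, in file order."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    found, pos = [], 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data starts here
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        for (wanted, prefix), label in SIGNATURES.items():
            if marker == wanted and payload.startswith(prefix):
                found.append(label)
        pos += 2 + length
    return found
```

Calling `metadata_segments(open("photo.jpg", "rb").read())` on a file edited in an Adobe application would plausibly report all three labels, which is exactly the situation where conflicting values arise.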
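The three standards overlap awkwardly. A JPEG can carry EXIF, IPTC, and XMP simultaneously, with conflicting values for fields like "date taken" or "caption." The Metadata Working Group (MWG, 2008) published reconciliation guidelines for these conflicts. The rules are defined per field rather than as one fixed ranking: for IPTC-IIM versus XMP, for example, a stored IPTC digest indicates whether the IIM block was changed after the XMP was written, and the reader prefers whichever is newer.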
ID3 and Audio Metadata
ID3v1 (1996) appended a fixed 128-byte tag to the end of MP3 files: a 3-byte "TAG" marker, 30 bytes each for title, artist, album, and comment, plus year (4 bytes) and genre (1 byte from a fixed list of 80 genres). ID3v1.1 later carved a track number out of the last two bytes of the comment field. The 30-character limit and fixed genre list were immediately insufficient.
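The fixed layout makes ID3v1 easy to read by slicing the last 128 bytes of the file. A minimal sketch, with the hypothetical function name `parse_id3v1`; it treats text as Latin-1 and detects the ID3v1.1 track byte, which is stored in the final two bytes of the comment field (a zero byte followed by the track number).

```python
def parse_id3v1(data: bytes):
    """Parse an ID3v1/ID3v1.1 tag from the end of an MP3 file, or return None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are padded with NUL bytes (sometimes spaces).
        return raw.split(b"\x00")[0].decode("latin-1").strip()

    info = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # index into the fixed genre list
        "track": None,
    }
    # ID3v1.1: byte 125 is zero and byte 126 holds the track number.
    if tag[125] == 0 and tag[126] != 0:
        info["comment"] = text(tag[97:125])
        info["track"] = tag[126]
    return info
```

Typical use would be `parse_id3v1(open("song.mp3", "rb").read())`; a file with no ID3v1 tag returns `None`.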
ID3v2 (1998) replaced this with a variable-length, extensible tag at the beginning of the file. It supports Unicode text, embedded album art (APIC frame), lyrics (USLT), replay gain (RGAD), and arbitrary custom frames. ID3v2.4 (2000) is the current version. A single album art image in ID3v2 can be 500KB or more — sometimes the metadata is larger than the audio itself for short tracks.
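Because the ID3v2 tag is variable-length, a player has to read its header to know where the audio starts. The header is 10 bytes: the ASCII marker "ID3", two version bytes, a flags byte, and a 4-byte "synchsafe" size in which only the low 7 bits of each byte are used. A sketch of that calculation follows; the function name `id3v2_tag_size` is hypothetical, and frame parsing is out of scope.

```python
def id3v2_tag_size(data: bytes):
    """Return (major_version, total_tag_bytes) for a leading ID3v2 tag, or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, flags = data[3], data[5]
    size = 0
    for byte in data[6:10]:
        if byte & 0x80:
            raise ValueError("size bytes must have the high bit clear")
        size = (size << 7) | byte  # 7 useful bits per byte
    total = size + 10  # the stored size excludes the 10-byte header
    if major == 4 and flags & 0x10:
        total += 10  # ID3v2.4 may append a 10-byte footer
    return major, total
```

Skipping `total` bytes from the start of the file lands on the first audio frame, which is how a tool can measure how much of a short track is metadata.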
Other Audio Metadata Systems
| Format | Metadata System | Key Fields |
|---|---|---|
| MP3 | ID3v1, ID3v2 | Title, artist, album, track, year, genre, album art |
| FLAC | Vorbis Comment + PICTURE block | Arbitrary key=value pairs, embedded images |
| OGG/Opus | Vorbis Comment | Same as FLAC, standardized by Xiph.org |
| M4A/AAC | iTunes-style MP4 atoms (moov.udta.meta.ilst) | Title, artist, album art, tempo, gapless info |
| WAV | LIST INFO chunk, BWF (Broadcast Wave) | Limited; artist, title, comment. BWF adds originator, time reference |
| WMA | ASF Header Extension | WM/Title, WM/AlbumArtist, WM/Picture |
When converting audio formats, metadata mapping is format-dependent. FLAC to MP3 maps Vorbis Comment fields to ID3v2 tags. MP3 to OGG maps ID3v2 to Vorbis Comment. Most tools (FFmpeg, foobar2000) handle common fields automatically but may drop uncommon or custom tags.
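The mapping step can be pictured as a lookup table from Vorbis Comment field names to ID3v2 frame IDs. The sketch below is illustrative only: the table covers a handful of common fields, the function name `map_vorbis_to_id3` is hypothetical, and dropping unmapped keys is an assumption about converter behavior (a real tool might instead store them in user-defined TXXX frames).

```python
# Common Vorbis Comment fields and their ID3v2.4 frame IDs.
VORBIS_TO_ID3V2 = {
    "TITLE": "TIT2",
    "ARTIST": "TPE1",
    "ALBUM": "TALB",
    "TRACKNUMBER": "TRCK",
    "DATE": "TDRC",
    "GENRE": "TCON",
}

def map_vorbis_to_id3(comments: dict[str, str]):
    """Split Vorbis comments into mapped ID3v2 frames and the keys that were dropped."""
    mapped, dropped = {}, []
    for key, value in comments.items():
        frame = VORBIS_TO_ID3V2.get(key.upper())  # field names are case-insensitive
        if frame is None:
            dropped.append(key)
        else:
            mapped[frame] = value
    return mapped, dropped
```

Inspecting the `dropped` list after a conversion is a quick way to see which custom tags did not make it across.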
Video Container Metadata
Video containers carry metadata at multiple levels: container-level (title, chapters, creation date), track-level (codec info, language, default track flags), and frame-level (timestamps, keyframe markers).
MP4/M4V uses a hierarchical atom/box structure whose metadata is rooted at moov. The udta (user data) atom holds title, artist, description. Chapter marks live in a dedicated text track. The moov atom also stores the sample tables that map timestamps to byte offsets in the media data, so if this atom is damaged or truncated, the entire file becomes unplayable.
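Each box starts with a 4-byte big-endian size and a 4-byte type code, so the top-level layout can be listed without decoding any media. The sketch below does that and checks whether moov sits ahead of the media data (mdat), which is the layout streaming-friendly files use. The function names are hypothetical, and nested boxes such as udta are not descended into.

```python
import struct

def top_level_boxes(data: bytes):
    """Return (type, offset, size) for each top-level box in an MP4 byte string."""
    boxes, pos = [], 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif size == 0:  # box runs to the end of the file
            size = len(data) - pos
        if size < header:  # malformed box; stop rather than loop forever
            break
        boxes.append((kind.decode("latin-1"), pos, size))
        pos += size
    return boxes

def moov_before_mdat(data: bytes) -> bool:
    """True when both boxes exist and moov precedes mdat."""
    order = [kind for kind, _, _ in top_level_boxes(data)]
    return "moov" in order and "mdat" in order and order.index("moov") < order.index("mdat")
```

For large files, reading only the 8- or 16-byte headers and seeking past each box would avoid loading the whole file; this sketch keeps everything in memory for brevity.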
MKV (Matroska, 2002) uses EBML (Extensible Binary Meta Language), a binary XML variant. MKV can store chapters with named segments, attachments (fonts for subtitles), tags in 20+ languages simultaneously, and cover art. It's the most metadata-rich video container in common use.
AVI (Microsoft, 1992) uses a RIFF chunk structure with minimal metadata support. It's one reason AVI feels outdated — you can't embed chapters, multiple subtitle tracks, or proper language tags. Converting AVI to MKV gains all these capabilities.
Document and Office Metadata
PDF metadata lives in the document information dictionary (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate) and optionally in an XMP metadata stream. PDF/A (archival) requires XMP. Critically, a PDF can also retain the full text of content that was covered up but never redacted: simply drawing a black rectangle over text doesn't remove the underlying text object. Use proper redaction tools.
DOCX/XLSX/PPTX (Office Open XML) store metadata in docProps/core.xml (Dublin Core: title, creator, lastModifiedBy, revision count, creation/modification dates) and docProps/app.xml (application name, total editing time, word/page/paragraph counts). The revision count reveals how many times the document was saved. "Total Editing Time" reveals how long you spent on it. Track changes, comments, and hidden text may persist even after you think you've removed them.
ODT/ODS/ODP (OpenDocument) store similar metadata in meta.xml: initial creator, editing cycles, editing duration, document statistics. Since ODF is a ZIP archive of XML files, you can inspect this directly by unzipping the file.
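Since both OOXML and ODF files are ZIP archives of XML, the properties can be read with nothing but the standard library. A minimal sketch, assuming the member names given above (docProps/core.xml for Office files, meta.xml for OpenDocument); the function name `document_properties` is hypothetical, and namespaces are simply stripped from element names.

```python
import zipfile
import xml.etree.ElementTree as ET

METADATA_MEMBERS = ("docProps/core.xml", "meta.xml")

def document_properties(path_or_file) -> dict[str, str]:
    """Collect leaf-element text from the metadata XML of a DOCX/XLSX/PPTX or ODF file."""
    props = {}
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())
        for member in METADATA_MEMBERS:
            if member not in names:
                continue
            root = ET.fromstring(archive.read(member))
            for element in root.iter():
                text = (element.text or "").strip()
                if text and len(element) == 0:  # leaf elements only
                    # "{namespace}creator" -> "creator"
                    props[element.tag.split("}")[-1]] = text
    return props
```

Running `document_properties("report.docx")` before sending a file shows at a glance which names and counters would travel with it.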
When converting documents, most metadata transfers if the target format supports it. DOCX to PDF typically preserves title, author, and creation date. DOC to DOCX generally preserves document properties because both formats define equivalent fields, even though DOC is a binary container and DOCX is ZIP-packaged XML. DOCX to TXT strips all metadata, since plain text has no metadata container.
What Survives Format Conversion
| Conversion | Metadata Behavior |
|---|---|
| JPG → PNG | EXIF typically stripped (PNG uses different metadata system). Some tools map to tEXt chunks. |
| JPG → WebP | EXIF preserved if encoder supports it (libwebp does since 0.3.0). XMP preserved. |
| PNG → JPG | tEXt/iTXt chunks lost. No EXIF to begin with unless manually added. |
| MP3 → FLAC | ID3v2 mapped to Vorbis Comment. Album art preserved as PICTURE block. |
| FLAC → MP3 | Vorbis Comment mapped to ID3v2. Cover art preserved as APIC frame. |
| DOCX → PDF | Title, author, dates usually preserved. Track changes and comments can leak into PDF. |
| MKV → MP4 | Container-level tags mostly lost. Chapter marks may transfer if tool supports it. |
| Any → TXT | All metadata lost. Plain text has no metadata container. |
| Any → ZIP/7Z | File metadata (timestamps, permissions) preserved. Internal file metadata untouched. |
The safest assumption: metadata loss is likely during any format conversion. If metadata preservation matters, verify the output explicitly.
How to Strip Metadata
If your goal is privacy, stripping metadata before sharing is essential. Here are reliable methods by file type:
- Images (any format): `exiftool -all= image.jpg` removes EXIF, IPTC, XMP, and ICC profiles. ExifTool handles JPEG, PNG, TIFF, WebP, HEIC, and more.
- Images (conversion method): Convert to PNG and back. Most converters don't preserve cross-format metadata, but the round trip re-encodes the JPEG, so it costs image quality.
- Audio: FFmpeg with `-map_metadata -1` strips all container and stream metadata. Or use `id3v2 --delete-all song.mp3` for MP3-specific stripping.
- Documents: In Word: File → Info → Check for Issues → Inspect Document → Remove All. In LibreOffice: File → Properties → clear all fields. For PDF: `exiftool -all= document.pdf` or use Ghostscript to reprocess. Note that ExifTool edits PDFs by appending an incremental update, so the old metadata remains recoverable unless the file is rewritten afterwards.
- Video: `ffmpeg -i input.mp4 -map_metadata -1 -c copy output.mp4` strips container metadata without re-encoding.
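For batch work, the commands above can be chosen by file extension. The sketch below only builds the argument list and does not run anything; the function name `strip_command`, the extension sets, and the ".clean" output naming are all assumptions made for illustration.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp", ".heic"}
MEDIA_EXTS = {".mp3", ".flac", ".m4a", ".mp4", ".m4v", ".mkv"}

def strip_command(src: str) -> list[str]:
    """Build (but do not run) a metadata-stripping command for the given file."""
    path = Path(src)
    ext = path.suffix.lower()
    if ext in IMAGE_EXTS:
        # ExifTool edits in place and keeps a "<name>_original" backup by default.
        return ["exiftool", "-all=", str(path)]
    if ext in MEDIA_EXTS:
        # FFmpeg needs a separate output file; streams are copied, not re-encoded.
        out = path.with_name(path.stem + ".clean" + path.suffix)
        return ["ffmpeg", "-i", str(path), "-map_metadata", "-1", "-c", "copy", str(out)]
    raise ValueError(f"no stripping command known for {ext or 'files without an extension'}")
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once ExifTool or FFmpeg is installed; inspecting the output file afterwards remains the only real confirmation that the metadata is gone.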
Converting to plain text (DOCX to TXT, PDF to TXT) is the nuclear option — guaranteed metadata-free, but you lose all formatting.
Metadata is a double-edged sword. For photographers, it's an invaluable record of shooting conditions. For archivists, it's provenance documentation. For anyone sharing files publicly, it's a potential privacy leak that most people never think about.
The practical habit: before sharing any file publicly, check what metadata it carries. Use ExifTool, your OS's file properties dialog, or simply convert to a metadata-sparse format. And for files you're keeping, preserve the metadata — it's the only record of a file's history that travels with the file itself.