Open any JPEG taken with a smartphone and you're looking at more than pixels. Embedded in the file header is a detailed record: camera model, lens focal length, exposure settings, the exact GPS coordinates where the photo was taken, and a timestamp accurate to the second. Most people sharing photos on the internet have no idea this data exists, let alone that it's being transmitted with every image they upload.

Metadata — data about data — is embedded in virtually every file type. Images carry EXIF and IPTC. Audio files have ID3 tags. Documents store author names, revision histories, and creation timestamps. Video containers hold dozens of tagged atoms describing codecs, chapters, and subtitles. Even archives record filesystem permissions and original file paths.

Understanding metadata matters for three reasons: privacy (what are you accidentally sharing?), workflows (what information survives when you convert between formats?), and forensics (what can a file tell you about its origin?). This guide covers all three.

EXIF: The Photo Surveillance Standard

The Exchangeable Image File Format (EXIF) was created by the Japan Electronic Industries Development Association (JEIDA) in 1995 and standardized as JEITA CP-3451. It stores camera and shooting data inside JPEG, TIFF, and some RAW formats. EXIF data sits in the APP1 marker segment of a JPEG file, structured as TIFF IFD (Image File Directory) entries.
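To make the APP1 layout concrete, here is a minimal Python sketch that walks a JPEG's marker segments and returns the TIFF payload of the first Exif block. The function name `find_exif_segment` is illustrative, and the walker is simplified: it assumes every marker before the scan data carries a length field, which holds for typical camera output but not for every valid JPEG.

```python
import struct

def find_exif_segment(jpeg: bytes):
    """Return the TIFF payload of the first APP1/Exif segment, or None."""
    if jpeg[:2] != b"\xff\xd8":              # SOI (start of image) marker
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xDA:                   # SOS: compressed image data follows
            return None
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]               # TIFF header plus IFD entries
        pos += 2 + length
    return None

# Tiny synthetic stream: SOI, APP0 (JFIF), APP1 (Exif), SOS.
tiff = b"II*\x00\x08\x00\x00\x00"
app0 = b"\xff\xe0" + struct.pack(">H", 2 + 5) + b"JFIF\x00"
body = b"Exif\x00\x00" + tiff
app1 = b"\xff\xe1" + struct.pack(">H", 2 + len(body)) + body
sample = b"\xff\xd8" + app0 + app1 + b"\xff\xda\x00\x02"
print(find_exif_segment(sample) == tiff)
```

The returned bytes start with a TIFF header ("II" or "MM" byte-order mark), which is where a real parser would begin reading IFD entries.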

What EXIF Actually Stores

A typical smartphone photo's EXIF data includes:

| Category | Fields | Example Values |
|---|---|---|
| Camera | Make, Model, Software | Apple, iPhone 15 Pro, 18.3.2 |
| Lens | FocalLength, FNumber, MaxAperture | 6.86mm, f/1.78, f/1.78 |
| Exposure | ExposureTime, ISO, ExposureBias | 1/120s, ISO 64, 0 EV |
| Location | GPSLatitude, GPSLongitude, GPSAltitude | 37.7749°N, 122.4194°W, 12m |
| Time | DateTimeOriginal, DateTimeDigitized | 2026:03:15 14:30:22 |
| Image | ImageWidth, ImageHeight, Orientation | 4032, 3024, Rotate 90 CW |
| Thumbnail | Embedded JPEG thumbnail | 160x120 preview image |

The GPS coordinates are stored with more precision than the receiver can actually deliver. In practice a phone's fix is typically good to within a few meters, which is enough to identify a specific building and sometimes which part of it you were in. The timestamp combined with GPS creates an exact record of where you were and when. The embedded thumbnail can be particularly insidious: some image editors modify the main image but leave the original thumbnail intact, meaning the pre-edit version persists in the file.
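EXIF stores each coordinate as three rational numbers (degrees, minutes, seconds) plus a reference letter, not as a single decimal. The following Python sketch shows the conversion; `dms_to_decimal` is an illustrative name, and the sample rationals were chosen to reproduce the coordinates in the table above.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref):
    """Convert EXIF-style GPS rationals to signed decimal degrees.

    dms: three (numerator, denominator) pairs for degrees, minutes, seconds.
    ref: 'N', 'S', 'E' or 'W' (the GPSLatitudeRef / GPSLongitudeRef value).
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):                    # south and west are negative
        value = -value
    return float(value)

lat = dms_to_decimal(((37, 1), (46, 1), (2964, 100)), "N")
lon = dms_to_decimal(((122, 1), (25, 1), (984, 100)), "W")
print(round(lat, 4), round(lon, 4))          # 37.7749 -122.4194
```

One arc-second of latitude is roughly 31 meters on the ground, so seconds stored to two decimal places describe positions far finer than a phone receiver can measure. The stored digits alone do not tell you how accurate the fix was.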

Privacy Implications

Posting an EXIF-intact photo of your home reveals your address. Posting photos from your workplace reveals your employer's location. A series of geotagged photos over time reveals your daily routine, commute, and frequented locations.

Major platforms strip EXIF from the copies they serve: Facebook, Instagram, Twitter/X, and WhatsApp all remove GPS data from displayed images, although the platform itself still receives the original file. Email attachments, direct file sharing, personal blogs, forums, and many CMS platforms do not strip anything. If you're uploading a JPEG directly to a web server, the EXIF data goes with it.

To strip EXIF: use exiftool -all= photo.jpg (Phil Harvey's ExifTool, the gold standard), or convert to a format that doesn't carry it. Converting JPEG to PNG via most tools will strip EXIF, since PNG stores metadata differently (tEXt/iTXt text chunks, plus an eXIf chunk added to the PNG specification in 2017) and many converters don't carry EXIF fields across. Some do, so check the output rather than assuming.

IPTC and XMP: Editorial and Universal Metadata

IPTC-IIM (International Press Telecommunications Council, 1991) was designed for news photography workflows. It stores editorial information: caption, headline, keywords, copyright notice, credit line, source, and contact info. News agencies like AP and Reuters require IPTC metadata on every transmitted photo. IPTC data lives in the APP13 marker segment of JPEG files.

XMP (Extensible Metadata Platform, Adobe, 2001) is an XML-based metadata framework that can embed in nearly any file format. It's namespace-extensible, meaning anyone can define custom fields. Adobe applications use XMP extensively: Lightroom edit history, Photoshop layer information, Illustrator document settings. XMP lives in APP1 (alongside EXIF) for JPEG, in an XML packet for PDF, and as a sidecar .xmp file for formats that can't embed it.
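Because an XMP packet is ordinary XML, it can be located and read with standard tooling. The sketch below is a rough illustration in Python: it searches a byte string for the packet and pulls out one Dublin Core field and one custom field. The `studio` namespace, its URI, and the `JobTicket` property are hypothetical, invented here to show namespace extensibility. The search also assumes the conventional `x:` prefix on `xmpmeta` and element-style (not attribute-style) properties.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "studio": "https://example.com/ns/studio/1.0/",   # hypothetical custom namespace
}

def extract_xmp_packet(data: bytes):
    """Return the <x:xmpmeta> element found in raw file bytes, or None."""
    close = b"</x:xmpmeta>"
    start = data.find(b"<x:xmpmeta")
    end = data.find(close, max(start, 0))
    if start == -1 or end == -1:
        return None
    return data[start:end + len(close)]

def read_xmp_fields(packet: bytes) -> dict:
    """Read dc:creator and the hypothetical studio:JobTicket property."""
    root = ET.fromstring(packet)
    fields = {}
    creator = root.find(".//dc:creator/rdf:Seq/rdf:li", NS)
    if creator is not None:
        fields["creator"] = creator.text
    ticket = root.find(".//studio:JobTicket", NS)
    if ticket is not None:
        fields["job_ticket"] = ticket.text
    return fields

packet_text = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:studio="https://example.com/ns/studio/1.0/">
   <dc:creator><rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq></dc:creator>
   <studio:JobTicket>A-1042</studio:JobTicket>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""
sample = b"leading bytes" + packet_text.encode("utf-8") + b"trailing bytes"
print(read_xmp_fields(extract_xmp_packet(sample)))
```

The same two functions work on a sidecar .xmp file, since the sidecar is just the packet saved on its own.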

The three standards overlap awkwardly. A JPEG can carry EXIF, IPTC, and XMP simultaneously, with conflicting values for fields like "date taken" or "caption." The Metadata Working Group (MWG, 2008) published reconciliation guidelines for these cases. As a rough rule of thumb, XMP takes precedence over IPTC, which takes precedence over EXIF, for conflicting fields, though the guidelines spell out the exact behavior field by field.

ID3 and Audio Metadata

ID3v1 (1996) appended a fixed 128-byte tag to the end of MP3 files: a 3-byte "TAG" identifier, 30 bytes each for title, artist, album, and comment, plus year (4 bytes) and genre (1 byte indexing a fixed list of 80 genres). ID3v1.1 later carved a track number out of the last two bytes of the comment field. The 30-character limit and fixed genre list were immediately insufficient.
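The fixed layout makes ID3v1 easy to read by slicing. A minimal Python sketch follows; the function name `parse_id3v1` is illustrative, and Latin-1 decoding is an assumption, since ID3v1 does not declare an encoding and real files vary.

```python
def parse_id3v1(data: bytes):
    """Parse the trailing 128-byte ID3v1 / ID3v1.1 tag, or return None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are NUL-padded (sometimes space-padded) to a fixed width.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    comment = tag[97:127]
    track = None
    # ID3v1.1: a zero byte at comment[28] followed by a non-zero track number.
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(comment),
        "track": track,
        "genre": tag[127],       # index into the fixed genre list
    }
```

A typical call would be `parse_id3v1(open("song.mp3", "rb").read())`; reading only the last 128 bytes of the file is enough.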

ID3v2 (1998) replaced this with a variable-length, extensible tag at the beginning of the file. It supports Unicode text, embedded album art (APIC frame), lyrics (USLT), replay gain (RGAD), and arbitrary custom frames. ID3v2.4 (2000) is the current version. A single album art image in ID3v2 can be 500KB or more — sometimes the metadata is larger than the audio itself for short tracks.
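Since the ID3v2 tag sits in front of the audio, a reader needs its size to find where the audio begins. The 10-byte header stores that size as a "synchsafe" integer: four bytes with the top bit of each always zero, giving 28 usable bits. The Python sketch below decodes it; the function names are illustrative.

```python
def synchsafe_to_int(raw: bytes) -> int:
    """Decode a 4-byte synchsafe integer (7 payload bits per byte)."""
    if len(raw) != 4 or any(b & 0x80 for b in raw):
        raise ValueError("not a valid synchsafe integer")
    return (raw[0] << 21) | (raw[1] << 14) | (raw[2] << 7) | raw[3]

def parse_id3v2_header(data: bytes):
    """Parse the 10-byte ID3v2 header at the start of a file, or return None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size = synchsafe_to_int(data[6:10])      # excludes the 10-byte header itself
    # ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
    footer = 10 if major == 4 and flags & 0x10 else 0
    return {
        "version": f"2.{major}.{revision}",
        "flags": flags,
        "tag_size": size,
        "audio_offset": 10 + size + footer,
    }

print(parse_id3v2_header(b"ID3\x04\x00\x00\x00\x00\x02\x01"))
```

For the sample header, the size bytes 00 00 02 01 decode to 2 × 128 + 1 = 257, so the audio would start at byte 267.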

Other Audio Metadata Systems

| Format | Metadata System | Key Fields |
|---|---|---|
| MP3 | ID3v1, ID3v2 | Title, artist, album, track, year, genre, album art |
| FLAC | Vorbis Comment + PICTURE block | Arbitrary key=value pairs, embedded images |
| OGG/Opus | Vorbis Comment | Same as FLAC, standardized by Xiph.org |
| M4A/AAC | iTunes-style MP4 atoms (moov.udta.meta.ilst) | Title, artist, album art, tempo, gapless info |
| WAV | LIST INFO chunk, BWF (Broadcast Wave) | Limited; artist, title, comment. BWF adds originator, time reference |
| WMA | ASF Header Extension | WM/Title, WM/AlbumArtist, WM/Picture |

When converting audio formats, metadata mapping is format-dependent. FLAC to MP3 maps Vorbis Comment fields to ID3v2 tags. MP3 to OGG maps ID3v2 to Vorbis Comment. Most tools (FFmpeg, foobar2000) handle common fields automatically but may drop uncommon or custom tags.
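The mapping step can be pictured as a lookup table with a fallback for everything the table doesn't know. The Python sketch below illustrates the idea for Vorbis Comment to ID3v2.4; it is not how FFmpeg or foobar2000 implement it, and their tables differ in detail. The frame IDs are standard ID3v2.4 frames, while pairing ALBUMARTIST with TPE2 is a widespread convention rather than part of either specification.

```python
# Vorbis Comment field name -> ID3v2.4 frame ID
VORBIS_TO_ID3V24 = {
    "TITLE": "TIT2",
    "ARTIST": "TPE1",
    "ALBUM": "TALB",
    "ALBUMARTIST": "TPE2",    # de facto convention
    "TRACKNUMBER": "TRCK",
    "DATE": "TDRC",
    "GENRE": "TCON",
    "COMPOSER": "TCOM",
}

def map_vorbis_to_id3(comments: dict):
    """Split Vorbis comments into known ID3v2.4 frames and leftover custom tags."""
    frames, custom = {}, {}
    for key, value in comments.items():
        frame_id = VORBIS_TO_ID3V24.get(key.upper())   # field names are case-insensitive
        if frame_id:
            frames[frame_id] = value
        else:
            custom[key] = value   # a converter must choose: write as TXXX or drop
    return frames, custom

frames, custom = map_vorbis_to_id3({"title": "Song", "ARTIST": "Band", "MOOD": "calm"})
print(frames, custom)
```

Everything that lands in `custom` is where tools diverge: some write a user-defined TXXX frame, others silently discard the field.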

Video Container Metadata

Video containers carry metadata at multiple levels: container-level (title, chapters, creation date), track-level (codec info, language, default track flags), and frame-level (timestamps, keyframe markers).

MP4/M4V uses a hierarchical atom/box structure, with metadata rooted at moov. The udta (user data) atom holds title, artist, description. Chapter marks live in a dedicated text track. The moov atom also stores the sample tables that map timestamps to byte offsets in the media data. It can sit at the start or the end of the file, but if it is missing or damaged the entire file becomes unplayable.
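Each box begins with a 4-byte big-endian size and a 4-byte type code, which is all a reader needs to step through the hierarchy. The Python sketch below iterates over the boxes in a byte range; `iter_boxes` is an illustrative name, and the code does no validation beyond bounds checks.

```python
import struct

def iter_boxes(data: bytes, start: int = 0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:                        # 64-bit size follows the type
            if pos + 16 > end:
                break
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif size == 0:                      # box extends to the end of the data
            size = end - pos
        if size < header or pos + size > end:
            break                            # truncated or corrupt box
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size

# Synthetic file: an ftyp box followed by moov containing an empty udta.
ftyp = struct.pack(">I4s", 16, b"ftyp") + b"isom\x00\x00\x02\x00"
udta = struct.pack(">I4s", 8, b"udta")
moov = struct.pack(">I4s", 8 + len(udta), b"moov") + udta
data = ftyp + moov

for name, payload_start, box_end in iter_boxes(data):
    print(name, payload_start, box_end)
```

To descend into a container such as moov, call `iter_boxes(data, payload_start, box_end)` with the range it yielded. Running this over a real file shows at a glance whether moov comes before or after the media data.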

MKV (Matroska, 2002) uses EBML (Extensible Binary Meta Language), a binary analogue of XML. MKV can store chapters with named segments, attachments (fonts for subtitles), tags in 20+ languages simultaneously, and cover art. It's the most metadata-rich video container in common use.
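EBML elements are an ID, a size, and a payload, with the size written as a variable-length integer: the number of leading zero bits in the first byte says how many bytes follow. A minimal Python sketch of that size decoding is shown here; `read_vint_size` is an illustrative name, and the special all-ones "unknown size" value is not handled.

```python
EBML_MAGIC = bytes.fromhex("1a45dfa3")       # ID of the EBML header element

def read_vint_size(data: bytes, pos: int = 0):
    """Decode an EBML element size. Returns (value, bytes_consumed)."""
    first = data[pos]
    if first == 0:
        raise ValueError("sizes longer than 8 bytes are not supported")
    length = 8 - first.bit_length() + 1      # leading zero bits + 1
    if pos + length > len(data):
        raise ValueError("truncated variable-length integer")
    value = first & ((1 << (8 - length)) - 1)   # clear the length-marker bit
    for byte in data[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, length

header = EBML_MAGIC + b"\x9f"                # header element with a 31-byte payload
print(header.startswith(EBML_MAGIC), read_vint_size(header, 4))
```

A one-byte size such as 0x9F carries 7 value bits, a two-byte size such as 0x40 0x02 carries 14, and so on up to 56 bits in eight bytes.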

AVI (Microsoft, 1992) uses a RIFF chunk structure with minimal metadata support. It's one reason AVI feels outdated — you can't embed chapters, multiple subtitle tracks, or proper language tags. Converting AVI to MKV gains all these capabilities.

Document and Office Metadata

PDF metadata lives in the document information dictionary (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate) and optionally in an XMP metadata stream. PDF/A (archival) requires XMP. Critically, a PDF can also retain the full text of content that was hidden but not redacted: drawing a black rectangle over text doesn't remove the underlying text object, which can still be selected, copied, or extracted. Use proper redaction tools.

DOCX/XLSX/PPTX (Office Open XML) store metadata in docProps/core.xml (Dublin Core: title, creator, lastModifiedBy, revision count, creation/modification dates) and docProps/app.xml (application name, total editing time, word/page/paragraph counts). The revision count reveals how many times the document was saved. "Total Editing Time" reveals how long you spent on it. Track changes, comments, and hidden text may persist even after you think you've removed them.
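Since an Office Open XML file is a ZIP archive, the core properties can be read with nothing but the standard library. The Python sketch below is one way to do it; `read_core_properties` is an illustrative name, and the demo builds a tiny in-memory archive containing only docProps/core.xml rather than a complete, openable document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

WANTED = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}

def read_core_properties(source) -> dict:
    """Read selected fields from docProps/core.xml (path or file-like object)."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for name, path in WANTED.items():
        node = root.find(path, NS)
        if node is not None:
            props[name] = node.text
    return props

# Demo: a stand-in archive holding just the metadata part.
core_xml = (
    '<cp:coreProperties xmlns:cp="' + NS["cp"] + '" xmlns:dc="' + NS["dc"] + '">'
    "<dc:creator>Jane Doe</dc:creator>"
    "<cp:lastModifiedBy>Sam Lee</cp:lastModifiedBy>"
    "<cp:revision>7</cp:revision>"
    "</cp:coreProperties>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", core_xml)
demo_props = read_core_properties(buf)
print(demo_props)
```

Pointing the same function at a real file, for example `read_core_properties("report.docx")`, shows what a recipient can learn without ever opening Word.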

ODT/ODS/ODP (OpenDocument) store similar metadata in meta.xml: initial creator, editing cycles, editing duration, document statistics. Since ODF is a ZIP archive of XML files, you can inspect this directly by unzipping the file.

When converting documents, most metadata transfers if the target format supports it. DOCX to PDF typically preserves title, author, and creation date. DOC to DOCX generally preserves document properties: the containers differ (DOC is a binary OLE compound file, DOCX a ZIP archive of XML), but both define equivalent property sets. DOCX to TXT strips all metadata because plain text has no metadata container.

What Survives Format Conversion

| Conversion | Metadata Behavior |
|---|---|
| JPG → PNG | EXIF typically stripped (PNG uses different metadata system). Some tools map to tEXt chunks. |
| JPG → WebP | EXIF preserved if encoder supports it (libwebp does since 0.3.0). XMP preserved. |
| PNG → JPG | tEXt/iTXt chunks lost. No EXIF to begin with unless manually added. |
| MP3 → FLAC | ID3v2 mapped to Vorbis Comment. Album art preserved as PICTURE block. |
| FLAC → MP3 | Vorbis Comment mapped to ID3v2. Cover art preserved as APIC frame. |
| DOCX → PDF | Title, author, dates usually preserved. Track changes and comments can leak into PDF. |
| MKV → MP4 | Container-level tags mostly lost. Chapter marks may transfer if tool supports it. |
| Any → TXT | All metadata lost. Plain text has no metadata container. |
| Any → ZIP/7Z | File metadata (timestamps, permissions) preserved. Internal file metadata untouched. |

The safest assumption: metadata loss is likely during any format conversion. If metadata preservation matters, verify the output explicitly.
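One way to verify explicitly is to dump the tags before and after the conversion and compare them. The Python sketch below assumes ExifTool is installed and on the PATH; it uses ExifTool's `-json` and `-G` options, which emit one JSON object per file with group-prefixed keys such as "EXIF:Make". The helper names `read_tags` and `diff_metadata` are illustrative.

```python
import json
import subprocess

def read_tags(path: str) -> dict:
    """Return the tags of one file as a dict (requires the exiftool CLI)."""
    result = subprocess.run(
        ["exiftool", "-json", "-G", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)[0]      # one object per input file

def diff_metadata(before: dict, after: dict) -> dict:
    """Report which tag names were lost, added, or changed in value."""
    return {
        "lost": sorted(k for k in before if k not in after),
        "added": sorted(k for k in after if k not in before),
        "changed": sorted(k for k in before if k in after and before[k] != after[k]),
    }

# Hand-written dicts stand in for two read_tags() results.
before = {"EXIF:Make": "Apple", "EXIF:GPSLatitude": "37.7749", "XMP:Title": "a"}
after = {"XMP:Title": "b", "PNG:Software": "x"}
report = diff_metadata(before, after)
print(report)
```

With real files, tags in the File and System groups (size, modification time, and similar) will always differ between input and output, so it is worth filtering those out before reading the report.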

How to Strip Metadata

If your goal is privacy, stripping metadata before sharing is essential. Here are reliable methods by file type:

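  • Images (any format): exiftool -all= image.jpg removes EXIF, IPTC, XMP, and ICC profiles. ExifTool handles JPEG, PNG, TIFF, WebP, HEIC, and more. By default it keeps the untouched original as image.jpg_original, so delete that backup or add -overwrite_original.
  • Images (conversion method): Convert to PNG and back, since most converters don't preserve cross-format metadata. The JPEG-to-PNG step is lossless, but re-encoding back to JPEG costs a generation of quality.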
  • Audio: FFmpeg with -map_metadata -1 strips all container and stream metadata. Or use id3v2 --delete-all song.mp3 for MP3-specific stripping.
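  • Documents: In Word: File → Info → Check for Issues → Inspect Document → Remove All. In LibreOffice: File → Properties → clear all fields. For PDF: exiftool -all= document.pdf or use Ghostscript to reprocess. Note that ExifTool edits a PDF by appending an incremental update, so the old metadata stays recoverable until the file is rewritten (for example with qpdf --linearize).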
  • Video: ffmpeg -i input.mp4 -map_metadata -1 -c copy output.mp4 strips container metadata without re-encoding.
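When this has to happen for many files, the commands above can be wrapped in a small script. The Python sketch below only builds the command line for a given file type, using the same exiftool and ffmpeg options shown in the list plus ExifTool's `-o` option to write a separate output file. The extension sets and the name `strip_command` are assumptions made for this example, not a complete list of what either tool supports.

```python
from pathlib import Path

IMAGE_EXT = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp", ".heic"}
AV_EXT = {".mp3", ".flac", ".m4a", ".mp4", ".mov", ".mkv"}

def strip_command(src: str, dst: str) -> list:
    """Build the CLI command that writes a metadata-stripped copy of src to dst."""
    ext = Path(src).suffix.lower()
    if ext in IMAGE_EXT:
        # -o writes a new file and leaves the source untouched
        return ["exiftool", "-all=", "-o", dst, src]
    if ext in AV_EXT:
        # -map_metadata -1 drops metadata; -c copy avoids re-encoding
        return ["ffmpeg", "-i", src, "-map_metadata", "-1", "-c", "copy", dst]
    raise ValueError(f"no stripping rule for {ext or 'files without an extension'}")

print(strip_command("holiday.JPG", "holiday_clean.jpg"))
print(strip_command("clip.mp4", "clip_clean.mp4"))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Writing to a new file rather than editing in place keeps the original available for checking what was removed.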

Converting to plain text (DOCX to TXT, PDF to TXT) is the nuclear option: no embedded metadata survives, but you lose all formatting. The filesystem still records its own timestamps and ownership for the new file.

Metadata is a double-edged sword. For photographers, it's an invaluable record of shooting conditions. For archivists, it's provenance documentation. For anyone sharing files publicly, it's a potential privacy leak that most people never think about.

The practical habit: before sharing any file publicly, check what metadata it carries. Use ExifTool, your OS's file properties dialog, or simply convert to a metadata-sparse format. And for files you're keeping, preserve the metadata — it's the only record of a file's history that travels with the file itself.