Many fonts are designed for broad language support. Inter ships with 2,548 glyphs covering Latin, Latin Extended, Greek, Cyrillic, and Vietnamese. Noto Sans JP has 17,000+ glyphs, most of them kanji needed for Japanese. Your English-only blog uses maybe 200 of them.

Every unused glyph is dead weight — outline data, hinting instructions, and kerning pairs for characters that will never render on your pages. Subsetting strips this dead weight. The process is simple, the tooling is mature, and the results are dramatic. This guide covers the tools, the technique, the unicode ranges to keep, and the edge cases where subsetting backfires.

Why Full Font Files Are So Large

A font file's size is roughly proportional to its glyph count times glyph complexity. Each glyph stores Bezier curve control points, hinting instructions, and positioning data. Complex glyphs (CJK characters, Arabic contextual forms, ornamental ligatures) need more points than simple Latin letters.

Size by Character Set

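Font            | Full (WOFF2) | Glyphs  | Latin subset (WOFF2) | Latin glyphs | Reduction
----------------|--------------|---------|----------------------|--------------|----------
Inter           | 132 KB       | 2,548   | 35 KB                | ~250         | 73%
Roboto          | 68 KB        | 1,294   | 24 KB                | ~230         | 65%
Noto Sans       | 300 KB       | 4,600+  | 28 KB                | ~240         | 91%
Noto Sans JP    | 1.6 MB       | 17,000+ | 30 KB                | ~250         | 98%
Noto Sans CJK   | 16 MB        | 65,000+ | 30 KB                | ~250         | 99.8%
Source Code Pro | 76 KB        | 1,036   | 28 KB                | ~230         | 63%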

The pattern is clear: the more non-Latin glyphs a font contains, the more you save by subsetting. CJK fonts see the most dramatic reductions because thousands of complex ideographs each require many curve points.
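The reduction column can be reproduced from the two size columns. The sketch below recomputes it, assuming the table's sizes are rounded and that 1 MB is treated as 1,000 KB; the dictionary and function names are illustrative only.

```python
# Sizes in KB, copied from the table above: (full WOFF2, Latin-subset WOFF2).
SIZES_KB = {
    "Inter": (132, 35),
    "Roboto": (68, 24),
    "Noto Sans": (300, 28),
    "Noto Sans JP": (1600, 30),    # 1.6 MB taken as 1,600 KB (assumption)
    "Noto Sans CJK": (16000, 30),  # 16 MB taken as 16,000 KB (assumption)
    "Source Code Pro": (76, 28),
}


def reduction_percent(full_kb: float, subset_kb: float) -> float:
    """Share of the original file size removed by subsetting, in percent."""
    return (1 - subset_kb / full_kb) * 100


for name, (full_kb, subset_kb) in SIZES_KB.items():
    print(f"{name:<16} {reduction_percent(full_kb, subset_kb):5.1f}%")
```

Because the subset sizes are all close to 30 KB, the reduction is driven almost entirely by how large the full file was to begin with.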

Subsetting with pyftsubset

pyftsubset is part of the fonttools Python library — the same toolchain used by Google Fonts, Adobe, and most type foundries. It's the standard subsetting tool.

Installation and Basic Usage

# Install fonttools with Brotli support (for WOFF2 output)
pip install fonttools brotli

# Latin subset (Basic Latin + Latin-1 Supplement) as WOFF2
pyftsubset Inter-Regular.ttf \
  --output-file=inter-latin.woff2 \
  --flavor=woff2 \
  --unicodes="U+0000-00FF" \
  --layout-features='*'

# Extended Latin subset (adds Latin Extended-A/B for Polish, Czech, Romanian, etc.)
pyftsubset Inter-Regular.ttf \
  --output-file=inter-latin-ext.woff2 \
  --flavor=woff2 \
  --unicodes="U+0000-00FF,U+0100-024F,U+0259,U+1E00-1EFF,U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD" \
  --layout-features='*'

Critical Flags

  • --layout-features='*': Preserves ALL OpenType layout features. Without this flag, pyftsubset keeps only a default feature list, which includes kerning and standard ligatures but drops optional features such as small caps, stylistic sets, and tabular figures. Include it whenever your CSS uses font-feature-settings or font-variant
  • --flavor=woff2: Outputs WOFF2 directly (requires the brotli module). Without it, the output stays in the input's format (TTF or OTF) and you'd need to convert it separately
  • --no-hinting: Strips TrueType hinting instructions. Saves 10-30% additional space. Safe for high-DPI displays; may degrade rendering at small sizes on low-DPI screens
  • --desubroutinize: For CFF-based OTF fonts, expands subroutines before subsetting. Sometimes produces smaller output for small subsets
  • --text-file=chars.txt: Subset to only the characters in a text file. Useful when glyphhanger generates a character list from your site
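If you subset several weights, it can help to assemble the command in a script rather than retyping it. The following is a minimal sketch that only builds the argument list from the flags described above; the function name and its parameters are hypothetical, and it does not run pyftsubset itself.

```python
import shlex


def build_pyftsubset_args(
    source: str,
    output: str,
    unicodes: str,
    keep_all_features: bool = True,
    strip_hinting: bool = False,
) -> list[str]:
    """Assemble a pyftsubset command line from the flags described above."""
    args = [
        "pyftsubset",
        source,
        f"--output-file={output}",
        "--flavor=woff2",
        f"--unicodes={unicodes}",
    ]
    if keep_all_features:
        args.append("--layout-features=*")
    if strip_hinting:
        args.append("--no-hinting")
    return args


args = build_pyftsubset_args(
    "Inter-Regular.ttf", "inter-latin.woff2", "U+0000-00FF", strip_hinting=True
)
print(shlex.join(args))  # shell-quoted form, safe to paste into a terminal
```

Passing the list to subprocess.run(args, check=True) avoids shell quoting problems with the '*' in --layout-features, since no shell is involved.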

Automatic Subsetting with glyphhanger

glyphhanger crawls your website, extracts every character used in the rendered text, and generates a minimal subset containing only those characters:

# Install
npm install -g glyphhanger

# Crawl a site and output the character set
glyphhanger https://yoursite.com --spider

# Crawl and subset in one step
glyphhanger https://yoursite.com --subset=Inter-Regular.ttf --formats=woff2

# Output: Inter-Regular-subset.woff2 (only characters used on your site)

This produces the smallest possible font for your site's current content. The trade-off: you need to re-run glyphhanger whenever your content changes. If you add a French blog post using accented characters (é, à, ç) that weren't in the original subset, those characters won't render in the web font and will fall back to the next font in your font-family stack. For dynamic sites, use a standard Latin subset instead of site-specific subsetting.
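To make the trade-off concrete, here is a rough sketch of the idea behind site-specific subsetting: collect the distinct characters in some text, collapse them into a range list, and check what a new piece of content would be missing. This is not glyphhanger's implementation, and both function names are made up for illustration.

```python
def to_unicode_ranges(text: str) -> str:
    """Collapse the distinct code points in `text` into a compact range list."""
    points = sorted({ord(ch) for ch in text})
    ranges = []
    for cp in points:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp       # extend the current run
        else:
            ranges.append([cp, cp])  # start a new run
    return ",".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}"
        for lo, hi in ranges
    )


def missing_characters(subset_text: str, new_text: str) -> str:
    """Characters in `new_text` that the existing subset does not contain."""
    return "".join(sorted(set(new_text) - set(subset_text)))


original = "Hello, world"
print(to_unicode_ranges(original))
print(missing_characters(original, "Café à Paris"))
```

A check like missing_characters could run in CI against new content to flag when the subset needs regenerating.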

subfont: Build Tool Integration

subfont is a build tool that automatically subsets fonts as part of your build pipeline. It analyzes your HTML, extracts used characters, subsets the fonts, updates @font-face declarations, and outputs optimized files — all in one command:

npx subfont --inline-css --in-place dist/index.html

subfont is best for static sites where the build output contains all possible text. For dynamic content (CMS, user-generated text), it only captures whatever text exists at build time.

Unicode Ranges: What to Keep

The right unicode range depends on your site's languages. Here are the standard ranges:

Standard Subsetting Ranges

Range Name          | Unicode Range | Covers                                                           | Use When
--------------------|---------------|------------------------------------------------------------------|------------------------------
Basic Latin         | U+0000-007F   | ASCII: A-Z, a-z, 0-9, basic punctuation                          | English-only, minimal
Latin-1 Supplement  | U+0080-00FF   | Accented Latin (é, ü, ñ), punctuation, symbols, Pound and Yen    | Western European languages
Latin Extended-A    | U+0100-017F   | Central/Eastern European (Polish, Czech, Croatian)               | EU audience
Latin Extended-B    | U+0180-024F   | African languages, Romanian, Vietnamese additions                | Broad multilingual
General Punctuation | U+2000-206F   | En/em dash, ellipsis, quotation marks, spaces                    | Always include
Currency Symbols    | U+20A0-20CF   | Euro, Rupee, Won, and other currency signs outside Latin-1       | E-commerce, financial
Greek               | U+0370-03FF   | Greek alphabet                                                   | Greek content or math symbols
Cyrillic            | U+0400-04FF   | Russian, Ukrainian, Bulgarian, etc.                              | Cyrillic-language audience

English-only site: Basic Latin + Latin-1 Supplement + General Punctuation + Currency (~250 glyphs, 18-35 KB as WOFF2)

Western European: Add Latin Extended-A (~300 glyphs, 22-40 KB)

Pan-European: Add Latin Extended-B + Greek + Cyrillic (~600 glyphs, 35-55 KB)

Google Fonts "latin" slice: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD — this is the standard Latin subset that Google Fonts uses and a good default for most Western sites.
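The three tiers described above can be composed from the named blocks into a value for pyftsubset's --unicodes flag. The sketch below does that and counts the code points each tier spans; the constant and function names are illustrative assumptions, not part of any tool.

```python
# Block ranges from the table above.
BLOCKS = {
    "Basic Latin": "U+0000-007F",
    "Latin-1 Supplement": "U+0080-00FF",
    "Latin Extended-A": "U+0100-017F",
    "Latin Extended-B": "U+0180-024F",
    "General Punctuation": "U+2000-206F",
    "Currency Symbols": "U+20A0-20CF",
    "Greek": "U+0370-03FF",
    "Cyrillic": "U+0400-04FF",
}

# Tiers as described in the text; each builds on the previous one.
ENGLISH = ["Basic Latin", "Latin-1 Supplement", "General Punctuation", "Currency Symbols"]
WESTERN_EUROPEAN = ENGLISH + ["Latin Extended-A"]
PAN_EUROPEAN = WESTERN_EUROPEAN + ["Latin Extended-B", "Greek", "Cyrillic"]


def unicodes_for(block_names: list[str]) -> str:
    """Join the ranges of the named blocks into a --unicodes value."""
    return ",".join(BLOCKS[name] for name in block_names)


def count_code_points(unicodes: str) -> int:
    """Number of code points spanned by a comma-separated range list."""
    total = 0
    for part in unicodes.split(","):
        lo, _, hi = part.removeprefix("U+").partition("-")
        total += int(hi or lo, 16) - int(lo, 16) + 1
    return total


for label, tier in [("English", ENGLISH), ("Western", WESTERN_EUROPEAN), ("Pan-European", PAN_EUROPEAN)]:
    print(label, count_code_points(unicodes_for(tier)), unicodes_for(tier))
```

The code point counts are upper bounds. A font rarely has a glyph for every code point in a block (control characters and unassigned slots have none), which is why the glyph estimates in the text are lower.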

Automatic Subsetting Services

Some hosting platforms subset fonts automatically without manual tooling:

  • Google Fonts: Splits each font into unicode-range slices, and the browser downloads only the slices containing characters used on the page. The selection happens in CSS rather than by building a site-specific file, but the effect is similar
  • Cloudflare Fonts (beta): When enabled, Cloudflare intercepts Google Fonts requests and rewrites them to serve self-hosted, optimized versions with automatic subsetting
  • Next.js next/font: Downloads Google Fonts at build time and self-hosts them. You choose which slices to preload through the subsets option, typically ['latin']
  • Netlify: Offers automatic font optimization that includes subsetting when serving Google Fonts

These automatic services work well for the common case (Google Fonts, Latin text) but give less control than manual subsetting. For custom fonts or non-Latin languages, use pyftsubset.

When NOT to Subset

Subsetting is not always appropriate. Skip it when:

  • Multilingual CMS with user-generated content: If users can write in any language (think WordPress with international authors), you can't predict which characters they'll use. Subsetting to Latin breaks non-Latin content. Use unicode-range splitting instead — load full character set support but in language-specific chunks
  • CJK content: Chinese, Japanese, and Korean each need thousands of characters. A "subset" of 5,000 CJK characters is still 300+ KB. For CJK, unicode-range splitting (Google Fonts uses ~100 slices) is more practical than static subsetting
  • Code editors and terminals: Monospace fonts in code environments need box-drawing characters (U+2500-257F), mathematical symbols, arrows, and other technical glyphs. Subsetting to Latin breaks code rendering
  • Icon fonts: Every glyph in an icon font is intentionally there. Subsetting removes icons you might use later. If you want fewer icon glyphs, use the icon library's built-in subsetting (FontAwesome's kit builder) or switch to SVG icons

Combining Subsetting with unicode-range CSS

The optimal approach combines file-level subsetting (smaller files) with CSS-level unicode-range (conditional loading):

/* Latin subset — always loaded for English content */
@font-face {
  font-family: 'Noto Sans';
  src: url('/fonts/noto-sans-latin.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+2000-206F, U+20AC;
  font-display: swap;
}

/* Greek subset — loaded only if Greek characters appear */
@font-face {
  font-family: 'Noto Sans';
  src: url('/fonts/noto-sans-greek.woff2') format('woff2');
  unicode-range: U+0370-03FF;
  font-display: swap;
}

/* Cyrillic subset — loaded only if Cyrillic characters appear */
@font-face {
  font-family: 'Noto Sans';
  src: url('/fonts/noto-sans-cyrillic.woff2') format('woff2');
  unicode-range: U+0400-04FF;
  font-display: swap;
}

Each file is subsetted to contain only the characters in its unicode-range. The browser only downloads the files whose ranges match characters on the page. An English page loads 28 KB. A page mixing English and Russian loads 28 KB + 30 KB = 58 KB. A page with all three loads all three files. This is the approach Google Fonts uses internally, and it's the gold standard for multilingual font delivery.
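The browser's selection logic can be approximated in a few lines. This is a simplified model under stated assumptions: it ignores wildcard ranges such as U+4??, and it treats any matching character as triggering a download, whereas a real browser only fetches a face when text that actually uses that font-family needs it. The names are hypothetical.

```python
def parse_unicode_range(value: str) -> list[tuple[int, int]]:
    """Parse a CSS unicode-range value such as 'U+0000-00FF, U+0131'."""
    spans = []
    for part in value.split(","):
        lo, _, hi = part.strip().upper().removeprefix("U+").partition("-")
        spans.append((int(lo, 16), int(hi or lo, 16)))
    return spans


# The three @font-face rules above: file name -> unicode-range descriptor.
FONT_FACES = {
    "noto-sans-latin.woff2": "U+0000-00FF, U+0131, U+0152-0153, U+2000-206F, U+20AC",
    "noto-sans-greek.woff2": "U+0370-03FF",
    "noto-sans-cyrillic.woff2": "U+0400-04FF",
}


def files_needed(page_text: str) -> list[str]:
    """Files whose range contains at least one character of the page text."""
    code_points = {ord(ch) for ch in page_text}
    needed = []
    for filename, value in FONT_FACES.items():
        spans = parse_unicode_range(value)
        if any(lo <= cp <= hi for cp in code_points for lo, hi in spans):
            needed.append(filename)
    return needed


print(files_needed("Hello"))
print(files_needed("Hello Привет"))
```

The same function can double as a sanity check during a build: if a page's text matches no declared range, some characters will render in a fallback font.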

Subsetting is the highest-impact, lowest-effort font optimization. Run pyftsubset once, cut roughly 63-99% of font weight depending on the font, and deploy the smaller files. For an English-only site using Inter, you go from 132 KB to 35 KB per weight. For four weights, that's 388 KB saved, the equivalent of removing several large images from your page load.

If your fonts are in TTF or OTF format, subset them directly with pyftsubset and --flavor=woff2 to get a subsetted WOFF2 in one step; there is no need to convert to WOFF2 first. If they're already WOFF2, pyftsubset can read them as long as the brotli module is installed, though subsetting from the original TTF or OTF source is preferable when you have it. The process takes seconds.