
Font Subsetting: Strip What You Don't Need

Published Mar 19, 2026 7 min read By ChangeThisFile Team

Quick Answer

Font subsetting removes unused glyphs from font files. A full Noto Sans CJK weighs 16 MB; subsetted to Latin-only, it's 30 KB — a 99.8% reduction. For most English-language sites, Latin subsetting cuts font files by 65-85%. Use pyftsubset to subset, always preserve OpenType layout features, and combine with unicode-range in CSS for multilingual sites.

Fonts are designed for broad language support. Inter ships with 2,548 glyphs covering Latin, Latin Extended, Greek, Cyrillic, and Vietnamese. Noto Sans JP has 17,000+ glyphs, most of them Japanese kana and kanji. Your English-only blog uses maybe 200 of them.

Every unused glyph is dead weight — outline data, hinting instructions, and kerning pairs for characters that will never render on your pages. Subsetting strips this dead weight. The process is simple, the tooling is mature, and the results are dramatic. This guide covers the tools, the technique, the unicode ranges to keep, and the edge cases where subsetting backfires.

Why Full Font Files Are So Large

A font file's size is roughly proportional to its glyph count times glyph complexity. Each glyph stores Bezier curve control points, hinting instructions, and positioning data. Complex glyphs (CJK characters, Arabic contextual forms, ornamental ligatures) need more points than simple Latin letters.

Size by Character Set

Font	Full (WOFF2)	Glyphs	Latin subset (WOFF2)	Latin glyphs	Reduction
Inter	132 KB	2,548	35 KB	~250	73%
Roboto	68 KB	1,294	24 KB	~230	65%
Noto Sans	300 KB	4,600+	28 KB	~240	91%
Noto Sans JP	1.6 MB	17,000+	30 KB	~250	98%
Noto Sans CJK	16 MB	65,000+	30 KB	~250	99.8%
Source Code Pro	76 KB	1,036	28 KB	~230	63%

The pattern is clear: the more non-Latin glyphs a font contains, the more you save by subsetting. CJK fonts see the most dramatic reductions because thousands of complex ideographs each require many curve points.
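The Reduction column can be reproduced with simple arithmetic. The following is a minimal Python sketch using sizes taken from the table above; the function name reduction_percent is made up for illustration, and 1 MB is treated as 1,000 KB.

```python
def reduction_percent(full_kb: float, subset_kb: float) -> float:
    """Return the size reduction from full to subset, as a percentage."""
    if full_kb <= 0:
        raise ValueError("full_kb must be positive")
    return (full_kb - subset_kb) / full_kb * 100


# Sizes in KB, copied from the table above (1 MB treated as 1,000 KB).
fonts = {
    "Inter": (132, 35),
    "Roboto": (68, 24),
    "Noto Sans": (300, 28),
    "Noto Sans JP": (1600, 30),
}

for name, (full_kb, subset_kb) in fonts.items():
    print(f"{name}: {reduction_percent(full_kb, subset_kb):.0f}% smaller")
```

Running the same calculation on your own before and after file sizes is a quick way to check whether a subset is paying off.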

Subsetting with pyftsubset

pyftsubset is part of the fonttools Python library — the same toolchain used by Google Fonts, Adobe, and most type foundries. It's the standard subsetting tool.

Installation and Basic Usage

# Install fonttools with Brotli support (for WOFF2 output)
pip install fonttools brotli

# Basic Latin subset as WOFF2
pyftsubset Inter-Regular.ttf \
  --output-file=inter-latin.woff2 \
  --flavor=woff2 \
  --unicodes="U+0000-00FF" \
  --layout-features='*'

# Extended Latin subset (covers French, German, Spanish, etc.)
pyftsubset Inter-Regular.ttf \
  --output-file=inter-latin-ext.woff2 \
  --flavor=woff2 \
  --unicodes="U+0000-00FF,U+0100-024F,U+0259,U+1E00-1EFF,U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD" \
  --layout-features='*'

Critical Flags

--layout-features='*': Preserves ALL OpenType layout features (ligatures, kerning, small caps, stylistic sets). Without this flag, pyftsubset keeps only a default set of common features (kerning and standard ligatures among them) and drops optional ones such as small caps and stylistic sets. Include this flag unless you are sure your CSS never enables those features
--flavor=woff2: Outputs WOFF2 directly. Without it, you get an uncompressed TTF or OTF that you'd need to convert separately
--no-hinting: Strips TrueType hinting instructions. Saves 10-30% additional space. Safe for high-DPI displays; may degrade rendering at small sizes on low-DPI screens
--desubroutinize: For CFF-based OTF fonts, expands subroutines before subsetting. Sometimes produces smaller output for small subsets
--text-file=chars.txt: Subset to only the characters in a text file. Useful when glyphhanger generates a character list from your site
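If you script your builds, the flags above can be assembled programmatically. This is a rough Python sketch, not an official fonttools API: the helper build_pyftsubset_command and its parameters are hypothetical, and it only builds the argument list without running anything.

```python
import shlex


def build_pyftsubset_command(
    font_path: str,
    output_path: str,
    unicodes: str,
    drop_hinting: bool = False,
) -> list[str]:
    """Assemble a pyftsubset argument list from the flags described above."""
    command = [
        "pyftsubset",
        font_path,
        f"--output-file={output_path}",
        "--flavor=woff2",
        f"--unicodes={unicodes}",
        "--layout-features=*",
    ]
    if drop_hinting:
        command.append("--no-hinting")
    return command


command = build_pyftsubset_command(
    "Inter-Regular.ttf", "inter-latin.woff2", "U+0000-00FF", drop_hinting=True
)
# shlex.join quotes the '*' so a shell will not expand it as a glob.
print(shlex.join(command))
```

To actually run the command, the list could be passed to subprocess.run(command, check=True), assuming pyftsubset is installed and on your PATH.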

Automatic Subsetting with glyphhanger

glyphhanger crawls your website, extracts every character used in the rendered text, and generates a minimal subset containing only those characters:

# Install
npm install -g glyphhanger

# Crawl a site and output the character set
glyphhanger https://yoursite.com --spider

# Crawl and subset in one step
glyphhanger https://yoursite.com --subset=Inter-Regular.ttf --formats=woff2

# Output: Inter-Regular-subset.woff2 (only characters used on your site)

This produces the smallest font that still covers your current content. The trade-off: you need to re-run glyphhanger whenever your content changes. If you add a French blog post using accented characters (é, à, ç) that weren't in the original subset, those characters won't render in the web font and will fall back to the next font in the stack. For dynamic sites, use a standard Latin subset instead of site-specific subsetting.
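Conceptually, a site-specific subset starts from the set of unique characters in your text. The sketch below illustrates that idea in Python; it is not glyphhanger's implementation, and the function name text_to_unicode_ranges is an assumption made for this example. It collapses the characters of a string into a range list in the same style as the --unicodes values shown earlier.

```python
def text_to_unicode_ranges(text: str) -> str:
    """Collapse the unique characters of text into a --unicodes style string."""
    # Ignore control characters such as newlines and tabs.
    code_points = sorted({ord(ch) for ch in text if ch.isprintable()})
    ranges = []
    for cp in code_points:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp  # extend the current run
        else:
            ranges.append([cp, cp])  # start a new run
    parts = [
        f"U+{start:04X}" if start == end else f"U+{start:04X}-{end:04X}"
        for start, end in ranges
    ]
    return ",".join(parts)


# "\u00e9" is the accented e in "cafe"; it shows up as its own entry.
print(text_to_unicode_ranges("abc caf\u00e9"))
# prints: U+0020,U+0061-0063,U+0066,U+00E9
```

Comparing the output for old and new content makes it easy to spot characters, such as a newly introduced accent, that an existing subset does not yet contain.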

subfont: Build Tool Integration

subfont is a build tool that automatically subsets fonts as part of your build pipeline. It analyzes your HTML, extracts used characters, subsets the fonts, updates @font-face declarations, and outputs optimized files — all in one command:

npx subfont --inline-css --in-place dist/index.html

subfont is best for static sites where the build output contains all possible text. For dynamic content (CMS, user-generated text), it only captures whatever text exists at build time.

Unicode Ranges: What to Keep

The right unicode range depends on your site's languages. Here are the standard ranges:

Standard Subsetting Ranges

Range Name	Unicode Range	Covers	Use When
Basic Latin	U+0000-007F	ASCII: A-Z, a-z, 0-9, basic punctuation	English-only, minimal
Latin-1 Supplement	U+0080-00FF	Accented Latin (é, ü, ñ), punctuation, symbols	Western European languages
Latin Extended-A	U+0100-017F	Central/Eastern European (Polish, Czech, Croatian)	EU audience
Latin Extended-B	U+0180-024F	African languages, Romanian, Vietnamese additions	Broad multilingual
General Punctuation	U+2000-206F	En/em dash, ellipsis, quotation marks, spaces	Always include
Currency Symbols	U+20A0-20CF	Euro, Rupee, Won (Pound and Yen are in Latin-1 Supplement)	E-commerce, financial
Greek	U+0370-03FF	Greek alphabet	Greek content or math symbols
Cyrillic	U+0400-04FF	Russian, Ukrainian, Bulgarian, etc.	Cyrillic-language audience
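To decide which of these ranges a piece of content needs, you can check its characters against the block boundaries. Here is a small Python sketch under that assumption; the names STANDARD_RANGES and ranges_needed are invented for this example, and the boundaries are copied from the table above.

```python
# Block boundaries copied from the table above.
STANDARD_RANGES = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
    "General Punctuation": (0x2000, 0x206F),
    "Currency Symbols": (0x20A0, 0x20CF),
    "Greek": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
}


def ranges_needed(text: str) -> list[str]:
    """Return the names of the table's ranges that text touches, in table order."""
    code_points = {ord(ch) for ch in text}
    return [
        name
        for name, (start, end) in STANDARD_RANGES.items()
        if any(start <= cp <= end for cp in code_points)
    ]


# French sample containing the Euro sign (U+20AC) and an accented e (U+00E9).
print(ranges_needed("Prix: 20 \u20ac pour le caf\u00e9"))
# prints: ['Basic Latin', 'Latin-1 Supplement', 'Currency Symbols']
```

Characters outside every listed range are silently ignored here, so treat the result as a lower bound on what your subset must cover.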

Recommended Subsets by Audience

English-only site: Basic Latin + Latin-1 Supplement + General Punctuation + Currency (~250 glyphs, 18-35 KB as WOFF2)

Western European: Add Latin Extended-A (~300 glyphs, 22-40 KB)

Pan-European: Add Latin Extended-B + Greek + Cyrillic (~600 glyphs, 35-55 KB)

Google Fonts "latin" slice: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD — this is the standard Latin subset that Google Fonts uses and a good default for most Western sites.

Automatic Subsetting Services

Some hosting platforms subset fonts automatically without manual tooling:

Google Fonts: Splits each family into unicode-range slices and serves only the slices containing characters used on the page. Each slice is its own pre-subsetted file, so the effect is similar to manual subsetting
Cloudflare Fonts (beta): When enabled, Cloudflare intercepts Google Fonts requests and rewrites them to serve self-hosted, optimized versions with automatic subsetting
Next.js next/font: Automatically subsets Google Fonts at build time and self-hosts them. The default subset is Latin
Netlify: Offers automatic font optimization that includes subsetting when serving Google Fonts

These automatic services work well for the common case (Google Fonts, Latin text) but give less control than manual subsetting. For custom fonts or non-Latin languages, use pyftsubset.

When NOT to Subset

Subsetting is not always appropriate. Skip it when:

Multilingual CMS with user-generated content: If users can write in any language (think WordPress with international authors), you can't predict which characters they'll use. Subsetting to Latin breaks non-Latin content. Use unicode-range splitting instead — load full character set support but in language-specific chunks
CJK content: Chinese, Japanese, and Korean each need thousands of characters. A "subset" of 5,000 CJK characters is still 300+ KB. For CJK, unicode-range splitting (Google Fonts uses ~100 slices) is more practical than static subsetting
Code editors and terminals: Monospace fonts in code environments need box-drawing characters (U+2500-257F), mathematical symbols, arrows, and other technical glyphs. Subsetting to Latin breaks code rendering
Icon fonts: Every glyph in an icon font is intentionally there. Subsetting removes icons you might use later. If you want fewer icon glyphs, use the icon library's built-in subsetting (FontAwesome's kit builder) or switch to SVG icons

Combining Subsetting with unicode-range CSS

The optimal approach combines file-level subsetting (smaller files) with CSS-level unicode-range (conditional loading):

/* Latin subset — always loaded for English content */
@font-face {
  font-family: 'Noto Sans';
  src: url('/fonts/noto-sans-latin.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+2000-206F, U+20AC;
  font-display: swap;
}

/* Greek subset — loaded only if Greek characters appear */
@font-face {
  font-family: 'Noto Sans';
  src: url('/fonts/noto-sans-greek.woff2') format('woff2');
  unicode-range: U+0370-03FF;
  font-display: swap;
}

/* Cyrillic subset — loaded only if Cyrillic characters appear */
@font-face {
  font-family: 'Noto Sans';
  src: url('/fonts/noto-sans-cyrillic.woff2') format('woff2');
  unicode-range: U+0400-04FF;
  font-display: swap;
}

Each file is subsetted to contain only the characters in its unicode-range. The browser only downloads the files whose ranges match characters on the page. An English page loads 28 KB. A page mixing English and Russian loads 28 KB + 30 KB = 58 KB. A page with all three loads all three files. This is the approach Google Fonts uses internally, and it's the gold standard for multilingual font delivery.
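The selection logic can be modeled in a few lines. The Python sketch below is a simplified approximation of what a browser does, not its actual algorithm: it parses the unicode-range values from the rules above and reports which files a given text would trigger. The names parse_unicode_range, FONT_FACES, and files_to_download are hypothetical, and wildcard ranges such as U+4?? are not handled.

```python
def parse_unicode_range(value: str) -> list[tuple[int, int]]:
    """Parse 'U+0000-00FF, U+0131' into inclusive (start, end) pairs."""
    pairs = []
    for token in value.split(","):
        token = token.strip().upper().removeprefix("U+")
        start, _, end = token.partition("-")
        pairs.append((int(start, 16), int(end or start, 16)))
    return pairs


# unicode-range values copied from the @font-face rules above.
FONT_FACES = {
    "/fonts/noto-sans-latin.woff2": "U+0000-00FF, U+0131, U+0152-0153, U+2000-206F, U+20AC",
    "/fonts/noto-sans-greek.woff2": "U+0370-03FF",
    "/fonts/noto-sans-cyrillic.woff2": "U+0400-04FF",
}


def files_to_download(text: str) -> list[str]:
    """Return the font files whose unicode-range matches a character in text."""
    code_points = {ord(ch) for ch in text}
    needed = []
    for url, value in FONT_FACES.items():
        pairs = parse_unicode_range(value)
        if any(start <= cp <= end for cp in code_points for start, end in pairs):
            needed.append(url)
    return needed


print(files_to_download("Hello"))
# The second sample appends a Russian word written with escapes.
print(files_to_download("Hello, \u043c\u0438\u0440"))
```

An English-only string selects just the Latin file, while the mixed English and Russian string selects the Latin and Cyrillic files, matching the 28 KB and 58 KB cases described above.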

Subsetting is the highest-impact, lowest-effort font optimization. Run pyftsubset once, save 65-98% of font weight, deploy the smaller files. For an English-only site using Inter, you go from 132 KB to 35 KB per weight. For four weights, that's 388 KB saved — the equivalent of removing several large images from your page load.

If your fonts are in TTF or OTF format, there is no need to convert them first: run pyftsubset with --flavor=woff2 and the output is a subsetted WOFF2. If they're already WOFF2, pyftsubset can read them directly when the brotli package is installed, so the same command applies. The process takes seconds and only needs repeating when the font or the target character set changes.

Key Takeaways

Latin subsetting reduces font files by 65-98% — Inter drops from 132 KB to 35 KB
Use pyftsubset with --layout-features='*' to preserve every OpenType feature, including small caps and stylistic sets
General Punctuation (U+2000-206F) and the Euro sign (U+20AC) should always be included in subsets
glyphhanger crawls your site and generates the smallest possible character-specific subset
Combine file-level subsetting with unicode-range CSS for optimal multilingual delivery
Don't subset CJK fonts statically — use unicode-range splitting into ~100 slices instead
Re-run subsetting when content changes, or use a standard Latin range as a safe default

Frequently Asked Questions

Will subsetting break my font rendering?

Not if done correctly. Characters removed by subsetting simply fall back to the next font in your font-family chain (typically a system font). The remaining characters render identically to the full font. The risk is removing characters you actually use. For example, subsetting to ASCII breaks rendering of accented characters (é, ü) in French or German text. Use a Latin + Latin Extended subset if you support Western European languages.

Does subsetting preserve kerning and ligatures?

Kerning and standard ligatures survive by default: pyftsubset keeps a default set of common layout features (including kern and liga) for the retained glyphs. Optional features such as small caps, stylistic sets, and tabular figures are dropped unless you pass --layout-features='*'. Losing those features without noticing is a common subsetting mistake. Use --layout-features='*' unless you specifically want to strip features for additional size savings.

How small can I make a font file?

The theoretical minimum is a font containing just the characters you actually render. For a typical English page, that's about 70-100 unique characters. A WOFF2 font subsetted to just those characters can be as small as 8-12 KB. In practice, a Latin subset (~250 characters, 18-35 KB) is the practical minimum because it handles any English content without per-page regeneration.

Can I subset a variable font?

Yes. pyftsubset handles variable fonts correctly — it strips glyph data and corresponding variation deltas for removed characters while preserving the variation axes and interpolation data for retained characters. The result is a smaller variable font that still supports the full weight/width/slant range. Use the same --layout-features='*' flag as with static fonts.

What's the difference between subsetting and unicode-range?

Subsetting (pyftsubset, glyphhanger) modifies the font file itself, physically removing glyph data. The resulting file is smaller. Unicode-range is a CSS feature that tells the browser which characters a font file covers, so it only downloads the file if those characters appear on the page. For best results, use both: subset each file to its target character range, then declare matching unicode-range in @font-face.

Should I subset Google Fonts?

If you self-host them, yes. Download the TTF from Google Fonts, subset it with pyftsubset using --flavor=woff2 so the output is already WOFF2, and serve it from your domain. If you use Google Fonts via their CDN, they handle subsetting automatically by splitting fonts into unicode-range slices. But self-hosting with manual subsetting gives more control and eliminates the third-party request overhead.

How do I know which characters my site uses?

Use glyphhanger: 'glyphhanger https://yoursite.com --spider' crawls your site and outputs every character found in the rendered text. For static sites, this is comprehensive. For dynamic sites (CMS, user content), it only captures characters present at crawl time, so use a standard language-based subset (Latin, Latin Extended, etc.) rather than a site-specific character list.

Does removing hinting save additional space?

Yes. Adding --no-hinting to pyftsubset strips TrueType hinting instructions, saving an additional 10-30% on top of character subsetting. Hinting improves rendering at small sizes on low-DPI screens (96 DPI monitors). On high-DPI displays (Retina, 4K), hinting has no visible effect. If your audience is primarily on modern devices, removing hinting is safe and saves meaningful bytes.
