What Is HTML lang?
What Is the HTML lang Attribute?
The lang attribute on the <html> element declares the primary language of the page using a standardized BCP 47 language tag (en, fr, en-GB, zh-Hans, etc.). Screen readers use it to choose the correct pronunciation. Browsers use it for translation prompts, hyphenation, and font selection. Some search engines use it as a hint about which audience the page serves.
Key Facts (TL;DR)
- WCAG criterion: WCAG 2.2 SC 3.1.1 Language of Page, Level A. This is the lowest conformance level. Laws that cite WCAG usually require Level AA, which includes every Level A criterion.
- Where it goes: <html lang="en"> on the root element, every page, every time.
- Format: BCP 47. A tag starts with a language subtag, usually the 2-letter ISO 639-1 code. It can optionally add a 4-letter script subtag and/or a 2-letter ISO 3166-1 region subtag: en, en-US, en-GB, pt-BR, zh-Hans.
- Audits: automated audits flag a missing lang AND an invalid lang value as separate failures.
- SEO impact: Google says it detects a page's language from its visible content and ignores lang. Other search engines may use lang as a hint, so a wrong code can still surface your page to the wrong audience.
- Inline use: mixed-language content uses lang on inline elements too (<span lang="la">veni, vidi, vici</span>) so screen readers switch pronunciation mid-sentence.
Think of lang as the label on a record sleeve telling the player which speed to spin at. Without it, the needle still drops — but the music comes out garbled. A French page read aloud with English pronunciation rules is unintelligible, even though every letter is correct.
Why the lang Attribute Matters
lang is one of the cheapest accessibility wins on the web. It is a single attribute with zero performance cost, and it has five distinct payoffs:
- Screen reader pronunciation: Without lang, screen readers fall back to the user's default language. NVDA, JAWS, and VoiceOver switch voice profiles based on lang. A French page read with English phonetic rules is functionally incomprehensible to a blind user.
- Search engines: Google states that it determines a page's language from the visible text and does not use lang attributes. Other search engines may treat lang as a hint, so a page set to lang="en" but written in Spanish sends them a conflicting signal.
- Browser features: Translation prompts (Chrome's "Translate this page?"), hyphenation (CSS hyphens: auto), font selection, and quotation-mark glyphs all rely on lang. Get it wrong and Chrome may offer to translate French into French.
- AI search and generative engines: Multilingual AI search systems (ChatGPT search, Perplexity, Google AI Overviews) may use lang, alongside the content itself, to decide which language a page belongs to when answering a query. A page mislabelled as English risks being overlooked for French queries even if its content is perfectly French.
- Legal compliance: WCAG 2.2 SC 3.1.1 is Level A. Accessibility rules in the EU, US (Section 508), UK, and Canada reference WCAG at Level AA, which includes every Level A criterion. For organizations covered by those rules, lang is a legal requirement, not a best practice.
The Silent Failure
Missing lang is a silent failure for sighted users, because the page looks identical. That's precisely why it's one of the most common accessibility violations on the web: nobody on the team notices it during QA. Industry analysis of automated accessibility scans consistently places "html element does not have a lang attribute" in the top 10 most-flagged issues across all sites.
WCAG 2.2 SC 3.1.1 — Language of Page (Level A)
The Level A criterion states: "The default human language of each Web page can be programmatically determined." "Programmatically determined" is the operative phrase. Assistive tech needs a code it can parse, not a human guess.
Audits check two distinct conditions:
- Presence: Does <html> have a lang attribute at all?
- Validity: Is the value a valid BCP 47 tag? en-EN, english, and en_US all fail validation even though they look reasonable. en_US breaks the syntax, while EN and english are not registered subtags.
<!-- BAD: no lang attribute -->
<!DOCTYPE html>
<html>
<head><title>About Us</title></head>
<body>...</body>
</html>
<!-- BAD: invalid BCP 47 (en-EN doesn't exist) -->
<html lang="en-EN">
<!-- BAD: underscore instead of hyphen -->
<html lang="en_US">
<!-- BAD: language doesn't match content -->
<html lang="en">
<body>
<h1>Bienvenue sur notre site</h1>
<p>Nous sommes ravis de vous accueillir.</p>
</body>
</html>
<!-- GOOD: valid 2-letter code -->
<!DOCTYPE html>
<html lang="en">
<head><title>About Us</title></head>
<body>...</body>
</html>
<!-- GOOD: language + region when meaningful -->
<html lang="en-GB">
<!-- GOOD: script subtag for Chinese -->
<html lang="zh-Hans"> <!-- Simplified -->
<html lang="zh-Hant"> <!-- Traditional -->
Common BCP 47 Language Codes
BCP 47 is the IETF's standard for language tags and the format HTML's lang attribute expects. The structure is language[-script][-region]. For most sites the 2-letter language code is enough. Add a region only when pronunciation or content genuinely differs.
| Code | Language | When to use the regional variant |
|---|---|---|
| en | English | Default for English content where region doesn't matter. |
| en-US | American English | Spelling differs (color/colour); voice-over should use US accent. |
| en-GB | British English | UK spelling, UK financial/legal context. |
| fr | French | Default for French content. |
| es | Spanish | Use es-ES vs es-MX only if content is regionalized. |
| pt-BR | Brazilian Portuguese | Strongly recommended: pronunciation and vocabulary differ significantly from pt-PT. |
| zh-Hans | Simplified Chinese | Mainland China, Singapore. |
| zh-Hant | Traditional Chinese | Taiwan, Hong Kong. |
| ar | Arabic | Pair with dir="rtl" for right-to-left rendering. |
| ja | Japanese | Default; region rarely needed. |
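JavaScript's built-in `Intl.getCanonicalLocales` can check the language[-script][-region] structure. It throws a `RangeError` for malformed tags such as `en_US` and normalizes case. It only checks syntax, though, so `en-EN` and `english` pass it. The sketch below adds a second stage that catches those. The two allow-lists are illustrative subsets, an assumption for this example. A real validator would check subtags against the full IANA Language Subtag Registry.

```javascript
// Sketch: two-stage lang check. The allow-lists are illustrative subsets only,
// not the full IANA Language Subtag Registry a production validator would use.
const KNOWN_LANGUAGES = new Set(['ar', 'de', 'en', 'es', 'fr', 'he', 'ja', 'la', 'pt', 'zh']);
const KNOWN_REGIONS = new Set(['BR', 'ES', 'GB', 'MX', 'PT', 'US']);

function checkLangTag(tag) {
  if (typeof tag !== 'string' || tag.trim() === '') {
    return { ok: false, reason: 'missing or empty' };
  }
  let canonical;
  try {
    // Stage 1: syntax. Throws RangeError for tags like "en_US";
    // also normalizes case, e.g. "EN-us" becomes "en-US".
    canonical = Intl.getCanonicalLocales(tag)[0];
  } catch {
    return { ok: false, reason: `malformed tag "${tag}"` };
  }
  // Stage 2: are the language and region subtags actually known?
  const { language, region } = new Intl.Locale(canonical);
  if (!KNOWN_LANGUAGES.has(language)) {
    return { ok: false, reason: `unknown language subtag "${language}"` };
  }
  if (region !== undefined && !KNOWN_REGIONS.has(region)) {
    return { ok: false, reason: `unknown region subtag "${region}"` };
  }
  return { ok: true, canonical };
}

for (const tag of ['en', 'EN-us', 'zh-Hans', 'en-EN', 'en_US', 'english']) {
  console.log(tag, checkLangTag(tag));
}
```

Splitting syntax from registry lookup mirrors how audits report problems. A tag can be well-formed and still name a language or region that doesn't exist.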
How to Check Your lang Attribute
Detection is fast — every accessibility audit catches a missing lang. Validating that the value is correct (and matches the actual content) takes a little more care.
- Greadme deep scan: flags missing and invalid lang attributes alongside the rest of your WCAG audit. It offers a one-click fix that opens a GitHub PR with the corrected attribute.
- Greadme crawler scan: checks every indexable page on your site and surfaces templates where lang was forgotten on a single layout. One missing line in a layout is often the cause of dozens of failing pages.
- Greadme AI visibility analyzer: verifies that AI search engines correctly identify your page's language so it surfaces in the right multilingual queries.
- Chrome DevTools → Elements panel: inspect <html> and check that the lang attribute is present and accurate.
- Hreflang checks: Google Search Console's International Targeting report used to surface hreflang errors, which often correlate with bad lang values. Google retired that report in 2022, so check hreflang with a site crawler instead.
- Screen reader spot-check: open the page in NVDA, JAWS, or VoiceOver and listen. Wrong pronunciation is unmistakable within the first sentence.
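The crawler point above is that one layout missing lang can break many pages. You can approximate that check locally by scanning template files. This rough sketch uses regular expressions rather than a real HTML parser, so treat it as a smoke test. The template names and contents are hypothetical.

```javascript
// Rough smoke test for a root-level lang attribute. Regex-based, so unusual markup
// (comments, attribute values containing ">") can fool it; use an HTML parser for real audits.
function getRootLang(html) {
  const openTag = html.match(/<html\b[^>]*>/i);
  if (!openTag) return null; // no <html> start tag at all
  // Require whitespace before "lang" so "xml:lang" is not mistaken for it.
  const attr = openTag[0].match(/\slang\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i);
  if (!attr) return null;
  return attr[1] ?? attr[2] ?? attr[3]; // double-quoted, single-quoted, or unquoted value
}

function auditTemplate(html) {
  const lang = getRootLang(html);
  if (lang === null) return 'missing lang on <html>';
  if (lang.trim() === '') return 'empty lang value';
  if (lang.includes('_')) return `"${lang}" uses an underscore; BCP 47 needs a hyphen`;
  return null; // nothing these simple checks can detect
}

// Hypothetical templates, standing in for files read from disk.
const templates = {
  'layouts/base.html': '<!DOCTYPE html><html lang="en"><head></head></html>',
  'layouts/blog.html': '<!DOCTYPE html><html><head></head></html>',
  'layouts/legacy.html': "<!DOCTYPE html><html lang='en_US'><head></head></html>",
};

for (const [name, html] of Object.entries(templates)) {
  const problem = auditTemplate(html);
  if (problem) console.log(`${name}: ${problem}`);
}
```

This only catches missing, empty, or underscore values. Whether the code matches the page's actual language still needs a human or a screen reader check.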
8 Practical Rules for Setting lang Correctly
1. Always Set lang on the Root html Element
Every page needs a lang on <html>. Setting it on <body> or a <div> instead does not satisfy WCAG 3.1.1.
<!DOCTYPE html>
<html lang="en">
<head>
<title>Welcome</title>
</head>
<body>...</body>
</html>
2. Use the Shortest Valid Code That's Accurate
Prefer en over en-US unless region genuinely matters. Over-specifying can cause some assistive technologies to fall back to a generic voice if they don't carry the regional profile.
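The fallback this rule warns about resembles the Lookup scheme in RFC 4647, the matching half of BCP 47. A requested tag is shortened one subtag at a time until something available matches. So a page tagged en-GB, on a system with only a generic English voice, ends up with en. Real screen readers use their own voice-selection logic, so treat this as an illustration of the general scheme. The installed-voice list is hypothetical.

```javascript
// Sketch of RFC 4647 "Lookup" (section 3.4): truncate the requested tag from the
// right until it matches an available tag. Not any screen reader's actual algorithm.
function lookupTag(requestedTag, availableTags) {
  const available = new Set(Intl.getCanonicalLocales(availableTags));
  const subtags = Intl.getCanonicalLocales(requestedTag)[0].split('-');
  while (subtags.length > 0) {
    const candidate = subtags.join('-');
    if (available.has(candidate)) return candidate;
    subtags.pop();
    // A trailing single-character subtag (such as the private-use marker "x")
    // means nothing on its own, so Lookup drops it too.
    if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) subtags.pop();
  }
  return null; // no match: the caller falls back to its own default
}

const installedVoices = ['en', 'fr', 'pt-BR']; // hypothetical
console.log(lookupTag('en-GB', installedVoices)); // en (generic English voice)
console.log(lookupTag('pt-BR', installedVoices)); // pt-BR (exact match)
console.log(lookupTag('de', installedVoices));    // null (no German voice)
```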
3. Mark Inline Foreign Phrases with lang on the Span
Mixed-language content needs inline lang. Screen readers switch pronunciation mid-sentence based on it.
<p>The Roman general declared
<span lang="la">veni, vidi, vici</span>
— "I came, I saw, I conquered."</p>
<p>The French phrase
<span lang="fr">c'est la vie</span>
means "that's life."</p>
4. Keep lang in Sync with the Actual Content
If you translate a page, update lang. A French translation that still says lang="en" is worse than no lang at all — it actively misleads assistive tech.
5. Set lang Dynamically in Single-Page Apps
SPAs that switch languages at runtime must update document.documentElement.lang when the locale changes — otherwise screen readers continue using the original pronunciation rules.
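// React example: keep <html lang> in sync with the active locale.
// Render <LocaleSync locale={...} /> once near the app root. It renders nothing;
// the effect updates the root element after every locale change.
import { useEffect } from 'react';

function LocaleSync({ locale }) {
  useEffect(() => {
    document.documentElement.lang = locale;
  }, [locale]);
  return null;
}
6. Use Hyphens, Not Underscores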
BCP 47 uses hyphens: en-US is valid, en_US is not. The underscore form comes from POSIX locale conventions and silently fails validation.
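Server-side code often receives POSIX-style locales, for example from the `LANG` environment variable (`en_US.UTF-8`). The minimal converter below assumes the usual `language_TERRITORY.codeset@modifier` shape. It strips the codeset and modifier, then swaps underscores for hyphens.

```javascript
// Convert a POSIX locale name (language_TERRITORY.codeset@modifier) to a BCP 47 tag.
// Returns null for "C"/"POSIX", which name no human language.
function posixToBcp47(posixLocale) {
  const base = posixLocale.split(/[.@]/)[0]; // drop ".UTF-8" and "@euro"-style suffixes
  if (base === '' || base === 'C' || base === 'POSIX') return null;
  // Canonicalizes case; throws RangeError if the result still isn't well-formed.
  return Intl.getCanonicalLocales(base.replace(/_/g, '-'))[0];
}

console.log(posixToBcp47('en_US.UTF-8')); // en-US
console.log(posixToBcp47('pt_BR'));       // pt-BR
console.log(posixToBcp47('fr_FR@euro'));  // fr-FR
```

Some modifiers, such as @latin in sr_RS@latin, carry script information that this sketch discards. Map them to a script subtag (sr-Latn-RS) if you rely on them.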
7. Don't Mark Proper Nouns or Code Blocks
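Brand names, place names, and code samples don't need lang attributes. WCAG SC 3.1.2 explicitly exempts proper names and technical terms, and a code snippet is not natural-language text. So Tokyo, Nestlé, or a JavaScript snippet can stay unmarked; the screen reader doesn't need to switch voices for them.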
8. Pair RTL Languages with dir="rtl"
Arabic, Hebrew, Persian, and Urdu need both a language code and a direction attribute for the layout to render correctly.
<html lang="ar" dir="rtl">
<html lang="he" dir="rtl">
Common lang Mistakes (and Fixes)
Problem: Missing lang Attribute Entirely
What's happening: The most common WCAG 3.1.1 failure. <html> has no lang at all — screen readers fall back to the user's system language and mispronounce everything.
Fix: Add lang="en" (or the appropriate code) to the root <html> element in your base template/layout. In Next.js: <html lang="en"> in the root layout.
Problem: Invalid Code Like en-EN
What's happening: en-EN doesn't exist. EN is not an ISO 3166-1 region code: the United Kingdom's code is GB, and England has no ISO 3166-1 code of its own. Audits flag the tag as invalid, and at best assistive tech ignores the unknown region.
Fix: Use en (generic) or en-GB (British) — never en-EN. When in doubt, drop the region: en alone is always valid.
Problem: lang Doesn't Match the Content
What's happening: Page declares lang="en" but the body text is in French. Common on translated sites that copied a template without updating the root attribute.
Fix: Drive lang from the same locale variable that drives the content. If you have an i18n provider, hook the root layout into it so the two can never drift.
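For server-rendered pages, "one locale variable" can be as simple as a shell function that derives both lang and dir from the same value. This is a sketch. `renderShell`, its parameters, and the short RTL list (the languages named in rule 8) are assumptions, not part of any framework.

```javascript
// Hypothetical page-shell renderer: lang and dir both come from `locale`,
// so the root attributes cannot drift from the content's language.
const RTL_LANGUAGES = new Set(['ar', 'fa', 'he', 'ur']); // Arabic, Persian, Hebrew, Urdu

function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderShell({ locale, title, bodyHtml }) {
  const lang = Intl.getCanonicalLocales(locale)[0]; // throws on "en_US"-style input
  const dir = RTL_LANGUAGES.has(new Intl.Locale(lang).language) ? 'rtl' : 'ltr';
  return [
    '<!DOCTYPE html>',
    `<html lang="${lang}" dir="${dir}">`,
    `<head><title>${escapeHtml(title)}</title></head>`,
    `<body>${bodyHtml}</body>`, // bodyHtml is assumed to be already-rendered, trusted markup
    '</html>',
  ].join('\n');
}

console.log(renderShell({ locale: 'fr', title: 'Bienvenue', bodyHtml: '<h1>Bienvenue</h1>' }));
```

Passing the same `locale` to your translation lookup and to `renderShell` is what keeps the two from drifting apart.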
Problem: Inline Foreign Phrases Not Marked
What's happening: A page in English has French quotes, Latin mottos, or German expressions with no inline lang. Screen readers read "Schadenfreude" with English phonetics, which sounds nothing like the actual word.
Fix: Wrap the foreign phrase in <span lang="...">. WCAG SC 3.1.2 (Language of Parts) is Level AA and requires this for any non-trivial foreign-language content.
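When foreign phrases come from a CMS or Markdown pipeline, a small helper keeps the markup consistent. Below is a minimal sketch. The function name is hypothetical, and it assumes the phrase arrives as plain text that needs escaping.

```javascript
// Hypothetical helper: wrap a plain-text phrase in a <span> carrying its language.
function foreignPhrase(text, langTag) {
  // Canonicalize the tag; throws RangeError on malformed input such as "de_DE".
  const lang = Intl.getCanonicalLocales(langTag)[0];
  const escaped = text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return `<span lang="${lang}">${escaped}</span>`;
}

const paragraph = `<p>The French phrase ${foreignPhrase("c'est la vie", 'fr')} means "that's life."</p>`;
console.log(paragraph);
```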
SC 3.1.1 vs SC 3.1.2 — Page vs Parts
WCAG splits language requirements into two criteria. Knowing which one applies clarifies what to fix first.
| Criterion | Level | Where lang goes | Trigger |
|---|---|---|---|
| SC 3.1.1 Language of Page | A | <html lang="..."> | Every page needs a primary language declared. |
| SC 3.1.2 Language of Parts | AA | Inline elements (span, p, blockquote) | Any passage in a different language from the page default. |
FAQ
Is the lang attribute required for SEO?
Not for Google. Google's documentation states that it determines a page's language from the visible content and does not use code-level language information such as lang attributes. lang is still worth getting right for search. Other engines may use it as a hint, and it should agree with your hreflang annotations (which Google does use), so that every signal about the page's language points the same way.
What's the difference between lang and hreflang?
lang declares the language of the current page. hreflang (used in <link> tags or sitemaps) tells Google about alternate language versions of the same page. Both should be present on multilingual sites and they should agree with each other.
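Keeping the two in agreement is easiest when both come from the same data. The sketch below assumes a hypothetical map from language tag to URL for one page. It emits `<link rel="alternate" hreflang>` tags. It refuses to render if the current page's own language is missing from the map, since Google's guidelines ask each language version to list itself as well as its alternates.

```javascript
// Hypothetical alternates for one page; the URLs are placeholders.
const alternates = {
  en: 'https://example.com/en/about',
  fr: 'https://example.com/fr/about',
  'pt-BR': 'https://example.com/pt-br/about',
};

// Build the <html lang> value and the hreflang links from one map so they cannot disagree.
function headLanguageTags(currentLang, alternatesByLang) {
  if (!Object.prototype.hasOwnProperty.call(alternatesByLang, currentLang)) {
    // Each version should list itself, so the page's own lang must be in the map.
    throw new Error(`No hreflang entry for the current page language "${currentLang}"`);
  }
  const links = Object.entries(alternatesByLang).map(
    ([hreflang, href]) => `<link rel="alternate" hreflang="${hreflang}" href="${href}">`
  );
  return { htmlLang: currentLang, links };
}

const { htmlLang, links } = headLanguageTags('fr', alternates);
console.log(`<html lang="${htmlLang}">`);
console.log(links.join('\n'));
```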
Should I use en or en-US?
Use en unless region genuinely matters for spelling, currency, or pronunciation. en is universally supported; en-US and en-GB are useful when the content is explicitly localized. Over-specifying can cause some screen readers to fall back to a generic English voice if they lack the regional profile.
Does the lang attribute affect AI search engines like ChatGPT and Perplexity?
It can. Multilingual AI search systems may use lang, together with the content itself, to decide which language pool a page belongs to when answering a query. A French page mislabelled lang="en" risks not surfacing for French queries even if the content is perfect French. As AI search becomes more multilingual, an accurate lang removes one source of ambiguity about whether your page gets cited.
What happens if lang is missing?
Three things break. Screen readers pronounce everything using the user's default language, which is often badly wrong. Browsers lose a reliable signal for translation prompts, hyphenation, and font selection. Search engines that use lang as a hint must guess the language from the body text alone, which is error-prone on short pages or pages with mixed content. On top of that, you fail WCAG 2.2 SC 3.1.1 Level A. That has legal implications under the EU's European Accessibility Act, US Section 508, the UK Equality Act, and similar laws.
Do I need lang on every iframe and embed?
The iframe's own document needs lang on its own <html>. The parent document's lang doesn't cascade through the iframe boundary. If you control the embedded content, set lang there too. If you don't (third-party widgets), it's out of your hands — focus on labelling the iframe with a clear title instead.
Can a single-page app keep lang updated when the user switches language?
Yes, and it must. Update document.documentElement.lang whenever the locale changes. i18n libraries such as next-intl, react-intl, and i18next all expose the active locale, and i18next, for example, emits a languageChanged event. Wire that value to the root element on every locale change.
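Outside React, the same idea works with whatever change notification your i18n layer provides. Below is a framework-free sketch. `createLocaleStore` is a hypothetical stand-in for your library's change event. The root element is passed in so the logic can run outside a browser; in the page you would pass `document.documentElement`.

```javascript
// Hypothetical minimal locale store standing in for an i18n library's change event.
function createLocaleStore(initialLocale) {
  let locale = initialLocale;
  const listeners = new Set();
  return {
    get: () => locale,
    set(next) {
      locale = next;
      listeners.forEach((listener) => listener(locale));
    },
    subscribe(listener) {
      listeners.add(listener);
      listener(locale); // sync immediately with the current value
      return () => listeners.delete(listener); // unsubscribe function
    },
  };
}

// Keep the root element's lang in step with every locale change.
function bindRootLang(store, root) {
  return store.subscribe((locale) => {
    root.lang = Intl.getCanonicalLocales(locale)[0];
  });
}

// In a browser: bindRootLang(store, document.documentElement)
const store = createLocaleStore('en');
const fakeRoot = { lang: '' };
bindRootLang(store, fakeRoot);
store.set('fr');
console.log(fakeRoot.lang); // fr
```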
Conclusion
The lang attribute is one line of HTML that decides several things. It decides whether a blind user hears your content correctly and whether Chrome offers the right translation prompt. It also decides whether search engines and AI systems that read it place your page in the right language pool. WCAG 2.2 SC 3.1.1 makes it Level A, which is non-negotiable for legal compliance. BCP 47 makes the format predictable: a language code, optionally followed by a script and/or region subtag.
The fix is almost always trivial; the hard part is finding every template that's missing it. Run a Greadme deep scan to surface missing and invalid lang attributes across your site, and to catch the inline-language failures that automated tools usually miss.
