What Is HTML lang? Complete Guide (2026)

Saar Twito, Founder & SEO Engineer · 8 min read

Hi, I'm Saar - a software engineer, SEO specialist, and lecturer who loves building tools and teaching tech.

What Is the HTML lang Attribute?

The lang attribute on the <html> element declares the primary language of the page using a standardized BCP 47 code (en, fr, en-GB, zh-Hans, etc.). Screen readers use it to choose the correct pronunciation, browsers use it to suggest the right translation, and search engines use it to understand which audience the page should be served to.

Key Facts (TL;DR)

  • WCAG criterion: WCAG 2.2 SC 3.1.1 Language of Page — Level A (the minimum legal bar in most jurisdictions).
  • Where it goes: <html lang="en"> — on the root element, every page, every time.
  • Format: BCP 47. A language subtag (usually the 2-letter ISO 639-1 code), an optional 4-letter script subtag (ISO 15924), and an optional 2-letter region subtag (ISO 3166-1): en, en-US, en-GB, pt-BR, zh-Hans.
  • Audit threshold: Audits flag missing lang AND invalid lang values as separate failures.
  • SEO impact: Google says it determines page language from the visible content rather than from lang, so treat lang as a supporting signal for other search engines and tools. A wrong code still risks surfacing your page to the wrong audience.
  • Inline use: Mixed-language content uses lang on inline elements too (<span lang="la">veni, vidi, vici</span>) so screen readers switch pronunciation mid-sentence.

Think of lang as the label on a record sleeve telling the player which speed to spin at. Without it, the needle still drops — but the music comes out garbled. A French page read aloud with English pronunciation rules is unintelligible, even though every letter is correct.

Why the lang Attribute Matters

lang is one of the cheapest accessibility wins on the web. It is a single attribute with zero performance cost and five distinct payoffs:

  • Screen reader pronunciation: Without lang, screen readers fall back to the user's system language. NVDA, JAWS, and VoiceOver all switch voice profiles based on lang. A French page read with English phonetic rules is functionally incomprehensible to a blind user.
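  • SEO and search ranking: Google says it determines a page's language from the visible content, not from code-level signals such as lang, but other search engines and crawlers may still read the attribute. A page set to lang="en" but written in Spanish sends conflicting signals, and you should not count on every crawler resolving them in your favor.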
  • Browser features: Translation prompts (Chrome's "Translate this page?"), hyphenation rules, font selection, quotation-mark glyphs, and date/number formatting all rely on lang. Get it wrong and Chrome offers to translate French to French.
  • AI search and generative engines: Multilingual AI search systems (ChatGPT search, Perplexity, Google AI Overviews) may use lang as one signal when deciding which language pool a page belongs to. A French page mislabelled as English risks not being cited for French queries even if its content is perfectly French.
  • Legal compliance: WCAG 2.2 SC 3.1.1 is Level A. EU, US (Section 508), UK, and Canadian accessibility regulations all require Level A as a minimum — making lang a legal requirement, not a best practice.

The Silent Failure

Missing lang is a silent failure for sighted users because the page looks identical. That's precisely why it's one of the most common accessibility violations on the web: nobody on the team notices it during QA. Industry analysis of automated accessibility scans consistently places "html element does not have a lang attribute" in the top 10 most-flagged issues across all sites.

WCAG 2.2 SC 3.1.1 — Language of Page (Level A)

The Level A criterion states: "The default human language of each Web page can be programmatically determined." "Programmatically determined" is the operative phrase: assistive tech needs a code it can parse, not a human guess.

Audits check two distinct conditions:

  • Presence: Does <html> have a lang attribute at all?
  • Validity: Is the value a valid BCP 47 tag? en-EN, english, and en_US all fail validation even though they look reasonable.
<!-- BAD: no lang attribute -->
<!DOCTYPE html>
<html>
  <head><title>About Us</title></head>
  <body>...</body>
</html>

<!-- BAD: invalid BCP 47 (en-EN doesn't exist) -->
<html lang="en-EN">

<!-- BAD: underscore instead of hyphen -->
<html lang="en_US">

<!-- BAD: language doesn't match content -->
<html lang="en">
  <body>
    <h1>Bienvenue sur notre site</h1>
    <p>Nous sommes ravis de vous accueillir.</p>
  </body>
</html>

<!-- GOOD: valid 2-letter code -->
<!DOCTYPE html>
<html lang="en">
  <head><title>About Us</title></head>
  <body>...</body>
</html>

<!-- GOOD: language + region when meaningful -->
<html lang="en-GB">

<!-- GOOD: script subtag for Chinese -->
<html lang="zh-Hans">  <!-- Simplified -->
<html lang="zh-Hant">  <!-- Traditional -->
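
The two audit conditions above (presence, then validity) can be sketched as a small function. This is a minimal illustration, not how any particular audit tool works: auditHtmlLang and isKnownTag are hypothetical names, the regex-based extraction stands in for a real HTML parser, and the subtag lists are tiny samples rather than the full IANA registry. It also cannot detect the fourth failure shown above, where a valid code does not match the content.

```javascript
// Illustrative subsets only. A real audit would consult the full IANA Language Subtag Registry.
const KNOWN_LANGUAGES = new Set(["en", "fr", "es", "pt", "zh", "ar", "ja", "he", "la"]);
const KNOWN_SCRIPTS = new Set(["Hans", "Hant"]);
const KNOWN_REGIONS = new Set(["US", "GB", "ES", "MX", "BR", "PT"]);

// Validity: language[-script][-region], with every subtag present in the subsets above.
function isKnownTag(tag) {
  const match = tag.match(/^([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}))?$/i);
  if (!match) return false; // rejects "en_US" and "english"
  const [, language, script, region] = match;
  if (!KNOWN_LANGUAGES.has(language.toLowerCase())) return false;
  if (script) {
    const titleCased = script[0].toUpperCase() + script.slice(1).toLowerCase();
    if (!KNOWN_SCRIPTS.has(titleCased)) return false;
  }
  if (region && !KNOWN_REGIONS.has(region.toUpperCase())) return false; // rejects "en-EN"
  return true;
}

// Hypothetical helper: classify the root element of an HTML string.
function auditHtmlLang(html) {
  const rootTag = html.match(/<html\b[^>]*>/i);
  const attr = rootTag && rootTag[0].match(/\slang\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i);
  if (!attr) return "missing"; // presence check failed
  const value = (attr[1] ?? attr[2] ?? attr[3] ?? "").trim();
  return isKnownTag(value) ? "ok" : "invalid"; // validity check
}

console.log(auditHtmlLang("<html><head></head></html>")); // missing
console.log(auditHtmlLang('<html lang="en-EN">'));        // invalid
console.log(auditHtmlLang('<html lang="zh-Hans">'));      // ok
```

Reporting "missing" and "invalid" as separate outcomes mirrors the way audits flag them as separate failures.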

Common BCP 47 Language Codes

BCP 47 is the IETF standard for language tags, and it is the format the HTML specification requires for lang. The structure is language[-script][-region]. For most sites the 2-letter language code is enough; add a region only when pronunciation or content genuinely differs.

Code    | Language             | Usage notes
en      | English              | Default for English content where region doesn't matter.
en-US   | American English     | Spelling differs (color/colour); voice-over should use US accent.
en-GB   | British English      | UK spelling, UK financial/legal context.
fr      | French               | Default for French content.
es      | Spanish              | Use es-ES vs es-MX only if content is regionalized.
pt-BR   | Brazilian Portuguese | Strongly recommended: pronunciation and vocabulary differ significantly from pt-PT.
zh-Hans | Simplified Chinese   | Mainland China, Singapore.
zh-Hant | Traditional Chinese  | Taiwan, Hong Kong.
ar      | Arabic               | Pair with dir="rtl" for right-to-left rendering.
ja      | Japanese             | Default; region rarely needed.
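
To see the language[-script][-region] structure in practice, the built-in Intl.Locale constructor can split a tag into its parts. This is a small sketch; describeTag is a hypothetical helper name, not a standard API.

```javascript
// Split a BCP 47 tag into its language, script, and region subtags.
function describeTag(tag) {
  try {
    const locale = new Intl.Locale(tag);
    return {
      language: locale.language,
      script: locale.script ?? null, // null when the tag has no script subtag
      region: locale.region ?? null, // null when the tag has no region subtag
    };
  } catch (err) {
    // Intl.Locale throws a RangeError for structurally malformed tags such as "en_US".
    return null;
  }
}

console.log(describeTag("pt-BR"));   // { language: 'pt', script: null, region: 'BR' }
console.log(describeTag("zh-Hans")); // { language: 'zh', script: 'Hans', region: null }
console.log(describeTag("en_US"));   // null
```

Note that Intl.Locale checks structure only, not the subtag registry, so a well-formed but nonexistent tag like en-EN still parses. Use it to understand a tag, not as a complete validator.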

How to Check Your lang Attribute

Detection is fast — every accessibility audit catches a missing lang. Validating that the value is correct (and matches the actual content) takes a little more care.

  • Greadme deep scan — flags missing and invalid lang attributes alongside the rest of your WCAG audit, with a one-click fix that opens a GitHub PR with the corrected attribute.
  • Greadme crawler scan — checks every indexable page on your site, surfacing templates where lang was forgotten on a single layout (often the cause of dozens of failing pages from one missing line).
  • Greadme AI visibility analyzer — verifies that AI search engines correctly identify your page's language so it surfaces in the right multilingual queries.
  • Chrome DevTools → Elements panel — inspect <html> and check the lang attribute is present and accurate.
  • Google Search Console: the International Targeting report used to surface hreflang errors that often correlate with bad lang values. Google has since deprecated that report, so check your hreflang annotations with a dedicated validator instead.
  • Screen reader spot-check — open the page in NVDA, JAWS, or VoiceOver and listen. Wrong pronunciation is unmistakable within the first sentence.

8 Practical Rules for Setting lang Correctly

1. Always Set lang on the Root html Element

Every page needs a lang on <html>. Setting it on <body> or a <div> instead does not satisfy WCAG 3.1.1.

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Welcome</title>
  </head>
  <body>...</body>
</html>

2. Use the Shortest Valid Code That's Accurate

Prefer en over en-US unless region genuinely matters. Over-specifying can cause some assistive technologies to fall back to a generic voice if they don't carry the regional profile.
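
One way to picture that fallback is a lookup that tries the full tag first and then drops trailing subtags until something matches. The sketch below is an assumption for illustration only: pickVoice and the installed-voice list are hypothetical, and real screen readers and speech engines may resolve voices differently.

```javascript
// Hypothetical voice picker: try the full tag, then progressively shorter prefixes.
function pickVoice(tag, installedVoices) {
  // Compare case-insensitively, but return the voice tag as it was installed.
  const byLowerTag = new Map(installedVoices.map((voice) => [voice.toLowerCase(), voice]));
  const parts = tag.toLowerCase().split("-");
  while (parts.length > 0) {
    const candidate = parts.join("-");
    if (byLowerTag.has(candidate)) return byLowerTag.get(candidate);
    parts.pop(); // drop the last subtag: "en-au" becomes "en"
  }
  return null; // nothing matched, so the caller would use its default voice
}

console.log(pickVoice("pt-BR", ["pt-PT", "pt-BR"])); // pt-BR
console.log(pickVoice("en-AU", ["en", "fr"]));       // en
console.log(pickVoice("ja", ["en"]));                // null
```

In this model en-AU lands on the generic en voice because no Australian profile is installed, which is the behavior the rule warns about.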

3. Mark Inline Foreign Phrases with lang on the Span

Mixed-language content needs inline lang. Screen readers switch pronunciation mid-sentence based on it.

<p>The Roman general declared
   <span lang="la">veni, vidi, vici</span>,
   "I came, I saw, I conquered."</p>

<p>The French phrase
   <span lang="fr">c'est la vie</span>
   means "that's life."</p>

4. Keep lang in Sync with the Actual Content

If you translate a page, update lang. A French translation that still says lang="en" is worse than no lang at all — it actively misleads assistive tech.

5. Set lang Dynamically in Single-Page Apps

SPAs that switch languages at runtime must update document.documentElement.lang when the locale changes — otherwise screen readers continue using the original pronunciation rules.

// React example — keep <html lang> in sync with locale
import { useEffect } from 'react';

function LocaleSync({ locale }) {
  useEffect(() => {
    document.documentElement.lang = locale;
  }, [locale]);
  return null;
}

6. Use Hyphens, Not Underscores

BCP 47 uses hyphens: en-US is valid, en_US is not. The underscore form comes from POSIX locale conventions and silently fails validation.
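
If your locale values come from a POSIX-style source (an environment variable, a gettext directory name), convert them before they reach the lang attribute. A minimal sketch, assuming only the common language_REGION.encoding@modifier shape; posixToBcp47 is a hypothetical helper name.

```javascript
// Convert a POSIX-style locale such as "pt_BR.UTF-8" into a hyphenated language tag.
function posixToBcp47(posixLocale) {
  // Drop the encoding (".UTF-8") and modifier ("@euro") suffixes.
  const base = posixLocale.split(/[.@]/)[0];
  const [language, region] = base.split("_");
  if (!/^[a-z]{2,3}$/i.test(language ?? "")) return null; // e.g. "C" or "POSIX"
  if (region === undefined) return language.toLowerCase();
  if (!/^[a-z]{2}$/i.test(region)) return null;
  return `${language.toLowerCase()}-${region.toUpperCase()}`;
}

console.log(posixToBcp47("en_US"));       // en-US
console.log(posixToBcp47("pt_BR.UTF-8")); // pt-BR
console.log(posixToBcp47("C"));           // null
```

Returning null for values like C forces the caller to choose an explicit default instead of emitting an invalid attribute.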

7. Don't Mark Proper Nouns or Code Blocks

Brand names, place names, and code samples shouldn't carry lang attributes. Tokyo, Nestlé, and a JavaScript snippet are not natural-language content the screen reader needs to switch voices for.

8. Pair RTL Languages with dir="rtl"

Arabic, Hebrew, Persian, and Urdu need both a language code and a direction attribute for the layout to render correctly.

<html lang="ar" dir="rtl">
<html lang="he" dir="rtl">
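
In a template it helps to derive both attributes from the same language tag so they cannot disagree. A minimal sketch follows: rootAttributes is a hypothetical helper, and the RTL list covers only the four languages named above, not every right-to-left language.

```javascript
// Illustrative subset: Arabic, Hebrew, Persian, and Urdu.
const RTL_LANGUAGES = new Set(["ar", "he", "fa", "ur"]);

// Build the lang and dir attributes for the root element from one tag.
function rootAttributes(tag) {
  const language = tag.split("-")[0].toLowerCase();
  const dir = RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
  return `lang="${tag}" dir="${dir}"`;
}

console.log(`<html ${rootAttributes("ar")}>`);    // <html lang="ar" dir="rtl">
console.log(`<html ${rootAttributes("en-GB")}>`); // <html lang="en-GB" dir="ltr">
```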

Common lang Mistakes (and Fixes)

Problem: Missing lang Attribute Entirely

What's happening: The most common WCAG 3.1.1 failure. <html> has no lang at all — screen readers fall back to the user's system language and mispronounce everything.

Fix: Add lang="en" (or the appropriate code) to the root <html> element in your base template/layout. In Next.js: <html lang="en"> in the root layout.

Problem: Invalid Code Like en-EN

What's happening: en-EN doesn't exist. EN is not an ISO 3166-1 region code; the United Kingdom's code is GB. Audits flag the tag as invalid and screen readers may ignore the region subtag.

Fix: Use en (generic) or en-GB (British) — never en-EN. When in doubt, drop the region: en alone is always valid.

Problem: lang Doesn't Match the Content

What's happening: Page declares lang="en" but the body text is in French. Common on translated sites that copied a template without updating the root attribute.

Fix: Drive lang from the same locale variable that drives the content. If you have an i18n provider, hook the root layout into it so the two can never drift.
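
A minimal sketch of that fix, assuming a server-rendered template: one locale argument selects the translations and fills in the root attribute, so a French body can never ship with lang="en". MESSAGES and renderPage are hypothetical names standing in for your own i18n provider and layout.

```javascript
// Hypothetical message catalogue keyed by language tag.
const MESSAGES = new Map([
  ["en", { title: "Welcome", body: "We are delighted to have you." }],
  ["fr", { title: "Bienvenue", body: "Nous sommes ravis de vous accueillir." }],
]);

// The same locale value drives both the lang attribute and the content.
function renderPage(locale) {
  const messages = MESSAGES.get(locale);
  if (!messages) throw new Error(`No translations for locale: ${locale}`);
  return [
    "<!DOCTYPE html>",
    `<html lang="${locale}">`,
    `  <head><title>${messages.title}</title></head>`,
    `  <body><p>${messages.body}</p></body>`,
    "</html>",
  ].join("\n");
}

console.log(renderPage("fr"));
```

Throwing on an unknown locale is a deliberate choice here: silently falling back to English content while keeping the requested lang would recreate the mismatch.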

Problem: Inline Foreign Phrases Not Marked

What's happening: A page in English has French quotes, Latin mottos, or German phrases with no inline lang. Screen readers pronounce a word like "Schadenfreude" with English phonetics, which sounds nothing like the actual word.

Fix: Wrap the foreign phrase in <span lang="...">. WCAG SC 3.1.2 (Language of Parts) is Level AA and requires this for passages in another language. Proper names, technical terms, and words that have become part of the surrounding language are exempt.

SC 3.1.1 vs SC 3.1.2 — Page vs Parts

WCAG splits language requirements into two criteria. Knowing which one applies clarifies what to fix first.

Criterion                  | Level | Where lang goes                                        | Trigger
SC 3.1.1 Language of Page  | A     | <html lang="...">                                      | Every page needs a primary language declared.
SC 3.1.2 Language of Parts | AA    | The element wrapping the passage (span, p, blockquote) | Any passage in a different language from the page default.

FAQ

Is the lang attribute required for SEO?

Not strictly required. Google says it detects language from the visible content and does not rely on code-level signals such as lang, so the attribute matters most for other search engines, browsers, and assistive technology. Keep it accurate and consistent with hreflang anyway: a wrong or missing lang can cost you visibility wherever the attribute is read.

What's the difference between lang and hreflang?

lang declares the language of the current page. hreflang (used in <link> tags or sitemaps) tells Google about alternate language versions of the same page. Both should be present on multilingual sites and they should agree with each other.
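
One way to keep the two in agreement is to generate both from a single map of translations. The sketch below assumes that setup; buildLanguageMarkup is a hypothetical helper and the example.com URLs are placeholders.

```javascript
// Build the root tag and the hreflang links for one page from the same data.
// `alternates` maps each language tag to the URL of that translation.
function buildLanguageMarkup(currentLang, alternates) {
  if (!Object.hasOwn(alternates, currentLang)) {
    throw new Error(`Current language ${currentLang} is missing from the alternates map`);
  }
  const links = Object.entries(alternates).map(
    ([lang, url]) => `<link rel="alternate" hreflang="${lang}" href="${url}">`
  );
  return { htmlOpenTag: `<html lang="${currentLang}">`, links };
}

const markup = buildLanguageMarkup("fr", {
  en: "https://example.com/en/",
  fr: "https://example.com/fr/",
});
console.log(markup.htmlOpenTag); // <html lang="fr">
console.log(markup.links.join("\n"));
```

Requiring the current language to appear in the map means every page lists itself among its alternates, and its lang value is guaranteed to match one of the hreflang values.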

Should I use en or en-US?

Use en unless region genuinely matters for spelling, currency, or pronunciation. en is universally supported; en-US and en-GB are useful when the content is explicitly localized. Over-specifying can cause some screen readers to fall back to a generic English voice if they lack the regional profile.

Does the lang attribute affect AI search engines like ChatGPT and Perplexity?

It can. Multilingual AI search systems may use lang as one signal when deciding which language pool a page belongs to for a query. A French page mislabelled lang="en" may not surface for French queries even if the content is perfect French. As AI search becomes more multilingual, an accurate lang is likely to matter more for whether you're cited at all.

What happens if lang is missing?

Three things break. Screen readers use the user's system language to pronounce everything (often badly wrong). Browsers can't offer accurate translation prompts. Search engines must guess the language from the body text — sometimes wrong, especially on short pages or pages with mixed content. Plus, you fail WCAG 2.2 SC 3.1.1 Level A, which has legal implications under EU EAA, US Section 508, UK Equality Act, and similar laws.

Do I need lang on every iframe and embed?

The iframe's own document needs lang on its own <html>. The parent document's lang doesn't cascade through the iframe boundary. If you control the embedded content, set lang there too. If you don't (third-party widgets), it's out of your hands — focus on labelling the iframe with a clear title instead.

Can a single-page app keep lang updated when the user switches language?

Yes — and it must. Update document.documentElement.lang whenever the locale changes. Most i18n libraries (next-intl, react-intl, i18next) expose a hook or callback for this; wire it to the root element on every locale change.

Conclusion

The lang attribute is one line of HTML that decides whether a blind user hears your content correctly, whether the browser offers the right translation prompt, and whether search and AI systems that read the attribute file your page under the right language. WCAG 2.2 SC 3.1.1 makes it Level A, non-negotiable for legal compliance, and BCP 47 makes the format predictable: a language code, optionally followed by a script or region.

The fix is almost always trivial; the hard part is finding every template that's missing it. Run a Greadme deep scan to surface missing and invalid lang attributes across your site, and to catch the inline-language failures that automated tools usually miss.