UTF-8 is a variable-width Unicode encoding that can represent every character in the Unicode standard, from basic Latin letters to Chinese, Arabic, emoji, and mathematical symbols. Every HTML document should declare it explicitly. The most reliable way is `<meta charset="UTF-8">` inside `<head>`, serialized completely within the first 1024 bytes of the document. Without a declaration, browsers may guess wrong and produce mojibake (garbled characters).
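To see the variable width in practice, here is a minimal Node.js sketch that encodes a few sample characters and prints how many bytes each one takes in UTF-8. It assumes Node 11 or later, where `TextEncoder` is a global that always encodes to UTF-8. The helper name `utf8ByteLength` is illustrative.

```javascript
// TextEncoder always produces UTF-8 bytes.
const encoder = new TextEncoder();

// Number of bytes a string occupies once encoded as UTF-8.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

const samples = ['A', 'é', '中', '😀'];
for (const ch of samples) {
  // Show the raw bytes as uppercase hex pairs.
  const hex = Array.from(encoder.encode(ch), (b) => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
  console.log(`${ch} -> ${utf8ByteLength(ch)} byte(s): ${hex}`);
}
// A -> 1 byte(s): 41
// é -> 2 byte(s): C3 A9
// 中 -> 3 byte(s): E4 B8 AD
// 😀 -> 4 byte(s): F0 9F 98 80
```

ASCII characters keep their single-byte values, which is why plain English HTML looks identical in ASCII and UTF-8.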
The encoding can be declared in three places; if none is present, the browser falls back to guessing. When more than one declaration is present, browsers apply them in this precedence order:
| Method | Where | Priority |
|---|---|---|
| Byte Order Mark (BOM) | First 3 bytes of the file (`EF BB BF`) | 1 (highest) |
| HTTP `Content-Type` header | Server response: `Content-Type: text/html; charset=utf-8` | 2 |
| `<meta charset="UTF-8">` | Inside `<head>`, within the first 1024 bytes | 3 |
| Browser heuristic guess | Fallback when nothing is declared | 4 (last resort) |
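To make the precedence concrete, here is a simplified Node.js sketch of the order used by the HTML spec's encoding sniffing algorithm: a UTF-8 BOM first, then the charset in the HTTP header, then a `<meta charset>` that is complete within the first 1024 bytes, then a guess. The function name `resolveEncoding` and the `windows-1252` fallback are illustrative assumptions; real browsers also handle UTF-16 BOMs, user overrides, encoding-label aliases, and locale-specific defaults.

```javascript
// Simplified encoding selection (not the full WHATWG algorithm):
// UTF-8 BOM > HTTP charset > <meta charset> in first 1024 bytes > guess.
const PRESCAN_BYTES = 1024;

function resolveEncoding(bytes, contentType = '') {
  // 1. A UTF-8 Byte Order Mark wins over everything else, even the header.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: 'utf-8', source: 'bom' };
  }

  // 2. charset parameter of the header, e.g. "text/html; charset=utf-8".
  const header = /;\s*charset\s*=\s*"?([\w-]+)/i.exec(contentType);
  if (header) {
    return { encoding: header[1].toLowerCase(), source: 'header' };
  }

  // 3. A <meta ... charset=...> tag that is fully contained in the first 1024 bytes.
  // Latin-1 maps each byte to exactly one character, so the slice stays byte-accurate.
  const prefix = Buffer.from(bytes.subarray(0, PRESCAN_BYTES)).toString('latin1');
  const meta = /<meta[^>]*\bcharset\s*=\s*["']?([\w-]+)[^>]*>/i.exec(prefix);
  if (meta) {
    return { encoding: meta[1].toLowerCase(), source: 'meta' };
  }

  // 4. Nothing declared: guess (windows-1252 is a common default in Western locales).
  return { encoding: 'windows-1252', source: 'guess' };
}
```

The ordering of the checks is the whole point: a BOM overrides even a conflicting `Content-Type` header, and a `<meta>` tag is only consulted when neither of the first two is present.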
Add the meta tag as the first child of `<head>`:
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Page Title</title>
</head>
<body>
  <!-- page content -->
</body>
</html>
```

For server-side configuration, send the charset in the `Content-Type` header:
```apache
# Apache (.htaccess)
AddDefaultCharset UTF-8
```

```nginx
# Nginx (inside an http, server, or location block)
charset utf-8;
```

```javascript
// Node.js / Express (inside a request handler)
res.setHeader('Content-Type', 'text/html; charset=utf-8');
```

For databases that store user input, use `utf8mb4` in MySQL to support 4-byte characters such as emoji:
```sql
CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

UTF-8 is not optional in modern HTML: the HTML standard requires UTF-8 for new documents and requires authors to declare it (through a BOM, the HTTP header, or a meta tag).
The 1024-byte limit exists because browsers start parsing with a default encoding. If they encounter a charset declaration after the first 1024 bytes, they have to restart parsing with the new encoding, which is wasteful, and some implementations may not do it at all.
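Because the limit is counted in bytes rather than characters, non-ASCII text placed before the tag (a long `<title>`, for instance) pushes it further than a character count suggests. Here is a minimal Node.js sketch with a hypothetical helper, `checkMetaCharsetPosition`, that reports where the tag ends. It uses a regular expression rather than a real HTML parser, so treat it as a quick check only.

```javascript
// Measure where the <meta charset> tag ends, in bytes, and whether the whole
// tag fits in the first 1024 bytes that browsers prescan.
const PRESCAN_LIMIT = 1024;

function checkMetaCharsetPosition(html) {
  const match = /<meta[^>]*\bcharset\s*=[^>]*>/i.exec(html);
  if (!match) {
    return { found: false, endByte: null, withinLimit: false };
  }
  // Count bytes, not characters: every non-ASCII character before the tag
  // takes 2 to 4 bytes in UTF-8.
  const endByte = Buffer.byteLength(html.slice(0, match.index + match[0].length), 'utf8');
  return { found: true, endByte, withinLimit: endByte <= PRESCAN_LIMIT };
}

console.log(checkMetaCharsetPosition('<head><meta charset="UTF-8"><title>Hi</title></head>'));
// { found: true, endByte: 28, withinLimit: true }
```

With 600 accented characters in a `<title>` placed before the tag, the character count stays under 1024 but the byte count does not, so `withinLimit` comes back `false`. Placing the tag first in `<head>` avoids the problem entirely.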
Mojibake is the garbled text that appears when bytes written in one encoding are decoded as another. A common symptom: "café" rendering as "café" when its UTF-8 bytes are read as Windows-1252.
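The "café" example can be reproduced in a few lines of Node.js. This sketch encodes text as UTF-8 and decodes the bytes as Latin-1, which matches Windows-1252 for the bytes involved here. The helper names `toMojibake` and `repairMojibake` are illustrative.

```javascript
// Encode as UTF-8, then misread the bytes as Latin-1 (one character per byte).
function toMojibake(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

// Undo the damage: turn each character back into its byte, then decode as UTF-8.
// This only works if no bytes were lost or replaced along the way.
function repairMojibake(garbled) {
  return Buffer.from(garbled, 'latin1').toString('utf8');
}

console.log(toMojibake('café'));      // cafÃ©
console.log(repairMojibake('cafÃ©')); // café
```

The two bytes of "é" (`C3 A9`) each become a separate character, "Ã" and "©", which is the telltale pattern of UTF-8 read as a single-byte encoding.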
If the server already sends the charset in the `Content-Type` header, the meta tag is not strictly required, but keep it anyway in practice. It covers cases where the page is saved offline, opened from disk, or served by misconfigured infrastructure.
Compared with UTF-16: UTF-8 uses 1 to 4 bytes per character and is ASCII-compatible, while UTF-16 uses 2 or 4 bytes and is not ASCII-compatible. UTF-8 is the standard encoding for the web.
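A quick Node.js comparison makes both differences visible. The sketch assumes Node's built-in `'utf16le'` encoding (UTF-16 little-endian, no BOM) as the UTF-16 representative; `encodedSizes` is an illustrative name.

```javascript
// Compare encoded sizes of the same text in UTF-8 and UTF-16 (little-endian).
function encodedSizes(text) {
  return {
    utf8: Buffer.byteLength(text, 'utf8'),
    utf16: Buffer.byteLength(text, 'utf16le'),
  };
}

console.log(encodedSizes('hello')); // { utf8: 5, utf16: 10 }
console.log(encodedSizes('中文'));  // { utf8: 6, utf16: 4 }
console.log(encodedSizes('😀'));    // { utf8: 4, utf16: 4 }

// ASCII compatibility: "A" in UTF-8 is the ASCII byte 0x41;
// in UTF-16LE it gains a zero byte that ASCII-only tools do not expect.
console.log(Buffer.from('A', 'utf8'));    // <Buffer 41>
console.log(Buffer.from('A', 'utf16le')); // <Buffer 41 00>
```

UTF-16 can be smaller for text dominated by CJK characters, but HTML markup itself is ASCII, which UTF-8 stores at one byte per character.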
MySQL's `utf8` (an alias for `utf8mb3`) is a legacy subset of UTF-8 limited to 3 bytes per character. `utf8mb4` is full 4-byte UTF-8, needed for emoji and other supplementary-plane characters. Always use `utf8mb4`.
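A character needs 4 bytes in UTF-8 exactly when its code point is above U+FFFF, so the difference between the two MySQL character sets comes down to that test. Below is a sketch of a hypothetical validation helper, `needsUtf8mb4`, that could flag input before it reaches a column still using `utf8mb3`. It is an illustration, not part of any MySQL client library.

```javascript
// Hypothetical helper: returns true if the text contains any character that
// needs 4 bytes in UTF-8 (code point above U+FFFF). MySQL's utf8/utf8mb3
// cannot store such characters; utf8mb4 can.
function needsUtf8mb4(text) {
  for (const ch of text) {            // for...of walks code points, not UTF-16 units
    if (ch.codePointAt(0) > 0xffff) {
      return true;
    }
  }
  return false;
}

console.log(needsUtf8mb4('café 中文')); // false
console.log(needsUtf8mb4('nice 👍'));   // true
```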
Charset affects SEO indirectly: a correct declaration lets search engines index your content accurately and lets users see the right characters, while misencoded titles and meta descriptions look unprofessional in search results. See our meta tags complete guide.
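Avoid adding a BOM to UTF-8 HTML files. The Unicode Standard says a BOM is neither required nor recommended for UTF-8, and it can break server-side scripts and tools that expect the file to start with its actual content (for example, a PHP file whose BOM is sent as output before headers).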
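If a file already has a BOM, removing it is a matter of dropping the first three bytes when they are `EF BB BF`. A minimal Node.js sketch, with the hypothetical helper name `stripUtf8Bom`:

```javascript
// Remove a leading UTF-8 BOM (EF BB BF) from a Buffer or Uint8Array.
// Returns a view past the BOM if one is present, otherwise the input unchanged.
const UTF8_BOM = [0xef, 0xbb, 0xbf];

function stripUtf8Bom(bytes) {
  const hasBom = bytes.length >= UTF8_BOM.length && UTF8_BOM.every((b, i) => bytes[i] === b);
  return hasBom ? bytes.subarray(UTF8_BOM.length) : bytes;
}

console.log(stripUtf8Bom(Buffer.from([0xef, 0xbb, 0xbf, 0x3c, 0x21])));
// <Buffer 3c 21>
```

To clean a file on disk you could combine it with Node's `fs` module, e.g. `fs.writeFileSync('index.html', stripUtf8Bom(fs.readFileSync('index.html')))`, where `index.html` is a placeholder path.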
A missing or wrong charset can also hurt visibility in AI search engines such as ChatGPT, Perplexity, and Google AI Overviews. These systems tend to cite well-ranked, well-structured pages, and mojibake is hard for any extractor to parse reliably. International, multilingual, and emoji-bearing content is at the highest risk: garbled bytes can cause AI extractors to skip a page entirely or quote it with corrupted characters, hurting both citation odds and brand perception.
In short: put `<meta charset="UTF-8">` as the first child of `<head>` on every page, configure your server to send `Content-Type: text/html; charset=utf-8`, and use `utf8mb4` in MySQL. Verify with `document.characterSet` in the browser console and the Lighthouse "Properly defines charset" audit. Run a Greadme deep scan to verify the charset declaration across your entire site so non-ASCII content stays readable for users, search engines, and AI extractors alike.
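For a quick manual check in the browser, the sketch below wraps `document.characterSet` in a small helper. The name `checkCharset` is hypothetical; paste the function into the DevTools console and call `checkCharset(document)`. It takes the document as a parameter so it can also be exercised outside a browser with a stand-in object.

```javascript
// Hypothetical DevTools helper: call checkCharset(document) on the page to check.
// Reports the encoding the browser actually used and whether <meta charset>
// is the first element inside <head>.
function checkCharset(doc) {
  const first = doc.head ? doc.head.firstElementChild : null;
  return {
    characterSet: doc.characterSet,        // e.g. "UTF-8"
    isUtf8: doc.characterSet === 'UTF-8',
    metaCharsetFirst: Boolean(first && first.tagName === 'META' && first.hasAttribute('charset')),
  };
}
```

A result with `isUtf8: true` and `metaCharsetFirst: true` matches the checklist above; anything else is worth investigating before relying on a full-site audit.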