Network Server Latency is the time your origin server spends processing a request before it begins streaming the response. It is the "thinking time" component of TTFB — what the server is actually doing after the network has finished delivering the request, but before any HTML leaves the building.
Think of TTFB as the kitchen-to-mouth chain. Network RTT is the waiter walking from the table to the kitchen. Server latency is the chef actually cooking. You can move the kitchen closer to the table (CDN), but if the chef takes 30 minutes to plate, dinner is still late.
TTFB is the sum of two fundamentally different things, and the fix for each is different.
| Aspect | Network RTT | Server Latency |
|---|---|---|
| What it is | Round-trip time for packets to travel between user and server | Server-side processing time after the request arrives |
| Bounded by | Physics (speed of light, distance, infrastructure) | Your code, database, and dependencies |
| Typical range | 20–250 ms (regional) to 400+ ms (intercontinental) | 10–1,000+ ms depending on workload |
| How to fix | CDN with edge nodes near users; HTTP/3 | Cache queries, optimize code, edge-render, scale horizontally |
| Floor (best you can do) | ~10–20 ms even with the best CDN | ~5 ms for a fully cached static response |
Diagnostic shortcut: If your TTFB is high but the same site is fast for users near your origin, you have a network problem (use a CDN). If TTFB is high everywhere, including from a server in the same data center, you have a server-latency problem (fix the backend).
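The shortcut above can be written as a small decision function. This is only a sketch: the function name, the input shape, and the 800 ms default for "high" are assumptions made for illustration, so substitute whatever threshold your own monitoring uses.

```javascript
// Hypothetical helper: classify a TTFB problem from two measurements.
// nearOriginMs: TTFB measured close to the origin (ideally the same data center).
// farUserMs:    TTFB measured from a distant user location.
// highMs:       what counts as "high"; 800 ms is an assumed default.
function diagnoseTtfb({ nearOriginMs, farUserMs, highMs = 800 }) {
  if (nearOriginMs >= highMs) return 'server-latency'; // slow even next to the origin
  if (farUserMs >= highMs) return 'network';           // slow only far from the origin
  return 'ok';
}

console.log(diagnoseTtfb({ nearOriginMs: 120, farUserMs: 950 }));  // network
console.log(diagnoseTtfb({ nearOriginMs: 900, farUserMs: 1100 })); // server-latency
```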
Server latency is the sum of every step that runs after the request arrives and before the first byte leaves:
Example server-latency breakdown for a slow product page:
Routing + auth: 25 ms
Database query (product): 180 ms
Database query (related): 140 ms
Pricing API call: 120 ms
Template rendering: 45 ms
──────────────────────────────────
Server latency total: 510 ms (poor)
Across most production sites we audit, 80% of server latency comes from 20% of the steps, almost always the database. Profile before optimizing. A 20-line code change to add an index, cache a query, or remove a synchronous API call typically beats hours of micro-optimization elsewhere.
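To make "profile before optimizing" concrete, here is a rough sketch that ranks the steps from the breakdown above by their share of the total. The `rankSteps` helper is hypothetical; in a real service the timings would come from your APM or from timers around each step.

```javascript
// Rank measured steps by their share of total server latency.
// Step names and timings are the example figures from the breakdown above.
function rankSteps(steps) {
  const total = steps.reduce((sum, s) => sum + s.ms, 0);
  return steps
    .map((s) => ({ ...s, share: s.ms / total }))
    .sort((a, b) => b.ms - a.ms); // slowest step first
}

const ranked = rankSteps([
  { name: 'Routing + auth', ms: 25 },
  { name: 'Database query (product)', ms: 180 },
  { name: 'Database query (related)', ms: 140 },
  { name: 'Pricing API call', ms: 120 },
  { name: 'Template rendering', ms: 45 },
]);

for (const s of ranked) {
  console.log(`${s.name}: ${s.ms} ms (${Math.round(s.share * 100)}%)`);
}
```

In this example the two database queries account for 320 of the 510 ms, so they are the first place to look.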
$request_time (Nginx, logged in seconds) or %D (Apache, logged in microseconds): the time from receiving the request to writing the last byte. Filter the log for the slowest 1% of requests; that's your latency long tail.
Caching query results is the single highest-leverage server-latency fix. A 200 ms uncached query repeated on every request becomes a 1–2 ms cache lookup once it's memoized. For read-heavy traffic, even a 30-second cache TTL can cut origin load by 90%.
Fix: Use Redis or Memcached for query caching. Cache at the appropriate granularity (per-query for shared data, per-user with short TTLs for user-specific data).
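The pattern behind this fix is often called cache-aside: check the cache, run the query only on a miss, and store the result with a TTL. The sketch below uses an in-memory `Map` as a stand-in for Redis or Memcached so it runs on its own; `createTtlCache` and `loadProduct` are hypothetical names, not part of any library.

```javascript
// Minimal cache-aside helper with a TTL.
// A Map stands in for Redis or Memcached; the clock is injectable for testing.
function createTtlCache(ttlMs, now = () => Date.now()) {
  const entries = new Map(); // key -> { value, expiresAt }
  return async function cached(key, loader) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value; // fresh entry: skip the query
    const value = await loader();                        // miss or expired: run the query
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Hypothetical slow query, counted so the effect is visible.
let queryCount = 0;
async function loadProduct(id) {
  queryCount += 1;
  return { id, name: 'Example product' };
}

const cached = createTtlCache(30000); // 30-second TTL

async function demo() {
  await cached('product:42', () => loadProduct(42));
  await cached('product:42', () => loadProduct(42)); // served from the cache
  console.log('queries run:', queryCount); // prints "queries run: 1"
}
demo();
```

This sketch does not deduplicate concurrent misses for the same key, so several requests arriving at once can each run the query; a production cache usually needs to handle that case.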
An unindexed query on a 100,000-row table can take 100–500 ms; the same query with the right index runs in < 5 ms. Missing indexes are the most common backend performance bug.
-- Audit slow queries
EXPLAIN ANALYZE SELECT * FROM orders
WHERE user_id = 42 AND status = 'paid';
-- If the plan shows a sequential scan, index the filtered columns
CREATE INDEX idx_orders_user_status
ON orders(user_id, status);
An N+1 query happens when code fetches a list, then fires one extra query per item to load related data. Loading 50 products with one author each takes 51 queries and 100+ ms of accumulated latency. The fix is a single JOIN or batched fetch.
Fix: Use ORM eager loading (includes in Rails, select_related in Django, populate in Mongoose). Audit query logs for repetitive patterns.
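The difference between the two access patterns is easy to see with a query counter. The sketch below uses a hypothetical in-memory `db` object (not a real ORM) with three products instead of 50; the batched version does what eager loading does under the hood, which is to fetch all related rows in one query.

```javascript
// Hypothetical in-memory "database" that counts every query it receives.
const authors = new Map([[1, 'Ada'], [2, 'Linus']]);
const products = [
  { id: 10, authorId: 1 },
  { id: 11, authorId: 2 },
  { id: 12, authorId: 1 },
];
let queries = 0;
const db = {
  listProducts() { queries += 1; return products; },
  authorById(id) { queries += 1; return authors.get(id); },
  authorsByIds(ids) { queries += 1; return new Map(ids.map((id) => [id, authors.get(id)])); },
};

// N+1: one query for the list, then one more per item.
function loadNPlusOne() {
  return db.listProducts().map((p) => ({ ...p, author: db.authorById(p.authorId) }));
}

// Batched: one query for the list, one for all related rows.
function loadBatched() {
  const list = db.listProducts();
  const ids = [...new Set(list.map((p) => p.authorId))];
  const byId = db.authorsByIds(ids);
  return list.map((p) => ({ ...p, author: byId.get(p.authorId) }));
}

queries = 0; loadNPlusOne(); console.log('N+1 queries:', queries);     // 4 (1 list + 3 items)
queries = 0; loadBatched();  console.log('batched queries:', queries); // 2
```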
A request that travels to a single regional origin pays full server latency on every visit. Edge runtimes run the same code at locations near every user, eliminating most of the network portion and usually outperforming a single origin's server latency.
Fix: For content-heavy pages, prefer Static Site Generation. For dynamic pages, move rendering to an edge runtime where possible.
Three sequential 100 ms API calls produce 300 ms of latency. The same three calls in parallel produce 100 ms. Most apps unwittingly serialize calls that have no dependencies on each other.
// Bad — sequential, 300 ms
const user = await fetchUser();
const orders = await fetchOrders();
const recs = await fetchRecommendations();
// Good — parallel, 100 ms
const [user, orders, recs] = await Promise.all([
  fetchUser(),
  fetchOrders(),
  fetchRecommendations(),
]);
Without a timeout, a slow third-party service drags down your entire response. A recommendations API that occasionally takes 5 seconds will produce 5-second TTFBs for those visitors.
// Race the call against a hard timeout
const recs = await Promise.race([
  fetchRecommendations(),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('timeout')), 200)
  ),
]).catch(() => []); // graceful fallback
Serverless cold starts are a hidden source of long-tail latency. A function that runs in 80 ms when warm can take 1,500 ms cold. The first user after each quiet window suffers.
Fix: Use provisioned concurrency or scheduled keep-alive pings to keep functions warm. For latency-critical routes, prefer always-on container workloads or edge runtimes that have near-zero cold-start times.
Brotli typically compresses HTML 15–25% smaller than gzip at a comparable CPU cost when moderate compression levels are used. A smaller payload needs fewer round trips to deliver, so the response arrives sooner on slow connections.
Fix: Enable Brotli at the CDN edge (most do this with one toggle). Confirm with the Content-Encoding: br response header.
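If you want a rough feel for the size difference before enabling it, Node's built-in `zlib` module can compress the same buffer both ways. This is a sketch under assumptions: the HTML sample is made up, and the library defaults are not matched for CPU cost (Node defaults to the maximum Brotli quality), so treat it as a size comparison only.

```javascript
const zlib = require('zlib');

// Hypothetical sample: repetitive HTML, loosely resembling a product listing.
const html = Buffer.from(
  '<ul>' +
    '<li class="product"><a href="/p/1">Example product</a></li>'.repeat(500) +
    '</ul>'
);

const gzipped = zlib.gzipSync(html);          // gzip at the library default level
const brotli = zlib.brotliCompressSync(html); // Brotli at the library default quality

console.log('original bytes:', html.length);
console.log('gzip bytes:    ', gzipped.length);
console.log('brotli bytes:  ', brotli.length);
```

Run it against a saved copy of one of your own pages for numbers that reflect your content.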
Under-provisioned servers spike latency under load (CPU saturation, swap, GC pauses). Over-provisioned servers waste money. Find the right size by load-testing at your real-world peak traffic.
Fix: Monitor CPU, memory, and I/O at p95 during peak. If any are above 80%, scale up. If all are below 30%, scale down. Over-provisioning isn't free, but slow servers cost conversions.
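The rule of thumb above translates directly into a check you could run against your monitoring data. The function name and input shape are assumptions; the 80% and 30% thresholds are the ones stated above.

```javascript
// Inputs are p95 utilization during peak traffic, as percentages from 0 to 100.
function scalingAdvice({ cpu, memory, io }) {
  const utilization = [cpu, memory, io];
  if (utilization.some((u) => u > 80)) return 'scale up';    // any resource saturating
  if (utilization.every((u) => u < 30)) return 'scale down'; // everything mostly idle
  return 'hold';
}

console.log(scalingAdvice({ cpu: 85, memory: 40, io: 20 })); // scale up
console.log(scalingAdvice({ cpu: 20, memory: 25, io: 10 })); // scale down
console.log(scalingAdvice({ cpu: 55, memory: 60, io: 35 })); // hold
```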
The cheapest server-latency regression is the one caught at deploy time. Adding a slow ORM call in a PR is easy; finding it after launch is hard.
Fix: Add a synthetic test to CI that measures server latency on a representative endpoint and fails the build if p95 latency regresses by more than 50 ms. Track p95 in production via APM.
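The comparison step of such a CI check might look like the sketch below. It assumes you already have two arrays of latency samples in milliseconds (a stored baseline and the current build); how you collect them is up to your test harness. The helper names are hypothetical, and the percentile uses the nearest-rank method, which is one of several common definitions.

```javascript
// Nearest-rank percentile: the value at position ceil(p * n / 100) in sorted order.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// True when p95 got worse than the baseline by more than budgetMs.
function p95Regressed(baselineMs, currentMs, budgetMs = 50) {
  return percentile(currentMs, 95) - percentile(baselineMs, 95) > budgetMs;
}

// Made-up samples: 20 requests each, with two slow outliers in the current build.
const baseline = Array(20).fill(100);
const current = [...Array(18).fill(100), 400, 400];

if (p95Regressed(baseline, current)) {
  console.log('p95 regression detected: fail the build');
}
```

In a real pipeline you would exit with a non-zero status instead of logging, so the build actually fails.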
What's happening: A query against a growing table is doing a full scan because the column it filters on isn't indexed. Latency grows linearly with table size.
Fix: Run EXPLAIN ANALYZE on the query and, if the plan shows a sequential scan on the filtered column, add an index for it. For Postgres, pg_stat_statements records per-query timing so you can rank the worst offenders.
What's happening: Server-side code makes a blocking call to a third-party (recommendations, payments, social) to produce the HTML, paying that service's full latency on every render.
Fix: Move the call client-side (load the data after page render), or call it in parallel with a strict timeout and a graceful fallback. Cache responses where the data is shared across users.
What's happening: Your routes are infrequently visited and the function spins down between requests; every "first" visitor pays a 1–2 second penalty.
Fix: Provisioned concurrency, scheduled ping warmup, or move latency-critical routes to always-on workloads. Edge runtimes generally have negligible cold-start cost.
What's happening: A list page loads 50 items, and the ORM fires 51 queries (1 for the list, 50 for related data). Each is fast individually; the total is slow.
Fix: Use eager loading. Inspect query logs in development to spot patterns where N queries fire in a loop.
For dynamic origins, target ≤ 100 ms at p75. Static origins served from cache should be under 50 ms. Above 300 ms, server latency starts dragging TTFB into the "needs improvement" range, which in turn pushes LCP into a failing Core Web Vital, directly affecting Google rankings.
TTFB includes everything from request initiation to first byte received: redirects, DNS, TCP, TLS, request transmission, and server processing. Server latency is just the "server processing" portion — the part you have direct control over. They're related, but reducing TTFB requires fixing both server latency and network round-trips.
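In the browser, the phases listed above are exposed by the Navigation Timing API, so you can approximate the split yourself. The sketch below takes any object with the standard `PerformanceNavigationTiming` fields; the sample numbers are made up. Note that the browser cannot see pure server time: the span from `requestStart` to `responseStart` still includes one network round trip, so treat it as an upper bound on server latency.

```javascript
// Split TTFB into phases using Navigation Timing fields (all values in ms).
function ttfbBreakdown(t) {
  return {
    redirect: t.redirectEnd - t.redirectStart,
    dns: t.domainLookupEnd - t.domainLookupStart,
    connect: t.connectEnd - t.connectStart,               // TCP plus TLS
    requestToFirstByte: t.responseStart - t.requestStart, // one round trip + server processing
    ttfb: t.responseStart - t.startTime,
  };
}

// Hypothetical timings. In a browser you would pass
// performance.getEntriesByType('navigation')[0] instead.
const sample = {
  startTime: 0,
  redirectStart: 0, redirectEnd: 0,
  domainLookupStart: 5, domainLookupEnd: 25,
  connectStart: 25, connectEnd: 85,
  requestStart: 86, responseStart: 396,
};

console.log(ttfbBreakdown(sample));
```

The phases do not always add up exactly to the total, because small gaps between them (queueing, for example) are not attributed to any phase.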
Use database-level instrumentation: pg_stat_statements for Postgres, the slow query log for MySQL, or your APM's database panel. Sort by total time (not average), since "medium-slow but very frequent" queries usually consume more time than "rarely-run but slow."
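A quick worked example shows why total time is the better sort key. The rows below are made-up statistics in the general shape such tools report (call count and mean time per call); the labels and the `rankByTotalTime` helper are hypothetical.

```javascript
// Made-up query statistics: how often each query ran and its mean duration.
const stats = [
  { label: 'monthly report', calls: 40, meanMs: 900 },     // rarely run but slow
  { label: 'session lookup', calls: 120000, meanMs: 12 },  // medium-fast but very frequent
  { label: 'product by id', calls: 30000, meanMs: 3 },
];

// Total time spent = calls x mean time per call; sort the largest first.
function rankByTotalTime(rows) {
  return rows
    .map((r) => ({ ...r, totalMs: r.calls * r.meanMs }))
    .sort((a, b) => b.totalMs - a.totalMs);
}

for (const r of rankByTotalTime(stats)) {
  console.log(`${r.label}: ${r.totalMs} ms total`);
}
```

Sorted by average, the monthly report looks like the worst query; sorted by total, the session lookup consumes 40 times more database time (1,440,000 ms against 36,000 ms).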
Only if the CDN can serve the response from cache. A CDN that caches HTML at the edge can drop server latency to near zero. A CDN that only caches static assets won't help — your dynamic HTML still goes back to the origin.
Because they multiply server latency at the worst possible time — for the first user after a quiet period, who has no warning. A 50 ms warm response becomes 1,500 ms cold. Mitigation: provisioned concurrency for critical paths, edge runtimes for everything else.
Indirectly. Slow servers reduce crawl frequency for both Googlebot and AI crawlers — stale content gets surfaced in generative results. Slow server latency also drags TTFB and LCP into failing territory, which lowers traditional rankings, which lowers AI citation odds (since AI systems preferentially cite well-ranked pages).
Yes — server latency gates everything else. No amount of frontend tuning can rescue a page that takes 1,000 ms to deliver its first byte. Always start with TTFB and server latency, then move to LCP, CLS, and the rest.
Network Server Latency is the controllable half of TTFB — the part where your code, database, and dependencies decide how long the user waits. For most slow sites, 80% of server latency comes from a handful of unindexed queries, N+1 patterns, or synchronous third-party calls. Fix those three categories and origin response usually drops by 60–80%.
Run a Greadme deep scan to see your TTFB broken down into network and server components, identify the slowest origin routes, and get a prioritized fix list.