BFS vs DFS in Web Crawlers: Traversal Strategy Compared (2026)

Saar Twito9 min read
Saar Twito
Saar TwitoFounder & SEO Engineer

Hi, I'm Saar - a software engineer, SEO specialist, and lecturer who loves building tools and teaching tech.

View author profile →

What Is BFS vs DFS in a Web Crawler?

BFS (breadth-first search) and DFS (depth-first search) are the two fundamental graph-traversal strategies a web crawler can use to walk through a website. BFS uses a queue (FIFO) and visits every page at the current depth before moving deeper. DFS uses a stack(LIFO) and dives down one branch as far as it can before backtracking. Both have time complexity O(V + E), but they produce very different coverage order and memory profiles — which is why production SEO crawlers (including Googlebot and Greadme's Crawl Scan) use BFS with priority weighting rather than pure DFS.

Key Facts (TL;DR)

  • BFS uses a queue (FIFO). URLs discovered first are processed first, so shallower pages always finish before deeper pages start.
  • DFS uses a stack (LIFO) or recursion. The crawler dives all the way down one branch before touching siblings.
  • Same time complexity, different memory. BFS uses O(W) memory (graph width); DFS uses O(D) memory (graph depth). Most websites are wider than they are deep.
  • Real-world SEO crawlers use BFS with priority. Googlebot's exact strategy is not publicly documented, but published research and Google patents (e.g., US 6,285,999) describe priority-weighted breadth-first crawling using PageRank, sitemap signals, and last-modified data.
  • The 3-click rule maps to BFS. Keeping important pages within three clicks of the home page means a BFS crawler reaches them in its first two layers.
  • Pure DFS can get stuck. Without depth caps, infinite calendar pages or faceted-navigation parameter traps will pull a DFS crawler indefinitely deeper.

A Website Is a Directed Graph

Before comparing strategies, it helps to be precise about what a crawler walks through. A website is a directed graph: every page is a node and every internal link is a directed edge. The home page links to category pages, which link to product pages, which link back to other articles. There are loops, shortcuts, and dead ends. The crawler starts at one node — usually the home page — and the question is the order in which it visits the rest.

The traversal strategy is that ordering. It determines what the crawler sees first, how much memory it needs, whether it can fetch in parallel, and what a partial result looks like if the crawl is stopped early.

BFS vs DFS Compared Across Six Dimensions

DimensionBFS (queue, FIFO)DFS (stack, LIFO)
Order of discoveryLevel by level — depth 0, then 1, then 2One branch fully, then backtrack
Memory profileO(W) — width of the widest levelO(D) — maximum depth reached
Depth cap enforcementTrivial — refuse to enqueue past limitAwkward — branches race ahead unevenly
Parallel fetchingNative — pull N from queue, fetch concurrentlyFights the algorithm — defeats "depth-first"
Partial-result qualityImportant pages first (close to home)Lopsided — one deep branch, others untouched
Real-world SEO fitStandard for production crawlersRare; used for path-finding and tree walks

How BFS Works

BFS moves outward in layers. Starting at the home page (depth 0), the crawler visits every page directly linked from it (depth 1) before going any deeper. Only after every depth-1 URL has been processed does the crawler start on depth 2, and so on. The data structure that makes this work is a queue: URLs are added at the back and pulled from the front, so older (shallower) discoveries are always served first.

// Minimal BFS crawler in pseudocode
const queue = [{ url: startUrl, depth: 0 }];
const seen = new Set([startUrl]);

while (queue.length > 0) {
  const { url, depth } = queue.shift();      // FIFO — shallowest first
  const html = await fetch(url);
  const links = extractLinks(html);

  if (depth >= MAX_DEPTH) continue;
  for (const link of links) {
    if (seen.has(link)) continue;
    seen.add(link);
    queue.push({ url: link, depth: depth + 1 });
  }
}

How DFS Works

DFS takes the opposite approach: pick one outbound link and follow it immediately, then pick a link from that page and follow it, and keep diving until you hit a dead end. Only then does the crawler backtrack and try a different branch. The data structure is a stack — newly found URLs are pushed on top and the most recent one is always processed next, which naturally keeps descending.

// Minimal DFS crawler in pseudocode
const stack = [{ url: startUrl, depth: 0 }];
const seen = new Set([startUrl]);

while (stack.length > 0) {
  const { url, depth } = stack.pop();        // LIFO — deepest first
  const html = await fetch(url);
  const links = extractLinks(html);

  if (depth >= MAX_DEPTH) continue;
  for (const link of links) {
    if (seen.has(link)) continue;
    seen.add(link);
    stack.push({ url: link, depth: depth + 1 });
  }
}

DFS is often implemented with recursion — the language's call stack acts as the explicit stack — but the behavior is identical.

Why Production SEO Crawlers Use BFS

On real websites, proximity to the home page is a strong proxy for importance. The home page links to main categories; main categories link to top products and articles. A BFS crawler reaches all of those in its first one or two layers. That gives four practical advantages every site auditor cares about:

  1. Important pages are processed first. If a 500-page crawl hits a quota or timeout, the home page, top categories, and primary service pages are already covered. Under DFS, you might have a complete map of the 2018 blog archive and nothing else.
  2. Depth limits are clean.A BFS crawler enforces "don't enqueue URLs past depth N" and the entire site up to depth N is fully covered. Under DFS, one branch can race to the depth limit before sibling branches are even discovered.
  3. Parallel fetching is native. Pulling 10 URLs off the queue and fetching them concurrently cuts crawl time dramatically. DFS sequencing fights this; you can parallelize it but you give up the depth-first property.
  4. Progress reporting is meaningful."82% of depth-1 pages complete" is a real, intuitive statement under BFS. Under DFS, progress is dominated by whichever branch happens to be active.

What Real Crawlers Actually Do: BFS With Priority

Pure BFS is rarely used in production. Googlebot, Screaming Frog, Sitebulb, and Greadme's Crawl Scan all use a priority queue instead of a plain FIFO queue. The crawler still works breadth-first, but URLs are weighted by signals such as:

  • Inclusion in the XML sitemap
  • Number of internal links pointing to the URL
  • Last-Modified header or sitemap lastmod value
  • Estimated PageRank or domain-internal authority
  • Historical change frequency
  • HTTP response time on previous fetches

Google's exact algorithm is not publicly documented, but the company's patents (notably US 6,285,999, "Method for node ranking in a linked database") and academic publications from Google researchers describe priority-weighted breadth-first crawling. The intent is the same: cover the most important pages first, while still expanding outward layer by layer rather than getting lost in one branch.

What This Means for How You Structure Your Site

Because real crawlers are BFS-shaped, the depth of a page from the home page is a meaningful SEO signal. The widely cited 3-click rule— every important page should be reachable within three clicks of the home page — is essentially a heuristic for "stay inside the first two BFS layers". Concretely:

  • Flatten deep navigation. Five-level category trees push important pages into BFS layers the crawler may never reach within a single crawl budget.
  • Link key pages from the home page or main hubs. One internal link from a layer-1 page promotes the target into layer 2 — usually within crawl reach.
  • Cap parameter URLs. Faceted navigation (color × size × brand × sort) creates exponential URL growth at deep levels — exactly where a BFS crawler with a depth cap will give up.
  • Use sitemaps for orphan content. Sitemap inclusion is a priority signal; pages not reachable through links can still be crawled.

For more on making sure your important pages are actually reachable, see the guide on how to make your site crawlable.

When DFS Is Actually the Right Choice

DFS is not obsolete — it is just rarely the right fit for site auditing. It is the right tool when:

  • You need to find one specific node deep in a graph and you have a directional hint (e.g., maze solvers, dependency resolution).
  • The graph is extremely wide but shallow. A BFS queue on a 100,000-link homepage holds 100,000 URLs at once; DFS uses constant width.
  • You are walking a real tree, not a graph. Filesystem traversal or DOM walking, where the structure is genuinely hierarchical.
  • You are reconstructing a path between two known nodes rather than mapping coverage.

Common Mistakes When Building or Reasoning About Crawlers

Confusing crawl depth with click depth

Crawl depth (the BFS layer the URL was discovered in) and click depth (the shortest path from home in the navigation) usually match — but not always. Pages reachable only via in-content links may live deeper than expected.

Running DFS without a depth cap

On any site with paginated archives, calendars, or faceted navigation, an uncapped DFS will follow ?page=2 → ?page=3 → ?page=4... indefinitely before touching another section.

Treating BFS as "fair"

Plain FIFO BFS is fair to URLs in discovery order — but in production, you want priority-weighted BFS so that high-value URLs are not stuck behind 50,000 low-value parameter URLs that happen to have been discovered first.

Ignoring the deduplication set

Both BFS and DFS need a seen set to avoid revisiting URLs. On large sites this set itself dominates memory — far more than the queue or stack.

How Greadme's Crawl Scan Uses BFS

Greadme's Crawl Scan implements BFS with depth tracking and parallel batch fetching. The crawler keeps a queue of { url, depth } entries, processes them in FIFO order with priority adjustments, fetches batches concurrently, and enforces a per-scan depth cap. The result: the home page and primary navigation are audited within seconds, depth limits are honored cleanly, and progress is reported per BFS layer. To see what Crawl Scan detects across an entire site, read about Crawl Scan in Greadme.

FAQ

Does Googlebot use BFS or DFS?

Google has not published the full algorithm, but its patents and researcher publications describe priority-weighted breadth-first crawling. PageRank, sitemap signals, and freshness data all influence the priority queue. It is not pure BFS, and it is not DFS.

What is the time complexity of BFS and DFS?

Both are O(V + E), where V is the number of pages and E is the number of links. They visit every node once and examine every edge once. The difference is in memory, not time.

Why does memory differ between BFS and DFS?

BFS holds every URL at the current frontier in the queue, so it scales with the width of the widest level — O(W). DFS only holds the current path from the root to the deepest node, so it scales with depth — O(D). On most websites W is much larger than D.

Can BFS run out of memory on a real website?

Yes — pathologically wide pages (a homepage with tens of thousands of outbound links, or a sitemap index with millions of URLs) can blow up a naive BFS queue. Production crawlers handle this with disk-backed queues and per-host fetch limits.

How does the 3-click rule relate to BFS?

A page that is three clicks from the home page is at BFS depth 3. Most BFS-based SEO crawlers use depth caps in the 3–5 range during audits, so "within three clicks" is essentially "within the auditor's typical crawl reach".

Is iterative deepening DFS a useful crawler strategy?

Iterative deepening (running DFS with depth limit 1, then 2, then 3) gives BFS-like coverage order with DFS-like memory. It is theoretically interesting but rarely used in production crawlers because re-fetching pages is expensive on the open web.

Does the choice of BFS or DFS affect what gets indexed?

For a complete crawl, no — both eventually visit every reachable page. For a partial or budgeted crawl (which is the realistic case for both Googlebot and audit tools), BFS produces a more useful subset because it prioritizes pages near the home page, which tend to be more important.

Can a crawler combine BFS and DFS?

Yes — most production crawlers do. The outer strategy is BFS-with-priority for breadth coverage, but inside a single host the crawler may schedule URLs more like DFS to be polite to the host (one connection at a time, batched over short windows).

Conclusion

BFS and DFS are not competing best-of-breed algorithms — they are different tools for different jobs. For web crawling specifically, BFS with priority weighting wins because proximity to the home page is a useful proxy for importance, depth caps are clean to enforce, and parallel fetching is native. That is why every production SEO crawler — Googlebot, Screaming Frog, Sitebulb, and Greadme's Crawl Scan — is BFS-shaped under the hood.