BFS (breadth-first search) and DFS (depth-first search) are the two fundamental graph-traversal strategies a web crawler can use to walk through a website. BFS uses a queue (FIFO) and visits every page at the current depth before moving deeper. DFS uses a stack(LIFO) and dives down one branch as far as it can before backtracking. Both have time complexity O(V + E), but they produce very different coverage order and memory profiles — which is why production SEO crawlers (including Googlebot and Greadme's Crawl Scan) use BFS with priority weighting rather than pure DFS.
Before comparing strategies, it helps to be precise about what a crawler walks through. A website is a directed graph: every page is a node and every internal link is a directed edge. The home page links to category pages, which link to product pages, which link back to other articles. There are loops, shortcuts, and dead ends. The crawler starts at one node — usually the home page — and the question is the order in which it visits the rest.
The traversal strategy is that ordering. It determines what the crawler sees first, how much memory it needs, whether it can fetch in parallel, and what a partial result looks like if the crawl is stopped early.
| Dimension | BFS (queue, FIFO) | DFS (stack, LIFO) |
|---|---|---|
| Order of discovery | Level by level — depth 0, then 1, then 2 | One branch fully, then backtrack |
| Memory profile | O(W) — width of the widest level | O(D) — maximum depth reached |
| Depth cap enforcement | Trivial — refuse to enqueue past limit | Awkward — branches race ahead unevenly |
| Parallel fetching | Native — pull N from queue, fetch concurrently | Fights the algorithm — defeats "depth-first" |
| Partial-result quality | Important pages first (close to home) | Lopsided — one deep branch, others untouched |
| Real-world SEO fit | Standard for production crawlers | Rare; used for path-finding and tree walks |
BFS moves outward in layers. Starting at the home page (depth 0), the crawler visits every page directly linked from it (depth 1) before going any deeper. Only after every depth-1 URL has been processed does the crawler start on depth 2, and so on. The data structure that makes this work is a queue: URLs are added at the back and pulled from the front, so older (shallower) discoveries are always served first.
// Minimal BFS crawler in pseudocode
const queue = [{ url: startUrl, depth: 0 }];
const seen = new Set([startUrl]);
while (queue.length > 0) {
const { url, depth } = queue.shift(); // FIFO — shallowest first
const html = await fetch(url);
const links = extractLinks(html);
if (depth >= MAX_DEPTH) continue;
for (const link of links) {
if (seen.has(link)) continue;
seen.add(link);
queue.push({ url: link, depth: depth + 1 });
}
}DFS takes the opposite approach: pick one outbound link and follow it immediately, then pick a link from that page and follow it, and keep diving until you hit a dead end. Only then does the crawler backtrack and try a different branch. The data structure is a stack — newly found URLs are pushed on top and the most recent one is always processed next, which naturally keeps descending.
// Minimal DFS crawler in pseudocode
const stack = [{ url: startUrl, depth: 0 }];
const seen = new Set([startUrl]);
while (stack.length > 0) {
const { url, depth } = stack.pop(); // LIFO — deepest first
const html = await fetch(url);
const links = extractLinks(html);
if (depth >= MAX_DEPTH) continue;
for (const link of links) {
if (seen.has(link)) continue;
seen.add(link);
stack.push({ url: link, depth: depth + 1 });
}
}DFS is often implemented with recursion — the language's call stack acts as the explicit stack — but the behavior is identical.
On real websites, proximity to the home page is a strong proxy for importance. The home page links to main categories; main categories link to top products and articles. A BFS crawler reaches all of those in its first one or two layers. That gives four practical advantages every site auditor cares about:
Pure BFS is rarely used in production. Googlebot, Screaming Frog, Sitebulb, and Greadme's Crawl Scan all use a priority queue instead of a plain FIFO queue. The crawler still works breadth-first, but URLs are weighted by signals such as:
Last-Modified header or sitemap lastmod valueGoogle's exact algorithm is not publicly documented, but the company's patents (notably US 6,285,999, "Method for node ranking in a linked database") and academic publications from Google researchers describe priority-weighted breadth-first crawling. The intent is the same: cover the most important pages first, while still expanding outward layer by layer rather than getting lost in one branch.
Because real crawlers are BFS-shaped, the depth of a page from the home page is a meaningful SEO signal. The widely cited 3-click rule— every important page should be reachable within three clicks of the home page — is essentially a heuristic for "stay inside the first two BFS layers". Concretely:
For more on making sure your important pages are actually reachable, see the guide on how to make your site crawlable.
DFS is not obsolete — it is just rarely the right fit for site auditing. It is the right tool when:
Crawl depth (the BFS layer the URL was discovered in) and click depth (the shortest path from home in the navigation) usually match — but not always. Pages reachable only via in-content links may live deeper than expected.
On any site with paginated archives, calendars, or faceted navigation, an uncapped DFS will follow ?page=2 → ?page=3 → ?page=4... indefinitely before touching another section.
Plain FIFO BFS is fair to URLs in discovery order — but in production, you want priority-weighted BFS so that high-value URLs are not stuck behind 50,000 low-value parameter URLs that happen to have been discovered first.
Both BFS and DFS need a seen set to avoid revisiting URLs. On large sites this set itself dominates memory — far more than the queue or stack.
Greadme's Crawl Scan implements BFS with depth tracking and parallel batch fetching. The crawler keeps a queue of { url, depth } entries, processes them in FIFO order with priority adjustments, fetches batches concurrently, and enforces a per-scan depth cap. The result: the home page and primary navigation are audited within seconds, depth limits are honored cleanly, and progress is reported per BFS layer. To see what Crawl Scan detects across an entire site, read about Crawl Scan in Greadme.
Google has not published the full algorithm, but its patents and researcher publications describe priority-weighted breadth-first crawling. PageRank, sitemap signals, and freshness data all influence the priority queue. It is not pure BFS, and it is not DFS.
Both are O(V + E), where V is the number of pages and E is the number of links. They visit every node once and examine every edge once. The difference is in memory, not time.
BFS holds every URL at the current frontier in the queue, so it scales with the width of the widest level — O(W). DFS only holds the current path from the root to the deepest node, so it scales with depth — O(D). On most websites W is much larger than D.
Yes — pathologically wide pages (a homepage with tens of thousands of outbound links, or a sitemap index with millions of URLs) can blow up a naive BFS queue. Production crawlers handle this with disk-backed queues and per-host fetch limits.
A page that is three clicks from the home page is at BFS depth 3. Most BFS-based SEO crawlers use depth caps in the 3–5 range during audits, so "within three clicks" is essentially "within the auditor's typical crawl reach".
Iterative deepening (running DFS with depth limit 1, then 2, then 3) gives BFS-like coverage order with DFS-like memory. It is theoretically interesting but rarely used in production crawlers because re-fetching pages is expensive on the open web.
For a complete crawl, no — both eventually visit every reachable page. For a partial or budgeted crawl (which is the realistic case for both Googlebot and audit tools), BFS produces a more useful subset because it prioritizes pages near the home page, which tend to be more important.
Yes — most production crawlers do. The outer strategy is BFS-with-priority for breadth coverage, but inside a single host the crawler may schedule URLs more like DFS to be polite to the host (one connection at a time, batched over short windows).
BFS and DFS are not competing best-of-breed algorithms — they are different tools for different jobs. For web crawling specifically, BFS with priority weighting wins because proximity to the home page is a useful proxy for importance, depth caps are clean to enforce, and parallel fetching is native. That is why every production SEO crawler — Googlebot, Screaming Frog, Sitebulb, and Greadme's Crawl Scan — is BFS-shaped under the hood.