Imagine you just arrived in an unfamiliar city and you want to see as much of it as possible in a limited amount of time. You have two choices. You can stand at the main square, visit every street that branches off from it, and only then move outward to the next ring of streets. Or you can pick one street, follow it all the way to the end, turn down a side alley, follow that to its end, and keep going deeper until you hit a dead end — only then backtracking to try something else.
Both strategies will eventually cover the whole city, but the experience is completely different. In the first approach you quickly build a wide overview. In the second you get intimate with one neighborhood but might not even know other neighborhoods exist for hours. This is exactly the choice every web crawler faces when it starts walking through your site — and it is the difference between BFS and DFS.
Before we compare strategies, it helps to see what a crawler is actually walking through. A website is a graph: every page is a node, and every internal link is an edge that connects two nodes. The home page connects to the about page, which connects to the team page, and a blog post deep in the archive might link right back to the home page. Loops, shortcuts, dead ends — all of these exist on real sites.
The crawler's job is to start at one node (usually the home page) and discover as many reachable nodes as possible. The question is simply: in what order should it visit them? The answer to that question is the traversal strategy, and it shapes almost everything else about how the crawler behaves.
BFS moves outward in layers. The crawler starts at the home page (depth 0). It then visits every page directly linked from the home page (depth 1) before going any deeper. Only after all depth-1 pages are processed does it start on depth 2. Think of it like ripples expanding from a stone dropped in water — the wave covers every point at a given distance before reaching the next distance.
The data structure that makes this work is the queue, which follows the FIFO principle — First In, First Out. The crawler adds newly discovered URLs to the back of the queue and always pulls the next URL to process from the front. Because URLs discovered earlier (at shallower depths) were added to the queue earlier, they always get processed before deeper URLs.
Visit every sibling before any child — use a queue (FIFO) and process URLs in the order they were discovered.
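The queue-driven loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production crawler: `get_links` stands in for a real HTTP fetch plus link extraction, and the site graph uses hypothetical URLs.

```python
from collections import deque

def bfs_crawl(start, get_links):
    """Breadth-first traversal. get_links(url) returns the URLs linked from a page."""
    visited = {start}            # never enqueue the same URL twice
    queue = deque([start])       # FIFO: append on the right, pop from the left
    order = []
    while queue:
        url = queue.popleft()    # oldest discovery first, so shallowest first
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order

# Toy site graph standing in for real fetched pages (hypothetical URLs).
site = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1"],
    "/team": [],
    "/blog/post-1": ["/"],       # a loop back to the home page
}
print(bfs_crawl("/", site.__getitem__))
# → ['/', '/about', '/blog', '/team', '/blog/post-1']
```

Notice that both depth-1 pages (`/about`, `/blog`) are finished before either depth-2 page is touched, and the loop from `/blog/post-1` back to `/` is harmlessly ignored by the `visited` set.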
DFS takes the opposite approach. Instead of covering a level completely, the crawler picks one link from the current page and immediately follows it. Then it picks a link from that page and follows it. The crawler keeps diving deeper until it hits a dead end or runs out of unvisited links, and only then does it backtrack to try a different branch.
The data structure here is the stack, which follows LIFO — Last In, First Out. Newly discovered URLs are pushed onto the stack, and the crawler always pops the most recently added URL to process next. Because "most recent" usually means "deepest in the current branch," the crawler naturally keeps descending.
In practice, DFS is often implemented with recursion, which uses the programming language's own call stack under the hood. But the effect is identical — one branch gets fully explored before another begins.
Dive all the way down one branch before touching the next — use a stack (LIFO) or recursion.
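The same toy graph shows how swapping the queue for a stack changes the visit order. Again a minimal sketch with hypothetical URLs; `get_links` stands in for a real fetch.

```python
def dfs_crawl(start, get_links):
    """Depth-first traversal with an explicit stack (LIFO)."""
    visited = set()
    stack = [start]
    order = []
    while stack:
        url = stack.pop()        # most recently discovered, i.e. deepest branch
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        # Push in reverse so the page's first link is explored first.
        for link in reversed(get_links(url)):
            if link not in visited:
                stack.append(link)
    return order

site = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1"],
    "/team": [],
    "/blog/post-1": [],
}
print(dfs_crawl("/", site.__getitem__))
# → ['/', '/about', '/team', '/blog', '/blog/post-1']
```

Compare the orders: DFS drills through `/about` down to `/team` before it even looks at `/blog`, while BFS finished every depth-1 page first.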
In theory, both algorithms eventually visit every reachable page and have the same time complexity. In practice, the difference shows up the moment you add real-world constraints — limited time, limited memory, depth caps, or parallel processing. Here is where the two strategies genuinely diverge:
With BFS: within moments, the crawler already has a broad picture of the site — the home page and every page directly linked from it are done. If you stop the crawl early, you still have meaningful coverage of the most important pages.
With DFS: the crawler might spend a long time inside one section (a deep blog archive, for example) before it even touches other top-level sections. If you stop early, your picture of the site is lopsided.
With BFS: enforcing a max depth is trivial — simply refuse to enqueue URLs whose depth would exceed the limit. The crawler will fully cover every level up to that depth.
With DFS: a depth limit is possible but feels less natural. The crawler can race to the depth limit in one branch before it has even discovered the other top-level branches.
With BFS: memory scales with the width of the graph. A home page linking to 10,000 product URLs means all 10,000 URLs sit in the queue at once.
With DFS: memory scales with the depth. That is usually smaller for websites, but deep recursive calls can still blow the call stack on pathological link chains.
With BFS: shallow pages get processed first. On most sites, those are the important ones — category hubs, main service pages, the latest blog posts. If the crawl is interrupted, you still captured what matters most.
With DFS: the crawler might get lost in a deep branch of older archive content before touching today's homepage features.
With BFS: the queue lends itself beautifully to parallel work. Pull ten URLs off the queue at once and fetch them concurrently. That single property can cut crawl time for a large site dramatically.
With DFS: the whole point is to follow one branch before the next, which fights against parallel execution. You can make it concurrent, but you lose the "focused depth" property that makes DFS attractive in the first place.
All of the above might make BFS sound universally better, but DFS absolutely has its place. It is the right call when memory is tight and the graph is very wide (DFS only holds one branch at a time), when the goal is to exhaust a single section completely before looking anywhere else, or when a simple recursive implementation is all the job needs.
Greadme's Crawl Scan uses BFS with depth tracking — a structure that matches what site auditing actually needs. The crawler maintains a queue of pending URLs, each tagged with its depth from the start page. URLs are processed in FIFO order, which means shallower pages always complete before deeper ones begin.
This shapes the user experience in several concrete ways:
When you run a Crawl Scan on a 500-page site, you expect your home page, main categories, and top service pages to be checked immediately — not left for last. BFS delivers this naturally. Even if the scan hits a limit before completing, the most important parts of the site are already covered.
Every crawl has a depth cap (for example, three levels from the start URL). With BFS, enforcing this is just "don't enqueue URLs past the limit." There is no risk of recursion accidentally going deeper, and no lopsided coverage where one branch races ahead of all others.
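The "don't enqueue past the limit" rule translates directly into code. Here is a sketch of the idea under the same toy-graph assumptions as before (the depth values and URLs are illustrative, not Greadme's actual implementation): each queued URL carries its depth, and a link is only enqueued if its depth would stay within the cap.

```python
from collections import deque

def bfs_crawl_capped(start, get_links, max_depth):
    """BFS where every queued URL is tagged with its depth from the start page."""
    visited = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append((url, depth))
        for link in get_links(url):
            # The depth cap in one line: never enqueue past the limit.
            if link not in visited and depth + 1 <= max_depth:
                visited.add(link)
                queue.append((link, depth + 1))
    return order

site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/item-1"],
    "/blog": [],
    "/products/item-1": ["/products/item-1/reviews"],  # depth 3: beyond the cap
    "/products/item-1/reviews": [],
}
print(bfs_crawl_capped("/", site.__getitem__, max_depth=2))
# → [('/', 0), ('/products', 1), ('/blog', 1), ('/products/item-1', 2)]
```

The depth-3 page is simply never added to the queue, and every level up to the cap is covered completely before the crawl ends.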
Instead of fetching one page at a time, the crawler pulls batches of URLs from the queue and fetches them in parallel. This dramatically reduces scan time on large sites and is a property that essentially only BFS supports cleanly.
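Batch fetching fits naturally on top of the queue. The sketch below uses a thread pool to fetch a batch concurrently; it is an illustration of the pattern, not Greadme's code, and `fetch_links` is a stand-in for a real HTTP fetch.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def bfs_crawl_batched(start, fetch_links, batch_size=10):
    """BFS that pulls a batch of URLs off the queue and fetches them in parallel."""
    visited = {start}
    queue = deque([start])
    order = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        while queue:
            batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
            order.extend(batch)
            # pool.map fetches the whole batch concurrently but yields
            # results in batch order, so FIFO processing is preserved.
            for links in pool.map(fetch_links, batch):
                for link in links:
                    if link not in visited:
                        visited.add(link)
                        queue.append(link)
    return order

site = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1"],
    "/team": [],
    "/blog/post-1": [],
}
print(bfs_crawl_batched("/", site.__getitem__))
# → ['/', '/about', '/blog', '/team', '/blog/post-1']
```

Because batches are drawn from the front of the queue, the result order is the same as sequential BFS; only the wall-clock time changes when each fetch is a real network request.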
Because BFS covers the site in layers, progress updates map onto something users can actually reason about — "80% of depth-1 pages done" is a real statement. Progress under DFS is much harder to express usefully because one deep branch can distort the whole picture.
If you want to see Crawl Scan in action and read about what it detects across an entire site, check out our guide on what Crawl Scan is in Greadme.
From a pure algorithmic standpoint, BFS and DFS are identical on time: both visit every node once and examine every edge once, giving a time complexity of O(V + E), where V is the number of pages and E is the number of links.
The difference is in memory. BFS uses O(W) memory, where W is the maximum width of the graph — the largest number of nodes that exist at any single depth. DFS uses O(D) memory, where D is the maximum depth. In plain terms: wide, flat sites favor DFS for memory, while deep, narrow sites favor BFS. For the vast majority of real websites the difference is minor, and the practical advantages of BFS (priority, depth control, parallelism) dominate the decision.
BFS and DFS are not just academic algorithms — they determine what a crawler sees first, how much memory it uses, whether you can run it in parallel, and what a partial result looks like when the crawl is interrupted. The simplest way to remember the difference: BFS uses a queue (FIFO) and sweeps level by level; DFS uses a stack (LIFO) and dives deep before backtracking.
For site auditing, BFS is almost always the right default. It prioritizes pages by their importance on the site (proximity to the home page is a remarkably good proxy for "matters more"), it plays well with depth limits, and it enables the parallelism that makes crawls finish in minutes instead of hours. That is why Greadme's Crawl Scan is built on BFS — it is the strategy that matches what users actually expect from a site audit.
Understanding the algorithm underneath your tools — whether you are using them or building them — is often the difference between "it works" and "it works well." Next time you watch a crawler march through your site, you will know exactly why it is walking the way it walks.
Greadme's Crawl Scan uses BFS with depth tracking and parallel batch fetching to audit your entire site efficiently — prioritizing the pages that matter most and surfacing issues across every page it finds.
Run a Crawl Scan on Your Site