Imagine you're a health inspector visiting a restaurant. You wouldn't just check the front counter and declare the place safe — you'd inspect the kitchen, the storage room, the refrigerators, the bathrooms, and every corner where problems might hide. A single clean surface tells you nothing about the overall condition.
Websites work the same way. Your homepage might be perfectly optimized, but what about the 200 other pages on your site? The blog post from 2021 with broken images? The product page with no meta description? The old landing page that returns a 404? Problems spread across a website are invisible when you only look at one page at a time.
Greadme's Crawl Scan solves this by automatically discovering and analyzing every page on your website. It follows internal links just like a search engine bot would, systematically checking each page for SEO issues, missing meta tags, accessibility problems, broken links, and more. You enter one URL and get a comprehensive health report for your entire site.
Understanding the crawling process helps you interpret your results and configure your scans for maximum value.
You provide a starting URL — typically your homepage — and configure the crawl settings. The crawler begins by fetching that page and extracting every internal link it finds.
The crawler follows each discovered internal link, visiting new pages and finding even more links. This recursive process continues until it reaches the configured page limit or has visited every discoverable page on your site. The crawler works with or without a sitemap — it systematically follows every internal link to find all content.
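The discover-and-follow loop described above is a classic breadth-first crawl. Here is a minimal sketch of that idea (not Greadme's actual implementation) using only Python's standard library; the `fetch` function is injected so the example stays network-free:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of internal links.

    `fetch(url)` returns the page HTML, or None on error.
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Follow only same-site links; drop URL fragments.
            if urlparse(absolute).netloc == site:
                queue.append(absolute.split("#")[0])
    return visited
```

The `max_pages` parameter mirrors the page-limit setting described later: the loop simply stops once the limit is reached, even if unvisited links remain in the queue.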
As the crawler visits each page, it performs a series of checks on the page's content, meta tags, headings, images, and links. Every issue found is recorded with its severity, location, and type.
Once the crawl is complete, all findings are aggregated into a comprehensive report showing site-wide patterns, issue counts, and individual page details.
Before crawling, the scanner reads your site's robots.txt file and respects its directives. If certain paths are disallowed, the crawler will skip them — just like Googlebot would. The results include a full robots.txt analysis showing what restrictions are in place and which paths are blocked.
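Python's standard library ships a robots.txt parser, which illustrates the kind of check a crawler runs before fetching each URL (the rules below are a made-up example):

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Check each discovered URL before fetching it:
assert rp.can_fetch("GreadmeBot", "https://example.com/admin/users") is False
assert rp.can_fetch("GreadmeBot", "https://example.com/blog/post-1") is True
```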
Each page the crawler visits is evaluated against a comprehensive set of SEO and content quality checks. Here's what it looks for:
Images without alt text are inaccessible to screen readers and invisible to search engines. The crawler identifies every image missing alternative text across your entire site.
Alt text like "image1.jpg" or "photo" provides no value. The crawler detects alt text that is too generic to be useful for accessibility or SEO purposes.
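A "generic alt text" check like this can be approximated with a few heuristics; the word list and thresholds below are illustrative, not Greadme's exact rules:

```python
import re

GENERIC_ALT = {"image", "photo", "picture", "img", "graphic", "icon"}

def alt_text_issue(alt):
    """Return a problem label for an alt attribute, or None if it looks fine."""
    if alt is None:
        return "missing"
    text = alt.strip().lower()
    if not text:
        return "empty"
    # File names like "image1.jpg" carry no meaning for screen readers.
    if re.fullmatch(r"[\w-]+\.(jpe?g|png|gif|webp|svg)", text):
        return "filename"
    if text in GENERIC_ALT or len(text) < 4:
        return "generic"
    return None
```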
The crawler identifies images served in older formats like PNG or JPEG that could be converted to modern formats like WebP for significant file size savings.
The crawler flags pages without title tags, as well as titles that are too long or too short. The title tag is the single most important on-page SEO element — it appears in search results, browser tabs, and social shares.
Pages without meta descriptions lose control over their search result snippet. The crawler flags every page where the description is missing, too short, or too long.
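Both checks reduce to simple length rules. The thresholds below (30–60 characters for titles, 70–160 for descriptions) are common rules of thumb, not Greadme's exact limits:

```python
def check_title(title, min_len=30, max_len=60):
    """Flag a <title> tag; thresholds are illustrative."""
    if not title or not title.strip():
        return "missing"
    n = len(title.strip())
    if n < min_len:
        return "too short"
    if n > max_len:
        return "too long"
    return None

def check_description(desc, min_len=70, max_len=160):
    """Flag a meta description using the same pattern."""
    if not desc or not desc.strip():
        return "missing"
    n = len(desc.strip())
    if n < min_len:
        return "too short"
    if n > max_len:
        return "too long"
    return None
```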
Without canonical tags, search engines may index duplicate versions of your pages, diluting your SEO authority across multiple URLs.
Pages shared on Facebook, LinkedIn, or other platforms without OG tags display generic, unappealing previews. The crawler identifies pages missing og:title, og:description, og:image, and other essential OG tags.
Similar to OG tags, Twitter Card tags control how your content appears when shared on X/Twitter. Missing tags mean missed opportunities for engaging social previews.
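Checking for these tags amounts to comparing a page's `<meta>` property/name values against a required set. Which tags count as "essential" is an assumption in this sketch:

```python
REQUIRED_META = {
    "og:title", "og:description", "og:image", "og:url",
    "twitter:card", "twitter:title", "twitter:image",
}

def missing_social_tags(found_tags):
    """Given the property/name values of a page's <meta> tags,
    return the social tags that are absent."""
    return sorted(REQUIRED_META - set(found_tags))
```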
The crawler validates the heading hierarchy on every page — checking for missing H1 tags, multiple H1s, skipped heading levels, and other structural issues that affect both SEO and accessibility.
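Given a page's heading levels in document order, the three structural checks named above can be sketched like this:

```python
def heading_issues(headings):
    """Check a page's heading levels, e.g. [1, 2, 2, 3].

    Flags a missing H1, multiple H1s, and skipped levels
    (an H2 followed directly by an H4, for instance).
    """
    issues = []
    h1_count = headings.count(1)
    if h1_count == 0:
        issues.append("missing H1")
    elif h1_count > 1:
        issues.append("multiple H1s")
    for prev, cur in zip(headings, headings[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: H{prev} to H{cur}")
    return issues
```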
Pages with very little content may be flagged as thin content, which can negatively impact search rankings. The crawler identifies pages that may need more substantial content.
The crawler detects pages returning 404 errors and, critically, identifies which other pages are linking to them. This lets you find and fix the broken links at their source, not just discover that dead pages exist.
The results are designed to give you both a bird's-eye view of your site's health and the ability to drill down into specific issues on specific pages.
At the top of your results, you'll see six key metrics that summarize the entire crawl:
The results include a dedicated section for your robots.txt configuration, showing:
It's surprisingly common for websites to accidentally block important content in robots.txt. If the Crawl Scan shows that critical pages or directories are disallowed, review your robots.txt file to ensure you're not unintentionally hiding content from search engines.
With potentially hundreds of pages in your results, effective filtering is essential. The Crawl Scan results include:
Clicking on any page in your results opens a detailed view showing:
When the crawler discovers a 404 page, it doesn't just tell you the page is broken — it tells you which other pages link to it. This is critical because fixing a 404 isn't about the dead page itself (it's already gone). It's about finding and updating every link that points to it. This "linked from" data saves you the detective work of tracking down broken link sources manually.
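The "linked from" data falls out naturally if the crawler records the source of every link it follows. A minimal sketch of that aggregation (data shapes are assumptions, not Greadme's internals):

```python
from collections import defaultdict

def broken_link_report(links, statuses):
    """Map each 404 URL to the pages that link to it.

    `links` is a list of (source_page, target_url) pairs collected
    during the crawl; `statuses` maps each URL to its HTTP status.
    """
    linked_from = defaultdict(set)
    for source, target in links:
        linked_from[target].add(source)
    return {
        url: sorted(sources)
        for url, sources in linked_from.items()
        if statuses.get(url) == 404
    }
```

Sorting the report by the number of sources gives the prioritization suggested later: the 404s with the most inbound links are causing the most broken experiences.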
Crawl Scan offers several configuration options that let you tailor the analysis to your needs:
Control how many pages the crawler will analyze. Options range from 50 to 500 pages. For smaller sites, a lower limit is sufficient. For larger sites, increase the limit to ensure comprehensive coverage. Start with 100 pages if you're unsure — you can always run a follow-up crawl with a higher limit.
Crawl depth determines how many link-clicks deep the crawler will go from your starting page. A depth of 3 means the crawler will follow links up to three levels away from the starting URL. This is usually sufficient to discover most pages on a well-structured site.
Choose whether the crawler should also follow links to subdomains (like blog.example.com or shop.example.com). Enable this if your site uses subdomains for different sections that you want included in the audit.
Certain patterns only become visible when you analyze an entire site rather than individual pages. Here are the most common site-wide issues Crawl Scan uncovers:
Pattern: The same issue appears on dozens or hundreds of pages
What it means: When you see the same issue (like missing OG tags or duplicate H1 patterns) across many pages, it's usually caused by a template or layout component, not by individual page content. Fixing the template fixes every affected page at once.
Example: 150 out of 200 pages are missing twitter:image tags because the site's base template doesn't include Twitter Card meta tags.
Pattern: Important pages that aren't linked from anywhere
What it means: If the crawler can't find a page by following links, search engines probably can't either. Pages that exist but aren't linked from your navigation or content are effectively invisible.
How to detect: Compare the pages the crawler found with your sitemap or CMS page list. Any pages in your CMS that weren't found during the crawl are likely orphaned.
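That comparison is a set difference: any URL in your sitemap or CMS export that the link-following crawl never reached is a likely orphan. A minimal sketch:

```python
def orphan_pages(crawled_urls, sitemap_urls):
    """Pages listed in the sitemap (or CMS) that the crawl never
    reached by following links — likely orphans."""
    return sorted(set(sitemap_urls) - set(crawled_urls))
```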
Pattern: Multiple pages linking to the same 404 URL
What it means: When a page is deleted or its URL changes without a redirect, every page that linked to it now has a broken link. The crawler's "linked from" data reveals these chains, showing you which active pages are directing users to dead ends.
Priority: Fix 404s with the most "linked from" pages first — they're causing the most broken user experiences.
Pattern: Newer pages are well-optimized while older pages have many issues
What it means: As teams learn and improve their practices, newer content is often better optimized. But older content doesn't improve on its own. The crawl reveals which sections of your site have been "left behind" and need attention.
For larger sites, reviewing crawl results in the browser alone may not be practical. Crawl Scan supports exporting your complete results to CSV format with full Unicode support for international content.
The export includes:
This is particularly valuable for development teams that work from spreadsheets or project management tools. Import the CSV into your favorite tool, assign issues to team members, and track fixes systematically.
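If you process the exported data yourself, Python's `csv` module handles the same shape of report; the column names here are illustrative, not the export's exact schema:

```python
import csv
import io

def export_issues_csv(issues):
    """Serialize a list of issue dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["page", "issue_type", "severity", "detail"]
    )
    writer.writeheader()
    writer.writerows(issues)
    return buf.getvalue()

# Writing with encoding="utf-8-sig" preserves international characters
# and helps spreadsheet apps detect the encoding:
# with open("crawl-report.csv", "w", encoding="utf-8-sig", newline="") as f:
#     f.write(export_issues_csv(issues))
```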
You can also generate shareable links to your crawl results, allowing team members or clients to browse the full interactive results without needing a Greadme account.
Run your first crawl with a generous page limit to understand the overall scope of issues. Then use the issue type filters to focus on one category at a time — fix all missing alt text first, then move to meta descriptions, then heading structure. This systematic approach is more efficient than fixing pages one by one.
If you see the same issue across many pages, find the shared template or component that's causing it. A single template fix can resolve issues on hundreds of pages simultaneously. Always look for patterns before diving into individual page fixes.
Websites change constantly — new content is added, old pages are deleted, plugins are updated, and templates evolve. Running a monthly crawl ensures you catch new issues early before they compound. A broken link that exists for a week is a minor inconvenience; one that persists for six months is an SEO problem.
The most effective audit workflow combines both scan types: use Crawl Scan to identify which pages have issues across your site, then use Deep Scan on your most important pages to get the full 100+ parameter analysis including performance metrics, schema validation, and AI-powered recommendations.
Broken pages are one of the most damaging issues for both user experience and SEO. Every 404 page is a dead end for users and a wasted opportunity for search engines. Use the "linked from" data to fix these systematically — either by restoring the content, setting up redirects, or updating the links on pages that point to them.
Greadme's crawler is designed to behave responsibly and respectfully toward the websites it analyzes:
If your site's firewall or bot protection blocks the crawl, you can allowlist GreadmeBot by adding it to your WAF or firewall rules. The crawler uses a clearly identifiable user-agent string, making it easy to distinguish from malicious bots. Visit the Greadme bot documentation page for specific allowlisting instructions.
The most dangerous website problems are the ones you don't know about. A homepage that looks perfect might coexist with dozens of broken links, hundreds of images missing alt text, and content pages that search engines can't properly understand. Without crawling your entire site, these issues remain invisible — silently eroding your SEO, accessibility, and user experience.
Crawl Scan transforms website maintenance from a guessing game into a data-driven process. By automatically discovering every page and systematically checking each one for issues, it gives you the complete picture that individual page audits can never provide.
The pattern-level insights are particularly powerful. When you can see that 60% of your pages are missing Open Graph tags, you know it's a template issue. When you discover that a deleted product page has 15 other pages linking to it, you know exactly where to focus your fixes. These site-wide patterns are only visible through systematic crawling.
Start with a crawl. Understand the landscape. Fix the biggest issues first. Then keep crawling regularly to maintain the health of your site as it grows and changes. Your website is alive — it needs regular check-ups, not just one-time fixes.
Run a Crawl Scan to automatically discover and analyze every page on your site. Find missing alt tags, broken links, SEO issues, and more — all in one comprehensive report with real-time progress tracking.
Start Your First Crawl Scan