
Posts

Showing posts from September, 2024

Google Revamps Entire Crawler Documentation

Google significantly changed the crawler documentation, resulting in a higher level of information density and tighter topical coverage.

Google has launched a major revamp of its crawler documentation, shrinking the main overview page and splitting content into three new, more focused pages. Although the changelog downplays the changes, there is an entirely new section and essentially a rewrite of the entire crawler overview page. The additional pages allow Google to increase the information density of all the crawler pages and improve topical coverage.

What Changed?

Google’s documentation changelog notes two changes, but there is actually a lot more. Here are some of the changes:

Added an updated user agent string for the GoogleProducer crawler
Added content encoding information
Added a new section about technical properties

The technical properties section contains entirely new information that didn’t previously exist. There are no changes to the crawler behavior, but by creating thr

Why Google Indexes Blocked Web Pages

Google's John Mueller explains why disallowed pages are sometimes indexed and that related Search Console reports can be dismissed.

Google’s John Mueller answered a question about why Google indexes pages that are disallowed from crawling by robots.txt and why it’s safe to ignore the related Search Console reports about those crawls.

Bot Traffic To Query Parameter URLs

The person asking the question documented that bots were creating links to non-existent query parameter URLs (?q=xyz) pointing to pages that have noindex meta tags and are also blocked in robots.txt. What prompted the question is that Google crawls the links to those pages and gets blocked by robots.txt, so it never sees the noindex robots meta tag. Those URLs then get reported in Google Search Console as “Indexed, though blocked by robots.txt.”

Takeaways:

1. Confirmation Of Limitations Of Site: Search

Mueller’s answer confirms the limitations in using the site: advanced search operator fo
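To make the blocked-crawl mechanism concrete, here is a minimal sketch in Python using the standard-library `urllib.robotparser` and `html.parser` modules. It models a crawler that checks robots.txt before fetching: when the URL is disallowed, the HTML is never read, so the noindex tag on the page is never seen. The robots.txt rules, the `example.com/page` URL, the page HTML, and the `crawl` helper are all hypothetical, and this is a simplified model rather than how Googlebot actually works. Note also that `urllib.robotparser` only does prefix matching and does not implement Google's `*` and `$` wildcard rules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks every URL whose path starts with /page,
# including query-parameter variants such as /page?q=xyz (prefix match).
ROBOTS_TXT = """\
User-agent: *
Disallow: /page
"""

# Hypothetical HTML served at /page: it carries a noindex robots meta tag.
PAGE_HTML = '<html><head><meta name="robots" content="noindex"></head></html>'


class RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())


def crawl(url, robots_lines, html):
    """Hypothetical, simplified crawler (not Googlebot's actual logic).

    Checks robots.txt first and only 'fetches' (parses) the HTML if allowed.
    """
    rp = RobotFileParser()
    rp.parse(robots_lines)
    if not rp.can_fetch("Googlebot", url):
        # The page is never fetched, so its noindex tag is never seen.
        return {"fetched": False, "saw_noindex": False}
    parser = RobotsMetaParser()
    parser.feed(html)
    saw_noindex = any("noindex" in d for d in parser.directives)
    return {"fetched": True, "saw_noindex": saw_noindex}


url = "https://example.com/page?q=xyz"
blocked = crawl(url, ROBOTS_TXT.splitlines(), PAGE_HTML)
# An empty Disallow line allows everything, so the page can be read.
allowed = crawl(url, ["User-agent: *", "Disallow:"], PAGE_HTML)
print(blocked)
print(allowed)
```

Running it prints `{'fetched': False, 'saw_noindex': False}` for the blocked case and `{'fetched': True, 'saw_noindex': True}` once the Disallow rule is removed: the noindex directive only takes effect if the crawler is allowed to read the page.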