SearchDiggity: Avoid Bot Detection Issues by Leveraging Google, Bing, and Shodan APIs
Are you plagued by Google bot detection? Are your SearchDiggity scans almost immediately pausing, promising you that they’ll be “Auto-resuming in 15 minutes”? Do you want to avoid the frustration resulting from the Google-Bot-Detection-Blues? Then you have come to the right place. We’ll show you how to leverage the official APIs for Google, Bing, and SHODAN within SearchDiggity, so that your scans don’t get blocked when performing Google Hacking assessments.
Blocked by Bot Detection
Many of you have written me, asking for help because SearchDiggity was randomly “pausing” and you didn’t know why. You filled in your target domain names, you checked the boxes on the left, you clicked SCAN … and shortly thereafter you received an error message that looked something like this:
Scanning paused [8/28/2014 8:55:40 AM]. Google has detected bot activity. Auto-resuming in 15 minutes.
Here’s a closer look:
What does “Scanning paused…Auto-resuming in 15 minutes” mean?
Google has detected that you are running a program (e.g., SearchDiggity) to perform automated Google searches, rather than a human browsing www.google.com and manually typing in search queries. When this happens, Google blocks you for a short period of time (previously 14 minutes, hence the auto-resume time of 15 minutes).
Alternatively, Google lets you avoid waiting by presenting a CAPTCHA – giving you the opportunity to prove that you are, in fact, a human being… and not an automated computer program. If you could see through the eyes of SearchDiggity as it interacts with Google, you would see something like this:
http://ipv4.google.com/sorry/
For more information on this, check out:
Google.com - Unusual traffic from your computer
This causes SearchDiggity scans to pause for 15 minutes. There are a number of ways you can configure SearchDiggity to avoid this type of Google bot detection, such as altering the scan speed or spreading your queries across open web proxies. However, this isn’t exactly easy for a novice user who simply wants to type in search criteria and click SCAN.
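To give a feel for what those options involve, here is a minimal Python sketch of the general idea: wait a randomized interval between queries, and back off for the full 15 minutes if a request lands on the block page. This is not SearchDiggity's actual code. The function names and the 10-second base delay with 5 seconds of jitter are assumptions chosen for illustration; the only details taken from this post are the `/sorry/` URL and the 15-minute pause.

```python
import random
from urllib.parse import urlparse

BLOCK_PAUSE_SECONDS = 15 * 60  # the 15-minute auto-resume delay from the error message


def looks_like_bot_block(final_url: str) -> bool:
    """Return True if a request ended up on Google's block page.

    Assumption: the block is recognized only by the "/sorry/" path seen
    in the example URL above; a real tool may check other signals too.
    """
    parsed = urlparse(final_url)
    host = parsed.hostname or ""
    on_google = host == "google.com" or host.endswith(".google.com")
    return on_google and parsed.path.startswith("/sorry")


def next_delay(final_url: str, base_seconds: float = 10.0,
               jitter_seconds: float = 5.0, rng=random) -> float:
    """Seconds to wait before sending the next query.

    base_seconds and jitter_seconds are hypothetical values, not
    SearchDiggity defaults.
    """
    if looks_like_bot_block(final_url):
        return float(BLOCK_PAUSE_SECONDS)
    # Randomize the gap so queries do not arrive at a fixed rhythm.
    return base_seconds + rng.uniform(0.0, jitter_seconds)
```

A scraping loop would call `next_delay()` with the final URL of each response and sleep for the returned number of seconds before issuing the next query.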
If you do want to explore these more complex options, check out SearchDiggity’s help file by going to Help -> Contents, as seen here:
Avoid the Hassle – Use the APIs
To completely avoid the hassle of constantly having your scans paused due to bot detection, you can configure SearchDiggity to use the official APIs provided by Google, Bing, and SHODAN instead of scraping.
Quick Overview of APIs
The figure below gives a high-level overview of the Google and Bing APIs used by SearchDiggity:
The Bing Search API lets you pay more to get more queries per month. Requesting “web results only” is also cheaper. See below for an example of the pricing tiers:
Quick Links – Signing up for APIs
To use the official APIs, you’ll need to sign up and get yourself an API key for each service, i.e., Google, Bing, and SHODAN. The following links will take you where you need to go to sign up:
- Google API:
  - Google - API Console
  - Google - Custom Search JSON/Atom API
  - Google - Custom Search JSON/Atom API - Pricing
- Google Custom Search ID:
  - Enter this in SearchDiggity: !001280586187183383443:vcqkedkugeo
  - Bishopfox.com - Bypassing Google CSE to get Full Web Search Results
- Google - Testing the API:
- Bing API:
- SHODAN API:
  - One-time fee of $19.
  - http://www.shodanhq.com/anniversary
  - http://www.shodanhq.com/data/addons
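Once you have your keys, a quick way to sanity-check them outside of SearchDiggity is to build a request URL by hand. The Python sketch below assembles query URLs for the Google Custom Search JSON API and the SHODAN REST search endpoint. The helper names are hypothetical, and `YOUR_API_KEY` and `YOUR_CSE_ID` are placeholders for your own values. Bing is left out on purpose, since its API endpoint has changed over time and should be checked against the current documentation.

```python
from urllib.parse import urlencode

# Public endpoints for the two services; substitute your own credentials.
GOOGLE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
SHODAN_ENDPOINT = "https://api.shodan.io/shodan/host/search"


def google_search_url(api_key: str, cse_id: str, query: str) -> str:
    """Build a Custom Search JSON API URL (key, cx, and q parameters)."""
    return GOOGLE_ENDPOINT + "?" + urlencode(
        {"key": api_key, "cx": cse_id, "q": query}
    )


def shodan_search_url(api_key: str, query: str) -> str:
    """Build a SHODAN host search URL (key and query parameters)."""
    return SHODAN_ENDPOINT + "?" + urlencode({"key": api_key, "query": query})
```

Calling `google_search_url("YOUR_API_KEY", "YOUR_CSE_ID", "site:example.com filetype:pdf")` and pasting the result into a browser should return JSON if the key and search engine ID are valid, or an error message explaining what is wrong if they are not.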
Drawbacks of Using the API
There are a few downsides to this approach:
- Cost: Using the APIs costs money, while scraping is free. However, the cost is extremely small and, in my opinion, worth it to avoid the aforementioned hassles.
- Limited Results: The APIs limit the total number of results you can get per query compared to the normal web browser interface. The result sets themselves also seem to be smaller overall; for Google, for instance, the API returns fewer total results than the web browser does.
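To make the result limit concrete for Google: as far as I know, the Custom Search JSON API returns at most 10 results per request (the `num` parameter) and will not page past the first 100 results for a query (via the 1-based `start` parameter). Treat both numbers as assumptions to verify against Google's current documentation. Under those assumptions, the paging for a single query can be sketched in Python like this, with `page_params` being a hypothetical helper name:

```python
MAX_PER_REQUEST = 10  # assumed largest `num` value accepted per request
MAX_TOTAL = 100       # assumed cap on total results reachable per query


def page_params(wanted: int):
    """Yield the start/num parameters needed to fetch `wanted` results.

    Requests beyond the assumed 100-result cap are silently trimmed.
    """
    wanted = min(wanted, MAX_TOTAL)
    start = 1  # the API's `start` index is 1-based
    while start <= wanted:
        num = min(MAX_PER_REQUEST, wanted - start + 1)
        yield {"start": start, "num": num}
        start += num
```

Each yielded dictionary corresponds to one API call, so asking for 25 results costs three queries against your quota, and no amount of paging gets you past the cap.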
SearchDiggity – Filling in the API Keys
Most tools within SearchDiggity are automatically set to use scraping for both Google and Bing searches. If you wish to turn this off and instead query Google and Bing through their APIs, it’s simple. Within the individual settings space for each tool, check the Disable Scraper check box to reveal the input fields for the API to use instead.
For GoogleDiggity, fill in the API key as shown below:
For BingDiggity, fill in the API key as shown below:
For ShodanDiggity, fill in the API key as shown below:
Additional Info – SearchDiggity Help File
For more details on how to set up SearchDiggity to use the official APIs provided by Google, Bing, and SHODAN, please refer to SearchDiggity’s help file by going to Help -> Contents, as seen here:
There are several sections providing detailed guidance on API usage, which are highlighted in yellow on the left in the image below:
In Conclusion
Leveraging the official search engine APIs enables you to kick off your SearchDiggity scans and walk away worry-free. No need to keep checking back to make sure you weren’t blocked by bot detection. No sleepless nights, worrying about whether or not your scan results will be waiting for you when you check your laptop in the morning. By using the APIs, you can simply set it and forget it.