SearchDiggity: Avoid Bot Detection Issues by Leveraging Google, Bing, and Shodan APIs


Are you plagued by Google bot detection?  Do your SearchDiggity scans pause almost immediately, promising you that they’ll be “Auto-resuming in 15 minutes”?  Do you want to avoid the frustration of the Google-Bot-Detection-Blues?  Then you have come to the right place.  We’ll show you how to leverage the official APIs for Google, Bing, and SHODAN within SearchDiggity, so that your scans are not blocked when you perform Google Hacking assessments.


Blocked by Bot Detection

Many of you have written to me asking for help because SearchDiggity was randomly “pausing” and you didn’t know why.  You filled in your target domain names, checked the boxes on the left, clicked SCAN … and shortly thereafter received an error message that looked something like this:

Scanning paused [8/28/2014 8:55:40 AM]. Google has detected bot activity.
Auto-resuming in 15 minutes.
SearchDiggity – Error – Paused Due to Bot Detection

Here’s a closer look:

SearchDiggity – Error – Paused Due to Bot Detection - Closer Look 

What does “Scanning paused…Auto-resuming in 15 minutes” mean?

Google has detected that you are running a program (e.g., SearchDiggity) to perform automated Google searches, rather than a human typing search queries into www.google.com by hand.  When this happens, Google blocks you for a short period of time (previously 14 minutes, hence the auto-resume time of 15 minutes).

Alternatively, Google lets you avoid waiting by presenting a CAPTCHA – giving you the opportunity to prove that you are, in fact, a human being… and not an automated computer program.  If you could see through the eyes of SearchDiggity as it interacts with Google, you would see something like this:

http://ipv4.google.com/sorry/
Google presenting CAPTCHA upon bot detection

For more information on this, check out:
Google.com - Unusual traffic from your computer

This causes SearchDiggity scans to pause for 15 minutes.  There are a number of ways to configure SearchDiggity to avoid this type of Google bot detection, such as slowing the scan speed or spreading your queries across open web proxies.  However, this isn’t exactly easy for a novice user who simply wants to type in search criteria and click SCAN.

If you do want to explore these more complex options, check out SearchDiggity’s help file by going to Help -> Contents, as seen here:

SearchDiggity - Help Menu
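
The pause-and-resume behavior described in this section can be sketched in a few lines of Python.  This is only an illustration, not SearchDiggity's actual implementation: the function names `looks_like_bot_block` and `resume_time` are hypothetical, and the check simply assumes that a blocked request ends up on a `/sorry/` page on a google.com host, as in the URL shown above.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Assumption: mirrors the "Auto-resuming in 15 minutes" message.
PAUSE_MINUTES = 15


def looks_like_bot_block(final_url: str) -> bool:
    """Return True if a request ended up on Google's /sorry/ CAPTCHA page."""
    parsed = urlparse(final_url)
    host = parsed.hostname or ""
    on_google = host == "google.com" or host.endswith(".google.com")
    return on_google and parsed.path.startswith("/sorry")


def resume_time(paused_at: datetime, minutes: int = PAUSE_MINUTES) -> datetime:
    """Return the moment a paused scan would try again."""
    return paused_at + timedelta(minutes=minutes)


# Timestamp taken from the example error message in this article.
paused_at = datetime(2014, 8, 28, 8, 55, 40)
if looks_like_bot_block("http://ipv4.google.com/sorry/"):
    print("Scanning paused; auto-resuming at", resume_time(paused_at))
```

A scraper built this way would spend every block sitting idle until the resume time, which is the cost the rest of this article tries to avoid.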

Avoid the Hassle – Use the APIs

To completely avoid the hassle of constantly having your scans paused due to bot detection, you can configure SearchDiggity to use the official APIs provided by Google, Bing, and SHODAN instead of scraping.
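
To make the difference from scraping concrete, here is a minimal sketch of what an API-based query looks like.  It assumes the Google API in question is the Custom Search JSON API (the text of this article does not name it); the helper `build_google_api_url` and the placeholder key and engine ID are hypothetical.  The code only builds the request URL and does not contact Google.

```python
from urllib.parse import urlencode

# Assumption: endpoint of Google's Custom Search JSON API.
GOOGLE_CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_google_api_url(api_key, engine_id, query, start=1, num=10):
    """Build a search request URL authenticated with an API key."""
    params = {
        "key": api_key,    # API key obtained when signing up
        "cx": engine_id,   # ID of the custom search engine to query
        "q": query,        # the search query itself
        "start": start,    # 1-based index of the first result
        "num": num,        # results to return in this request
    }
    return GOOGLE_CSE_ENDPOINT + "?" + urlencode(params)


url = build_google_api_url(
    "YOUR_API_KEY", "YOUR_ENGINE_ID", "site:example.com filetype:pdf"
)
print(url)
```

Because every request carries a key that identifies a registered account, the search engine expects automated traffic on this endpoint, so there is no CAPTCHA to trip over.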

Quick Overview of APIs

The figure below gives a high-level overview of the Google and Bing APIs used by SearchDiggity:

SearchDiggity – Use of Google and Bing API - Overview

The Bing Search API lets you pay more to get more queries per month.  It is also cheaper if you request “web results only”.  See below for an example of the pricing tiers:

Bing Search API – Web Results Only - Pricing

Quick Links – Signing up for APIs

To use the official APIs, you’ll need to sign up and get yourself an API key for each service, i.e., Google, Bing, and SHODAN.  The following links will take you where you need to go to sign up:

Drawbacks of Using the API

There are a few downsides to this approach:

  • Cost – Using the APIs costs money, while scraping is free.  However, the cost is very small and, in my opinion, worth it to avoid the aforementioned hassles.
  • Limited Results – The APIs cap the total number of results you can get per query, compared with the normal web browser interface.  For Google, for instance, the result sets returned by the API also appear to be smaller overall than those shown in the web browser.
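
As a rough illustration of the result cap, the sketch below lists the page offsets a client would have to request to exhaust a single query.  The limits used here (10 results per request, 100 results per query) are assumptions based on the documented limits of Google's Custom Search JSON API, not figures from this article, and `page_starts` is a hypothetical helper.

```python
# Assumed limits, modeled on Google's Custom Search JSON API.
MAX_RESULTS_PER_QUERY = 100
RESULTS_PER_REQUEST = 10


def page_starts(max_results=MAX_RESULTS_PER_QUERY, page_size=RESULTS_PER_REQUEST):
    """Return the 1-based start index of every page up to the result cap."""
    return list(range(1, max_results + 1, page_size))


starts = page_starts()
print(len(starts), "requests needed per query; start offsets:", starts)
```

Under these assumptions, each search string costs up to ten billable API requests, and anything beyond the hundredth result is simply unavailable.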

SearchDiggity – Filling in the API Keys

Most tools within SearchDiggity are automatically set to use scraping for both Google and Bing searches.  If you wish to turn this off and instead query Google and Bing through their APIs, it’s simple.  Within the individual settings space for each tool, check the Disable Scraper check box to reveal the input fields for the API to use instead.

For GoogleDiggity, fill in the API key as shown below:

GoogleDiggity – API Key Entry Location

For BingDiggity, fill in the API key as shown below:

BingDiggity – API Key Entry Location
 

For ShodanDiggity, fill in the API key as shown below:

ShodanDiggity – API Key Entry Location

Additional Info – SearchDiggity Help File

For more details on how to set up SearchDiggity to use the official APIs provided by Google, Bing, and SHODAN, please refer to SearchDiggity’s help file by going to Help -> Contents, as seen here:

SearchDiggity - Help File

There are several sections providing detailed guidance on API usage, which are highlighted in yellow on the left in the image below:

SearchDiggity - Help File - API Sections

In Conclusion

Leveraging the official search engine APIs enables you to kick off your SearchDiggity scans and walk away worry-free.  No need to keep checking back to make sure you weren’t blocked by bot detection.  No sleepless nights, worrying about whether or not your scan results will be waiting for you when you check your laptop in the morning.  By using the APIs, you can simply set it and forget it.


About the author, Francis Brown

Co-Founder and Board Member

Francis Brown, CISA, CISSP, MCSE, is the Co-founder and Board Member of Bishop Fox. Before founding Bishop Fox, Francis served as an IT Security Specialist with the Global Risk Assessment team of Honeywell International where he performed network and application penetration testing, product security evaluations, incident response, and risk assessments of critical infrastructure. Prior to that, Francis was a consultant with the Ernst & Young Advanced Security Centers and conducted network, application, wireless, and remote access penetration tests for Fortune 500 clients.

Francis has presented his research at leading conferences such as Black Hat USA, DEF CON, RSA, InfoSec World, ToorCon, and HackCon and has been cited in numerous industry and academic publications. Francis holds a Bachelor of Science and Engineering from the University of Pennsylvania with a major in Computer Science and Engineering and a minor in Psychology. While at Penn, Francis taught operating system implementation, C programming, and participated in DARPA-funded research into advanced intrusion prevention system techniques.
