Htcap - web application scanner able to crawl single page application (SPA) in a recursive manner by intercepting ajax calls and DOM changes

Friday, April 22, 2016

htcap is a web application scanner able to crawl single page application (SPA) in a recursive manner by intercepting ajax calls and DOM changes.

Htcap is not just another vulnerability scanner since it's focused mainly on the crawling process and uses external tools to discover vulnerabilities. It's designed to be a tool for both manual and automated penetration test of modern web applications.

The scan process is divided in two parts, first htcap crawls the target and collects as many requests as possible (urls, forms, ajax ecc..) and saves them to a sql-lite database. When the crawling is done it is possible to launch several security scanners against the saved requests and save the scan results to the same database.

When the database is populated (at least with crawing data), it's possible to explore it with ready-available tools such as sqlite3 or DBEaver or export the results in various formats using the built-in scripts.

Let's assume that we have to perform a penetration test against target.local, first we crawl the site:
$ htcap/ crawl target.local target.db
Once the crawl is done, the database (target.db) will contain all the requests discovered by the crawler. To explore/export the database we can use the built-in scripts or ready available tools.
For example, to list all discovered ajax calls we can use a single shell command:
$ echo "SELECT method,url,data FROM request WHERE type = 'xhr';" | sqlite3 target.db
Now that the site is crawled it's possible to launch several vulnerability scanners against the requests saved to the database. A scanner is an external program that can fuzz requests to spot security flaws.
Htcap uses a modular architecture to handle different scanners and execute them in a multi-threaded environment. For example we can run ten parallel instances of sqlmap against saved ajax requests with the following command:
$ htcap/ scan -r xhr -n 10 sqlmap target.db
Htcap comes with sqlmap and arachni modules built-in.
Sqlmap is used to discover SQL-Injection vulnerabilities and arachni is used to discover XSS, XXE, Code Executions, File Inclusions ecc.
Since scanner modules extend the BaseScanner class, they can be easly created or modified (see the section "Writing Scanner Modules" of this manual).
Htcap comes with several standalone scripts to export the crawl and scan results.
For example we can generate an interactive report containing the relevant informations about website/webapp with the command below.
Relevant informations will include, for example, the list of all pages that trigger ajax calls or websockets and the ones that contain vulnerabilities.
$ htcap/scripts/ target.db target.html
To scan a target with a single command use/modify the script.
$ htcap/scripts/ https://target.local


  1. Python 2.7
  2. PhantomJS v2
  3. Sqlmap (for sqlmap scanner module)
  4. Arachni (for arachni scanner module)

Download and Run
$ git clone htcap
$ htcap/
Alternatively you can download the latest zip here .
PhantomJs can be downloaded here . It comes as a self-contained executable with all libraries linked statically, so there is no need to install or compile anything else.
Htcap will search for phantomjs executable in the locations listed below and in the paths listed in $PATH environment varailbe:
  1. ./
  2. /usr/bin/
  3. /usr/local/bin/
  4. /usr/share/bin/
To install htcap system-wide:
# mv htcap /usr/share/
# ln -s /usr/share/htcap/ /usr/local/bin/htcap
# ln -s /usr/share/htcap/scripts/ /usr/local/bin/htcap_report
# ln -s /usr/share/htcap/scripts/ /usr/local/bin/htcapquick

You can find an online demo of the html report here and a screenshot of the database view here
You can also explore the test pages here to see from what the report has been generated. They also include a page to test ajax recursion .

In order to read the database it's possible to use the built-in scripts or any ready-available
sqlite3 client.

Generate the html report. (demo report available here )
$ htcap/scripts/ target.db target.html
List all pages that trigger ajax requests:
$ htcap/scripts/ target.db
    Request ID: 6
    Page URL:   http://target.local/dashboard
    Referer:    http://target.local/
    Ajax requests:
      [BUTTON txt=Statistics].click() -> GET http://target.local/api/get_stats
List all discovered SQL-Injection vulnerabilities:
$ htcap/scripts/ target.db "type='sqli'"
    C O M M A N D
    python /usr/local/bin/sqlmap --batch -u http://target.local/api/[...]

    D E T A I L S
    Parameter: name (POST)
        Type: error-based
        Title: PostgreSQL AND error-based - WHERE or HAVING clause
        Payload: id=1' AND 4163=CAST [...]

Search for login forms
SELECT referer, method, url, data FROM request WHERE type='form' AND (url LIKE '%login%' OR data LIKE '%password%')
Search inside the pages html
SELECT url FROM request WHERE html LIKE '%upload%' COLLATE NOCASE

Htcap features an algorithm able to crawl ajax-based pages in a recursive manner.
The algorithm works by capturing ajax calls, mapping DOM changes to them and repeat the process recursively against the newly added elements.
When a page is loaded htcap starts by triggering all events and filling input values in the aim to trigger ajax calls. When an ajax call is detected, htcap waits until it is completed and the relative callback is called; if, after that, the DOM is modified, htcap runs the same algorithm against the added elements and repeats it until all the ajax calls have been fired.
|                 |
|load page content|
|  interact with  |
|  new content    |<-----------------------------------------+
'--------,--------'                                          |
         |                                                   |
         |                                                   |
         |                                                   | YES
   ______V______             ________________          ______l_____
 /     AJAX      \ YES      |                |       /    CONTENT   \
{    TRIGGERED?   }-------->|   wait ajax    |----->{    MODIFIED?   }
 \ ______ ______ /          '----------------'       \ ______ _____ /
         | NO                                                | NO 
         |                                                   |
         |                                                   |
 ________V________                                           |
|                 |                                          |
|     return      |<-----------------------------------------+

$ htcap crawl -h
usage: htcap [options] url outfile
  -h               this help
  -w               overwrite output file
  -q               do not display progress informations
  -m MODE          set crawl mode:
                      - passive: do not intract with the page
                      - active: trigger events
                      - aggressive: also fill input values and crawl forms (default)
  -s SCOPE         set crawl scope
                      - domain: limit crawling to current domain (default)
                      - directory: limit crawling to current directory (and subdirecotries) 
                      - url: do not crawl, just analyze a single page
  -D               maximum crawl depth (default: 100)
  -P               maximum crawl depth for consecutive forms (default: 10)
  -F               even if in aggressive mode, do not crawl forms
  -H               save HTML generated by the page
  -d DOMAINS       comma separated list of allowed domains (ex *
  -c COOKIES       cookies as json or name=value pairs separaded by semicolon
  -C COOKIE_FILE   path to file containing COOKIES 
  -r REFERER       set initial referer
  -x EXCLUDED      comma separated list of urls to exclude (regex) - ie logout urls
  -p PROXY         proxy string protocol:host:port -  protocol can be 'http' or 'socks5'
  -n THREADS       number of parallel threads (default: 10)
  -A CREDENTIALS   username and password used for HTTP authentication separated by a colon
  -U USERAGENT     set user agent
  -t TIMEOUT       maximum seconds spent to analyze a page (default 300)
  -S               skip initial url check
  -G               group query_string parameters with the same name ('[]' ending excluded)
  -N               don't normalize URL path (keep ../../)
  -R               maximum number of redirects to follow (default 10)
  -I               ignore robots.txt

Subscribe via e-mail for updates!