Preserving Analysts’ Sanity by Automating Sinkhole Monitoring

by a vertex analyst | 2020/06/20

One of our (many) ongoing research tasks here at The Vertex Project involves keeping abreast of newly sinkholed domains. But rather than requiring analysts to manually re-query known sinkhole IPs and then ingest and tag recently sinkholed domains, we’ve automated the process, thereby freeing up the analysts to go geek out on more complex tasks (returning them to their natural habitat, as it were). We’ll share our process here, but keep in mind that we’ve simplified things a bit for the sake of this example.

Part 1: Model the Sinkhole

Let’s say we have identified an IP address that a security company (in this case, Arbor Networks) has been using to sinkhole malicious domains. Navigating to the IP address in a web browser brings up static HTML content that identifies it as Arbor Networks’ dedicated sinkhole.


We’ll want to use the Storm wget command to download this page and capture it as a file:bytes node in our Cortex, before we model and tag the inet:ipv4, inet:url, and inet:urlfile nodes to show their affiliation with Arbor Networks’ sinkhole infrastructure. We can do this with the following query:

[ inet:url= +#cno.infra.sink.hole.arbornet ]
| wget |
-> inet:urlfile
[ +#cno.infra.sink.html.arbornet=.seen ]
-> file:bytes
[ +#cno.infra.sink.html.arbornet ]

For the tag on the inet:ipv4 node, we’ll want to include a date range of when Arbor Networks was using the IP as a sinkhole. Resources like domainIQ and RiskIQ’s PassiveTotal can help us to determine the earliest date of use, while we’ll put the current date as the end of the time frame since we just confirmed that the sinkhole is still active. When we’ve finished creating and tagging our nodes, they will look something like this:

inet:ipv4=
        .created = 2018/05/21 23:19:43.087
        :asn = 33070
        :latlong = 29.4963,-98.4004
        :loc = us.tx.san antonio
        :type = unicast
        #cno.infra.sink.hole.arbornet = (2017/01/04 00:00:00.000, 2020/05/28 15:01:44.259)
inet:url=
        .created = 2018/08/28 01:54:25.399
        .seen = ('2018/09/13 22:23:43.885', '2020/05/28 15:01:44.259')
        :base =
        :ipv4 =
        :params =
        :path = /
        :port = 80
        :proto = http
inet:urlfile=('', 'sha256:f087aaf98f2e4cdccab1993...')
        .created = 2019/08/17 02:00:07.487
        .seen = ('2019/08/17 02:00:01.873', '2020/05/28 15:01:44.259')
        :file = sha256:f087aaf98f2e4cdccab1993c2301803b0ecfe1bf90633e1121d0a5038b163ce1
        :url =
complete. 3 nodes in 31 ms (161/sec).
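To illustrate how such an interval tag is applied in Storm, here is a minimal sketch; the IP shown (192.0.2.1, from the TEST-NET-1 documentation range) is a hypothetical stand-in, since the actual sinkhole IP is redacted above:

```
/* Hypothetical stand-in IP -- the real sinkhole address is redacted above */
inet:ipv4=192.0.2.1
[ +#cno.infra.sink.hole.arbornet=(2017/01/04, now) ]
```

Supplying a (min, max) pair as the tag value records the time window as an interval on the tag rather than a single timestamp, which is what lets us later extend the window each time we re-confirm the sinkhole.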

Now that we know that Arbor Networks is using the IP as a sinkhole, we’ll want to ingest and tag the captured domains, which we can do by issuing a passive DNS (pDNS) query and filtering the results to include domains that resolved to the IP during the time that Arbor Networks has been using it as a sinkhole. We’ll also want to keep tabs on this IP going forward by checking back intermittently to make sure the company is still using the IP (let’s pretend that doing so is as simple as checking for the “This IP is a sinkhole” HTML message), updating the timestamp in our tag, and ingesting and adding newly sinkholed domains.

But we don’t want to spend our time manually checking on this IP and re-querying pDNS every week or so – we’ve got other things to do. Therefore, we’ll automate these tasks as a weekly cron job. We’ve broken the query out over Parts 2 and 3 for readability:

Part 2: Verify that the IP is still a Sinkhole

The first thing we’ll want our cron job to do is check whether Arbor Networks is still using the IP that we’ve identified as a sinkhole. If so, we’ll want to update the timestamp on the inet:ipv4 node to reflect the continued use. The part of the query responsible for this task will do the following:

Take the tagged sinkhole inet:url as input and retrieve the current web page (note that this query requires that the inet:url contain an IP address rather than a domain)

Check whether any changes have been made to the HTML content by using wget to download the current web page before pivoting to the inet:urlfile node. Since we downloaded the web page when we first identified the IP as a sinkhole, the inet:urlfile node is already modeled and tagged with #cno.infra.sink.html.arbornet. Using wget to download the page again will only update the .seen property of the node if the web page is still hosting the same content (remember that we’re simplifying things here). If the content of the page has changed, then using wget will result in the creation of a new inet:urlfile node with a new file hash in the :file property. This new node will not have the #cno.infra.sink.html.arbornet tag.

Set the .seen property as the variable $seen, then pivot back to the IP and use $seen to update the timestamp on the sinkhole tag. If the IP is still an active sinkhole, the timestamp will update to show the current date. Otherwise, it will remain unchanged.

The first part of the query will look something like this:

/* Lift all inet:url nodes tagged as arbornet sinkholes */
inet:url#cno.infra.sink.hole.arbornet

/* Retrieve the URL and update .seen time on inet:urlfile */
| wget |

/* Pivot to inet:urlfile which we’ve previously tagged with #cno.infra.sink.html.arbornet */
-> inet:urlfile +#cno.infra.sink.html.arbornet

/* Save the .seen property as the variable $seen */
$seen=.seen

/* Update sinkhole IP with times for sinkhole HTML (i.e., if HTML is still same sinkhole
 * message page, update IP timestamps to current date) */
-> inet:url -> inet:ipv4
[ +#cno.infra.sink.hole.arbornet=$seen ]

Part 3: Query PDNS & Tag Recently Sinkholed Domains

Once we’ve verified that Arbor Networks is still using the IP as a sinkhole and have updated the timestamp of the IP to show the current date, we’ll want to identify and tag any domains that the company has sinkholed since we initially identified its use of the IP. We can do this in the second part of the cron job by querying RiskIQ’s passive DNS (pt.pdns) for the IP and filtering for domains that have resolved there while Arbor Networks has been using it as a sinkhole. Any domains that began resolving to Arbor Networks’ IP in the last seven days (since we launch our cron job once a week) will receive a tag that includes a timestamp showing when Arbor Networks sinkholed the domain.

/* If the IP is still a sinkhole and the timestamp includes the current
 * date, query pt.pdns for the sinkhole IPv4 */
+#cno.infra.sink.hole.arbornet@=("-5 min", now) | pt.pdns |

/* Pivot to inet:dns:a records of domains seen resolving to the inet:ipv4 in the last week */
-> inet:dns:a +.seen@=("-7 days", now)

/* Save the inet:dns:a node’s .seen property as the variable $seen */
$seen=.seen

/* Pivot from the inet:dns:a nodes to the inet:fqdn nodes, tagging the domains
 * as sinkholed with the timestamp matching the time period when they resolved
 * to Arbor Networks’ IP */
-> inet:fqdn [ +#cno.infra.sink.hole.arbornet=$seen ]

While we’ll stop here for the sake of keeping this example relatively simple, it’s worth noting that there are opportunities for further automation. Given that Arbor Networks almost certainly sinkholed these domains due to their inclusion in malicious activities, we could write automation designed to enrich the tagged domains with data from resources like domainIQ, Rapid7, and VirusTotal, among others. Pivoting off this data can not only give us a view into how the threat actors had used the sinkholed domains, but may also allow us to identify additional infrastructure that is still in use.
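As one hedged sketch of that pivoting (pure model navigation, assuming the relevant pDNS data has already been ingested): starting from the tagged domains, we can walk their DNS A records out to any other IPs they have resolved to, filtering out the sinkhole infrastructure itself:

```
/* From sinkholed domains, pivot through their DNS A records to the
 * IPs they have resolved to, excluding known sinkhole infrastructure */
inet:fqdn#cno.infra.sink.hole.arbornet
-> inet:dns:a
-> inet:ipv4
-#cno.infra.sink.hole.arbornet
```

Any IPs that survive the final filter are candidates for pre-sinkhole (and possibly still active) threat actor infrastructure worth a closer look.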

Part 4: Create the Cron Job to Execute the Query

Lastly, we’ll want to create the cron job that will run our query once a week. We can do this by executing the command below, which will schedule our query (with comments removed for readability) to run every Saturday at 01:30 am:

cron.add --day Sat --hour 1 --minute 30 { inet:url#cno.infra.sink.hole.arbornet
| wget |
-> inet:urlfile +#cno.infra.sink.html.arbornet $seen=.seen -> inet:url -> inet:ipv4
[ +#cno.infra.sink.hole.arbornet=$seen ] +#cno.infra.sink.hole.arbornet@=("-5 min", now)
| pt.pdns |
-> inet:dns:a +.seen@=("-7 days", now) $seen=.seen -> inet:fqdn
[ +#cno.infra.sink.hole.arbornet=$seen ] }

Save an Analyst: Automate the Boring Stuff

While there are many benefits to incorporating automation, such as the ability to complete tasks at machine speed and in a consistent manner, the most important by far is that doing so will help keep your analysts sane. No matter what type of research you are engaged in, there are almost certainly some time-consuming and tedious, but necessary tasks. Why not automate those so that your analysts can focus on the more complex and (let’s face it) interesting work? Keep your analysts challenged and happy: automate the boring stuff.