Leveraging DNS Suffix Data for Threat Clustering

by: reign | 2023-05-08


Cyber threat actors commonly use domains (e.g., baddomain.com) as attack infrastructure to facilitate targeting, exploitation, and malware command and control (C2). Therefore, threat intelligence analysts often need to determine domain boundaries in order to accurately identify attacker infrastructure, characterize threats, cluster related activity, and inform security detection and response actions.

Presently, organizations and applications across multiple disciplines leverage public suffix data to determine Domain Name System (DNS) zone designations, and facilitate secure cross-domain operations. Given the data’s effectiveness in establishing domain boundaries, this data can also be extended to the threat intelligence discipline to help identify and correlate related threat actor infrastructure.

In this blog post we will explore how threat intelligence analysts can utilize public DNS suffix data to help distinguish DNS domain boundaries, and support threat clustering workflows. At the conclusion of the blog we will share our internally curated public suffix dataset to augment your current DNS suffix data, and help facilitate DNS analysis workflows using Synapse.

The DNS Namespace

Before we dive into threat clustering using public DNS suffix data, let’s discuss some key DNS components and terminology.

The DNS Hierarchy

The Domain Name System is a globally distributed system used to translate fully qualified domain names (FQDNs) to Internet Protocol (IP) addresses, and facilitate the exchange of information regarding a domain. It is organized into a hierarchical structure called the DNS namespace, which contains all Internet-registered domains and their associated subdomains. The namespace is structured as an inverted tree composed of the Root domain, Top-level domains, domains, and optional subdomains. For example, the FQDN ns1.baddomain.com. can be decomposed into the following domains/levels:

  • The Root domain (.) is the first level of the DNS hierarchy, and serves as the starting point for the entire DNS namespace.

  • Top-level domains (TLDs) (e.g., com) are the top-most domains within the DNS namespace.

  • Domain names (e.g., baddomain) are domains that have been registered using an accredited Internet registrar.

  • Subdomains (e.g., ns1) are child domains of registered domain names, and are controlled by the parent domain’s registrant. These are created to help organize domain resources using an intuitive nomenclature.

This hierarchical composition is what allows the DNS to operate in a decentralized manner, and delegate administrative control of domains to the registering entities.
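
As a rough illustration of the hierarchy described above, the following Python sketch walks an FQDN from the Root down to its most specific label. The function name decompose_fqdn is hypothetical, and the sketch only splits labels; it does not query the DNS.

```python
def decompose_fqdn(fqdn: str) -> list[str]:
    """Return the domains in an FQDN, ordered from the Root down to the full name."""
    # Drop the trailing dot that denotes the Root, then split into labels.
    labels = [label for label in fqdn.rstrip(".").lower().split(".") if label]
    levels = ["."]  # the Root domain
    # Build each level by adding one more label, working right to left.
    for i in range(len(labels) - 1, -1, -1):
        levels.append(".".join(labels[i:]))
    return levels


print(decompose_fqdn("ns1.baddomain.com."))
# ['.', 'com', 'baddomain.com', 'ns1.baddomain.com']
```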

Public DNS Suffixes and Zones

A public DNS suffix (e.g., .us and .dc.us) is a domain under which unaffiliated persons or organizations can register and control their own domains. When a domain is registered with a registrar, it is created under an existing public suffix within the DNS namespace, and a new zone is established within the DNS.

At Vertex we define a zone as a domain one level below a suffix that is controlled and managed by a single entity. A zone represents a unit of delegation, and gives the domain registrant unilateral control over the resources and infrastructure required to operate the domain. It effectively establishes domain boundaries, and distinguishes domains that are connected hierarchically but unrelated administratively. Using this definition, the domain .dc.us is a zone, however .us is not.
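
To make the definition concrete, here is a minimal Python sketch that treats a domain as a zone when its parent is a known suffix. The function name is_zone and the two-entry suffix set are assumptions for illustration only; a real suffix set would be far larger.

```python
def is_zone(fqdn: str, suffixes: set[str]) -> bool:
    """Return True if fqdn sits exactly one level below a known suffix."""
    # Tolerate the leading/trailing dots used when writing suffixes (e.g., ".dc.us").
    name = fqdn.strip(".").lower()
    # Everything after the first label is the parent domain.
    _, _, parent = name.partition(".")
    return parent in suffixes


SUFFIXES = {"us", "dc.us"}  # illustrative subset, not a complete list

print(is_zone(".dc.us", SUFFIXES))  # True: its parent, us, is a suffix
print(is_zone(".us", SUFFIXES))     # False: a TLD has no parent suffix
```

Note that under this definition a domain can be both a suffix and a zone, which is the case for dc.us in the example.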

The Public Suffix List

Currently, threat analysts use multiple data sources to determine domain boundaries (e.g., DNS SOA records), as there is no single system of record for all public suffix data. However, one data source that can be used to augment these data sources and inform infrastructure clustering decisions is the Public Suffix List (PSL).

The Public Suffix List is a Mozilla-initiated and community-maintained list of all known and reported public and private DNS suffixes. The list was originally created to provide a programmatically accessible way for web browsers to determine domain boundaries, and enforce cross-domain security policies (e.g., restrict HTTP cookie use across non-related domains). However, vendors and applications that rely on domain ownership data to enforce their security policies have increasingly adopted the PSL as an authoritative public suffix data source.

Extending the PSL to Threat Intelligence

Similarly, the PSL can also be extended to the threat intelligence discipline and incorporated into analysis workflows designed to designate DNS zones and cluster related attacker infrastructure. Domains that threat actors use for cyber operations typically fall into one of the following categories:

  1. legitimately registered and owned,

  2. legitimately created under zones that they did not register or control (e.g., dynamic DNS domain),

  3. illegitimately created under zones that they did not register, but control (e.g., domain created under a compromised zone), or

  4. legitimate domains or zones that they have wholly compromised (e.g., DNS hijacking).

Identifying the domain category and boundary helps ensure that the domain is properly contextualized (e.g., malicious vs non-malicious), and that the contextualization is properly scoped. That is, if a legitimate website has been hijacked as part of a strategic web compromise (SWC), only the subdomain/portion of the site actually utilized by the actors should be deemed malicious, not the entire zone.

Leveraging the PSL for Threat Clustering Workflows

Let’s assume an analyst is providing threat intelligence support to an active Incident Response (IR). During the investigation, an incident responder observes malware C2 traffic between an internal DNS server and the domain recursive.dnsupdate.info, and identifies a spearphishing email from the email address admin@secure.dnsupdate.info.

In order to characterize the domains and recommend an appropriate course of action, the analyst needs to determine the domain boundaries for each subdomain and add the malicious domains to a new or existing threat cluster. One way to achieve this is by leveraging the PSL.

Below is a snippet from the PSL. Every non-empty uncommented line indicates a public suffix.

// United Gameserver GmbH : https://united-gameserver.de
// Submitted by Stefan Schwarz <sysadm@united-gameserver.de>
virtualuser.de
virtual-user.de

// Upli : https://upli.io
// Submitted by Lenny Bakkalian <lenny.bakkalian@gmail.com>
upli.io

// urown.net : https://urown.net
// Submitted by Hostmaster <hostmaster@urown.net>
urown.cloud
dnsupdate.info
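
The reading rule for the snippet above (every non-empty, uncommented line is a suffix) can be sketched in a few lines of Python. The function name parse_suffixes is hypothetical, and the text being parsed is an abridged copy of the snippet. The full PSL also contains wildcard ("*") and exception ("!") rules, which this sketch does not interpret.

```python
PSL_SNIPPET = """\
// United Gameserver GmbH : https://united-gameserver.de
// Submitted by Stefan Schwarz <sysadm@united-gameserver.de>
virtualuser.de
virtual-user.de

// urown.net : https://urown.net
// Submitted by Hostmaster <hostmaster@urown.net>
urown.cloud
dnsupdate.info
"""


def parse_suffixes(text: str) -> list[str]:
    """Collect every non-empty, uncommented line as a suffix."""
    suffixes = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "//" comment lines.
        if not line or line.startswith("//"):
            continue
        suffixes.append(line)
    return suffixes


print(parse_suffixes(PSL_SNIPPET))
# ['virtualuser.de', 'virtual-user.de', 'urown.cloud', 'dnsupdate.info']
```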

Using what we know about public suffixes and zones, we know that every domain created under a public suffix is a zone and is administratively controlled by a separate entity. So, given that the domain dnsupdate.info is listed in the PSL, the threat analyst would be able to determine that:

  1. dnsupdate.info is a legitimate public suffix and zone,

  2. dnsupdate.info is not inherently malicious,

  3. recursive.dnsupdate.info and secure.dnsupdate.info are two distinct zones registered under dnsupdate.info, and

  4. recursive.dnsupdate.info and secure.dnsupdate.info are not administratively controlled by the same entity that controls dnsupdate.info.
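
The effect of the PSL entry on this determination can be sketched in Python. The function name find_zone is hypothetical, and the suffix sets are small illustrative subsets. The sketch walks up from the full name and returns the first domain whose parent is a known suffix.

```python
from typing import Optional


def find_zone(fqdn: str, suffixes: set[str]) -> Optional[str]:
    """Return the nearest domain (the FQDN itself or an ancestor) whose parent is a suffix."""
    labels = fqdn.strip(".").lower().split(".")
    # Check the full name first, then each parent in turn.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        parent = ".".join(labels[i + 1:])
        if parent in suffixes:
            return candidate
    return None


with_psl = {"info", "dnsupdate.info"}  # dnsupdate.info is listed in the PSL
without_psl = {"info"}                 # what we would assume without the PSL entry

print(find_zone("recursive.dnsupdate.info", with_psl))     # recursive.dnsupdate.info
print(find_zone("secure.dnsupdate.info", with_psl))        # secure.dnsupdate.info
print(find_zone("recursive.dnsupdate.info", without_psl))  # dnsupdate.info
```

Without the PSL entry, both FQDNs would collapse into the single zone dnsupdate.info and could be mistakenly attributed to one operator.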

If dnsupdate.info is not a dynamic DNS (DDNS) domain, the analyst can then leverage WHOIS registration data, DNS SOA records, and other technical artifacts collected from the customer’s environment to attempt to determine whether the two subdomains are controlled by the same entity and affiliated with the same cluster of activity.

PSL Caveat

While the PSL is a great source for public and private DNS suffix data, it is not exhaustive and therefore should not be used as your sole source of suffix intelligence. At the time of this writing, amending the PSL requires Top-level domain registries and private domain owners that support free subdomain registration to submit a Git pull request to have their revisions approved and merged. Given that not all suffix registries and domain owners submit their domains to the PSL, not all public suffixes are represented in the data set. Additionally, there are numerous second-level domains that are not technically public suffixes, but effectively operate like suffixes in that they allow unaffiliated entities to establish and control subdomains (e.g., wordpress.com and tumblr.com).

To represent and account for DNS suffixes identified by your organization, you can set the FQDN node's inet:fqdn:issuffix secondary property value to True as shown below for the wordpress.com inet:fqdn node.

[Image: the wordpress.com inet:fqdn node with :issuffix set to True]
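
Outside of Synapse, the same idea of augmenting the PSL with locally identified suffixes can be sketched in Python. The function name merge_suffixes, the "psl"/"local" labels, and the domain freehost.example are all hypothetical; the point is only that recording the source of each suffix keeps your own additions distinguishable from PSL entries.

```python
def merge_suffixes(psl: set[str], local: set[str]) -> dict[str, str]:
    """Map each suffix to its source: 'psl' or 'local'."""
    merged = {s.strip(".").lower(): "local" for s in local}
    # PSL entries take precedence when a suffix appears in both sets.
    merged.update({s.strip(".").lower(): "psl" for s in psl})
    return merged


PSL = {"com", "dnsupdate.info"}              # illustrative subset of the PSL
LOCAL = {"wordpress.com", "freehost.example"}  # suffix-like domains identified internally

sources = merge_suffixes(PSL, LOCAL)
print(sources["wordpress.com"])   # local
print(sources["dnsupdate.info"])  # psl
```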

Don’t Suffix in Silence … Use Synapse

To ingest the PSL data into your Cortex, you can either use the Synapse-PSL Power-Up or Storm code as shown in the Using Storm Code section below.

Using the Synapse-PSL Power-Up

The Synapse-PSL Power-Up allows Synapse Enterprise users to ingest the PSL and optionally customize how the ingested data are tagged and represented in the system. By default, the Power-Up's psl.ingest command downloads the Public Suffix List from publicsuffix.org, creates inet:fqdn nodes with :issuffix=True secondary property values for each listed suffix, and subsequently applies the tag #rep.psl.suffix.

[Image: the psl.ingest command]

If you prefer not to have the suffixes tagged or want to use a custom tag, you can use the psl.ingest --no-tag command option, or the psl.setup.tag command respectively.

Using Storm Code

Synapse Community users who do not have access to the Synapse-PSL Power-Up can use the Storm code below (or a variation thereof) to ingest the Public Suffix List. This code effectively operates in the same manner as the Synapse-PSL psl.ingest command, as it:

  1. downloads the PSL from https://publicsuffix.org/list/public_suffix_list.dat,

  2. reads in each line in the file, skipping any line starting with “!”, “//”, or “*”,

  3. creates an inet:fqdn for each listed FQDN, sets its :issuffix property value to True, and

  4. applies the tag #rep.psl.suffix with a timestamp set to the current time (i.e., now).

// download PSL
$reply = $lib.axon.wget("https://publicsuffix.org/list/public_suffix_list.dat")

// read each line of PSL and skip comments
for $line in $lib.axon.readlines($reply.hashes.sha256) {
    $line = $line.strip()
    if (not $line
        or $line.startswith("!")
        or $line.startswith("//")
        or $line.startswith("*")) { continue }

    // create inet:fqdn nodes
    [ inet:fqdn=$line :issuffix=$lib.true +#rep.psl.suffix=now ]
}

DNS Suffix Context Is Key

As threat intelligence analysts we often utilize multiple data sources to enrich and contextualize FQDN indicators. The PSL is one of many data sources that can be leveraged to contextualize public and private DNS suffix data. While not comprehensive, when used in concert with other third-party and internal DNS suffix intelligence, it is a great way to augment existing indicator context, and effectively analyze and correlate threat data in Synapse!