What is a Threat Cluster?

by thesilence | 2024-04-16

Much of cyber threat intelligence (CTI) is dedicated to tracking threat groups. Attackers are people, and people have habits in the way they do things. If we can identify specific attackers, we can better understand their motives and anticipate their actions based on past behavior.

Public reporting is filled with discussions of "Lazarus" or "Volt Typhoon" or the latest ransomware group. We take it for granted and accept as fact that these names refer to particular sets of threat actors, collectively responsible for specific activity. And yet - once upon a time, there was no "APT28" ("Advanced Persistent Threat 28"), much less "APT28 is Russian GRU Unit 26165". Similarly, at one time we didn't know that "Conti" was a new criminal group. So how did we get here?

Threat Clusters

In CTI, every threat group starts out as a threat cluster. A threat cluster is a set of related malicious activity presumed to be carried out by an individual or group of individuals acting in concert.

Network defenders (and CTI analysts) work from the bottom up - our starting point is typically the evidence of an attack, whether a phishing email, exploit attempt, new malware sample, or security incident. We know someone is responsible for the activity, but at the start of an investigation we rarely have enough information to know who.

Instead, we use the set of indicators associated with the activity as a stand-in for the unknown group responsible. The threat cluster is represented by the indicators or other evidence, and is commonly given a name or designator for easy reference.

Creating a threat cluster (sometimes called an "UNC" or "unknown" group, "temp" group, "dev" group, or similar) is a way to identify and group any activity that we want to note and track going forward, no matter how small or incidental. As Visi Stark famously said, as analysts we should be "handing out UNCs like candy"! We may proactively research our threat cluster to try to expand the set of related indicators, or we can simply create the threat cluster in case we encounter any related activity in the future.

It is important to remember that threat clusters are preliminary - we don't have enough information to say much about the cluster yet. Creating the cluster still allows us to flag the activity, even if we don't know what (or more specifically who) it is.

Tracking Threat Clusters in Synapse

To create a threat cluster in Synapse, we first need to model the activity. Note that there is no minimum size - a cluster can consist of a single node if that node is important for us to track. For our example, we'll use a public blog post by ESET on activity by "Evasive Panda", but the starting point for a threat cluster can just as easily be something seen on your own network. (In fact, let's assume one of our internal users fell victim to one of the watering hole attacks described by ESET.) We've added the relevant indicators from the incident to Synapse and enriched them with various Power-Ups.

Once the data (nodes) are in Synapse, we need to tag them to indicate they are part of our threat cluster - which means we need a naming convention for our clusters (and associated tags). Whatever convention you choose, we recommend using names that are:

Generic. Some naming conventions are meant to reflect a threat group's country of origin or goals/purpose (e.g., "espionage" vs. "criminal"). We don't know anything about our new cluster, so we want to avoid encoding any assumptions in the name.
Easy to generate. If we're constantly creating new threat clusters for observed activity, we don't want to agonize over choosing names - especially as many names will simply be discarded as clusters link and merge over time.

For our example we'll use a generic numbering system - T (for "threat cluster") followed by the next number in our sequence (there are pros and cons to this approach, but it works for illustrative purposes). Let's say our next cluster number is T936.
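As a rough illustration of how little effort an "easy to generate" name should take, the sketch below picks the next designator in a numbered sequence. It is plain Python, not part of Synapse; the function name and the list of existing designators are assumptions made for the example.

```python
import re


def next_cluster_name(existing_names, prefix="T"):
    """Return the next unused designator in a simple numbered sequence.

    Names that do not look like <prefix><number> are ignored.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$", re.IGNORECASE)
    numbers = []
    for name in existing_names:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    # Start at 1 when no cluster has been created yet.
    return f"{prefix}{max(numbers, default=0) + 1}"


# Hypothetical designators already in use.
existing = ["T934", "T935"]
print(next_cluster_name(existing))  # T936
```

A plain counter like this says nothing about origin or motive, which is the point: the name carries no assumptions that later analysis might contradict.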

When we create our cluster, we want to note the source or seed for the cluster. In other words, what indicator (or small set of indicators) led us to create this cluster in the first place? Noting the seed will help our future selves - if we can't recall where a cluster came from, or need to recall why we later attributed additional activity to the same cluster, knowing our origin helps us retrace our analytical steps.

We'll use the initial file downloaded by the watering hole attack as the seed for our example. Based on Vertex's tag conventions, we'll tag the file:bytes node with #cno.threat.t936.own.seed:

We can then tag additional indicators associated with the watering hole attack (such as the URL the file was downloaded from, the URL of the malicious script that triggered the attack, and the associated FQDNs) with #cno.threat.t936.own:

Finally, we can create a threat cluster node (risk:threat) to record additional information about our cluster. We'll use the gen.risk.threat command so we can easily set additional properties in the Research Tool:

Our threat cluster node (risk:threat) is linked to the associated indicators via the risk:threat:tag property, which lists the base tag (cno.threat.t936) that we used to annotate related nodes. We can easily navigate from our threat cluster to the associated indicators:

risk:threat:org:name=t936 :tag -> syn:tag -> *

...or from our indicators to our threat cluster:

#cno.threat.t936 -> # | uniq | -> risk:threat:tag
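Both pivots rely on the tag being hierarchical: the base tag identifies the cluster, and sub-tags such as own and own.seed add detail. The sketch below models that layout in plain Python to make the relationship explicit. The helper names are hypothetical and this is not the Synapse API; only the tag strings come from the example above.

```python
TAG_ROOT = "cno.threat"


def cluster_tag(cluster, *parts):
    """Build a tag such as cno.threat.t936.own.seed from its components."""
    return ".".join([TAG_ROOT, cluster.lower(), *parts])


def base_tag(tag):
    """Return the cluster's base tag (the value stored in risk:threat:tag)."""
    pieces = tag.lstrip("#").split(".")
    root = TAG_ROOT.split(".")
    if pieces[:len(root)] != root or len(pieces) <= len(root):
        raise ValueError(f"not a threat cluster tag: {tag}")
    return ".".join(pieces[:len(root) + 1])


def matches(tag, query):
    """True if a node carrying `tag` would be found by lifting `query`.

    A tag matches itself and any of its ancestors, compared per dotted
    component so that t936 does not match t9360.
    """
    tag, query = tag.lstrip("#"), query.lstrip("#")
    return tag == query or tag.startswith(query + ".")


seed = cluster_tag("T936", "own", "seed")
print(seed)                               # cno.threat.t936.own.seed
print(base_tag(seed))                     # cno.threat.t936
print(matches(seed, "#cno.threat.t936"))  # True
```

Going from an indicator to its cluster amounts to trimming the tag back to its base; going from the cluster to its indicators amounts to matching everything underneath that base.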

Now we've created our threat cluster and we can move on! Right?

Not So Fast...

There are two additional points to keep in mind when creating and working with threat clusters. The first is that a threat cluster should actually be a cluster. If a threat cluster represents "related malicious activity", then the nodes we associate with the cluster should actually be related - that is, if the nodes are viewed as a graph, they should be contiguous (connected).

Unfortunately, if we lift our tagged nodes in Synapse and view them in Force Graph mode, our nodes are not connected:

(The nodes in purple are tagged as part of T936; the nodes in gold were reported by ESET. The nodes in gray are a subset of additional linked nodes shown by the Force Graph display algorithm.)

Based on the nodes we've tagged so far, the graph does not show how our malicious file is related to either the watering hole site or the download URL. Imagine your future self (or your current coworker!) looking at this data - it is not at all clear why someone chose to associate these (apparently unrelated) indicators to the same threat.

If the nodes associated with our threat cluster are not connected, it means we have one or more problems with our cluster:

We have not modeled all of the relevant activity. Synapse is missing some evidence (additional nodes) that would illustrate these relationships and link our cluster.
We have not tagged all of the nodes that should make up the cluster. The nodes exist, but we failed to associate them with our cluster.
There is a gap in Synapse's data model. We do not have the ability to represent something that would link the nodes and need to extend the data model.
We have mis-attributed some activity to this cluster. Our disconnected nodes are not actually connected at all, and represent unrelated sets of activity.
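The "is it actually a cluster?" question can be checked mechanically: keep only the tagged nodes, keep only the relationships between them, and count the connected pieces. The sketch below does this with a breadth-first search in plain Python. The node labels and edges are simplified placeholders rather than real Synapse nodes, and the function is an illustration, not how Force Graph mode is implemented.

```python
from collections import deque


def connected_components(tagged, edges):
    """Group tagged nodes into connected components.

    Only edges whose two endpoints are both tagged are considered, so an
    untagged node cannot bridge two parts of the cluster.
    """
    adjacency = {node: set() for node in tagged}
    for a, b in edges:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            for neighbor in adjacency[queue.popleft()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Placeholder relationships between modeled nodes.
edges = [
    ("download url", "download fqdn"),
    ("url-to-file link", "download url"),
    ("url-to-file link", "malicious file"),
]

tagged = {"malicious file", "download url", "download fqdn"}
print(len(connected_components(tagged, edges)))  # 2

# Tagging the linking node joins the file to the rest of the cluster.
tagged.add("url-to-file link")
print(len(connected_components(tagged, edges)))  # 1
```

More than one component means one of the problems listed above applies; the check cannot say which one, only that the cluster needs another look.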

To ensure that our cluster is connected, we'll tag some additional nodes related to the watering hole attack (such as the inet:urlfile node showing that our malicious file:bytes node was hosted on the threat cluster's download URL). Once we've tagged the additional nodes in the attack chain, our threat cluster is fully connected:

In reviewing our threat cluster, there may be additional related nodes that we can tag to expand the cluster. These nodes may include the hashes associated with the file:bytes nodes, the FQDN devicebug.com (which we assess is attacker-controlled), and so on:

Tip

If, when you tag a node, there are additional nodes that you always want to tag (for example, when you tag a file:bytes node, always tag the associated file hashes), you can use triggers to automate this process.

The second point to keep in mind when working with threat clusters relates to a secondary use for our threat cluster tags. The tag #cno.threat.t936.own is used to identify and group all of the nodes we associate with our threat cluster or that provide evidence of our cluster's activity. But the tag also provides context for individual nodes. If a coworker queries Synapse for the FQDN kagyumonlam.org and sees the #cno.threat.t936.own tag, they will know the FQDN is associated with cluster T936.

There is a small problem though - that FQDN is a legitimate web site that T936 compromised. The site (specifically, the modified jquery.js file hosted on the site) was malicious while the watering hole attack was active. But T936 does not own the FQDN or the website, and the site is otherwise legitimate. We don't want our coworker (or our customer, or our detection system) mistaking kagyumonlam.org for a malicious site. How can we show that the site is associated with T936 but not inherently bad?

Vertex uses tags to distinguish between something a threat cluster "owns" (controls or uses exclusively) and something they "use". "Use" can refer to:

a legitimate resource (like the compromised website) that the cluster takes advantage of; or
a generic resource that our threat cluster uses (and that is not necessarily malicious) and which may also be used by other clusters (like the freely available version of ADFind, or a legitimate executable used to sideload a malicious DLL).

To make this clear to anyone viewing our threat cluster data, we'll update our tags to distinguish between components T936 owns (the malicious JavaScript and executable files, the FQDN devicebug.com and its subdomains) and things they use (the compromised website, URL, and FQDN; the file names associated with the malicious files). We can still see all of the nodes associated with our cluster by lifting the tag #cno.threat.t936, but can distinguish between nodes the cluster owns and uses with the respective sub-tags.
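One practical payoff of the own / use split is that downstream consumers can treat the two differently, for example by feeding only "owned" indicators into a blocklist. The sketch below shows that filter in plain Python. The mapping of nodes to tags is a hand-built stand-in for what would be lifted from Synapse, and the helper names are assumptions for this example.

```python
def relationships(tags, cluster):
    """Return which of 'own' / 'use' the tags assert for the given cluster."""
    base = f"cno.threat.{cluster.lower()}"
    found = set()
    for tag in tags:
        for rel in ("own", "use"):
            prefix = f"{base}.{rel}"
            # Match the sub-tag itself and anything beneath it (e.g. own.seed).
            if tag == prefix or tag.startswith(prefix + "."):
                found.add(rel)
    return found


def owned_nodes(node_tags, cluster):
    """Nodes the cluster owns: candidates for blocking or detection."""
    return sorted(
        node for node, tags in node_tags.items()
        if "own" in relationships(tags, cluster)
    )


# Simplified stand-in for tagged nodes from the example.
node_tags = {
    "devicebug.com": {"cno.threat.t936.own"},
    "kagyumonlam.org": {"cno.threat.t936.use"},
    "malicious file (seed)": {"cno.threat.t936.own.seed"},
}

print(owned_nodes(node_tags, "T936"))
# ['devicebug.com', 'malicious file (seed)']
```

The compromised site still shows up when everything under the base tag is lifted, but it never reaches a list built from "own" tags alone.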

The image below shows our cluster T936 nodes updated with "use" tags. The original "own" tags are displayed in purple; the "use" tags are displayed in a lighter purple-pink:

After analyzing and tagging additional nodes, we can use Force Graph mode to compare our threat cluster (T936) with ESET's "Evasive Panda" reporting:

Our cluster T936 (in purple and light purple) is the large cluster in the center. Nodes reported by ESET as "Evasive Panda" are in gold and include partial overlaps with our cluster, as well as several additional IOCs around the outside of the graph. Note that, given the data available to us, we are currently unable to link T936 to ESET's additional reporting.

The Threat Cluster Lifecycle

Once created, we let our threat cluster sit and see what happens! In some cases, we may never encounter that cluster again, or learn anything more about it. In most cases though, threat clusters will grow organically over time. We may proactively expand a cluster by researching it. Or, we may encounter additional activity that ties in with our existing cluster.

As a cluster grows, it may eventually touch or overlap with another set of activity (another cluster, or a known threat group). The images below show another threat cluster (T937, in aqua / light aqua) that we created based on the second watering hole attack from ESET's report (which used the web sites monlamit.com and tibetnews.net). T937 has some overlap with another cluster (T942, in orange / light orange) that we created from a malicious ZIP archive we identified on VirusTotal.

We can also view the nodes that overlap (that are tagged as owned or used by both clusters) in Tabular mode:

When this happens, we need to review our data! For example:

Do the two clusters represent a single set of activity that we are finally able to tie together? We may need to merge the two into a single cluster (or merge our cluster into an existing threat group).
Is the overlap an error? Did we mis-attribute some activity or mis-tag some data? We may need to untangle our analysis to make sure it is accurate.
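Finding the overlap that prompts this review is a set intersection: collect the nodes tagged under each cluster's base tag and see what they share. The sketch below is plain Python over a hypothetical node-to-tags mapping; the node names are placeholders, since the actual overlapping nodes are only shown in the images.

```python
def members(node_tags, cluster):
    """All nodes tagged with the cluster's base tag or any of its sub-tags."""
    base = f"cno.threat.{cluster.lower()}"
    return {
        node for node, tags in node_tags.items()
        if any(tag == base or tag.startswith(base + ".") for tag in tags)
    }


def overlap(node_tags, cluster_a, cluster_b):
    """Nodes that are owned or used by both clusters."""
    return members(node_tags, cluster_a) & members(node_tags, cluster_b)


# Placeholder data: two nodes are tagged under both T937 and T942.
node_tags = {
    "node-1": {"cno.threat.t937.own"},
    "node-2": {"cno.threat.t937.use", "cno.threat.t942.use"},
    "node-3": {"cno.threat.t942.own"},
    "node-4": {"cno.threat.t937.own", "cno.threat.t942.own"},
}

print(sorted(overlap(node_tags, "T937", "T942")))  # ['node-2', 'node-4']
```

An overlap made up only of "use" nodes (shared tooling, a commonly abused site) is weaker evidence for a merge than an overlap in nodes both clusters are believed to own, so it is worth checking which sub-tags the shared nodes carry.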

Finally, a cluster may eventually graduate and be promoted to a threat group. This process can take time (even years!), depending on how frequently we are able to observe (or collect) information about the group. A cluster is promoted when:

It has remained reasonably self-contained over time. We may continue to link new activity to the cluster, but the cluster does not overlap with other known activity.
We have collected sufficient information about the cluster to have a robust picture of its operations, motivations, and targets - that is, we have a detailed understanding of the group's activity (even if we do not have an attributed identity for the group).

From Cluster to Group (and to Org)

In Synapse, promoting a threat cluster to a threat group involves creating an organization (ou:org node) to represent the threat group. The new group (ou:org) and our original cluster are linked via the cluster's risk:threat:org property, and the risk:threat:tag value is still the tag we will use to annotate nodes associated with our group. (Although at this point we may rename our new threat group - T936 may finally become "Sparkling Unicorn" - and update the :tag property and all tagged nodes accordingly.)
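Renaming at promotion time means rewriting the whole tag tree, not just the base tag: every own, use, and seed sub-tag has to move with it. Synapse provides its own tooling for this (the Storm movetag command, for instance); the plain-Python sketch below only illustrates what such a rename has to preserve. The new tag name and the node data are hypothetical.

```python
def rename_tag_tree(node_tags, old_base, new_base):
    """Return a copy of node_tags with old_base (and its sub-tags) renamed."""
    renamed = {}
    for node, tags in node_tags.items():
        new_tags = set()
        for tag in tags:
            if tag == old_base:
                new_tags.add(new_base)
            elif tag.startswith(old_base + "."):
                # Keep the suffix (.own, .use, .own.seed, ...) unchanged.
                new_tags.add(new_base + tag[len(old_base):])
            else:
                new_tags.add(tag)
        renamed[node] = new_tags
    return renamed


node_tags = {
    "malicious file (seed)": {"cno.threat.t936.own.seed"},
    "kagyumonlam.org": {"cno.threat.t936.use"},
    "unrelated node": {"cno.threat.t9360.own"},
}

# Hypothetical tag for the promoted group.
renamed = rename_tag_tree(
    node_tags, "cno.threat.t936", "cno.threat.sparkling_unicorn"
)
print(sorted(renamed["malicious file (seed)"]))
# ['cno.threat.sparkling_unicorn.own.seed']
```

The same operation, pointed at an existing cluster's base tag instead of a new name, is also the mechanical core of merging one cluster into another.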

By creating an organization to represent our threat group, we can use additional parts of Synapse's data model to capture more information about the group, including information that will support more strategic analysis. For example:

Use ou:goal and ou:campaign nodes to represent the group's objectives and efforts to meet those objectives.
Use ou:vitals to capture our assessments of the group's size, staffing, budget, revenue, etc.
Use ps:contact and / or ou:position nodes to represent any members of the group (personas or known individuals) and their roles.
Use the econ:* or biz:* portions of the model to record transactions made by the group, such as the purchase of infrastructure or the sale of tools, data, or access.
Link the group more closely to attributed activity such as attacks, compromises, or campaigns. (We attribute activity to a threat cluster by applying the associated tag, which can easily be removed or changed while the cluster is still temporary. We attribute activity to a threat group by setting a property - such as risk:compromise:attacker - to the threat group organization's main ps:contact value (the ou:org:hq value).)

Conclusion

Threat clusters are a way to identify, group, and track malicious activity. Synapse's data model and tag structure make it easy for us to create, track, merge, and - over time - promote threat clusters to threat groups.

Threat clusters are not limited to only those threats reported by CTI organizations or security companies. We can (and should!) create and use clusters to track activity we observe within our own network or organization. Because Synapse gives us the flexibility to track both our reporting and other reporting, we can also easily compare and contrast our clusters and assessments with those of other organizations. More importantly, creating and tracking clusters allows us to link and expand our understanding of threats over time - information we can use to improve our defensive, detection, and response capabilities.

Download the .nodes file of data associated with this blog to examine the data in your instance of Synapse.

Tip

You can request a demo instance here. (Be sure to fork a view before importing the data!)

To learn more about Synapse, join our Slack Community, check out our videos on YouTube, and follow us on Twitter.
