From Code Families to Software Ecosystems: Documenting Relationships Between Tools and Other Resources
by savage | 2025-02-03
In Categorizing Software with Code Families, we described The Vertex Project’s method of creating code families to identify tools based on the same or highly similar, meaningful source code, and detailed how to represent that information in Synapse. However, while creating a code family allows us to define a specific tool based on identified code, we also need to represent additional context about that piece of software, such as how it relates to and leverages resources that are not a part of the defined code family. This additional context can facilitate investigation and response by helping us know what other files or resources to look for should we come across a tool in our network. Here, we’ll describe The Vertex Project’s methodology for capturing these relationships in Synapse, and how doing so documents additional context pertaining to our code family and its use in operations.
Code Families and Software Ecosystems
We generally think of our methodology for identifying and tracking software as having two main components: the code family and the software ecosystem. With code families, we can categorize software down to what we’ve determined to be shared, meaningful code. This means we can precisely define what any one code family consists of, as well as identify files that are samples of that code family. And because this is Synapse, we’re able to document that code family with a risk:tool:software node, and then tag the code family samples with an associated tag.
The other key component of our methodology is the software ecosystem, which typically accompanies a code family. Analysts can create a software ecosystem to identify resources that are not part of the code family itself, but are related to it. While a code family consists of files containing executable code, a software ecosystem may contain related resources ranging from files to FQDNs. For example, we can create a code family to define a backdoor, and an accompanying software ecosystem to identify the lure files, command-and-control (C2) infrastructure, and execution behavior used in conjunction with that backdoor. Creating a software ecosystem provides a way to document those relationships and associated resources, while still keeping those other indicators distinct from the code family itself.
In certain instances, analysts may opt to create a software suite to specifically document relationships between a code family and other files using tags. As an example, an analyst may want to create a tag tree to identify files that operate in conjunction with a particular code family. However, in the majority of cases, our methodology focuses on creating a code family and an accompanying software ecosystem.
Creating a Software Ecosystem for the Carrotstick Backdoor
For an example of a software ecosystem, let’s return to our Carrotstick backdoor. We previously created a code family for Carrotstick, and subsequently identified the samples of that code family within Synapse by tagging the file:bytes and related hash nodes with #cno.code.carrotstick. However, while we were able to document the backdoor itself, our research into Carrotstick’s use and functionality uncovered context related to its delivery, execution behavior, and infrastructure, which we wanted to capture as related to Carrotstick. We did this by creating a corresponding Carrotstick software ecosystem.
Note
You can view this data in the Vertex Intel-Sharing Instance. New to the Intel-Sharing cortex? Learn about it and request access here.
Whereas we used #cno.code.carrotstick to tag samples of the Carrotstick code family, we used #cno.mal.carrotstick and #cno.rel.carrotstick to tag nodes representing elements of the Carrotstick software ecosystem.

We applied the #cno.mal.carrotstick tag to nodes representing malicious indicators associated with the Carrotstick code family ecosystem, such as:
The file:bytes and associated hash nodes of the Carrotstick code family (these are also tagged with #cno.code.carrotstick);
The file:bytes and associated hash nodes of the JavaScript files that download the Carrotstick backdoor;
The it:exec:url, it:exec:file:add, it:exec:file:write, and it:exec:proc nodes documenting the JavaScript files downloading the Carrotstick backdoor;
The inet:url and inet:ipv4 nodes representing the URL and IP addresses used to host the Carrotstick backdoor, and which the backdoor communicated with for C2; and,
The file:filepath nodes documenting a Carrotstick backdoor or JavaScript file and the filepath where it was located.
In addition to encompassing the Carrotstick code family and related malicious resources, the Carrotstick ecosystem also includes resources that are related to Carrotstick but neither malicious nor unique to Carrotstick. We tagged these nodes with #cno.rel.carrotstick to capture that we’ve seen them previously in relation to Carrotstick, although they are neither malicious nor always indicative of the Carrotstick family. Whereas we might think of a #cno.mal.carrotstick tag as a red flag warning us that the node to which it is applied is malicious and part of the Carrotstick ecosystem, the #cno.rel.carrotstick tag is a yellow flag cautioning us to look for further context. Some examples of nodes that we have tagged with #cno.rel.carrotstick include file:base and file:path nodes documenting filenames and filepaths we’ve seen used for Carrotstick samples.
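The red flag and yellow flag distinction can be sketched as a small triage helper. This is only an illustration of the tagging convention described above, not part of Synapse: the function name `triage_flag` and the returned labels are hypothetical, and the sketch assumes a node's tags are available as plain strings, with or without the leading `#`.

```python
from typing import Iterable


def triage_flag(tags: Iterable[str], family: str) -> str:
    """Map a node's tags to a coarse triage label for one software family.

    Hypothetical helper: 'red' for cno.mal.<family>, 'yellow' for
    cno.rel.<family>, and 'none' when neither ecosystem tag is present.
    """
    # Normalize by dropping any leading '#' so both tag spellings are accepted.
    normalized = {tag.lstrip("#") for tag in tags}

    if f"cno.mal.{family}" in normalized:
        # Malicious and part of the ecosystem: the red flag takes priority.
        return "red"
    if f"cno.rel.{family}" in normalized:
        # Previously seen with the family, but not malicious on its own.
        return "yellow"
    return "none"


if __name__ == "__main__":
    print(triage_flag(["#cno.code.carrotstick", "#cno.mal.carrotstick"], "carrotstick"))
    print(triage_flag(["#cno.rel.carrotstick"], "carrotstick"))
```

Checking the malicious tag first reflects the idea that a node carrying both tags should still be treated as a red flag; a yellow result is a prompt to look for further context rather than a verdict.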
Just as with creating threat clusters in Synapse, the Carrotstick software ecosystem should be contiguous in the graph, with the supporting evidence clearly documented. We can view the nodes tagged with #cno.mal.carrotstick and #cno.rel.carrotstick by lifting them by the corresponding tag, as shown here:
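As a minimal sketch of lifting by tag, the helper below composes Storm query strings for the ecosystem tags. In Storm, a bare `#tag` lifts every node carrying that tag, and `form#tag` limits the lift to a single form. The Python function `lift_by_tag` is a hypothetical convenience for building those strings and is not part of Synapse; how the queries are then run (for example, in the Storm query bar) is left out.

```python
from typing import Optional


def lift_by_tag(tag: str, form: Optional[str] = None) -> str:
    """Build a Storm lift-by-tag query string (hypothetical helper).

    With no form, the query lifts all nodes with the tag; with a form,
    it lifts only nodes of that form that carry the tag.
    """
    tag = tag.lstrip("#")  # accept 'cno.mal.x' or '#cno.mal.x'
    return f"{form}#{tag}" if form else f"#{tag}"


if __name__ == "__main__":
    family = "carrotstick"
    # One query per ecosystem tag described in this post.
    for prefix in ("cno.mal", "cno.rel"):
        print(lift_by_tag(f"{prefix}.{family}"))
    # Narrow the malicious set to hosting and C2 URLs only.
    print(lift_by_tag(f"cno.mal.{family}", form="inet:url"))
```

Keeping the two tags in separate queries makes it easy to review the malicious indicators and the merely related nodes as distinct result sets.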

Code Families, Software Suites, and Software Ecosystems
In this and two previous blog posts, we described The Vertex Project analysts’ current approach to identifying software within Synapse. We took advantage of the flexibility Synapse offers through its data and analytical models to incorporate an approach involving granular identification of tools based on meaningful code overlaps, while allowing for some flexibility due to resource constraints. With our approach we’re not only able to create and document code families based on shared source code, but also track relationships between files and other resources through software ecosystems and suites. We can optionally continue to incorporate additional levels of abstraction, eventually working our way from examining specific files to categorizing them based on code, identifying their associated software ecosystems and relationships with other resources, to their use in campaigns, compromises, and other activity.