Using Spotlight Extractors for Arbitrary Data
by thesilence | 2023-11-20
Spotlight Tool Overview
Synapse's Spotlight Tool simplifies the process of extracting analytically relevant information from prose reports. With Spotlight, users can load a PDF document or have Spotlight retrieve content from a URL and convert it to PDF format. Users can then review and process the report's content in Spotlight.
Using Synapse's extensible scrape library, Spotlight automatically recognizes and creates many common indicators of compromise (IOCs), such as hashes, IPv4 addresses, and domains (to name a few). Power-Ups may extend these capabilities; for example, if the Synapse-MITRE-ATTACK Power-Up is installed, Spotlight can recognize references to MITRE ATT&CK elements such as techniques or groups and create the corresponding nodes.
Anything extracted via Spotlight is automatically linked to the document's media:news node using a -(refs)> (for "references") light edge. This makes it easy to identify all of the nodes referenced in a given report, or to take any node and find all of the reports that reference it.
Capturing Additional Data
If you want to extract (and link) information beyond what Spotlight can automatically identify, you can highlight the relevant text in the report and tell Spotlight what kind of node to create using the right-click context menu.
Many common forms such as file names (file:base) and threat group names (ou:name) are available from this Quick Forms menu by default. You can also add your own commonly used forms by modifying your Spotlight preferences, or use the other menu option to specify a form on the fly. (The other option is also useful if you need to edit your highlighted text - such as to remove "defanging" characters, spaces, or line breaks - before creating a node.)
Using Spotlight Extractors
The ability to highlight text to create nodes works well when the text is the exact value (primary property) of the node you want to create. Unfortunately, you can't use this method to create composite nodes or guid nodes. It may be perfectly clear to you that when you highlight "CVE-2023-36932" you want to create a risk:vuln guid node and set the :cve property to that value, but this process is less clear to Synapse. (There are many cases where Synapse can help to "do what you mean", but Synapse can't read your mind.)
For these situations you can create Spotlight Extractors - small bits of custom Storm that tell Spotlight how to create a particular node from your highlighted text. (Note that Spotlight Extractors differ from the Spotlight Table Extractor which helps you create nodes from tables within a report.)
You can create and manage Extractors (and any custom forms you add to your Quick Forms menu) from the Your Settings dialog, which you access from the Preferences option of Spotlight's main hamburger menu (or from the Your Settings section of the Optic Top Bar).
Creating Guid Nodes
Extractors are particularly useful for creating guid nodes from highlighted text. Spotlight's text highlighting makes it easy for you to create and link the names of objects, such as an ou:name node for a threat cluster or company name. It is harder to create and link the "thing itself" (the risk:threat or ou:org referenced by the ou:name) when the "thing" is a guid node. There are many cases where you want a closer tie between the report and the object vs. the name of the object. A Storm query such as:
media:news:title="yep, more bad stuff happening" -(refs)> risk:threat
...is easier and more precise than:
media:news:title="yep, more bad stuff happening" -(refs)> ou:name -> risk:threat
The second query may return multiple threats (risk:threat nodes) from various reporting organizations that all use the same name for the threat they report on.
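To make the difference concrete, here is a toy model in Python (not Storm, and not how Synapse stores data). The report, name, and threat labels are all invented for illustration; the only point is that pivoting through a shared name can fan out to threats the report never referenced.

```python
# Toy graph, invented for illustration only.
# refs: nodes a report links to with a -(refs)> light edge.
refs = {
    "report-a": [
        "name:sparkling unicorn",
        "threat:sparkling unicorn (reporter-a)",
    ],
}

# Threats that carry a given name, regardless of who reported them.
threats_by_name = {
    "name:sparkling unicorn": [
        "threat:sparkling unicorn (reporter-a)",
        "threat:sparkling unicorn (reporter-b)",
    ],
}


def direct_threats(report):
    """Model of: report -(refs)> risk:threat."""
    return [node for node in refs[report] if node.startswith("threat:")]


def threats_via_name(report):
    """Model of: report -(refs)> ou:name -> risk:threat."""
    found = []
    for node in refs[report]:
        found.extend(threats_by_name.get(node, []))
    return found


print(direct_threats("report-a"))
print(threats_via_name("report-a"))
```

In this model the direct query returns only the threat the report is linked to, while the name pivot also returns the second reporter's threat of the same name.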
Spotlight defines some variables that you can use with your extractor code:
$raw: the highlighted text "as is";
$text: the "clean" version of the highlighted text (e.g., leading/trailing whitespace removed, line breaks replaced with spaces, etc.); and
$news: the guid of the report's media:news node.
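As a rough illustration of the difference between $raw and $text, the Python sketch below applies only the two cleanup steps named above. Spotlight's actual cleanup rules are not spelled out here (note the "etc."), so treat clean_highlight as a hypothetical helper, not Spotlight's implementation.

```python
import re


def clean_highlight(raw: str) -> str:
    """Approximate the cleanup described for $text (an assumption)."""
    # Replace each run of line breaks with a single space.
    text = re.sub(r"[\r\n]+", " ", raw)
    # Remove leading and trailing whitespace.
    return text.strip()


raw = "  Sparkling\nUnicorn  \n"   # what $raw might hold
text = clean_highlight(raw)         # roughly what $text might hold

print(repr(raw))
print(repr(text))
```

Extractors that need the value of a node will usually want $text; $raw is there for cases where the exact highlighted characters matter.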
When creating guid nodes, you can take advantage of Storm's gen.* commands to deconflict the node you create based on its secondary properties. The following extractor creates a risk:threat node from a highlighted threat cluster name (ou:name). Note that the gen.risk.threat command deconflicts risk:threat nodes using both the threat name and the reporter name. The extractor obtains the reporter name from the media:news node's :publisher:name value, so this value needs to be set (e.g., through Spotlight's Document details menu) for the extractor to work. This is a good reminder to fill in report details such as the publisher, publication date, and title.
media:news=$news
$reporter=:publisher:name
yield { gen.risk.threat $text $reporter }
-media:news
Note that a Spotlight extractor must generate exactly one node; in the extractor above, we need to drop (filter out) the media:news node that we lifted in order to retrieve the :publisher:name property (to use with the gen.risk.threat command). Once we drop the media:news node, the Storm will return only our newly generated risk:threat.
You add the extractor to your Spotlight preferences in the Your Settings dialog.
Your custom extractor is now available from your context menu and you can easily create the threat cluster.
Once you've created the risk:threat node, you can view the node details in Spotlight, query it in the Research Tool, or view the node in the Vertex-Threat-Intel Power-Up Workflow to fill in additional properties if necessary.
Notice that the threat cluster is also linked to the associated Spotlight report.
Creating Composite Nodes
You can also use extractors to create composite (comp) nodes. As an example, a report may include email header information (such as subject lines) for recent spear phishing campaigns. You want to capture the headers to look for related activity. An inet:email:header node is a composite node made from the header name (such as Reply-To) and the header's value. The following extractor tells Spotlight how to create an inet:email:header node for an email subject from the highlighted text:
[ inet:email:header=(subject,$text) ]
By highlighting the text of the Subject line, you can easily create the inet:email:header node.
You can create extractors to account for more complex use cases (like creating arbitrary email headers) or the need to "clean up" text. The following extractor creates an inet:email:header node from an arbitrary header in <name>: <content> format and also tidies the text string by removing "defang" characters and replacing "smart quotes" with "straight quotes":
($name,$value) = $text.replace('[.]','.').replace('“','"').replace('”','"').split(": ",maxsplit=1)
[ inet:email:header=($name,$value) ]
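If the chained calls are hard to read, the following Python sketch mirrors the same steps on a sample string so you can see what each one does. It is only an analogy for the Storm above; parse_header and the sample header are made up for this example.

```python
from typing import Tuple


def parse_header(text: str) -> Tuple[str, str]:
    """Mirror the extractor's cleanup and split on a '<name>: <content>' string."""
    cleaned = (
        text.replace("[.]", ".")      # undo "defanged" dots
            .replace("\u201c", '"')   # left smart quote -> straight quote
            .replace("\u201d", '"')   # right smart quote -> straight quote
    )
    # Split on the first ": " only, so a colon inside the value is kept.
    name, value = cleaned.split(": ", maxsplit=1)
    return name, value


sample = "Reply-To: \u201cBilling: Accounts\u201d <billing@example[.]com>"
name, value = parse_header(sample)
print(name)
print(value)
```

Limiting the split to the first separator matters because header values often contain ": " themselves. In this Python version, highlighted text with no ": " at all raises a ValueError when the result is unpacked, so it is worth highlighting the whole header line, name included.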
Conclusion
Spotlight Extractors are a powerful way for users to easily create more complex nodes and link them to the reporting in which they appear. This feature saves considerable time and effort vs. creating these custom nodes manually, allowing analysts to focus on analysis instead of hand-crafting the data they need.
Appendix - Additional Examples
We've included some additional examples of Spotlight Extractors below that can be used "as is" or modified to meet your needs.
Malware or Software Family
Similar to the risk:threat example above, this extractor uses the highlighted text (a malware family or tool name) and the media:news:publisher:name as input to a gen.* command (gen.risk.tool.software) to create a risk:tool:software node and link it to the media:news node. However, this example also links the name of the malware / software (it:prod:softname) to the media:news node:
media:news=$news
$reporter=:publisher:name
[ +(refs)> { [ it:prod:softname=$text ] } ]
yield { gen.risk.tool.software $text $reporter }
-media:news
Vulnerability
This extractor creates (or lifts) and links a risk:vuln node using a highlighted CVE number (it:sec:cve). This extractor uses the gen.risk.vuln command (which deconflicts on the reporter name and CVE number) and explicitly uses nist as the reporter in order to retrieve information about the vulnerability from the NIST NVD with the Synapse-NIST-NVD Power-Up.
gen.risk.vuln $text nist |
$cve=risk:vuln:cve | nist.nvd.cve.byid
Country
This extractor lifts and links an existing pol:country node using a highlighted country name. This extractor expects a set of pol:country nodes to already exist in Synapse (i.e., from a previous one-time ingest of country data).
geo:name=$text -> pol:country
SHA256 / File
SHA256 hashes often "wrap" in reports due to their length. Spotlight can't recognize the SHA256 because of the line wrap (line break). With this extractor you can highlight the wrapped text and the extractor will create a hash:sha256 node using the "cleaned up" $text. This extractor also creates and links the associated file:bytes node based on the SHA256 hash.
[ hash:sha256=$text ]
{ [ file:bytes=$text <(refs)+ { media:news=$news } ] }
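If you want to sanity-check a wrapped hash outside of Synapse (for example, before pasting it elsewhere), the general idea of rejoining and validating it can be sketched in Python. This is a standalone illustration and makes no claim about how Spotlight or the hash:sha256 type handles the value internally; rejoin_sha256 is a hypothetical helper.

```python
import hashlib
import re

SHA256_RE = re.compile(r"[0-9a-f]{64}")


def rejoin_sha256(highlighted: str) -> str:
    """Join a hash that wrapped across lines and check that it looks like a SHA256."""
    # Drop all whitespace (line breaks, indentation, stray spaces) and lowercase.
    candidate = "".join(highlighted.split()).lower()
    if not SHA256_RE.fullmatch(candidate):
        raise ValueError("not a 64-character hex string: %r" % candidate)
    return candidate


# Build a sample "wrapped" hash from a real digest so the example is self-checking.
digest = hashlib.sha256(b"").hexdigest()
wrapped = digest[:40] + "\n   " + digest[40:]

print(rejoin_sha256(wrapped) == digest)
```

A length and character check like this catches the common failure where the highlight misses a few characters at the start or end of the wrapped line.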
To learn more about Synapse, join our Slack community, check out our videos on YouTube, and follow us on Twitter, BlueSky, Mastodon, and LinkedIn.