Using Spotlight Extractors for Arbitrary Data

by thesilence | 2023-11-20

Spotlight Tool Overview

Synapse's Spotlight Tool simplifies the process of extracting analytically relevant information from prose reports. With Spotlight, users can load a PDF document or have Spotlight retrieve content from a URL and convert it to PDF format. Users can then review and process the report's content in Spotlight.

Using Synapse's extensible scrape library, Spotlight automatically recognizes and creates nodes for many common indicators of compromise (IOCs), such as hashes, IPv4 addresses, and domains. Power-Ups may extend these capabilities; for example, if the Synapse-MITRE-ATTACK Power-Up is installed, Spotlight can recognize references to MITRE ATT&CK elements such as techniques or groups and create the corresponding nodes.

Anything extracted via Spotlight is automatically linked to the document's media:news node using a -(refs)> (for "references") light edge. This makes it easy to identify all of the nodes referenced in a given report, or to take any node and find all of the reports that reference it.

Capturing Additional Data

If you want to extract (and link) information beyond what Spotlight can automatically identify, you can highlight the relevant text in the report and tell Spotlight what kind of node to create using the right-click context menu.

Many common forms, such as file names (file:base) and threat group names (ou:name), are available by default from the Quick Forms section of this menu. You can also add your own commonly used forms by modifying your Spotlight preferences, or use the other menu option to specify a form on the fly. (The other option is also useful if you need to edit the highlighted text, for example to remove "defanging" characters, spaces, or line breaks, before creating a node.)

Using Spotlight Extractors

The ability to highlight text to create nodes works well when the text is the exact value (primary property) of the node you want to create. Unfortunately, you can't use this method to create composite nodes or guid nodes. It may be perfectly clear to you that when you highlight "CVE-2023-36932" you want to create a risk:vuln guid node and set its :cve property to that value, but Synapse has no way to infer that intent from the highlighted text alone. (There are many cases where Synapse can help to "do what you mean," but Synapse can't read your mind.)

For these situations, you can create Spotlight Extractors: small bits of custom Storm that tell Spotlight how to create a particular node from your highlighted text. (Note that Spotlight Extractors differ from the Spotlight Table Extractor, which helps you create nodes from tables within a report.)

You can create and manage Extractors (and any custom forms you add to your Quick Forms menu) from the Your Settings dialog, which you access from the Preferences option of Spotlight's main hamburger menu (or from the Your Settings section of the Optic Top Bar).

Creating Guid Nodes

Extractors are particularly useful for creating guid nodes from highlighted text. Spotlight's text highlighting makes it easy to create and link the names of objects, such as an ou:name node for a threat cluster or company name. It is harder to create and link the "thing itself" (the risk:threat or ou:org referenced by the ou:name) when that "thing" is a guid node. In many cases, you want to tie the report to the object itself rather than to the object's name. A Storm query such as:

media:news:title="yep, more bad stuff happening" -(refs)> risk:threat

...is easier and more precise than:

media:news:title="yep, more bad stuff happening" -(refs)> ou:name -> risk:threat

The second query may return multiple threats (risk:threat nodes) from various reporting organizations that all use the same name for the threat they report on.
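To make the difference between the two queries concrete, here is a small Python model with hypothetical data (it is not Storm and not how Synapse stores nodes): one report references a single risk:threat directly and also references that threat's name, while a second threat from a different reporter happens to share the same name. The two functions stand in for the two queries above.

```python
# Hypothetical data: two threat nodes from different reporters share one name.
threats = [
    {"guid": "t1", "name": "cluster x", "reporter": "vendor a"},
    {"guid": "t2", "name": "cluster x", "reporter": "vendor b"},
]

# What the hypothetical report links to with -(refs)> edges.
report_refs = {"threats": {"t1"}, "names": {"cluster x"}}


def direct_threats(refs):
    # Models: media:news=... -(refs)> risk:threat
    return [t for t in threats if t["guid"] in refs["threats"]]


def threats_via_name(refs):
    # Models: media:news=... -(refs)> ou:name -> risk:threat
    # Every threat with a matching name is returned, regardless of reporter.
    return [t for t in threats if t["name"] in refs["names"]]


print([t["guid"] for t in direct_threats(report_refs)])
print([t["guid"] for t in threats_via_name(report_refs)])
```

The direct edge returns only the threat the report actually discusses, while the name pivot also picks up the other reporter's threat.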

Spotlight defines some variables that you can use with your extractor code:

$raw: the highlighted text "as is";

$text: the "clean" version of the highlighted text (e.g., leading/trailing whitespace removed, line breaks replaced with spaces, etc.); and

$news: the guid of the report's media:news node.
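The relationship between $raw and $text can be approximated in Python. The clean_text function below is a hypothetical helper, not Spotlight's actual cleanup routine; it assumes that "clean" means trimming both ends and turning line breaks (and any other runs of whitespace) into single spaces.

```python
def clean_text(raw: str) -> str:
    # str.split() with no argument splits on any whitespace run (spaces, tabs,
    # \n, \r\n) and drops leading/trailing whitespace; joining with a single
    # space therefore trims the ends and replaces line breaks with spaces.
    return " ".join(raw.split())


raw = "  Cluster\nX \r\n"
print(repr(raw), "->", repr(clean_text(raw)))
```

Collapsing every whitespace run is a slightly broader rule than the list above states; treat it as an approximation.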

When creating guid-based forms, you can take advantage of Storm's gen.* commands to deconflict the node you create based on its secondary properties. The following extractor creates a risk:threat node from a highlighted threat cluster name (ou:name). Note that the gen.risk.threat command deconflicts risk:threat nodes using both the threat name and the reporter name. The extractor obtains the reporter name from the media:news node's :publisher:name value, so this value must be set (e.g., through Spotlight's Document details menu) for the extractor to work. This is also a good reminder to fill in document details such as the publisher, publication date, and title.

media:news=$news
$reporter=:publisher:name
yield { gen.risk.threat $text $reporter }
-media:news

Note that a Spotlight extractor must return exactly one node. The extractor above lifts the media:news node in order to retrieve its :publisher:name property (for use with the gen.risk.threat command), so it must then drop (filter out) that media:news node. Once the media:news node is dropped, the query returns only the newly generated risk:threat node.
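The deconfliction that gen.risk.threat performs can be pictured as a get-or-create lookup keyed on the threat name and the reporter name. The ThreatStore class below is a hypothetical toy model of that idea, not Synapse's implementation; trimming and lowercasing the key values and using random identifiers for new entries are assumptions made only for illustration.

```python
import uuid


class ThreatStore:
    """Toy get-or-create store keyed on (threat name, reporter name).

    Hypothetical illustration only; this is not how Synapse implements gen.*.
    """

    def __init__(self):
        self._by_key = {}

    def gen_threat(self, name, reporter):
        if not reporter:
            # Mirrors the requirement that :publisher:name be set on the report.
            raise ValueError("reporter name is required")
        # Assumption: values are compared after trimming and lowercasing.
        key = (name.strip().lower(), reporter.strip().lower())
        if key not in self._by_key:
            # First time this (name, reporter) pair is seen: create a new id.
            self._by_key[key] = uuid.uuid4().hex
        # Otherwise return the existing id, so repeated runs do not duplicate.
        return self._by_key[key]


store = ThreatStore()
first = store.gen_threat("Cluster X", "Vendor A")
again = store.gen_threat(" cluster x", "vendor a")
other = store.gen_threat("Cluster X", "Vendor B")
print(first == again, first == other)
```

Running the same extractor twice on the same report therefore resolves to the same node, while the same name from a different reporter yields a distinct one.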

Add the extractor to your Spotlight preferences in the Your Settings dialog.

Your custom extractor is now available from your context menu, and you can easily create the threat cluster.

Once you've created the risk:threat node, you can view the node details in Spotlight, query it in the Research Tool, or view the node in the Vertex-Threat-Intel Power-Up Workflow to fill in additional properties if necessary.

Notice that the threat cluster is also linked to the associated Spotlight report.

Creating Composite Nodes

You can also use extractors to create composite (comp) nodes. For example, a report may include email header information (such as subject lines) from recent spear phishing campaigns, and you may want to capture those headers to look for related activity. An inet:email:header node is a composite node made from the header name (such as Reply-To) and the header's value. The following extractor tells Spotlight how to create an inet:email:header node for an email subject from the highlighted text:

[ inet:email:header=(subject,$text) ]

By highlighting the text of the Subject line, you can easily create the inet:email:header node.

You can create extractors to account for more complex use cases (like creating arbitrary email headers) or the need to "clean up" text. The following extractor creates an inet:email:header node from an arbitrary header in <name>: <content> format and also tidies the text string by removing "defang" characters and replacing "smart quotes" with "straight quotes":

($name,$value) = $text.replace('[.]','.').replace('“','"').replace('”','"').split(": ",maxsplit=1)
[ inet:email:header=($name,$value) ]
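The string handling in this extractor maps closely onto Python's str methods. The hypothetical parse_header function below applies the same replacements in the same order and, like the Storm version, splits only on the first ": " so that header values containing ": " stay intact. Unlike the Storm extractor, it raises an explicit error when no separator is present.

```python
def parse_header(text: str) -> tuple:
    # Same cleanup order as the Storm extractor: re-fang "[.]", then replace
    # left/right smart double quotes with straight quotes.
    cleaned = (
        text.replace("[.]", ".")
        .replace("\u201c", '"')
        .replace("\u201d", '"')
    )
    # maxsplit=1: split on the first ": " only.
    parts = cleaned.split(": ", 1)
    if len(parts) != 2:
        raise ValueError(f"expected '<name>: <content>', got {text!r}")
    name, value = parts
    return name, value


print(parse_header("Reply-To: billing@example[.]com"))
print(parse_header("Subject: \u201cRe: invoice\u201d"))
```

The second call shows why the single split matters: the colon inside the quoted subject remains part of the value.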

Conclusion

Spotlight Extractors are a powerful way for users to easily create more complex nodes and link them to the reporting in which they appear. This feature saves considerable time and effort compared with creating these custom nodes manually, allowing analysts to focus on analysis instead of hand-crafting the data they need.

Appendix - Additional Examples

We've included some additional examples of Spotlight Extractors below that can be used "as is" or modified to meet your needs.

Malware or Software Family

Similar to the risk:threat example above, this extractor uses the highlighted text (a malware family or tool name) and the media:news node's :publisher:name as input to a gen.* command (gen.risk.tool.software) to create a risk:tool:software node and link it to the media:news node. In addition, this example links the name of the malware or software (it:prod:softname) to the media:news node:

media:news=$news
$reporter=:publisher:name
[ +(refs)> { [ it:prod:softname=$text ] } ]
yield { gen.risk.tool.software $text $reporter }
-media:news

Vulnerability

This extractor creates (or lifts) and links a risk:vuln node using a highlighted CVE number (it:sec:cve). It uses the gen.risk.vuln command (which deconflicts on the reporter name and CVE number) and explicitly uses nist as the reporter; it then retrieves information about the vulnerability from NIST's National Vulnerability Database (NVD) using the Synapse-NIST-NVD Power-Up.

gen.risk.vuln $text nist |
$cve=risk:vuln:cve | nist.nvd.cve.byid

Country

This extractor lifts and links an existing pol:country node using a highlighted country name. This extractor expects a set of pol:country nodes to already exist in Synapse (e.g., from a previous one-time ingest of country data).

geo:name=$text -> pol:country

SHA256 / File

SHA256 hashes often "wrap" across lines in reports because of their length, and Spotlight can't recognize a SHA256 that is split by a line break. With this extractor, you can highlight the wrapped text and the extractor will create a hash:sha256 node using the "cleaned up" $text. This extractor also creates and links the associated file:bytes node based on the SHA256 hash.

[ hash:sha256=$text ]
{ [ file:bytes=$text  <(refs)+ { media:news=$news } ] }
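The cleanup a wrapped hash needs can be sketched in Python. The hypothetical unwrap_sha256 helper below removes all whitespace (including the line break introduced by wrapping) and checks that what remains is exactly 64 hexadecimal characters. It illustrates the idea only and is not a description of how Spotlight computes $text.

```python
import re

# A SHA256 digest in hex is exactly 64 characters from 0-9 / a-f.
_SHA256_RE = re.compile(r"[0-9a-f]{64}")


def unwrap_sha256(highlighted: str) -> str:
    # Remove every whitespace character, including the wrap's line break,
    # and normalize to lowercase hex.
    candidate = re.sub(r"\s+", "", highlighted).lower()
    if not _SHA256_RE.fullmatch(candidate):
        raise ValueError(f"not a SHA256 hex digest: {candidate!r}")
    return candidate


wrapped = "ab" * 20 + "\n" + "AB" * 12  # 40 chars, a line break, 24 more chars
print(unwrap_sha256(wrapped))
```

Validating the length after joining catches highlights that accidentally include a neighboring word or miss part of the hash.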

To learn more about Synapse, join our Slack community, check out our videos on YouTube, and follow us on Twitter, BlueSky, Mastodon, and LinkedIn.