Capturing Structured Data in Spotlight with the Table Extractor

by savage | 2023-08-08


Of the many ways to ingest data into Synapse, Spotlight offers a "best of both worlds" option for ingesting and modeling data from written reports: users can scrape out indicators and then easily capture and tag additional details from the text. While Spotlight is great for handling unstructured data such as prose, blogs, and reports, these documents sometimes also include structured data, such as the indicator tables found in threat intel blogs. The table extractor feature within Spotlight offers users a way to ingest and model this structured data in a manner similar to the Ingest Tool.

Using the Table Extractor Feature

Synapse users who load a document into Spotlight via URL can use the table extractor feature to "extract" tables that appear in the document and import the data in a manner similar to that used in the Ingest Tool. When loading a document by URL, Spotlight uses the Synapse-Playwright Power-Up to capture the page contents as a PDF, rendering it in the Documents tab and saving the file bytes to the Axon. Spotlight will also extract any tables contained within the document and display them in the Tables tab.

We’ll walk through using the table extractor feature below:

Example 1: Creating Nodes from a Table in Deep Instinct’s Blog

For our first example we’ll take a look at Deep Instinct’s blog on the use of the PhonyC2 framework by a threat actor they call MuddyWater. When I upload the blog to Spotlight by its URL, Spotlight renders the blog in the Document tab and displays any tables it extracted in the Tables tab:

_images/import4.gif

One of the tables lists SHA256 hashes and associated filenames. Spotlight already recognized and scraped out the SHA256 hashes and represented them as hash:sha256 nodes, and I could manually capture the filenames as file:base nodes. However, I’d like to create file:filepath nodes to capture the hashes along with their associated filenames.

Each of the extracted tables has a hamburger menu to the left of the table title, where I can find the options to export the table contents as a CSV, or to delete the table. If I delete the extracted table and then later decide that I’d actually like to ingest that data, I’ll need to refresh the document capture to have Spotlight re-extract the table.
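
If I export a table as a CSV, I can also sanity-check it outside of Synapse before deciding how to ingest it. The Python sketch below is only an illustration of that idea: the column headers sha256 and filename, the function name, and the sample rows are all hypothetical placeholders rather than values taken from the Deep Instinct blog or from Spotlight's actual export format.

```python
import csv
import io
import re

# A SHA256 digest is exactly 64 hexadecimal characters.
SHA256_RE = re.compile(r"[0-9a-f]{64}")


def split_hash_rows(csv_text, hash_col="sha256", name_col="filename"):
    """Split CSV rows into (valid, rejected) lists of (sha256, filename)."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Missing cells come back as None, so fall back to an empty string.
        sha256 = (row.get(hash_col) or "").strip().lower()
        name = (row.get(name_col) or "").strip()
        if SHA256_RE.fullmatch(sha256) and name:
            valid.append((sha256, name))
        else:
            rejected.append((sha256, name))
    return valid, rejected


# Placeholder data: hypothetical headers and made-up values.
SAMPLE_CSV = (
    "sha256,filename\n"
    + "AB" * 32 + ",example_one.ps1\n"
    + "not-a-hash,example_two.ps1\n"
)

valid_rows, rejected_rows = split_hash_rows(SAMPLE_CSV)
print(len(valid_rows), "valid;", len(rejected_rows), "rejected")
```

Rows that land in the rejected list are the ones I would want to eyeball against the rendered document, since table extraction from a PDF capture may not always split cells cleanly.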

I can click on the icon to the right of the table title to open the table ingest. The table ingest is similar to the Ingest Tool in that it will display the contents of the table and provide a space where I can include a Storm ingest script telling Synapse how to model the data.

_images/menus.gif

There are three buttons just above the box where I enter my Storm script. The Run button will run my Storm script, while I can use the Save button to save the Storm script for that table ingest so that it will appear if I exit out and then reopen the ingest. The Rollback button will discard any pending changes I’ve made to the Storm script.

I use the Run button below to start the ingest, and Synapse prints the resulting nodes in the box at right:

_images/table_script3.gif

After importing the data, I can exit the table ingest and view the resulting file:filepath nodes represented in Spotlight alongside other nodes created from the document:

_images/ingest_results1.gif

If there’s an ingest script that I find I’m frequently using, I might consider saving that script as a macro and then using it to ingest the data, rather than writing out the full script each time. In the gif below, I use a macro I’ve named file.filepath to capture the data as file:filepath nodes:

_images/macro2.gif

Whereas my previous Storm script also applied the #rep.deepinstinct.muddywater and #rep.deepinstinct.phonyc2 tags to the file:filepath nodes, I kept the macro simple and wrote it to just create the nodes themselves. I’ll add the tags later.
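
To make the difference between the tagged script and the untagged macro concrete, here is a rough Python sketch that builds the two flavors of Storm edit. It is not the script shown in the gifs. It assumes that a file:filepath node is created from a (file, path) pair in which the file can be given as a SHA256, and that values are passed to Storm as variables. The variable names $sha256 and $filename, the helper name, and the sample values are hypothetical; the way the table ingest itself exposes column values may differ.

```python
def filepath_ingest(sha256, filename, tags=()):
    """Return (storm_text, opts) for creating one file:filepath node.

    Assumption: values are handed to Storm as variables instead of being
    spliced into the query text, which avoids quoting problems.
    """
    # Each tag becomes a "+#tag" operation inside the edit brackets.
    tag_ops = "".join(f" +#{tag}" for tag in tags)
    storm_text = f"[ file:filepath=($sha256, $filename){tag_ops} ]"
    opts = {"vars": {"sha256": sha256, "filename": filename}}
    return storm_text, opts


PLACEHOLDER_HASH = "ab" * 32  # made-up value, not a real indicator

# Roughly what a script that also applies the two tags would build.
tagged_text, tagged_opts = filepath_ingest(
    PLACEHOLDER_HASH,
    "example_one.ps1",
    tags=("rep.deepinstinct.muddywater", "rep.deepinstinct.phonyc2"),
)

# Roughly what a nodes-only macro would build.
macro_text, macro_opts = filepath_ingest(PLACEHOLDER_HASH, "example_one.ps1")

print(tagged_text)
print(macro_text)
```

Keeping the tags out of the reusable version means the same logic can serve any report, with report-specific tags applied in a separate step.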

The table ingest will not only import the data in the table but also link it back to the media:news node representing the Deep Instinct blog. I can view the new file:filepath nodes either in Spotlight or by lifting the media:news node in the Research Tool and walking across the -(refs)> light edge to view the file:filepath nodes:

_images/refs2.gif

Example 2: Capturing Victim Information with ps:contact Nodes

In our second example, we’ll import data included in Citizen Lab’s technical brief on Pegasus infections associated with the Armenia-Azerbaijan conflict. Upon uploading the report to Spotlight via its URL, the table extractor renders a table characterizing identified victims. I want to model this data as ps:contact nodes:

_images/ca_tabletab.gif

Running the Storm script creates a ps:contact node for each entry in the table, populating the :name, :desc, and :loc properties for each:

_images/ca_ingest.gif
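
As a rough illustration of the column-to-property mapping, the Python sketch below turns one table row into a Storm edit that sets only the properties that actually have a value. It is an assumption-laden sketch, not the script from the gif: the row keys, the helper name, and the sample values are hypothetical, and the use of ps:contact=* (a new GUID for every run) is my own choice for simplicity.

```python
ALLOWED_PROPS = ("name", "desc", "loc")  # the properties named in the text


def contact_ingest(row):
    """Return (storm_text, opts) that sets only the non-empty properties."""
    storm_vars = {}
    prop_ops = []
    for prop in ALLOWED_PROPS:
        value = (row.get(prop) or "").strip()
        if value:
            # Property names come from the fixed tuple above; only the
            # values travel as Storm variables.
            storm_vars[prop] = value
            prop_ops.append(f":{prop}=${prop}")
    if "name" not in storm_vars:
        raise ValueError("each contact row needs a name")
    storm_text = "[ ps:contact=* " + " ".join(prop_ops) + " ]"
    return storm_text, {"vars": storm_vars}


# Placeholder row: made-up values, with an empty location cell.
sample_row = {"name": "Example Person", "desc": "Placeholder description", "loc": ""}
contact_text, contact_opts = contact_ingest(sample_row)
print(contact_text)
```

Because ps:contact=* generates a fresh GUID each time, running an edit like this twice over the same table would create duplicate contacts, so a real script may prefer a deterministic GUID.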

After running the Storm script, I can exit out of the table ingest to see the newly created ps:contact nodes rendered in Spotlight:

_images/ca_results.gif

I can also lift the media:news node representing the Citizen Lab report and walk the -(refs)> light edge to see the linked ps:contact nodes:

_images/ca_refs.gif

Spotlight Table Extractor: For When a Report Includes Structured Data

Analysts can use Spotlight’s table extractor feature to easily import structured data included in blogs and reports. Spotlight will recognize and create nodes to represent common indicators such as hashes, IPv4 addresses, email addresses, FQDNs, and URLs, and analysts can always manually capture additional data that appears in the document. The table extractor presents analysts with an additional option tailored to capturing structured data from a blog, report, or other document.

To learn more about Synapse, join our Slack community, check out our videos on YouTube, and follow us on Twitter, Bluesky, Mastodon, and LinkedIn.