How I Learned the Storm Query Language

by mb | 2021/07/21


Any new platform or system involves at least a bit of a learning curve as users familiarize themselves with its capabilities and test them out. As an analyst at The Vertex Project, I’ve heard one question a few times from new and potential users: What is it like for an analyst to learn Storm? While each analyst’s experience will vary, this is how I learned Storm and what resources I found useful.

What is Storm?

First things first: what is Storm? Storm is Synapse’s native query language and a way for analysts to create, lift, edit, filter, and pivot among nodes. It’s both flexible and extremely powerful, allowing analysts to query and ask questions of the data in ways that would necessitate developer support on other platforms. Despite its capabilities, however, Storm isn’t something that requires analysts to dedicate themselves to a decade of intense study before attempting a query. In my experience, it’s a straightforward language that you can learn as you go. When I first began using Storm, I primarily relied on basic queries that, while functional, were a bit clunky. As I’ve become more comfortable with the language, my queries have gradually grown in complexity to become more tailored and efficient.
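
To give a rough sense of what that looks like in practice, a simple (and entirely hypothetical) pair of queries might create a domain node with a tag, then lift it by that tag and pivot to its DNS A records. The domain and tag below are placeholders, not real data:

[ inet:fqdn=example.com +#osint.example ]
inet:fqdn#osint.example -> inet:dns:a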

Who am I?

I’ll be the first to admit that when The Vertex Project hired me as an analyst, I was a little worried about whether I’d be able to learn Storm. While I’ve worked in the cybersecurity space for the past several years, I’m not someone who easily writes scripts or codes her own little tools. All of my coding experience has just been me playing around to a limited extent outside of work. My proudest accomplishment to date is a small Python script that will assign you a randomly selected cat name (useful, I know). I started out in this field as a Strategic Analyst with an educational background in IR – that’s International Relations, not Incident Response. In short, I’ve written a lot of essays, but not a lot of scripts.

My introduction to Storm...

When I joined Vertex, the analysts were using Synapse through cmdr, as Synapse’s UI was still in its early stages of development. I spent the majority of my first week reading through the Synapse documentation and practicing creating, modifying, and deleting nodes in my own instance of the Synapse hypergraph or data store, which we call a Cortex. My training Cortex contained data from The Vertex Project’s production Cortex, which meant that there were already nodes for me to lift, filter, and pivot among.
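
In practice, that mostly meant running small, throwaway edit queries over and over; something along these lines, where the IP address and ASN are just documentation values rather than anything from the production data:

[ inet:ipv4=192.0.2.1 :asn=65536 ]
inet:ipv4=192.0.2.1 | delnode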

The Synapse docs were a valuable resource for explanations and examples, while having my own Cortex gave me a way to practice modeling and navigating among nodes. If, for example, I was modeling information related to an email message and wasn’t sure how to capture certain details, I could search the docs for information on emails and view the node type information for related nodes. I could also lift and pivot to see examples that my coworkers had previously modeled in the production Cortex.
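
If I wanted to see how existing email messages had been modeled, for instance, I could lift a handful of the relevant nodes and review their properties; a query along these lines (assuming the inet:email:message form, and using the limit command to cap the results) would do it:

inet:email:message | limit 10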

One of my first official tasks for The Vertex Project involved modeling open source intelligence (OSINT) published in various threat intelligence reports and blogs. This helped reinforce my understanding of both the data and analytic models and my ability to perform basic Storm operations, as I could practice lifting, filtering, and pivoting, as well as creating, modifying, and tagging nodes. As most of the OSINT that I processed focused on threat intelligence, I became particularly familiar with that aspect of the data model. However, different blogs and reports would often share information that would require the use of other node types, incrementally introducing me to more of the data model.
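
A single unit of that work might boil down to something as simple as creating an indicator from a report and applying a tag; the domain and tag below are invented purely for illustration:

[ inet:fqdn=bad.example.com +#rep.somevendor.somecampaign ]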

My early queries, while functional, tended to be a bit longer and less direct (think lots of primary to secondary and secondary to primary property pivoting). However, I gradually got the hang of concepts like secondary to secondary property pivots, subquery filters, and variables, and started using them to construct more efficient queries. Learning how to use variables in particular was a bit of a game changer for me. Previously, if I wanted to identify whether any other domains resolved to the same IP during the same timeframe as a domain I was researching, I’d kind of just check visually (I know, I know). Now, I know how to set the .seen property as a variable and ask Storm to only return the DNS A records with .seen properties that fall within that same timeframe.

As an example, an early query to find additional domains that resolved to the same IP address during the same timeframe as the domain document.fastercapital.cc would have looked something like this, leaving me to visually review the results to check for time overlap:

inet:fqdn=document.fastercapital.cc -> inet:dns:a -> inet:ipv4 -> inet:dns:a

After learning to lift by secondary properties and use variables, I could create a more direct query that would return the results I wanted (no eyeballing involved!):

inet:dns:a:fqdn=document.fastercapital.cc $seen=.seen -> inet:ipv4 -> inet:dns:a +.seen@=$seen
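
Subquery filters were a similar step up: they let me filter the nodes I’m working with based on something a pivot or two away, without actually moving to those nodes. A rough, made-up example that keeps only the tagged domains resolving to an IP address in a particular Autonomous System might look like this (the tag and ASN are placeholders):

inet:fqdn#osint.example +{ -> inet:dns:a -> inet:ipv4 +:asn=65536 }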

As I continue to build upon my Storm skills, I’ve found that one of the most valuable resources at my disposal is other analysts. If I’m trying to run a query and not quite getting the results that I want, or if I’m painfully aware that there must be a more efficient way to do something and I just can’t figure it out, I might ask one of my coworkers how they’d go about running the query. I’ve also found it helpful to see examples of queries that my colleagues have put together and used for their own research and analysis.

If I’m reviewing research that a coworker is preparing to share through the Synapse UI’s Stories feature, for example, I’ll often check to see how that analyst selected the data displayed in the Story by navigating to the different Element menus and using the “Copy Query” option to copy the query to my clipboard. From there, I’ll go back to either the Research or Console tool to run and break down the query so that I can understand what its components are designed to do and get a sense of how I might write a similar query for my own research. Somewhat similarly, if I have any questions about why the analyst chose to use that query or display that specific set of data, I can always leave a comment on the Story.