by a Vertex Analyst | 2021/10/07
One of the most critical pieces of intelligence analysis is data. Most analysts will agree that quality data is key to performing even basic analysis. No analyst wants to sift through large amounts of data unnecessarily when a more precise data source would have done the job from the start. In this blog, we’re going to dive into some examples of common cyber threat intelligence data sources and discuss several considerations you should make before incorporating a new data source into your environment.
Before we get into evaluating data sources, let’s go over a few of the most common data sources in the cyber threat intelligence space.
Network infrastructure data is commonly used to gather context about FQDNs, URLs, and IP addresses. This context can be used to identify and pivot to connected or adjacent infrastructure which allows an analyst to have a clearer picture of the environment when making assessments. Another common use case is sinkhole identification. Identifying and monitoring sinkholes can provide a goldmine of data about malicious FQDNs.
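To make the idea of pivoting a little more concrete, here is a minimal sketch. It assumes you already have resolution records as simple (FQDN, IP) pairs; the sample records, the reserved documentation IP ranges, and the `pivot` function name are all made up for illustration and do not reflect any provider's schema.

```python
from collections import defaultdict

# Hypothetical resolution records (FQDN, IP). Real providers return richer data.
RECORDS = [
    ("bad.example.com", "192.0.2.10"),
    ("cdn.example.net", "192.0.2.10"),
    ("mail.example.org", "198.51.100.7"),
    ("bad.example.com", "198.51.100.7"),
    ("unrelated.example.edu", "203.0.113.5"),
]


def pivot(fqdn, records):
    """Return FQDNs that share at least one IP with `fqdn`, keyed by the shared IP."""
    ips_by_fqdn = defaultdict(set)
    fqdns_by_ip = defaultdict(set)
    for name, ip in records:
        ips_by_fqdn[name].add(ip)
        fqdns_by_ip[ip].add(name)

    adjacent = {}
    for ip in ips_by_fqdn.get(fqdn, set()):
        neighbors = fqdns_by_ip[ip] - {fqdn}
        if neighbors:
            adjacent[ip] = sorted(neighbors)
    return adjacent


print(pivot("bad.example.com", RECORDS))
```

Shared hosting and CDNs can make unrelated domains look adjacent, so treat results like these as leads to investigate rather than conclusions.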
Common data sources for infrastructure include:
Malware data is another heavily used data source. It comes in several flavors, but the most common by far is sandbox data. Well-known examples are VirusTotal and Hybrid Analysis. The APIs from these types of data sources return execution data, including DNS requests, network flow data, and interactions with the file system.
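As a rough sketch of what working with that execution data can look like, the example below pulls a few pivotable values out of a sandbox report. The report layout and field names (`dns_requests`, `network_flows`, `file_activity`) are hypothetical and are not the schema of VirusTotal, Hybrid Analysis, or any other vendor.

```python
import json

# Hypothetical sandbox report; field names are illustrative, not a vendor schema.
SAMPLE_REPORT = json.dumps({
    "dns_requests": [
        {"query": "bad.example.com"},
        {"query": "cdn.example.net"},
        {"query": "bad.example.com"},
    ],
    "network_flows": [{"dst_ip": "192.0.2.10", "dst_port": 443}],
    "file_activity": [
        {"action": "write", "path": "C:\\Temp\\a.tmp"},
        {"action": "read", "path": "C:\\Temp\\b.tmp"},
    ],
})


def summarize(report_json):
    """Extract de-duplicated domains, network endpoints, and written files."""
    report = json.loads(report_json)
    return {
        "domains": sorted({r["query"] for r in report.get("dns_requests", [])}),
        "endpoints": sorted(
            {f"{f['dst_ip']}:{f['dst_port']}" for f in report.get("network_flows", [])}
        ),
        "files_written": sorted(
            {
                a["path"]
                for a in report.get("file_activity", [])
                if a.get("action") == "write"
            }
        ),
    }


print(summarize(SAMPLE_REPORT))
```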
Common data sources for malware include:
One popular way to receive both malware and network infrastructure data is through a threat feed. The term threat feed generally refers to Indicators of Compromise (IOCs) that are provided by an organization. These vary greatly in terms of price and content. Threat feeds can be delivered through a subscription-based API endpoint, in MISP format, on a GitHub page, or via a Twitter feed. However, some providers only offer access via their product’s UI (I’ll touch on why I’m not a fan of this later in this blog). Sometimes threat feeds are provided as a service to members of an organization. An example of this would be FS-ISAC, which is a cyber intelligence sharing community exclusively for the financial sector. Another example would be a company that focuses on “dark web” research, which would provide data specifically around that use case.
People and organization data is more of a melting pot of different data sources used to enumerate and enrich data about people and organizations. This type of data can be very useful when you are trying to peel back the layers behind online identities.
Some common data sources for people/organizations are:
With the types of data sources in mind, what makes a data source “good”? That may be a matter of opinion, but I will break down some of the key factors that you should consider:
You can have the sleekest UI the internet has ever seen, but it is all for nothing if you don’t have an API endpoint available for users to interact with. Analysts generally do not enjoy having tons of browser tabs open, and we definitely do not like hand-jamming or copying and pasting data. Most commercial data providers have an API, but this is something you definitely want to check out for yourself. If they do not provide public documentation for their API, then ask them for it. You will want to go through and test drive the endpoints to see what format the data is returned in and what limitations exist.
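When test driving an endpoint, one quick way to see what format the data comes back in is to map each field in a response to its type. The sketch below assumes you have already saved a JSON response from the provider; the sample payload and the `describe_shape` helper are hypothetical and only meant to illustrate the idea.

```python
import json


def describe_shape(value, path="$"):
    """Map each JSON path to the type name found there.

    Lists are summarized by their first element only, which is an
    assumption that the list is homogeneous.
    """
    if isinstance(value, dict):
        shape = {}
        for key, child in value.items():
            shape.update(describe_shape(child, f"{path}.{key}"))
        return shape
    if isinstance(value, list):
        if not value:
            return {f"{path}[]": "empty"}
        return describe_shape(value[0], f"{path}[]")
    return {path: type(value).__name__}


# Hypothetical response body saved from a test request.
SAMPLE = json.loads(
    '{"ip": "192.0.2.10", "ports": [443, 8443], "whois": {"registrar": null}}'
)

for field, type_name in describe_shape(SAMPLE).items():
    print(field, type_name)
```

Running this against responses for a handful of known indicators makes it easier to spot fields that are frequently null or missing before you commit to a provider.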
However, don’t be fooled by the marketing fluff: not all APIs are created equal. Why does this matter? Imagine you are investigating a suspicious binary, so you send it to your sandbox for analysis. After the analysis has completed, you find that you are unable to import the results through the API. The API is expecting a report ID and you sent it a hash. Unfortunately, the sandbox does not have an API endpoint that is capable of mapping hashes to report IDs, forcing you to log in to the sandbox UI to view the results and export them manually. This would be an extremely inefficient and loathsome process for any analyst to endure.
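One possible workaround for the scenario above, assuming the sandbox at least returns a report ID when you submit a sample, is to record the hash-to-report-ID mapping yourself at submission time. The sketch below uses a stand-in `FakeSandboxClient` class; both classes and all method names are hypothetical and do not correspond to any real sandbox API.

```python
import hashlib


class FakeSandboxClient:
    """Stand-in for a sandbox API that only accepts report IDs (hypothetical)."""

    def __init__(self):
        self._reports = {}
        self._next_id = 1

    def submit(self, data: bytes) -> str:
        report_id = f"report-{self._next_id}"
        self._next_id += 1
        self._reports[report_id] = {"size": len(data)}
        return report_id

    def get_report(self, report_id: str) -> dict:
        return self._reports[report_id]


class HashIndexedSandbox:
    """Records sha256 -> report ID at submission so results can be fetched by hash."""

    def __init__(self, client):
        self._client = client
        self._report_id_by_hash = {}

    def submit(self, data: bytes) -> str:
        sha256 = hashlib.sha256(data).hexdigest()
        self._report_id_by_hash[sha256] = self._client.submit(data)
        return sha256

    def get_report_by_hash(self, sha256: str) -> dict:
        report_id = self._report_id_by_hash.get(sha256.lower())
        if report_id is None:
            raise KeyError(f"no report ID recorded for {sha256}")
        return self._client.get_report(report_id)


sandbox = HashIndexedSandbox(FakeSandboxClient())
sample_hash = sandbox.submit(b"suspicious bytes")
print(sandbox.get_report_by_hash(sample_hash))
```

This only helps for samples you submitted yourself and only for as long as you keep the mapping, which is why a native hash lookup endpoint is still worth asking the provider about.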
It would be remiss to not factor your organization’s budget into your decision tree when evaluating a new data source. While the finer minutiae of purchase orders, invoices, and SLAs may not be in your lane, knowing how much your organization is willing to pay for the data is. This will ultimately be guided by your mission and what your organization believes that you need to accomplish it.
What about free data, though? Like anything else in life, nothing is truly free. Generally speaking, “free data” is use-at-your-own-risk, and free does not necessarily mean good. It usually comes as is, which may or may not work for you depending on your environment. There may be nothing wrong with the data itself, but it will not come with the level of customer support and assurance that a paid subscription does. That said, not all free data should be regarded as trash. There are several great free data sources, such as the Public Suffix List, that provide great value at no cost.
This is obviously a huge factor when it comes to the procurement process. It is critical, however, that you take good notes during your evaluation of a data source and be prepared to present your findings. If you think the data source is a good fit, you must be able to articulate why your organization needs it. You also need to clearly identify and address any capabilities of the new data source that overlap with or duplicate what you already have. It is also a good idea to have various analysts on your team involved in the evaluation of the new data source, so you can get different perspectives.
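One simple way to put numbers behind the overlap question is to compare the indicators a candidate source returns against what you already collect. This is a minimal sketch with made-up indicator sets; the `overlap_report` function is a hypothetical helper, and the Jaccard index is just one of several reasonable similarity measures.

```python
def overlap_report(existing: set, candidate: set) -> dict:
    """Summarize how much of a candidate source you already have."""
    shared = existing & candidate
    new = candidate - existing
    union = existing | candidate
    return {
        "shared": len(shared),
        "new_from_candidate": len(new),
        # Jaccard index: shared indicators divided by all distinct indicators.
        "jaccard": round(len(shared) / len(union), 2) if union else 0.0,
    }


# Hypothetical indicator sets exported from an existing and a candidate feed.
existing_feed = {"bad.example.com", "cdn.example.net", "192.0.2.10"}
candidate_feed = {"cdn.example.net", "192.0.2.10", "198.51.100.7", "new.example.org"}

print(overlap_report(existing_feed, candidate_feed))
```

A high overlap with few new indicators is worth writing down in your evaluation notes, since it speaks directly to whether the source duplicates a capability you already pay for.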
Another question that you should be asking is, “What questions does this data source help me answer?” For example, you may be trying to expand your visibility into the infrastructure of an attacker that recently targeted your organization. In this situation, you would want to ensure that a candidate data source has passive DNS coverage during the time windows before, during, and after the attack. If it doesn’t provide you with the answers that you need, then it’s acceptable to disregard it as a potential data source for that need.
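The passive DNS example can be turned into a quick check. The sketch below assumes each record carries `first_seen` and `last_seen` dates, which is a common pattern but not guaranteed for every provider; the field names, the 30-day padding, and the `window_coverage` function are assumptions for illustration.

```python
from datetime import date, timedelta


def window_coverage(records, attack_start, attack_end, padding_days=30):
    """Report whether any record overlaps the before, during, and after windows."""
    pad = timedelta(days=padding_days)
    one_day = timedelta(days=1)
    windows = {
        "before": (attack_start - pad, attack_start - one_day),
        "during": (attack_start, attack_end),
        "after": (attack_end + one_day, attack_end + pad),
    }
    coverage = {}
    for name, (start, end) in windows.items():
        # Two date ranges overlap when each one starts before the other ends.
        coverage[name] = any(
            r["first_seen"] <= end and r["last_seen"] >= start for r in records
        )
    return coverage


# Hypothetical records returned by a candidate source for one FQDN.
records = [
    {"first_seen": date(2021, 8, 1), "last_seen": date(2021, 8, 20)},
    {"first_seen": date(2021, 9, 2), "last_seen": date(2021, 9, 2)},
]

print(window_coverage(records, date(2021, 9, 1), date(2021, 9, 3)))
```

Keep in mind that a first-seen to last-seen range does not prove continuous observation, so a result like this shows where a source has some data, not how complete that data is.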
As analysts, we need to give honest feedback during the evaluation process for selecting data sources. The answer should not always be yes. You should carefully consider what your endorsement of a data source means. If you give every option presented to you the nod of approval, then your opinion becomes much less valuable.
Data source evaluation is an important process, and analysts will want to ensure their voices are heard throughout it. As the old adage goes, “Garbage in, garbage out.” It’s up to you and your team to provide decision makers with clarity.