Investigation Framework | Part 4 – Correlation


Welcome back! Hopefully you’ve had a chance to take a break and refill your caffeine of choice. Findings only provide half the answer in an investigation. As an analyst, your job is not only to discover findings but also to make sense of them. A blocked network connection and a failed login don’t mean much on their own. However, if you can show that the two events took place in a similar timeframe or involved the same user, then you have more conclusive findings. So, let’s put some pieces of the puzzle together!

Extract Indicators

After writing out your findings, always reread exactly what you have in your hands. You should understand the significant events and already have them in order. Another key point is to identify any indicators and note them down separately. To keep it simple, look for indicators associated with:

  • Network (domain, IP)
  • File (SHA256, file path)
  • Account (name, group, privileges)
  • Identifiers (Computer Name, SID, Asset #)

There are likely hundreds of potential indicators, so try to narrow them down to the ones that are repeated or unusual. Remember, when we say indicators, we mean indicators of compromise, not just any IP you come across.
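As a rough illustration of narrowing down to repeated indicators, here is a minimal Python sketch. The regular expressions, the function names, and the sample notes are all assumptions made up for this example (the addresses come from documentation ranges), and real notes will likely need stricter validation.

```python
import re
from collections import Counter

# Illustrative patterns only: the IPv4 pattern does not validate octet ranges.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


def extract_indicators(notes: str) -> dict:
    """Count how often each IPv4 address and SHA256 hash appears in the notes."""
    return {
        "ip": Counter(IPV4_RE.findall(notes)),
        "sha256": Counter(h.lower() for h in SHA256_RE.findall(notes)),
    }


def repeated(counts: Counter, minimum: int = 2) -> list:
    """Return indicators seen at least `minimum` times, most frequent first."""
    return [value for value, n in counts.most_common() if n >= minimum]


# Made-up findings for demonstration purposes.
sample_notes = "\n".join([
    "2022-01-01T12:34:56Z blocked connection from 203.0.113.7 to 198.51.100.20",
    "2022-01-01T12:35:10Z failed login for user jdoe from 203.0.113.7",
    "2022-01-01T12:40:02Z file written, sha256 " + "a" * 64,
])

indicators = extract_indicators(sample_notes)
print(repeated(indicators["ip"]))  # the address that appears in two findings
```

A count alone does not make something an indicator of compromise, so treat the output as a shortlist to review rather than a verdict.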

You will find a location for indicators within the TEMPLATE_InvestigationNotes. As you progress and need to share information, other investigators may not have time to look through all your notes, so having the indicators listed separately helps them run wider searches for any missed activity.

Normalize your findings

Once you have your findings, you should always try to do two things: Normalization and Deconfliction.

Normalization refers to changing labels to understand data more easily. For example, instead of having one evidence source labeled “connected ip” and another labeled “source network address”, rename both to “source ip”. This reduces the chance of overlooking information due to keywords and helps readers understand you are not referring to two separate types of data.
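The renaming described above can be sketched as a simple lookup table in Python. The `FIELD_ALIASES` map and the `normalize_labels` helper are hypothetical names for this example; you would extend the map with whatever labels your own evidence sources use.

```python
# Hypothetical alias map: source-specific labels -> one shared label.
FIELD_ALIASES = {
    "connected ip": "source ip",
    "source network address": "source ip",
}


def normalize_labels(record: dict) -> dict:
    """Return a copy of the record with known aliases renamed.

    Labels are matched case-insensitively; unknown labels are kept as-is.
    """
    return {
        FIELD_ALIASES.get(key.strip().lower(), key): value
        for key, value in record.items()
    }


# Two made-up records from different evidence sources.
firewall_event = {"Connected IP": "203.0.113.7", "action": "blocked"}
login_event = {"source network address": "203.0.113.7", "user": "jdoe"}

print(normalize_labels(firewall_event))
print(normalize_labels(login_event))
```

After normalization, both records expose the address under “source ip”, so a single keyword search finds both.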

Another common practice is to match timestamps. For all timestamps, I always recommend using the ISO-8601 standard and the UTC (or GMT) time zone. This helps put events in order after you’ve written your findings, and you don’t have to worry about misreading times from devices in different time zones (it’s more common than you think). At the end of the day, you want all your timestamps to look like this: 2022-01-01T12:34:56Z. Even if you only have an estimated time (2022-01-01T12:00:00Z) or a partial one (2022-01-01TZ), I still recommend using the same format so that sorting data stays simple. This will be especially important for Timeline Analysis.
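As a minimal sketch of that conversion, the Python snippet below uses only the standard `datetime` module. The `to_utc_iso` helper, the input format, and the offsets are assumptions for this example; your evidence sources will have their own formats, and you need to know each device's UTC offset.

```python
from datetime import datetime, timezone


def to_utc_iso(local_time: str, fmt: str, utc_offset: str) -> str:
    """Parse a device-local timestamp and return it as ISO 8601 in UTC.

    `utc_offset` is the device's offset from UTC, e.g. "-0500" or "+0900".
    """
    parsed = datetime.strptime(f"{local_time} {utc_offset}", f"{fmt} %z")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A made-up log entry from a device running five hours behind UTC.
print(to_utc_iso("01/01/2022 07:34:56", "%m/%d/%Y %H:%M:%S", "-0500"))
# 2022-01-01T12:34:56Z

# Timestamps in this format sort chronologically as plain strings.
stamps = ["2022-01-01T12:34:56Z", "2022-01-01T12:00:00Z", "2021-12-31T23:59:59Z"]
print(sorted(stamps))
```

Because every field is zero-padded and ordered from largest unit to smallest, a plain text sort in a spreadsheet or notes file puts the events in order without any date parsing.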

Deconfliction refers to taking multiple events and describing them as one event. For example, if you have 500 ICMP attempts to one host, you could refer to them as a single finding of multiple connection attempts rather than 500 separate findings. When looking through data one entry at a time, we often miss the larger picture, so take a step back and look for patterns.
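One way to sketch the 500-ICMP example in Python is to group events that share the same source, destination, and protocol, then keep a count plus the first and last time seen. The field names, the grouping key, and the generated sample data are assumptions for illustration; choose whatever key makes sense for your evidence.

```python
from collections import defaultdict


def deconflict(events: list) -> list:
    """Collapse events sharing source, destination, and protocol into one summary."""
    groups = defaultdict(list)
    for event in events:
        key = (event["source ip"], event["destination ip"], event["protocol"])
        groups[key].append(event["timestamp"])

    summaries = []
    for (src, dst, proto), stamps in groups.items():
        summaries.append({
            "source ip": src,
            "destination ip": dst,
            "protocol": proto,
            "count": len(stamps),
            # ISO 8601 UTC strings compare correctly as text.
            "first seen": min(stamps),
            "last seen": max(stamps),
        })
    return summaries


# 500 made-up ICMP events, one per second, between the same two hosts.
icmp_events = [
    {
        "timestamp": f"2022-01-01T12:{i // 60:02d}:{i % 60:02d}Z",
        "source ip": "203.0.113.7",
        "destination ip": "198.51.100.20",
        "protocol": "ICMP",
    }
    for i in range(500)
]

for summary in deconflict(icmp_events):
    print(summary)
```

The single summary line keeps the count and time range, so nothing important is lost while the notes stay readable.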

Find the pattern, complete the puzzle

Once your findings are normalized and you have extracted the vital indicators, take a step back. Look for patterns and start creating some hypotheses.

  • Are any events happening at the same time or very close in time?
  • Does it make sense for activity to occur after work hours?
  • Was the same user account tried on multiple hosts?
  • Are there any significant gaps in time between findings?

These are just a few starters to get your mind working. Experience is a big part of correlation: the more you investigate, the more you know about what “typically” happens. Don’t forget to have a second pair of eyes on your findings as well; a different perspective can unveil a lot.
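Two of the starter questions above (events close in time, and the same user on multiple hosts) can be roughed out in Python once findings are normalized. The field names, the five-minute window, and the sample findings below are assumptions for this sketch, not fixed rules.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse(ts: str) -> datetime:
    """Parse a normalized timestamp such as 2022-01-01T12:34:56Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def close_in_time(findings: list, window: timedelta = timedelta(minutes=5)) -> list:
    """Return pairs of consecutive findings that occur within `window` of each other."""
    ordered = sorted(findings, key=lambda f: f["timestamp"])
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if parse(b["timestamp"]) - parse(a["timestamp"]) <= window
    ]


def users_on_multiple_hosts(findings: list) -> dict:
    """Map each user seen on more than one host to the sorted list of those hosts."""
    hosts = defaultdict(set)
    for finding in findings:
        hosts[finding["user"]].add(finding["host"])
    return {user: sorted(seen) for user, seen in hosts.items() if len(seen) > 1}


# Made-up findings for demonstration.
findings = [
    {"timestamp": "2022-01-01T12:34:56Z", "user": "jdoe", "host": "WS01",
     "event": "blocked network connection"},
    {"timestamp": "2022-01-01T12:35:10Z", "user": "jdoe", "host": "WS02",
     "event": "failed login"},
    {"timestamp": "2022-01-01T18:00:00Z", "user": "asmith", "host": "WS03",
     "event": "failed login"},
]

for first, second in close_in_time(findings):
    print(first["event"], "->", second["event"])
print(users_on_multiple_hosts(findings))
```

The output only points at candidates worth a closer look; deciding whether a pairing actually means something is still the analyst's job.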

I’d love to give more advice in terms of correlation, but the patterns are going to be specific to your incident and evidence. If you’re running out of ideas, I recommend looking at MITRE ATT&CK and trying to identify the path the adversary may be taking. Try mapping out the techniques you’ve already identified and see if there are any obvious gaps. Maybe you’ve found Initial Access and Lateral Movement techniques, so it might be worth taking a second look at Execution or Persistence. You can fall down a rabbit hole when doing this, so it helps to know in advance what techniques you can actually identify based on your evidence sources.
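The gap check can be as simple as comparing the tactics you have evidence for against the full list. The Python sketch below hardcodes the Enterprise tactic names as I understand them at the time of writing; verify them against the ATT&CK site before relying on them. The `tactic_gaps` helper and the observed set are made up for this example.

```python
# Enterprise ATT&CK tactics in matrix order (check the ATT&CK site for changes).
ENTERPRISE_TACTICS = [
    "Reconnaissance",
    "Resource Development",
    "Initial Access",
    "Execution",
    "Persistence",
    "Privilege Escalation",
    "Defense Evasion",
    "Credential Access",
    "Discovery",
    "Lateral Movement",
    "Collection",
    "Command and Control",
    "Exfiltration",
    "Impact",
]


def tactic_gaps(observed: set) -> list:
    """Return tactics with no mapped findings yet, in matrix order."""
    return [tactic for tactic in ENTERPRISE_TACTICS if tactic not in observed]


# Hypothetical investigation: only two tactics identified so far.
observed_tactics = {"Initial Access", "Lateral Movement"}

for tactic in tactic_gaps(observed_tactics):
    print("No findings mapped to:", tactic)
```

Not every gap is a miss. Some tactics may simply not be visible in your evidence sources, which is why it helps to cross off the ones you cannot detect before chasing them.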

Conclusion

Correlation isn’t the most intensive step in terms of analysis or writing, and it can feel a bit redundant. I assure you, though, that it helps you make the connections you would otherwise miss during analysis. You want to make these correlations early, not when you are writing your report. If you want more detailed correlation, keep a lookout for my next section on Timeline Analysis, where I will go step by step through how to create your own timeline.

Twitter: @CyberCoat

Mastodon: @ChocolateCoat@infosec.exchange

LinkedIn: terrynvalikodath
