Software and Seinfeld with ThreatConnect’s CAL

It’s been a while since the CAL Analytics team has updated the world on our latest creations. Right before the holidays, we released our CAL 2.11 update (some months after our CAL 2.10 update). Both of these major releases shared a common theme: getting you more data. Now that people are returning to work and facing their problems with a fresh set of eyes, we think it’s a good time to highlight some of the areas where CAL can make your life easier.

To ease you back into the sea of data that you have to navigate, I’m going to borrow from one of the great sources of wisdom of our time: Seinfeld. Specifically, I want to bring up some of Seinfeld’s best food-related moments because they’re so pertinent to data. We have to eat, there are plenty of options, it’s rarely hassle-free, and at times it can be messy. When it comes to finding actionable data, think of Elaine Benes and her amazing idea about eating just the top of the muffin:

Elaine’s genius is in her focus on the best part

Let’s take a look at how CAL’s recent enhancements are going to help you explode out of the pan, as any good muffin does.

Explore with CAL + DNS Prioritization

We made a big deal out of our new Explore with CAL feature in the ThreatConnect 6.4 release, and with good reason.  The fact of the matter is that CAL has billions of relationships at your fingertips that you can navigate.

Leveraging CAL Data to Drive Exploration

In a way, we’ve been building towards this vision for a while.  In the past, you could look at an indicator’s Details Page and see a number of enrichments and data points that CAL has computed for you.  For example, this host drnasre2019[.]ddns[.]net has some immediate clues that help us drive our next steps:


The three Classifiers here, in addition to membership in open source feeds, explain the 664 point ThreatAssess score and give us an idea of where to go next:

  1. The Quad9.RecentLookups.7D Classifier reflects the DNS resolution activity observed through our Quad9 partnership, and tells us that this domain has been actively resolved out in the real world within the last week. Because the domain is live in that sense, infrastructure analysis is a sensible next step.
  2. The DNSRes.Malicious.Current Classifier leverages CAL’s DNS collection and reputation analytics to let you know that the IP address hosting this domain is considered to be malicious, which could be a point of interest moving forward.
  3. Similarly, the Usage.DedicatedServer.Suspected Classifier traverses the entirety of CAL’s 141 billion DNS resolutions to point out that we don’t know of any other domains currently hosted on that IP address.
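To make the "Classifier tells you where to go next" idea concrete, here is a minimal Python sketch. The Classifier names come from the list above; the `NEXT_STEPS` table and the `suggest_next_steps` helper are hypothetical names invented for illustration, not part of any CAL or ThreatConnect API.

```python
# Hypothetical lookup table: Classifier name -> suggested next step.
# The Classifier names are from the post; the step wording is illustrative.
NEXT_STEPS = {
    "Quad9.RecentLookups.7D":
        "Domain was resolved within the last week: start infrastructure analysis.",
    "DNSRes.Malicious.Current":
        "Hosting IP has a malicious reputation: pivot to the IP address.",
    "Usage.DedicatedServer.Suspected":
        "No other domains known on this IP: treat it as dedicated infrastructure.",
}


def suggest_next_steps(classifiers):
    """Return suggestions for the Classifiers we recognize, in input order."""
    return [NEXT_STEPS[name] for name in classifiers if name in NEXT_STEPS]


# Unknown Classifiers are skipped rather than raising an error.
steps = suggest_next_steps([
    "Quad9.RecentLookups.7D",
    "Some.Other.Classifier",
    "DNSRes.Malicious.Current",
])
for step in steps:
    print("-", step)
```

In practice the suggestions would more likely feed a playbook or a triage queue than a print statement, but the shape is the same: the Classifier is the key, and the next action is the value.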

This is a big part of our philosophy with CAL and ThreatConnect as a whole: it’s one thing to give you the data you want; it’s another to surface data that helps inform your next decisions. Unlike George Costanza, we’re encouraging you to double dip here:

It’s like putting your whole mouth…right in the data!

Looking into the Past

If you use the new Explore with CAL feature on that indicator, those Classifiers will give you an immediate direction on where to pivot next:


With those billions of datapoints at your fingertips, you can immediately cut to find that DNS resolution and begin exploration on the IP address that’s hosting this domain.  That IP address has its own suspicious score (391 at the time of this writing) and Classifiers that similarly hint towards the suspicious indicators leveraging this infrastructure.  These serve to help drive investigation in the other direction, should your starting point require it. 

Of course, any good analyst is going to wonder what else that IP address may have hosted.  While we know it’s only currently hosting our initial domain, CAL’s time-series analytics can tell us what else used to resolve to this IP address:


Right away, some interesting patterns start to emerge. It appears that this IP address has hosted a lot of dynamic DNS domains, not all of which look harmless. There are DuckDNS domains impersonating legitimate Microsoft IPv6 domain names, as an example. With a few button clicks, CAL has helped us enter a refined era of ingesting data so that we can savor the best parts, much the same way George has come to enjoy his Snickers bar with a fork and knife:

How do you do infrastructure analysis…with your hands??
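If you want a feel for what "what else used to resolve to this IP address" means as a query, here is a small Python sketch. Everything in it is an assumption made for illustration: the `Resolution` record, the `split_by_recency` helper, the 7-day window, and the placeholder domains and documentation-range IP addresses. It is not how CAL stores or queries its DNS data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Resolution:
    """One observed domain-to-IP resolution (hypothetical record shape)."""
    domain: str
    ip: str
    first_seen: date
    last_seen: date


def split_by_recency(records, ip, as_of, window_days=7):
    """Split the domains seen on `ip` into (current, historical) lists.

    A domain counts as current if it was last seen within `window_days`
    of `as_of`; otherwise it is historical.
    """
    current, historical = [], []
    for record in records:
        if record.ip != ip:
            continue
        age_days = (as_of - record.last_seen).days
        (current if age_days <= window_days else historical).append(record.domain)
    return sorted(current), sorted(historical)


# Placeholder data only: .example domains and RFC 5737 addresses.
records = [
    Resolution("current.example", "192.0.2.10", date(2021, 12, 1), date(2022, 1, 10)),
    Resolution("old-lookalike.example", "192.0.2.10", date(2021, 3, 1), date(2021, 6, 15)),
    Resolution("unrelated.example", "192.0.2.99", date(2021, 1, 1), date(2022, 1, 10)),
]

current, historical = split_by_recency(records, "192.0.2.10", as_of=date(2022, 1, 12))
print(current)     # ['current.example']
print(historical)  # ['old-lookalike.example']
```

The historical list is where the interesting pivots tend to hide: domains that no longer resolve to the IP address but once shared it with your starting indicator.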

Machine Learning to Drive DNS Prioritization

Of course, all of the data access in the world won’t mean a thing if you don’t have the right data. Our Analytics team has placed great emphasis on striking a balance between amassing data that’s helpful and exposing it in helpful ways. As Jerry says about the black and white cookie, you want to get some black and white in each bite.

Jerry’s advice also applies to harmonious security operations

In our latest release, we revamped the logic under the hood to help optimize our DNS collection efforts.  Let’s be honest – CAL has over 200 million hosts, and it doesn’t make sense to treat them all the same.  We’ve implemented a reinforcement learning model, a cool machine learning technique, that will self-train over time as it learns which domains need to be investigated more often.  While the details of this are best left to a dedicated blog post, the takeaway is this: CAL is using machine learning to make sure that we’re keeping a keener eye on elusive adversaries that rotate their infrastructure.  You can have full confidence that when you’re hungry to go exploring with CAL, you’re getting the best with each bite.
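Until that dedicated post arrives, here is a toy Python sketch of the general idea. To be clear about what it is: this is an epsilon-greedy bandit, one of the simplest reinforcement learning setups, swapped in purely for illustration. The post does not say which model CAL uses, and the `ResolutionScheduler` class, its reward definition (1 when a re-resolution reveals a changed IP address, 0 otherwise), and the placeholder domains are all assumptions.

```python
import random


class ResolutionScheduler:
    """Epsilon-greedy bandit over domains (illustrative, not CAL's model).

    Each domain is an arm. The reward is 1.0 when re-resolving the domain
    reveals a changed IP address and 0.0 otherwise, so domains that rotate
    infrastructure earn a higher value and get picked more often.
    """

    def __init__(self, domains, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.checks = {d: 0 for d in domains}
        self.value = {d: 0.0 for d in domains}  # running mean reward per domain

    def pick(self):
        """Choose the next domain to re-resolve."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.value))  # explore
        return max(self.value, key=self.value.get)    # exploit

    def update(self, domain, changed):
        """Record whether the latest resolution of `domain` changed."""
        self.checks[domain] += 1
        reward = 1.0 if changed else 0.0
        # Incremental mean: new = old + (reward - old) / n
        self.value[domain] += (reward - self.value[domain]) / self.checks[domain]


scheduler = ResolutionScheduler(["stable.example", "rotating.example"])
scheduler.update("stable.example", changed=False)
scheduler.update("rotating.example", changed=True)
scheduler.update("rotating.example", changed=False)
print(scheduler.value)  # {'stable.example': 0.0, 'rotating.example': 0.5}
```

The small exploration rate matters: without it, a domain that looked quiet early on would never get checked again, which is exactly the behavior an adversary rotating infrastructure would hope for.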

A New Approach to OSINT Feeds

We’ve done some things with open-source intelligence (OSINT) feeds that we’ve made a lot of noise about in the past. We have Report Cards to show feed behavior, advanced analytics for data hygiene, WHOIS data, and so on. In the past two releases we’ve changed a lot of the inner workings – not to replace those offerings, but to double down on them, much like when Elaine doubled down and ate the whole dang cake.

CAL is now Feed Central

One of the key changes we’ve made architecturally was moving away from our prior feed distribution mechanism to having it all done in-house within CAL.  The prior approach had its own strengths – we could leverage in-house tooling to prototype the acquisition, curation, and enrichment of feeds prior to distribution.  It also allowed us to figure out distribution to hundreds of ThreatConnect customers, enabling them to get those enrichments alongside historical and current data with the click of a button.  Like the cinnamon babka, it has its place in the world.

Software architectures are like baked goods – some are improvements upon others.

However, we saw an opportunity to make a good thing better, like with chocolate babka.  Once you’ve seen both it’s clear which one is the “lesser” babka – and so we made a series of investments to bring the entire end-to-end OSINT problem space under the CAL umbrella.  This means that the ThreatConnect Analytics team is entirely in control of how these disparate data feeds are collected, processed, analyzed, and distributed.

Our Engineering team worked out a pretty elegant design that has made this switch seamless for users. You simply logged in one day and saw that you had new feed data, same as every other day. The difference is that with CAL owning it end-to-end, we’re able to improve and tweak these things faster. Our Analytics team was able to identify data hygiene, enrichment, and consistency issues. We were able to add enhancements so that OSINT feeds now produce complete File Hash triplets, and contain decorations such as tags and attributes. We enhanced reputation and scoring inputs to better reflect what our Report Cards have taught us about each feed, thanks to the global ThreatConnect community and our feed analytics.

The takeaway is that we’ve all got access to much cleaner OSINT data, and a smoother ramp forward to continue improving it.
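As an example of what "complete File Hash triplets" can look like in code, here is a hedged Python sketch that stitches partial observations (say, one feed reporting an MD5 with a SHA-1 and another reporting the same SHA-1 with a SHA-256) into a single MD5/SHA-1/SHA-256 record. The function names and the merge strategy are assumptions for illustration, not CAL's pipeline. On conflicting data, the later observation simply wins.

```python
import hashlib
import re

HASH_TYPE_BY_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}


def classify(value):
    """Return (hash_type, normalized_value), inferred from hex digest length."""
    value = value.strip().lower()
    if not re.fullmatch(r"[0-9a-f]+", value) or len(value) not in HASH_TYPE_BY_LENGTH:
        raise ValueError(f"not an MD5, SHA-1, or SHA-256 hex digest: {value!r}")
    return HASH_TYPE_BY_LENGTH[len(value)], value


def merge_observations(observations):
    """Merge groups of hashes that share any value into combined records.

    Each observation is an iterable of hashes known to describe one file.
    """
    records = []  # each record is a dict such as {"md5": ..., "sha1": ...}
    by_hash = {}  # hash value -> the record currently containing it
    for observation in observations:
        typed = dict(classify(v) for v in observation)
        # Collect the distinct existing records this observation touches.
        matched = []
        for value in typed.values():
            record = by_hash.get(value)
            if record is not None and not any(record is m for m in matched):
                matched.append(record)
        merged = {}
        for record in matched:
            merged.update(record)
            for value in record.values():
                by_hash.pop(value, None)  # drop index entries for the old record
        merged.update(typed)  # the newest observation wins on conflict
        records = [r for r in records if not any(r is m for m in matched)]
        records.append(merged)
        for value in merged.values():
            by_hash[value] = merged
    return records


def is_complete(record):
    """True when the record carries all three hash types."""
    return set(record) == {"md5", "sha1", "sha256"}


# Real digests of a throwaway byte string, so no hash values are invented.
data = b"example"
md5 = hashlib.md5(data).hexdigest()
sha1 = hashlib.sha1(data).hexdigest()
sha256 = hashlib.sha256(data).hexdigest()

triplets = merge_observations([(md5, sha1), (sha1.upper(), sha256)])
print(len(triplets), is_complete(triplets[0]))  # 1 True
```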

A Slew of New Feeds

Naturally, that smoother ramp meant that we could climb the OSINT mountain a lot faster. In our experience, users want more options for data all the time. Note that we’re always open to requests! If a customer points out a source that’s missing from our collection, we prioritize it to make sure we’re bringing in the data you need in a way that will suit your use cases. In addition to user-requested feeds, we keep a finger on the pulse of the online chatter to help ingest more data with every release. Our goal is to keep you fed, like George Costanza eating ice cream at the US Open:

You’re hungry, so who are we to let you starve?

We’ve done so by adding 15 new OSINT feeds in CAL 2.11:

  • The folks over at Krisk Intel have produced a number of openly available feeds.  We’ve begun to ingest (4) of them: Ransomware, Malicious Domains, Malicious IPs, and Malware Hashes.
  • GitHub user stamparm’s maltrail repo contains thousands of text files with labeled IOCs. We’ve taken the liberty of analyzing, curating, and down-selecting these to the pertinent (10) that would best complement our OSINT offering.
  • While we’ve long had the Maldun sandbox feed for scanned file hashes, we’ve added (1) Maldun URL feed based on their newly-added URL scanning capability.


Make sure your Administrator enables these new feeds if you want them!

Group Object Data in Browser Extension V2

We’ve made it a point to move beyond just looking at indicators. Sure, they’re important, but very few of us work exclusively with IOCs in our day jobs. We need something more substantial as we’re scanning blogs, reports, and anything else that we’re trying to ingest. As Jerry’s friend Kenny Bania explains, “soup is not a meal.”

Indicators are the “soup” of cuisine – an important component, but not the entire meal

As we detailed in the ThreatConnect 6.4 announcement, we’ve added some massive CAL-backed functionality to our Browser Extension (available for both Chrome and Firefox). CAL has a dictionary of 1,500 objects (intrusion sets, malware families, etc.) and serves as a sort of “decoder ring” for the hundreds of other aliases by which they’re known. We’ve doubled down on this data by adding in an up-to-date feed of CVE Vulnerability information, ensuring that your Browser Extension can present you with information on the 175,000+ vulnerabilities in CAL’s dataset:


Let CAL give you context on these higher-level objects…stop jumping between tabs!
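For the curious, the "decoder ring" idea boils down to something like the Python sketch below: scan a page of text for known aliases and CVE identifiers, then report the canonical names. The three-entry `ALIASES` table is illustrative and was not pulled from CAL's dictionary, and `scan` and `build_alias_pattern` are hypothetical helpers, not the Browser Extension's actual code. The CVE pattern follows the public CVE ID format (a four-digit year followed by four or more digits).

```python
import re

# Illustrative alias table (alias -> canonical name); not CAL's actual data.
ALIASES = {
    "APT28": "APT28",
    "Fancy Bear": "APT28",
    "Sofacy": "APT28",
}

CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def build_alias_pattern(aliases):
    """Compile one case-insensitive pattern matching any known alias."""
    # Longest names first so a longer alias wins over a shorter prefix.
    names = sorted(aliases, key=len, reverse=True)
    return re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )


def scan(text, aliases=ALIASES):
    """Return (canonical group names, CVE IDs) found in `text`, both sorted."""
    lookup = {alias.lower(): canonical for alias, canonical in aliases.items()}
    pattern = build_alias_pattern(aliases)
    groups = {lookup[m.group(1).lower()] for m in pattern.finditer(text)}
    cves = {m.group(0).upper() for m in CVE_PATTERN.finditer(text)}
    return sorted(groups), sorted(cves)


groups, cves = scan("A report mentioning Sofacy, fancy bear, and cve-2021-44228.")
print(groups)  # ['APT28']
print(cves)    # ['CVE-2021-44228']
```

Both aliases collapse to a single canonical name, which is the whole point: the reader sees one object with context attached instead of two unfamiliar names to go look up.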

In Summary

The latest CAL releases have been dedicated to surfacing cleaner, more relevant data to you in conjunction with ThreatConnect’s growing capabilities.  The opportunities to orchestrate, analyze, and weave this data across your enterprise are near endless.  As we continue to iterate on this technology, our goal is to make you and your team shine.  Soon your boss will be eating out of your hand, like with George and the calzone!

Drew Gidwani
About the Author

Drew Gidwani is the Director of Analytics at ThreatConnect. He drives the data modeling, collection, and analytics both within the core ThreatConnect platform and in CAL. Previously, Drew worked for the Department of Defense, where he leveraged his varied analysis experiences to scale growing intelligence teams in the face of today’s ever-changing threats. Drew holds a B.S. from Carnegie Mellon University and an M.S. from Johns Hopkins University. He currently resides in Maryland with his fierce warrior dog named Gimli.