Do Androids Dream of Electric CALFs?


CAL 2.5 introduces an additional CAL Feed that uses defined criteria to identify NRDs that we believe have been created using a domain generation algorithm (DGA). 

Just a few short months ago we announced the introduction of CAL Feeds as part of our CAL 2.4 updates. That release included four new CAL Feeds (CALFs), one of which provides a steady stream of suspicious Newly Registered Domains (NRDs) to ThreatConnect. As we noted in that release announcement, NRDs aren’t inherently malicious; new domains get registered every day. But some subset of the domains registered each day is at least suspicious or interesting. We’ve identified NRDs that we think are leveraging suspicious infrastructure, and those are what populate the ‘CAL Suspicious Newly Registered Domains Feed’.

CAL 2.5 takes this concept one step further. It introduces an additional CAL Feed that uses defined criteria to identify NRDs that we believe have been created using a domain generation algorithm (DGA). A DGA is a technique that some groups and malware families (APT41 and CHOPSTICK, for example) use to generate a large pool of candidate domains that they can easily switch between. Attackers use DGAs to evade detection and mitigation by security professionals, turning something like command and control into a game of whack-a-mole.
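To make the whack-a-mole problem concrete, here is a minimal Python sketch of how a DGA can work in principle. It is a toy written for illustration, not the algorithm of any group or malware family named above: the function name `toy_dga`, the seed string, the hash-to-letters mapping, and the reserved `.example` suffix are all assumptions. The idea it shows is that anyone who knows the shared seed and the date can derive the same list of junk-looking names, while a defender who blocks one name still has the rest to deal with.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date.

    Hypothetical illustration only; real DGAs vary widely in design.
    """
    domains = []
    for i in range(count):
        # Seed + date + counter gives a different, but reproducible, input per name.
        material = f"{seed}:{day.isoformat()}:{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex characters onto the letters a-p so the label reads as junk text.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("hypothetical-seed", date(2020, 1, 1)):
        print(name)
```

Changing only the date produces an entirely different list, which is why blocking individual domains after the fact rarely keeps up.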

This new CAL Feed, called ‘CAL Suspected DGA NRDs’, consists of a list of recently registered algorithmically-generated domains (AGDs), as determined by our machine learning model. How does this work? Every day, as CAL aggregates hundreds of thousands of domains, it runs them through our neural network with the goal of identifying domains that are “suspiciously junky”.

Neural networks are a fascinating subject, and warrant a lot of discussion of their own. We will be releasing a white paper that outlines in detail the various machine learning techniques we applied to create our statistical model. In layman’s terms, we’ve been able to use our massive dataset to train our model. The neural network identifies “features” of a domain, such as how long it is or how often certain character combinations appear. The beauty of machine learning, and neural networks specifically, is that the model can discover (and weigh!) features through training to come up with a confidence score, from 0 to 100%, that a domain is suspiciously junky. We take the top slice of that confidence range to populate our CAL Feed, because we think those domains are so suspicious that they may have been generated via a DGA.
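For readers who want a feel for what a “feature” looks like, the sketch below computes a few by hand and then keeps the top slice of a set of scores. It is not our model: a neural network learns and weighs its own features during training, whereas the features here (length, character entropy, vowel and digit ratios), the function names `domain_features` and `top_slice`, and the 0.95 cutoff are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def domain_features(domain: str) -> dict[str, float]:
    """Compute a few hand-picked features of a domain's leftmost label (illustrative only)."""
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        return {"length": 0.0, "entropy": 0.0, "vowel_ratio": 0.0, "digit_ratio": 0.0}
    counts = Counter(label)
    # Shannon entropy in bits per character: higher means a more even spread of characters.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    vowels = sum(1 for ch in label if ch in "aeiou")
    digits = sum(1 for ch in label if ch.isdigit())
    return {
        "length": float(n),
        "entropy": entropy,
        "vowel_ratio": vowels / n,
        "digit_ratio": digits / n,
    }


def top_slice(scores: dict[str, float], threshold: float = 0.95) -> list[str]:
    """Keep only domains whose score meets the (hypothetical) threshold, highest first."""
    kept = [d for d, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda d: scores[d], reverse=True)


if __name__ == "__main__":
    print(domain_features("threatconnect.example"))
    print(domain_features("kfjpcnaolbde.example"))
    # Scores below are made-up placeholders standing in for a model's output.
    print(top_slice({"a.example": 0.99, "b.example": 0.50, "c.example": 0.97}))
```

In a real pipeline, a trained model would produce the scores that `top_slice` filters; the hand-written features only hint at the kind of signal such a model might pick up on.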

This new CAL Feed joins the others in providing ripe hunting grounds for analysts.

Ready to get started? Just like our supported open source feeds, CALFs are available to system administrators through our TC Exchange Feeds Catalog. And just like the other feeds, they’ll get a report card and can be enabled with the click of a button. Upon enabling a CAL Feed, its Source will be automatically created and configured. It will start populating automatically, with a predefined window of historical data being created and aged out appropriately.

Let us know if you have any questions about CAL Feeds via Twitter @ThreatConnect!


About the Author
Drew Gidwani

Drew Gidwani is the Director of Analytics at ThreatConnect. He drives the data modeling, collection, and analytics both within the core ThreatConnect platform and in CAL. Previously, Drew worked for the Department of Defense, where he leveraged his varied analysis experience to scale growing intelligence teams in the face of today’s ever-changing threats. Drew holds a B.S. from Carnegie Mellon University and an M.S. from Johns Hopkins University. He currently resides in Maryland with his fierce warrior dog named Gimli.