Who’s Next: A look at CAL 2.6’s latest additions

We’re proud to announce the release of CAL 2.6, the newest addition to the featureset of our Collective Analytic Layer (CAL).  In our ongoing quest to find the most interesting intelligence and deliver it to you, we’ve incorporated additional datasets, in the form of WHOIS records and a partnership with Quad9, along with some advanced machine learning techniques.

We’ve also introduced sector-specific CAL Feeds to monitor newly registered domains (NRDs) potentially associated with popular brand names within the Financial, Energy, and Communications industries.

Additional data is critical to our vision.  It helps us make better decisions, but it also builds towards inflection points.  As you amass more data, you occasionally hit a new plateau where you aren’t just seeing more stuff — your sense of vision itself has changed.  In the analytics world, these new heights open doors for the techniques we can use to discover novel things.

Of course, new discovery techniques are only as useful as the ways we can deliver the insights they uncover.  With every release, we try to use our new data and our new techniques to deliver a better understanding of the intelligence you’re looking at.  We’re always improving our ability to regulate indicators’ scoring and status, and to provide better Classifiers.  With the release of CAL Feeds in 2.4, we expanded beyond that to promote indicators of interest based on our analytics and the weird stuff they’ve found in the data.  And with CAL 2.6, we’re doubling down.

There has got to be a twist (…on my brand name)

As of 2.6, CAL has added three new CAL Feeds (CALFs) that leverage some big analytical muscles to look at newly registered domains, or NRDs.  We’ve always paid extra attention to NRDs because they remove one of the biggest variables from the intel equation: timeliness.  A domain isn’t inherently bad because it’s new, but newness sure removes some of the ambiguity compared to an indicator you lifted from a report that’s years old!

As security practitioners, many of us are chartered with protecting our organization across a variety of axes.  These sometimes overlap, and the lines between incident response, brand protection, and threat intelligence can start to blur.  These needs aren’t always clear and may be fulfilled by different teams in different places, which can result in a lot of wires getting crossed.

To help bring the band together, we’re now inspecting the NRDs to look for variations (or twists, to make my puns work) of popular and legitimate brand names.  No matter what instrument you play, if you’re thrust onto the stage of defending a big name like Acme Corp, our new CAL Feeds will start to surface NRDs like acm3-c0rp[.]com or acme-corp-super-legitimate-secure-login[.]io.

The above image shows an example of a Dashboard that monitors domains spoofing financial institutions using the new Financial Sector NRD CAL Feed. Please note that bank names have been changed to protect privacy.

We’ve separated these CALFs by industry type to help you keep an eye on the things that are relevant to you and your peers.  Starting with the Financial, Communications, and Energy sectors, we’ve begun searching for top brands and for the NRDs that are imitating them.  We will continue to monitor these feeds to find intelligence for these organizations, but we’ll also be expanding the list of brands within each industry.  If your industry isn’t one of those three, fear not!  We’ll be expanding our industry coverage moving forward as well.
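To make the idea of a brand "twist" a little more concrete, here is a minimal sketch of one way to spot look-alike NRDs.  It is not how CAL does it; the watchlist, the substitution table, and the helper names (`normalize`, `match_brands`) are all hypothetical, and a real system would need to handle many more tricks (homoglyphs, typos, extra TLD tricks, and so on).

```python
import re

# Hypothetical brand watchlist grouped by sector; the names are placeholders.
BRANDS_BY_SECTOR = {
    "Financial": ["acmecorp"],
    "Energy": ["examplepower"],
}

# A small, incomplete table of digit-for-letter substitutions (assumption).
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})


def normalize(domain: str) -> str:
    """Drop the TLD, undo digit substitutions, and strip separators."""
    name = domain.lower().replace("[.]", ".").rsplit(".", 1)[0]
    return re.sub(r"[^a-z]", "", name.translate(LEET))


def match_brands(domain: str) -> list:
    """Return (sector, brand) pairs whose name appears in the normalized domain."""
    squashed = normalize(domain)
    return [
        (sector, brand)
        for sector, brands in BRANDS_BY_SECTOR.items()
        for brand in brands
        if brand in squashed
    ]


for nrd in ["acm3-c0rp[.]com", "acme-corp-super-legitimate-secure-login[.]io"]:
    print(nrd, match_brands(nrd))
```

Substring matching after normalization is deliberately crude: it catches both example domains from earlier in this post, but it would also flag innocent names that happen to contain a brand, which is why a human (or a Playbook) still needs to triage the results.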

These CALFs, in conjunction with other ThreatConnect capabilities such as Workflow and Playbooks, can help your organization stay on the same sheet of music.  Now, when someone registers a domain to impersonate your brand, you can automatically detect, analyze, and triage the imitation within the nerve center of your security organization.

Who are you?  (who who?)

When we’re talking about infrastructure analysis, one of the topics that always comes up is WHOIS information.  While WHOIS records became a less fruitful source of intelligence as the world wised up to protecting privacy, they still provide some valuable data points.

WHOIS records contain (or oftentimes, don’t contain) information on the registration of a domain.  We can use some of these fields in a big-data solution like CAL to identify trends.  We have introduced a framework for a series of analytics that look at things like:

  • Registration length, since adversaries don’t often like to pay for domain names for longer than they have to
  • Registrant email addresses, since sometimes bad guys don’t pay to mask it and leave fingerprints behind
  • Registrar information, because not everyone online plays by the rules and there are some “shadier” corners of the internet
  • Expiration information, since expired domains can often be snatched up, changing their disposition (e.g. sinkholing vs. hijacking an expired domain)

These insights, as always, are available to you automatically in your ThreatConnect instance.  You’ll start to see further improvements in CAL’s contribution to the ThreatAssess score, and CAL will increase its ability to modulate indicator status to keep False Positives from lighting up your field of view.  Additionally, we’ve added some new Classifiers to help your analysts (and playbooks) act on these insights:

  • Host.WHOIS.ShortRegistration for hosts that have an abnormally short registration window
  • Host.WHOIS.LongRegistration for hosts that have an abnormally long registration window
  • Host.WHOIS.Expired for hosts whose registration window has elapsed
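As a rough illustration of how Classifiers like these could fall out of two WHOIS fields, here is a small sketch.  The thresholds are assumptions made up for the example (this post does not say what CAL considers "abnormal"), and `whois_classifiers` is a hypothetical helper, not part of any ThreatConnect API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed thresholds, for illustration only.
SHORT_WINDOW = timedelta(days=366)      # about one year or less
LONG_WINDOW = timedelta(days=5 * 365)   # about five years or more


def whois_classifiers(created: datetime, expires: datetime,
                      now: Optional[datetime] = None) -> list:
    """Map a WHOIS creation/expiration pair to the Classifier names above."""
    now = now or datetime.now(timezone.utc)
    labels = []
    window = expires - created  # total registration length
    if window <= SHORT_WINDOW:
        labels.append("Host.WHOIS.ShortRegistration")
    elif window >= LONG_WINDOW:
        labels.append("Host.WHOIS.LongRegistration")
    if expires < now:
        labels.append("Host.WHOIS.Expired")
    return labels


utc = timezone.utc
print(whois_classifiers(
    created=datetime(2020, 1, 1, tzinfo=utc),
    expires=datetime(2021, 1, 1, tzinfo=utc),
    now=datetime(2021, 6, 1, tzinfo=utc),
))
```

A fixed cutoff is the simplest possible choice; at CAL's scale, "abnormal" would more plausibly be defined relative to what is typical across the whole dataset.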

I can see for miles (…with Quad9)

Continuing with our theme of infrastructure analysis, we’re thrilled to announce our partnership with Quad9 DNS.  In addition to being a personal favorite in terms of speed and service, Quad9 makes a great partner because its not-for-profit charter mandates that it work with industry professionals to help protect users.  As part of our partnership with Quad9, we supply them with intelligence on some of the dangerous hosts we’ve identified on the internet.  Quad9 protects their users from browsing to those sites by “black-holing” the requests.

In return for our kindness, Quad9 gives us a sense of how those domains are being requested in the real world.  Quad9 is able to highlight which hosts are receiving DNS resolution attempts, at what time, and from where in the world.  The data protects every individual’s personally identifiable information (PII), yet in aggregate it is valuable for the same reason as the rest of our collective analytics.  With this partnership in place, CAL can further identify which hosts are receiving substantial amounts of real-world traffic requests from across the world!

CAL is grinding through these hundreds of millions of global DNS requests to better triangulate how active these malicious hosts are.  By combining this approach with our other datasets and analytics, we can ensure that you’ll see even better intelligence based on whether a domain name is being actively resolved by real-life users.  In addition to improved scoring, we have added new Classifiers to help you (and your playbooks!) understand these insights:

  • Quad9.RecentLookups.30D for hosts that have received reported DNS resolution attempts in the last 30 days
  • Quad9.RecentLookups.7D for hosts that have received reported DNS resolution attempts in the last 7 days
  • Quad9.HighVolume.30D for hosts that have received a significant number of DNS resolution attempts (relative to the rest of the data) in the last 30 days
  • Quad9.HighVolume.7D for hosts that have received a significant number of DNS resolution attempts (relative to the rest of the data) in the last 7 days
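Here is a minimal sketch of how windowed Classifiers like these could be derived from aggregate daily lookup counts.  The input shape (`{host: {date: count}}`), the function names, and the reading of "significant" as a simple rank cutoff among active hosts are all assumptions for illustration; this post does not describe the actual data format or thresholds.

```python
from datetime import date, timedelta


def window_totals(daily_counts, today, days):
    """Sum each host's reported lookups over the last `days` days, today included."""
    start = today - timedelta(days=days - 1)
    return {
        host: sum(n for day, n in counts.items() if start <= day <= today)
        for host, counts in daily_counts.items()
    }


def quad9_classifiers(daily_counts, today, high_pct=0.9):
    """Return {host: [Classifier names]} for the 30-day and 7-day windows."""
    labels = {host: [] for host in daily_counts}
    for days in (30, 7):
        totals = window_totals(daily_counts, today, days)
        active = sorted(t for t in totals.values() if t > 0)
        if not active:
            continue
        # Assumed definition of "significant": at or above the value found at
        # the `high_pct` rank among hosts with any lookups in this window.
        cutoff = active[min(len(active) - 1, int(high_pct * len(active)))]
        for host, total in totals.items():
            if total > 0:
                labels[host].append(f"Quad9.RecentLookups.{days}D")
            if total >= cutoff:
                labels[host].append(f"Quad9.HighVolume.{days}D")
    return labels
```

Because the cutoff is relative to the other hosts in the same window, a host can be "high volume" over 7 days without being so over 30 days, and vice versa.  With very few active hosts the rank cutoff degenerates (a lone active host is always "high volume"), so treat this strictly as a toy.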

I won’t get fooled again (…on DNS prioritization)

When we launched CAL 2.0 we made a significant investment in collecting our own DNS data.  For years, we’ve been building up our own dataset based on active resolutions of domain names that end up in our net.  As the ThreatConnect user base has grown, so has CAL and its dataset.  As we’ve amassed hundreds of millions of domains, we’ve had to think through the right way to prioritize hosts for DNS resolution.

Some hosts, for example, we want to check regularly.  With adversaries leveraging techniques like Fast Flux DNS, checking a host’s IP once a day isn’t always sufficient.  Likewise, with all of the newly-registered and newly-discovered domains out there, one can’t resolve everything every minute.  This is a scale problem that lends itself well to machine learning, to better understand the pockets of the internet and how to treat them.
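To picture the scheduling problem, here is a small sketch of a priority queue that re-resolves hosts at different rates.  It swaps in a hand-written heuristic (check more often when the answer changed, less often when it didn’t) for the machine-learned prioritization mentioned above, so it shows the shape of the problem rather than CAL’s solution.  The class name, interval bounds, and halving/doubling rule are all assumptions.

```python
import heapq

MIN_INTERVAL = 300      # seconds; assumed floor for fast-changing hosts
MAX_INTERVAL = 86_400   # seconds; assumed ceiling of once a day


class ResolutionScheduler:
    """Min-heap of (next_due_time, host) with a per-host adaptive interval."""

    def __init__(self):
        self._heap = []
        self._interval = {}
        self._last_ips = {}

    def add(self, host, now, interval=3600):
        """Register a host and make it due immediately."""
        self._interval[host] = interval
        heapq.heappush(self._heap, (now, host))

    def pop_due(self, now):
        """Return the hosts due for resolution at `now`, most overdue first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due

    def record(self, host, ips, now):
        """Store a resolution result and reschedule the host."""
        ips = frozenset(ips)
        previous = self._last_ips.get(host)
        interval = self._interval[host]
        if previous is not None:
            if ips != previous:
                # The answer moved (e.g. fast flux): look again sooner.
                interval = max(MIN_INTERVAL, interval // 2)
            else:
                # Stable answer: back off to save resolutions.
                interval = min(MAX_INTERVAL, interval * 2)
        self._interval[host] = interval
        self._last_ips[host] = ips
        heapq.heappush(self._heap, (now + interval, host))
        return interval
```

In use, a worker would call `pop_due(now)`, resolve each returned host, and feed the answers back through `record`.  A fluxing host quickly drifts toward the five-minute floor while a stable one backs off toward once a day, which is the trade-off the paragraph above describes.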

Our machine learning approach deserves its own blog post, and we’ll be discussing it in depth in a future one (spoilers: unsupervised clustering is pretty handy).  You’ll be able to see some of the cool techniques we’re using, and some of the interesting corners of the internet, in a bit more detail.  In the meantime, rest assured that when CAL says a host has a high score because of its DNS resolution, or that an IP address is classified a certain way because of the hosts on it, we’ve put some serious thought into it!

My generation (…algorithm is showing)

We recently detailed our machine learning approach to detecting algorithmically-generated domains (AGDs), both in the form of an academic-style whitepaper and a more digestible blog post.  We took our “AGD Detector” and pointed it at our list of NRDs, ultimately identifying real AGDs in real time and putting them right in your crosshairs.  In the last month alone we identified over 3,500 suspected AGDs (over 90% of which remain unreported by OSINT feeds).

We’re thrilled that our lab numbers have held up so well in the real world!  In fact, we’ve seen such success and excitement in the adoption of this feed that we’ve expanded that “AGD Detector” analytic to analyze all of the hosts CAL knows about.

What does this mean for you?  As you go about your day-to-day job, CAL will begin enriching all of the indicators you’ve shared with it.  This means that as your automations and integrations begin dumping intelligence into your ThreatConnect instance, CAL will continue to assist in prioritization by raising scores and adding the Host.DGA.Suspected Classifier to help you find the way.
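For readers who haven’t met AGDs before, here is a toy heuristic that gives a feel for why they stand out.  To be clear, this is not our “AGD Detector”: that analytic is a machine learning model described in the whitepaper, while the sketch below is a plain character-entropy check with made-up thresholds and hypothetical function names.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character in `text`."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(text).values()
    )


def looks_generated(domain: str, min_length: int = 12,
                    min_entropy: float = 3.5) -> bool:
    """Flag a domain whose leftmost label is long and high-entropy.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    label = domain.lower().split(".")[0]
    return len(label) >= min_length and shannon_entropy(label) >= min_entropy


for host in ["google.com", "xk2j9qv7hz4mp1.net"]:
    print(host, looks_generated(host))
```

A check this simple misses dictionary-based generation algorithms and trips over long but legitimate names, which is a good part of why a trained model is the better tool for the job.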

I’m free (…for you to use!)

With all of these additions, it may be hard to believe that CAL is included as part of your ThreatConnect subscription.  We’re continuing to make serious investments in helping to identify, understand, and prioritize the absolute ocean of information you may be drowning in.

Stay tuned, because we have some big things underway in the coming months for CAL’s roadmap.  All of these iterations on CAL’s dataset and functionality are about to pay even bigger dividends when you pair them with the exciting features coming in the ThreatConnect platform!

About the Author
Drew Gidwani

Drew Gidwani is the Director of Analytics at ThreatConnect. He drives the data modeling, collection, and analytics both within the core ThreatConnect platform and in CAL. Previously, Drew worked for the Department of Defense, where he leveraged his varied analysis experiences to scale growing intelligence teams against the ever-changing threats we face today. Drew holds a B.S. from Carnegie Mellon University and an M.S. from Johns Hopkins University. He currently resides in Maryland with his fierce warrior dog named Gimli.