Playbook Fridays: Document Parsing and Keyword Scanning/Tagging

Automatically tag the documents with keywords and focused areas of interest without human intervention

ThreatConnect developed the Playbooks capability to help analysts automate time-consuming and repetitive tasks so they can focus on what is most important. In many cases, Playbooks also ensure that the analysis process runs consistently and in real time, without human intervention.

This Playbook is actually a set of three Playbooks: one that saves the keywords, one that verifies the data was saved as the analyst expected, and one that performs the actual work.

Many customers have reached out and voiced frustration that analysts were spending a lot of time looking through reports for specific keywords and then manually applying tags based on those keywords. This was becoming very time-consuming, especially for one customer who had 10 separate focus areas with more than 200 different keywords.

With this Playbook set, analysts can automatically tag the documents with keywords and focused areas of interest without human intervention, saving the analyst about 4-5 hours/week.

This Playbook set is triggered by the creation of a document in a source (with a specific tag, “parseme”, a requirement that can be removed after verifying expected functionality). First, you set the list of keywords in the DataStore, which is contained within Elasticsearch. In JSON, you define the keywords, organize them into groups, and save them as variables. The main Playbook converts the document into a set of strings that is then passed to regex capture groups for comparison. For keywords that match, the Playbook creates the tag for the group, e.g., China or Russia. Additionally, the Playbook tags the document with the actual keywords that matched within those groups, e.g., APT12 or APT28.
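To make the matching step concrete, here is a minimal Python sketch of the same idea outside the platform. It is not the Playbook's actual implementation: the group names and keywords are sample values, and the function names (build_patterns, tags_for_document) are hypothetical. It assumes one regex capture group per keyword group, case-insensitive matching, and whole-word matches only.

```python
import re

# Sample keyword groups (illustrative only; the real list lives in the DataStore).
KEYWORD_GROUPS = {
    "China": ["APT12", "APT1"],
    "Russia": ["APT28", "APT29"],
}

def build_patterns(groups):
    """Compile one case-insensitive regex per group, with a single capture group."""
    patterns = {}
    for group, keywords in groups.items():
        alternation = "|".join(re.escape(k) for k in keywords)
        # \b keeps "APT1" from matching inside "APT12" or "APT123".
        patterns[group] = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return patterns

def tags_for_document(text, groups):
    """Return the sorted group tags plus the matched keyword tags for a document."""
    # Map lowercase hits back to the keyword's canonical spelling.
    canonical = {k.lower(): k for kws in groups.values() for k in kws}
    tags = set()
    for group, pattern in build_patterns(groups).items():
        matched = {canonical[m.lower()] for m in pattern.findall(text)}
        if matched:
            tags.add(group)       # group-level tag, e.g. "China"
            tags.update(matched)  # keyword-level tags, e.g. "APT12"
    return sorted(tags)

print(tags_for_document("Report links APT28 activity to apt12 tooling.", KEYWORD_GROUPS))
```

A document that mentions a keyword from a group receives both the group tag and the keyword tag; a document with no hits receives no tags.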


1) Import “Populate DataStore with Keywords.pbx”
In this Playbook you set a JSON array with your keywords. A few examples are preconfigured to get you started. This Playbook only needs to be run once to populate the DataStore, and again whenever the list needs to be updated.
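As an illustration of what a grouped keyword array could look like, the sketch below parses and sanity-checks one in Python before it would be saved. The schema (objects with "group" and "keywords" fields) and the function name load_keyword_groups are assumptions made for this example; the array shipped in the .pbx file may be shaped differently.

```python
import json

# Assumed shape: a JSON array of objects, each with a group name and its keywords.
KEYWORDS_JSON = """
[
  {"group": "China",  "keywords": ["APT12", "APT1"]},
  {"group": "Russia", "keywords": ["APT28", "APT29"]}
]
"""

def load_keyword_groups(raw):
    """Parse the keyword array into {group: [keywords]}, rejecting malformed input."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of group objects")
    groups = {}
    for entry in entries:
        name = entry["group"].strip()
        keywords = [k.strip() for k in entry["keywords"] if k.strip()]
        if not name or not keywords:
            raise ValueError(f"empty group name or keyword list: {entry!r}")
        bucket = groups.setdefault(name, [])
        for keyword in keywords:
            if keyword not in bucket:  # drop duplicates, keep first-seen order
                bucket.append(keyword)
    return groups

print(load_keyword_groups(KEYWORDS_JSON))
```

Checking the array before populating the DataStore catches empty groups and duplicate keywords early, which matters once the list grows to hundreds of entries.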


Populate DataStore with Keywords


2) Import “Document Keyword Check.pbx”
This Playbook needs to be set to monitor a specific owner and, as a safety measure, is currently configured to trigger only on the tag “parseme”. After verifying functionality, this tag requirement can be removed so that the Playbook runs each time a document is created.
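The trigger conditions described above amount to a simple gate, sketched here in Python. This is only a model of the logic, not platform code: the function name should_process and its parameters are hypothetical, and the owner name in the usage lines is a made-up example. Passing required_tag=None models removing the tag requirement after verification.

```python
from typing import Iterable, Optional

def should_process(doc_owner: str,
                   doc_tags: Iterable[str],
                   monitored_owner: str,
                   required_tag: Optional[str] = "parseme") -> bool:
    """Decide whether a newly created document should be parsed."""
    # Only documents created in the monitored owner are considered.
    if doc_owner != monitored_owner:
        return False
    # With the safety tag removed, every new document in that owner is processed.
    if required_tag is None:
        return True
    # Otherwise the document must carry the safety tag (compared case-insensitively).
    return required_tag.lower() in {tag.lower() for tag in doc_tags}

print(should_process("Intel Reports", ["parseme"], "Intel Reports"))
print(should_process("Intel Reports", [], "Intel Reports", required_tag=None))
```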

Document Keyword Check

About the Author

ThreatConnect is the only security platform with comprehensive intelligence, analytics, automation, orchestration, and workflow capabilities native within a single solution. With ThreatConnect, you will be able to increase accuracy and efficiency, improve collaboration of teams and technology, strengthen business-security goal alignment, and build a single source of truth for your entire security team.