U.S. Department of Energy
Asymmetric Resilient Cybersecurity

LINE-speed Bio-inspired Analysis and Characterization for Event Recognition
A biosequence-based approach to the discovery of evolving threats

Christopher Oehmen
christopher.oehmen@pnnl.gov | (509) 375-2038

LINEBACkER allows cyber security analysts to quickly discover and analyze behaviors of interest in network traffic to enhance situational awareness, enable timely responses, and facilitate rapid forensic and attribution analysis. In a collaborative, operational setting, netflow data can be converted on site in near real-time and then shared with collaborators in obfuscated form. This allows for finding attacks and anomalies faster without exposing sensitive data.

Challenge

Our reliance on cyber systems permeates virtually every aspect of national infrastructure. From banking, finance, and industry to education and research, from national defense to power generation and delivery, secure computer networks are the lifeblood of critical infrastructure, information, and the U.S. strategic advantage over our adversaries. The volume of network traffic data generated has outpaced our ability to analyze it quickly enough to prevent many forms of network-based attacks. In most cases, new forms of attack cannot be detected with current methods. We need a method to drastically reduce the amount of data to be analyzed, to quickly characterize an attack, and to identify previously unseen types of attacks before they're executed. Network analysts need the ability to discover malicious traffic in computer networks and to share their insight or a signature of the threat with others, without jeopardizing sensitive or institutional data.

Approach

The LINEBACkER tool allows analysts to share signatures without sharing data. This is especially beneficial when sensitive data is involved and threat signatures must be shared across multiple organizations. Simply put, LINEBACkER applies the MLSTONES methodology to the problem of discovering malicious sequences of traffic in computer networks. MLSTONES leverages technologies and methods from biology and DNA research to flexibly represent and identify signatures and to express them in a biology-based language that cannot be "translated" back to the original data.
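To make the idea of a one-way, biology-based representation concrete, here is a minimal sketch in Python. It is not the MLSTONES encoding, which this summary does not describe. The `Flow` fields, the bucket edges, and the symbol assignment are all hypothetical choices made for illustration. Each flow record is reduced to one letter of the 20-letter amino acid alphabet by coarse binning, so many different flows map to the same letter and the original values cannot be recovered from the sequence.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence

# Standard 20-letter amino acid alphabet used for protein sequences.
AMINO = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class Flow:
    """Hypothetical, heavily simplified netflow record."""
    dst_port: int
    n_bytes: int


def bucket(value: int, edges: Sequence[int]) -> int:
    """Return the index of the first edge that value falls below."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def encode_flow(flow: Flow) -> str:
    """Map one flow to a single letter (assumed scheme, lossy by design)."""
    port_cls = bucket(flow.dst_port, (1024, 49152))   # 0, 1, or 2
    size_cls = bucket(flow.n_bytes, (1_000, 100_000))  # 0, 1, or 2
    return AMINO[port_cls * 3 + size_cls]


def encode_session(flows: Iterable[Flow]) -> str:
    """Concatenate per-flow letters into one 'cyber protein' string."""
    return "".join(encode_flow(f) for f in flows)


if __name__ == "__main__":
    session = [Flow(80, 500), Flow(50000, 200_000), Flow(8080, 5_000)]
    print(encode_session(session))
```

Because the binning discards exact ports and byte counts, only the coarse behavioral pattern survives in the shared string, which is the property the text attributes to the biology-based language.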

We've translated several biology and bioinformatics concepts onto cyber defense data. Specifically, we've created a methodology that applies the concepts of protein identification and families, inheritance, and function to a number of cyber-based data types. The MLSTONES process creates cyber "proteins" and then creates a single representation of an entire family of entities, thus reducing the amount of data to analyze by several orders of magnitude.
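One simple way to picture a single representation of a whole family is a consensus sequence, a common device in bioinformatics. The sketch below is an assumption for illustration only: the source does not say how MLSTONES builds its family representation, and the function name `consensus` is hypothetical. It also assumes the family members have already been aligned to equal length.

```python
from collections import Counter
from typing import Sequence


def consensus(aligned: Sequence[str]) -> str:
    """Return the most frequent letter at each position of an aligned family.

    Assumes every sequence has the same length (already aligned).
    Ties are broken in favor of the letter seen first in that column.
    """
    if not aligned:
        raise ValueError("family must contain at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")

    letters = []
    for column in zip(*aligned):
        # Count the letters in this column and keep the most common one.
        letters.append(Counter(column).most_common(1)[0][0])
    return "".join(letters)


if __name__ == "__main__":
    family = ["AKFA", "AKFC", "ACFA"]
    print(consensus(family))
```

Comparing new traffic against one consensus string per family, rather than against every member, is one way the data to be analyzed could shrink.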

We can also infer the function of a "cyber protein" by its relationship to other similar proteins. This is the same process used in biology to discover similar proteins. This helps to identify completely new (zero-day) cyber threats. We apply high-performance biosequence analysis that enables inexact string matching of streaming network traffic; our approach is robust when there is more than one form of threat and supports "family resemblance" attribution. The tool characterizes baseline behavior, converts raw netflow data to a bio-representation, constructs a family tree of cyber event types, and creates a visual interface to deploy in a client setting against specific threats or suspicions. The translation of network behavior is accomplished at the site of collection, and it is the translated representation that is shared among collaborating agencies.
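Inexact string matching of biosequences is classically done with local alignment. The sketch below uses a basic Smith-Waterman score with a linear gap penalty as a stand-in; the source does not name the algorithm the tool uses, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are arbitrary assumptions. A high score against a known threat sequence would suggest "family resemblance" even when the strings are not identical.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2,
                          mismatch: int = -1,
                          gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score between a and b.

    Keeps only two rows of the dynamic-programming table, so memory
    use is proportional to len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            score = max(0, diag, up, left)  # 0 restarts the alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    known_threat = "AKFA"
    print(local_alignment_score("GGAKFAGG", known_threat))  # exact hit inside a stream
    print(local_alignment_score("AKCA", known_threat))      # one substitution
    print(local_alignment_score("CCCC", known_threat))      # unrelated
```

An exact occurrence inside a longer stream scores highest, a variant with one changed letter still scores well above zero, and an unrelated sequence scores zero, which is the behavior that makes this kind of matching tolerant of modified threats.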
