Big Data and Cyber-Security Analytics: The Current State of Play

Despite the best efforts of cybersecurity teams, security breaches continue to plague corporations worldwide. In 2015 alone there were at least 79,000 reported security incidents and 2,122 confirmed breaches. The sheer number of attacks is troubling, but what raises greater concerns is that many of the world’s largest companies, which support well-funded, sophisticated security teams, were among last year’s security breach victims. The traditional paradigm of building walls around our networks in an attempt to keep bad actors completely outside corporate perimeters is no longer a viable tactic. Even the most technically advanced organizations will be – and likely already may have been – breached at some point.

We must accept this new reality and begin to develop our defence models not only to stop hackers from getting in, but to better identify and respond to their malicious activities once they do. Encouragingly, a new generation of tools for security analytics based on big data technology offers expanded help for companies seeking to improve their proactive and reactive cyber-defence capabilities.

THE LIMITATIONS OF TRADITIONAL SECURITY APPROACHES

One reason that current approaches to network security often fall short is that they tend to be fundamentally reactive. They focus on stopping known threats based on identifiable fingerprints and catalogued artefacts. Yet attackers can make minor changes to malware that render the traces it leaves behind virtually undetectable. Today’s attackers frequently deploy customized malware whose footprints and payloads are unique enough that even heuristic-based scanners look right past them.

Traditional security tools are also often deployed in a fragmented and disjointed manner. Firewalls, antivirus and malware scanners, web and mail server gateways, intrusion prevention systems, data loss prevention systems, endpoint scanners, and access control management tools all work in isolation. They were not designed to communicate or share information with one another. While these tools remain vital to any security arsenal, their piecemeal utility makes it difficult for security teams to develop a clear picture of what is actually transpiring on their network at any given point in time. And when malicious code does slip by, it can remain undetected for months or even years.

Another major drawback of typical current cybersecurity models is their inability to reconstruct history after an identified security incident. A breach investigation can take weeks or even months to conduct and often concludes with many key facts left unknown. One major constraint is the capture and integration of the time-sensitive information required for an investigation. When the alarm sounds, security incident response teams are forced to scramble to build a picture of what transpired using fragments of information from a vast array of diverse sources. Much of this information is ephemeral and at best may cover only a few days of network history, which may not be enough to understand several months or years of cyberattack activity. The data also tends to be siloed in separate specialised systems that can be difficult to link to directly, and sometimes it is not stored at all. Event log files may need to be copied off hundreds of different virtual servers, for instance.

This patchwork of ephemeral data makes the journey from incident alert, through the identification of impacted machines, and on to the isolation and elimination of the threat very precarious, and it can seriously hamper investigations. The end result is an ever-changing story that unfolds slowly, often troubling company management and shareholders with conflicting reports of findings.

HOW CAN BIG DATA HELP?

Big data cyberanalytics solutions can help companies better manage these issues. Big data technology, from operating systems to analytics and reporting layers, is specifically designed to rapidly process massive amounts of data from a vast array of sources. These platforms are also designed to hold many different forms of data without the need to transform, cleanse, normalize, or validate their content in advance. Users can consolidate all their data feeds onto one platform quickly and proactively, regardless of their size or complexity.
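To make the idea of holding data without transforming it in advance more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the `ingest` and `timeline` helpers, the `READERS` table, and the firewall line layout are all assumptions invented for the example. Records are stored exactly as received and are interpreted only when someone asks for a combined timeline.

```python
import json
from datetime import datetime, timezone


def ingest(store, source, received_at, payload):
    """Append the record exactly as received; nothing is parsed or validated yet."""
    store.append({"source": source, "received_at": received_at, "payload": payload})


def read_firewall(payload):
    """Parse a firewall line, assuming a made-up 'ACTION SRC DST' layout."""
    action, src, dst = payload.split()
    return {"action": action, "src": src, "dst": dst}


# Hypothetical per-source readers, applied only when a query needs the fields.
READERS = {"firewall": read_firewall, "endpoint": json.loads}


def timeline(store):
    """Return (time, source, parsed fields) tuples in chronological order."""
    ordered = sorted(store, key=lambda record: record["received_at"])
    return [
        (r["received_at"], r["source"], READERS[r["source"]](r["payload"]))
        for r in ordered
    ]


# Two feeds in different formats land in the same store, out of order.
store = []
noon = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
ingest(store, "endpoint", noon.replace(minute=5), '{"host": "vm-17", "event": "new_service"}')
ingest(store, "firewall", noon, "ALLOW 10.0.0.5 203.0.113.9")

for when, source, fields in timeline(store):
    print(when.isoformat(), source, fields)
```

Because parsing is deferred, a new feed can be added by storing its raw output straight away and writing a reader for it later, when an investigation actually needs those fields.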

A properly built platform can allow companies to stream all of their network traffic through an analytics portal in real time, allowing them to shine a light on all network activity, as if they were watching it with a video camera. For added value this data stream can also be combined with other data points such as system logs, VM-to-VM IP traffic, network flow records, user account directories, and the fragmented outputs from traditional security tools. A good deal of third-party information, such as geolocation services and cyberthreat and reputation feeds, can also be piped in through an on-premises feed or cloud service subscriptions. Some examples include Emerging Threats, Google Safe Browsing, Malware Domain List, SANS Internet Storm Center, SORBS (Spam and Open-Relay Blocking System), VirusTotal, and other spam or IP address blacklists. Supplementary website intelligence services, such as DomainTools, Robtex, and the global domain registry database, may also be integrated for improved analysis.
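As a rough illustration of how a reputation feed might be combined with network flow records, consider the Python sketch below. The feed contents, field names, and helper functions are hypothetical; the addresses come from ranges reserved for documentation, not from any of the services named above, and a real feed would need its own parser.

```python
import ipaddress

# Hypothetical reputation entries (documentation address ranges, not real blacklist data).
REPUTATION_FEED = {
    "203.0.113.0/24": "known command-and-control range",
    "198.51.100.7/32": "spam source",
}


def build_lookup(feed):
    """Turn CIDR strings into network objects once, ahead of matching."""
    return [(ipaddress.ip_network(cidr), label) for cidr, label in feed.items()]


def enrich(flow, lookup):
    """Return a copy of the flow record with any matching reputation labels attached."""
    dst = ipaddress.ip_address(flow["dst"])
    labels = [label for network, label in lookup if dst in network]
    return {**flow, "reputation": labels}


def suspicious(flows, lookup):
    """Keep only the enriched flows whose destination matched at least one feed entry."""
    enriched = (enrich(flow, lookup) for flow in flows)
    return [flow for flow in enriched if flow["reputation"]]


lookup = build_lookup(REPUTATION_FEED)
flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.9"},
    {"src": "10.0.0.6", "dst": "192.0.2.1"},
]
for hit in suspicious(flows, lookup):
    print(hit["src"], "->", hit["dst"], hit["reputation"])
```

A linear scan over the feed is adequate for a sketch of this size; a production platform handling large blacklists would more likely use an indexed lookup.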

Companies that create a single platform that proactively collects the data required for security analysis and makes it available instantly for real-time and historic analysis may improve their cybersecurity defence and response programs dramatically. This approach can eliminate the time required to rebuild history after a security incident and drastically reduce response and investigation times from months to just days, or even a few hours. Centralizing this data also provides teams with a holistic view of each stage in the response life cycle, providing a significant edge over their adversaries and allowing for much earlier detection and containment of adverse activities.

USING THE CYBER-KILL CHAIN STAGES FOR SYSTEM ASSESSMENT

Are big data cyberanalytics right for you? A good place to start is to use the kill chain framework to benchmark your organization’s response procedures and identify weaknesses. First pioneered by defence contractors, the cyber-kill chain is a widely used mapping technique that describes the process an attacker goes through to achieve an objective, whether it is data theft or a disruption of service. Breaking the kill chain at any stage stops the attack. The earlier in the kill chain an incident is detected and contained, the more efficient the defence mechanism is. Leveraging big data for cybersecurity could result in exponential increases in the availability of and access to information when it is most needed, affording a better chance to respond to threats earlier. A centralized high-speed response platform that is calibrated to these kill chain stages – reconnaissance, weaponisation, delivery, exploitation, installation, command and control, and action – may increase a company’s cyberdefense effectiveness, making it more likely to detect and block an attack as early as possible.

An assessment using this framework considers the types of information available at each stage and ascertains the stage at which your company would typically be able to detect breaches. Consider the reliability of your current cyber-defence tools, catalog the information they provide, and assess the effort that would be needed to rebuild a complete and accurate picture of actions on your network for the past six to nine months. Then consider how a real-time camera feed with record and playback features for all your network traffic might help address any shortcomings you identify.
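One simple way to structure such an assessment is to record which kill chain stages each existing tool gives visibility into, then look for the stages nothing covers. The Python sketch below assumes a made-up tool catalog; the `TOOL_COVERAGE` mapping and the stage assignments in it are placeholders for illustration, not a statement about what any class of tool actually detects.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Kill chain stages in order; a lower value means earlier in the attack."""
    RECONNAISSANCE = 1
    WEAPONISATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    ACTION = 7


# Hypothetical catalog: which stages each deployed tool provides information about.
TOOL_COVERAGE = {
    "mail gateway": {Stage.DELIVERY},
    "endpoint scanner": {Stage.EXPLOITATION, Stage.INSTALLATION},
    "data loss prevention": {Stage.ACTION},
}


def covered_stages(catalog):
    """Union of every stage that at least one tool reports on."""
    return set().union(*catalog.values())


def coverage_gaps(catalog):
    """Stages, in kill chain order, that no catalogued tool covers."""
    covered = covered_stages(catalog)
    return [stage for stage in Stage if stage not in covered]


def earliest_detection(catalog):
    """Earliest stage at which a breach could be noticed, or None if nothing is covered."""
    covered = covered_stages(catalog)
    return min(covered) if covered else None


print("Gaps:", [stage.name for stage in coverage_gaps(TOOL_COVERAGE)])
print("Earliest detection:", earliest_detection(TOOL_COVERAGE).name)
```

Filling in the catalog honestly is the real work; the code only makes the resulting gaps and the earliest point of detection easy to see.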

One of the greatest benefits of big data architecture is its virtually unlimited scalability. This means companies don’t need to take a scorched-earth approach when building and deploying solutions. They can start small with their most critical gaps, systems, or locations, and grow in scope as returns on investment and value become realized and demonstrable. Just ensure you wrap a data governance program around these and any other big data programs you deploy, or else you may be creating more security, privacy, and data retention problems than you are solving.

David White specialises in information lifecycle governance with a particular focus on data security, privacy, analytics, and regulatory compliance. He is a former AmLaw 100 attorney and has more than 20 years of experience assisting companies in complex regulatory compliance and investigations, including electronic discovery, compliance audits, data breaches, and forensic investigations. David is a certified Six Sigma Green Belt, and uses Lean Six Sigma and project management methodologies to develop and implement cost-effective and efficient compliance protocols. He is also a Certified Information Privacy Professional (CIPP/E/US) by the International Association of Privacy Professionals (IAPP), and a registered Patent Attorney with the USPTO with expertise in computer systems and database arts.

Annie Tu is a bilingual, industry-recognised cyber security mentor with over 10 years’ experience covering cyber incident response, forensic investigations, eDiscovery, FCPA review and business consulting. Annie has managed and coordinated numerous international projects involving multiple territories in Asia Pacific and Europe. Having resided and worked in the UK, Hong Kong and mainland China, Annie has a unique insight into the challenges faced by businesses with diverse cultures. She is a SANS GIAC Certified Forensic Analyst (Silver), an EnCase Certified Examiner, a Certified Ethical Hacker and a SANS mentor for the forensic course “Advanced Computer Forensics and Incident Response”.