Raising the issue of security in the Big Data discussion often produces two divergent schools of thought among IT professionals: categorical denial that Big Data should be treated any differently from existing network infrastructure, and the opposite tendency to over-engineer the solution given the actual (or perceived) value of the data involved.
Big Data (defined by Gartner as high-volume, high-velocity and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization) increases routine security challenges in existing data networks. IDC defines four facets that give rise to challenges but also opportunities:
- Volume: The amount of data is moving from terabytes to zettabytes (1 zettabyte is 10^21 bytes or 1,000,000,000 terabytes) and beyond
- Velocity: The speed of data (in and out), from static one-time datasets to ongoing streaming data
- Variety: The range of data types and sources: structured, unstructured, semi-structured or raw
- Value: The importance of the data in context
Yet, while Big Data presents new security challenges, the starting point for resolving them remains the same as for any other data security strategy: determining data confidentiality levels, identifying and classifying the most sensitive data, deciding where critical data is to be located, and establishing secure access models for both the data and the analysis.
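The starting steps above (confidentiality levels, classification, data location and an access model) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the level names, the role names and the `can_read` rule are hypothetical and are not taken from Gartner, IDC or any particular product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidentiality(IntEnum):
    """Hypothetical confidentiality levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Dataset:
    name: str
    level: Confidentiality
    location: str  # where the data is stored, e.g. "on-prem" or "private-cloud"


# Hypothetical mapping of roles to the highest level each role may read.
ROLE_CLEARANCE = {
    "analyst": Confidentiality.INTERNAL,
    "data_scientist": Confidentiality.CONFIDENTIAL,
    "security_officer": Confidentiality.RESTRICTED,
}


def can_read(role: str, dataset: Dataset) -> bool:
    """Allow access only if the role's clearance covers the dataset's level."""
    clearance = ROLE_CLEARANCE.get(role)
    # Unknown roles are denied by default.
    return clearance is not None and clearance >= dataset.level


def most_sensitive(datasets):
    """Return the datasets sorted so the most sensitive ones come first."""
    return sorted(datasets, key=lambda d: d.level, reverse=True)
```

For example, `can_read("analyst", Dataset("payments", Confidentiality.RESTRICTED, "on-prem"))` is denied, while the same analyst may read a dataset classified as `INTERNAL`. A real deployment would usually take these decisions from a central policy store rather than from a hard-coded dictionary.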
Plan around the Big Data Lifecycle

Properly defending Big Data requires defining specific security requirements around the Big Data lifecycle. Typically, this begins with securing the collection of data, followed by securing access to the data. As with most security policies, assessing the threats to the organization’s Big Data is a continuous process, one that revolves around ensuring the integrity of data at rest and during analysis.
Performance is a key consideration when securing the collected data and the networks. Firewalls and other network security devices, such as those for encryption, must be of sufficiently high performance so they can handle the increased throughput, connections and application traffic. In a Big Data environment, policy creation and enforcement are more critical than usual because of the larger volumes of data and the number of people who will require access to it.
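To make the performance point concrete, the back-of-the-envelope calculation below estimates the sustained throughput a security device would need in order to keep up with a given daily ingest volume. It is a rough sizing sketch only: the function name, the `peak_factor` parameter and the 100 TB/day figure are assumptions for illustration, and the calculation ignores protocol and encryption overhead.

```python
def required_gbps(terabytes_per_day: float, peak_factor: float = 1.0) -> float:
    """Throughput in gigabits per second needed to move a daily data volume.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Gbit = 10**9 bits.
    peak_factor (hypothetical) scales the average rate to allow for bursts.
    """
    bits_per_day = terabytes_per_day * 10**12 * 8
    seconds_per_day = 24 * 60 * 60
    average_gbps = bits_per_day / seconds_per_day / 10**9
    return average_gbps * peak_factor


# Assumed example: 100 TB collected per day, with peaks at 3x the average rate.
average = required_gbps(100)
peak = required_gbps(100, peak_factor=3.0)
print(f"average: {average:.2f} Gbps, peak: {peak:.2f} Gbps")
```

Under these assumptions the average rate is a little over 9 Gbps, so a device rated for 10 Gbps would have almost no headroom once traffic bursts above the daily average.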
The sheer amount of data also proportionately increases the need to prevent data leakage. Data Loss Prevention technologies should be employed to ensure that information is not being leaked to unauthorized parties. Internal intrusion detection and data integrity systems must be used to detect advanced targeted attacks that have bypassed traditional protection mechanisms, for example through anomaly detection in the collection and aggregation layers. Packet data, flow data, sessions and transactions should all be inspected.
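As one illustration of anomaly detection in a collection layer, the sketch below flags intervals whose event count deviates sharply from a trailing baseline. It is a simple z-score check, not a description of how any particular intrusion detection product works; the function name, window size, threshold and sample counts are all assumptions.

```python
from statistics import mean, stdev


def find_anomalies(counts, window=10, threshold=3.0):
    """Return indices whose count deviates from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat baseline: treat any change at all as a deviation.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical events-per-minute counts from a collector, with one burst.
events_per_minute = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 100, 500, 99]
print(find_anomalies(events_per_minute))  # [11]
```

A production system would typically combine several signals (packets, flows, sessions and transactions) and use a baseline that is robust to the outliers it has already seen, since a single burst inflates the trailing window used here.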
Because Big Data involves information from multiple sources spread over a wide area, organizations also need to be able to protect data wherever it exists. In this regard, virtualized security appliances providing a complete range of security functionality must be positioned at key locations throughout the public, private and hybrid cloud architectures frequently found in Big Data environments. Resources must be connected in a secure manner, and data transported from the sources to the Big Data storage must also be secured, typically through an IPsec tunnel.