Select Page
Data Classification: How to Categorize Your Data and Where to Store It

Data Classification – How to Categorize It, Where to Store It

Darrell Jones

Previously, we discussed the requirements of a mature data classification program. In this post, we are going to review the administrative mechanics of such a program. Data classification, you’ll recall, usually includes a three- or four-layer system akin to the below:Data classification typically includes a three- or four-layer system

I recommend that organizations new to data classification begin with the three-level system as these levels and their corresponding actions and controls can be challenging to define. The three-level system considers all internal data confidential. The priority therefore is to create the processes and procedures needed to support confidential data. You can identify the limited amount of Public and Highly Confidential data later through interviews and technical discovery. Then you can clearly communicate your goals across the business, including locations, processes, and applications.

Today, we are going to cover how data is normally stored in organizations and where. These structures are going to have a huge impact on your program’s scope, operations, and technical decisions. As every organization has different business processes and technologies, each data classification project is going to be different, too.

Structured vs. Unstructured Data

Categorizing structured and unstructured data is the easiest data classification component to explain yet the hardest to manage. Structured data is any data within an application, usually a database. Your organization’s application owners, database administrators, or the application vendor can explain the different types of data stored in the application. Organizations marvel at how much data and how many data types are stored in applications. HR, customer relationship management (CMR) systems, enterprise resource planning (ERP) platforms, account platforms, M&A solutions are just a few applications that historically hold huge structured data stockpiles. Many of these systems are regulated (e.g., HR, ERP) and therefore the data must be kept or retained for a specific amount of time and, in some cases, indefinitely.

The security and governance capabilities of individual software distributions don’t always meet all the requirements for granular access control and emerging governance requirements. – Doug Henschen

Unstructured data is data not stored in an application. Excel spreadsheets, PowerPoint presentations, and Word docs are classic examples of unstructured data. Unstructured data is often found in reports generated from structured data systems. Unstructured data is usually ten times larger by volume than structured data. The reason for this is simple: saving copies of important files in several places makes employees feel secure. Email historically accounts for the largest amount of unstructured data in an organization. Think about it: employees email an important or sensitive document to ensure everyone has a copy then save the email in a PST file or in a folder on their laptop. There could be hundreds of copies of a single file containing highly sensitive data located in hundreds of locations across the network.

Ominous Approach of Data Lakes and Cloud Solutions

A trend in business today is to seek value in all the structured data that organizations store. Industries from real estate to waste management have discovered hidden value in the data they are collecting. Some may think this trend started in the finance industry but they would be wrong. The rise of data started with the focus on analytics that Google and Facebook pioneered. These and similar organizations realized they could increase profitability and customer stickiness if they targeted their advertising at specific users. IP addresses, login times, hovering points, and other data provided unique insight into their users that they could sell to a larger pool of advertisers. That information was also useful to other organizations for a host of other reasons. Remember Cambridge Analytica? This new value data provided was made possible by the first-of-their-kind data lakes, though they were not named that at the time.

The key is the new stuff doesn’t have the benefits that we expected from the old stuff. – Merv Adrian

Data lakes provide organizations the unique opportunity to “dump” data for any number of sources and formats. They are usually unmanaged and open to any account with access to the lake. Regardless of the purpose of the lake (marketing, business insights, archiving, etc.), the data classification characteristics are the same. First, it accepts all data. Second, it is an open platform by design. Third, the majority of these solutions are migrating to or being built in the cloud.

Data Warehouse vs. Data Lake

Data Warehouses are more secure than lakes.  This is because the data entered is cleansed prior to entering.  See below:

Data warehouses cleanse data prior to entering the cloud
Data warehouses cleanse data prior to entering the cloud.

A Data Lake on the other hand takes in ALL data without the step of transformation and restructuring:

A data lake, unlike a data warehouse, takes in ALL data, no questions asked
A data lake, unlike a data warehouse, takes in ALL data, no questions asked.

You cannot afford to overlook the data classification issues that arise when laking all this data, especially from a regulatory compliance standpoint. You need to be involved in the design of the lake(s) from the start. Regardless of how the lake is built, data classification needs to be a consideration in its design. For example, a lake is being designed for archival purposes. Should Highly Confidential data be included? Should Highly Confidential have its own lake, or should it be excluded completely? Applying classification either at the beginning of the data injestion, or at the end when it is being exported from the Process Data Stores is your best strategy.

Knowing what systems are providing data to the lake is important. When data is put into a lake, there are fewer protections available for the governing groups (Cyber, Risk, Compliance, etc.) compared to enterprise databases or relational database systems.

With traditional database management systems, the information security team might handle all the network security and access control protections but do little with the data once it enters the database management system. Data lake structures,however, do not come with all of the governance capabilities and policies associated with a traditional database management system, from basic referential integrity to role-based access and separation of duties. One way to approach data lake security is to think of it as a pipeline with upstream, midstream, and downstream components, according to Merv Adrian. The threat vectors associated with each stage are somewhat different and therefore need to be addressed differently.

Data lakes provide great value to the organization but require a different governance model to maintain classification controls.

Coming Up

In my next post, I will pull all the elements we’ve previously discussed together with the Data Management Controls Matrix.

RELATED POSTS

Stagehand: S1 Episode 8

Stagehand: S1 Episode 8

Carl Timmons was given 24 hours to decide what he wanted to do. This was a tactic. Twenty four hours to sit alone and think about all the money he could want and the price he’d pay for it. And 24 hours to also contemplate what Andre Savin might do to him before...

Stagehand: S1 Episode 7

Stagehand: S1 Episode 7

Andre Savin and Lincoln Palmer had met on several occasions and had the type of relationship you’d expect between two men of their standings on the billionaire scale. Contemptuous but also understanding. They were both driven by the same desire—access to...

Stagehand: S1 Episode 6

Stagehand: S1 Episode 6

Belfast, New York - 1889 They called him The Boston Strong Boy—arguably the first real boxing star and one of the highest paid athletes of his time.  He’d always been good at school. He attended Boston College where his parents thought he might pursue...

What Is Zero Trust Anyway?

What Is Zero Trust Anyway?

About three minutes into planning this post, I had one of those “god, I am old” moments. Here is why I had the moment. I have worked in cybersecurity since 1994. My first job was at a big 3 working for the U.S. government through one of the world’s...

Stagehand: S1 Episode 5

Stagehand: S1 Episode 5

Kuwait, 1990 I’m launched out of a submarine a few miles off the coast of Kuwait City. When I swim to shore, I quickly change into my dry land clothes—a full burka. I was a six-foot-one Marine posing as a good Muslim woman. The catch, beneath the modest...

Ransomware: When Policy Matters Most

Ransomware: When Policy Matters Most

Most CISOs divide their approach to cyber defense into three pillars: people, technology, and processes. These pillars define a cybersecurity program’s defensive architecture and arsenal, available assets, and policies and procedures that together inform...

Selling to a CISO? Practice Empathy, Not Salesmanship

Selling to a CISO? Practice Empathy, Not Salesmanship

The cyber security marketplace is hot. Ask any candidate for a cybersecurity role. Better yet, ask any supplier to CISOs. The supplier audience is especially vast, and it’s continuing to grow. Just three years ago, there were estimated to be less than 2,000...

The Risk of Measuring Risk

The Risk of Measuring Risk

Automated measuring of control effectiveness is a very good idea conceptually. When you can combine control gaps with relevant threat information, you get a very good picture about the actual technical cyber risks your business faces. If done correctly, it provides...

Stagehand: S1 Episode 4

Stagehand: S1 Episode 4

Keith and I left the scene like we found it: the two kidnappers dead on the floor, their shotgun up against the wall, and the rope used to tie up Carl Timmons sprawled out on the floor. We tipped off local law enforcement and were gone before they arrived, leaving no...

SecOps Needs More Democratization, Not Less SOC

SecOps Needs More Democratization, Not Less SOC

An increasing complexity of technologies, as well as an increasing number of failures and attacks followed by an increasing dependency on business goals is changing the way we run Security Operations Centers. I previously discussed the concept of a Fusion Center as an...

Measuring a Cyber Awareness Culture

Measuring a Cyber Awareness Culture

Until recently, cyber awareness metrics have been treated by many as a tick-box exercise driven by regulations. The regulator requires x number of hours of cyber awareness training per employee per year, and once that is done, the organisation ticks a box and waits...

Good Enough Isn’t Good Enough Anymore

Good Enough Isn’t Good Enough Anymore

The cyber risks we face today are more than we faced previously but also fundamentally different in several respects. Our adversaries are more adept and their tools and tactics more protean in capability.  In light of these increasing challenges, our cyber defenses...

Stagehand: S1 Episode 3

Stagehand: S1 Episode 3

Cyprus ~ 2006 Ali Hassan was a low-level operative in Hezbollah, but we had it on solid authority that he knew where three high-level leaders of the terrorist organization were hiding. Keith arrived fifty-seven hours into Hassan’s interrogation and by the looks of it,...

Mitre Disrupting Advanced Persistent Threats
Share This