Improvements in data classification capabilities have expanded its use cases. Data classification is used not only to organize information and make it accessible, but also to support users in comparing and analyzing data. It is also part of security initiatives; for instance, it can help protect sensitive information from unauthorized access by enabling controls that direct the appropriate security response based on the type of data being retrieved, transmitted, or copied.
What is data classification?
Data classification is the process of separating, organizing, and tagging data into relevant groups or classes.
The objective of data classification is to make information easier to locate, access, sort, store, and protect for future use.
Data classification is critical for risk management, compliance, and data security, as it helps sort information based on the level of sensitivity, the risks it presents, handling requirements, and access limitations.
The type of data dictates its classification. While any number of categories can be used, the following are the most common. Most organizations use these categories to ensure consistency and avoid complexity and confusion. Several of these categories are sometimes bundled under the umbrella category of sensitive information.
- Confidential or restricted
Confidential data, also referred to as restricted data, may only be accessed by a limited set of individuals or groups. Access to confidential information usually requires special authorization or clearance, and the data itself requires protection (e.g., encryption). Examples of confidential data include:
  - Biometric identifiers (e.g., fingerprints or voice prints)
  - Certification or license numbers
  - Credit card numbers and expiration dates
  - Debit card personal identification numbers
  - Employee records
  - Financial records
  - Insurance provider information
  - Medical and health records (i.e., protected health information or PHI)
  - Social Security numbers
  - State-issued identification card numbers or driver’s license numbers
  - Student records
  - Tax information
  - Vehicle identification numbers (VINs)
- Internal
Internal data is information related to a specific organization and is meant for the exclusive use of individuals associated with that organization (e.g., employees or contractors). Internal data generally has relatively low security protections. Examples of internal data are:
  - Archived files
  - Corporate guidelines
  - Email and messenger platforms
  - Employee manuals
  - Internal email messages or memos
  - Internet protocol (IP) addresses
- Private
Private data is primarily personal information. Not all private data is protected by law, but it usually has basic protections, such as passwords or biometric access restrictions. Private data that is protected by law is referred to as personally identifiable information (PII). Examples of private data include:
  - Cellphone content
  - Emails
  - Employee identification numbers
  - Online browsing history
  - Personal contact information (e.g., email addresses, home addresses, and phone numbers)
  - Research data
  - Student identification numbers
- Proprietary
Proprietary data is confidential or restricted data associated with a specific organization. In most cases, proprietary data gives the organization a competitive edge or unique differentiation. It requires data protections in line with those for confidential or restricted data. Examples of proprietary data are:
  - Trade secrets (e.g., formulas, models, and processes)
  - Budget spreadsheets
  - Business plans
  - Revenue projections
  - Technical specifications of a new product
- Public
Public data is information that is in the public domain. This type of data can be read, researched, reviewed, stored, and distributed without restriction and does not require data protection. Examples of public data include:
  - Birth and death records
  - Company executive information
  - Court records
  - First and last names
  - Incorporation dates
  - License plate numbers
  - Licensing records
  - Press releases
There are three main types of data classification according to industry standards: content-based, context-based, and user-based. The use cases and types of data drive the selection of the best approach.
- Content-based
With content-based data classification, software is used to inspect and identify the content of files. A category is assigned based on the type of content in a file, such as confidential, internal, private, proprietary, public, restricted, or sensitive.
- Context-based
Context-based data classification uses software to review several factors related to the information, such as application, location, and creator. These variables are evaluated to find indirect indicators of what category the information falls into, such as proprietary or restricted.
- User-based
Information is assessed and categorized manually based on the judgment of a knowledgeable user. This type of data classification is often initiated by the creator of a document and sometimes reviewed before the document is released.
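As a rough illustration of how the content-based and context-based approaches might be combined, the following Python sketch inspects text for two sensitive patterns and falls back to metadata rules when nothing is found. The patterns, the `FileContext` fields, the rule values (such as the `payroll` application and the `/hr/` path), and the `internal` default are all hypothetical; real classification tools use far richer detection.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration only.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_by_content(text: str) -> Optional[str]:
    """Content-based: inspect the text itself for sensitive patterns."""
    if SSN_RE.search(text):
        return "confidential"
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 16 and luhn_valid(digits):
            return "confidential"
    return None


@dataclass
class FileContext:
    """Metadata about a file (assumed fields, not a real tool's schema)."""
    location: str
    application: str
    creator_department: str


def classify_by_context(ctx: FileContext) -> Optional[str]:
    """Context-based: infer a category from indirect indicators."""
    if ctx.application == "payroll" or "/hr/" in ctx.location:
        return "confidential"
    if ctx.creator_department == "research":
        return "proprietary"
    return None


def classify(text: str, ctx: FileContext, default: str = "internal") -> str:
    """Try content rules first, then context rules, then a default
    that a knowledgeable user could later review (user-based)."""
    return classify_by_content(text) or classify_by_context(ctx) or default
```

Content rules run first in this sketch because a direct match is stronger evidence than an indirect indicator; the default leaves room for a user-based review to correct the label.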
Organizations should develop and maintain data classification policies, procedures, and guidelines that define categories and criteria.
Policies should also detail the roles and responsibilities of employees with regard to classifying and handling information, such as sharing and storage.
Why the enterprise needs data classification
There are many reasons why the enterprise needs data classification, including the following.
Access to additional data
When implemented systematically, data classification helps organizations manipulate, track, and analyze all the data needed for their strategies, goals, and objectives.
Assurance of confidentiality, availability, and integrity
The CIA triad (confidentiality, integrity, and availability) is a guiding principle for most data security programs. Data classification supports it by making clear what types of information an organization has, so that each type can be handled in line with CIA triad requirements.
Enhanced data security and privacy
Data classification is foundational for effective data privacy and security. It gives organizations visibility into the types of data they have and allows them to quickly sort it and apply the appropriate access controls to meet internal security and external compliance requirements.
Benefits of data classification
- Ensures compliance with regulatory requirements
- Expedites analysis and discovery of insights
- Facilitates data governance
- Helps organizations understand:
  - What sensitive data they have
  - Where sensitive data resides
  - Who can access, modify, and delete sensitive data
  - The impact of the sensitive data being leaked, destroyed, or improperly modified
- Improves data security and privacy
- Increases efficacy of access management and control
- Minimizes duplications of data
- Mitigates risk
- Reduces data management costs
- Supports cyber resilience
Data classification challenges
Understanding the challenges of data classification helps organizations overcome them and realize the benefits. The most commonly cited challenges include the following.
Cost control
Budgeting for data classification is notoriously difficult. Data volumes grow, security policies change, and management requirements differ by classification type, so costs can vary widely and spiral quickly.
Data volume
While most data classification systems can handle large volumes of data, issues still arise. Although the data can be classified, it can be costly to store and manage – especially sensitive information, which requires enhanced data protection.
Incorrect data classification
Technologies used for data classification automation can mislabel data, fail to recognize duplicate data, or be unable to correctly classify information stored in unrecognized file formats.
Missing association
Data classification tools can fail to detect indirect associations that change the classification level for a file. For instance, a name and a file of medical study data may not be sensitive on their own, but when combined, they become protected health information, which is considered sensitive data.
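One way to reason about this problem is to classify the combination of fields rather than each field alone. The Python sketch below is a minimal illustration under stated assumptions: the tag names, the two tag sets, and the resulting levels are hypothetical and not taken from any particular tool.

```python
from typing import Iterable, Set

# Hypothetical field tags that a field-level classifier might emit.
DIRECT_IDENTIFIERS: Set[str] = {"name", "email", "ssn"}
HEALTH_FIELDS: Set[str] = {"diagnosis", "medication", "study_result"}


def level_for(field_tags: Iterable[str], base_level: str = "internal") -> str:
    """Return a classification level for a set of field tags.

    An identifier together with health data is treated as protected
    health information and raised to "confidential"; either one
    alone keeps the base level.
    """
    tags = set(field_tags)
    has_identifier = bool(tags & DIRECT_IDENTIFIERS)
    has_health_data = bool(tags & HEALTH_FIELDS)
    if has_identifier and has_health_data:
        return "confidential"
    return base_level


def level_for_combined(*datasets: Iterable[str]) -> str:
    """Classify the union of several datasets' tags, as if they were joined."""
    merged: Set[str] = set().union(*datasets)
    return level_for(merged)
```

With a roster tagged `{"name", "employee_id"}` and a study file tagged `{"study_result", "sample_id"}`, `level_for` returns `internal` for each on its own, while `level_for_combined` returns `confidential` for the pair.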
Data classification and the data lifecycle
Data lifecycle management processes control information from creation to destruction. Embedding data classification into the data lifecycle enhances visibility into information types, enabling proper handling at every stage and ensuring that requirements for data security, privacy, and compliance are met.
Data classification begins with creation and should continue to be a consideration as data moves through the lifecycle with ongoing evaluations of and adjustments to the classification level.
Data classification naturally fits into each of the six stages of the data lifecycle.
- Creation
Data is continuously generated in multiple formats, such as documents, emails, social media, and websites. It should be classified when it is saved.
- Use
People and systems use data, usually with access controlled based on a correlation of roles, authorizations, and classification levels.
- Storage
Data is stored with access controls and encryption employed according to data classification levels.
- Sharing
Rules for sharing data between employees, customers, partners, systems, and applications should be governed according to data classification.
- Archiving
The type of archive and the protections it requires should be based on the data classification level.
- Destruction
At some point, most data, regardless of classification, should be destroyed. The destruction schedule should take the data classification level into account.
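The lifecycle stages above can be thought of as lookups against a per-level handling policy. The following Python sketch shows one possible shape for such a policy table; the `HandlingPolicy` fields, the three levels, and every value (including the retention periods) are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class HandlingPolicy:
    """Handling rules attached to one classification level (assumed fields)."""
    encrypt_at_rest: bool    # storage and archiving
    external_sharing: bool   # sharing
    retention_days: int      # destruction schedule


# Hypothetical policy table; real values come from an organization's
# own policies and legal requirements.
POLICIES = {
    "public": HandlingPolicy(encrypt_at_rest=False, external_sharing=True, retention_days=3650),
    "internal": HandlingPolicy(encrypt_at_rest=False, external_sharing=False, retention_days=1825),
    "confidential": HandlingPolicy(encrypt_at_rest=True, external_sharing=False, retention_days=2555),
}


def destruction_date(level: str, created: date) -> date:
    """Return the date on which data of this level is due for destruction."""
    return created + timedelta(days=POLICIES[level].retention_days)


def may_share_externally(level: str) -> bool:
    """Return True if the level's policy permits sharing outside the organization."""
    return POLICIES[level].external_sharing
```

An unknown level raises `KeyError` here rather than falling back to a permissive default, so unclassified data is not handled by accident.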
Data classification and data discovery
Data discovery locates information that is often in far-flung silos; data classification then identifies it and tags it according to its associated category. Combining data discovery and data classification gives organizations the visibility needed to operationalize and protect information effectively.
Data classification and discovery apply to all information in the three data types:
- Structured data
Structured data is text-based information (e.g., names, addresses, order details, or medical records) that is collected in predefined data models, such as rows and columns, and stored in systems, such as relational databases or data warehouses.
- Unstructured data
Unstructured data has no defined data model (e.g., email messages, videos, or transcripts) and is stored in applications, data warehouses, and data lakes.
- Semi-structured data
Semi-structured data is loosely organized and tagged (e.g., server logs and messages organized in files or with hashtags) and is usually stored in applications or relational databases.
Use cases for data classification and discovery include:
- Audits
During an audit, organizations can be required to produce many types of information. Data classification ensures that information is quickly and easily accessible. Data discovery helps users find the specific information that is needed.
- Cloud migrations
When transferring data from on-premises to the cloud, data discovery and classification ensure that all data types are moved to the right type of storage and made accessible to authorized users (i.e., machines and people).
- Data Subject Access Requests (DSARs)
DSARs are a requirement under the European Union’s General Data Protection Regulation (GDPR). An individual can submit a DSAR to a company, which is then required to disclose what personal data it has collected about that individual, how that data is used, how it is intended to be used, and why it was collected.
Similar requests can be made according to data privacy laws in the United States and other countries. Data discovery and data classification are vital for responding to DSARs in a timely manner.
- Mergers and acquisitions
Data classification and discovery play critical roles when integrating data from two or more organizations. These processes help ensure data protection and minimize duplications.
Organizations realize many benefits when using data classification and discovery, including:
- Collecting data from databases and silos and consolidating it into a single source
- Controlling data ingress and egress through networks, applications, systems, and devices
- Detecting misuse of all data
- Ensuring data access controls are applied correctly
- Identifying data protection gaps faster
- Improving data analysis and resulting insights
- Increasing visibility into data across the organization
- Supporting compliance
- Understanding the what, where, and why of data
How data classification works
The main steps in the data classification process are:
- Identify and gather the data.
- Define classification levels (e.g., sensitive, confidential / restricted, private, proprietary, and public).
- Categorize the data according to classification, measuring the sensitivity of information according to three key criteria at three levels of severity for implications of unauthorized access (i.e., low, moderate, high):
  - Confidentiality
  - Integrity
  - Availability
- Apply security controls and monitoring commensurate with the data classification level assigned to the information.
- Implement processes for ongoing data classification reviews and updates to ensure accuracy and relevance, making changes as needed.
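The three criteria and three severity levels used in the categorization step can be combined into a single rating in more than one way. The Python sketch below assumes one common convention, often called the high-water mark, in which the overall rating is the highest of the three individual ratings. The `Impact` enum, the function names, and the mapping from rating to classification level are hypothetical.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Severity of unauthorized access, from least to most severe."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


def overall_impact(confidentiality: Impact, integrity: Impact, availability: Impact) -> Impact:
    """High-water mark: the overall rating is the highest individual rating."""
    return max(confidentiality, integrity, availability)


# Hypothetical mapping from overall impact to a classification level.
LEVEL_BY_IMPACT = {
    Impact.LOW: "public",
    Impact.MODERATE: "internal",
    Impact.HIGH: "confidential",
}


def classification_level(confidentiality: Impact, integrity: Impact, availability: Impact) -> str:
    """Translate the three ratings into one classification level."""
    return LEVEL_BY_IMPACT[overall_impact(confidentiality, integrity, availability)]
```

Under this assumption, a record rated low for availability and integrity but high for confidentiality is still treated as `confidential`, because a single high rating drives the result.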
Data classification optimizes ROI and results
An oft-heard gripe about data classification is that it is difficult, but this is one of the easiest challenges to overcome. Data classification is difficult when organizations try to handle it manually.
However, with software, data classification is largely automated, with policies seamlessly embedded into user workflows. In addition, sensitive data “hidden” in silos can be automatically detected and appropriately classified.
Organizations that embrace data classification see a rapid return on their investment with time savings, increased productivity, and optimized security. Implementing and using tools and following best practices allows organizations to take full advantage of data classification, improve access to valuable information, and uplevel data protection.
In addition, data classification helps organizations meet stringent and difficult-to-achieve data protection requirements set forth by an increasing number of laws, regulations, and standards.