Data classification guide: What is data classification?

Improvements in data classification capabilities have expanded its use cases. It is now used not only to organize information and make it accessible, but also to help users compare and analyze data. Data classification is also part of security initiatives. For instance, it helps protect sensitive information from unauthorized access by enabling controls that trigger the appropriate security response based on the type of data being retrieved, transmitted, or copied.

What is data classification?

Data classification is the process of separating, organizing, and tagging data into relevant groups or classes.

The objective of data classification is to make information easier to locate, access, sort, store, and protect for future use.

Data classification is critical for risk management, compliance, and data security, as it helps sort information based on the level of sensitivity, the risks it presents, handling requirements, and access limitations.

The type of data dictates its data classification. While any number of categories can be used for data classification, the following are the most commonly used. Most organizations follow these standards to ensure consistency and avoid complexity and confusion. It should be noted that several of these categories are sometimes bundled under the umbrella category of sensitive information.

  1. Confidential or restricted
    Confidential data, also referred to as restricted data, may only be accessed by a limited set of individuals or groups. Access to confidential information usually requires special authorization or clearance, and the data itself requires protection (e.g., encryption). Examples of confidential data include:
      - Biometric identifiers (e.g., fingerprints or voice prints)
      - Certification or license numbers
      - Credit card numbers and expiration dates
      - Debit card personal identification numbers
      - Employee records
      - Financial records
      - Insurance provider information
      - Medical and health records (i.e., protected health information or PHI)
      - Social Security Numbers
      - State-issued identification card numbers or driver’s license numbers
      - Student records
      - Tax information
      - Vehicle identification numbers (VINs)
  2. Internal
    Internal data is information related to a specific organization and is meant for the exclusive use of individuals associated with the organization (e.g., employees or contractors). Internal data generally requires relatively light security protections. Examples of internal data are:
      - Archived files
      - Corporate guidelines
      - Email and messenger platforms
      - Employee manuals
      - Internal email messages or memos
      - Internet protocol (IP) addresses
  3. Private
    Private data is primarily personal information. Not all private data is protected by law, but it usually has basic protections, such as passwords or biometric access restrictions. Private data protected by law is personally identifiable information (PII). Examples of private data include:
      - Cellphone content
      - Emails
      - Employee identification numbers
      - Online browsing history
      - Personal contact information (e.g., email addresses, home addresses, and phone numbers)
      - Research data
      - Student identification numbers
  4. Proprietary
    Proprietary data is confidential or restricted data associated with a specific organization. In most cases, proprietary data gives the organization a competitive edge or unique differentiation. It requires data protections in line with those for confidential or restricted data. Examples of proprietary data are:
      - Trade secrets (e.g., formulas, models, and processes)
      - Budget spreadsheets
      - Business plans
      - Revenue projections
      - Technical specifications of a new product
  5. Public
    Public data is information that is in the public domain. This type of data can be used and distributed without restrictions on its use (i.e., read, research, review, and store) and does not require data protection. Examples of public data include:
      - Birth and death records
      - Company executive information
      - Court records
      - First and last names
      - Incorporation dates
      - License plate numbers
      - Licensing records
      - Press releases
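
As a rough illustration, the five categories above could be modeled in code along with the baseline protections each one calls for. This is a minimal sketch: the `Handling` fields and the specific true/false values are assumptions loosely drawn from the category descriptions, not a standard.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PRIVATE = "private"
    PROPRIETARY = "proprietary"
    CONFIDENTIAL = "confidential"  # also referred to as restricted


@dataclass(frozen=True)
class Handling:
    requires_encryption: bool
    requires_authorization: bool


# Hypothetical baseline controls per category; real values come from policy.
HANDLING = {
    Classification.PUBLIC: Handling(requires_encryption=False, requires_authorization=False),
    Classification.INTERNAL: Handling(requires_encryption=False, requires_authorization=True),
    Classification.PRIVATE: Handling(requires_encryption=False, requires_authorization=True),
    Classification.PROPRIETARY: Handling(requires_encryption=True, requires_authorization=True),
    Classification.CONFIDENTIAL: Handling(requires_encryption=True, requires_authorization=True),
}


def handling_for(level: Classification) -> Handling:
    """Look up the baseline controls for a classification level."""
    return HANDLING[level]
```

Keeping the mapping in one table makes it easy to review the policy in a single place and to change a control without touching the code that enforces it.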

There are three main types of data classification according to industry standards: content-based, context-based, and user-based. The use cases and types of data determine which approach is best.

  1. Content-based
    With content-based data classification, software is used to inspect and identify the content of files. A category is assigned based on the type of content in a file, such as confidential, internal, private, proprietary, public, restricted, or sensitive.
  2. Context-based
    Context-based data classification uses software to review several factors related to the information, such as application, location, and creator. These variables are evaluated to find indirect indicators of what category the information falls into, such as proprietary or restricted.
  3. User-based
    Information is assessed and categorized manually based on the judgment of a knowledgeable user. This type of data classification is often initiated by the creator of a document and sometimes reviewed before the document is released.
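
The three approaches above can also be combined. The sketch below is illustrative only: the regular expressions are simplified, the `/finance/` path and `hr-portal` application name are hypothetical, and the precedence (user label, then content, then context, then a default of internal) is an assumption rather than a rule from any standard.

```python
import re
from typing import Optional

# Simplified patterns for illustration; real detectors validate matches
# (e.g., with checksums) and cover many more data types.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def classify_by_content(text: str) -> Optional[str]:
    """Content-based: inspect the text itself for sensitive patterns."""
    if SSN_PATTERN.search(text) or CARD_PATTERN.search(text):
        return "confidential"
    return None


def classify_by_context(metadata: dict) -> Optional[str]:
    """Context-based: infer a category from location and creating application."""
    if metadata.get("location", "").startswith("/finance/"):  # hypothetical path
        return "proprietary"
    if metadata.get("application") == "hr-portal":  # hypothetical application
        return "confidential"
    return None


def classify(text: str, metadata: dict, user_label: Optional[str] = None) -> str:
    """User-based label first, then content, then context, then a default."""
    return (
        user_label
        or classify_by_content(text)
        or classify_by_context(metadata)
        or "internal"
    )
```

A stricter policy might instead keep the most restrictive result of the three checks, so that a user label cannot downgrade a file that contains detected sensitive content.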

Organizations should develop and maintain data classification policies, procedures, and guidelines that define categories and criteria.

Policies should also detail the roles and responsibilities of employees with regard to classifying and handling information, such as sharing and storage.

Why the enterprise needs data classification

There are many reasons why the enterprise needs data classification, including the following.

Access to additional data

When implemented systematically, data classification helps organizations manipulate, track, and analyze all the data needed for their strategies, goals, and objectives.

Assurance of confidentiality, availability, and integrity

The CIA triad is a guiding principle for most data security programs. Data classification facilitates this by making it easy to understand what types of information an organization has and ensuring that it meets CIA triad requirements.

Enhanced data security and privacy

Data classification is foundational for effective data privacy and security. It gives organizations visibility into the types of data they have and allows them to quickly sort it and apply the appropriate access controls to meet internal security and external compliance requirements.

Benefits of data classification

  1. Ensures compliance with regulatory requirements
  2. Expedites analysis and discovery of insights
  3. Facilitates data governance
  4. Helps organizations understand:
      - What sensitive data they have
      - Where sensitive data resides
      - Who can access, modify, and delete sensitive data
      - The impact of the sensitive data being leaked, destroyed, or improperly modified
  5. Improves data security and privacy
  6. Increases efficacy of access management and control
  7. Minimizes duplications of data
  8. Mitigates risk
  9. Reduces data management costs
  10. Supports cyber resilience

Data classification challenges

Understanding the challenges of data classification helps overcome them and realize the benefits. The most commonly cited challenges of data classification include the following.

Cost control

Data classification is notoriously difficult to budget for. Data volumes keep growing, security policies change, and management requirements differ by classification type, so costs can vary widely and spiral quickly.

Data volume

While most data classification systems can handle large volumes of data, issues still arise. Although the data can be classified, it can be costly to store and manage – especially sensitive information, which requires enhanced data protection.

Incorrect data classification

Technologies used for data classification automation can mislabel data, fail to recognize duplicate data, or lack the information needed to correctly classify information that is in unrecognized file formats.

Missing association

Data classification tools can fail to detect indirect associations that change the classification level for a file. For instance, a name and a file of medical study data may not be sensitive on their own, but when combined, they become protected health information, which is considered sensitive data.
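
One way to reason about this problem is to classify the combination of fields, not just each dataset on its own. The sketch below is a simplification under stated assumptions: the field names are hypothetical, and it treats any two datasets as linkable, whereas a real tool would first need to establish that the records can actually be joined.

```python
from typing import Iterable

# Hypothetical field tags used only for this illustration.
IDENTIFIER_FIELDS = {"name", "full_name"}
HEALTH_FIELDS = {"diagnosis", "lab_result", "medication"}


def is_phi(fields: Iterable[str]) -> bool:
    """Treat data as PHI only when an identifier and health data appear together."""
    present = set(fields)
    return bool(present & IDENTIFIER_FIELDS) and bool(present & HEALTH_FIELDS)


def combined_is_phi(*datasets: Iterable[str]) -> bool:
    """Re-evaluate the classification over the union of several datasets' fields."""
    merged = set()
    for fields in datasets:
        merged.update(fields)
    return is_phi(merged)


roster = ["name", "enrollment_date"]
study = ["participant_id", "diagnosis"]
```

Here neither `roster` nor `study` is flagged by `is_phi` alone, but `combined_is_phi(roster, study)` is, which mirrors the example in the paragraph above.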

Data classification and the data lifecycle

Data lifecycle management processes control information from creation to destruction. Embedding data classification into the data lifecycle improves visibility into information types, which enables proper handling at every stage and helps ensure that requirements for data security, privacy, and compliance are met.

Data classification begins with creation and should continue to be a consideration as data moves through the lifecycle with ongoing evaluations of and adjustments to the classification level.

Data classification naturally fits into each of the six stages of the data lifecycle.

  1. Creation
    Data is continuously generated in multiple formats, such as documents, emails, social media, and websites. It should be classified when it is saved.
  2. Use
    People and systems use data, usually with access controlled based on a correlation of roles, authorizations, and classification levels.
  3. Storage
    Data is stored with access controls and encryption employed according to data classification levels.
  4. Sharing
    Rules for sharing data between employees, customers, partners, systems, and applications should be governed according to data classification.
  5. Archiving
    The type of archive and the protections it requires should be based on the data classification level.
  6. Destruction
    At some point, most data, regardless of classification, should be destroyed. The destruction schedule should take the data classification level into account.
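
To make the lifecycle stages above more concrete, the following sketch shows how a classification level might drive decisions at the use, storage, and destruction stages. The three-level ordering, the retention periods, and the rule that only confidential data is encrypted at rest are all assumptions for illustration; real schedules and controls come from legal, compliance, and security requirements.

```python
from datetime import date, timedelta

# Hypothetical ordering of levels from least to most sensitive.
LEVEL_RANK = {"public": 0, "internal": 1, "confidential": 2}

# Hypothetical retention periods in days.
RETENTION_DAYS = {"public": 365, "internal": 3 * 365, "confidential": 7 * 365}


def may_access(clearance: str, level: str) -> bool:
    """Use stage: allow access when the user's clearance meets or exceeds the level."""
    return LEVEL_RANK[clearance] >= LEVEL_RANK[level]


def must_encrypt_at_rest(level: str) -> bool:
    """Storage stage: encrypt only the most sensitive level (an assumed policy)."""
    return LEVEL_RANK[level] >= LEVEL_RANK["confidential"]


def is_due_for_destruction(created: date, level: str, today: date) -> bool:
    """Destruction stage: data is due once its retention period has elapsed."""
    return today >= created + timedelta(days=RETENTION_DAYS[level])
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the destruction check deterministic and easy to test.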

Data classification and data discovery

Data discovery locates information that is often in far-flung silos; data classification then identifies it and tags it according to its associated category. Combining data discovery and data classification gives organizations the visibility needed to operationalize and protect information effectively.

Data classification and discovery apply to all information in the three data types:

  1. Structured data
    Structured data is text-based information (e.g., names, addresses, order details, or medical records) that is collected in predefined data models, such as rows and columns, and stored in systems, such as relational databases or data warehouses.
  2. Unstructured data
    Unstructured data has no defined data model (e.g., email messages, videos, or transcripts) and is stored in applications, data warehouses, and data lakes.
  3. Semi-structured data
    Semi-structured data is loosely organized and tagged (e.g., server logs and messages organized in files or with hashtags) and is usually stored in applications or relational databases.

Use cases for data classification and discovery include:

  1. Audits
    During an audit, organizations can be required to produce many types of information. Data classification ensures that information is quickly and easily accessible. Data discovery helps users find the specific information that is needed.
  2. Cloud migrations
    When transferring data from on-premises to the cloud, data discovery and classification ensure that all data types are moved to the right type of storage and made accessible to authorized users (i.e., machines and people).
  3. Data Subject Access Requests (DSARs)
    DSARs are a requirement under the European Union’s General Data Protection Regulation (GDPR). An individual can submit a DSAR that requires an organization to disclose what personal data it has collected, how that data is used, how it is intended to be used, and why it was collected.

    Similar requests can be made according to data privacy laws in the United States and other countries. Data discovery and data classification are vital for responding to DSARs in a timely manner.
  4. Mergers and acquisitions
    Data classification and discovery play critical roles when integrating data from two or more organizations. These processes help ensure data protection and minimize duplications.
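
For the DSAR use case above, the value of combining discovery and classification can be sketched as a simple catalog lookup. Everything here is hypothetical: the `CatalogEntry` shape, the `subject_id` field, and the choice of which levels count as personal data are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CatalogEntry:
    location: str        # where discovery found the data
    classification: str  # tag assigned by classification
    subject_id: str      # hypothetical identifier of the person the data is about


# Assumed set of levels that may hold personal data.
PERSONAL_LEVELS = {"private", "confidential"}


def find_personal_data(catalog: List[CatalogEntry], subject_id: str) -> List[str]:
    """Return the sorted locations holding personal data about one individual."""
    return sorted(
        entry.location
        for entry in catalog
        if entry.subject_id == subject_id and entry.classification in PERSONAL_LEVELS
    )
```

Without classification tags, answering the same request would mean inspecting every location that discovery turned up.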

Organizations realize many benefits when using data classification and discovery, including:

  1. Collecting data from databases and silos and consolidating it into a single source
  2. Controlling data ingress and egress through networks, applications, systems, and devices
  3. Detecting misuse of all data
  4. Ensuring data access controls are applied correctly
  5. Identifying data protection gaps faster
  6. Improving data analysis and resulting insights
  7. Increasing visibility into data across the organization
  8. Supporting compliance
  9. Understanding the what, where, and why of data

How data classification works

The main steps in the data classification process are:

  1. Identify and gather the data.
  2. Define classification levels (e.g., sensitive, confidential / restricted, private, proprietary, and public).
  3. Categorize the data according to classification, measuring the sensitivity of information according to three key criteria at three levels of severity for implications of unauthorized access (i.e., low, moderate, high):
      - Confidentiality
      - Integrity
      - Availability
  4. Apply security controls and monitoring commensurate with the data classification level assigned to the information.
  5. Implement processes for ongoing data classification reviews and updates to ensure accuracy and relevance, making changes as needed.
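
The categorization in step 3 can be sketched as a small scoring function. This example assumes a "high-water mark" rule, in which the overall rating is the highest of the three individual ratings. That is one common convention, not something the steps above prescribe, and the mapping from impact to classification level is hypothetical.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


def overall_impact(confidentiality: Impact, integrity: Impact, availability: Impact) -> Impact:
    """High-water mark: the overall rating is the highest individual rating."""
    return max(confidentiality, integrity, availability)


# Hypothetical mapping from overall impact to a classification level.
LEVEL_FOR_IMPACT = {
    Impact.LOW: "public",
    Impact.MODERATE: "internal",
    Impact.HIGH: "confidential",
}


def classify_by_impact(confidentiality: Impact, integrity: Impact, availability: Impact) -> str:
    """Assign a classification level from the three CIA impact ratings."""
    return LEVEL_FOR_IMPACT[overall_impact(confidentiality, integrity, availability)]
```

For example, data rated low for confidentiality and availability but high for integrity would be classified at the highest level under this rule, because a single high rating is enough to raise the overall impact.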

Data classification optimizes ROI and results

An oft-heard gripe about data classification is that it is difficult, but this is one of the easiest challenges to overcome. Data classification is difficult when organizations try to handle it manually.

However, with software, data classification is largely automated, with policies seamlessly embedded into user workflows. In addition, sensitive data “hidden” in silos can be automatically detected and appropriately classified.

Organizations that embrace data classification see a rapid return on their investment through time savings, increased productivity, and optimized security. Implementing the right tools and following best practices allow organizations to take full advantage of data classification, improve access to valuable information, and strengthen data protection.

In addition, data classification ensures that organizations meet stringent and difficult-to-achieve data protection requirements set forth by an increasing number of laws, regulations, and standards.
