What is data management? Definition, importance, and challenges

Data management programs and solutions benefit users and administrators. From maximizing value extraction from data to meeting regulatory compliance demands, a data management strategy enables and improves many functions across the enterprise.

What is data management?

Data management is a broad practice that helps organizations handle structured and unstructured data from an array of sources properly so that it is accessible to authorized users (e.g., people, systems, and applications). A wide range of processes, policies, and procedures fall under data management, including:

  • Archiving and destroying data in accordance with retention schedules and compliance requirements
  • Collecting, validating, ingesting, processing, organizing, using, storing, and maintaining data
  • Ensuring data privacy
  • Governing how data is used and accessed by all users
  • Integrating different types of data from disparate sources
  • Maintaining data availability for day-to-day uses and disaster recovery

The data management process includes multiple functions that work in concert to ensure that an organization’s data is accessible, accurate, and available.

Six key steps in the data management process include:

  1. Design and develop a data architecture that details the types and configurations for data storage repositories and related systems.
  2. Create data models that map workflows, data relationships, and interdependencies for different use cases.
  3. Capture information in a data repository as it is generated and processed.
  4. Integrate data collected from disparate systems in a data warehouse or data lake for analysis.
  5. Perform data quality checks to identify and correct data errors and inconsistencies.
  6. Implement data governance, including establishing data definitions and usage policies.

Types of data management

The scope of data management is expansive. Programs comprise many components, including those below.

Big data management

Data management includes ensuring that the right tools and systems are in place to collect and process big data, including data integration, data storage, and data analysis solutions optimized for big data.

Data architecture

A key aspect of effective data management is taking time to create a data architecture that addresses the organization’s requirements. The documented data architecture describes the organization’s data assets and infrastructure (e.g., databases, data lakes, data warehouses, and servers) and provides a blueprint for creating and managing data flows.

Data catalogs

Data catalogs store and organize data based on back-end information called metadata and use that metadata to make information stores searchable. For example, businesses can store inventory information in a data catalog and tag entries with labels that make it easier to find product information.
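The inventory example above can be sketched in a few lines of Python. This is a minimal illustration rather than a real catalog product; the `CatalogEntry` and `DataCatalog` names, the fields, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one data asset (all field names are illustrative)."""
    name: str
    location: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """A toy in-memory catalog that stores entries by name and searches them by tag."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, tag: str) -> list[str]:
        """Return the names of entries carrying the tag, sorted for stable output."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = DataCatalog()
catalog.register(CatalogEntry("winter_boots", "warehouse/inventory.csv", {"footwear", "seasonal"}))
catalog.register(CatalogEntry("rain_jackets", "warehouse/inventory.csv", {"outerwear", "seasonal"}))
catalog.register(CatalogEntry("sneakers", "warehouse/inventory.csv", {"footwear"}))

footwear = catalog.search("footwear")
```

The search touches only the metadata (the tags), never the underlying inventory file, which is what lets a catalog make large information stores searchable.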

Data governance

Data governance supports data management by providing policies and procedures that help organizations manage data access, integrity, security, and usage.

Data integrations

Data integrations are used to pull disparate data from different sources into a single repository.

Data lifecycle management

Data management includes data lifecycle management to monitor data from collection to deletion, including developing policies for every stage.
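A lifecycle policy for the final stages (retention and deletion) might be expressed as a simple rule table. The sketch below is illustrative only; the categories, the retention periods, and the `lifecycle_action` function are assumptions, not values drawn from any regulation.

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, per record category.
RETENTION_DAYS = {"invoice": 7 * 365, "web_log": 90}


def lifecycle_action(category: str, created: date, today: date) -> str:
    """Return 'retain', 'delete', or 'review' for a record under the policy above."""
    limit = RETENTION_DAYS.get(category)
    if limit is None:
        return "review"  # no policy defined: flag for a human decision
    return "delete" if today - created > timedelta(days=limit) else "retain"


action = lifecycle_action("web_log", date(2024, 1, 1), date(2024, 6, 1))
```

Returning "review" for categories without a policy keeps unclassified data from being silently kept or destroyed.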

Data migration

Data management includes the processes used to move data from one repository to another. Data migration tools help minimize errors and formatting issues.

Data modeling

Data modeling is used to create visualizations of data flows and relationships between different types of data to support data management teams.

Data pipelines

Data pipelines are used to transfer information between systems automatically.

Data processing

During the data processing phase of data management, raw data is ingested from a range of data sources (e.g., connected devices, forms, mobile applications, sensors, and web application programming interfaces, or APIs) and is aggregated, filtered, merged, and exported to the desired format for a user.
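As a rough sketch of those processing steps, the following Python example filters, merges, aggregates, and exports a handful of sensor readings. The data, the field names, and the `process` function are hypothetical, and JSON is assumed as the export format.

```python
import json
from collections import defaultdict

# Hypothetical raw readings and a device registry from a second source.
sensor_readings = [
    {"device": "a1", "temp_c": 21.5},
    {"device": "a1", "temp_c": 22.5},
    {"device": "b2", "temp_c": None},  # faulty reading
    {"device": "b2", "temp_c": 18.0},
]
device_registry = {"a1": "lobby", "b2": "server room"}


def process(readings, registry):
    # Filter: drop readings with no value.
    valid = [r for r in readings if r["temp_c"] is not None]
    # Merge: attach each device's location from the registry.
    merged = [{**r, "location": registry[r["device"]]} for r in valid]
    # Aggregate: average temperature per location.
    grouped = defaultdict(list)
    for r in merged:
        grouped[r["location"]].append(r["temp_c"])
    return {loc: sum(vals) / len(vals) for loc, vals in grouped.items()}


summary = process(sensor_readings, device_registry)
# Export: serialize the result to the format the user asked for.
exported = json.dumps(summary, sort_keys=True)
```

Production pipelines usually run these stages in dedicated tools, but the order of operations is the same.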

Data quality management

Data quality management controls ensure accurate, reliable, and consistent information. As part of data management, new and existing data sets are reviewed to verify that data quality standards are being met. The checks performed include:

  1. Is the information missing, or is the record complete?
  2. Does the information meet quality criteria?
  3. Is the information accurate?
  4. Is formatting consistent across systems?
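Some of these checks lend themselves to automation. The sketch below covers completeness, a simple quality criterion, and format consistency for a single record; the schema, the rules, and the `quality_issues` function are assumptions for illustration. Accuracy (check 3) generally needs a trusted reference source to compare against, so it is left out here.

```python
import re

REQUIRED_FIELDS = ("customer_id", "email", "signup_date")  # hypothetical schema
DATE_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed standard: YYYY-MM-DD


def quality_issues(record: dict) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            issues.append(f"missing {name}")
    # Quality criterion: a deliberately crude email check.
    email = record.get("email")
    if email and "@" not in email:
        issues.append("email fails quality criteria")
    # Consistency: dates must follow the one agreed format.
    signup = record.get("signup_date")
    if signup and not DATE_FORMAT.match(signup):
        issues.append("signup_date format is inconsistent")
    return issues


clean = quality_issues(
    {"customer_id": "c1", "email": "a@example.com", "signup_date": "2024-03-01"}
)
flawed = quality_issues(
    {"customer_id": "c2", "email": "bad", "signup_date": "03/01/2024"}
)
```

Here `clean` is an empty list, while `flawed` reports both the email and the date format problem.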

Data security

Data management ensures that all aspects of data security are in place, including those to:

  1. Authenticate and authorize users
  2. Enforce data access controls
  3. Ensure that the stored data adheres to all regulatory mandates
  4. Prevent accidental data movement or deletion
  5. Prevent unauthorized access to data, data corruption, and theft of data
  6. Protect data on internal and external systems, including personal devices
  7. Secure network access
  8. Verify that the data centers meet established security requirements

Data storage

Fundamental to data management, data storage is the process of securely saving data before and after it has been processed. The type and purpose of the data will dictate the storage system (e.g., data lake for unstructured data or data warehouse for structured data).

The importance of data management

Every function in an organization depends on seamless access to quality data. Data management enables this and delivers a number of related benefits that make it an important part of the enterprise’s operational toolkit.

In addition to its core function of ensuring the accessibility, accuracy, and availability of information, data management is also important for the following reasons.

  1. Helps organizations adhere to regulatory compliance requirements related to data privacy and protection.
  2. Avoids violations of data collection rules, such as those set forth by the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), that can lead individuals to seek legal recourse for infractions, including:
       • Capturing data without consent
       • Continuing to store data after an erasure request
       • Exercising poor control over data location and use
  3. Breaks down incompatible data silos and data dependencies with integrations across disparate data owners, data sets, and functions (e.g., finance, human resources, marketing, and sales).
  4. Eliminates inconsistent data sets and data quality problems that undercut the output from business intelligence (BI) and analytics applications by reducing the reliability of data analysis results.
  5. Enhances customer experience with streamlined interactions, personalization, and customization.
  6. Improves collaboration between groups by helping to create a centralized data view across datasets.
  7. Increases revenue by increasing the efficacy and accuracy of analytics for driving insights that can optimize operations, reduce costs, and grow profits.
  8. Minimizes exposure to data breach threats and privacy violation risks that can result in fines, legal issues, negative publicity, and long-term reputational damage.
  9. Results in improved operational efficacy, insights into trends, and decision-making that can give organizations a competitive advantage over their rivals.
  10. Supports the management of massive quantities of structured and unstructured data, keeping it from becoming unwieldy and difficult to navigate.

Data management challenges

At the root of many data management challenges organizations face is the continuously growing variety, velocity, and volume of data available to and generated by organizations. The most often cited challenges related to data management are the following.

Challenges processing data for analysis

The volume of data and its disparate formats make processing a data management bottleneck, and unstructured data is particularly challenging. Slow or limited processing, in turn, inhibits the use of data for valuable analytic functions.

Creation of data silos

Data management processes strive to prevent data silos, but this is increasingly difficult as data volumes grow and new systems are added.

Difficulty maintaining high response time performance levels for data queries

Data management teams often struggle to keep indexes updated to reflect changes in queries and avoid negative impacts on performance.
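The link between indexes and query performance can be seen with Python's built-in `sqlite3` module. This is a small sketch on an in-memory database; the `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def query_plan(connection: sqlite3.Connection) -> str:
    """Ask SQLite how it would execute QUERY, without running it."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no suitable index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn)   # the planner can now use the new index
```

The exact wording of the plan text varies between SQLite versions, but the first plan typically reports a scan of `orders` and the second a search using `idx_orders_customer`. When query patterns shift to filter on other columns, an index like this one stops helping, which is why indexes need ongoing maintenance.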

Failure to keep up with changing compliance requirements related to data management

Laws, industry regulations, and other mandates related to data management are constantly being created and updated. Most of these are complex and often multijurisdictional, making it challenging to keep data management practices aligned to the changing rules.

Challenges maintaining data quality across multiple systems and data types

The mix of structured, semi-structured, and unstructured data can be difficult to integrate and manage in a coordinated manner, which often results in inaccurate and inconsistent data sets across different data systems.

Keeping user training up to date

As regulations and systems change, users must be trained. Without proper training, users risk compliance failures, and adoption of new systems slows or stalls.

Lack of insight into what data is available

Data management efforts often falter when dealing with unstructured and semi-structured data generated from connected devices, sensors, video cameras, and social media. While much of this data is captured, data management systems often lack the capability to let users know what information is available, rendering it effectively inaccessible and of little use.

Limited data catalog information makes data difficult to find and access

Data management teams try to maintain data catalogs with glossaries, metadata-driven dictionaries, and data lineage records, but the volume and diversity of data make it difficult to do this for all information. This can result in users being hindered in finding and accessing data.

Multiple systems are required to store different data types

Data management teams must be able to work with multiple types of storage systems, including databases, data warehouses, data lakes, and data lakehouses. In addition to accessing these disparate systems, data management teams need to be able to quickly transform the data into the formats required by users.

Ongoing need to optimize for maximum IT agility and the lowest costs

IT teams must strike the right balance between on-premises and cloud data systems to meet shifting requirements related to capabilities and scalability. The pros and cons of where to store data and analyze it are tricky to navigate as teams struggle with a number of issues, from performance and price to security and accessibility.

Data management systems

Data management systems comprise many components. Tools that are commonly used to support data management programs include:

  1. Business intelligence
  2. Data analytics
  3. Data fabrics
  4. Data governance, security, and compliance
  5. Data integration, such as extract, transform, and load (ETL), bulk / batch data movement, change data capture, data replication, data virtualization, and data orchestration
  6. Data lakes
  7. Data warehouses
  8. Data lakehouses
  9. Database management systems (DBMS), such as relational database management systems (RDBMS), object-oriented database management systems (OODBMS), in-memory databases, and columnar databases
  10. Master data management, which includes data consolidation, data governance, and data quality management
  11. NoSQL systems (e.g., document databases, key-value databases, wide-column stores, and graph databases)

Data management and data privacy

Data privacy is a subset of data management that addresses how personal data is handled to adhere to various regulations, laws, and best practices.

The systems and processes used for data privacy ensure that controls are in place to protect personal data from unauthorized access at rest and in motion and maintain the integrity of personal information.

Data management programs ensure that the handling of personal data adheres to privacy requirements for:

  1. Data collection
  2. Data processing
  3. Data portability
  4. Data retention
  5. Data deletion

Data management must take data privacy laws into account. Two of the laws that include stringent requirements for data privacy are the California Consumer Privacy Act (CCPA) and the European Union’s (EU) General Data Protection Regulation (GDPR).

CCPA grants California residents the right to ask organizations what personal data exists about them, find out what data has been given to third parties, and require organizations to delete it upon a resident’s request.

GDPR applies to the personal data of individuals in the EU and to the organizations that process it, including organizations based outside the EU that offer goods or services to those individuals or monitor their behavior. Protection depends on an individual’s presence in the EU rather than on citizenship. GDPR gives individuals the right to access the data organizations hold about them and to request that organizations delete it.

Data management best practices

Effective data management programs follow many best practices. Data management best practices used by leading organizations include:

  1. Create a discovery layer on top of the data tier to help users identify data and optimize its usability.
  2. Ensure adherence to compliance requirements by using data discovery tools to review data and identify what needs to be protected and monitored.
  3. Facilitate cross-team collaboration.
  4. Leverage a data science environment to automate data transformation work, expediting the development of data models.
  5. Prioritize data governance and data quality.
  6. Use artificial intelligence and machine learning to maintain optimal performance levels by continuously monitoring data storage queries and improving indexes as the queries change.

Optimizing data management for the enterprise

Most organizations require robust data management to flourish. The scale and complexity of a data management program will depend on the type and size of the organization, but the objectives are the same: to keep data safe, accessible, and in the best possible condition.
