Decoding the Data Landscape: Establishing the Foundation for Zero Trust Data Protection (Activity 4.1.1)
At the heart of nearly every security incident is the compromise or misuse of data. To protect data, you first have to understand it: what it is, where it lives, how sensitive it is, and how it flows. This crucial foundational step brings us to Zero Trust Activity 4.1.1: Data Analysis.
This activity kicks off the Data pillar by focusing on the methodology and governance for understanding your data. It mandates that the DoD Enterprise develop algorithm(s) for components to map data for tagging and labeling. Here, “algorithms” refers to the precise, repeatable methodologies or models (which can include machine learning models) used to categorize and label data consistently across the enterprise. A governing body for oversight is established to ensure these methodologies are correctly applied and that data is managed according to enterprise standards. Components then take on the task of categorizing and analyzing their own data using these defined approaches.
This activity is vital because you cannot enforce data-centric Zero Trust policies (like restricting access to sensitive data based on classification, or preventing its unauthorized exfiltration) if you don’t first have a clear, consistent, and governed understanding of your data’s nature and sensitivity.
The outcomes for Activity 4.1.1 highlight the establishment of this foundational data governance:
- Algorithms are entered into an algorithm registry with appropriate tagging and labeling set by the Enterprise to allow search and retrieval as appropriate (e.g., accommodating data catalog risk alignment).
- Component data catalog is updated with data types for each application and service based on data classification levels.
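To make the catalog outcome concrete, here is a minimal sketch of what an updated record might look like. The schema, field names, and values are illustrative assumptions for this post, not a DoD or vendor standard.

```python
# Hypothetical Component data catalog record for one application. Every field
# name and value here is an assumption for illustration, not a mandated schema.
catalog_entry = {
    "application": "hr-benefits-portal",            # hypothetical service
    "data_types": [
        {"type": "PII", "classification": "CUI"},   # e.g., SSNs, addresses
        {"type": "payroll", "classification": "CUI"},
        {"type": "public-forms", "classification": "Unclassified"},
    ],
    "analysis_algorithm_id": "alg-pii-regex-v2",    # links back to the registry
    "last_scanned": "2024-05-01",
}
```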
The ultimate end state underscores the power of this structured understanding of data: data analysis ensures data protection and reduces risk. Every problem has a data analysis algorithm registered in a repository, with its associated data indicated, and the oversight governance body maintains awareness that is compliant with the Visible, Accessible, Understandable, Linked, Trusted, Interoperable, and Secure (VAULTIS) principles. In other words, data is not just categorized; its analysis is transparent and trustworthy across the enterprise.
Solutions for Achieving Data Analysis
Activity 4.1.1 is primarily governance- and methodology-driven: it focuses on defining how data is analyzed and categorized, and on establishing oversight for that process. Technologies play a supporting role in operationalizing these methodologies.
- Developing Data Mapping, Tagging, and Labeling Algorithms (Methodologies):
  - Process: The Enterprise defines the precise methodologies or models that Components will use to identify, map, tag, and label data. These “algorithms” can be:
    - Rule-based: Defining specific patterns (e.g., regex for Social Security Numbers), keywords (e.g., “confidential project X”), or metadata (e.g., file owner, creation date) that indicate a certain classification (a rule-based sketch follows this list).
    - Machine Learning Models: For more advanced and scalable classification (as highlighted in Activity 6.3.1), defining the types of ML models (e.g., natural language processing for unstructured text, statistical analysis for numerical data) and the training data required (a toy ML sketch also follows the list).
  - Standardization: Ensure these methodologies are standardized across the enterprise for consistent application.
- Establishing a Governing Body for Oversight:
  - Process: Create a cross-functional governing body (e.g., a Data Governance Council) with representatives from Enterprise data owners, security, legal, compliance, and IT.
  - Role: This body is responsible for:
    - Approving data classification standards and methodologies.
    - Overseeing the implementation of data analysis algorithms.
    - Resolving disputes over data classification.
    - Ensuring VAULTIS compliance across the enterprise (making data Visible, Accessible, Understandable, Linked, Trusted, Interoperable, and Secure).
- Registering Algorithms in a Central Repository:
  - Process: Once an algorithm/methodology for data analysis and classification is developed and approved, formally register it in an “algorithm registry.” This repository makes the approved methodologies discoverable and reusable across Components (see the registry sketch after this list).
  - Associated Data: Link the algorithms to the types of data they are designed to analyze and the tags/labels they apply.
- Categorizing and Analyzing Component Data:
  - Process: DoD Components use the Enterprise-defined algorithms and methodologies to systematically scan, categorize, and analyze their existing data across applications and services.
  - Updating Data Catalog: As data is categorized and classified, ensure that Component data catalogs (and ultimately the Enterprise data catalog/CMDB) are updated with the relevant data types and classification levels for each application and service. This provides centralized visibility into data sensitivity.
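To ground the rule-based approach described above, here is a minimal Python sketch. The regex, keywords, and labels are assumptions chosen for illustration; a real enterprise algorithm would use the standards approved by the governing body.

```python
import re

# Illustrative rule-based classification "algorithm". The pattern, keywords,
# and labels below are assumptions for this sketch, not enterprise standards.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")       # e.g., 123-45-6789
KEYWORDS = {"confidential project x", "for official use only"}

def classify(text: str) -> str:
    """Return a classification label for a block of text."""
    if SSN_PATTERN.search(text):
        return "CUI//PII"        # hypothetical label for PII matches
    lowered = text.lower()
    if any(keyword in lowered for keyword in KEYWORDS):
        return "CUI"
    return "Unclassified"

print(classify("Employee SSN: 123-45-6789"))  # prints: CUI//PII
```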
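For the machine-learning path, the toy sketch below uses scikit-learn as an assumed library and a fabricated four-document training set purely to illustrate the workflow; real models and training data would be defined by the Enterprise per Activity 6.3.1.

```python
# Toy ML classification sketch. scikit-learn is an assumed library choice, and
# the training texts/labels are fabricated solely to illustrate the workflow.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "employee ssn and home address records",
    "quarterly payroll and bank account details",
    "public press release about community event",
    "published fact sheet for general audiences",
]
labels = ["CUI", "CUI", "Unclassified", "Unclassified"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["spreadsheet of employee bank accounts"]))  # likely ['CUI']
```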
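Finally, here is a minimal in-memory sketch of the algorithm registry idea. In practice this would be a governed, shared service; every name and field here is hypothetical.

```python
from dataclasses import dataclass

# Minimal sketch of an algorithm registry entry; all field names hypothetical.
@dataclass
class RegisteredAlgorithm:
    algorithm_id: str
    description: str
    data_types: list[str]       # the data the algorithm is designed to analyze
    labels_applied: list[str]   # the tags/labels it can assign
    approved_by: str            # governing body that approved it

registry: dict[str, RegisteredAlgorithm] = {}

def register(alg: RegisteredAlgorithm) -> None:
    """Add an approved algorithm so Components can discover and reuse it."""
    registry[alg.algorithm_id] = alg

register(RegisteredAlgorithm(
    algorithm_id="alg-pii-regex-v2",
    description="Regex/keyword detection of PII in unstructured text",
    data_types=["unstructured-text"],
    labels_applied=["CUI//PII", "CUI", "Unclassified"],
    approved_by="Data Governance Council",
))
```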
How Trellix Can Achieve the Desired Outcomes and End State:
Trellix, particularly through its Trellix Data Loss Prevention (DLP) suite, provides crucial technical capabilities that operationalize the methodologies (“algorithms”) defined in Activity 4.1.1, helping to achieve the outcomes and end state.
- Implementing Data Analysis Algorithms: Trellix DLP Discover and Endpoint products are the tools that execute the “algorithms” defined by the Enterprise for data mapping, tagging, and labeling.
  - Rule-Based: Trellix DLP allows you to configure specific rules based on keywords, regular expressions, file properties, and data fingerprinting (all aspects of defined algorithms).
  - ML-Powered Classification: As highlighted in Activity 6.3.1, Trellix DLP uses machine learning to automatically identify and classify sensitive data (e.g., PII, intellectual property) across endpoints, network shares, and cloud repositories. This directly implements the ML-based “algorithms” for data classification.
- Updating Component Data Catalog: Trellix DLP Discover can scan network shares, cloud storage, and endpoints, locate sensitive data, classify it according to the defined levels, and then generate reports or integrate with data catalogs (like your Component data catalog) to update them with data types and classification levels for associated applications and services. This contributes directly to “Component data catalog is updated with data types for each application and service based on data classification levels.”
- Ensuring Data Protection and Reducing Risk: By automatically identifying and classifying data, Trellix DLP enables the enforcement of policies to protect that data (e.g., preventing unauthorized movement or requiring encryption), directly contributing to “data analysis ensures data protection and reduces risk” as part of the end state.
- Supporting Oversight Governance (VAULTIS): Trellix DLP provides extensive logging and reporting on data classification and data handling events. This telemetry can be fed into your SIEM (e.g., Elastic Security) to contribute to the “Visible, Accessible, Understandable, Linked, Trusted, Interoperable, and Secure (VAULTIS)” awareness for the oversight governance body, by showing what data is classified, where it is, and how it’s being accessed/used (a generic event sketch follows this list).
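The sketch below shows the kind of classification event a DLP tool could forward to a SIEM. This is a generic illustration only; it is not the Trellix DLP event schema, and every field name here is an assumption.

```python
import json
from datetime import datetime, timezone

# Generic illustration, NOT the Trellix DLP schema: the kind of classification
# telemetry a DLP tool could forward to a SIEM so the governance body can see
# what was classified, where it lives, and how it is handled.
event = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "source": "dlp-discover-scan",          # hypothetical scan job name
    "asset": r"\\fileshare01\hr\benefits.xlsx",
    "classification": "CUI//PII",
    "rule_id": "alg-pii-regex-v2",          # ties the event to the registry
    "action": "labeled",                    # e.g., labeled, blocked, encrypted
}
print(json.dumps(event))                    # ship to the SIEM ingest pipeline
```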
For the Technical Buyer:
Activity 4.1.1 is the foundational step in understanding your data within a Zero Trust framework. It’s about developing the algorithms for consistently mapping, tagging, and labeling data across your enterprise, and establishing the governing body to oversee this process. For technical buyers, success here means contributing to the precise definition of these data analysis methodologies and ensuring that tools like Trellix DLP are leveraged to operationalize these algorithms, automatically categorizing your data. This activity is paramount for updating your data catalogs with accurate classification levels, enabling precise data protection policies, reducing risk, and ensuring your data management aligns with VAULTIS principles.
Pillar: Data
Capability: 4.1 Data Catalog Risk Alignment
Activity: 4.1.1 Data Analysis
Phase: Target Level
Predecessor(s): None
Successor(s): None