

Document Classification and Management of Unstructured Data: Leveraging Microsoft Purview Sensitive Information Types (SITs)

In this article, our Associate Consultant Robin Groh outlines a systematic approach to setting up SITs in the Microsoft Purview compliance portal, focusing on building a logical and comprehensive classification system.

Robin Groh
20/02/2024 5:45 AM

In today's data-driven business environment, effectively managing and classifying documents is crucial for operational efficiency and compliance. Microsoft Purview offers a suite of tools that help organizations classify documents stored across Microsoft applications by means of Sensitive Information Types (SITs). The approach described here consists of three steps: developing a categorization framework, identifying and classifying keywords, and constructing the classification logic in Purview.

1. Developing a Categorization Framework

The first step in setting up SITs in Purview is to establish a categorization framework that aligns with your organization's structure and operational domains. In this article, we build an example with three hierarchical levels: Business Domains, Functional Areas, and Document Categories.

  • Business Domains: Identify the different domains within your company, such as Human Resources, Finance, Legal & Procurement, Management, IT, Research and Development, Marketing, and Sales. These domains represent the broadest classification level and serve as the foundation for further categorization.

  • Functional Areas: Within each business domain, identify specific functional areas that further describe the domain's activities. For example, within the Finance domain, functional areas might be Budgeting, Accounting, and Financial Planning, among others.

  • Document Categories: At the most granular level, classify documents based on their content and purpose within each functional area. For instance, within the Accounting area of the Finance domain, categories might include Invoices, Financial Statements, and Tax Documents.
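The three levels can be captured in a simple nested structure before anything is configured in Purview. The following is a minimal sketch in plain Python, not a Purview API: the names `TAXONOMY` and `category_paths` are hypothetical, and only the Finance entries are taken from the example above.

```python
# Hypothetical three-level taxonomy:
# Business Domain -> Functional Area -> Document Categories.
# Only the Finance entries come from the article; the layout is an assumption.
TAXONOMY = {
    "Finance": {
        "Accounting": ["Invoices", "Financial Statements", "Tax Documents"],
        "Budgeting": [],           # categories not yet defined
        "Financial Planning": [],  # categories not yet defined
    },
}


def category_paths(taxonomy):
    """Return every document category as a 'Domain/Area/Category' path."""
    return [
        f"{domain}/{area}/{category}"
        for domain, areas in taxonomy.items()
        for area, categories in areas.items()
        for category in categories
    ]


for path in category_paths(TAXONOMY):
    print(path)
# Finance/Accounting/Invoices
# Finance/Accounting/Financial Statements
# Finance/Accounting/Tax Documents
```

A flat list of paths like this can serve as a naming convention for the SITs that are created later, so that each SIT can be traced back to exactly one document category.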

2. Keyword Identification and Classification

After establishing a categorization framework, the next step is to identify and classify keywords that will be used to tag and classify documents within Purview. Start with a set of documents you already know should be categorized, such as financial statements, and conduct a thorough review to identify relevant keywords. It is also advisable to research similar documents online to gather a broader set of examples. Make sure to have enough documents for adequate analysis and testing.
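The manual review can be supported by a simple count of how many sample documents contain each term. The sketch below is one possible way to do this in plain Python and is not part of Purview; the function name `document_frequency` and the sample texts are made up for illustration.

```python
import re
from collections import Counter


def document_frequency(documents):
    """Count in how many documents each lowercase word occurs at least once."""
    counts = Counter()
    for text in documents:
        # A set ensures that a word is counted once per document.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words)
    return counts


# Made-up sample texts standing in for real financial statements.
samples = [
    "Balance sheet and income statement for fiscal year 2023",
    "Consolidated balance sheet with notes",
    "Income statement and cash flow statement",
]

df = document_frequency(samples)
print(df["balance"], df["statement"], df["notes"])  # 2 2 1
```

Terms that occur in nearly every sample are candidates for closer review, while rare terms are less likely to be essential. The sketch only counts single words; multi-word phrases such as "balance sheet" would need an additional step.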

Classify identified keywords into four categories:

  • Must Have: Keywords that are essential for a document to be classified within a specific category.

  • Should Have: Keywords that are commonly found in the document type but are not essential.

  • Could Have: Keywords that might appear in the document but are less common.

  • Must Not Have: Keywords that should not appear in the document, helping to reduce false positives.
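The four categories can be recorded per document category in a small data structure, which also makes it easy to spot inconsistencies before configuring anything. This is a sketch under assumptions: the class `KeywordRule`, its `overlaps` check, and all example keywords are hypothetical and do not come from Purview.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeywordRule:
    """Keywords for one document category, grouped into the four buckets."""
    category: str
    must_have: List[str] = field(default_factory=list)
    should_have: List[str] = field(default_factory=list)
    could_have: List[str] = field(default_factory=list)
    must_not_have: List[str] = field(default_factory=list)

    def overlaps(self):
        """Return keywords that appear in more than one bucket (case-insensitive)."""
        seen, duplicates = set(), set()
        buckets = (self.must_have, self.should_have,
                   self.could_have, self.must_not_have)
        for bucket in buckets:
            for keyword in bucket:
                key = keyword.lower()
                if key in seen:
                    duplicates.add(key)
                seen.add(key)
        return sorted(duplicates)


# Hypothetical keywords for illustration only.
rule = KeywordRule(
    category="Finance/Accounting/Financial Statements",
    must_have=["balance sheet"],
    should_have=["income statement", "cash flow"],
    could_have=["auditor"],
    must_not_have=["invoice number"],
)
print(rule.overlaps())  # []
```

A keyword that shows up in two buckets, for example in both "Must Have" and "Must Not Have", points to a contradictory rule that should be resolved first.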

3. Logic Construction in Purview

With your keywords categorized, the final step is to build your classification logic within Purview. When configuring SITs, prioritize quality over quantity: focus on the most relevant keywords to ensure accurate classification. Aim to minimize false negatives (relevant documents that are incorrectly excluded) rather than false positives (irrelevant documents that are incorrectly included). Use "Must Not Have" keywords judiciously: they prevent documents from being wrongly classified, but an exclusion that is too broad also rejects documents that belong in the category.
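Before committing a rule to Purview, its effect on false negatives and false positives can be estimated offline on a small set of labelled test documents. The sketch below simulates a simplified rule in plain Python. It does not reproduce how Purview evaluates SITs, it considers only "Must Have" and "Must Not Have" keywords, and the functions `matches` and `count_errors` as well as the test documents are hypothetical.

```python
def matches(text, must_have, must_not_have=()):
    """True if the text contains all must-have and no must-not-have keywords."""
    lowered = text.lower()
    if any(keyword.lower() in lowered for keyword in must_not_have):
        return False
    return all(keyword.lower() in lowered for keyword in must_have)


def count_errors(labelled_docs, must_have, must_not_have=()):
    """Return (false_negatives, false_positives) for (text, is_relevant) pairs."""
    false_negatives = false_positives = 0
    for text, is_relevant in labelled_docs:
        hit = matches(text, must_have, must_not_have)
        if is_relevant and not hit:
            false_negatives += 1
        elif hit and not is_relevant:
            false_positives += 1
    return false_negatives, false_positives


# Made-up test set: True marks documents that belong to the category.
docs = [
    ("Annual report: balance sheet and income statement", True),
    ("Balance sheet template, draft version", True),
    ("Invoice 4711 referencing the balance sheet date", False),
    ("Marketing plan for Q3", False),
]

loose = count_errors(docs, must_have=["balance sheet"])
strict = count_errors(docs, must_have=["balance sheet"],
                      must_not_have=["invoice", "draft"])
print(loose)   # (0, 1)
print(strict)  # (1, 0)
```

In this toy example the exclusion list removes the false positive, but the overly broad keyword "draft" also rejects a relevant document. Given the preference for few false negatives, the looser rule would be the better starting point here.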


Setting up SITs in Microsoft Purview requires a systematic approach that starts with a solid categorization framework and careful keyword identification. By aligning your classification logic with your organization's operational structure and focusing on precision, you can improve the management and security of documents stored across Microsoft applications. This process not only aids compliance and data protection but also improves operational efficiency by ensuring that documents are accurately classified and easily retrievable.

Best regards,

Robin Groh

EMPA - Data & Management Consulting GmbH

Bettinastraße 62,
60325 Frankfurt am Main

+49 176 83425662


© 2024 EMPA - Data & Management Consulting GmbH