Learn how Microsoft Purview Information Protection discovers and protects your most sensitive data

Anna_Chiang · ‎Mar 28 2023

Did you know that 88% of organizations lack the confidence to prevent sensitive data loss?¹ Data discovery and classification are the important first steps for organizations that want to better protect sensitive PII and corporate intellectual property; you can’t prevent data loss with policies if the right files aren’t correctly labeled and protected to begin with.

With Microsoft Purview, our goal is to provide a built-in, intelligent, unified, and extensible solution to protect sensitive data across your digital estate. This includes Microsoft clouds such as Microsoft 365 and Azure, as well as on-premises, hybrid, and third-party clouds, and SaaS applications. With Microsoft Purview Information Protection, we are building a unified set of capabilities for data classification, labeling, and protection for our customers’ multicloud and multiplatform IT landscape.

At Microsoft Secure, we are highlighting several new Information Protection product capabilities, including:

- Optical character recognition (OCR) support for various workloads and data security solutions, including Endpoint DLP, Information Protection, Insider Risk Management, and Data Lifecycle Management, is coming soon to public preview.
- Context-based classification with default site labels is generally available today; contextual summary support and new contextual predicates are coming soon to public preview.
- Smart alerts and alert insights that show system admins their top risks are coming soon to public preview.
- Sensitivity labels for Outlook meeting invites and Teams meetings are now generally available.
- An enhanced pre-trained source code classifier is now generally available.

Expanded OCR support for more comprehensive sensitive data discovery and data loss prevention

We’ve listened to customers’ requests to expand the workloads and platform support for OCR. OCR is already generally available for Communication Compliance and eDiscovery Premium. We’re pleased to announce that OCR will soon be in public preview for additional data security solutions (Data Loss Prevention, Insider Risk Management, Information Protection, and Data Lifecycle Management) and for the following workloads: Exchange email, SharePoint sites, OneDrive accounts, Teams chat and channel messages, and Windows devices.

Figure 1. Specifying workloads for OCR scans

Figure 2. Selecting sensitive information types to be covered by a DLP policy

Once the OCR settings are configured for the relevant workloads and locations, all your existing DLP, auto-labeling, Insider Risk Management, and Data Lifecycle Management policies will also apply to images that contain sensitive content. For example, if you have configured the DLP condition “content contains sensitive information” with any classifier or sensitive information type (e.g., a built-in SIT such as credit card number, a custom SIT, exact data match, or a trainable classifier), that classifier will now scan the content in images and apply the DLP actions if sensitive content is found. There is no need to update existing policies across any of these data security solutions.

Figure 3. An attempt to send a credit card image over Teams is automatically blocked
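To make the idea concrete, here is a minimal sketch of how a credit-card-style pattern could be checked against text that has already been extracted from an image. It is an illustration only, not the Purview implementation: the OCR step is assumed to have happened upstream, and the names `CANDIDATE`, `luhn_valid`, and `find_card_numbers` are hypothetical. It combines a digit pattern with the standard Luhn checksum to reduce false positives.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. This is an assumption for illustration,
# not the pattern used by any Purview sensitive information type.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(extracted_text: str) -> list[str]:
    """Return Luhn-valid card-like numbers found in OCR-extracted text."""
    hits = []
    for match in CANDIDATE.finditer(extracted_text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    # Text as it might come back from an OCR pass over a photographed card.
    print(find_card_numbers("Card: 4111 1111 1111 1111 exp 12/27"))
```

The point of the sketch is that the detection logic only ever sees text, so the same check works whether the text came from an email body or from an image.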

Context-based classification for improved classification granularity and coverage

To improve ease of use for system admins, support for contextual summary in simulation mode for service-side auto-labeling is coming soon to public preview. When reviewing matched items in the Contextual Summary tab, system admins will be able to easily see which sensitive information type was found as a match in the document. This enables them to further optimize their policies before production deployment, for improved accuracy and fewer false positives.

To improve classification granularity and coverage, the new contextual predicates listed below will enable system admins to leverage a document’s context, such as document property, file extension, size, author/owner, and document name, in auto-labeling policies. This will make it easier and faster to auto-label specific files that can’t currently be targeted using other advanced classifiers.

New contextual predicates include:
- Document property is
- File extension is
- Document size equals or is greater than
- Document created by (only available in advanced rules in OneDrive and SharePoint locations)
- Document names contain words or phrases
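The predicates above can be thought of as simple checks on a document’s metadata rather than its content. The following is a rough sketch of that idea only. The `Document` and `ContextRule` classes, their field names, and the choice to combine all configured predicates with AND are assumptions made for illustration; they do not reflect the Purview policy schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical metadata record for a file in SharePoint or OneDrive.
    name: str
    size_bytes: int
    created_by: str
    properties: dict = field(default_factory=dict)


@dataclass
class ContextRule:
    # Each field mirrors one predicate; an empty value means "not configured".
    extensions: tuple = ()            # File extension is
    min_size_bytes: int = 0           # Document size equals or is greater than
    created_by: tuple = ()            # Document created by
    name_phrases: tuple = ()          # Document names contain words or phrases
    required_properties: dict = field(default_factory=dict)  # Document property is

    def matches(self, doc: Document) -> bool:
        """Return True when every configured predicate holds (assumed AND logic)."""
        name = doc.name.lower()
        ext = name.rsplit(".", 1)[-1] if "." in name else ""
        checks = [
            not self.extensions or ext in self.extensions,
            doc.size_bytes >= self.min_size_bytes,
            not self.created_by or doc.created_by.lower() in self.created_by,
            not self.name_phrases or any(p in name for p in self.name_phrases),
            all(doc.properties.get(k) == v
                for k, v in self.required_properties.items()),
        ]
        return all(checks)


if __name__ == "__main__":
    rule = ContextRule(
        extensions=("docx", "pdf"),
        min_size_bytes=1024,
        name_phrases=("merger",),
        required_properties={"Department": "Legal"},
    )
    doc = Document("Project Merger Plan.docx", 20480,
                   "author@contoso.com", {"Department": "Legal"})
    print(rule.matches(doc))
```

Because none of these checks inspect file content, a rule like this can label files that content-based classifiers cannot read or do not recognize.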

Proactive smart alerts for system admins on risky user behavior are coming soon in public preview

While system admins can use content explorer and activity explorer to monitor and analyze where sensitive files are stored across their digital estate and how they’re being used, they must currently first have manual or auto-labeling and DLP policies in place that label the sensitive data and files. What if system admins could proactively be shown alerts and insights on risky user behavior without first having to implement specific policies? This would reinforce our Zero Trust and secure-by-default promise, in which organizations are aware of and protected from the riskiest events. We’re pleased to announce that smart alerts, which can help improve visibility of risky behavior and eliminate blind spots of sensitive data exposure for system admins, are coming soon to public preview.

Smart alerts are out-of-the-box, system-generated alerts and insights that surface the top risks for admins to triage as a priority. These intelligent alerts leverage various signals, including user activity and source and target domains across workloads, and then combine them within and across solutions to flag high-risk detections to system admins. They are not dependent on policies, so admins can benefit from these detections even if they don’t have policies in place.

Figure 4. View Smart Alerts incidents in the M365 Defender portal, part of the incident management queue for DLP.

Extending sensitivity labels to Outlook invites and Teams meetings for secure collaboration

For many organizations, highly confidential information may be discussed or shared in meetings, where the meeting content needs to be protected (e.g., mergers and acquisitions). We are pleased to announce the general availability of extending sensitivity labels to Outlook meeting invites, appointments, and Teams meetings. This feature helps organizations ensure that sensitive information is only shared with authorized individuals and that they are aware of the sensitivity level. This can also help address compliance with data protection regulations.

System admins can configure meeting settings for various sensitivity labels in the Microsoft Purview compliance portal, such as protecting and encrypting the meeting content (body and attachments). Meeting owners can then apply the appropriate label to their meetings based on the sensitivity level. For a more detailed description of capabilities, check out these Outlook and Teams blogs, which also describe the Teams Premium and other license requirements.

Figure 5. Apply a sensitivity label to classify and protect Teams meetings.

Figure 6. Add a watermark to prevent screenshots and taking photos of sensitive content shared onscreen.

Figure 7. Prevent copying Teams chat messages to other applications.

General availability of an enhanced pre-trained source code classifier

Unauthorized exfiltration of source code by insiders can expose organizations to great risk of intellectual property loss and potential damages. In February we announced the public preview of this enhanced source code classifier, which supports more file extensions (70+) and 23 programming languages and addresses customer input. It can detect embedded and partial source code, and can even work on shorter text (approximately 50 words or phrases) in Teams conversations and mail. We are pleased to announce that this source code classifier is now generally available and can be used directly in auto-labeling and data loss prevention policies.

Figure 8. Screenshot of the new enhanced source code classifier in action with DLP policies.

Our recent GA of this and 23 other ready-to-use business category trainable classifiers helps organizations more quickly and comprehensively discover, label, and protect massive volumes of sensitive data across their digital estate. These classifiers can detect some of the most critical sensitive content, such as IP and trade secrets, material non-public information, sensitive health and medical files, business-sensitive financial information, and PII for GDPR compliance. Our engineering team leveraged Microsoft’s broad and deep machine learning expertise, along with leading frameworks, platforms, and development environments, both proprietary and open source (e.g., Porch, ML.NET, Babel, ONNX), for model generation, building, peer review, testing (including real-time), and feedback in the development workflow for these trainable classifiers.

For an overview of which trainable classifiers to use for specific use cases and a short tutorial on machine learning, please check out our new trainable classifiers eBook in the blog attachments below. For those who want a deeper technical dive into how our engineering team built and optimized our ML models, and how the classifiers can be used with our Microsoft Purview Information Protection, Data Loss Prevention, and Data Lifecycle Management solutions, please check out our new trainable classifiers whitepaper in the attachments below.

How to Get Started

Get access to Microsoft Purview solutions directly in the Microsoft Purview compliance portal with a trial. By enabling the trial in the Purview compliance portal, you can quickly access these advanced classifiers. Visit your Microsoft Purview compliance portal for more details or check out the Microsoft Purview solutions trial.

¹ Forrester, Security Concerns Security Priorities Survey 2020.
