In the rapidly evolving world of digital transformation, organizations are constantly seeking ways to manage and make sense of their data. One such tool that has emerged as a game-changer in content understanding is Microsoft Syntex. This article aims to provide a comprehensive guide to the different types of models available in Microsoft Syntex, helping you to understand their benefits, limitations, and best use cases.
Microsoft Syntex, part of Microsoft’s Project Cortex, uses advanced AI and machine teaching to amplify human expertise, automate content processing, and transform content into knowledge. It offers two types of models: Custom Models and Prebuilt Models. Each type has its strengths and is designed to cater to different needs and scenarios.
The following sections cover the features, benefits, and limitations of each Microsoft Syntex model type and offer guidance on picking the right model for your needs. Whether you are a business analyst, a data scientist, or someone interested in content understanding and management, this guide will equip you with the knowledge you need to use Microsoft Syntex in your organization.
Let’s get started!
Microsoft Syntex Models
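Microsoft Syntex offers two primary model categories to streamline document processing: custom models and prebuilt models. Before looking at each category, it helps to understand where a model can be created and who can use it.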
Model Deployment Options:
Syntex provides flexibility in model deployment:
Enterprise Models
Enterprise models are created in a centralized location called the content center. They are overseen by IT or designated administrators, which ensures consistency and quality across the organization. Once a model is deployed, all authorized users in your organization can use it for their document processing needs.
This option is ideal for:
Standardized document types used across departments (e.g., invoices, expense reports)
Extracting critical information for enterprise-wide reporting or analysis
Local Models
Local models are created directly on a specific SharePoint site. They address the document processing needs of a particular department or team, and only users with access to that SharePoint site can use them.
This option is suitable for:
Documents specific to a department or team (e.g., marketing campaign materials, research papers)
Streamlining internal workflows within a department or team
1. Custom Models
In Microsoft Syntex, custom models are user-created tools designed to extract and understand information from various document formats. These models are tailored to your specific needs, allowing you to process and categorize documents in a way that pre-built models might not.
Benefits of Custom Models:
Custom models handle a range of file types and support more than 40 languages.
They can be tailored to your needs, allowing you to extract information based on phrases or patterns unique to your documents.
Custom models can be either enterprise models, created in a content center, or local models, created on your local SharePoint site.
Limitations of Custom Models:
Custom models have specific requirements such as file type and size, supported languages, and geographical considerations.
There are also OCR considerations for certain file types, including limitations on file size, dimensions, and quality.
In a Microsoft 365 Multi-Geo environment, Syntex can only be configured to use the model type in the central location.
Choosing the Right Custom Model:
The most suitable custom model depends on the kind of documents you work with:
Unstructured Document Processing: Ideal for unstructured or semi-structured documents whose layout varies but which contain similar information to extract, such as Office documents, emails, or reports.
Freeform Document Processing: Suited to documents with no set structure, such as letters, contracts, or statements of work.
Structured Document Processing: Designed for documents with a well-defined, consistent layout, such as invoices, purchase orders, or other forms.
Factors | Unstructured Document processing | Freeform Document processing | Structured Document processing |
---|---|---|---|
Usage | Unstructured or semi-structured file formats, for example Office documents where there are differences in the layout, but still similar information to be extracted. | Unstructured and free-form file formats, for example documents that have no set structure such as letters, contracts, and statements of work. | Structured and semi-structured file formats, for example PDFs for forms content such as invoices or purchase orders where the layout and formatting is similar. |
Model Creation | Content Center | SharePoint document library | SharePoint library |
Classification Type | Trainable classifier with optional extractors using machine teaching to assign document location on what data to extract. | Not applicable | Not applicable |
Locations | Can be applied to multiple libraries. | Can be applied to multiple libraries. | Can be applied to multiple libraries. |
Supported File Types | Train on 5-10 .pdf, Office, or email files, including negative examples. Files are truncated at 64,000 characters. OCR-scanned files are limited to 20 pages. Supports more than 20 file types. | Train on .pdf, .jpg, or .png format, total 50 MB and 500 pages. | Train on .pdf, .jpg, or .png format, total 50 MB and 500 pages. |
Integrate with Managed Metadata | Yes, by training entity extractor referencing a configured managed metadata field. | No | No |
Compliance feature integration with Microsoft Purview Information Protection | Set published retention labels. Set published sensitivity labels. | Set published retention labels. Set published sensitivity labels. | Set published retention labels. Set published sensitivity labels. |
Capacity | No capacity restrictions | It uses the default Power Platform environment | It uses the default Power Platform environment |
Transactional Cost | Pay-as-you-go | For pay-as-you-go licensing, not applicable. For per-user licensing, uses AI Builder credits. 3,500 credits are included for each Syntex license per month. One million credits allow the processing of 10,000 file pages. | For pay-as-you-go licensing, not applicable. For per-user licensing, uses AI Builder credits. 3,500 credits are included for each Syntex license per month. One million credits allow the processing of 10,000 file pages. |
Supported Languages | 40+ languages | 40+ languages | 100+ languages |
Supported Regions | Available in all regions | It relies on the Power Platform | It relies on the Power Platform |
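The per-user licensing figures in the Transactional Cost row imply a simple per-page cost. The sketch below works through that arithmetic using only the numbers quoted in the table. It assumes that credit consumption is linear in page count and that the credits bundled with several licenses can be pooled; neither assumption is confirmed here, and the function names are illustrative.

```python
# Figures quoted in the Transactional Cost row of the comparison table.
CREDITS_PER_LICENSE_PER_MONTH = 3_500
CREDITS_PER_MILLION = 1_000_000
PAGES_PER_MILLION_CREDITS = 10_000

# Implied cost of one page in AI Builder credits: 1,000,000 / 10,000 = 100.
CREDITS_PER_PAGE = CREDITS_PER_MILLION // PAGES_PER_MILLION_CREDITS


def included_pages(licenses: int) -> int:
    """Pages per month covered by the credits bundled with the given licenses.

    Assumption: credits from all licenses are pooled.
    """
    return licenses * CREDITS_PER_LICENSE_PER_MONTH // CREDITS_PER_PAGE


def extra_credits_needed(licenses: int, pages_per_month: int) -> int:
    """Credits needed beyond the bundled allowance (0 if the allowance is enough)."""
    needed = pages_per_month * CREDITS_PER_PAGE
    included = licenses * CREDITS_PER_LICENSE_PER_MONTH
    return max(0, needed - included)


if __name__ == "__main__":
    print(included_pages(1))                 # 35
    print(included_pages(20))                # 700
    print(extra_credits_needed(20, 1_000))   # 30000
```

Under these assumptions, a single license covers about 35 pages per month for structured or freeform models, so higher volumes need either additional credits or pay-as-you-go billing.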
Selecting the Training Method:
During model creation, Syntex prompts you to choose a training method that corresponds to the model type. For instance, to create an unstructured document processing model, you select the "Teaching method" option on the "Options for model creation" page.
Training Method by Model Type:
Microsoft Syntex utilizes different training methods for each custom model type. The specific method you choose depends on the model you're creating:
Model Type | Training Method |
---|---|
Unstructured Document Processing | Teaching method |
Freeform Document Processing | Freeform selection method |
Structured Document Processing | Layout method |
By creating custom models, you can automate document processing tasks in Microsoft Syntex, saving time and ensuring accuracy in extracting critical information from your files.
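Before uploading example files for a structured or freeform model, it can help to check them against the limits quoted in the comparison table (.pdf, .jpg, or .png files, 50 MB and 500 pages in total). The following is a minimal sketch of such a check. The `TrainingFile` type and `check_training_set` function are hypothetical helpers, not part of any Syntex API, and "50 MB" is read here as 50 × 1024 × 1024 bytes, which is an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Limits taken from the Supported File Types row of the comparison table.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".png"}
MAX_TOTAL_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" interpreted as 50 MiB
MAX_TOTAL_PAGES = 500


@dataclass(frozen=True)
class TrainingFile:
    """Hypothetical description of one example file; you supply these values."""
    name: str
    size_bytes: int
    pages: int


def check_training_set(files: list[TrainingFile]) -> list[str]:
    """Return a list of problems; an empty list means the set fits the quoted limits."""
    problems = []

    # Check each file's extension, ignoring case.
    for f in files:
        ext = PurePath(f.name).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{f.name}: unsupported file type {ext or '(none)'}")

    # The size and page limits apply to the training set as a whole.
    total_bytes = sum(f.size_bytes for f in files)
    if total_bytes > MAX_TOTAL_BYTES:
        problems.append(f"total size {total_bytes} bytes exceeds {MAX_TOTAL_BYTES}")

    total_pages = sum(f.pages for f in files)
    if total_pages > MAX_TOTAL_PAGES:
        problems.append(f"total of {total_pages} pages exceeds {MAX_TOTAL_PAGES}")

    return problems


if __name__ == "__main__":
    examples = [
        TrainingFile("invoice-01.pdf", 1_200_000, 3),
        TrainingFile("invoice-02.docx", 400_000, 2),
    ]
    for problem in check_training_set(examples):
        print(problem)
```

The check only covers the limits stated in this article; OCR quality and image dimension limits are not modeled.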
2. Prebuilt Models
In Microsoft Syntex, pre-built models are essentially pre-trained document processing tools designed for common document types with well-defined structures, such as invoices, contracts, or receipts. These models come ready to use, eliminating the need for you to invest time and resources in building custom models from scratch.
Benefits of Pre-Built Models:
No training is required, so you can start processing documents with the model right away.
Save time and effort by avoiding the need to build custom models.
Pre-built models are designed for their respective document types, ensuring reliable information extraction.
Limitations of Pre-Built Models:
If the model doesn’t detect the fields that you need, you may have to run the analysis again with a different file.
Prebuilt models have specific requirements and limitations similar to custom models.
Available Options:
Syntex currently offers pre-built models for three common document types:
Contract Processing: Efficiently extracts key information like clauses, dates, and involved parties from contracts.
Invoice Processing: Automates invoice processing by capturing crucial details like vendor, amount due, and line items.
Receipt Processing: Quickly extracts details like merchant name, date, and total amount from receipts.
Prebuilt Model | Description | Use Case | Tools Used | Limitations |
---|---|---|---|---|
Contract Processing | The prebuilt contracts model analyzes and extracts key information from contract documents. The model recognizes contracts in various formats and extracts key contract information, such as client name and address, contract duration, and renewal date. | It is ideal for processing contract documents in various formats. | It uses AI models to identify and extract information based on phrases or patterns. | The identified text designates both the type of file it is (its classification) and what you’d like to extract (its extractors). |
Invoice Processing | The prebuilt invoices model analyzes and extracts key information from sales invoices. The API analyzes invoices in various formats and extracts key invoice information, such as customer name, billing address, due date, and amount due. | It is ideal for processing sales invoices in various formats. | It uses AI models to identify and extract information based on phrases or patterns. | The layout of your document is learned by training your model. You only need five form documents to get started. |
Receipt Processing | The prebuilt receipts model analyzes and extracts key information from sales receipts. The API analyzes printed and handwritten receipts and extracts key receipt information, such as merchant name, merchant phone number, transaction date, tax, and transaction total. | It is ideal for processing sales receipts, both printed and handwritten. | It uses AI models to identify and extract information based on phrases or patterns. | Syntex will analyze your example files for key-value pairs, and you can also manually identify ones that might not have been detected. |
Choosing the Right Model
Choosing the right model in Microsoft Syntex depends on several factors:
Types of Files: The type of files you use can influence the choice of model. Different models may support different file types.
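Format and Structure of Files: The format and structure of your files are also important considerations. For instance, documents with no set structure, such as letters or contracts, suit freeform document processing models, while forms with a consistent layout, such as invoices or purchase orders, suit structured document processing models.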
Specific Needs: If you have specific needs or requirements, a custom model might be more appropriate as it can be tailored to extract information based on phrases or patterns unique to your documents.
Geographical Considerations: In a Microsoft 365 Multi-Geo environment, Syntex can only be configured to use the model type in the central location.
OCR Considerations: Certain file types have OCR considerations, including limitations on file size, dimensions, and quality.
Language Support: Different models support different languages. Ensure the model you choose supports the languages you need.
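Cost: Costs depend on your licensing. With pay-as-you-go billing, processing is charged by usage and AI Builder credits do not apply; with per-user licensing, structured and freeform models consume AI Builder credits.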
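The structure-related factors above can be condensed into a small decision helper. This is only a sketch of the reasoning in this article, following the Usage row of the comparison table and the three prebuilt document types; the `Layout` enum, the `needs_custom_fields` flag, and the `suggest_model` function are illustrative names, not part of Syntex. It does not weigh language, region, OCR, or cost.

```python
from enum import Enum


class Layout(Enum):
    FIXED = "fixed"      # layout and formatting are similar every time (forms)
    VARYING = "varying"  # layout differs, but similar information is present
    NONE = "none"        # no set structure (letters, statements of work)


# Document types that this article lists as having a prebuilt model.
PREBUILT_TYPES = {"contract", "invoice", "receipt"}


def suggest_model(document_type: str, layout: Layout,
                  needs_custom_fields: bool = False) -> str:
    """Suggest a Syntex model type for a document, based on type and layout."""
    doc = document_type.strip().lower()

    # Prefer a prebuilt model when one exists and its standard fields suffice.
    if doc in PREBUILT_TYPES and not needs_custom_fields:
        return f"prebuilt {doc} processing"

    # Otherwise choose a custom model by how consistent the layout is.
    if layout is Layout.FIXED:
        return "structured document processing"
    if layout is Layout.VARYING:
        return "unstructured document processing"
    return "freeform document processing"


if __name__ == "__main__":
    print(suggest_model("Invoice", Layout.FIXED))
    print(suggest_model("statement of work", Layout.NONE))
```

Treat the result as a starting point: a contract with unusual clauses, for example, may still call for a custom model even though a prebuilt contract model exists.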
Conclusion
The choice between Custom and Prebuilt models depends on your specific requirements. Custom models offer the advantage of being tailored to your unique needs, while Prebuilt models provide ready-to-use solutions for common document types.