In the era of big data, the ability to efficiently search and retrieve information from vast volumes of data is crucial. Whether you’re developing a web application, mobile app, or a software-as-a-service (SaaS) app, implementing a robust and efficient search capability can significantly enhance the user experience. This is where Azure AI Search comes into play.
Azure AI Search, a powerful cloud-based search-as-a-service solution by Microsoft, provides developers with the tools to add sophisticated search capabilities to their applications. One of the key components of Azure AI Search is the search index. A search index in Azure AI Search is akin to a database table that stores and organizes your searchable data.
This article will guide you through creating an Azure AI Search index. We will cover everything from connecting to your data source and defining your index schema to creating and loading your index.
What is an Index in Azure AI Search?
An index in Azure AI Search is a data structure that contains searchable information from one or more data sources. It is essentially a structured representation of the data that needs to be searched. This structured format enables efficient querying and retrieval of relevant information.
Role of indexes in storing and organizing searchable information:
Indexes play a crucial role in facilitating search operations within Azure AI Search. They serve as the foundation for conducting searches by storing preprocessed and indexed data in a structured format.
Here are the key roles of indexes:
Storage: Indexes store searchable data in a structured format, allowing for efficient storage and retrieval of information.
Organization: Indexes organize the data into documents and fields so that information is easy to search and retrieve. Each field in the index represents an attribute or property of the data, such as title, description, or date.
Searchability: Azure AI Search enables fast and accurate search operations by indexing the data. The indexed data can be queried using keywords, filters, and other search parameters to retrieve relevant results quickly.
Scalability: Indexes are designed to scale with the size of the data and the complexity of search queries. They can handle large volumes of data efficiently and provide fast search results even as the dataset grows.
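To make the searchability role concrete, here is a minimal sketch of how a query body for the REST search operation might be assembled from keywords, a filter, and a sort order. The helper name build_search_request is hypothetical, and the Rating field is an assumption borrowed from the hotels sample used later in this article. The code only builds the JSON body and does not call the service.

```python
import json

def build_search_request(keywords, filter_expr=None, order_by=None, top=10):
    """Assemble a JSON body for a POST to /indexes/<index>/docs/search."""
    body = {"search": keywords, "top": top}
    if filter_expr:
        body["filter"] = filter_expr   # OData expression; fields must be filterable
    if order_by:
        body["orderby"] = order_by     # fields must be sortable
    return body

# Assumed example: full-text keywords plus a filter on a numeric field.
request_body = build_search_request(
    "beach resort",
    filter_expr="Rating ge 4",
    order_by="Rating desc",
    top=5,
)
print(json.dumps(request_body, indent=2))
```

Keeping the filter and sort order optional mirrors how a query usually starts as plain keywords and is narrowed afterwards.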
How to Create an Azure AI Search Index
You can create an index in the Azure portal in two ways:
Using the "+ Add index" option
Using the "Import data" option
Option 1: Create Azure AI Search Index Using the "+ Add index" option
STEP 1: Sign in to the Azure portal. In the search box at the top, type “Azure AI Search” and select it from the dropdown menu.
STEP 2: Create an Azure AI Search service if you haven’t already. Once the service is created, go to your resource.
STEP 3: On the Overview page, click the "+ Add index" option and select "Add index".
STEP 4: This will open an embedded editor where you can specify an index schema. Here, you’ll need to define the schema for your index.
This includes:
Specifying fields: Click “+ add field” to add a new field.
Setting data types for each field.
Configuring indexing options for each field.
STEP 5: Identify a document key. A document key is a unique identifier for each document in your index. It must be a single string field, populated from a source data field that contains unique values.
STEP 6: Once you’ve defined the schema and identified a document key, click “Create” to create your index.
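The schema you define in the portal editor corresponds to a JSON index definition. The following is a minimal sketch of such a definition built in Python, with a check for the document key rule described in step 5. The index name, the field names, and the helper build_index_definition are all hypothetical. Only the property names (name, fields, type, key, and the attribute flags) follow the REST index definition format.

```python
import json

def build_index_definition(name, fields):
    """Return an index definition after checking the document key rules."""
    keys = [f for f in fields if f.get("key")]
    if len(keys) != 1:
        raise ValueError("an index needs exactly one key field")
    if keys[0]["type"] != "Edm.String":
        raise ValueError("the key field must be of type Edm.String")
    return {"name": name, "fields": fields}

# Hypothetical schema for a small product catalog.
fields = [
    {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
    {"name": "title", "type": "Edm.String", "searchable": True},
    {"name": "price", "type": "Edm.Double", "filterable": True, "sortable": True},
    {"name": "updated", "type": "Edm.DateTimeOffset", "sortable": True},
]

index_definition = build_index_definition("products-index", fields)
print(json.dumps(index_definition, indent=2))
```

Validating the key before sending the definition surfaces the most common schema mistake locally instead of as a service error.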
Option 2: Create Azure AI Search Index Using the "Import data" option
STEP 1: Click on the "Import data" option.
STEP 2: Connect to Data Source
Expand the Data source dropdown and select "Samples". From the list of samples, choose the hotel sample.
You can also connect to your data source. Azure AI Search supports various data sources such as:
Azure SQL Database
SQL Server on Azure VMs
Azure Cosmos DB
Azure Blob Storage
Azure Data Lake Storage Gen2
Azure Table Storage
SharePoint Online (Preview)
Azure File Storage (Preview)
Azure Database for MySQL (Preview)
Click "Next: Add cognitive skills (optional)".
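Behind the wizard, a data source connection is stored as a small JSON definition. The sketch below shows how one might be assembled for a few of the sources listed above. The helper build_data_source, the names, and the placeholder connection string are assumptions for illustration. The type identifiers are the ones the REST API uses for these sources.

```python
import json

# REST type identifiers for some of the supported sources.
DATA_SOURCE_TYPES = {
    "Azure SQL Database": "azuresql",
    "Azure Cosmos DB": "cosmosdb",
    "Azure Blob Storage": "azureblob",
    "Azure Data Lake Storage Gen2": "adlsgen2",
    "Azure Table Storage": "azuretable",
}

def build_data_source(name, source, connection_string, container):
    """Build a data source definition; container is a table, collection, or blob container."""
    if source not in DATA_SOURCE_TYPES:
        raise ValueError(f"no type identifier known for: {source}")
    return {
        "name": name,
        "type": DATA_SOURCE_TYPES[source],
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }

# Hypothetical names; replace the placeholder with a real connection string.
data_source = build_data_source(
    "docs-blob-ds", "Azure Blob Storage", "<your-connection-string>", "documents"
)
print(json.dumps(data_source, indent=2))
```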
STEP 3: Configure Cognitive Skills
You can configure cognitive skills here to add AI enrichment to your data. This step is optional and can be skipped if not needed.
Click "Skip to: Customize Target index".
STEP 4: Customize target index
It automatically creates a schema based on the built-in hotels-sample data.
Accept the suggested values for the Index name (hotels-sample-index) and Key field (HotelId).
Accept the system-generated field attributes (unless you're rerunning the wizard with an existing data source).
An index requires at least an Index name and a collection of Fields. Each document needs a unique identifier defined by a Key field (always a string). The wizard automatically selects a suitable field for the key.
Each field has the following properties:
Name: A descriptive name for the field.
Data type: The type of data field (e.g., string, integer).
Attributes: These control how the field is used in search:
Retrievable: Whether the field is returned in search results.
Filterable: Whether the field can be used for filtering searches.
Sortable: Whether the field can be used for sorting search results.
Facetable: Whether the field can be used for faceted navigation.
Searchable: Whether the field is used in full-text search (strings are searchable by default).
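Analyzers/Suggesters: Optional settings. An analyzer controls how the text of a searchable field is tokenized for full-text search, and a suggester enables features like autocomplete and suggested queries.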
Click "Next: Create an indexer".
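The field attributes described in this step can be pictured as boolean flags on each field. The sketch below models a few fields from the hotels sample and lists which of them support filtering, sorting, and faceting. The Field class and the names_with helper are hypothetical, the attribute choices are assumptions rather than the wizard's actual suggestions, and the defaults in the class are defaults of this sketch, not of the service.

```python
from dataclasses import dataclass, asdict

@dataclass
class Field:
    name: str
    type: str
    key: bool = False
    retrievable: bool = True
    filterable: bool = False
    sortable: bool = False
    facetable: bool = False
    searchable: bool = False

# Assumed attribute choices for four hotels-sample fields.
fields = [
    Field("HotelId", "Edm.String", key=True, filterable=True),
    Field("HotelName", "Edm.String", searchable=True, sortable=True),
    Field("Category", "Edm.String", searchable=True, filterable=True, facetable=True),
    Field("Rating", "Edm.Double", filterable=True, sortable=True, facetable=True),
]

def names_with(attribute):
    """List the fields on which the given attribute is enabled."""
    return [f.name for f in fields if getattr(f, attribute)]

print("filterable:", names_with("filterable"))
print("sortable:", names_with("sortable"))
print("facetable:", names_with("facetable"))
print(asdict(fields[0]))  # dictionary form, ready to serialize as JSON
```

Listing fields by attribute is a quick way to confirm that every field you plan to filter, sort, or facet on has the matching flag enabled before the index is created.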
STEP 5: Create an Indexer
In this step, you’ll create an indexer that will connect to your data source, read the data, and pass it to the search engine for indexing.
Specify the Indexer Name: This is a unique identifier for the indexer within the indexer collection.
Set the Schedule: You can set the indexer to run once, hourly, daily, or on a custom schedule. This determines how often the indexer will run to update the index with any changes in the data source.
Configure Advanced Options:
Click on the advanced option to configure the following:
Base-64 Encode keys: If your document keys contain special characters, you can choose to Base-64 encode them.
Max Failed Items: This is the maximum number of items that can fail to be indexed before the entire indexer run is considered a failure.
Max Failed Items Per Batch: This is the maximum number of items that can fail in a single batch before the entire indexer run is considered a failure.
Batch Size: The number of items the indexer will attempt to index in a single batch.
Once you have configured all the settings, click “Submit” to create the indexer.
The indexer will start running according to the schedule, and you can monitor its progress in the Azure portal.
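The options in this step map onto an indexer definition in JSON. Here is a minimal sketch of how the schedule, the failure limits, the batch size, and Base-64 key encoding might be expressed. The helper build_indexer and the data source name are hypothetical. The property names follow the REST indexer definition, and the schedule interval is an ISO 8601 duration.

```python
import json

def build_indexer(name, data_source, index, key_field, interval="PT1H",
                  batch_size=None, max_failed=0, max_failed_per_batch=0,
                  base64_keys=False):
    """Build an indexer definition from the wizard-style options."""
    parameters = {
        "maxFailedItems": max_failed,
        "maxFailedItemsPerBatch": max_failed_per_batch,
    }
    if batch_size is not None:
        parameters["batchSize"] = batch_size
    indexer = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "schedule": {"interval": interval},  # ISO 8601 duration, e.g. PT1H or P1D
        "parameters": parameters,
    }
    if base64_keys:
        # Encode keys that contain characters not allowed in a document key.
        indexer["fieldMappings"] = [{
            "sourceFieldName": key_field,
            "targetFieldName": key_field,
            "mappingFunction": {"name": "base64Encode"},
        }]
    return indexer

# Hypothetical daily indexer for the hotels sample.
indexer = build_indexer(
    "hotels-sample-indexer", "hotels-sample", "hotels-sample-index", "HotelId",
    interval="P1D", batch_size=500, max_failed=10, max_failed_per_batch=5,
    base64_keys=True,
)
print(json.dumps(indexer, indent=2))
```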
Indexing Data in Azure AI Search
Azure AI Search uses indexers, specialized crawlers that streamline data ingestion. These crawlers extract textual data from various cloud sources and populate a search index. This process, often called the pull model, eliminates the need for custom code to add data to the index.
Enriching Data with AI and Skills:
Indexers act as catalysts for skillset execution and AI enrichment. Skills are configurable modules that perform additional processing on content before it's indexed. Examples include:
Optical Character Recognition (OCR) for extracting text from images
Text Split Skill for chunking large data into manageable pieces
Text Translation Skill for multilingual search capabilities
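The three skills above could be combined into a skillset definition along the following lines. This is a sketch only: the skillset name, the output target names, the page length, and the small skill helper are assumptions, while the @odata.type values and the input and output names are those of the built-in OCR, Text Split, and Text Translation skills.

```python
import json

def skill(odata_type, context, inputs, outputs, **params):
    """Build one skill entry; extra keyword arguments become skill parameters."""
    return {
        "@odata.type": odata_type,
        "context": context,
        "inputs": [{"name": n, "source": s} for n, s in inputs.items()],
        "outputs": [{"name": n, "targetName": t} for n, t in outputs.items()],
        **params,
    }

skillset = {
    "name": "demo-skillset",  # hypothetical name
    "skills": [
        # Extract text from images found in each document.
        skill("#Microsoft.Skills.Vision.OcrSkill",
              "/document/normalized_images/*",
              {"image": "/document/normalized_images/*"},
              {"text": "text"}),
        # Chunk long content into pages of at most 2000 characters.
        skill("#Microsoft.Skills.Text.SplitSkill",
              "/document",
              {"text": "/document/content"},
              {"textItems": "pages"},
              textSplitMode="pages", maximumPageLength=2000),
        # Translate content to English for multilingual search.
        skill("#Microsoft.Skills.Text.TranslationSkill",
              "/document",
              {"text": "/document/content"},
              {"translatedText": "translated_text"},
              defaultToLanguageCode="en"),
    ],
}
print(json.dumps(skillset, indent=2))
```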
Supported Data Sources and Configuration:
Indexers target specific data sources. This involves defining a data source (origin) and a target search index (destination). Specific data sources, like Azure Blob Storage, might require additional configuration options tailored to their content type.
Scheduling Data Refresh:
You can run indexers either on demand or on a recurring schedule. Schedules can be as frequent as every five minutes. For even more frequent updates, a push model is necessary. In this model, your application writes each change to both the external source and the Azure AI Search index at the same time.
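In the push model, changes are sent to the index as a batch of documents, each tagged with an action. The sketch below builds such a batch body. The helper build_push_batch and the sample documents are hypothetical, while the @search.action values are the four actions the REST indexing operation accepts.

```python
import json

def build_push_batch(changes):
    """Turn (action, document) pairs into a body for /indexes/<index>/docs/index."""
    allowed = {"upload", "merge", "mergeOrUpload", "delete"}
    batch = []
    for action, document in changes:
        if action not in allowed:
            raise ValueError(f"unsupported action: {action}")
        batch.append({"@search.action": action, **document})
    return {"value": batch}

# Assumed changes: update one hotel's rating and remove another hotel.
push_body = build_push_batch([
    ("mergeOrUpload", {"HotelId": "1", "Rating": 4.5}),
    ("delete", {"HotelId": "2"}),
])
print(json.dumps(push_body, indent=2))
```

Every document in the batch must carry the key field so the service knows which document to update or delete.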
Indexer Performance and Scalability:
A search service assigns a single indexer job per search unit. To achieve concurrent processing, ensure you have sufficient replicas allocated. Indexers are foreground processes, meaning heavy indexing activity might temporarily increase query throttling.
Data Ingestion Strategies:
Indexers offer flexibility for data ingestion. You can use an indexer as the sole means of data ingestion or combine it with other techniques. The search index can accept content from various sources, with each indexer contributing new data from its respective provider. Each source can contribute entire documents or populate specific fields within documents.
For parallelized indexing of massive datasets, consider a multi-indexer strategy. This approach assigns subsets of the data to individual indexers, enabling faster and more efficient processing.
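One way to picture a multi-indexer strategy is to split a blob container by virtual folder and give each folder its own data source and indexer, all writing to the same index. The sketch below generates such a plan. The function partition_plan, the naming scheme, the folder names, and the placeholder connection string are assumptions, and other partitioning schemes are equally possible.

```python
def partition_plan(index, container, folders, interval="PT2H"):
    """Create one data source and one indexer per folder, all feeding one index."""
    plan = []
    for n, folder in enumerate(folders, start=1):
        data_source = {
            "name": f"{index}-ds-{n}",
            "type": "azureblob",
            "credentials": {"connectionString": "<your-connection-string>"},
            # "query" limits the data source to one virtual folder.
            "container": {"name": container, "query": folder},
        }
        indexer = {
            "name": f"{index}-indexer-{n}",
            "dataSourceName": data_source["name"],
            "targetIndexName": index,
            "schedule": {"interval": interval},
        }
        plan.append((data_source, indexer))
    return plan

plan = partition_plan("docs-index", "documents", ["2022", "2023", "2024"])
for data_source, indexer in plan:
    print(indexer["name"], "reads folder", data_source["container"]["query"])
```

Whether the indexers actually run in parallel depends on the capacity of the search service, as described in the performance section above.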
Azure AI Search Schema Definition
In Azure AI Search, defining the schema for an index is a critical step in creating a searchable index. It involves specifying the structure of the data that will be stored in the index, including fields, data types, and indexing options.
Importance of defining the schema for an index:
Data Consistency: By defining the schema, you ensure that the data stored in the index follows a consistent structure. This consistency is essential for accurate search results and efficient query processing.
Search Relevance: The schema allows you to specify which fields are searchable, filterable, and sortable. By defining these properties, you can control the relevance of search results and provide users with more meaningful insights.
Index Optimization: A well-defined schema enables Azure AI Search to optimize indexing and query processing. By specifying data types and indexing options, you can improve performance and resource utilization, leading to faster search operations.
Data Enrichment: The schema can include fields for storing additional metadata or derived attributes. This enables data enrichment processes, such as adding computed fields or extracting insights from the indexed data.
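The data consistency point can be illustrated with a small local check that compares a document against a schema before it is sent for indexing. This is a sketch under assumptions: the check_document helper and the type mapping are hypothetical and cover only four data types, and the service performs its own validation regardless.

```python
# Assumed mapping from a few index data types to Python types.
PYTHON_TYPES = {
    "Edm.String": str,
    "Edm.Int32": int,
    "Edm.Double": (int, float),
    "Edm.Boolean": bool,
}

def check_document(document, fields):
    """Return a list of problems found when comparing a document to the schema."""
    problems = []
    known = {f["name"]: f["type"] for f in fields}
    for name, value in document.items():
        if name not in known:
            problems.append(f"{name}: not in schema")
            continue
        expected = PYTHON_TYPES[known[name]]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_mismatch = isinstance(value, bool) and expected is not bool
        if value is not None and (bool_mismatch or not isinstance(value, expected)):
            problems.append(f"{name}: expected {known[name]}")
    return problems

schema_fields = [
    {"name": "HotelId", "type": "Edm.String"},
    {"name": "Rating", "type": "Edm.Double"},
    {"name": "ParkingIncluded", "type": "Edm.Boolean"},
]

print(check_document({"HotelId": "1", "Rating": 4.5}, schema_fields))
print(check_document({"HotelId": 1, "Stars": 5}, schema_fields))
```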
Conclusion
In this article, we’ve walked you through creating an Azure AI Search index, a fundamental component of the Azure AI Search service. We’ve covered everything from connecting to your data source and defining your index schema to creating and loading your index.
If you found this article helpful, share it with others who might benefit. Also, if you have any additional or updated information, please leave a comment. Your feedback and contributions are greatly appreciated! 😊