In today's information age, users expect search experiences that are both intuitive and highly relevant. Traditional search methods often struggle to keep pace with the evolving nature of user queries and the vast amount of data available. Here's where hybrid search emerges as a game-changer, offering a powerful approach that elevates search experiences to a new level.
This article delves into hybrid search within Azure AI Search. We'll explore how it combines the strengths of full-text and vector search to deliver results that go beyond keyword matching alone. You'll gain insights into the inner workings of hybrid search, its key benefits, and how it integrates with existing Azure AI Search functionalities.
What is Hybrid Search?
Hybrid Search in Azure AI Search combines full text and vector queries that execute against a search index containing searchable plain text content and generated embeddings.
Hybrid search excels by combining the strengths of two distinct search techniques:
Full-Text Search: This method is a workhorse, adept at finding documents containing the exact keywords present in your query. It excels at literal matches.
Vector Search: This approach goes beyond keywords, identifying documents whose meaning is similar to that of your query even if they don't use the same words. Vector search relies on embeddings, which are numeric vector representations of text that capture its meaning.
By querying a search index that stores both the original text and its corresponding embeddings, hybrid search can match on exact terms and on meaning within the same request.
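To make "similar semantic meaning" more concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three-dimensional vectors and the document descriptions are made-up toy values for illustration, not output from a real embedding model, which typically produces hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values, for illustration only).
query = [0.9, 0.1, 0.0]        # "historic hotel"
doc_inn = [0.8, 0.2, 0.1]      # "heritage inn near downtown"
doc_garage = [0.0, 0.1, 0.9]   # "airport parking garage"

# The related document scores higher even though it shares no keywords.
print(cosine_similarity(query, doc_inn) > cosine_similarity(query, doc_garage))
```

In this toy setup the inn scores higher than the garage against the query, which is the kind of match a keyword-only search would miss.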
How Hybrid Search Executes
Single Query Request: You submit a single query request that contains both a search string for full-text search and a vector (an embedding of the query) for vector search.
Parallel Execution: Azure AI Search executes the full-text and vector queries in parallel, so neither query waits on the other.
Merging the Results: After both searches are complete, Reciprocal Rank Fusion (RRF) combines the results. RRF works from the position (rank) of each document in each result list rather than from the raw scores, which aren't directly comparable across the two search types. A document's fused score is the sum of its reciprocal ranks across the lists, so a document that ranks well in both the full-text results and the vector results rises to the top. This fused score determines the document's overall position in the response.
Unified Response: You receive a single response containing the final ranked list of documents. This response utilizes RRF to select the most relevant matches from the full-text and vector search results, offering a clear and consistent user experience.
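The merging step above can be sketched in a few lines. This is a minimal illustration of the general RRF formula, not Azure AI Search's internal implementation: each document earns 1 / (k + rank) from every list it appears in. The constant k = 60 is a conventional choice and an assumption here, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one scored list.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


text_ranking = ["hotel-3", "hotel-1", "hotel-7"]    # hypothetical full-text order
vector_ranking = ["hotel-1", "hotel-9", "hotel-3"]  # hypothetical vector order

for doc_id, score in reciprocal_rank_fusion([text_ranking, vector_ranking]):
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists (hotel-1 and hotel-3 here) accumulate two contributions and so outrank documents found by only one of the searches.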
Beyond the Basics: Flexibility and Refinement
Flexible Vector Fields: Azure AI Search provides further flexibility. It allows vector fields containing embeddings to coexist with traditional text and numeric fields within the search index. This empowers you to formulate sophisticated hybrid queries that tap into the strengths of both full-text and vector search approaches within a single request.
Leveraging Existing Functionalities: Hybrid search seamlessly integrates with existing Azure AI Search functionalities like filtering, faceting, sorting, scoring profiles, and semantic search ranking. This allows you to refine your search results further and deliver a more tailored user experience. By using these features in conjunction with hybrid search, you can personalize and enhance search experiences.
Structure of Hybrid Query
A hybrid query in Azure AI Search is predicated on having a search index that contains fields of various data types, including plain text and numbers, geo coordinates for geospatial search, and vectors for a mathematical representation of a chunk of text. You can use almost all query capabilities in Azure AI Search with a vector query, except for client-side interactions such as autocomplete and suggestions.
Here’s an example of a representative hybrid query:
POST https://{{searchServiceName}}.search.windows.net/indexes/hotels-vector-quickstart/docs/search?api-version=2023-11-01
Content-Type: application/json
{
    "count": true,
    "search": "historic hotel walk to restaurants and shopping",
    "select": "HotelId, HotelName, Category, Description, Address/City, Address/StateProvince",
    "filter": "geo.distance(Location, geography'POINT(-77.03241 38.90166)') le 300",
    "facets": [ "Address/StateProvince" ],
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [ <array of embeddings> ],
            "k": 7,
            "fields": "DescriptionVector"
        },
        {
            "kind": "vector",
            "vector": [ <array of embeddings> ],
            "k": 7,
            "fields": "Description_frVector"
        }
    ],
    "queryType": "semantic",
    "semanticConfiguration": "my-semantic-config"
}
Key points include:
search specifies a full-text search query.
vectorQueries specifies the vector queries. There can be several, each targeting a vector field. This is the parameter name in api-version 2023-11-01; earlier preview API versions used different parameter names.
Replace {{searchServiceName}} with your actual search service name, and <array of embeddings> with your actual vectors.
In addition to the basic structure of a hybrid query in Azure AI Search, there are several other important aspects to consider:
Prerequisites: Before using a hybrid query, you should have an Azure AI Search service and a search index that contains both vector and non-vector fields. You should also use a supported version of the Search Post REST API or the Azure SDKs.
Vectorizer: The stable version of vector search doesn’t provide built-in vectorization of the query input string. You should pass the query string to an external embedding model for vectorization. However, the preview version of vector search adds integrated vectorization.
Results: All results are returned in plain text, including vectors in fields marked as retrievable. Because numeric vectors aren’t useful in search results, you can choose other fields in the index as a proxy for the vector match. For example, if an index has “descriptionVector” and “descriptionText” fields, the query can match on “descriptionVector” but the search result can show "descriptionText".
Select Parameter: Use the select parameter to specify only human-readable fields in the results.
Multiple Vector Queries: The vector query parameter can target multiple vector fields. If the embedding space includes multilingual content, a vector query can find matches across languages without language analyzers or translation.
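The points above (external vectorization, a select list of human-readable fields, and multiple vector queries) can be sketched as a small helper that assembles a request body. This is a sketch under stated assumptions: it follows the request shape of the stable 2023-11-01 REST API, which names the parameter vectorQueries; build_hybrid_body is a hypothetical helper; and fake_embed is a stand-in for whatever external embedding model you call.

```python
import json


def build_hybrid_body(search_text, vectors_by_field, k=7, select=None):
    """Build a hybrid request body: one text query plus one vector query per field."""
    body = {
        "count": True,
        "search": search_text,
        "vectorQueries": [
            {"kind": "vector", "vector": list(vector), "fields": field, "k": k}
            for field, vector in vectors_by_field.items()
        ],
    }
    if select:
        # Return only human-readable fields rather than raw vectors.
        body["select"] = ", ".join(select)
    return body


def fake_embed(text):
    """Hypothetical stand-in for an external embedding model."""
    return [0.1, 0.2, 0.3]


query = "historic hotel walk to restaurants and shopping"
embedding = fake_embed(query)
body = build_hybrid_body(
    query,
    {"DescriptionVector": embedding, "Description_frVector": embedding},
    select=["HotelId", "HotelName", "Description"],
)
print(json.dumps(body, indent=2))
```

In a real request, the query vector must come from the same embedding model, and have the same number of dimensions, as the vectors stored in the targeted fields.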
Setting up Hybrid Search in Azure AI Search
Prerequisites for Setting Up Hybrid Search:
An Azure AI Search service in any region and on any tier. Most existing services support vector search.
A search index containing both vector and non-vector fields.
You need to use Search Post REST API version 2023-11-01 or REST API 2023-10-01-preview, Search Explorer in the Azure portal, or packages in the Azure SDKs that have been updated to use this feature.
(Optional) If you want to use semantic ranking and vector search together, your search service must be Basic tier or higher, with semantic ranking enabled.
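The second prerequisite, an index with both vector and non-vector fields, might look like the following. This is a minimal sketch of an index definition expressed as a Python dictionary, assuming the 2023-11-01 REST schema. The profile and algorithm names, the choice of fields, and the 1536-dimension size are illustrative assumptions; the dimension count must match the embedding model you use.

```python
import json

VECTOR_DIMENSIONS = 1536  # assumption: must equal your embedding model's output size

index_definition = {
    "name": "hotels-vector-quickstart",
    "fields": [
        # Non-vector fields: a key plus searchable plain text.
        {"name": "HotelId", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "HotelName", "type": "Edm.String", "searchable": True},
        {"name": "Description", "type": "Edm.String", "searchable": True},
        # Vector field: holds the embedding of Description.
        {
            "name": "DescriptionVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": VECTOR_DIMENSIONS,
            "vectorSearchProfile": "my-vector-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": "my-hnsw", "kind": "hnsw"}],
        "profiles": [{"name": "my-vector-profile", "algorithm": "my-hnsw"}],
    },
}


def split_fields(index):
    """Return (vector field names, searchable text field names) for an index definition."""
    vector_names = [f["name"] for f in index["fields"] if "dimensions" in f]
    text_names = [
        f["name"]
        for f in index["fields"]
        if f["type"] == "Edm.String" and f.get("searchable")
    ]
    return vector_names, text_names


vector_names, text_names = split_fields(index_definition)
print("Vector fields:", vector_names)
print("Text fields:", text_names)
print(json.dumps(index_definition["vectorSearch"], indent=2))
```

An index is ready for hybrid queries when both lists are non-empty: the text fields serve the full-text query and the vector fields serve the vector query.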
Here's a Python code snippet demonstrating how to run a hybrid search in Azure AI Search. This example assumes you have an Azure AI Search service, an index with text and vector fields, and version 11.4.0 or later of the azure-search-documents package.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Set these to your Azure AI Search endpoint, index name, and API key
search_service_endpoint = "https://ai-search100.search.windows.net"
index_name = "hotels-sample-index"
api_key = "API_KEY"

# Create a SearchClient
client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=AzureKeyCredential(api_key),
)

# Define the full-text query
search_query = "historic hotel walk to restaurants and shopping"

# Placeholder only: use the embedding of search_query, produced by the same
# model (and with the same dimensions) used to populate DescriptionVector
query_vector = [0.1, 0.2, 0.3, 0.4, 0.5]
vector_query = VectorizedQuery(
    vector=query_vector,
    k_nearest_neighbors=7,
    fields="DescriptionVector",
)

# Send the hybrid request: passing both search_text and vector_queries
# makes this a hybrid query
results = client.search(
    search_text=search_query,
    vector_queries=[vector_query],
    select=["HotelId", "HotelName", "Category", "Description",
            "Address/City", "Address/StateProvince"],
    filter="geo.distance(Location, geography'POINT(-77.03241 38.90166)') le 300",
    facets=["Address/StateProvince"],
    include_total_count=True,
    query_type="semantic",
    semantic_configuration_name="my-semantic-config",
)

# Print the results
for result in results:
    print(result)
This code sends a hybrid search request to the Azure AI Search service.
Replace "API_KEY" with your API key and query_vector with your actual vector. Also make sure that index_name and search_service_endpoint match your Azure AI Search service and index.
Tips for Setting Up Hybrid Search:
The stable version (2023-11-01) of vector search doesn’t provide built-in vectorization of the query input string. Encoding (text-to-vector) of the query string requires that you pass the query string to an external embedding model for vectorization.
The preview version (2023-10-01-Preview) of vector search adds integrated vectorization. If you want to explore this feature, create and assign a vectorizer to get the built-in embedding of query strings.
All results are returned in plain text, including vectors in fields marked as retrievable. Because numeric vectors aren’t useful in search results, choose other fields in the index as a proxy for the vector match. For example, if an index has “descriptionVector” and “descriptionText” fields, the query can match on “descriptionVector” but the search result can show "descriptionText".
Use the select parameter to specify only human-readable fields in the results.
Benefits of Hybrid Queries:
Hybrid queries are useful because they add support for query capabilities beyond vector search, including orderby and semantic ranking. For example, in addition to the vector query, you could search over people's names, product names, or titles, which are scenarios where similarity search isn't a good fit.
Benefits of Hybrid Search
Comprehensive Search: Hybrid search combines full text and vector queries, allowing you to search for information in your data using traditional keyword-based search methods and more advanced vector-based search methods.
Parallel Execution: The search engine runs full text and vector queries in parallel, which can improve the speed and efficiency of the search.
Merged Results: All matches from both full text and vector queries are evaluated for relevance ranking using Reciprocal Rank Fusion (RRF) and a single result set is returned in the response. This provides a more comprehensive set of results.
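Support for Almost All Query Capabilities: Hybrid queries support almost all query capabilities, including orderby and semantic ranking. The exceptions are client-side interactions such as autocomplete and suggestions. This lets a single request combine filtering, faceting, sorting, and ranking with both search techniques.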
Improved Relevance: Hybrid search can deliver markedly improved relevance out-of-the-box, especially for Generative AI scenarios where applications use the retrieval-augmented generation (RAG) pattern.
Limitations of Hybrid Search
Prerequisites: Before using a hybrid query, you should have an Azure AI Search service and a search index that contains both vector and non-vector fields. This might require additional setup and configuration.
No Built-in Vectorization: The stable version of vector search doesn’t provide built-in vectorization of the query input string. Encoding (text-to-vector) of the query string requires that you pass the query string to an external embedding model for vectorization.
Limited Client-Side Interactions: You can use almost all query capabilities in Azure AI Search with a vector query, except for client-side interactions such as autocomplete and suggestions.
Numeric Vectors Aren’t Useful in Search Results: All results are returned in plain text, including vectors in fields marked as retrievable. Because numeric vectors aren’t useful in search results, you should choose other fields in the index as a proxy for the vector match.
Semantic Ranking Limitations: Semantic ranking starts with a BM25-ranked result from a text query or an RRF-ranked result from a hybrid query. If the retrieval step (L1) misses an ideal document, the ranking step (L2) can’t fix that.
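The last limitation can be illustrated with a toy two-stage pipeline. This is a conceptual sketch only: the document IDs, the scores, and the retrieve and rerank helpers are hypothetical, and the code doesn't reflect how Azure AI Search computes its rankings. It shows only that a reranker reorders the candidates it is given and can't add documents the retrieval step left out.

```python
def retrieve(ranked_ids, top_n):
    """Retrieval step (L1): keep only the top_n candidates."""
    return ranked_ids[:top_n]


def rerank(candidates, reranker_scores):
    """Ranking step (L2): reorder the candidates; it cannot add new ones."""
    return sorted(candidates, key=lambda d: reranker_scores.get(d, 0.0), reverse=True)


# Hypothetical L1 order: the ideal document was ranked fourth by retrieval.
l1_order = ["doc-a", "doc-b", "doc-c", "doc-ideal"]
# Hypothetical L2 scores: the reranker would put the ideal document first.
l2_scores = {"doc-ideal": 3.9, "doc-b": 2.5, "doc-a": 1.2, "doc-c": 0.4}

narrow = rerank(retrieve(l1_order, top_n=3), l2_scores)
wide = rerank(retrieve(l1_order, top_n=4), l2_scores)

print(narrow)  # the ideal document is absent: L1 never passed it along
print(wide)    # the ideal document is first once L1 includes it
```

This is why improving the retrieval step, for example by combining full-text and vector queries, matters before any reranking is applied.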
Conclusion
Hybrid search in Azure AI Search empowers you to deliver exceptional search experiences. Users find highly relevant results, boosting satisfaction and engagement. As AI advances, even more sophisticated search techniques are on the horizon.