
Google BigQuery Introduces Vector Search


Google recently announced that BigQuery now supports vector search. The new functionality enables the vector similarity search that data and AI use cases rely on, such as semantic search, similarity detection, and retrieval-augmented generation (RAG) with a large language model (LLM).

Currently in preview, the serverless data warehouse's approximate nearest-neighbor search is exposed through a VECTOR_SEARCH function and relies on an index to optimize the lookups and distance computations required to identify closely matching embeddings. BigQuery vector indexes are updated automatically, and the first implemented index type, IVF, combines a clustering model with an inverted row locator in a two-piece index.
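
To make the two-piece IVF idea concrete, here is a toy sketch in plain Python: a small k-means step stands in for the clustering model, and a list of row ids per cluster stands in for the inverted row locator. The function names and the `nprobe` parameter are illustrative assumptions, not BigQuery's internals or API.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


def build_ivf(vectors, num_lists, iterations=10, seed=0):
    """Run a few k-means rounds, then build one inverted list per cluster.

    Returns (centroids, inverted_lists); inverted_lists[c] holds the row ids
    assigned to centroid c.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, num_lists)]
    for _ in range(iterations):
        buckets = [[] for _ in range(num_lists)]
        for vec in vectors:
            buckets[nearest_centroid(vec, centroids)].append(vec)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inverted_lists = [[] for _ in range(num_lists)]
    for row_id, vec in enumerate(vectors):
        inverted_lists[nearest_centroid(vec, centroids)].append(row_id)
    return centroids, inverted_lists


def ivf_search(query, vectors, centroids, inverted_lists, top_k=3, nprobe=1):
    """Scan only the nprobe closest clusters and return the top_k row ids."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [row_id for c in order[:nprobe] for row_id in inverted_lists[c]]
    candidates.sort(key=lambda row_id: sq_dist(query, vectors[row_id]))
    return candidates[:top_k]
```

Probing a single cluster is fast but can miss a true neighbor that sits just across a cluster boundary, which is why the search is approximate. Probing every cluster degenerates into an exact scan.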

Omid Fatemieh, engineering lead at Google, and Michael Kilberry, head of product at Google, explain:

Vector search is often performed on high-dimensional numeric vectors, a.k.a. embeddings, which incorporate a semantic representation for an entity and can be generated from numerous sources, including text, image, or video. BigQuery vector search relies on an index to optimize the lookups and distance computations required to identify closely matching embeddings.
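
As a rough illustration of the distance computations mentioned in the quote, the following sketch ranks a handful of made-up embeddings by cosine distance using an exact, brute-force scan. The three-dimensional vectors and document names are hypothetical; real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, embeddings, k=2):
    """Exact search: score every entry and keep the k closest names."""
    scored = sorted(embeddings.items(),
                    key=lambda item: cosine_distance(query, item[1]))
    return [name for name, _ in scored[:k]]


# Hypothetical embeddings, chosen so the first two point in a similar direction.
docs = {
    "battery patent": [0.9, 0.1, 0.0],
    "solar patent": [0.7, 0.3, 0.1],
    "recipe blog": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], docs))
```

An index exists precisely to avoid this full scan once the table grows large.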

According to the cloud provider, a syntax similar to BigQuery’s text search functionality helps combine vector search operations with other SQL primitives. The LangChain implementation simplifies Python-based integrations with other open-source and third-party frameworks. Max Ostapenko, senior product manager at Opera, comments:

Just got positively surprised trying out vector search with embeddings in BigQuery! Diving into the world of enhancing product insights with Vertex AI now. It really expands your approaches to working with textual data.

A popular community request, vector search comes with a tutorial on how to perform semantic search and retrieval-augmented generation. Using the Google Patents public dataset as an example, Google shows three different use cases for the new feature: patent search using pre-generated embeddings, patent search with BigQuery embedding generation, and RAG via integration with generative models. Fatemieh and Kilberry write:

BigQuery’s advanced capabilities allow you to easily extend the search cases covered above into full RAG journeys. More specifically, you can use the output from the VECTOR_SEARCH queries as context for invoking Google’s natural language foundation (LLM) models via BigQuery’s ML.GENERATE_TEXT function.
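
The flow described in the quote could look roughly like the query assembled below, where VECTOR_SEARCH results are aggregated into a prompt column and passed to ML.GENERATE_TEXT. This is a sketch under assumptions: the Python helper `build_rag_query`, the table, model, and column names, the prompt wording, and the `max_output_tokens` value are all hypothetical, and the SQL shape may differ from what the preview feature accepts at any given time.

```python
def build_rag_query(model, base_table, query_table,
                    embedding_col="embedding", text_col="abstract", top_k=5):
    """Return a SQL string that feeds vector search hits to a text model.

    All identifiers are interpolated as-is, so pass trusted values only.
    """
    if not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return f"""
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `{model}`,
  (
    -- Assumed prompt shape: an instruction followed by the retrieved text.
    SELECT CONCAT('Summarize the following patents: ',
                  STRING_AGG(base.{text_col}, ' ')) AS prompt
    FROM VECTOR_SEARCH(
      TABLE `{base_table}`, '{embedding_col}',
      (SELECT {embedding_col} FROM `{query_table}`),
      top_k => {top_k})
  ),
  STRUCT(256 AS max_output_tokens))
""".strip()


# Hypothetical names, for illustration only.
print(build_rag_query("my_dataset.my_llm", "my_dataset.patents",
                      "my_dataset.query_embeddings"))
```

Keeping retrieval and generation in one statement means the retrieved rows never leave the warehouse before reaching the model.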

Vector search is not the only recent BigQuery announcement. The cloud provider has revealed that Gemini 1.0 Pro is now accessible to BigQuery customers via Vertex AI. Additionally, there is a new BigQuery integration with Vertex AI for text and speech.

Billing for the CREATE VECTOR INDEX statement and the VECTOR_SEARCH function is based on BigQuery compute pricing. For the CREATE VECTOR INDEX statement, only the indexed column is taken into account for the calculation of processed bytes.
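
For a back-of-the-envelope sense of what "only the indexed column" means for processed bytes, the sketch below estimates the size of an embedding column. It assumes each embedding is stored as an array of FLOAT64 values at 8 bytes per element and ignores any other overhead; it is an illustration, not a billing calculator.

```python
BYTES_PER_FLOAT64 = 8  # assumed logical size of one embedding element


def estimate_indexed_bytes(num_rows, dimensions):
    """Approximate bytes in an embedding column of num_rows x dimensions."""
    return num_rows * dimensions * BYTES_PER_FLOAT64


# Hypothetical table: one million rows of 768-dimensional embeddings.
size = estimate_indexed_bytes(1_000_000, 768)
print(f"{size / 1e9:.2f} GB")
```

Other columns in the same table, however wide, would not add to this figure under the rule described above.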
