📝 Guest Post: Choosing the Right Vector Index For Your Project*
In this post, Frank Liu, ML Architect at Zilliz, discusses vector databases and different indexing strategies for approximate nearest neighbor search. The options covered include brute-force search, inverted file index, scalar quantization, product quantization, HNSW, and Annoy. Liu emphasizes the importance of considering application requirements when choosing an index.

Vector databases are purpose-built databases for conducting approximate nearest neighbor search across large datasets of high-dimensional vectors (typically over 96 dimensions and sometimes over 10k). These vectors represent the semantics of unstructured data, i.e. data that cannot fit into traditional databases such as relational databases, wide-column stores, or document databases. Conducting efficient similarity search requires a data structure known as a vector index. These indexes enable efficient traversal of the entire database rather than requiring a brute-force comparison against every vector.

There are a number of in-memory vector search algorithms and indexing strategies available to you on your vector search journey. Here's a quick summary of each:

Brute-force search (`FLAT`)

Brute-force search, also known as "flat" indexing, compares the query vector with every other vector in the database. While it may seem naive and inefficient, flat indexing can yield surprisingly good results for small datasets, especially when parallelized with accelerators like GPUs or FPGAs.

Inverted file index (`IVF`)

IVF is a partition-based indexing strategy that assigns each database vector to the partition with the closest centroid. Cluster centroids are determined using unsupervised clustering (typically k-means). With the centroids and assignments in place, we can create an inverted index correlating each centroid with the list of vectors in its cluster.
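To make the two strategies above concrete, here is a minimal sketch in pure NumPy. It is a toy illustration rather than a production implementation: the function names, the partition count, and the use of plain Lloyd's k-means are all illustrative choices, not the internals of any particular vector database.

```python
import numpy as np

def flat_search(database, query, k=5):
    """Brute-force ("flat") search: compare the query against every vector."""
    dists = np.linalg.norm(database - query, axis=1)  # L2 distance to each vector
    return np.argsort(dists)[:k]                      # indices of the k nearest vectors

def build_ivf(database, n_partitions=16, n_iters=10, seed=0):
    """Toy IVF: k-means centroids plus an inverted list per centroid."""
    rng = np.random.default_rng(seed)
    centroids = database[rng.choice(len(database), n_partitions, replace=False)]
    for _ in range(n_iters):  # plain Lloyd's k-means
        assign = np.argmin(
            np.linalg.norm(database[:, None] - centroids[None], axis=2), axis=1)
        for c in range(n_partitions):
            members = database[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Inverted index: centroid id -> indices of the vectors in its cluster.
    inverted = {c: np.flatnonzero(assign == c) for c in range(n_partitions)}
    return centroids, inverted

def ivf_search(database, centroids, inverted, query, k=5, nprobe=4):
    """Search only the nprobe partitions whose centroids are closest to the query."""
    nearest = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    cand = np.concatenate([inverted[c] for c in nearest])
    dists = np.linalg.norm(database[cand] - query, axis=1)
    return cand[np.argsort(dists)[:k]]
```

Note the speed/recall trade-off in `nprobe`: probing more partitions recovers more of the true neighbors at the cost of more distance computations, and probing every partition degenerates to brute-force search.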
IVF is generally a solid choice for small- to medium-size datasets.

Scalar quantization (`SQ`)

Scalar quantization converts floating-point vectors (typically `float32`) into lower-precision integer representations.
The quantized dataset typically uses 8-bit unsigned integers, but lower bit widths (5-bit, 4-bit, and even 2-bit) are also common.

Product quantization (`PQ`)

Scalar quantization disregards the distribution along each vector dimension, potentially leading to underutilized bins. Product quantization (PQ) is a more powerful alternative that performs both compression and dimensionality reduction: high-dimensional vectors are mapped to low-dimensional quantized vectors by assigning fixed-length chunks of the original vector to single quantized values. PQ typically involves splitting vectors into subvectors, running k-means clustering within each subspace, and replacing each subvector with the index of its nearest centroid.

Hierarchical Navigable Small Worlds (`HNSW`)

HNSW is the most commonly used vector indexing strategy today. It combines two concepts: skip lists and Navigable Small Worlds (NSWs). Skip lists are effectively layered linked lists that allow faster random access; HNSW applies the same layering idea to NSW graphs, searching from a sparse top layer down to the dense bottom layer that contains all vectors.

Approximate Nearest Neighbors Oh Yeah (`Annoy`)

Annoy is a tree-based index that uses binary search trees as its core data structure. It recursively partitions the vector space to create a binary tree, where each node is split by a hyperplane equidistant from two randomly selected child vectors. The splitting continues until leaf nodes hold fewer than a predefined number of elements. Querying involves iteratively traversing the tree, determining at each node which side of the hyperplane the query vector falls on.

Don't worry if some of these summaries feel a bit obtuse. Vector search algorithms can be fairly complex, but they are often easier to explain with visualizations and a bit of code.

Picking a vector index

So how exactly do we choose the right vector index? This is a fairly open-ended question, but one key principle to remember is that the right index depends on your application requirements. For example: are you primarily interested in query speed (with a static database), or will your application require a lot of inserts and deletes?
Do you have any constraints on your machine type, such as limited memory or CPU? Will the domain of the data you insert change over time? All of these factors determine the most suitable index type. Here are some guidelines to help you choose the right index type for your project:

100% recall: This one is fairly simple - use `FLAT`. Brute-force search is the only option that guarantees exact results.

(The original post includes a flowchart keyed to dataset size, with thresholds at roughly 10MB, 2GB, and 20GB separating the recommended index types.)

One last note on Annoy: we don't recommend it simply because it falls into a similar category as HNSW while, generally speaking, being less performant. Annoy is the most uniquely named index, though, so it gets bonus points there.

A word on disk indexes

Another option we haven't explicitly dived into in this post is disk-based indexes. In a nutshell, disk-based indexes leverage the architecture of NVMe disks by colocating individual search subspaces in their own NVMe pages. In conjunction with zero seek latency, this enables efficient storage of both graph- and tree-based vector indexes. These index types are becoming increasingly popular because they enable the storage and search of billions of vectors on a single machine while maintaining a reasonable level of performance. The downside should be obvious as well: because disk reads are significantly slower than RAM reads, disk-based indexes often see increased query latencies, sometimes by over 10x! If you are willing to sacrifice latency and throughput for the ability to store billions of vectors at minimal cost, disk-based indexes are the way to go. Conversely, if your application requires high performance (often at the expense of increased compute costs), you'll want to stick with an in-memory index.

Wrapping up

In this post, we covered some of the vector indexing strategies available to you. Given your data size and compute limitations, we provided a simple flowchart to help determine the optimal strategy. Please note that this flowchart is a general guideline, not a hard-and-fast rule. Ultimately, you'll need to understand the strengths and weaknesses of each indexing option, as well as whether a composite index can help you squeeze out the last bit of performance your application needs. All of these index types are freely available in Milvus, so you can experiment as you see fit. Go out there and experiment!
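As a starting point for that experimentation, the two quantization techniques summarized earlier can also be sketched in a few lines of NumPy. This is a hedged toy sketch, not Milvus's implementation: the bit width, the number of subvectors `m`, and the codebook size `k` are illustrative parameter choices.

```python
import numpy as np

def scalar_quantize(vectors, bits=8):
    """Scalar quantization: map each float dimension onto uniform integer bins."""
    lo, hi = vectors.min(axis=0), vectors.max(axis=0)   # per-dimension range
    n_bins = (1 << bits) - 1                            # 255 for 8-bit codes
    codes = np.round((vectors - lo) / (hi - lo) * n_bins).astype(np.uint8)
    return codes, lo, hi

def scalar_dequantize(codes, lo, hi, bits=8):
    """Approximate reconstruction of the original floats from the codes."""
    n_bins = (1 << bits) - 1
    return codes.astype(np.float32) / n_bins * (hi - lo) + lo

def product_quantize(vectors, m=4, k=16, n_iters=10, seed=0):
    """Toy PQ: split each vector into m chunks, run k-means per chunk,
    and store only the nearest-centroid id for each chunk."""
    rng = np.random.default_rng(seed)
    d = vectors.shape[1] // m
    codebooks, codes = [], []
    for i in range(m):
        chunk = vectors[:, i * d:(i + 1) * d]
        cents = chunk[rng.choice(len(chunk), k, replace=False)]
        for _ in range(n_iters):  # plain Lloyd's k-means per subspace
            assign = np.argmin(
                np.linalg.norm(chunk[:, None] - cents[None], axis=2), axis=1)
            for c in range(k):
                if np.any(assign == c):
                    cents[c] = chunk[assign == c].mean(axis=0)
        codebooks.append(cents)
        codes.append(assign)
    return codebooks, np.stack(codes, axis=1)  # codes: (n_vectors, m) small int ids
```

The compression win is easy to see: with `m=4` and `k=16`, each vector collapses from `32 * 4` bytes of `float32` down to four small centroid ids, at the cost of approximation error in the distances.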
To learn more, register for the upcoming Zilliz webinar, Vector Search Best Practices, on July 13, where we will cover more details on vector index selection. See you then!

*This post was written by Frank Liu, ML Architect at Zilliz, exclusively for TheSequence. We thank Zilliz for their ongoing support of TheSequence.