📝 Guest Post: How to Enhance the Usefulness of Large Language Models*
In this guest post, Filip Haltmayer, a Software Engineer at Zilliz, explains how LangChain and Milvus can enhance the usefulness of Large Language Models (LLMs) by allowing for the storage and retrieval of relevant documents. By integrating Milvus, a vector database, with LangChain, LLM applications can work over far more text than fits in a single prompt and improve their conversational abilities.

## 1. What is LangChain

Large Language Models (LLMs) are the new tech wave. The giants, Google and Microsoft, are stepping into the Colosseum to fight the AI battle and take the search crown. This colossal fight is just the beginning of the LLM breakthrough, and LangChain is here to help push it even further. On their own, LLMs offer good conversational abilities but come with limited memory and a tendency to hallucinate in their responses. An example of this hallucination: if you ask ChatGPT, "What are the difficulty levels in Red Dead Redemption?", it will respond that there are no difficulty settings. This answer is incorrect, as the game has three set difficulty levels. Because the LLM generates answers purely from its weights, it cannot verify information or provide sources.

This is where LangChain comes in. LangChain is a framework that lets you chain together different computations and external knowledge with LLMs to push their usefulness even further. With LangChain, you can create domain-specific chatbots, action agents for specific computations, and more. (TheSequence explained LangChain in more detail here.)
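To make the idea of chaining concrete, here is a minimal, illustrative sketch of a single prompt-plus-LLM chain, using LangChain's API as of the time of writing. The model choice and prompt are assumptions for demonstration, not part of the Milvus integration described later:

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# A reusable prompt template; {game} is filled in when the chain runs.
prompt = PromptTemplate(
    input_variables=["game"],
    template="What are the difficulty levels in {game}? Answer briefly.",
)

# Wire the prompt to an LLM (assumes OPENAI_API_KEY is set in the environment).
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)

print(chain.run(game="Red Dead Redemption"))
```

A chain like this is just one link; the value comes from composing such links with external knowledge sources, which is where the vector database below comes in.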
## 2. What role Milvus plays

So where does Milvus come into play? LLMs can only process a limited number of tokens (a token is roughly four characters of text) at a time, meaning they cannot analyze a large document, let alone a collection of documents, in one pass. To get around this, we can store all our documents in a database, search for only the documents relevant to the input question, and feed those documents into the LLM for answer generation. Milvus is perfect for this, as we can take advantage of semantic search to retrieve the most relevant documents quickly. We begin by taking all the documents we want to analyze and converting their text into embeddings. A significant benefit is that we can use the same LLM to generate the embeddings, ultimately keeping the LLM's "thought process" in our search. With all the data embedded, we can store each embedding and its original text, alongside any metadata, within Milvus. Then, when a query comes in, we embed the query text using the same model, search for the relevant texts, and feed them into our LLM to generate an answer. Although this pipeline sounds simple, building it can take a lot of work. Luckily, LangChain makes such a pipeline easy and offers the VectorStore wrapper for vector databases.

## 3. Integration

Here is the integration that I worked on between Milvus and LangChain.
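Before getting into the implementation details, here is a minimal sketch of what the finished pipeline looks like from the user's side. It is illustrative only: it assumes a Milvus instance running locally on the default port, an OPENAI_API_KEY in the environment, and a made-up two-document corpus; the imports reflect LangChain's layout at the time of writing:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Milvus
from langchain.chains.question_answering import load_qa_chain

# Hypothetical document chunks to index; in practice these would come
# from your own corpus.
texts = [
    "Milvus is a vector database built for semantic similarity search.",
    "LangChain is a framework for composing computations around LLMs.",
]

# Embed the texts and store each embedding plus its original text in Milvus.
vector_store = Milvus.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    connection_args={"host": "localhost", "port": "19530"},
)

query = "What is Milvus used for?"

# Retrieve only the documents relevant to the question. The MMR variant
# discussed below is available as max_marginal_relevance_search(query, k=4).
docs = vector_store.similarity_search(query, k=4)

# Feed the retrieved documents to the LLM to generate a grounded answer.
chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")
print(chain.run(input_documents=docs, question=query))
```

Note how everything Milvus-specific hides behind the VectorStore interface; pointing the same code at a different vector database would only change the `from_texts` call.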
As previously mentioned, integrating Milvus into LangChain involves extending the VectorStore class, which required implementing a few key functions. The last stage of this pipeline is performing searches for relevant data. When a user sends in a question, the question text and any metadata filters are sent to the Milvus VectorStore. Using the same embedding function as before, Milvus embeds the question and performs a similarity search across its data. As a VectorStore, Milvus offers two types of searches: a default one that returns results in their original similarity order, and one that applies max marginal relevance (MMR) ordering. MMR works by finding the examples whose embeddings have the greatest cosine similarity to the input and then iteratively adding them while penalizing each candidate for its closeness to already-selected examples.

Integrating Milvus into LangChain did come with some hiccups, the most prominent being that Milvus cannot handle JSON. The lack of JSON support made things difficult due to the way configurations are generated for Milvus. Currently, there are only two routes for using the Milvus VectorStore: creating one on top of an already existing Milvus collection, or creating one whose schema is derived from the first document passed in. If the collection already exists, its schema is set in stone. All data that gets added must follow the schema to a T; if any field is missing or malformed, the system will ignore the entire entry. Similarly, if any extra metadata is added to a Document after the collection is created, that extra metadata will be ignored. This doesn't lend itself to a very adaptable system, and it results in quite a bit of extra work cleaning up inputs and creating new collections. Luckily, in version 2.3, Milvus will add the ability to store JSON, simplifying this integration and any future ones. Milvus 2.3 is currently in beta, so feel free to check it out and provide us with any feedback!

## 4. Conclusion

The result is a working memory and knowledge base for LLMs. Since the integration was merged, LangChain has changed a bit, introducing the idea of Retrievers. Retrievers are a way to connect to external storage and are the route LangChain will go down. Due to the fast-moving nature of this project, the code in the Milvus VectorStore could be cleaner, and the next steps will be to clean it up and add this Retriever functionality. As previously mentioned, Milvus cannot yet handle a dynamic schema, making it very difficult to deal with premade collections and inserts that are missing data. Once Milvus supports JSON metadata, it will be time to return and redo this integration. Overall, working on this project has been a great experience. The community is friendly, and Harrison is very active and helpful. I am looking forward to seeing the heights that the LangChain project reaches.

*This post was written by Filip Haltmayer, a Software Engineer at Zilliz. We thank Zilliz for their ongoing support of TheSequence.