Simplifying Legal Research with RAG, Milvus, and Ollama
In this blog post, we'll see how to apply RAG to legal data.
Legal research can be time-consuming. You usually need to review a large number of documents to find the answers you need. Retrieval-Augmented Generation (RAG) can help streamline your research process.
What is RAG?
Retrieval Augmented Generation (RAG) is a technique that enhances LLMs by integrating additional data sources. A typical RAG application involves:
Indexing - a pipeline for ingesting data from a source and indexing it, which usually consists of loading, splitting, and storing the data in Milvus.
Retrieval and generation - At runtime, RAG processes the user's query, fetches relevant data from the index stored in Milvus, and the LLM generates a response based on this enriched context.
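Conceptually, the two phases look like the toy sketch below. Everything in it is a stand-in: the character-frequency embedding and in-memory store are placeholders for the voyage-law-2 embeddings and Milvus storage we build in the rest of this guide, and the generation step is simply a prompt to the LLM with the retrieved context, which we'll do with Ollama later.
from math import sqrt

index_store: list[tuple[list[float], str]] = []  # (vector, text) pairs

def embed(text: str) -> list[float]:
    # Stand-in embedding: a normalized character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index(chunks: list[str]) -> None:
    # Indexing phase: embed each chunk and store (vector, text).
    for chunk in chunks:
        index_store.append((embed(chunk), chunk))

def retrieve(question: str, top_k: int = 1) -> list[str]:
    # Retrieval phase: rank stored chunks by inner product with the query.
    q = embed(question)
    ranked = sorted(index_store, key=lambda p: -sum(a * b for a, b in zip(p[0], q)))
    return [text for _, text in ranked[:top_k]]

index(["The appeal concerns the BBC pension scheme.", "Milvus is a vector database."])
print(retrieve("What does the appeal concern?"))  # expected: the pension-scheme chunk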
In this hands-on guide we’ll explore how to set up a Retrieval Augmented Generation (RAG) system using Ollama, focusing on legal data and leveraging Milvus as our vector database.
The tools:
Ollama: Brings the power of LLMs to your laptop, simplifying local operation.
Milvus: The vector database we use to store and retrieve your data efficiently.
Llama 3: The latest generation of Meta’s family of large language models.
Voyage AI: Provides embedding models specialized for specific domains. We’re using voyage-law-2, optimized for legal and long-context retrieval.
Preparation
Dependencies and Data
First, install the necessary dependencies:
! pip install --upgrade pymilvus ollama tqdm pypdf voyageai openai wget
Import API Keys
If you have your API key set in a .env file, you can load it using dotenv to avoid leaking the key when you push your notebook.
import os
from dotenv import load_dotenv
load_dotenv()
VOYAGE_API_KEY = os.getenv('VOYAGE_API_KEY')
If you don’t have a .env file, just skip the dotenv import.
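Alternatively, you can skip environment files entirely and pass the key straight to the Voyage client when you create it later on (voyageai.Client accepts an api_key argument and otherwise falls back to the VOYAGE_API_KEY environment variable):
import voyageai
# Pass the key explicitly instead of relying on the environment.
# Replace the placeholder with your real key, and keep it out of version control.
voyage_client = voyageai.Client(api_key="your-voyage-api-key")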
Prepare the data
For this tutorial, we’ll use data from the Royal Courts of Justice in London. Download it using wget and save it locally:
! wget https://www.judiciary.uk/wp-content/uploads/2024/07/Final-Judgment-CA-2023-001978-BBC-v-BBC-Pension-Trust-another.pdf
Next, read the PDF file and extract its content for use in our RAG application:
from pypdf import PdfReader
reader = PdfReader("Final-Judgment-CA-2023-001978-BBC-v-BBC-Pension-Trust-another.pdf")
pages = [page.extract_text() for page in reader.pages]
print(pages[0])
This should give you the following output:
Neutral Citation Number [2024] EWCA Civ 767
Case No: CA-2023 -001978
IN THE COURT OF APPEAL ( CIVIL DIVISION)
ON APPEAL FROM THE HIGH COURT OF JUSTICE
BUSINESS AND PROPERTY COURTS OF ENGLAND AND WALES
BUSINESS LIST : PENSIONS (ChD)
The Hon Mr Justice Adam Johnson
[2023] EWHC 1965 (Ch)
Royal Courts of Justice
Strand, London, WC2A 2LL
Date: 09/07 /2024
Before :
LORD JUSTICE LEWISON
LADY JUSTICE FALK
and
SIR CHRISTOPHER FLOYD
Embed your Documents
We’ll use voyage-law-2, an embedding model specialized for the legal domain.
First, define a function to generate text embeddings using the Voyage AI API:
import voyageai
voyage_client = voyageai.Client()
def embed_text(text: str) -> list[float]:
    return voyage_client.embed([text], model="voyage-law-2").embeddings[0]
Generate a test embedding and print its dimension and first few elements:
result = voyage_client.embed(["hello world"], model="voyage-law-2")
embedding_dim = len(result.embeddings[0])
print(embedding_dim)
print(result.embeddings[0][:10])
1024
[0.000756315013859421, -0.02162403240799904, 0.0052010356448590755, -0.02917512319982052, -0.00796651840209961, -0.03238343447446823, 0.0660339742898941, 0.03845587745308876, -0.01913367211818695, 0.05562642216682434]
Load data into Milvus
Create the collection in Milvus
from pymilvus import MilvusClient
milvus_client = MilvusClient(uri="./milvus_legal.db")
collection_name = "my_rag_collection"
Setting the URI as a local file, e.g., ./milvus_legal.db, is convenient because it automatically uses Milvus Lite to store all data in this file. For large-scale data, you can set up a more performant Milvus server on Docker or Kubernetes.
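If you do move to a standalone Milvus server, the only change on the client side is the URI. A minimal sketch, assuming a server running on the default port of the standard Docker deployment:
from pymilvus import MilvusClient
# Connect to a standalone Milvus server instead of a local Milvus Lite file.
milvus_client = MilvusClient(uri="http://localhost:19530")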
Drop the collection if it exists already, then create a new one:
if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)
If we don't specify any field information, Milvus will automatically create a default id field for the primary key and a vector field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.
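You can verify this by inspecting the collection with describe_collection, which is part of the MilvusClient API:
# Print the auto-generated schema: an "id" primary key and a "vector"
# field with dimension 1024 (plus dynamic fields for anything else).
print(milvus_client.describe_collection(collection_name))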
Insert Data
Iterate through the pages of our document, create the embeddings, and then insert the data into Milvus.
We introduce a new field called text, which is not defined in the collection schema. It will be automatically added to the reserved JSON dynamic field, which can be treated as a normal field at a high level.
from tqdm import tqdm

data = []
for i, page in enumerate(tqdm(pages, desc="Creating embeddings")):
    data.append({"id": i, "vector": embed_text(page), "text": page})

milvus_client.insert(collection_name=collection_name, data=data)
Creating embeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:08<00:00, 2.45it/s]
{'insert_count': 20,
'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
'cost': 0}
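As a quick sanity check, you can ask Milvus for the collection's row count via get_collection_stats (note the count can lag briefly until the inserted data is flushed):
# Expect a row count matching the number of embedded pages (20 here).
print(milvus_client.get_collection_stats(collection_name))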
Build the basic RAG system
Next, let’s define a query about the content of the court hearing:
question = "who are the lawyers?"
We’ll use Milvus to search our indexed data. Later, we’ll integrate this with an LLM.
search_res = milvus_client.search(
    collection_name=collection_name,
    data=[embed_text(question)],  # Use the `embed_text` function to convert the question to an embedding vector
    limit=3,  # Return top 3 results
    search_params={"metric_type": "IP", "params": {}},  # Inner product distance
    output_fields=["text"],  # Return the text field
)
Print the retrieved lines with their distances:
import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))
The text associated with our embeddings is quite long, but here is an example of what Milvus returns:
[" n nNeutral Citation Number [2024] EWCA Civ 767 n nCase No: CA-2023 -001978 nIN THE COURT OF APPEAL ( CIVIL DIVISION) nON APPEAL FROM THE HIGH COURT OF JUSTICE nBUSINESS AND PROPERTY COURTS OF ENGLAND AND WALES nBUSINESS LIST : PENSIONS (ChD) nThe Hon Mr Justice Adam Johnson n[2023] EWHC 1965 (Ch) nRoyal Courts of Justice nStrand, London, WC2A 2LL n nDate: 09/07 /2024 nBefore : n nLORD JUSTICE LEWISON nLADY JUSTICE FALK nand nSIR CHRISTOPHER FLOYD n n- - - - - - - - - - - - - - - - - - - - - n nBetween : n n BRITISH BROADCASTING CORPORATION Appellant n - and - n (1) BBC P ENSION TRUST LIMITED n(2) CHRISTINA BURNS nRespondent s n n- - - - - - - - - - - - - - - - - - - - - n n Michael Tennet KC and Edward Sawyer (instructed by Linklaters LLP ) nfor the Appellant n Brian Green KC and Joseph Steadman (instructed by Slaughter and May Solicitors ) nfor the First Respondent nAndrew Spink KC and Saul Margo (instructed by Stephenson Harwood LLP ) nfor the Second Respondent n nHearing dates: 25, 26 & 27/06/2024 n- - - - - - - - - - - - - - - - - - - - - n nApproved Judgment n nThis judgment was handed down remotely at 11.00am on 09/07/2024 by circulation to the nparties or their representatives by e -mail and by release to the Nat ional Archives. n n............................. n",
0.1101425364613533
],
Use an LLM to get a RAG Response
Combine the retrieved documents into a context and define system and user prompts for the language model:
context = "n".join(
[line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)
SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""
Ollama exposes a fully OpenAI-compatible API, meaning we can use the OpenAI Python SDK to call it:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required, but unused
)
response = client.chat.completions.create(
    model="llama3",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)
According to the text, the lawyers involved in this case are:
* Andrew Spink KC (instructed by Stephenson Harwood LLP) for the First Respondent
* Saul Margo (also instructed by Stephenson Harwood LLP) for the First Respondent
* Mr. Tennet (representing the Second Respondent)
* Mr. Spink KC (also representing the Second Respondent)
* Arden LJ (mentioned as having given a previous judgment in Stena Line)
Note that "KC" stands for King's Counsel, which is a title of distinction conferred upon certain senior barristers in England and Wales.
Conclusion
Setting up RAG with Milvus and Ollama for legal data can make legal research a lot easier and more efficient.
Feel free to check out Milvus, the code on GitHub, and share your experiences with the community by joining our Discord.