Enabling Fine-Grained Access Control with Milvus Row-Level RBAC

Nov 16, 2024 · 10 min read

Why Access Control Matters in Modern Data Systems

Access control is one of the most pressing challenges in modern data management, especially for enterprises. As organizations grow, particularly in environments with multiple departments and roles, both robust security and seamless access are critical. Let’s take a look at two examples where access control is especially important.

Healthcare: Balancing Privacy and Collaboration

In the healthcare sector, organizations must protect patient privacy while fostering collaboration among medical professionals. Imagine a doctor who needs full access to a patient’s medical records to make an accurate diagnosis and treatment plan. However, that doctor shouldn’t be able to access records of patients they’re not treating. To meet this need, fine-grained access control is essential, ensuring that only authorized medical staff can view sensitive patient data. This level of control helps organizations comply with strict regulations like HIPAA while safeguarding privacy.

Finance: Protecting Sensitive Data

The financial industry faces similar challenges when it comes to access control. Banks and financial institutions handle massive amounts of sensitive data—account details, transaction histories, credit scores, etc. This data is often converted into vectors and stored in a vector database like Milvus for fraud detection, risk analysis, and personalized customer experiences. Without proper access controls, sensitive data could be exposed to unauthorized parties, leading to significant financial and legal consequences.

Fine-grained controls, such as row-level permissions, allow institutions to restrict access to specific users—for example, a client manager can access only their own clients' account data. Even broader access, required by risk management teams, can be carefully monitored and restricted to prevent misuse.

Milvus Row-level RBAC: a Fine-Grained Access Control Solution

Role-Based Access Control (RBAC) is a security model where access to resources is granted based on a user’s role within an organization. Roles define permissions, and users inherit those permissions, ensuring secure and efficient management of access rights.

Milvus is an open-source, high-performance vector database built for scale. It is well suited to building modern AI applications such as retrieval-augmented generation (RAG), semantic search engines, recommendation systems, and chatbots. Milvus offers a fine-grained RBAC solution based on a permission model that uses bitmap indexing to enable row-level access control. This feature allows you to control access to specific Milvus resources and permissions based on user roles and privileges. Currently, Milvus RBAC is only available in Python and Java.

Milvus RBAC offers several advantages:

Speed and Efficiency: It allows for fast querying of permissions in large datasets.
Flexibility: It adapts as roles and responsibilities evolve, ensuring permissions remain up-to-date.

RBAC Fundamentals

Roles and Permissions

Role: Represents a user’s role in the system, with each role assigned a specific set of permissions.
Permission: Specifies access rights to individual rows in a collection (table) within Milvus, such as the ability to read, write, or delete specific data.

Bitmap Index Building

Bitmap indexes are the foundation of this access control mechanism:

Each role is associated with a bitmap that indicates the rows it can access.
The bitmap's length matches the number of rows in the collection:
- A 1 in a position means the role has access to that row.
- A 0 means no access.

Bitmap Index Usage

Granting Permissions: To give a role access to a row, set the corresponding bit in the role’s bitmap to 1.
Checking Permissions: To verify if a role can access a specific row, simply check if the corresponding bit in the bitmap is 1.
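
The grant and check operations above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Milvus's internal implementation: a list of 0/1 flags stands in for a real bitmap, rows are numbered from 1, and the helper names new_bitmap, grant, revoke, and can_access are hypothetical.

```python
from typing import List

# Illustrative sketch only: a list of 0/1 flags stands in for a role's bitmap.
# new_bitmap, grant, revoke, and can_access are hypothetical helper names.

def new_bitmap(num_rows: int) -> List[int]:
    """Create a bitmap with one bit per row, all set to 0 (no access)."""
    return [0] * num_rows

def grant(bitmap: List[int], row_id: int) -> None:
    """Grant access to a row (row IDs start at 1)."""
    bitmap[row_id - 1] = 1

def revoke(bitmap: List[int], row_id: int) -> None:
    """Revoke access to a row."""
    bitmap[row_id - 1] = 0

def can_access(bitmap: List[int], row_id: int) -> bool:
    """Check whether the bit for this row is set."""
    return bitmap[row_id - 1] == 1

# A role that may read rows 1-4 of a five-row collection
role1 = new_bitmap(5)
for row_id in (1, 2, 3, 4):
    grant(role1, row_id)

print("".join(str(bit) for bit in role1))  # 11110
print(can_access(role1, 3))                # True
print(can_access(role1, 5))                # False
```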

Let’s say we have a collection (table), Collection A, that stores enterprise knowledge. Each row represents content identified by doc_id and its associated knowledge base, kb_id.

Row ID	PK	Data	doc_id	kb_id	Role
1	0	Data A	1	1	role1
2	1	Data B	1	1	role1
3	2	Data C	2	1	role1
4	3	Data D	2	1	role1
5	4	Data E	3	2	role2

Roles are defined as follows:

Role 1: Can access rows 1, 2, 3, and 4 (kb_id = 1).
Role 2: Can access row 5 (kb_id = 2).

Query Operations

When a user queries the data, their role’s bitmap is combined with the query conditions to filter rows they are authorized to access. For example, if a user with Role 1 queries "all data," their bitmap 11110 will return rows 1–4 (Data A, Data B, Data C, and Data D).
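
To make the combination step concrete, here is a small sketch that applies a bitwise AND between a role's bitmap and the bitmap produced by a query condition, using the five rows of Collection A. It is a toy model in plain Python rather than Milvus code; ROWS, ROLE_BITMAPS, condition_bitmap, and query are hypothetical names chosen for illustration.

```python
# Toy model of "role bitmap AND query-condition bitmap" (not Milvus internals).
# ROWS mirrors the Collection A table; all names here are hypothetical.
ROWS = [
    {"row_id": 1, "data": "Data A", "doc_id": 1, "kb_id": 1},
    {"row_id": 2, "data": "Data B", "doc_id": 1, "kb_id": 1},
    {"row_id": 3, "data": "Data C", "doc_id": 2, "kb_id": 1},
    {"row_id": 4, "data": "Data D", "doc_id": 2, "kb_id": 1},
    {"row_id": 5, "data": "Data E", "doc_id": 3, "kb_id": 2},
]

ROLE_BITMAPS = {
    "role1": [1, 1, 1, 1, 0],  # rows 1-4 (kb_id = 1)
    "role2": [0, 0, 0, 0, 1],  # row 5 (kb_id = 2)
}

def condition_bitmap(rows, predicate):
    """Build a bitmap with 1 for every row that satisfies the query condition."""
    return [1 if predicate(row) else 0 for row in rows]

def query(role, predicate=lambda row: True):
    """Return the data of rows that match the condition AND are visible to the role."""
    allowed = ROLE_BITMAPS[role]
    matched = condition_bitmap(ROWS, predicate)
    combined = [a & m for a, m in zip(allowed, matched)]
    return [row["data"] for row, bit in zip(ROWS, combined) if bit]

print(query("role1"))                                  # ['Data A', 'Data B', 'Data C', 'Data D']
print(query("role1", lambda row: row["doc_id"] == 2))  # ['Data C', 'Data D']
print(query("role2", lambda row: row["doc_id"] == 2))  # []
```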

Updating Permissions

To add or remove access for a role, simply update the corresponding bit in the role’s bitmap. For example, setting a bit to 1 grants access, while setting it to 0 revokes it.

Advantages and Considerations

Efficiency: Bitmap operations are fast and well-suited for managing permissions in large datasets.
Low Storage Overhead: Bitmaps consume minimal storage, even for datasets with millions of rows.
Flexibility: Bitmap indexes support complex query conditions and combinations, making them highly adaptable to various use cases.

Demonstrating Milvus Row-level RBAC

In this section, we’ll demonstrate how Milvus manages access to various knowledge bases of a large enterprise.

Use Case: An Enterprise RAG with Multiple Knowledge Bases

In a large enterprise, different departments often maintain separate knowledge bases—some public, others confidential—for their RAG application powered by Milvus. To manage permissions effectively across these knowledge bases, we implement role-based access controls (RBAC) based on entities or documents. For example, a super admin role (admin) might oversee access, with additional roles for specific business units like CEO, finance, sales, and developer, each having access to different sets of data.

Figure: Access control for different roles in a large enterprise

Defining Permission Columns

To manage access, we store permission data in an array column within Milvus. This column will define which roles have access to which rows of data. The field_name for this permission column is customizable, and the array’s size can be adjusted based on the user’s needs. A BITMAP index is then created for this column, making permission checks efficient.

Below is how this setup is done in code:

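from pymilvus import MilvusClient, DataType

# Replace with your own Milvus endpoint (this local URI is only a placeholder)
CLUSTER_ENDPOINT = "http://localhost:19530"

# 1. Set up a Milvus client
client = MilvusClient(
    uri=CLUSTER_ENDPOINT
)

# 2. Create a schema
schema = MilvusClient.create_schema(
    auto_id=False,
    enable_dynamic_field=False,
)

# 3. Define the schema fields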
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="data", datatype=DataType.VARCHAR, max_length=100)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=128)

# 4. add security column
schema.add_field(field_name="security_group", datatype=DataType.ARRAY, 
                 element_type=DataType.VARCHAR, max_capacity=10, max_length=100)

index_params = MilvusClient.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="IVF_FLAT",
    metric_type="L2",
    params={"nlist": 1024}
)

# 5. create bitmap index for security column
index_params.add_index(field_name="security_group", 
                       index_type="BITMAP")

# 6. create collection
client.create_collection(
    collection_name="test_collection",
    schema=schema,
    index_params=index_params
)

Write Permissions

When inserting new data, you assign permissions by specifying which roles can access each row. This can be done by writing the corresponding role(s) into the security_group column.

Here’s an example of how it works:

import random

data = []
data.append({
    "id": random.randint(0, 100000),
    "vector": [random.uniform(-1, 1) for _ in range(128)],
    "data": "data" + str(random.randint(0, 100000)),
    # ceo role can read
    "security_group": ["ceo"]
})

data.append({
    "id": random.randint(0, 100000),
    "vector": [random.uniform(-1, 1) for _ in range(128)],
    "data": "data" + str(random.randint(0, 100000)),
    # finance role can read
    "security_group": ["finance"]
})

data.append({
    "id": random.randint(0, 100000),
    "vector": [random.uniform(-1, 1) for _ in range(128)],
    "data": "data" + str(random.randint(0, 100000)),
    # both sales and developer can read
    "security_group": ["sales", "develop"]
})

res = client.insert(collection_name="test_collection", data=data)
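
Because each row's permissions live entirely in its security_group array, it can help to validate that array before inserting. The following is a minimal sketch, assuming the schema limits defined earlier (max_capacity=10, max_length=100). The make_row helper is hypothetical and not part of pymilvus, and rejecting an empty role list and measuring length in UTF-8 bytes are design assumptions of this sketch.

```python
import random

# Limits taken from the security_group field definition in the schema above
MAX_ROLES = 10
MAX_ROLE_LENGTH = 100

def make_row(row_id, text, roles, dim=128):
    """Hypothetical helper: build one row dict and validate its role list."""
    if not roles:
        # Assumption: a row with no roles would be invisible to every role filter
        raise ValueError("a row needs at least one role in security_group")
    unique_roles = list(dict.fromkeys(roles))  # drop duplicates, keep order
    if len(unique_roles) > MAX_ROLES:
        raise ValueError(f"at most {MAX_ROLES} roles per row")
    for role in unique_roles:
        if not role or len(role.encode("utf-8")) > MAX_ROLE_LENGTH:
            raise ValueError(f"invalid role name: {role!r}")
    return {
        "id": row_id,
        "vector": [random.uniform(-1, 1) for _ in range(dim)],  # placeholder vector
        "data": text,
        "security_group": unique_roles,
    }

row = make_row(1, "quarterly report", ["sales", "develop", "sales"])
print(row["security_group"])  # ['sales', 'develop']
```

A list of rows built this way could then be passed to client.insert() in the same way as the data list above.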

Query Permission

When performing search or query operations, it’s essential to restrict the results to only show the data that a user’s specific role has access to. Data outside of the user’s permitted roles will be hidden from the query results. Here’s how this can be done:

Querying Data Based on Role Permissions

In the examples below, we use the array_contains() function to filter data based on role-specific permissions. Each query retrieves only the data that the given role is authorized to see.

res = client.query(
    collection_name="test_collection",
    # Query data visible to the CEO role
    filter='array_contains(security_group, "ceo")',
    output_fields=["id", "data", "security_group"],
)
print("ceo role read:")
print(res)

res = client.query(
    collection_name="test_collection",
    # Query data visible to the Sales role
    filter='array_contains(security_group, "sales")',
    output_fields=["id", "data", "security_group"],
)
print("sales role read:")
print(res)

res = client.query(
    collection_name="test_collection",
    # Query data visible to the Developer role
    filter='array_contains(security_group, "develop")',
    output_fields=["id", "data", "security_group"],
)
print("developer role read:")
print(res)

res = client.query(
    collection_name="test_collection",
    # Query data visible to either the Developer or CEO roles
    filter='array_contains_any(security_group, ["develop", "ceo"])',
    output_fields=["id", "data", "security_group"],
)
print("developer or ceo role read:")
print(res)

Here’s an example of what the output would look like:

ceo role read:
data: [
"{'security_group': ['ceo'], 'id': 3443, 'data': 'data35077'}", 
"{'security_group': ['ceo'], 'id': 12181, 'data': 'data99090'}", 
"{'security_group': ['ceo'], 'id': 16551, 'data': 'data74619'}", 
"{'security_group': ['ceo'], 'id': 24466, 'data': 'data1373'}", ...
sales role read:
data: [
"{'data': 'data75305', 'security_group': ['sales'], 'id': 9122}", 
"{'data': 'data61054', 'security_group': ['sales'], 'id': 20087}", 
"{'data': 'data47948', 'security_group': ['sales', 'develop'], 'id': 21726}", 
"{'data': 'data8596', 'security_group': ['sales'], 'id': 40090}", ... 
developer role read:
data: [
"{'data': 'data1515', 'security_group': ['develop'], 'id': 6429}", 
"{'data': 'data47031', 'security_group': ['develop'], 'id': 10953}", 
"{'data': 'data47948', 'security_group': ['sales', 'develop'], 'id': 21726}", 
"{'data': 'data86894', 'security_group': ['develop'], 'id': 56980}"], ... 
developer or ceo role read:
data: [
"{'data': 'data35077', 'security_group': ['ceo'], 'id': 3443}",
 "{'data': 'data1515', 'security_group': ['develop'], 'id': 6429}", 
 "{'data': 'data47031', 'security_group': ['develop'], 'id': 10953}", 
 "{'data': 'data99090', 'security_group': ['ceo'], 'id': 12181}", ...

This approach ensures that each role sees exactly the data they're authorized to view while hiding any unauthorized data. You can also stack multiple roles in the security_group array, allowing for flexible and efficient permission management.
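
Since a user may hold several roles, the filter string is often built from the user's role list rather than written by hand. Below is a minimal sketch of such a builder. The role_filter function is a hypothetical helper, not a pymilvus API, and it assumes role names are plain strings that Milvus accepts as JSON-style quoted literals.

```python
import json
from typing import Sequence

def role_filter(roles: Sequence[str], field: str = "security_group") -> str:
    """Hypothetical helper: build a filter matching rows visible to any given role."""
    if not roles:
        raise ValueError("at least one role is required")
    if len(roles) == 1:
        # json.dumps adds the double quotes around the role name
        return f"array_contains({field}, {json.dumps(roles[0], ensure_ascii=False)})"
    role_list = json.dumps(list(roles), ensure_ascii=False)
    return f"array_contains_any({field}, {role_list})"

print(role_filter(["ceo"]))
# array_contains(security_group, "ceo")
print(role_filter(["develop", "ceo"]))
# array_contains_any(security_group, ["develop", "ceo"])
```

The resulting string can be passed as the filter argument of client.query(), as in the examples above.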

Custom Filters with Role-Based Access

In some cases, users may need to apply custom filters when querying data. These filters can be combined with role-based access to refine searches. By applying the role-based access control along with custom filter conditions, we can ensure that users only retrieve data they are permitted to see based on both the role and the specific query criteria.

For example:

res = client.query(
    collection_name="test_collection",
    # Sales role queries data with the filter "id in [1, 3, 5]"
    filter='id in [1, 3, 5] && array_contains(security_group, "sales")',
    output_fields=["id", "data", "security_group"],
)

res = client.query(
    collection_name="test_collection",
    # Developer role queries data with the filter "id > 10"
    filter='id > 10 && array_contains(security_group, "develop")',
    output_fields=["id", "data", "security_group"],
)

In these examples, the queries not only check the role-based permissions but also apply additional filters on the primary key field id (like id in [1, 3, 5] or id > 10) to narrow down the results based on specific criteria. This flexibility allows users to craft highly targeted queries while maintaining tight control over data access.
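
When the custom filter comes from the user, one detail deserves care: the role condition should apply to the whole custom expression. A sketch of one way to do this follows. The with_role_check helper is hypothetical, and it wraps the custom filter in parentheses so that an || inside it cannot sidestep the role condition.

```python
import json

def with_role_check(custom_filter: str, role: str, field: str = "security_group") -> str:
    """Hypothetical helper: AND a user-supplied filter with the role condition."""
    role_clause = f"array_contains({field}, {json.dumps(role, ensure_ascii=False)})"
    if not custom_filter or not custom_filter.strip():
        return role_clause
    # Parentheses keep the role check applied to the entire custom expression
    return f"({custom_filter.strip()}) && {role_clause}"

print(with_role_check("id in [1, 3, 5]", "sales"))
# (id in [1, 3, 5]) && array_contains(security_group, "sales")
print(with_role_check("id < 10 || id > 100", "develop"))
# (id < 10 || id > 100) && array_contains(security_group, "develop")
```

Without the parentheses, the second filter would read id < 10 || id > 100 && array_contains(...). Because && typically binds tighter than ||, rows with id < 10 would then be returned regardless of role.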

Update Permissions

There are times when you need to change permissions—whether it's granting access to a specific role for a particular data row, or removing that access. Milvus makes this process easy with its upsert API, which allows you to update the permissions associated with a row of data.

Let’s walk through how you can use this API to modify permissions for a specific row.

Example: Updating Permissions for a Data Row

To update the permissions for a row, you simply adjust the security_group field. In this example, we'll add the "sales" role to a row that was previously only accessible by the "finance" role.

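# Upsert replaces the whole row, so fetch the row's current values first
existing = client.get(
    collection_name="test_collection",
    ids=[101],
    output_fields=["id", "vector", "data"],
)[0]

upsert_row_update = {
    "id": 101,
    "vector": existing["vector"],
    "data": existing["data"],
    # update role: add "sales" alongside "finance"
    "security_group": ["finance", "sales"],
}
res = client.upsert(
    collection_name="test_collection",
    data=upsert_row_update,
)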

Result:

Before the upsert, the row with pk = 101 looked like this:

pk = 101:
data: ["{'id': 101,
'data': 'data63309',
'vector': [0.38069534, 0.15088418, -0.6266929, -0.6038463, 0.2516377...],
'security_group': ['finance']}"]

After the upsert:
data: ["{'id': 101,
'data': 'data63309',
'vector': [0.38069534, 0.15088418, -0.6266929, -0.6038463, 0.2516377...],
'security_group': ['finance', 'sales']}"]
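
The new security_group value is usually derived from the current one rather than typed out in full. Here is a small sketch of that step, where updated_roles is a hypothetical helper and not a Milvus API. It applies removals first and then additions, keeping the existing order and avoiding duplicates.

```python
from typing import Iterable, List

def updated_roles(current: Iterable[str],
                  add: Iterable[str] = (),
                  remove: Iterable[str] = ()) -> List[str]:
    """Hypothetical helper: compute a row's new security_group list."""
    to_remove = set(remove)
    # Removals are applied first, preserving the order of the remaining roles
    result = [role for role in current if role not in to_remove]
    # Additions are appended only if the role is not already present
    for role in add:
        if role not in result:
            result.append(role)
    return result

print(updated_roles(["finance"], add=["sales"]))              # ['finance', 'sales']
print(updated_roles(["finance", "sales"], remove=["sales"]))  # ['finance']
```

The returned list can be used as the "security_group" value in the row passed to client.upsert().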

By using the security_group array column and bitmap index filtering, we've established a solid foundation for row-level read access control, allowing us to effectively manage permissions during queries. This method delivers strong performance and fine-grained control over access rights. However, it does require a more hands-on approach from administrators, who must carefully manage permissions when inserting or updating data and ensure a well-thought-out permission strategy when creating collections.

Conclusion

Implementing fine-grained access control is a critical component of modern data management, especially for industries dealing with sensitive information like healthcare and finance. Milvus offers row-level RBAC (Role-Based Access Control), a robust solution for managing data access with precision and efficiency.

This approach not only enhances security but also offers flexibility for evolving business needs, ensuring that access policies can adapt as roles and responsibilities change. With its powerful tools and flexible permissions model, Milvus empowers organizations to create highly secure, scalable data systems that meet regulatory requirements while offering seamless access to the right people.

For more dynamic permission management and inheritance, Zilliz Cloud, the fully managed Milvus service, offers additional fine-grained permission features. These features improve administrative efficiency and offer greater flexibility, making it easier to meet a wider range of business requirements. For more information about Zilliz Cloud RBAC, check out its documentation.
