
Graph-based Recommendation System with Milvus

Learn how to build a graph-based recommendation system using PinSage, Deep Graph Library (DGL), MovieLens datasets, and Milvus.

December 1, 2020 by Zilliz


Background

A recommendation system [1] (RS) can identify user preferences based on their historical data and suggest products or items to them accordingly. Companies will enjoy considerable economic benefits from a well-designed recommendation system.

A complete recommendation system has three elements: a user model, an object model, and, at its core, a recommendation algorithm. Established algorithms include collaborative filtering, latent factor (implicit semantic) modelling, graph-based modelling, combined (hybrid) recommendation, and more. This article gives brief instructions on how to use Milvus to build a graph-based recommendation system.

Graph Convolutional Neural Networks (GCN)

PinSage

On Pinterest, users save content that interests them (pins) into related collections (boards). The site has accumulated 2 billion pins, 1 billion boards, and 18 billion edges, where an edge exists only when a pin belongs to a specific board. The following illustration shows a pins-boards bipartite graph.


A pins-boards bipartite graph.

PinSage uses the pins-boards bipartite graph to generate high-quality pin embeddings for recommendation tasks such as pin recommendation. It has three key innovations:

Dynamic convolutions

Unlike traditional GCN algorithms, which perform convolutions on the feature matrices and the full graph, PinSage samples the neighborhood of each node and performs more efficient local convolutions by dynamically constructing the computational graph.

Constructing convolutions with random walk modelling

Performing convolutions on the entire neighborhood of a node results in a massive computational graph. To reduce the computation required, traditional GCN algorithms examine the k-hop neighbors. PinSage instead simulates random walks, treats the most frequently visited nodes as the key neighborhood, and constructs the convolution from it.
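
The random-walk idea can be sketched in a few lines of plain Python. This is a toy illustration, not the PinSage implementation: the function name, the dictionary-based graph format, and the parameters num_walks, hops, and top_t are all assumptions made for this example. Each walk alternates pin, board, pin, and the pins visited most often are kept as the key neighborhood.

```python
import random
from collections import Counter


def important_neighbors(pin_to_boards, board_to_pins, start_pin,
                        num_walks=500, hops=2, top_t=2, seed=0):
    """Return the top_t (pin, visit_count) pairs reached by short random walks.

    The graph format (two dicts of adjacency lists) is a hypothetical toy format.
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        pin = start_pin
        for _ in range(hops):
            # One hop = pin -> random board containing it -> random pin on that board.
            board = rng.choice(pin_to_boards[pin])
            pin = rng.choice(board_to_pins[board])
            if pin != start_pin:
                visits[pin] += 1
    return visits.most_common(top_t)


# Toy bipartite graph: p2 shares two boards with p1, p3 shares one, p4 shares none.
pin_to_boards = {"p1": ["b1", "b2"], "p2": ["b1", "b2"], "p3": ["b2"], "p4": ["b3"]}
board_to_pins = {"b1": ["p1", "p2"], "b2": ["p1", "p2", "p3"], "b3": ["p4"]}

neighborhood = important_neighbors(pin_to_boards, board_to_pins, "p1")
print(neighborhood)
```

In this toy graph, p2 is visited far more often than p3, and p4 is never reached, so only the strongly connected pins would take part in the convolution for p1.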

Efficient MapReduce Inference

Performing local convolutions on nodes brings with it the problem of repeated computation, because the k-hop neighborhoods of different nodes overlap. In each aggregation step, PinSage maps all nodes without repeated calculation, links them to the corresponding upper-level nodes, and then retrieves the embeddings of the upper-level nodes.
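
A rough way to picture the saving is to compute the embeddings one layer at a time for all nodes and store them, so overlapping neighborhoods reuse stored values instead of recomputing them. The sketch below shows only that idea and is not the MapReduce pipeline itself; the function name and the toy "convolution" (averaging a node's vector with the mean of its neighbors) are assumptions made for illustration.

```python
def layerwise_embeddings(features, neighbours, num_layers):
    """Compute embeddings for every node, one layer at a time.

    Each layer is evaluated once per node and stored, so nodes that share
    neighbours reuse the stored vectors rather than recomputing them.
    """
    current = dict(features)
    for _ in range(num_layers):
        nxt = {}
        for node, vec in current.items():
            neigh = [current[n] for n in neighbours[node]]
            if neigh:
                mean = [sum(col) / len(neigh) for col in zip(*neigh)]
            else:
                mean = [0.0] * len(vec)
            # Toy "convolution": average the node's own vector with the neighbour mean.
            nxt[node] = [(a + b) / 2 for a, b in zip(vec, mean)]
        current = nxt
    return current


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbours = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(layerwise_embeddings(features, neighbours, num_layers=1))
```

With K layers and N nodes, this performs K passes over the N nodes, whereas expanding the K-hop neighborhood separately for every target node would revisit the shared neighbors many times.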

DGL

Deep Graph Library (DGL)[4] is a Python package for building graph-based neural network models on top of existing deep learning frameworks such as PyTorch, MXNet, Gluon, and more. With its easy-to-use backend interfaces, DGL can be readily ported to other frameworks that are tensor-based and support automatic differentiation. The PinSage implementation used in this article is based on DGL and PyTorch: https://github.com/dmlc/dgl/tree/master/examples/pytorch/pinsage

Milvus

Once the embeddings are obtained, the next step is to run a similarity search over them to find items that might be of interest.

Milvus[5] is an open-source similarity search engine for vectors converted from a wide variety of unstructured data. It has been adopted by more than 400 enterprise users, with applications spanning image processing, computer vision, natural language processing (NLP), speech recognition, recommendation engines, search engines, new drug development, gene analysis, and more. A general similarity search process using Milvus looks like this:

  1. The user uses a deep learning model to convert unstructured data into feature vectors and imports them into Milvus.
  2. Milvus stores the feature vectors and builds indexes for them.
  3. On receiving a query vector from the user, Milvus searches for and returns the vectors most similar to it.


Similarity search process using Milvus.
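
To make the search step concrete, here is a minimal brute-force sketch of what "return the most similar vectors" means. It does not use the Milvus API and does not reflect how Milvus works internally (Milvus builds indexes to avoid scanning every vector); the function names and the dictionary used as an "index" are assumptions made for this example, and Euclidean distance is chosen only for simplicity.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_similar(index, query, k):
    """Return the k (id, distance) pairs closest to query, nearest first."""
    scored = ((l2_distance(vec, query), vec_id) for vec_id, vec in index.items())
    return [(vec_id, dist) for dist, vec_id in heapq.nsmallest(k, scored)]


# A toy "collection": vector ID -> feature vector.
index = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 3.0]}
print(top_k_similar(index, query=[0.9, 0.1], k=2))
```

The exhaustive scan costs one distance computation per stored vector for every query, which is the cost that a dedicated vector search engine is designed to reduce.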

Implementation of Recommendation System

System Overview

The following figure illustrates the basic process of building a graph-based recommendation system with Milvus: data preprocessing, PinSage model training, data loading, searching, and recommending.


Building a graph-based recommendation system with Milvus.

Data Preprocessing

The recommendation system built in this article is based on the open MovieLens[5] dataset (ml-1m), which contains about 1,000,000 ratings of about 4,000 movies by about 6,000 users. Collected by GroupLens Research Labs, the data includes movie information, user characteristics, and movie ratings. In this article, we use the users' viewing history to build a users-movies bipartite graph g with categorical features.

Build graph

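# PandasGraphBuilder comes from builder.py in the DGL PinSage example
from builder import PandasGraphBuilder

graph_builder = PandasGraphBuilder()
# Register the two node types from the users and movies DataFrames
graph_builder.add_entities(users, 'user_id', 'user')
graph_builder.add_entities(movies_categorical, 'movie_id', 'movie')
# Add one edge type per direction so the graph can be traversed both ways
graph_builder.add_binary_relations(ratings, 'user_id', 'movie_id', 'watched')
graph_builder.add_binary_relations(ratings, 'movie_id', 'user_id', 'watched-by')
g = graph_builder.build()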
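
As a rough picture of what the two relations contain, the sketch below parses rating rows in the ml-1m ratings.dat layout (UserID::MovieID::Rating::Timestamp) and derives the "watched" and "watched-by" edge sets using only the standard library. It is not the DGL graph builder; the function names are assumptions, and the sample rows are illustrative rows written in the same format rather than data quoted from the article.

```python
from collections import defaultdict

# Illustrative rows in the ratings.dat layout: UserID::MovieID::Rating::Timestamp
SAMPLE_RATINGS = """\
1::1193::5::978300760
1::661::3::978302109
2::1193::5::978298413
"""


def parse_ratings(lines):
    """Turn '::'-separated rating lines into (user, movie, rating, timestamp) tuples."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, movie_id, rating, timestamp = line.split("::")
        ratings.append((int(user_id), int(movie_id), int(rating), int(timestamp)))
    return ratings


def build_edges(ratings):
    """Build both edge directions of the users-movies bipartite graph."""
    watched = defaultdict(set)      # user -> movies
    watched_by = defaultdict(set)   # movie -> users
    for user_id, movie_id, _rating, _timestamp in ratings:
        watched[user_id].add(movie_id)
        watched_by[movie_id].add(user_id)
    return watched, watched_by


watched, watched_by = build_edges(parse_ratings(SAMPLE_RATINGS.splitlines()))
print(dict(watched), dict(watched_by))
```

Keeping both directions mirrors the 'watched' and 'watched-by' relations above: a walk from a movie to a similar movie passes through a user and therefore needs both edge types.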

PinSage Model Training

In this system, movies play the role of pins: the embedding vectors generated by the PinSage model are the feature vectors of the movies. First, create a PinSage model from the bipartite graph g and the chosen movie feature vector dimension (256 by default). Then train the model with PyTorch to obtain the h_item embeddings of the 4,000 movies.

import torch

# Define the model
model = PinSAGEModel(g, item_ntype, textset, args.hidden_dims, args.num_layers).to(device)
opt = torch.optim.Adam(model.parameters(), lr=args.lr)
# (The training loop is omitted here.)
# Get the item embeddings
model.eval()
h_item_batches = []
with torch.no_grad():
    for blocks in dataloader_test:
        # Move every block of the sampled computation graph to the device
        for i in range(len(blocks)):
            blocks[i] = blocks[i].to(device)
        h_item_batches.append(model.get_repr(blocks))
h_item = torch.cat(h_item_batches, 0)

Data Loading

Load the movie embeddings h_item generated by the PinSage model into Milvus, and Milvus will return the corresponding IDs. Import the IDs and the corresponding movie information into MySQL.

# Load data to Milvus and MySQL
status, ids = milvus.insert(milvus_table, h_item)
load_movies_to_mysql(milvus_table, ids_info)

Searching

Retrieve the embeddings of the movies a user likes from Milvus by movie ID, and run a similarity search in Milvus with those embeddings. Then look up the corresponding movie information in the MySQL database.

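# Get the embeddings of the movies the user likes
_, user_like_vectors = milvus.get_entity_by_id(milvus_table, ids)
# Search for the top_k most similar movies (assumes the pymilvus 0.10-style client)
_, hits_per_query = milvus.search(collection_name=milvus_table, query_records=user_like_vectors, top_k=top_k)
similar_ids = [hit.id for hits in hits_per_query for hit in hits]
# Look up the movie information, passing the IDs as query parameters
placeholders = ", ".join(["%s"] * len(similar_ids))
sql = "select * from " + movies_table + " where milvus_id in (" + placeholders + ");"
cursor.execute(sql, similar_ids)
results = cursor.fetchall()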
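
The lookup from Milvus IDs to movie information can be tried without a MySQL server. The sketch below uses the standard-library sqlite3 module as a stand-in, so it is an assumption-laden illustration rather than the project's code: the table layout, the movie titles, and the fetch_movies helper are all hypothetical. Note that sqlite3 uses ? as its placeholder, whereas common MySQL drivers use %s.

```python
import sqlite3


def fetch_movies(conn, table, milvus_ids):
    """Return the (milvus_id, title) rows whose milvus_id is in milvus_ids.

    The table name cannot be passed as a query parameter, so it must come
    from trusted configuration, never from user input.
    """
    if not milvus_ids:
        return []
    placeholders = ", ".join("?" for _ in milvus_ids)
    sql = f"SELECT milvus_id, title FROM {table} WHERE milvus_id IN ({placeholders})"
    return conn.execute(sql, list(milvus_ids)).fetchall()


# Hypothetical table mapping Milvus IDs to movie titles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (milvus_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(101, "Movie A"), (102, "Movie B"), (103, "Movie C")],
)

rows = fetch_movies(conn, "movies", [103, 101])
print(sorted(rows))
```

Passing the IDs as parameters instead of concatenating them into the SQL string avoids both type errors (the IDs are a list, not a string) and SQL injection.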

Recommendation

Finally, the system recommends to the user the movies most similar to the query movies.
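
The article does not spell out how the hits for several liked movies are merged into one list, so the following is only one possible approach, with a hypothetical function name and input format. It drops the movies the user already likes, keeps the smallest distance seen for each remaining candidate, and returns the closest ones first.

```python
def recommend(search_results, liked_ids, limit):
    """Merge per-query hits into one ranked list of movie IDs.

    search_results holds one list of (movie_id, distance) pairs per liked movie.
    """
    best = {}
    for hits in search_results:
        for movie_id, distance in hits:
            if movie_id in liked_ids:
                continue  # do not recommend what the user already likes
            if movie_id not in best or distance < best[movie_id]:
                best[movie_id] = distance
    ranked = sorted(best.items(), key=lambda item: item[1])
    return [movie_id for movie_id, _ in ranked[:limit]]


# Hits for two liked movies (IDs 1 and 2); movie 9 appears in both result lists.
search_results = [
    [(1, 0.0), (7, 0.2), (9, 0.5)],
    [(2, 0.0), (9, 0.1), (4, 0.3)],
]
print(recommend(search_results, liked_ids={1, 2}, limit=3))
```

Filtering out the liked movies matters because each query vector is itself stored in the collection, so it comes back as its own nearest neighbor.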

Above is the main workflow of building a recommendation system. For more details, see Milvus-Bootcamp: https://github.com/milvus-io/bootcamp/tree/0.10.0/solutions/graph_based_recommend.

System Demo

In addition to a FastAPI interface, the project has a front-end demo that recommends movies catering to users' tastes. You can simulate the process by logging in to the Movie Recommendation System and marking the movies you like; the demo then makes recommendations based on those choices.

4-system-demo-milvus-graph-based-recommender-system.png

5-system-demo-milvus-graph-based-recommender-system.gif

Conclusion

PinSage is a graph convolutional neural network for recommendation tasks that generates high-quality pin embeddings from a pins-boards bipartite graph. In this article, we used the MovieLens dataset to create a users-movies bipartite graph, and the open-source DGL package with the PinSage model to generate feature vectors for the movies.

The vectors are then stored in Milvus, an embedding similarity search engine, which is searched to return movie recommendations to users.

The Milvus embedding vector similarity search engine can be integrated with a wide variety of deep learning platforms and used in many AI scenarios. By leveraging optimized vector retrieval algorithms and integrated heterogeneous computing resources, Milvus provides companies with vector retrieval capabilities.

References

  1. https://patentimages.storage.googleapis.com/0e/96/31/98058cb476cd77/CN105913296A.pdf
  2. Graph Convolutional Neural Networks for Web-Scale Recommender Systems, arXiv:1806.01973
  3. https://medium.com/pinterest-engineering/pinsage-a-new-graph-convolutional-neural-network-for-web-scale-recommender-systems-88795a107f48
  4. https://docs.dgl.ai/en/latest/
  5. http://files.grouplens.org/datasets/movielens/ml-1m.zip
