The REALM (Retrieval-Augmented Language Model) architecture enhances language models by integrating a retrieval mechanism that accesses external knowledge. At its core, REALM uses dense vector embeddings to represent both input queries (questions or prompts) and documents (external knowledge sources) in a shared semantic space. These embeddings let the model efficiently retrieve relevant documents that inform its predictions. Unlike traditional language models, which rely solely on knowledge stored in their internal parameters, REALM dynamically fetches information from an external corpus whose embeddings are precomputed, making it particularly useful for tasks requiring up-to-date or domain-specific knowledge.
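To make the "shared semantic space" idea concrete, here is a minimal sketch that embeds a query and a few documents into the same vector space and ranks the documents by inner product. The `embed` function is a hypothetical stand-in (a stable per-token random projection); REALM itself uses trained BERT-style encoders, so treat this only as an illustration of the retrieval geometry, not of REALM's actual encoders.

```python
import re
import zlib

import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in encoder: sums a deterministic random vector per token.
    REALM would use a trained BERT-style transformer here instead."""
    vec = np.zeros(dim)
    for token in re.findall(r"[a-z]+", text.lower()):
        token_rng = np.random.default_rng(zlib.crc32(token.encode()))
        vec += token_rng.standard_normal(dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

docs = [
    "Auroras are caused by solar wind interacting with Earth's magnetosphere.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The magnetosphere deflects most charged particles from the sun.",
]
doc_embs = np.stack([embed(d) for d in docs])   # document vectors, shape (3, 256)
query_emb = embed("What causes auroras?")       # query vector in the SAME space

scores = doc_embs @ query_emb                   # one inner product per document
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:+.3f}  {doc}")               # docs sharing terms score higher
```

Because queries and documents live in one space, relevance reduces to a dot product, which is exactly what makes large-scale retrieval tractable.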
The architecture consists of two main components: a neural knowledge retriever and a knowledge-augmented language model. The retriever uses dual encoders, one for the input query and another for documents, to generate embeddings. For example, a query like "What causes auroras?" is encoded into a vector, and the retriever searches a large corpus (e.g., Wikipedia) for documents whose embeddings are closest in vector space. This search is performed with Maximum Inner Product Search (MIPS), which efficiently finds the documents with the highest similarity scores. The retrieved documents are then passed to the language model, which processes the combined input (query plus retrieved document) to produce a final answer. During training, the retriever and language model are optimized jointly: the retrieved document is treated as a latent variable and the answer probability is marginalized over the top-k candidates, so a single gradient signal improves both retrieval accuracy and downstream task performance.
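The PyTorch sketch below shows how that joint optimization can work end to end. The tiny linear layers (`query_encoder`, `doc_encoder`, `reader`), the dimensions, and the binary toy task are all illustrative stand-ins for REALM's BERT towers and span-prediction reader; what the sketch does reproduce faithfully is the structure of the objective, where the answer probability is marginalized over the retrieval distribution so that backpropagation reaches both components.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
DIM, NUM_DOCS, K = 16, 100, 5

# Tiny linear towers as stand-ins for REALM's BERT-based encoders and reader.
query_encoder = torch.nn.Linear(DIM, DIM)
doc_encoder = torch.nn.Linear(DIM, DIM)
reader = torch.nn.Linear(2 * DIM, 2)     # toy "knowledge-augmented" model

doc_features = torch.randn(NUM_DOCS, DIM)   # placeholder raw document features
query_features = torch.randn(1, DIM)        # one query
answer = torch.tensor([1])                  # gold label for the toy task

params = [*query_encoder.parameters(), *doc_encoder.parameters(),
          *reader.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
opt.zero_grad()

q = query_encoder(query_features)           # (1, DIM)
d = doc_encoder(doc_features)               # (NUM_DOCS, DIM)
scores = q @ d.T                            # inner-product relevance, (1, NUM_DOCS)

topk = scores.topk(K, dim=-1)               # exact top-k; MIPS approximates this at scale
p_retrieve = F.softmax(topk.values, dim=-1) # retrieval distribution p(z | x)

# Reader consumes query + each retrieved doc; marginalize the answer over docs.
q_rep = q.expand(K, -1)                                        # (K, DIM)
joint = torch.cat([q_rep, d[topk.indices[0]]], dim=-1)         # (K, 2*DIM)
p_answer = F.softmax(reader(joint), dim=-1)                    # p(y | x, z)
marginal = (p_retrieve.T * p_answer).sum(dim=0, keepdim=True)  # p(y | x)

loss = F.nll_loss(torch.log(marginal + 1e-9), answer)
loss.backward()      # gradients reach the retriever AND the language model
opt.step()
print(f"loss: {loss.item():.4f}")
```

Documents that help the reader predict the right answer receive a larger gradient toward higher retrieval scores, which is how the embeddings improve without any direct relevance labels.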
A key detail is how REALM precomputes document embeddings for efficiency. During preprocessing, all documents in the corpus are encoded offline using the document encoder; at inference time, only the query is encoded, so retrieval happens in near-real time. For instance, in open-domain question answering, this setup allows REALM to quickly fetch passages about "solar wind interacting with Earth's magnetosphere" when asked about auroras. Because joint training keeps updating the document encoder, the precomputed index gradually goes stale, so REALM periodically re-embeds the corpus and refreshes the search index in the background. Training also provides a contrastive signal, in the spirit of noise-contrastive estimation: by normalizing scores across retrieved candidates, the model learns to distinguish relevant from irrelevant documents by comparing their embeddings, which pushes the retriever to prioritize high-quality context for the language model. By unifying retrieval and language modeling, REALM's embeddings enable more informed predictions while maintaining scalability, a practical balance for developers building systems that require both broad knowledge and efficient computation.
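The offline/online split looks roughly like the sketch below. FAISS is used here as one common MIPS library (the REALM paper describes MIPS generically rather than mandating a specific library, so this choice is an assumption), and the random vectors stand in for the output of a trained document encoder. `IndexFlatIP` performs exact inner-product search; at REALM's corpus scale an approximate index would be substituted for speed.

```python
import numpy as np
import faiss  # one common MIPS library; any inner-product index works here

DIM, NUM_DOCS = 128, 10_000
rng = np.random.default_rng(0)

# --- Offline preprocessing: embed every document once with the doc encoder. ---
# Random vectors stand in for doc_encoder(corpus); recompute these (and rebuild
# the index) periodically during training, since the encoder keeps changing.
doc_embs = rng.standard_normal((NUM_DOCS, DIM)).astype("float32")

index = faiss.IndexFlatIP(DIM)   # exact maximum-inner-product search
index.add(doc_embs)

# --- Inference time: encode ONLY the query, then search in near-real time. ---
query_emb = rng.standard_normal((1, DIM)).astype("float32")
scores, ids = index.search(query_emb, 5)
print("top-5 doc ids:", ids[0], "scores:", np.round(scores[0], 2))
```

This split is what keeps per-query cost low: the expensive corpus encoding is amortized once, and each request pays only for one query encoding plus an index lookup.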
