A technique that enhances LLM outputs by retrieving relevant information from an external knowledge base before generating a response, combining generative power with accurate, up-to-date data.
RAG solves one of the core limitations of LLMs: their knowledge is frozen at training time. By connecting a model to a dynamic knowledge base, you get responses that are both fluent and factually grounded.
RAG pipelines have two phases: (1) Retrieval — convert the query to an embedding, find the most similar chunks from your knowledge base using vector similarity search; (2) Generation — pass the query plus retrieved context to the LLM, which generates a grounded response.
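The two phases above can be sketched end-to-end. This is a minimal illustration, not a production pipeline: the bag-of-words `embed` function is a stand-in for a real embedding model, and the final LLM call is left as a placeholder prompt.

```python
import math

# Toy embedding: word counts over a tiny fixed vocabulary. A real system
# would call an embedding model here; this placeholder keeps the sketch
# self-contained.
VOCAB = ["rag", "retrieval", "llm", "vector", "prompt", "index"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Phase 1: Retrieval. Embed the query and rank stored chunks by
# vector similarity.
def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Phase 2: Generation. Pass query plus retrieved context to the LLM.
# Here we only build the prompt; a real pipeline would send it to a model.
def answer(query: str, chunks: list[str]) -> str:
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

chunks = [
    "rag combines retrieval with an llm",
    "a vector index stores embeddings",
    "unrelated note about something else",
]
print(answer("how does rag use an llm", chunks))
```

The key design point is the separation of concerns: retrieval narrows the knowledge base down to a few relevant chunks, and generation only ever sees that narrowed context inside the prompt.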
The knowledge base is stored as vector embeddings in a vector store such as Supabase pgvector, Pinecone, or Weaviate. When a query arrives, it is embedded and compared against the stored embeddings to find the most semantically similar chunks.
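The similarity comparison a vector store performs can be sketched in plain Python. The three-dimensional vectors and chunk names below are made up for illustration; real embeddings have hundreds or thousands of dimensions, and the store would index them for fast approximate search rather than scanning linearly.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical stored chunks with illustrative 3-dimensional embeddings.
store = {
    "refund policy chunk":  [0.9, 0.1, 0.0],
    "shipping times chunk": [0.1, 0.9, 0.2],
    "privacy policy chunk": [0.0, 0.2, 0.9],
}

# Rank every stored embedding by similarity to the query embedding
# and return the k closest chunks.
def top_k(query_embedding: list[float], k: int = 1) -> list[str]:
    ranked = sorted(store.items(),
                    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(top_k([0.8, 0.2, 0.1]))  # query vector closest to the refund chunk
```

Dedicated vector databases exist because this brute-force scan does not scale: they replace the linear pass with approximate nearest-neighbor indexes so lookups stay fast across millions of embeddings.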