Advanced · 1 min read

Building a RAG Pipeline with Qdrant and Gemini

A practical walkthrough of retrieval-augmented generation: chunking, embeddings, vector search with Qdrant, and grounding answers with Gemini.

Stack
  • Qdrant
  • Gemini
  • Python
  • FastAPI

Part 3 of 3

Series

Building an LLM Knowledge System

  1. Understanding Vector Embeddings for Search
  2. Prompt Engineering Patterns for Production
  3. Building a RAG Pipeline with Qdrant and Gemini

What you’ll learn

  • How to chunk documents so retrieval stays relevant
  • Storing and querying embeddings in Qdrant
  • Grounding Gemini answers in retrieved context to cut hallucinations
  • Wiring the retrieve-then-generate loop behind an API

Retrieval-augmented generation (RAG) grounds a language model in your own data, so answers stay accurate and current without retraining. Here is the pipeline I reach for.

The four stages

  1. Chunk the source documents into passages.

  2. Embed each chunk into a vector.

  3. Store the vectors in Qdrant for fast similarity search.

  4. Retrieve the top matches and pass them to Gemini as context.
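The chunking stage can be sketched in a few lines of Python, the language listed in the stack. This is a minimal illustration, not the article's actual implementation: the function name `chunk_words`, the word-based windows, and the default `size` and `overlap` values are all assumptions chosen for the example. The overlap carries a few words from one passage into the next so that a sentence cut at a boundary still appears whole in at least one chunk.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into passages of at most `size` words that share `overlap` words.

    Hypothetical helper for illustration; sizes are in words, not tokens.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each returned passage would then be embedded and stored alongside its text, so the text can be handed back to the model at query time.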

Searching with Qdrant

Qdrant returns the approximate nearest neighbours for a query embedding, typically within milliseconds:

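src/lib/foo.ts (TypeScript)
// Fetch the five stored chunks closest to the query embedding.
// Assumes `client` is a connected Qdrant client and `queryEmbedding` is a number[].
const hits = await client.search('articles', {
    vector: queryEmbedding,
    limit: 5,
});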
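To make clear what a similarity search ranks, here is a brute-force sketch in plain Python. It assumes the collection is configured for cosine distance, and the names `cosine` and `top_k` are hypothetical. Qdrant itself uses an index rather than scanning every vector, so treat this only as a description of the ordering, not of how Qdrant works internally.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: list[float], stored: dict[str, list[float]], k: int = 5
) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]
```

The `limit: 5` in the search call plays the same role as `k` here: it caps how many passages reach the prompt.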

Feed those passages into the prompt and let the model answer from them. The result is grounded, citable output that keeps improving as your corpus grows.
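One way to assemble such a prompt is sketched below. The function name `build_grounded_prompt` and the instruction wording are assumptions made for illustration, not a prescribed Gemini prompt format. The call to the model is left out on purpose, since it depends on the SDK and model version you use.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user question into one prompt string.

    Hypothetical helper: numbering the passages lets the model cite them as [1], [2], ...
    """
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passages by number, and say you don't know if they do not "
        "contain the answer.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Telling the model to admit when the passages lack the answer is what ties the output to the retrieved text rather than to whatever the model remembers from training.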
