
Building a RAG Assistant to explore my technical Portfolio

Nov 10, 2025

3 min read

RAG Assistant

In this project, I developed a RAG (Retrieval-Augmented Generation) Assistant that lets an AI understand, explore, and answer questions about my software and data project portfolio as if it were a human technical reviewer.

The main motivation was simple:
If a recruiter can ask an AI "how did you implement data ingestion?" or "what does this project do?", then my portfolio stops being static and becomes interactive.

In this post, I explain how I designed it, the decisions I made, and what I learned during the process.


What problem did I want to solve?

A traditional technical portfolio has several limitations:

  • It requires the reader to manually navigate between repositories, READMEs, and notebooks.
  • It doesn't scale well when many projects accumulate.
  • It doesn't allow open-ended questions in natural language.

My goal was to create a system that indexes my projects, understands both documentation and code, and allows flexible natural language queries.


General Approach: RAG on my Portfolio

The system follows the classic Retrieval-Augmented Generation pattern, divided into three stages:

  1. Document Ingestion
  2. Semantic Retrieval
  3. Response Generation with an LLM

All knowledge lives in a local vector store, and the model only receives the strictly necessary context to answer each question.


Ingestion: How I turn my projects into knowledge

Ingestion was the most critical part of the project.

My portfolio includes multiple formats: Markdown, Python, PDF, DOCX, plain text, and Jupyter notebooks.

Key decisions (sketched in the snippet after this list):

  • Use dedicated loaders per file type.
  • Notebooks are loaded at the cell level, including Markdown and code.
  • Irrelevant folders like .git or __pycache__ are ignored.
  • Metadata is normalized to track the origin of each fragment.
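A minimal sketch of that dispatch logic, assuming a plain-Python ingestion script. The folder list, the `Fragment` structure, and the function names are illustrative, and notebooks are read cell by cell with `nbformat`; PDF and DOCX go through their own loaders, omitted here:

```python
from dataclasses import dataclass
from pathlib import Path

import nbformat  # pip install nbformat

IGNORED_DIRS = {".git", "__pycache__", ".venv", "node_modules"}  # skip non-content folders


@dataclass
class Fragment:
    """One piece of raw content plus the metadata that tracks its origin."""
    text: str
    source: str  # relative file path
    kind: str    # "prose", "code", "notebook-markdown", "notebook-code", ...


def load_notebook(path: Path) -> list[Fragment]:
    """Load a Jupyter notebook at the cell level, keeping Markdown and code cells."""
    nb = nbformat.read(path, as_version=4)
    return [
        Fragment(cell.source, str(path), f"notebook-{cell.cell_type}")
        for cell in nb.cells
        if cell.cell_type in ("markdown", "code") and cell.source.strip()
    ]


def load_repo(root: Path) -> list[Fragment]:
    """Walk a project folder and dispatch each file to the right loader."""
    frags: list[Fragment] = []
    for path in root.rglob("*"):
        if any(part in IGNORED_DIRS for part in path.parts) or not path.is_file():
            continue
        if path.suffix == ".ipynb":
            frags.extend(load_notebook(path))
        elif path.suffix in {".md", ".txt"}:
            frags.append(Fragment(path.read_text(encoding="utf-8"), str(path), "prose"))
        elif path.suffix == ".py":
            frags.append(Fragment(path.read_text(encoding="utf-8"), str(path), "code"))
    return frags
```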

Intelligent Segmentation: Prose ≠ Code

Not all content is treated the same:

  • Prose and documentation use larger chunks to preserve coherence.
  • Python code is divided into smaller chunks using language-aware segmentation.

This significantly improves the quality of retrieval and responses.
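A sketch of this dual strategy, assuming LangChain's text splitters; the chunk sizes are illustrative, not the exact values I use, and `frag` refers to the `Fragment` structure from the ingestion sketch above:

```python
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Larger chunks for prose keep explanations and their context together.
prose_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)

# Smaller, language-aware chunks for Python split on class/function boundaries first.
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=600, chunk_overlap=60
)


def split_fragment(frag) -> list[str]:
    """Route each fragment to the splitter that matches its content type."""
    splitter = code_splitter if frag.kind.endswith("code") else prose_splitter
    return splitter.split_text(frag.text)
```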


Embeddings and Vector Store

  • Embeddings: BAAI/bge-m3 (multilingual and normalized).
  • Vector store: ChromaDB, persistent and local.

Any change in the projects is picked up simply by re-running the ingestion process.
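A minimal indexing sketch, assuming the LangChain wrappers for HuggingFace embeddings and Chroma; the collection name and persistence path are placeholders:

```python
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

# BAAI/bge-m3 is multilingual; normalized embeddings keep cosine similarity well-behaved.
embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-m3",
    encode_kwargs={"normalize_embeddings": True},
)

# Local, persistent vector store: re-running ingestion refreshes it in place.
vector_store = Chroma(
    collection_name="portfolio",
    embedding_function=embeddings,
    persist_directory="./chroma_store",
)

# `chunks` and `metadatas` come from the loading and splitting steps above:
# vector_store.add_texts(chunks, metadatas=metadatas)
```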


Retrieval: Maximal Marginal Relevance

To avoid redundant fragments, the system retrieves with MMR (Maximal Marginal Relevance), which selects diverse, complementary context within the same project instead of several near-duplicate chunks.
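With the Chroma wrapper from the previous sketch, MMR is a configuration switch on the retriever; the `k`, `fetch_k`, and `lambda_mult` values below are illustrative:

```python
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 6,              # fragments actually passed to the LLM
        "fetch_k": 30,       # candidates fetched before MMR re-ranking
        "lambda_mult": 0.5,  # 1.0 = pure relevance, 0.0 = maximum diversity
    },
)

docs = retriever.invoke("How did you implement data ingestion?")
```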


Generation and Privacy

Generation is performed with a DeepSeek model.

The model only receives:

  • The user's question.
  • The retrieved fragments.

It does not have direct access to the files or to the complete vector store, which keeps the data under my control and preserves privacy.
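A sketch of the generation step, assuming DeepSeek is called through its OpenAI-compatible API; the model name and prompt wording are illustrative, and only the question plus the retrieved fragments are ever sent:

```python
from openai import OpenAI

client = OpenAI(api_key="...", base_url="https://api.deepseek.com")


def answer(question: str, docs) -> str:
    """Build the prompt from retrieved fragments only and ask the model."""
    context = "\n\n".join(
        f"[{d.metadata.get('source', 'unknown')}]\n{d.page_content}" for d in docs
    )
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system",
             "content": "Answer questions about the portfolio using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```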


Interfaces: CLI and Web App

The system can be used from:

  • A command-line interface.
  • A web application developed with Dash and Bootstrap.

Both share exactly the same RAG pipeline.
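A sketch of how both interfaces can wrap that single pipeline; the component ids and function names are illustrative, and `retriever` and `answer` refer to the retrieval and generation sketches above (Bootstrap styling via dash-bootstrap-components is omitted):

```python
import argparse

from dash import Dash, Input, Output, State, dcc, html


def ask(question: str) -> str:
    """Single entry point shared by the CLI and the web app."""
    docs = retriever.invoke(question)  # retrieval (MMR sketch above)
    return answer(question, docs)      # generation (DeepSeek sketch above)


# --- Command-line interface ---
def main() -> None:
    parser = argparse.ArgumentParser(description="Ask the portfolio assistant")
    parser.add_argument("question")
    print(ask(parser.parse_args().question))


# --- Web app (Dash) ---
app = Dash(__name__)
app.layout = html.Div(
    [dcc.Input(id="question"), html.Button("Ask", id="ask"), html.Div(id="reply")]
)


@app.callback(
    Output("reply", "children"),
    Input("ask", "n_clicks"),
    State("question", "value"),
    prevent_initial_call=True,
)
def on_ask(_, question):
    return ask(question)
```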


Initial Validation

To test the system, I initially indexed a single real repository from my portfolio: evalcards, a Python library for generating model evaluation reports.

This allowed me to validate that the assistant understands metrics, documentation, and real code.


What this project demonstrates

  • Design of real RAG pipelines.
  • Technical judgment in chunking and retrieval.
  • Integration of AI with UX.
  • Product-oriented thinking and scalability.

Conclusion

This project transforms my portfolio into an AI-queryable knowledge base, allowing me to explain technical decisions and architecture interactively.

It is not just an AI demo; it is a new way to present technical expertise.