Frontend
Next.js 16
App Router with Server Components
React 19
Latest APIs including Server Actions
Three.js + R3F
3D avatar with React Three Fiber
Tailwind CSS 4
Dark glassmorphic theme
An interactive portfolio with RAG chat, a reactive 3D avatar, local vector search, and smart failover across multiple AI providers — all running without an external database server.
User
Asks a question via chat or submits a job description
Next.js API Route
Validates input, sanitizes, applies rate limit
RAG Pipeline
Generates embedding → vector + FTS5 search → builds context
LLM (Gemini/Groq)
Receives context + question → generates structured JSON response
Avatar Stage
Interprets animation, mood, gesture, and focus from response
Rendering
Markdown response + 3D avatar reacting in real time
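The flow above can be sketched as a single API-route handler. All stage names here (`sanitize`, `embed`, `hybridSearch`, `callLLM`) and the stub bodies are illustrative assumptions, not the project's actual functions:

```typescript
// Illustrative sketch of the request flow: sanitize -> embed ->
// hybrid search -> LLM with context -> structured response.

type AvatarState = { animation: string; mood: string; gesture: string };
type ChatResponse = { markdown: string; avatar: AvatarState };

// Stub stages so the pipeline is runnable end to end.
const sanitize = (s: string) => s.replace(/<[^>]*>/g, "").trim();
const embed = async (q: string): Promise<number[]> =>
  Array.from(q).map((c) => c.charCodeAt(0) / 255); // placeholder embedding
const hybridSearch = async (_v: number[], q: string): Promise<string[]> => [
  `Snippet matching "${q}"`,
];
const callLLM = async (context: string[], q: string): Promise<ChatResponse> => ({
  markdown: `**Answer** to "${q}" using ${context.length} snippet(s).`,
  avatar: { animation: "idle", mood: "calm", gesture: "none" },
});

export async function handleChat(rawQuestion: string): Promise<ChatResponse> {
  const question = sanitize(rawQuestion);               // 1. validate + sanitize
  const vector = await embed(question);                 // 2. generate embedding
  const context = await hybridSearch(vector, question); // 3. vector + FTS5 search
  return callLLM(context, question);                    // 4. structured JSON response
}
```

The handler returns the avatar state together with the markdown, so the stage can start animating as soon as the response arrives.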
Google Gemini
Primary chat & embeddings
Groq (LLaMA 3.1)
Free fallback for chat
Cohere
Fallback for embeddings
RAG Pipeline
Retrieval-Augmented Generation
SQLite
Local database, no external server
sqlite-vec
Native vector search
FTS5
Full-text keyword search
WAL Mode
Reads stay concurrent with writes
GLB Model
Exported from Avaturn
11 Animations
FBX: idle, wave, victory, capoeira...
State Machine
Avatar state controller
5 Moods
Calm, warm, confident, focused, playful
Sanitization
XSS and injection prevention
Rate Limiting
Per IP and session, configurable
Content Filter
Profanity detection
Prompt Guard
Prompt injection prevention
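One piece of the security layer, the configurable per-key rate limit, can be sketched as a fixed-window counter keyed by IP or session ID. This is an assumed implementation, not the project's actual limiter:

```typescript
// Sketch of a per-key (IP or session) fixed-window rate limiter.

interface RateLimitOptions {
  windowMs: number; // window length in milliseconds
  max: number;      // max requests per window per key
}

export class RateLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();
  constructor(private opts: RateLimitOptions) {}

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.opts.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now }); // new window
      return true;
    }
    if (entry.count < this.opts.max) {
      entry.count++;
      return true;
    }
    return false; // limit exceeded within the current window
  }
}
```

Running two limiters (one keyed by IP, one by session) gives the "per IP and session" behavior described above.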
Analytics
Real-time event tracking
Metrics
Latency, tokens, failure rate
Sessions
Per-visitor tracking
Export
Data exportable as CSV
Eliminates the need for an external database like Postgres + pgvector. With sqlite-vec, embedding search runs locally with minimal latency. Ideal for a single-tenant portfolio that needs to be easy to deploy.
Gemini has a generous free tier for embeddings, and Groq offers LLaMA 3.1 with ultra-low latency for chat. If one provider hits a rate limit, the system automatically falls back to the next: zero downtime for the visitor.
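The failover described here amounts to trying providers in priority order and falling through on error. A minimal sketch, with a hypothetical `tryProviders` helper (the provider names match the stack; the interface is an assumption):

```typescript
// Sketch of smart failover: try each provider in order, fall through on failure.

type ChatProvider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

export async function tryProviders(
  providers: ChatProvider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return { provider: p.name, text: await p.complete(prompt) };
    } catch (err) {
      lastError = err; // e.g. a 429 rate limit: fall through to the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```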
The avatar needs to know which animation, mood, and gesture to play BEFORE displaying the response. Structured JSON responses keep the avatar state coherent with the content, something partial streaming cannot guarantee.
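Validating the structured response before rendering might look like the sketch below. The exact schema is an assumption built from the fields named above (animation, mood, gesture, focus); the real project may use a schema library instead of hand-rolled checks:

```typescript
// Sketch: validate the LLM's structured JSON so the avatar state is
// known and well-formed before anything is displayed.

interface StructuredReply {
  markdown: string;
  animation: string;
  mood: string;
  gesture: string;
  focus: string;
}

export function parseReply(raw: string): StructuredReply {
  const data = JSON.parse(raw);
  for (const field of ["markdown", "animation", "mood", "gesture", "focus"]) {
    if (typeof data[field] !== "string") {
      throw new Error(`Missing or invalid field: ${field}`);
    }
  }
  return data as StructuredReply;
}
```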
Vector search captures semantics ('AI experience' finds 'machine learning'), while FTS5 captures exact keyword matches ('React', 'TypeScript'). Combining both yields better recall than either search alone.
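The two result lists still have to be merged into one ranking. The project's actual merge strategy isn't specified here; reciprocal rank fusion (RRF) is one common choice, shown as an assumption:

```typescript
// Sketch: merge vector-search and FTS5 result lists with reciprocal
// rank fusion. Documents appearing high in both lists rise to the top.

export function reciprocalRankFusion(
  rankings: string[][], // e.g. [vectorResults, fts5Results], best match first
  k = 60,               // conventional RRF damping constant
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // Each list contributes 1 / (k + rank + 1) to a document's score.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document found by both the semantic and the keyword search accumulates score from both lists, so it outranks documents found by only one.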
Ask a question and see this entire architecture in action