Technical Architecture

How I Built This Portfolio

An interactive portfolio with RAG chat, a reactive 3D avatar, local vector search, and smart failover across multiple AI providers — all running without an external database server.

Flow of a Question

1. User: asks a question via chat or submits a job description
2. Next.js API Route: validates and sanitizes input, applies rate limiting
3. RAG Pipeline: generates an embedding → vector + FTS5 search → builds context
4. LLM (Gemini/Groq): receives context + question → generates a structured JSON response
5. Avatar Stage: interprets animation, mood, gesture, and focus from the response
6. Rendering: Markdown response + 3D avatar reacting in real time
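
The flow above can be sketched end to end. The types and function names here are illustrative, not the portfolio's actual code, and the retrieval and LLM steps are stubbed rather than real API calls:

```typescript
// Illustrative sketch of the request flow; names are hypothetical,
// and retrieval/generation are stubbed instead of calling real services.

type AvatarState = { animation: string; mood: string; gesture: string };
type Answer = { markdown: string; avatar: AvatarState };

// 2. Validate and sanitize input
const sanitize = (raw: string): string => raw.replace(/<[^>]*>/g, "").trim();

// 3. RAG retrieval (stub): embed the question, run vector + FTS5 search
const retrieveContext = (question: string): string[] => [
  `Relevant chunk about: ${question}`,
];

// 4. LLM call (stub): context + question in, structured JSON out
const generateAnswer = (question: string, context: string[]): Answer => ({
  markdown: `**Answer** built from ${context.length} context chunk(s).`,
  avatar: { animation: "talk", mood: "confident", gesture: "nod" },
});

// 1 → 6: the API route composes the steps; the Avatar Stage and the
// Markdown renderer then consume the returned Answer.
function handleQuestion(raw: string): Answer {
  const question = sanitize(raw);
  const context = retrieveContext(question);
  return generateAnswer(question, context);
}
```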

Tech Stack

Frontend

  • Next.js 16

    App Router with Server Components

  • React 19

    Latest APIs including Server Actions

  • Three.js + R3F

    3D avatar with React Three Fiber

  • Tailwind CSS 4

    Dark glassmorphic theme

AI & LLMs

  • Google Gemini

    Primary chat & embeddings

  • Groq (LLaMA 3.1)

    Free fallback for chat

  • Cohere

    Fallback for embeddings

  • RAG Pipeline

    Retrieval-Augmented Generation

Database

  • SQLite

    Local database, no external server

  • sqlite-vec

    Native vector search

  • FTS5

    Full-text keyword search

  • WAL Mode

    Concurrent reads while writing

3D Avatar

  • GLB Model

    Exported from Avaturn

  • 11 Animations

    FBX: idle, wave, victory, capoeira...

  • State Machine

    Avatar state controller

  • 5 Moods

    Calm, warm, confident, focused, playful
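
A state controller of this kind can be modeled as a small state machine. The moods below come from the list above; the mood-to-animation mapping and transition logic are assumptions, not the portfolio's actual controller:

```typescript
// Hypothetical avatar state machine; the five moods come from the list
// above, while the default-animation mapping is illustrative.

type Mood = "calm" | "warm" | "confident" | "focused" | "playful";

// Default animation per mood (assumed mapping).
const moodAnimation: Record<Mood, string> = {
  calm: "idle",
  warm: "wave",
  confident: "victory",
  focused: "idle",
  playful: "capoeira",
};

interface AvatarState {
  mood: Mood;
  animation: string;
}

// Transition: each LLM response sets a mood; an explicit animation in the
// response overrides the mood's default.
function transition(state: AvatarState, mood: Mood, animation?: string): AvatarState {
  return { mood, animation: animation ?? moodAnimation[mood] };
}
```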

Security

  • Sanitization

    XSS and injection prevention

  • Rate Limiting

    Per IP and session, configurable

  • Content Filter

    Profanity detection

  • Prompt Guard

    Prompt injection prevention
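
Per-IP, per-session limiting of the kind listed above is often a fixed-window counter. This in-memory sketch is an assumption about the approach, not the site's actual implementation:

```typescript
// Minimal fixed-window rate limiter keyed by IP or session id.
// In-memory only; a real deployment might persist or shard this state.

interface Window {
  count: number;
  resetAt: number;
}

class RateLimiter {
  private windows = new Map<string, Window>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed, false if the key is over limit.
  allow(key: string, now = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now >= w.resetAt) {
      // New key or expired window: start a fresh window.
      this.windows.set(key, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    if (w.count < this.limit) {
      w.count++;
      return true;
    }
    return false;
  }
}
```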

Observability

  • Analytics

    Real-time event tracking

  • Metrics

    Latency, tokens, failure rate

  • Sessions

    Per-visitor tracking

  • Export

    Data exportable as CSV

Architectural Decisions

Why SQLite with vector search?

Eliminates the need for an external database like Postgres + pgvector. With sqlite-vec, embedding search runs locally with minimal latency. Ideal for a single-tenant portfolio that needs to be easy to deploy.
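
With sqlite-vec and FTS5, the schema and the nearest-neighbour query might look roughly like this. Table and column names (and the 768-dimension embedding size) are assumptions; the statements are shown as strings so the sketch stands alone without native bindings:

```typescript
// Rough shape of the hybrid-search schema and KNN query. Table/column
// names and the embedding dimension are assumptions, not the real schema.

const schema = `
  CREATE VIRTUAL TABLE IF NOT EXISTS chunk_vectors
    USING vec0(embedding float[768]);   -- sqlite-vec vector index
  CREATE VIRTUAL TABLE IF NOT EXISTS chunk_fts
    USING fts5(content);                -- FTS5 keyword index
`;

// k-nearest-neighbour search over embeddings (sqlite-vec MATCH syntax).
const knnQuery = `
  SELECT rowid, distance
  FROM chunk_vectors
  WHERE embedding MATCH ?
  ORDER BY distance
  LIMIT ?;
`;
```

In a Node runtime these statements would typically be executed through a driver such as better-sqlite3 with the sqlite-vec extension loaded.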

Why multiple AI providers?

Gemini has a generous free tier for embeddings, and Groq offers LLaMA 3.1 with ultra-low latency for chat. If one provider hits a rate limit, the system automatically falls back to the next, with zero downtime for the visitor.
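
The failover described above amounts to trying each provider in order and falling through on errors. This chain is a sketch under that assumption; in practice each function would wrap the Gemini or Groq SDK call:

```typescript
// Hypothetical failover chain: try providers in order, fall through on
// rate limits or other errors, rethrow only if every provider fails.

type ChatFn = (prompt: string) => Promise<string>;

async function chatWithFailover(prompt: string, providers: ChatFn[]): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider(prompt); // first successful provider wins
    } catch (err) {
      lastError = err;               // e.g. HTTP 429: try the next one
    }
  }
  throw lastError ?? new Error("no providers configured");
}
```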

Why structured JSON responses instead of streaming?

The avatar needs to know which animation, mood, and gesture to play BEFORE displaying the response. A complete structured JSON response guarantees the avatar state is coherent with the content, which a partially streamed reply cannot.
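
One way to enforce that contract is to parse the LLM output into a typed payload before anything renders. The field names below are assumptions about the response shape, not the actual schema:

```typescript
// Assumed shape of a structured LLM response: the avatar directives
// arrive with the content, so the stage can react before text is shown.

interface ChatResponse {
  markdown: string;
  animation: string;
  mood: string;
  gesture: string;
}

// Narrowing type guard: reject malformed LLM output instead of rendering it.
function parseChatResponse(raw: string): ChatResponse | null {
  try {
    const data = JSON.parse(raw);
    const ok = ["markdown", "animation", "mood", "gesture"].every(
      (k) => typeof data?.[k] === "string"
    );
    return ok ? (data as ChatResponse) : null;
  } catch {
    return null; // not valid JSON at all
  }
}
```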

Why hybrid RAG (vector + FTS5)?

Vector search captures semantics ('AI experience' finds 'machine learning'), while FTS5 captures exact keyword matches ('React', 'TypeScript'). Combining both yields substantially better recall than either alone.
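
Merging the two ranked result lists is commonly done with reciprocal rank fusion (RRF); whether this portfolio uses RRF or another merge strategy is an assumption:

```typescript
// Reciprocal rank fusion: score(doc) = sum of 1 / (k + rank) over each
// ranked list the doc appears in. k = 60 is the conventional constant;
// documents found by both vector and keyword search rise to the top.

function rrf(vectorHits: string[], keywordHits: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const hits of [vectorHits, keywordHits]) {
    hits.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```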

Try the AI Chat

Ask a question and see this entire architecture in action