Frontend
Next.js 16
App Router with Server Components
React 19
Latest APIs including Server Actions
Three.js + R3F
3D avatar with React Three Fiber
Tailwind CSS 4
Dark glassmorphic theme
An interactive portfolio with RAG chat, a reactive 3D avatar, local vector search, and smart failover across multiple AI providers — all running without an external database server.
User
Asks a question via chat or submits a job description
Next.js API Route
Validates input, sanitizes, applies rate limit
RAG Pipeline
Generates embedding → vector + FTS5 search → builds context
LLM (Gemini/Groq)
Receives context + question → generates structured JSON response
Avatar Stage
Interprets animation, mood, gesture, and focus from response
Rendering
Markdown response + 3D avatar reacting in real time
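The flow above can be sketched as a single API-route handler. All stage names here (`sanitize`, `embed`, `hybridSearch`, `callLLM`) and the stub bodies are illustrative assumptions, not the project's actual functions:

```typescript
// Illustrative sketch of the request flow: sanitize -> embed ->
// hybrid search -> LLM with context -> structured response.

type AvatarState = { animation: string; mood: string; gesture: string };
type ChatResponse = { markdown: string; avatar: AvatarState };

// Stub stages so the pipeline is runnable end to end.
const sanitize = (s: string) => s.replace(/<[^>]*>/g, "").trim();
const embed = async (q: string): Promise<number[]> =>
  Array.from(q).map((c) => c.charCodeAt(0) / 255); // placeholder embedding
const hybridSearch = async (_v: number[], q: string): Promise<string[]> => [
  `Snippet matching "${q}"`,
];
const callLLM = async (context: string[], q: string): Promise<ChatResponse> => ({
  markdown: `**Answer** to "${q}" using ${context.length} snippet(s).`,
  avatar: { animation: "idle", mood: "calm", gesture: "none" },
});

export async function handleChat(rawQuestion: string): Promise<ChatResponse> {
  const question = sanitize(rawQuestion);               // 1. validate + sanitize
  const vector = await embed(question);                 // 2. generate embedding
  const context = await hybridSearch(vector, question); // 3. vector + FTS5 search
  return callLLM(context, question);                    // 4. structured JSON response
}
```

The handler returns the avatar state together with the markdown, so the stage can start animating as soon as the response arrives.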
Google Gemini
Primary chat & embeddings
Groq (LLaMA 3.1)
Free fallback for chat
Cohere
Fallback for embeddings
RAG Pipeline
Retrieval-Augmented Generation
SQLite
Local database, no external server
sqlite-vec
Native vector search
FTS5
Full-text keyword search
WAL Mode
Reads stay concurrent with writes
GLB Model
Exported from Avaturn
11 Animations
FBX: idle, wave, victory, capoeira...
State Machine
Avatar state controller
5 Moods
Calm, warm, confident, focused, playful
Sanitization
XSS and injection prevention
Rate Limiting
Per IP and session, configurable
Content Filter
Profanity detection
Prompt Guard
Prompt injection prevention
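One piece of the security layer, the configurable per-key rate limit, can be sketched as a fixed-window counter keyed by IP or session ID. This is an assumed implementation, not the project's actual limiter:

```typescript
// Sketch of a per-key (IP or session) fixed-window rate limiter.

interface RateLimitOptions {
  windowMs: number; // window length in milliseconds
  max: number;      // max requests per window per key
}

export class RateLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();
  constructor(private opts: RateLimitOptions) {}

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.opts.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now }); // new window
      return true;
    }
    if (entry.count < this.opts.max) {
      entry.count++;
      return true;
    }
    return false; // limit exceeded within the current window
  }
}
```

Running two limiters (one keyed by IP, one by session) gives the "per IP and session" behavior described above.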
Analytics
Real-time event tracking
Metrics
Latency, tokens, failure rate
Sessions
Per-visitor tracking
Export
Data exportable as CSV
Eliminates the need for an external database like Postgres + pgvector. With sqlite-vec, embedding search runs locally with minimal latency. Ideal for a single-tenant portfolio that needs to be easy to deploy.
Gemini has a generous free tier for embeddings, and Groq offers LLaMA 3.1 with ultra-low latency for chat. If one provider hits a rate limit, the system automatically falls back to the next: zero downtime for the visitor.
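The failover described here amounts to trying providers in priority order and falling through on error. A minimal sketch, with a hypothetical `tryProviders` helper (the provider names match the stack; the interface is an assumption):

```typescript
// Sketch of smart failover: try each provider in order, fall through on failure.

type ChatProvider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

export async function tryProviders(
  providers: ChatProvider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return { provider: p.name, text: await p.complete(prompt) };
    } catch (err) {
      lastError = err; // e.g. a 429 rate limit: fall through to the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```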
The avatar needs to know which animation, mood, and gesture to play BEFORE displaying the response. Structured JSON responses keep the avatar state coherent with the content, something partial streaming cannot guarantee.
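Validating the structured response before rendering might look like the sketch below. The exact schema is an assumption built from the fields named above (animation, mood, gesture, focus); the real project may use a schema library instead of hand-rolled checks:

```typescript
// Sketch: validate the LLM's structured JSON so the avatar state is
// known and well-formed before anything is displayed.

interface StructuredReply {
  markdown: string;
  animation: string;
  mood: string;
  gesture: string;
  focus: string;
}

export function parseReply(raw: string): StructuredReply {
  const data = JSON.parse(raw);
  for (const field of ["markdown", "animation", "mood", "gesture", "focus"]) {
    if (typeof data[field] !== "string") {
      throw new Error(`Missing or invalid field: ${field}`);
    }
  }
  return data as StructuredReply;
}
```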
Vector search captures semantics ('AI experience' finds 'machine learning'), while FTS5 captures exact keyword matches ('React', 'TypeScript'). Combining both yields better recall than either search alone.
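The two result lists still have to be merged into one ranking. The project's actual merge strategy isn't specified here; reciprocal rank fusion (RRF) is one common choice, shown as an assumption:

```typescript
// Sketch: merge vector-search and FTS5 result lists with reciprocal
// rank fusion. Documents appearing high in both lists rise to the top.

export function reciprocalRankFusion(
  rankings: string[][], // e.g. [vectorResults, fts5Results], best match first
  k = 60,               // conventional RRF damping constant
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // Each list contributes 1 / (k + rank + 1) to a document's score.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document found by both the semantic and the keyword search accumulates score from both lists, so it outranks documents found by only one.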
Ask a question and see this entire architecture in action