Building a RAG-Powered Chatbot for My Portfolio Site
A technical deep-dive into creating an AI-powered assistant using Retrieval-Augmented Generation
The Problem
I wanted my portfolio site to do more than just display static information. I needed a way for recruiters, potential clients, and collaborators to learn about my work interactively—without me having to answer the same questions repeatedly.
The solution? A RAG (Retrieval-Augmented Generation) chatbot that could answer questions about my experience, skills, and background intelligently.
I also wanted to have some fun by bringing my pups into the action, so you'll notice the chat has two site ambassadors, Gus and Mitch, who are chosen at random.
What is RAG?
Retrieval-Augmented Generation combines two powerful concepts:
- Retrieval: Finding relevant information from a knowledge base
- Generation: Using an LLM to create natural, contextual responses
Instead of fine-tuning an entire language model on my data (expensive and overkill), RAG lets me:
- Store my information in a vector database
- Retrieve relevant chunks when someone asks a question
- Feed those chunks to an LLM for a natural response
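In miniature, the whole loop is just three steps. Here's a sketch of the shape (`embed`, `vectorSearch`, and `generate` are hypothetical stand-ins; the real implementations follow in the steps below):

```javascript
// The RAG loop in miniature. embed, vectorSearch, and generate are
// hypothetical stand-ins for the real implementations shown later.
async function answerQuestion(question) {
  const queryVector = await embed(question);              // question -> vector
  const matches = await vectorSearch(queryVector, 5);     // top-5 similar chunks
  return await generate(question, matches.join('\n\n'));  // LLM answer with context
}
```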
The Tech Stack
Here's what I used to build this:
Core Technologies
- Frontend: Next.js (React framework)
- Vector Database: Pinecone (for storing embeddings)
- Embeddings: OpenAI's `text-embedding-ada-002` (1536 dimensions)
- LLM: OpenAI's GPT-4 (for generating responses)
- Language: TypeScript/JavaScript
Why These Choices?
Pinecone: Managed vector database with excellent performance and simple API. No infrastructure management needed.
OpenAI Embeddings: Best-in-class semantic search. The text-embedding-ada-002 model produces 1536-dimensional vectors that capture meaning exceptionally well.
Next.js: My portfolio site was already built with Next.js, so integrating the chatbot was seamless.
Architecture Overview
User Question
↓
1. Convert question to embedding (OpenAI)
↓
2. Query Pinecone for similar vectors
↓
3. Retrieve relevant knowledge base chunks
↓
4. Send chunks + question to GPT-4
↓
5. Generate natural response
↓
User receives answer

Implementation Steps
Step 1: Creating the Knowledge Base
I started by documenting everything about my professional experience in a structured markdown file:
```markdown
# Ragamuffin Knowledge Base - Rockwell Windsor Rice

## Professional Profile
Name: Rockwell Windsor Rice
Title: Senior Software Engineer
Experience: 10+ years

## Technical Expertise
- Ruby on Rails (10+ years)
- Next.js, React (4+ years)
- AWS, GCP infrastructure
...
```

The key was being comprehensive but structured, covering:
- Technical skills and expertise
- Work history and achievements
- Project details with metrics
- Professional philosophy
- Personal interests (for personality)
Step 2: Setting Up Pinecone
Created a Pinecone index with the correct configuration:
```javascript
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY,
});

// Create index with 1536 dimensions (matches OpenAI embeddings)
await pinecone.createIndex({
  name: 'chatbot-knowledge-base',
  dimension: 1536,
  metric: 'cosine',
  spec: {
    serverless: {
      cloud: 'aws',
      region: 'us-east-1'
    }
  }
});
```

Important: The dimension (1536) must match your embedding model!
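Rather than guessing, you can probe the model once and read the vector length directly. A self-contained sanity-check sketch you can run before creating the index:

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Embed a throwaway string and inspect the vector length
const probe = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: 'dimension probe',
});
console.log(probe.data[0].embedding.length); // 1536 for text-embedding-ada-002
```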
Step 3: Chunking and Embedding the Knowledge Base
Breaking the knowledge base into chunks and converting to embeddings:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Split knowledge base into chunks
function chunkText(text, maxChunkSize = 500) {
  const chunks = [];
  // Split by H2 headers (note: split removes the '## ' prefix from each section)
  const sections = text.split('\n## ');

  sections.forEach(section => {
    if (section.length <= maxChunkSize) {
      chunks.push(section);
    } else {
      // Further split large sections by sentence
      const sentences = section.split('. ');
      let currentChunk = '';

      sentences.forEach(sentence => {
        if ((currentChunk + sentence).length <= maxChunkSize) {
          currentChunk += sentence + '. ';
        } else {
          chunks.push(currentChunk.trim());
          currentChunk = sentence + '. ';
        }
      });

      if (currentChunk) chunks.push(currentChunk.trim());
    }
  });

  return chunks;
}

// Generate embeddings for each chunk
async function embedChunks(chunks) {
  const embeddings = [];

  for (const chunk of chunks) {
    const response = await openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: chunk,
    });

    embeddings.push({
      id: `chunk-${embeddings.length}`,
      values: response.data[0].embedding,
      metadata: { text: chunk } // Store original text for retrieval
    });
  }

  return embeddings;
}
```

Key considerations for chunking:
- Keep chunks semantically meaningful (don't split mid-sentence)
- Include enough context in each chunk
- Store original text in metadata for retrieval
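Wiring the two helpers together looks something like this (a usage sketch; `knowledge-base.md` is a hypothetical path for the file from Step 1):

```javascript
import fs from 'node:fs';

// Read the knowledge base, chunk it, and embed each chunk
const kb = fs.readFileSync('knowledge-base.md', 'utf8'); // hypothetical path
const chunks = chunkText(kb);
const embeddings = await embedChunks(chunks);
console.log(`Prepared ${embeddings.length} chunks for upload`);
```

The upload itself is the next step.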
Step 4: Uploading to Pinecone
```javascript
async function uploadToPinecone(embeddings) {
  const index = pinecone.index('chatbot-knowledge-base');

  // Upload in batches (Pinecone recommends batch sizes of 100)
  const batchSize = 100;
  for (let i = 0; i < embeddings.length; i += batchSize) {
    const batch = embeddings.slice(i, i + batchSize);
    await index.upsert(batch);
  }

  console.log(`Uploaded ${embeddings.length} vectors to Pinecone`);
}
```

Step 5: Building the Query System
Creating the API endpoint to handle user questions:
```javascript
// pages/api/chat.js
import OpenAI from 'openai';
import { Pinecone } from '@pinecone-database/pinecone';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });

export default async function handler(req, res) {
  const { question } = req.body;

  try {
    // 1. Convert question to embedding
    const questionEmbedding = await openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: question,
    });

    // 2. Query Pinecone for similar vectors
    const index = pinecone.index('chatbot-knowledge-base');
    const queryResponse = await index.query({
      vector: questionEmbedding.data[0].embedding,
      topK: 5, // Get top 5 most relevant chunks
      includeMetadata: true,
    });

    // 3. Extract relevant context
    const context = queryResponse.matches
      .map(match => match.metadata.text)
      .join('\n\n');

    // 4. Generate response with GPT-4
    const completion = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [
        {
          role: 'system',
          content: `You are a helpful assistant representing Rockwell Windsor Rice, a senior software engineer. Answer questions based on the following context:\n\n${context}`
        },
        {
          role: 'user',
          content: question
        }
      ],
      temperature: 0.7,
      max_tokens: 500,
    });

    res.status(200).json({
      answer: completion.choices[0].message.content,
    });
  } catch (error) {
    console.error('Chat error:', error);
    res.status(500).json({ error: 'Failed to process question' });
  }
}
```

Step 6: Frontend Integration
Simple chat interface in Next.js:
```jsx
// components/ChatBot.jsx
import { useState } from 'react';

export default function ChatBot() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [loading, setLoading] = useState(false);

  const handleSubmit = async (e) => {
    e.preventDefault();
    if (!input.trim()) return;

    // Add user message
    const userMessage = { role: 'user', content: input };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setLoading(true);

    try {
      // Call API
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ question: input }),
      });
      const data = await response.json();

      // Add assistant response
      const assistantMessage = { role: 'assistant', content: data.answer };
      setMessages(prev => [...prev, assistantMessage]);
    } catch (error) {
      console.error('Error:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="chat-container">
      <div className="messages">
        {messages.map((msg, idx) => (
          <div key={idx} className={`message ${msg.role}`}>
            {msg.content}
          </div>
        ))}
        {loading && <div className="loading">Thinking...</div>}
      </div>
      <form onSubmit={handleSubmit}>
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask me anything about Rockwell's experience..."
        />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```

Challenges & Solutions
Challenge 1: Vector Dimension Mismatch
Problem: Got the error `Vector dimension 1536 does not match the dimension of the index 1024`.
Root cause: Initially created Pinecone index with wrong dimensions.
Solution:
```javascript
// Delete old index
await pinecone.deleteIndex('chatbot-knowledge-base');

// Recreate with correct dimensions (same serverless spec as Step 2)
await pinecone.createIndex({
  name: 'chatbot-knowledge-base',
  dimension: 1536, // Must match embedding model!
  metric: 'cosine',
  spec: {
    serverless: { cloud: 'aws', region: 'us-east-1' }
  }
});
```

Lesson: Always verify your embedding model's output dimensions before creating your vector database index.
Challenge 2: Chunking Strategy
Problem: Initial chunks were too large or too small, leading to poor retrieval.
Solution:
- Split by logical sections (headers)
- Keep chunks between 300-500 tokens
- Ensure each chunk has sufficient context
- Test with various question types
Lesson: Good chunking is critical for RAG performance. Too large = irrelevant info; too small = missing context.
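To keep chunks in that 300-500 token window without pulling in a real tokenizer, the common heuristic of ~4 characters per token is usually close enough. A small sketch (an approximation, not the model's actual tokenization):

```javascript
// Estimate tokens with the ~4 characters per token heuristic
// (an approximation, not the model's real tokenizer)
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Flag chunks outside the 300-500 token window for manual review
function findChunkOutliers(chunks, minTokens = 300, maxTokens = 500) {
  return chunks.filter((chunk) => {
    const tokens = estimateTokens(chunk);
    return tokens < minTokens || tokens > maxTokens;
  });
}
```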
Challenge 3: Response Quality
Problem: Early responses were too generic or missed key details.
Solution:
- Increased `topK` from 3 to 5 (more context)
- Improved system prompt with better instructions
- Added personality to knowledge base (not just dry facts)
- Tuned temperature for balance between creativity and accuracy
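For reference, "better instructions" mostly meant spelling out tone and failure behavior. An illustrative system prompt along those lines (not the exact production prompt; `context` is the retrieved-chunk string from Step 5):

```javascript
const systemPrompt = `You are a friendly assistant representing Rockwell Windsor Rice,
a senior software engineer. Answer using ONLY the context below.
If the context doesn't cover a question, say so instead of guessing.
Be specific: cite projects and metrics from the context when relevant.

Context:
${context}`;
```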
Performance & Cost
Response Times
- Embedding generation: ~200ms
- Pinecone query: ~100ms
- GPT-4 completion: ~2-3 seconds
- Total: ~2.5-3.5 seconds per query
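If you want to reproduce a breakdown like this, wrapping each stage of the Step 5 handler in `console.time` is enough. A sketch (it assumes the `openai` client, `index`, and `question` from that handler):

```javascript
// Rough per-stage timing inside the Step 5 handler (a sketch)
console.time('embedding');
const questionEmbedding = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: question,
});
console.timeEnd('embedding'); // ~200ms

console.time('pinecone');
await index.query({
  vector: questionEmbedding.data[0].embedding,
  topK: 5,
  includeMetadata: true,
});
console.timeEnd('pinecone'); // ~100ms
```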
Costs (Approximate)
- Pinecone: Free tier covers my usage (~1M queries/month)
- OpenAI Embeddings: $0.0001 per 1K tokens (negligible for my traffic)
- GPT-4: ~$0.03-0.06 per conversation (most expensive component)
Monthly estimate: ~$10-20 for moderate traffic
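As a sanity check on the per-conversation figure: at GPT-4 (8K) pricing of $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens, a typical exchange of roughly 1,500 prompt tokens (retrieved context plus question) and 300 completion tokens comes to about $0.045 + $0.018 ≈ $0.06, right at the top of the range above.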
Results & Impact
What Works Well
✅ Answers questions about my experience accurately
✅ Provides specific examples and metrics when asked
✅ Maintains consistent professional tone
✅ Handles follow-up questions naturally
✅ Works 24/7 (no need for me to respond)
Use Cases
- Recruiters: Quick answers about tech stack and experience
- Potential clients: Learn about past projects and capabilities
- Collaborators: Understand my work style and interests
Sample Interactions
Q: "What's Rockwell's experience with Ruby on Rails?" A: "Rockwell has 10+ years of production Ruby on Rails experience, having built and maintained systems serving 100,000+ daily users. He was the sole technical owner for 15+ production applications over 6 years, managing everything from backend development to infrastructure optimization. Notable achievements include reducing AWS costs by 42.5% and maintaining 99.9% uptime for mission-critical systems."
Q: "Has Rockwell done any mentoring?" A: "Yes, Rockwell has extensive mentoring experience. He led instruction for 200+ students across multiple cohorts in Udacity's Front End Developer Nanodegree program, delivering 40+ hours of live technical instruction. He's also onboarded and mentored junior engineers in government contracting environments, conducting code reviews and pair programming sessions."
Key Takeaways
- RAG is powerful for personal knowledge bases: No need to fine-tune expensive models
- Vector databases are essential: Pinecone made this trivial to implement
- Chunking matters: Invest time in getting your chunking strategy right
- Keep costs in mind: GPT-4 calls add up; consider caching common questions (a sketch follows this list)
- Test extensively: Try edge cases and unusual questions
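On the caching point: for a low-traffic portfolio site, even a naive in-memory map keyed on the normalized question helps. A minimal sketch (`answerQuestion` is the hypothetical pipeline wrapper from the "What is RAG?" section; a production setup might prefer an LRU or Redis):

```javascript
// Naive in-memory cache for repeated questions (a sketch).
// answerQuestion is the hypothetical RAG wrapper sketched earlier.
const answerCache = new Map();

async function cachedAnswer(question) {
  const key = question.trim().toLowerCase();
  if (answerCache.has(key)) return answerCache.get(key); // cache hit: no API cost
  const answer = await answerQuestion(question);
  answerCache.set(key, answer);
  return answer;
}
```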
Have questions about building RAG chatbots? Feel free to ask my chatbot—it's literally built for this! 😄