Building a RAG-Powered Chatbot for My Portfolio Site
A technical deep-dive into creating an AI-powered assistant using Retrieval-Augmented Generation
The Problem
I wanted my portfolio site to do more than just display static information. I needed a way for recruiters, potential clients, and collaborators to learn about my work interactively—without me having to answer the same questions repeatedly.
The solution? A RAG (Retrieval-Augmented Generation) chatbot that could answer questions about my experience, skills, and background intelligently.
I also wanted to have some fun by bringing my pups into the action, so you'll notice the chat has two site ambassadors, Gus and Mitch, who are chosen at random.
What is RAG?
Retrieval-Augmented Generation combines two powerful concepts:
- Retrieval: Finding relevant information from a knowledge base
- Generation: Using an LLM to create natural, contextual responses
Instead of fine-tuning an entire language model on my data (expensive and overkill), RAG lets me:
- Store my information in a vector database
- Retrieve relevant chunks when someone asks a question
- Feed those chunks to an LLM for a natural response
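In miniature, the whole loop is just three steps. Here's a sketch of the shape (`embed`, `vectorSearch`, and `generate` are hypothetical stand-ins; the real implementations follow in the steps below):

```javascript
// The RAG loop in miniature. embed, vectorSearch, and generate are
// hypothetical stand-ins for the real implementations shown later.
async function answerQuestion(question) {
  const queryVector = await embed(question);              // question -> vector
  const matches = await vectorSearch(queryVector, 5);     // top-5 similar chunks
  return await generate(question, matches.join('\n\n'));  // LLM answer with context
}
```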
The Tech Stack
Here's what I used to build this:
Core Technologies
- Frontend: Next.js (React framework)
- Vector Database: Pinecone (for storing embeddings)
- Embeddings: OpenAI's `text-embedding-ada-002` (1536 dimensions)
- LLM: OpenAI's GPT-4 (for generating responses)
- Language: TypeScript/JavaScript
Why These Choices?
Pinecone: Managed vector database with excellent performance and simple API. No infrastructure management needed.
OpenAI Embeddings: Best-in-class semantic search. The text-embedding-ada-002 model produces 1536-dimensional vectors that capture meaning exceptionally well.
Next.js: My portfolio site was already built with Next.js, so integrating the chatbot was seamless.
Architecture Overview
User Question
↓
1. Convert question to embedding (OpenAI)
↓
2. Query Pinecone for similar vectors
↓
3. Retrieve relevant knowledge base chunks
↓
4. Send chunks + question to GPT-4
↓
5. Generate natural response
↓
User receives answer

Implementation Steps
Step 1: Creating the Knowledge Base
I started by documenting everything about my professional experience in a structured markdown file:
```markdown
# Ragamuffin Knowledge Base - Rockwell Windsor Rice

## Professional Profile
Name: Rockwell Windsor Rice
Title: Senior Software Engineer
Experience: 10+ years

## Technical Expertise
- Ruby on Rails (10+ years)
- Next.js, React (4+ years)
- AWS, GCP infrastructure
...
```

The key was being comprehensive but structured, covering:
- Technical skills and expertise
- Work history and achievements
- Project details with metrics
- Professional philosophy
- Personal interests (for personality)
Step 2: Setting Up Pinecone
Created a Pinecone index with the correct configuration:
```javascript
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY,
});

// Create index with 1536 dimensions (matches OpenAI embeddings)
await pinecone.createIndex({
  name: 'chatbot-knowledge-base',
  dimension: 1536,
  metric: 'cosine',
  spec: {
    serverless: {
      cloud: 'aws',
      region: 'us-east-1'
    }
  }
});
```

Important: The dimension (1536) must match your embedding model!
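Rather than guessing, you can probe the model once and read the vector length directly. A self-contained sanity-check sketch you can run before creating the index:

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Embed a throwaway string and inspect the vector length
const probe = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: 'dimension probe',
});
console.log(probe.data[0].embedding.length); // 1536 for text-embedding-ada-002
```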
Step 3: Chunking and Embedding the Knowledge Base
Breaking the knowledge base into chunks and converting to embeddings:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Split knowledge base into chunks
function chunkText(text, maxChunkSize = 500) {
  const chunks = [];
  // Split by H2 headers (note: split removes the '## ' prefix from each section)
  const sections = text.split('\n## ');

  sections.forEach(section => {
    if (section.length <= maxChunkSize) {
      chunks.push(section);
    } else {
      // Further split large sections by sentence
      const sentences = section.split('. ');
      let currentChunk = '';

      sentences.forEach(sentence => {
        if ((currentChunk + sentence).length <= maxChunkSize) {
          currentChunk += sentence + '. ';
        } else {
          chunks.push(currentChunk.trim());
          currentChunk = sentence + '. ';
        }
      });

      if (currentChunk) chunks.push(currentChunk.trim());
    }
  });

  return chunks;
}

// Generate embeddings for each chunk
async function embedChunks(chunks) {
  const embeddings = [];

  for (const chunk of chunks) {
    const response = await openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: chunk,
    });

    embeddings.push({
      id: `chunk-${embeddings.length}`,
      values: response.data[0].embedding,
      metadata: { text: chunk } // Store original text for retrieval
    });
  }

  return embeddings;
}
```

Key considerations for chunking:
- Keep chunks semantically meaningful (don't split mid-sentence)
- Include enough context in each chunk
- Store original text in metadata for retrieval
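Wiring the two helpers together looks something like this (a usage sketch; `knowledge-base.md` is a hypothetical path for the file from Step 1):

```javascript
import fs from 'node:fs';

// Read the knowledge base, chunk it, and embed each chunk
const kb = fs.readFileSync('knowledge-base.md', 'utf8'); // hypothetical path
const chunks = chunkText(kb);
const embeddings = await embedChunks(chunks);
console.log(`Prepared ${embeddings.length} chunks for upload`);
```

The upload itself is the next step.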
Step 4: Uploading to Pinecone
```javascript
async function uploadToPinecone(embeddings) {
  const index = pinecone.index('chatbot-knowledge-base');

  // Upload in batches (Pinecone recommends batch sizes of 100)
  const batchSize = 100;
  for (let i = 0; i < embeddings.length; i += batchSize) {
    const batch = embeddings.slice(i, i + batchSize);
    await index.upsert(batch);
  }

  console.log(`Uploaded ${embeddings.length} vectors to Pinecone`);
}
```

Step 5: Building the Query System
Creating the API endpoint to handle user questions:
```javascript
// pages/api/chat.js
import OpenAI from 'openai';
import { Pinecone } from '@pinecone-database/pinecone';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });

export default async function handler(req, res) {
  const { question } = req.body;

  try {
    // 1. Convert question to embedding
    const questionEmbedding = await openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: question,
    });

    // 2. Query Pinecone for similar vectors
    const index = pinecone.index('chatbot-knowledge-base');
    const queryResponse = await index.query({
      vector: questionEmbedding.data[0].embedding,
      topK: 5, // Get top 5 most relevant chunks
      includeMetadata: true,
    });

    // 3. Extract relevant context
    const context = queryResponse.matches
      .map(match => match.metadata.text)
      .join('\n\n');

    // 4. Generate response with GPT-4
    const completion = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [
        {
          role: 'system',
          content: `You are a helpful assistant representing Rockwell Windsor Rice, a senior software engineer. Answer questions based on the following context:\n\n${context}`
        },
        {
          role: 'user',
          content: question
        }
      ],
      temperature: 0.7,
      max_tokens: 500,
    });

    res.status(200).json({
      answer: completion.choices[0].message.content,
    });
  } catch (error) {
    console.error('Chat error:', error);
    res.status(500).json({ error: 'Failed to process question' });
  }
}
```

Step 6: Frontend Integration
Simple chat interface in Next.js:
```jsx
// components/ChatBot.jsx
import { useState } from 'react';

export default function ChatBot() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [loading, setLoading] = useState(false);

  const handleSubmit = async (e) => {
    e.preventDefault();
    if (!input.trim()) return;

    // Add user message
    const userMessage = { role: 'user', content: input };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setLoading(true);

    try {
      // Call API
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ question: input }),
      });
      const data = await response.json();

      // Add assistant response
      const assistantMessage = { role: 'assistant', content: data.answer };
      setMessages(prev => [...prev, assistantMessage]);
    } catch (error) {
      console.error('Error:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="chat-container">
      <div className="messages">
        {messages.map((msg, idx) => (
          <div key={idx} className={`message ${msg.role}`}>
            {msg.content}
          </div>
        ))}
        {loading && <div className="loading">Thinking...</div>}
      </div>
      <form onSubmit={handleSubmit}>
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask me anything about Rockwell's experience..."
        />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```

Challenges & Solutions
Challenge 1: Vector Dimension Mismatch
Problem: Got the error `Vector dimension 1536 does not match the dimension of the index 1024`.
Root cause: Initially created Pinecone index with wrong dimensions.
Solution:
```javascript
// Delete old index
await pinecone.deleteIndex('chatbot-knowledge-base');

// Recreate with correct dimensions (same serverless spec as Step 2)
await pinecone.createIndex({
  name: 'chatbot-knowledge-base',
  dimension: 1536, // Must match embedding model!
  metric: 'cosine',
  spec: {
    serverless: { cloud: 'aws', region: 'us-east-1' }
  }
});
```

Lesson: Always verify your embedding model's output dimensions before creating your vector database index.
Challenge 2: Chunking Strategy
Problem: Initial chunks were too large or too small, leading to poor retrieval.
Solution:
- Split by logical sections (headers)
- Keep chunks between 300-500 tokens
- Ensure each chunk has sufficient context
- Test with various question types
Lesson: Good chunking is critical for RAG performance. Too large = irrelevant info; too small = missing context.
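To keep chunks in that 300-500 token window without pulling in a real tokenizer, the common heuristic of ~4 characters per token is usually close enough. A small sketch (an approximation, not the model's actual tokenization):

```javascript
// Estimate tokens with the ~4 characters per token heuristic
// (an approximation, not the model's real tokenizer)
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Flag chunks outside the 300-500 token window for manual review
function findChunkOutliers(chunks, minTokens = 300, maxTokens = 500) {
  return chunks.filter((chunk) => {
    const tokens = estimateTokens(chunk);
    return tokens < minTokens || tokens > maxTokens;
  });
}
```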
Challenge 3: Response Quality
Problem: Early responses were too generic or missed key details.
Solution:
- Increased `topK` from 3 to 5 (more context)
- Improved system prompt with better instructions
- Added personality to knowledge base (not just dry facts)
- Tuned temperature for balance between creativity and accuracy
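For reference, "better instructions" mostly meant spelling out tone and failure behavior. An illustrative system prompt along those lines (not the exact production prompt; `context` is the retrieved-chunk string from Step 5):

```javascript
const systemPrompt = `You are a friendly assistant representing Rockwell Windsor Rice,
a senior software engineer. Answer using ONLY the context below.
If the context doesn't cover a question, say so instead of guessing.
Be specific: cite projects and metrics from the context when relevant.

Context:
${context}`;
```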
Performance & Cost
Response Times
- Embedding generation: ~200ms
- Pinecone query: ~100ms
- GPT-4 completion: ~2-3 seconds
- Total: ~2.5-3.5 seconds per query
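If you want to reproduce a breakdown like this, wrapping each stage of the Step 5 handler in `console.time` is enough. A sketch (it assumes the `openai` client, `index`, and `question` from that handler):

```javascript
// Rough per-stage timing inside the Step 5 handler (a sketch)
console.time('embedding');
const questionEmbedding = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: question,
});
console.timeEnd('embedding'); // ~200ms

console.time('pinecone');
await index.query({
  vector: questionEmbedding.data[0].embedding,
  topK: 5,
  includeMetadata: true,
});
console.timeEnd('pinecone'); // ~100ms
```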
Costs (Approximate)
- Pinecone: Free tier covers my usage (~1M queries/month)
- OpenAI Embeddings: $0.0001 per 1K tokens (negligible for my traffic)
- GPT-4: ~$0.03-0.06 per conversation (most expensive component)
Monthly estimate: ~$10-20 for moderate traffic
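As a sanity check on the per-conversation figure: at GPT-4 (8K) pricing of $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens, a typical exchange of roughly 1,500 prompt tokens (retrieved context plus question) and 300 completion tokens comes to about $0.045 + $0.018 ≈ $0.06, right at the top of the range above.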
Results & Impact
What Works Well
✅ Answers questions about my experience accurately
✅ Provides specific examples and metrics when asked
✅ Maintains consistent professional tone
✅ Handles follow-up questions naturally
✅ Works 24/7 (no need for me to respond)
Use Cases
- Recruiters: Quick answers about tech stack and experience
- Potential clients: Learn about past projects and capabilities
- Collaborators: Understand my work style and interests
Sample Interactions
Q: "What's Rockwell's experience with Ruby on Rails?" A: "Rockwell has 10+ years of production Ruby on Rails experience, having built and maintained systems serving 100,000+ daily users. He was the sole technical owner for 15+ production applications over 6 years, managing everything from backend development to infrastructure optimization. Notable achievements include reducing AWS costs by 42.5% and maintaining 99.9% uptime for mission-critical systems."
Q: "Has Rockwell done any mentoring?" A: "Yes, Rockwell has extensive mentoring experience. He led instruction for 200+ students across multiple cohorts in Udacity's Front End Developer Nanodegree program, delivering 40+ hours of live technical instruction. He's also onboarded and mentored junior engineers in government contracting environments, conducting code reviews and pair programming sessions."
Key Takeaways
- RAG is powerful for personal knowledge bases: No need to fine-tune expensive models
- Vector databases are essential: Pinecone made this trivial to implement
- Chunking matters: Invest time in getting your chunking strategy right
- Keep costs in mind: GPT-4 calls add up; consider caching common questions (a sketch follows this list)
- Test extensively: Try edge cases and unusual questions
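On the caching point: for a low-traffic portfolio site, even a naive in-memory map keyed on the normalized question helps. A minimal sketch (`answerQuestion` is the hypothetical pipeline wrapper from the "What is RAG?" section; a production setup might prefer an LRU or Redis):

```javascript
// Naive in-memory cache for repeated questions (a sketch).
// answerQuestion is the hypothetical RAG wrapper sketched earlier.
const answerCache = new Map();

async function cachedAnswer(question) {
  const key = question.trim().toLowerCase();
  if (answerCache.has(key)) return answerCache.get(key); // cache hit: no API cost
  const answer = await answerQuestion(question);
  answerCache.set(key, answer);
  return answer;
}
```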
Have questions about building RAG chatbots? Feel free to ask my chatbot—it's literally built for this! 😄