Building a Customer Support AI Chatbot from a Knowledge Base
To build a customer support chatbot that answers questions using your company's proprietary knowledge base, use a Retrieval-Augmented Generation (RAG) architecture. This approach retrieves relevant documents from your database first, then feeds them to a Large Language Model (LLM) as context to generate an accurate, grounded response.
Architecture Overview
The system consists of three main components:
- Knowledge Base: Your raw support documents, FAQs, or markdown files.
- Vector Database: A database (like ChromaDB or Pinecone) that stores vector embeddings of your document chunks to enable semantic search.
- LLM (OpenAI GPT): The generative model that synthesizes the retrieved context into a natural, conversational support response.
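The way these three components hand data to each other can be sketched without any external service. The example below is a toy illustration only: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the in-memory `docs` list stands in for the vector database, and `build_prompt` stops short of calling an LLM.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs):
    """Return the document most similar to the query (the 'retrieval' step)."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def build_prompt(query, context):
    """Combine retrieved context and the question (the 'augmentation' step)."""
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds take 5-10 business days to process.",
    "Support hours are Monday through Friday.",
]
question = "How long do refunds take?"
context = retrieve(question, docs)
prompt = build_prompt(question, context)  # this string would be sent to the LLM
```

A real embedding model captures meaning rather than exact word overlap, which is why the steps below use OpenAI embeddings and ChromaDB instead of word counts.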
Step 1: Environment Setup
Install the required Python libraries. We will use chromadb as our vector database and openai for embeddings and text generation.
pip install openai chromadb
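The scripts in the following steps pass the API key as a string literal. A common alternative is to keep the key in an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`, which is the variable the `openai` client reads by default when no `api_key` argument is given; `load_api_key` is a hypothetical helper name, not part of any library.

```python
import os

def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key
```

With this helper, the client could be created as `OpenAI(api_key=load_api_key())` so the key never appears in source control.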
Step 2: Prepare and Index the Knowledge Base
This script takes a small in-memory sample of support documents, generates a vector embedding for each one with OpenAI's text-embedding-3-small model, and stores the results in a local, persistent ChromaDB collection.
import chromadb
from openai import OpenAI
# Initialize clients
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection(name="kb_collection")
# Sample knowledge base data
knowledge_base = [
    {"id": "kb_01", "text": "To request a refund, navigate to Settings > Billing and click 'Request Refund'. Refunds take 5-10 business days to process."},
    {"id": "kb_02", "text": "Our support hours are Monday through Friday, 9:00 AM to 5:00 PM EST. We are closed on weekends and major holidays."},
    {"id": "kb_03", "text": "The Basic plan costs $19/month. The Pro plan costs $49/month and includes API access and priority support."}
]
def get_embedding(text):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding
# Index documents
for doc in knowledge_base:
    embedding = get_embedding(doc["text"])
    collection.add(
        embeddings=[embedding],
        documents=[doc["text"]],
        ids=[doc["id"]]
    )
print("Knowledge base indexed successfully.")
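The indexing loop above makes one embedding request per document. For larger knowledge bases, the embeddings endpoint also accepts a list of strings, so the same work can be done in one request. The following is a minimal sketch under that assumption: `embed_batch` and `index_documents` are hypothetical helper names, and `upsert` is used in place of `add` so that re-running the script overwrites existing IDs rather than attempting to add them again.

```python
def embed_batch(client, texts, model="text-embedding-3-small"):
    """Embed several texts with a single API request."""
    response = client.embeddings.create(model=model, input=texts)
    # Sort by index so the embeddings line up with the input order.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]

def index_documents(client, collection, docs):
    """Embed all documents in one request and store them in one call."""
    texts = [doc["text"] for doc in docs]
    collection.upsert(
        ids=[doc["id"] for doc in docs],
        documents=texts,
        embeddings=embed_batch(client, texts),
    )
```

Calling `index_documents(client, collection, knowledge_base)` would replace the loop. Very large document sets would still need to be split into several batches to stay within the API's per-request limits.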
Step 3: Build the Retrieval and Generation Pipeline
This function embeds the user query, retrieves the single most relevant document chunk from the vector database (n_results=1), and asks GPT-4o-mini to draft the final response based strictly on that context.
def query_chatbot(user_query):
    # 1. Generate embedding for the user query
    query_vector = get_embedding(user_query)
    # 2. Query the vector database for the closest match
    results = collection.query(
        query_embeddings=[query_vector],
        n_results=1
    )
    # Extract the retrieved context
    if results['documents'] and len(results['documents'][0]) > 0:
        context = results['documents'][0][0]
    else:
        context = "No relevant information found."
    # 3. Construct the prompt with system instructions and retrieved context
    system_prompt = (
        "You are a precise customer support assistant. Answer the user's question using ONLY the provided context. "
        "If the context does not contain the answer, say 'I am sorry, but I do not have that information in my knowledge base.' "
        "Do not make up or assume any facts.\n\n"
        f"Context:\n{context}"
    )
    # 4. Generate response from the LLM
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query}
        ],
        temperature=0.0  # Kept at 0.0 to minimize creative hallucinations
    )
    return response.choices[0].message.content
# Example Execution
user_question = "How long does a refund take?"
answer = query_chatbot(user_question)
print(f"Q: {user_question}\nA: {answer}")
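Because `query_chatbot` always asks for the single nearest chunk, it receives a result even when nothing in the knowledge base is related to the question. One possible refinement is to request a few candidates and keep only those within a distance cutoff. The sketch below works on the dictionary that `collection.query()` returns; `select_context` is a hypothetical helper, and the default `max_distance=1.0` is an arbitrary placeholder that would need tuning, since the distance scale depends on the metric the collection uses.

```python
def select_context(results, max_distance=1.0, separator="\n---\n"):
    """Join retrieved chunks whose distance is within max_distance.

    `results` is the dictionary returned by collection.query() for a
    single query. Returns None when no chunk is close enough.
    """
    documents = (results.get("documents") or [[]])[0]
    distances = (results.get("distances") or [[]])[0]
    kept = [doc for doc, dist in zip(documents, distances) if dist <= max_distance]
    return separator.join(kept) if kept else None
```

A caller could query with `n_results=3`, pass the result to `select_context`, and fall back to the "No relevant information found." context when it returns `None`.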
Limitations and Production Considerations
- Hallucinations: Setting the LLM temperature to 0.0 and using strict system prompts reduce, but do not completely eliminate, the risk of the model generating incorrect information.
- Context Window Limits: If your documents are large, you must implement a text-splitting strategy (e.g., recursive character splitting) to chunk documents into manageable sizes (e.g., 500-character segments) before indexing.
- Data Privacy: Sending queries and document chunks to OpenAI transmits data to external servers. If you handle highly sensitive or regulated data, consider self-hosting an open-source model (like Llama 3) and running embeddings locally.
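The text-splitting step mentioned under Context Window Limits can be approximated in a few lines. This is a simplified sketch, not a full recursive character splitter: it cuts fixed-size windows, prefers to break at a space, and overlaps consecutive chunks. The `split_text` name and the `overlap=50` default are assumptions made for illustration; the 500-character size follows the example figure above.

```python
def split_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share up to `overlap` characters so that text
    near a boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer to break at the last space in the window, if there is one
            # far enough along that the next chunk still moves forward.
            space = text.rfind(" ", start + overlap + 1, end)
            if space != -1:
                end = space
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        start = end - overlap
    return [chunk for chunk in chunks if chunk]
```

Each chunk would then be indexed as its own entry, for example with IDs such as `kb_01_0`, `kb_01_1`, so that retrieval returns the relevant passage rather than an entire document.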