How to Build a Chatbot That Answers From Your Company Documents
To build a chatbot that answers questions using your company's proprietary documents, you need to implement Retrieval-Augmented Generation (RAG). This architecture extracts relevant information from your documents and passes it to a Large Language Model (LLM) to generate accurate, context-aware answers without retraining the model.
Prerequisites
Install the required Python libraries. This setup uses LangChain, FAISS (a vector similarity search library that runs locally), and OpenAI:
pip install langchain langchain-openai langchain-community faiss-cpu tiktoken
Set your OpenAI API key as an environment variable in your terminal:
export OPENAI_API_KEY="your-api-key-here"
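It can be convenient to check for the key at the top of the script, so a missing variable is reported before any documents are loaded. A minimal sketch, assuming a hypothetical helper named require_api_key (it is not part of LangChain or the OpenAI SDK):

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    require_api_key is a hypothetical helper written for this guide.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell before running the chatbot."
        )
    return key
```

Calling require_api_key() once at startup is enough; the LangChain OpenAI classes read the same environment variable on their own.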
Step 1: Load and Split Your Documents
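Place your company documents into a local directory named docs, in the same folder as your script. The example code in this guide loads plain-text (.txt) files only; PDFs, Markdown, and other formats need a different loader or glob pattern. Large documents must be split into smaller chunks so that the retrieved passages fit within the LLM's context window and retrieval stays precise.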
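To see what chunk size and chunk overlap mean in practice, here is a simplified sketch of fixed-size chunking in plain Python. The function chunk_text is a hypothetical illustration, not LangChain's implementation: LangChain's splitters also try to break on separators such as blank lines instead of cutting mid-sentence.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Each chunk starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters of made-up sample text
for chunk in chunk_text(sample, chunk_size=12, chunk_overlap=4):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.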
Step 2: Create Embeddings and Store in a Vector Database
Convert the text chunks into vector embeddings using OpenAI's embedding model, then store them in a local FAISS vector database for fast similarity searching.
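The idea behind similarity search can be sketched without any API calls. The example below is a toy under stated assumptions: toy_embed is a stand-in bag-of-words vector rather than a real embedding model, the sample chunks are made up, and cosine similarity is used for illustration (the distance metric FAISS applies may differ).

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Made-up chunks for illustration only.
chunks = [
    "employees may work remotely up to three days per week",
    "expense reports are due by the fifth of each month",
    "the office is closed on public holidays",
]
print(top_k("how many days can employees work remotely", chunks, k=1))
```

A real embedding model places texts with similar meaning close together even when they share no words, which is what this word-count stand-in cannot do.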
Step 3: Build the Retrieval Chain
Set up the retrieval pipeline. When a user asks a question, the system searches the vector database for the most relevant document chunks and passes them alongside the user's prompt to the LLM.
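Conceptually, passing chunks "alongside the user's prompt" means pasting the retrieved text into the prompt ahead of the question. A minimal sketch of that step, assuming a hypothetical build_prompt helper and an illustrative prompt wording (neither is a LangChain API):

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Concatenate retrieved chunks into a context block ahead of the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up chunks standing in for the retriever's output.
print(build_prompt(
    "How many remote days are allowed?",
    ["Employees may work remotely up to three days per week.",
     "Remote days must be agreed with the team lead."],
))
```

Because every retrieved chunk is pasted in, the number of chunks and their size together determine how much of the context window each question consumes.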
Complete Python Implementation
Save the following code as chatbot.py and run it to query your documents:
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
# 1. Load documents from a local directory
# Assumes you have a folder named 'docs' containing text files
loader = DirectoryLoader("./docs", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()
# 2. Split documents into manageable chunks
# RecursiveCharacterTextSplitter falls back to smaller separators, so chunks
# stay near chunk_size even when a file contains no blank lines.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)
# 3. Embed chunks and store in local FAISS vector database
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(docs, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
# 4. Define the LLM and the system prompt
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know.\n\n"
    "{context}"
)
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])
# 5. Create the retrieval-augmented generation (RAG) chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
# 6. Query the chatbot
query = "What is our company's remote work policy?"
response = rag_chain.invoke({"input": query})
print("Answer:", response["answer"])
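Besides the answer, the dictionary returned by the retrieval chain carries the retrieved documents under a context key, and TextLoader records each file path in the document metadata under source. That makes it possible to show users where an answer came from. A minimal sketch, assuming a hypothetical list_sources helper; the response below is a stand-in with made-up file paths so the example runs without an API call:

```python
from types import SimpleNamespace


def list_sources(response: dict) -> list[str]:
    """Collect the distinct source paths of the retrieved chunks, in order."""
    sources = []
    for doc in response.get("context", []):
        source = doc.metadata.get("source", "unknown")
        if source not in sources:
            sources.append(source)
    return sources


# Stand-in for a real chain response; the paths are hypothetical.
fake_response = {
    "answer": "...",
    "context": [
        SimpleNamespace(metadata={"source": "docs/hr/remote_work.txt"}),
        SimpleNamespace(metadata={"source": "docs/hr/remote_work.txt"}),
        SimpleNamespace(metadata={"source": "docs/hr/benefits.txt"}),
    ],
}
print(list_sources(fake_response))
```

In chatbot.py, the same helper could be called as list_sources(response) right after the answer is printed.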
Key Production Considerations
- Data Privacy: Sending documents to OpenAI's API means your data leaves your local infrastructure. If you handle highly sensitive or regulated data, replace ChatOpenAI with a local LLM (like Llama 3 via Ollama) and use a local embedding model.
- Document Formats: For PDFs, replace TextLoader with PyPDFLoader (requires pip install pypdf) and adjust the glob pattern to match. For Microsoft Word files, use Docx2txtLoader.
- Vector Database Scaling: FAISS runs in-memory and is ideal for prototyping or small document sets. For millions of documents, migrate to a dedicated vector database such as Pinecone, Milvus, or Qdrant.