GuardLabs · Technical note

Build a Chatbot That Answers Questions from a PDF Manual

To build a chatbot that answers questions from a PDF manual, use a Retrieval-Augmented Generation (RAG) architecture. The pipeline extracts text from the PDF, splits it into searchable chunks, stores their embeddings in a vector database, and has a Large Language Model (LLM) answer each user query from the retrieved passages rather than from its general training data.

Prerequisites and Environment Setup

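Install the required Python libraries. This implementation uses LangChain for orchestration, Chroma as the vector database, PyPDF for parsing, and OpenAI for embeddings and generation. The import paths used in this note follow the pre-1.0 LangChain package layout; if an import such as langchain.chains fails on a newer release, pin an older LangChain version or adjust the import paths.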

pip install langchain langchain-community langchain-openai langchain-text-splitters chromadb pypdf tiktoken

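Set your OpenAI API key as an environment variable before running the script. For a quick local test you can set it from Python as shown here, but never commit a real key to source control: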

import os
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
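
If the key is exported in the shell instead of being written into the script, a small check at startup can fail early with a clear message. The following is a minimal sketch; the helper name require_api_key is hypothetical and not part of LangChain or the OpenAI SDK.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key found in `env`, or raise with a clear message.

    `require_api_key` is a hypothetical helper for illustration only.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script.")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

Calling this once before building the chain avoids a less obvious authentication error later, when the first embedding request is sent.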

Step-by-Step Implementation

Place your PDF manual (e.g., manual.pdf) in your project directory. The script below loads the document, splits it into chunks that can be retrieved individually and that fit within the LLM's context window, generates vector embeddings, and runs a sample query.

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# 1. Load the PDF manual
loader = PyPDFLoader("manual.pdf")
docs = loader.load()

# 2. Split text into chunks with overlap to maintain context
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# 3. Embed chunks and store them in an in-memory vector database
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 4. Define the LLM and system prompt
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise.\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

# 5. Create the retrieval chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# 6. Query the chatbot
response = rag_chain.invoke({"input": "How do I perform a factory reset?"})
print("Answer:", response["answer"])
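
To make steps 2 and 3 less opaque, here is a dependency-free sketch of the two ideas behind them: overlapping chunks and top-k similarity search. It is a simplification under stated assumptions, not what the libraries do internally. RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and spaces rather than at fixed offsets, and the word-count vectors below are only a toy stand-in for OpenAIEmbeddings. All function names (split_with_overlap, embed, cosine, top_k) are hypothetical.

```python
import math
import re
from collections import Counter


def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size character windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=3):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


if __name__ == "__main__":
    manual = [
        "Hold the reset button for ten seconds to perform a factory reset.",
        "Replace the filter every six months.",
        "The warranty covers parts for two years.",
    ]
    print(top_k("How do I perform a factory reset?", manual, k=1))
```

The retriever in the script plays the role of top_k here: it embeds the question, ranks the stored chunks by similarity, and passes the best k of them to the LLM as {context}.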

Limitations and Production Considerations

While the basic RAG pipeline works for simple text-based PDFs, moving this system to production requires addressing several structural limitations:

Complex Layouts: Basic PDF loaders like PyPDF struggle with multi-column pages, embedded tables, and images. For highly visual manuals, use advanced parsers like LlamaParse or Azure Document Intelligence to convert tables into Markdown format before chunking.
Hallucinations: LLMs can still hallucinate if the answer is missing from the manual. To mitigate this, enforce strict prompt constraints (e.g., "Do not use external knowledge; rely only on the provided context").
Database Persistence: The code above uses an in-memory Chroma database that clears when the script stops. For production, persist the vector database to disk or use a managed cloud vector database like Pinecone, Qdrant, or Milvus.
Chunking Strategy: Fixed-size chunking can split critical sentences in half. Consider semantic chunking or parent-document retrieval to ensure the LLM receives complete, coherent context.
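
As a first step toward the chunking point above, the following sketch packs whole sentences into chunks so that no sentence is cut in half. This is plain sentence-boundary chunking, which is simpler than semantic chunking or parent-document retrieval, and it assumes sentences end in ".", "!" or "?" followed by whitespace. The function name split_on_sentences and the max_chars parameter are hypothetical.

```python
import re


def split_on_sentences(text, max_chars=1000):
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept intact as its own
    oversized chunk rather than being cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Press the button. Wait ten seconds. Release it."
    for chunk in split_on_sentences(sample, max_chars=36):
        print(repr(chunk))
```

Abbreviations and decimal numbers can confuse a regex-based sentence splitter like this one, so a real manual may need a proper sentence tokenizer before this approach is reliable.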


Published 2026-06-23
