Python RAG PDF Chat using Google Gemini – Retrieval Augmented Generation Guide Python

Haris Bin Nasir Avatar

·

·

Learn how to build a PDF chatbot with Python using Gemini API, LangChain, and ChromaDB. This tutorial walks you through creating a chatbot that can interact with and extract information from PDF documents.

If you prefer watching a video tutorial here is a link to that.

Prerequisites

Before we start, ensure you have the following libraries installed for building a PDF chatbot with Python:

  • Gemini API: A generative AI model by Google, useful for creating conversational AI.
  • LangChain: A library for building applications with LLMs through composability.
  • ChromaDB: An open-source vector database for embedding-based search.
  • PyPDF: A PDF processing library for Python.

You can install these libraries using pip:

pip install google-generativeai langchain-community chromadb pypdf

Setting Up the Environment

First, we need to import necessary libraries and set up the API key for Gemini to build our PDF chatbot with Python.

import os
import signal
import sys
import google.generativeai as genai
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

Getting the Gemini API Key

To interact with Google’s Gemini API, you need an API key. Here’s how to get it:

  • Visit the Google Gemini API page and apply for an API key.
  • Google offers a generous free tier with unlimited requests, limited to 60 requests per minute. For more than 60 requests per minute, you might need to opt for a pay-as-you-go plan.

Steps to create an API key:

  • On the Google AI for Developers page, click on the “Learn more about the Gemini API” button.
  • Next, click on “Get API key in Google AI Studio”.
  • Click on the “Get API Key” button positioned on the top left of the screen
  • Click on the “Create API” button and select a project to generate an API.
GEMINI_API_KEY = "your_api_key_here"

Handling Signals

To gracefully handle interruptions, we’ll set up a signal handler.

def signal_handler(sig, frame):
    print('\nThanks for using Gemini. :)')
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

Generating RAG Prompts

We need to create a function that generates prompts for our Gemini model using the query and context for our PDF chatbot with Python.

def generate_rag_prompt(query, context):
    escaped = context.replace("'", "").replace('"', "").replace("\n", " ")
    prompt = ("""
    You are a helpful and informative bot that answers questions using text from the reference context included below. 
    Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. 
    However, you are talking to a non-technical audience, so be sure to break down complicated concepts and 
    strike a friendly and conversational tone. 
    If the context is irrelevant to the answer, you may ignore it.
                QUESTION: '{query}'
                CONTEXT: '{context}'
              
              ANSWER:
              """).format(query=query, context=context)
    return prompt

Fetching Relevant Context

To provide meaningful answers, we’ll fetch the relevant context from our ChromaDB database.

def get_relevant_context_from_db(query):
    context = ""
    embedding_function = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vector_db = Chroma(persist_directory="./chroma_db_nccn", embedding_function=embedding_function)
    search_results = vector_db.similarity_search(query, k=6)
    for result in search_results:
        context += result.page_content + "\n"
    return context

Generating Answers with Gemini

Using the Gemini API, we’ll generate answers based on the provided prompts.

def generate_answer(prompt):
    genai.configure(api_key=GEMINI_API_KEY)
    model = genai.GenerativeModel(model_name='gemini-pro')
    answer = model.generate_content(prompt)
    return answer.text

Welcome Message

Generate a welcome message to introduce the PDF chatbot.

welcome_text = generate_answer("Can you quickly introduce yourself")
print(welcome_text)

Main Chat Loop

Create a loop to continuously accept queries and provide answers.

while True:
    print("-----------------------------------------------------------------------\n")
    print("What would you like to ask?")
    query = input("Query: ")
    context = get_relevant_context_from_db(query)
    prompt = generate_rag_prompt(query=query, context=context)
    answer = generate_answer(prompt=prompt)
    print(answer)

Generating Embeddings

Now, let’s generate embeddings for the PDF documents. This will help us in fetching relevant context.

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

loaders = [PyPDFLoader('./report.pdf')]

docs = []

for file in loaders:
    docs.extend(file.load())

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(docs)
embedding_function = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2", model_kwargs={'device': 'cpu'})

vectorstore = Chroma.from_documents(docs, embedding_function, persist_directory="./chroma_db_nccn")

print(vectorstore._collection.count())

Running the PDF Chatbot Application

To run the PDF chatbot application, execute the script. You will be prompted to input your queries, and the bot will respond with answers based on the text extracted from the PDF.

Get Source Code for free:

Conclusion

Building a PDF chatbot with Python using Gemini API, LangChain, and ChromaDB allows you to extract meaningful insights and interact with the content of any PDF document. By following the steps outlined in this tutorial, you can create a powerful and user-friendly chatbot that can handle various queries and provide comprehensive responses based on the text within the PDF. This project not only demonstrates the capabilities of these technologies but also opens up new possibilities for automating document analysis and enhancing user interaction with text-based data.

Happy Coding….!!!

Leave a Reply

Your email address will not be published. Required fields are marked *