Create a RAG Agent with LangGraph to Extract the information from a PDF File

Ferry Djaja
6 min readSep 23, 2024

In this blog, we will build a simple agent to extract the information from a PDF file with LangGraph. We will be using GPT-4o to extract the text, images and other layout information from PDFs and create a tool in LangGraph to extract the information. PDFs frequently feature multimedia elements like images and tables, alongside complex layouts. Our objective is to design an extraction agent that can efficiently retrieve any relevant data from these visually rich documents. Let’s get started !

First we will import the required libraries:

from io import BytesIO
import pypdfium2 as pdfium
import backoff
import asyncio
import json
import os
import base64
from PIL import Image
import operator

from typing import Annotated, Sequence, TypedDict, Literal

from openai import OpenAIError
from openai import AsyncOpenAI, OpenAI

from langchain_openai import ChatOpenAI
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.messages import AnyMessage, BaseMessage, HumanMessage, SystemMessage
from langchain.tools.retriever import create_retriever_tool
from langchain_community.vectorstores import FAISS

from langgraph.graph.message import add_messages
from langgraph.graph import START, END, StateGraph…

--

--

Responses (1)