Extract Information from Non-English PDFs Using GPT-4o and LangGraph
6 min readOct 20, 2024
In this blog post, I want to show you how to get information from PDF files that have content in languages other than English. While it’s usually easy to extract data from English documents, this article goes further by showing how to extract data from non-English documents, focusing on those in Burmese and other languages. P.S. I am not familiar with Burmese languages.
To get information from PDF files, we’ll use GPT-4o from OpenAI. First, we’ll convert the PDF pages into images like I did in my previous blogs here.