Receipt Parsing Using OCR and a Large Language Model

Ferry Djaja
10 min read · Sep 29, 2023

In this tutorial, I will go through how I leverage OCR to capture data from receipts and then use a Large Language Model (LLM) to extract pertinent details such as the total amount, the date and time of the receipt, and other relevant information.

To perform OCR, I will utilize the docTR tool from Mindee as outlined below.

To retrieve the information from the receipt, I will use Azure’s OpenAI capabilities.

Construct the OCR Output Data

Let’s begin by installing docTR and the necessary libraries on your machine. I will not go through the installation in detail, as you can find comprehensive instructions in the provided Git repository.
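For reference, a typical installation with the PyTorch backend looks like the snippet below; consult the repository for the exact command matching your environment.

```shell
# Install docTR with the PyTorch backend (matching USE_TORCH in the code below);
# use "python-doctr[tf]" instead if you prefer TensorFlow.
pip install "python-doctr[torch]" matplotlib
```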

Let’s verify that the installation was successful by executing the code below with the provided receipt image in JPEG format.

import os
import json

# Let's pick the desired backend
# os.environ['USE_TF'] = '1'
os.environ['USE_TORCH'] = '1'

import matplotlib.pyplot as plt

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Read the file
doc = DocumentFile.from_images("receipt.jpg")
print(f"Number of pages: {len(doc)}")

If there is no error, you will get this output:

Number of pages: 1

Let’s proceed with the instantiation of a pre-trained model.

# Instantiate a pretrained model
predictor = ocr_predictor(pretrained=True)

Export the output in JSON format.

result = predictor(doc)

# JSON export
json_export = result.export()
print(json_export)
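The exported dict nests pages → blocks → lines → words, where each word carries a "value" key. Before handing the result to the LLM, it helps to flatten this structure into plain text, one detected line per row. Here is a minimal sketch; the helper name and the sample dict are mine for illustration, not actual output from the receipt.

```python
# Flatten a docTR JSON export (pages -> blocks -> lines -> words)
# into plain text, one detected text line per output line.
def export_to_text(json_export):
    out = []
    for page in json_export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [w["value"] for w in line.get("words", [])]
                if words:
                    out.append(" ".join(words))
    return "\n".join(out)

# Illustrative sample mimicking the docTR export schema
sample_export = {
    "pages": [{
        "blocks": [{
            "lines": [
                {"words": [{"value": "TOTAL"}, {"value": "12.50"}]},
                {"words": [{"value": "29/09/2023"}, {"value": "14:05"}]},
            ]
        }]
    }]
}

print(export_to_text(sample_export))
```

This plain-text form is what we will later embed in the prompt sent to the LLM.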
