Table Extraction from Images and Information Retrieval using Deep Learning and a Large Language Model

Ferry Djaja
Oct 10, 2023

In this tutorial, I will walk you through extracting tables from images and retrieving their line items. We will use a Deep Learning model to detect the table, OCR to read its text, and a Large Language Model (LLM) to extract the line items from that text.

To extract the items within the table, we will perform the following steps:

  • Utilize Deep Learning for table detection and extraction.
  • Employ Deep Learning-based OCR and a Large Language Model for table line item extraction.

Utilize Deep Learning for Table Detection and Extraction

A while back, I wrote an article on extracting table data using RetinaNet with Keras. That approach required me to annotate the tables and then train the RetinaNet model myself.

To reach the same goal with less effort and better performance, this time I'll use the YOLOv8 table extraction model from Hugging Face.

Install the libraries.

!pip install ultralyticsplus==0.0.23 ultralytics==8.0.21
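Before running the detector, it may help to see what the confidence and IoU thresholds configured in the next step control. The sketch below is a simplified, pure-Python illustration of confidence filtering followed by greedy non-maximum suppression (NMS). It is not the code Ultralytics runs internally, and the names `iou` and `simple_nms` are hypothetical helpers made up for this example.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates (assumed layout for this sketch).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes: List[Box], scores: List[float],
               conf: float = 0.25, iou_thr: float = 0.45) -> List[int]:
    """Return indices of the boxes that survive filtering and greedy NMS."""
    # 1. Drop detections whose score is below the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s >= conf]
    # 2. Visit the remaining boxes from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # 3. Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300), (0, 0, 10, 10)]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(simple_nms(boxes, scores))
```

In this toy input, the second box overlaps the first heavily and the fourth has a score below 0.25, so only two detections remain. Raising the confidence threshold discards more weak detections, while raising the IoU threshold lets more overlapping boxes through.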

Load the model, configure its parameters, then pass in the image and run inference.

from ultralyticsplus import YOLO, render_result

# load model
model = YOLO('keremberke/yolov8m-table-extraction')

# set model parameters
model.overrides['conf'] = 0.25 # NMS confidence threshold
model.overrides['iou'] = 0.45 # NMS IoU threshold
model.overrides['agnostic_nms'] = False # NMS class-agnostic
model.overrides['max_det'] = 1000 # maximum number of detections per image

# set image
image = '/content/table.png'

# perform inference
results = model.predict(image)

# observe results
render =…