A pipeline for turning digital collections into structured data: an LLM-assisted, IIIF-native tool for getting started with sources such as digitized print directories.
Updated Jun 16, 2026 - Python
Some basic data and text extraction from the New York City Directories
Turn old city directory scans into searchable data. An automated pipeline handles column detection, OCR processing, and accuracy evaluation for historical document digitization.
Tulsa City Directories for 1921 and 1922