LLMs Redefine Quality Panel Creation from Diverse Table Scans: Pipeline and Evaluation Using State-Level Early Car Adoption Tables
Multimodal LLMs are a breakthrough in converting historical tables into usable data. Currently, researchers must either digitize tables manually, which is time-consuming, or build specialized deep learning systems, which requires technical skills. LLMs let researchers apply their domain expertise through plain-English instructions instead of complex code, and adapt methods easily to particular document sets. The paper demonstrates that an LLM-based pipeline produces highly accurate results, validated against human-processed data as a reference point. Tested on vehicle registration records, the method is 100× cheaper than outsourcing while reducing errors from 40% to 0.3%. The results match the quality of human-validated data, making historical economic research more accessible to non-technical experts.
Authors:
Ina Ganguli, Jeffrey Lin, Vitaly Meursault, Nicholas Reynolds
Status:
Working Paper
Updated:
Aug 2024