-
From Diverse Table Scans to Cohesive Datasets with LLMs
Multimodal LLMs do surprisingly well at processing scans of historical tables almost from start to finish. They especially shine when we need to combine tables from various sources with varying table headers for similar concepts, like car ownership. LLMs are not the "best" method for any given case, but they make creating cohesive historical datasets much easier.
Authors:
Ina Ganguli, Jeffrey Lin, Vitaly Meursault, Nicholas Reynolds
Status:
Slides
Updated:
Aug 2024
-
Patent Text and Long-Run Innovation Dynamics: The Critical Role of Model Selection
Text-based measures in economic research can be highly sensitive to model choice, potentially leading to contradictory conclusions. We demonstrate that domain-specific validation for model selection is critical for reliable analysis of technological change and innovation dynamics. As NLP models become increasingly powerful and accessible to economists, we can and should spend more time on selection and validation.
Authors:
Ina Ganguli, Jeffrey Lin, Vitaly Meursault, Nicholas Reynolds
Status:
Working paper coming soon
Updated:
Aug 2024
-
PEAD.txt: Post-Earnings-Announcement Drift Using Text
Post-earnings-announcement drift (PEAD) is one of the best-known anomalies in finance: buy stocks with positive earnings surprises and sell stocks with negative earnings surprises, and you will keep making money on the drift. We show that you can generate a much larger drift (PEAD.txt) without using the earnings number at all, relying on the text of earnings calls instead. There is much more to earnings calls than the earnings number.
Authors:
Vitaly Meursault, Pierre Jinghong Liang, Bryan R. Routledge, Madeline Marco Scanlon
Status:
Published at JFQA (2023)
Updated:
Sep 2023