-
From Diverse Table Scans to Cohesive Datasets with LLMs
Multimodal LLMs do surprisingly well at processing scans of historical tables almost from start to finish. They especially shine when we need to combine tables from various sources whose headers describe similar concepts, like car ownership, in different ways. LLMs may not be the "best" method for any single case, but they make creating cohesive historical datasets much easier.
Authors:
Ina Ganguli, Jeffrey Lin, Vitaly Meursault, Nicholas Reynolds
Status:
Slides
Updated:
Aug 2024
-
Patent Text and Long-Run Innovation Dynamics: The Critical Role of Model Selection
Text-based measures in economic research can be highly sensitive to model choice, potentially leading to contradictory conclusions. We demonstrate that domain-specific validation for model selection is critical for reliable analysis of technological change and innovation dynamics. As NLP models become increasingly powerful and accessible to economists, we can and should spend more time on selection and validation.
Authors:
Ina Ganguli, Jeffrey Lin, Vitaly Meursault, Nicholas Reynolds
Status:
Working paper coming soon (click "Read paper" for abstract)
Updated:
Aug 2024
-
One Threshold Doesn’t Fit All: Tailoring Machine Learning Predictions of Consumer Default for Lower-Income Areas
Machine learning (ML) models can help increase access to credit in lower-income areas if their introduction is paired with "fairness constraints," which are conceptually similar to the familiar Special Purpose Credit Programs (SPCP). Doing this at scale would require rethinking the protected attribute blindness requirements of the policy.
Authors:
Vitaly Meursault, Daniel Moulton, Larry Santucci, Nathan Schor
Status:
Accepted at JPAM
Updated:
May 2024
-
Operationalizing the Search for Less Discriminatory Alternatives in Fair Lending
The less discriminatory alternative (LDA) is a key legal provision of US fair lending law. It requires lenders to adopt models that reduce disparate impact when doing so does not compromise their business interests. Systematically searching for such LDA models is quite challenging, however. Here, we show how a complex mixed-integer programming algorithm allows us to set up the problem in a direct and intuitive way.
Authors:
Talia Gillis, Vitaly Meursault, Berk Ustun
Status:
Published at FAccT (2024)
Updated:
Jun 2024
-
PEAD.txt: Post-Earnings-Announcement Drift Using Text
Post-earnings-announcement drift (PEAD) is one of the best-known anomalies in finance: buy stocks with positive earnings surprises and sell stocks with negative earnings surprises, and you will keep making money on the drift. We show that you can generate a much larger drift (PEAD.txt) without even using the earnings number, using the text of earnings calls instead. There is much more to earnings calls than the earnings number.
Authors:
Vitaly Meursault, Pierre Jinghong Liang, Bryan R. Routledge, Madeline Marco Scanlon
Status:
Published at JFQA (2023)
Updated:
Sep 2023