CITP Seminar: Decomposing Wage Gaps with a Foundation Model of Labor History

Date & Time: Apr 26, 2024, 12:30 PM - 1:30 PM
Location: Bendheim House
Speaker: Keyon Vafa
Audience: Restricted to Princeton University

Co-sponsored by CITP and Princeton Language and Intelligence

In-person attendance is open to Princeton University faculty, staff, students and alumni. Lunch will be available at noon.

This talk is also available via Zoom.

Social scientists frequently perform statistical decompositions of wage gaps, attributing group differences in wages to group differences in worker characteristics. Since the survey datasets used to estimate these decompositions are small, the included characteristics are typically low-dimensional, e.g., summary statistics about job history. These low-dimensional summaries risk inducing biased estimates. To mitigate this bias, we adapt machine learning methods to summarize worker histories with rich, low-dimensional representations that are learned from data. We take a “foundation model” approach, first training representations on a dataset of passively collected resumes before fine-tuning them on the small survey datasets used for wage gap estimation. We discuss an omitted variable bias that can arise in this setting and propose a fine-tuning approach to minimize it. On data from the Panel Study of Income Dynamics, we show that full worker history explains a substantial portion of wage gaps that are unexplained by standard econometric techniques.

With collaborators Susan Athey and David Blei.


Keyon Vafa is a postdoctoral fellow at Harvard University as part of the Harvard Data Science Initiative. His research focuses on developing machine learning methods to help answer economic questions and also using insights from economics to improve machine learning models. He completed his Ph.D. in computer science at Columbia University in 2023, where he was advised by David Blei. During his Ph.D. he was an NSF GRFP Fellow and Cheung-Kong Innovation Doctoral Fellow. He is a member of the Early Career Board of the Harvard Data Science Review.

Contributions to and/or sponsorship of any event do not constitute departmental or institutional endorsement of the specific program, speakers or views presented.

If you need an accommodation for a disability, please contact Jean Butcher at least one week before the event.