Using Zero-Shot LLM-Generated Survey Data for Geographically Explicit Population Synthesis

ArXiv CS.AI | 2026-05-28 | Authors: Taylor Anderson, Sara Von Hoene, Orhan Yagizer Cinar, Emma Von Hoene, Amira Roess, Andrew Crooks, Hamdi Kavak

Abstract

arXiv:2605.27401v1

There is growing interest in using synthetic populations for a diverse range of applications. At the same time, artificial intelligence is expanding rapidly in all walks of life. This paper evaluates whether zero-shot large language model (LLM)-generated health survey data can serve as input to a conventional iterative proportional fitting (IPF) workflow for geographically explicit population synthesis. Using the 2023 Behavioral Risk Factor Surveillance System (BRFSS), we generate synthetic survey records for the U.S. states of Colorado and Mississippi with GPT-4.1 and Gemini-2.5-Pro. We use the generated data in an IPF-based synthesis pipeline and evaluate the resulting census tract-level synthetic populations against external benchmarks. Results show that both LLMs capture several major state-level contrasts, indicating that zero-shot generation produces geographically differentiated survey data.
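The abstract does not describe how the IPF step is implemented, so the following is only a minimal sketch of generic two-dimensional IPF, not the authors' pipeline. It assumes a seed cross-tabulation (for example, one built from survey records) with strictly positive cells and two sets of marginal targets that sum to the same total. The function name `ipf`, the variable names, and all numbers are hypothetical and chosen purely for illustration.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-8):
    """Rescale a 2-D seed table until its margins match the targets.

    Assumes every seed cell is positive and that both target vectors
    sum to the same total; otherwise the procedure may not converge.
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)

    for _ in range(max_iter):
        # Scale each row so the row sums match the row targets.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Scale each column so the column sums match the column targets.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns now match exactly; stop once the rows also match.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table


# Hypothetical seed: counts of survey records by age group (rows) and sex (columns).
seed = np.array([[30.0, 20.0],
                 [25.0, 25.0]])
# Hypothetical marginal totals for one census tract.
age_targets = np.array([600.0, 400.0])
sex_targets = np.array([480.0, 520.0])

fitted = ipf(seed, age_targets, sex_targets)
print(fitted.round(1))
```

In a sketch like this, the seed table carries the joint structure of the survey data while the targets carry the local marginal totals, so the quality of the seed (here, hypothetically, LLM-generated records) determines how attributes are associated within each fitted tract.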