Large language models (LLMs) are increasingly used for structured information extraction, but their tendency toward over-generation becomes a critical failure mode when the target schema is large relative to any single input. In this talk, we present a system architecture that addresses this challenge through three interlocking design principles: retrieval-based candidate narrowing, online LLM routing to domain-specialized experts, and agreement-gated ensembling with targeted adjudication. Rather than exposing a single model to the full output space, we decompose the problem into concept selection and value extraction, treating the former as a retrieval-and-verification task and the latter as constrained generation over schema partitions. We discuss how this pipeline emerged iteratively from error analysis, which revealed concept over-selection as the dominant source of error. The system was developed and evaluated in the context of the MediQA-SYNUR 2026 shared task on extracting structured clinical observations from nursing dictation transcripts, where it ranked first on the official leaderboard. However, the architectural patterns generalize to any setting where LLMs must produce structured outputs against a large, heterogeneous ontology.
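To make the two core patterns concrete, here is a minimal, purely illustrative sketch of retrieval-based candidate narrowing and agreement-gated ensembling. All function names, the toy retrieval heuristic, and the adjudication rule are hypothetical stand-ins for the actual system components described in the talk.

```python
# Illustrative sketch only: the real system uses learned retrieval and
# LLM-based experts; these stand-ins just show the control flow.

def retrieve_candidates(transcript, ontology, top_k=20):
    """Narrow a large ontology to a short candidate list for one input.

    Stand-in heuristic: keep concepts whose name appears in the transcript.
    """
    text = transcript.lower()
    return [concept for concept in ontology if concept.lower() in text][:top_k]

def agreement_gate(pred_a, pred_b, adjudicate):
    """Merge two experts' concept->value predictions.

    Values both experts agree on are accepted directly; disagreements are
    routed to a targeted adjudicator instead of being resolved by default.
    """
    merged = {}
    for concept in sorted(set(pred_a) | set(pred_b)):
        value_a = pred_a.get(concept)
        value_b = pred_b.get(concept)
        if value_a == value_b and value_a is not None:
            merged[concept] = value_a                       # consensus: accept
        else:
            merged[concept] = adjudicate(concept, value_a, value_b)
    return merged

# Toy demonstration with a mock adjudicator that prefers the first expert.
ontology = ["heart rate", "pain", "blood pressure", "respiratory rate"]
transcript = "Heart rate 72, pain rated 3 out of 10."
candidates = retrieve_candidates(transcript, ontology)

expert_a = {"heart rate": "72", "pain": "3/10"}
expert_b = {"heart rate": "72", "pain": "4/10"}
result = agreement_gate(expert_a, expert_b,
                        adjudicate=lambda c, va, vb: va if va is not None else vb)
```

In this sketch, "heart rate" passes the agreement gate untouched, while the conflicting "pain" values trigger adjudication; only the narrowed candidate list, not the full ontology, would ever be shown to a value-extraction model.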
Sy Hwang is a doctoral student and developer at the Penn Institute for Biomedical Informatics, specializing in natural language processing, deep learning, machine learning, and big data processing. He received his BS from Purdue University and MS from the University of New Haven. Prior to joining the Penn Institute for Biomedical Informatics, he co-founded two startups in the San Francisco Bay Area, including a consultancy specializing in deep learning-based solutions for early-stage startups. Outside of work, he enjoys cooking, traveling, playing the guitar, and participating in hackathons.