Large language models are increasingly used in interactive question-answering systems for finding scientific and consumer health information. Yet in many real-world cases, a clear answer may not exist. In this talk, I examine cases where models lack sufficient knowledge to answer, and why they often answer anyway. Drawing on examples from consumer health QA, I show how models struggle when evidence is missing or poorly matched to the question, and how choices such as answer format or prompt framing can induce overconfidence. I support these observations with our empirical findings characterizing LLM refusal behavior in science and medical QA, which show that refusal performance varies substantially with question type and with the contextual information provided to or available to the model. Unfortunately, refusal is rarely represented explicitly in existing benchmarks, and it is rarely assessed alongside its interactional outcomes: how users act on refusals or expressions of uncertainty in model responses. Based on our prior work in meta-evaluation and user-centered studies of AI-mediated plain language communication, I close with a discussion of how to address this evaluation gap.
| Time | Activity |
| --- | --- |
| 11:45am - 12:15pm | Food and community socializing. |
| 12:15pm - 1:15pm | Presentation with Q&A. Available hybrid via Zoom. |
| 1:30pm - 2:15pm | Student meeting with the speaker, held in the same room following the talk. |
Lucy Lu Wang is an Assistant Professor at the University of Washington Information School, where she leads the Language Accessibility Research (LARCH) lab. She holds adjunct appointments in the Paul G. Allen School of Computer Science & Engineering, the Department of Biomedical Informatics & Medical Education, and the Department of Human Centered Design & Engineering at the University of Washington, and is a Research Scientist at the Allen Institute for AI (Ai2). Her work spans scholarly document understanding, document accessibility, scientific evidence synthesis, and health communication. She focuses on developing language technologies that improve access to and understanding of information in high-expertise domains such as science and healthcare, with an emphasis on dataset development and evaluation practices. Her work on supplement interaction detection, document accessibility, and academic publishing trends has been featured in media outlets such as GeekWire, Boing Boing, Axios, VentureBeat, and the New York Times.