Compositionality is a fundamental characteristic of both human vision and natural language. It allows us to recognize new scenes and understand new sentences as compositions of previously seen atoms (e.g., objects in images or words in a sentence). Although scholars have spent decades injecting compositional priors into machine learning models, these priors have fallen away with the recent rise of large-scale models trained on internet-scale data. In this talk, I will first formalize the notion of compositionality for vision and language by drawing on cognitive science literature. With this formalization, we evaluate whether today’s best models (including GPT-4V and Gemini) are compositional, uncovering that they perform close to random chance. Next, we will draw on additional priors from neuroscience and cognitive science experiments on human subjects to suggest architectural changes and training algorithms that encourage the emergence of compositionality. We will then use the same formalism to evaluate generative models, embodied AI, and tool usage, showing that they too are not compositional and demonstrating mechanisms to improve them. Finally, we will show that compositionality can also be used to identify cultural and linguistic biases in datasets and model behaviors, and suggest how cultural diversity might be one way to improve models.
11:45am - 12:15pm: Food and community socializing.
12:15pm - 1:15pm: Presentation with Q&A. Available hybrid via Zoom.
1:15pm - 2:00pm: Student meeting with speaker, held in CSE2/Gates 371.
Ranjay Krishna is an Assistant Professor at the Paul G. Allen School of Computer Science & Engineering. His research lies at the intersection of computer vision and human-computer interaction. His work has received best paper awards, outstanding paper awards, and oral presentations at CVPR, ACL, CSCW, NeurIPS, UIST, and ECCV, and has been covered by Science, Forbes, the Wall Street Journal, and PBS NOVA. His research has been supported by Google, Amazon, Toyota Motor, Sony, Cisco, Toyota Research Institute, NSF, ONR, and Yahoo. He holds a bachelor’s degree in Electrical & Computer Engineering and in Computer Science from Cornell University, and a master’s degree and a Ph.D. in Computer Science from Stanford University.