This transcript has been edited for clarity.
Hi, my name is Cecilia S. Lee, MD, MS, from the University of Washington in Seattle, and I had the privilege of presenting this talk at the American Glaucoma Society meeting in February 2026. The talk focused on how artificial intelligence (AI) may make a difference in the surgical care of glaucoma patients.
I believe one of the major challenges in caring for glaucoma patients today is the wide variability in disease progression. [Slide 1, 0:45] We are unable to reliably predict who will be a fast progressor, so the timing of surgery is inevitably reactive rather than proactive. In other words, we often decide to proceed with surgery only after vision loss or visual field loss has already occurred.
The first project I presented was our initial study [Slide 2, 1:10], in which we extracted more than 1.7 million perimetry points from over 4,000 patients, representing approximately 32,000 Humphrey visual fields from the University of Washington. Our goal was to predict what a future Humphrey visual field would look like based on a single baseline visual field.
We used time as the undeniable ground truth: the model took a single visual field as input and was trained to predict what that field would look like 2.5 years later, 3.3 years later, and so on. Not surprisingly, the model performed less well as the time interval between the baseline visual field and the future prediction increased. [Slide 3, 2:03] However, the study showed that even from a single visual field, a deep learning model was able to predict what the future Humphrey visual field might look like.
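To make that training setup concrete, the sketch below shows one way such a time-conditioned model could be wired up. The study's actual architecture and data pipeline are not described in this talk, so everything here is an assumption: the network name (VFNet), the convolutional design, the 8 × 9 grid used to hold a 24-2 field, and the L1 loss are all illustrative stand-ins, not the authors' implementation.

```python
# Hedged sketch of a time-conditioned visual field prediction model.
# All names, shapes, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

class VFNet(nn.Module):
    """Predict a future Humphrey visual field from a single baseline field
    plus the elapsed time, using the later measurement as the label."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        # 64 channels over an assumed 8x9 grid, plus 1 scalar for elapsed time
        self.head = nn.Sequential(
            nn.Linear(64 * 8 * 9 + 1, 256), nn.ReLU(),
            nn.Linear(256, 8 * 9),
        )

    def forward(self, baseline_vf, dt_years):
        # baseline_vf: (batch, 1, 8, 9) sensitivity map in dB
        # dt_years:    (batch, 1) interval between baseline and target field
        z = self.encoder(baseline_vf)
        z = torch.cat([z, dt_years], dim=1)
        return self.head(z).view(-1, 1, 8, 9)

model = VFNet()
loss_fn = nn.L1Loss()  # pointwise dB error between predicted and actual field
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(baseline_vf, future_vf, dt_years):
    optimizer.zero_grad()
    pred = model(baseline_vf, dt_years)
    loss = loss_fn(pred, future_vf)  # "time as ground truth": the label is
    loss.backward()                  # the field actually measured dt_years
    optimizer.step()                 # after the baseline field
    return loss.item()

# Example: predict 2.5 and 3.3 years ahead from two baseline fields
baseline = torch.randn(2, 1, 8, 9)
future = torch.randn(2, 1, 8, 9)
dt = torch.tensor([[2.5], [3.3]])
print(train_step(baseline, future, dt))
```

The key idea is simply that the supervision signal is the field actually measured dt_years after baseline, so the passage of time itself defines the ground truth.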
The second project I presented aimed to mirror what clinicians do in practice. We often merge multimodal information, such as disc photos, cup-to-disc ratio, OCT, and visual fields, to decide whether a patient is stable or progressing. This work focused on developing a policy-driven multimodal model in which information from disc photos and OCT is fused to predict the visual field, essentially imitating how clinicians make decisions.
We showed that when disc photos and OCT are fused, and when each modality is allowed to contribute to visual field prediction based on its performance level, the resulting prediction policy network performs much better than relying on a single modality alone. [Slide 4, 3:35] One nice thing about this is that the model can also indicate whether a prediction was based primarily on disc photography or OCT, which allows clinicians to better understand where the prediction is coming from. This ability improves interpretability and supports the use of model outputs in patient management.
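For readers who want a feel for how such a fusion policy might be structured, here is a hedged sketch of one possible design. The encoders, dimensions, and the FusionPolicyNet name are hypothetical stand-ins rather than the authors' implementation; the point is that the learned gate weights both drive the fusion and expose which modality dominated a given prediction.

```python
# Hedged sketch of policy-style multimodal fusion; all names and sizes are
# illustrative assumptions. Per-modality features would come from pretrained
# image encoders; here they are stand-in 512-dimensional vectors.
import torch
import torch.nn as nn

class FusionPolicyNet(nn.Module):
    def __init__(self, embed_dim=128, vf_points=54):
        super().__init__()
        self.photo_enc = nn.Linear(512, embed_dim)  # disc-photo embedding
        self.oct_enc = nn.Linear(512, embed_dim)    # OCT embedding
        # The gating head scores each modality; softmax turns the scores
        # into fusion weights that sum to 1 per sample.
        self.gate = nn.Linear(2 * embed_dim, 2)
        self.predictor = nn.Linear(embed_dim, vf_points)

    def forward(self, photo_feat, oct_feat):
        p = self.photo_enc(photo_feat)
        o = self.oct_enc(oct_feat)
        w = torch.softmax(self.gate(torch.cat([p, o], dim=1)), dim=1)
        fused = w[:, :1] * p + w[:, 1:] * o  # performance-weighted fusion
        vf_pred = self.predictor(fused)
        # Returning w makes the modality contributions inspectable.
        return vf_pred, w

net = FusionPolicyNet()
photo_feat = torch.randn(4, 512)  # e.g., features from a disc-photo encoder
oct_feat = torch.randn(4, 512)    # e.g., features from an OCT encoder
vf_pred, weights = net(photo_feat, oct_feat)
print(weights)  # per-sample [photo, OCT] contribution weights
```

In a design along these lines, a clinician reviewing a prediction could read the weights directly, for example [0.8, 0.2], to see that the disc photo drove that particular output.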
The talk also highlighted the need for large longitudinal multimodal data sets to train progression models of this kind. Initiatives focused on building harmonized, AI-ready data sets at scale will be critical to enabling this framework. These data sets will allow us to train multimodal prediction models that can guide treatment decisions earlier and more accurately and improve clinical outcomes. GP