Vision Language Model for Clinical Decision Support System

Authors

  • Mohan Bhandari, Samriddhi College
  • Smriti KC, Padmakanya Multiple Campus
  • Anam Afaq, Nepal College of Information Technology
  • Loveleen Gaur, University of the South Pacific, Suva

DOI:

https://doi.org/10.65091/icicset.v2i1.11

Abstract

In assisted reproductive technology, the selection of competent embryos
ready for transfer is commonly based on the Gardner grading system, a
blastocyst assessment that focuses primarily on morphology but is
subject to considerable inter- and intra-observer variability. This
variability can undermine the consistency and reliability of in-vitro
fertilization (IVF), where embryo classification is used to judge
viability and improve the probability of pregnancy. To address this
problem, we present the application of Bootstrapping Language-Image
Pre-training (BLIP) to blastocyst grading under the Gardner system, and
derive a simple classification scheme from the predicted grades. From a
dataset of 249 day-5 human blastocyst images with Gardner grades, 204
images were used to fine-tune BLIP, which frames the grading task as
medical image captioning. The number of unique grades in the dataset
determined the number of model outputs, which exceeded 10. The average
training loss was 0.1010, and on the test set the fine-tuned model
achieved a Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
score of 0.7391, a Hamming accuracy of 0.8913, and a Metric for
Evaluation of Translation with Explicit ORdering (METEOR) score of
0.3696. These results demonstrate that a fine-tuned vision-language
model can accurately capture complex morphological features in
blastocyst images and predict their assigned grade classification.
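The caption-level metrics reported above can be illustrated with a minimal sketch. The implementations below (an LCS-based ROUGE-L F1 and a token-position Hamming accuracy) and the example grade strings are simplified assumptions for illustration, not the authors' exact evaluation pipeline.

```python
# Sketch of two caption-evaluation metrics: ROUGE-L (LCS-based F1)
# and Hamming accuracy over aligned token positions.
# The grade captions below are hypothetical, not the paper's data.

def lcs_len(a, b):
    """Classic dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(pred: str, ref: str) -> float:
    """ROUGE-L F1 between a predicted and a reference caption."""
    p, r = pred.split(), ref.split()
    lcs = lcs_len(p, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(p), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

def hamming_accuracy(pred: str, ref: str) -> float:
    """Fraction of aligned token positions that match exactly."""
    p, r = pred.split(), ref.split()
    n = max(len(p), len(r))
    return sum(x == y for x, y in zip(p, r)) / n

pred = "blastocyst grade 4 AA"
ref = "blastocyst grade 4 AB"
print(round(rouge_l(pred, ref), 4))           # 0.75
print(round(hamming_accuracy(pred, ref), 4))  # 0.75
```

In practice, such metrics would be averaged over all test captions; the paper's reported scores are dataset-level aggregates.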

Published

2025-12-23

How to Cite

[1]
M. Bhandari, S. KC, A. Afaq, and L. Gaur, “Vision Language Model for Clinical Decision Support System”, ICICSET2025, vol. 2, no. 1, Dec. 2025.