Vision Language Model for Clinical Decision Support System
DOI: https://doi.org/10.65091/icicset.v2i1.11

Abstract
In assisted reproductive technology, the selection of competent
embryos for transfer is commonly based on the Gardner grading system,
a morphological assessment of blastocysts that is subject to
substantial inter- and intra-observer variability. This variability
can undermine the consistency and reliability of the in-vitro
fertilization (IVF) process, in which embryo classification is used to
judge viability and improve the probability of pregnancy. To
address this problem, we present the application of Bootstrapping
Language-Image Pre-training (BLIP) to blastocyst grading under the
Gardner system, from which a simple classification scheme is proposed.
Of 249 day-5 human blastocyst images with Gardner grades, 204 were
used to fine-tune BLIP, which frames the grading task as medical image
captioning. The number of unique grades, and hence of possible model
outputs, was greater than 10. The average training loss was 0.1010,
and the fine-tuned model achieved a Recall-Oriented Understudy for
Gisting Evaluation (ROUGE) score of 0.7391, a Hamming accuracy of
0.8913, and a Metric for Evaluation of Translation with Explicit
ORdering (METEOR) score of 0.3696 on the test set. These results
demonstrate that a fine-tuned vision-language model can capture the
complex morphological features of blastocyst images and accurately
predict their assigned grade classifications.
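
The abstract frames Gardner grading as medical image captioning. The
following is a minimal sketch of that setup, not the authors' code: it
assumes a HuggingFace transformers BLIP captioning checkpoint and
hypothetical image/grade pairs such as ("blastocyst_001.png", "4AA"),
none of which come from the paper.

    import torch
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained(
        "Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base")
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # Hypothetical training pairs: a day-5 blastocyst image and its
    # Gardner grade rendered as a short caption.
    train_pairs = [("blastocyst_001.png", "4AA"),
                   ("blastocyst_002.png", "3BB")]

    model.train()
    for image_path, grade in train_pairs:
        image = Image.open(image_path).convert("RGB")
        # The grade caption's tokens double as labels, so the model
        # learns to generate the grade text from the image.
        inputs = processor(images=image, text=grade, return_tensors="pt")
        loss = model(**inputs, labels=inputs["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Inference: generate and decode a grade caption for a new image.
    model.eval()
    with torch.no_grad():
        inputs = processor(
            images=Image.open("blastocyst_new.png").convert("RGB"),
            return_tensors="pt")
        ids = model.generate(**inputs, max_new_tokens=10)
    print(processor.decode(ids[0], skip_special_tokens=True))

Treating the grade string as the caption target is what lets a single
generative head cover an output space of more than 10 unique grades
without a fixed classification layer.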
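The three reported metrics could be computed along the following
lines. This is a hedged sketch using the rouge_score and nltk
packages; the paper does not state which implementations were used,
and reading "Hamming accuracy" as per-character agreement between
predicted and reference grade strings is an assumption.

    import nltk
    from nltk.translate.meteor_score import meteor_score
    from rouge_score import rouge_scorer

    nltk.download("wordnet", quiet=True)  # METEOR's matcher uses WordNet

    def hamming_accuracy(pred: str, ref: str) -> float:
        # Assumed definition: fraction of aligned characters that
        # agree, normalized by the longer string.
        length = max(len(pred), len(ref))
        return sum(p == r for p, r in zip(pred, ref)) / length if length else 1.0

    pred, ref = "4AB", "4AA"  # hypothetical predicted / reference grades
    rouge_f1 = rouge_scorer.RougeScorer(["rouge1"]).score(ref, pred)["rouge1"].fmeasure
    meteor = meteor_score([list(ref)], list(pred))  # grades tokenized per character
    print(f"ROUGE-1 {rouge_f1:.4f}  Hamming {hamming_accuracy(pred, ref):.4f}  "
          f"METEOR {meteor:.4f}")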