TY - JOUR
T1 - ArtInsight
T2 - A Multimodal AI Framework for Interpreting Children's Drawings and Enhancing Emotional Understanding
AU - Shah, Uzair
AU - Khan, Naseem
AU - Alzubaidi, Mahmood
AU - Agus, Marco
AU - Househ, Mowafa
PY - 2025/5/15
Y1 - 2025/5/15
AB - Recent advancements in multimodal image-to-text models have greatly enhanced the interpretation of children's drawings for emotional understanding. This paper introduces a framework that analyzes these drawings and generates detailed reports fully automatically, covering art descriptions, emotional themes, assessments, and personalized recommendations. Our approach involved annotating 5,000 images with a Large Language Model (ChatGPT) and fine-tuning the BLIP (Bootstrapping Language-Image Pre-training) multimodal model. We performed fine-tuning in two steps: 1) we applied Low-Rank Adaptation (LoRA) to the image encoder to preserve its pre-trained features while adapting it to our task, and 2) we refined the text decoder to capture the language patterns needed for comprehensive assessments. The system processes children's artwork as input, using multimodal image-to-text techniques to derive meaningful insights. Although these reports are initial evaluations rather than formal clinical assessments, they provide a valuable starting point for understanding children's emotional and psychological states. This tool can assist art therapists, educators, and parents in gaining a deeper understanding of children's inner worlds. Our research highlights the intersection of artificial intelligence and child psychology, showing how technology can complement human expertise in nurturing children's emotional well-being. By offering a structured, AI-driven analysis of children's drawings, this framework creates new opportunities for early intervention, personalized support, and enhanced communication between children and their caregivers. The impact of this work may extend beyond individual assessments, potentially informing broader strategies in child development, art therapy, and educational practices.
KW - Art Therapy
KW - Children’s Drawings
KW - Emotional Assessment
KW - Image-to-Text Models
UR - https://www.scopus.com/pages/publications/105005823239
U2 - 10.3233/SHTI250471
DO - 10.3233/SHTI250471
M3 - Article
C2 - 40380579
AN - SCOPUS:105005823239
SN - 0926-9630
VL - 327
SP - 808
EP - 812
JO - Studies in Health Technology and Informatics
JF - Studies in Health Technology and Informatics
ER -