Multimodal Skin Disease Classification Using Deep Learning And NLP-Based Chatbot
DOI:
https://doi.org/10.62643/
Keywords:
Skin Disease Classification, Multimodal Learning, Deep Learning, Natural Language Processing (NLP), Medical Image Analysis, Convolutional Neural Networks (CNN), Clinical Decision Support System, Dermatological Image Processing, Healthcare Chatbot, Artificial Intelligence in Healthcare, Symptom-Based Diagnosis, Computer-Aided Diagnosis, Transfer Learning, Patient Interaction Systems, Automated Skin Disease Detection
Abstract
Human skin serves as a vital reflection of overall internal health, often displaying early symptoms of
underlying organ-related disorders. Detecting these signs at an early stage is essential for timely diagnosis and
effective treatment. However, the protective and diagnostic role of the skin is frequently overlooked. This
research aims to develop a multimodal skin disease classification system that integrates ensemble-based
transfer learning models (DenseNet169, ResNet50, EfficientNetV2) with Natural Language Processing (NLP)
through a Telegram chatbot interface. The primary goal is to enhance the chatbot’s ability to deliver
personalized and precise skin-related assessments by considering user-provided details such as skin type,
exposure to chemicals, and previous treatments. By combining image-based analysis with contextual textual
data, the proposed system improves both diagnostic accuracy and personalization. EfficientNetV2 enhances
computational efficiency and extracts high-resolution features, while the Swin Transformer captures both
global and local patterns through hierarchical vision transformers, enabling better generalization across
diverse skin diseases. Additionally, geospatial mapping (MAP integration) is incorporated to visualize the
distribution and frequency of skin disease cases across different geographic regions, supporting
epidemiological studies and public health monitoring. The chatbot includes a self-learning mechanism that
refines its responses over time based on user interactions, thereby improving engagement and system
performance. This hybrid approach effectively integrates fine-grained visual features with contextual
understanding, resulting in robust classification outcomes. A total of 2,274 images were evaluated. The
proposed model achieved 85.5% accuracy and an AUC of 97.72% in image classification, while the NLP
component attained 95.62% accuracy, delivering a comprehensive and personalized diagnostic solution.
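
The abstract does not specify how the outputs of the ensemble members are combined. One common option is soft voting, in which the per-class probabilities of each model are averaged and the highest-scoring class is selected. The following is a minimal sketch under that assumption; the function names, class labels, and probability values are hypothetical placeholders for illustration, not the authors' implementation or results.

```python
from typing import Dict, List, Optional


def soft_vote(
    model_probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Return the weighted average of per-class probabilities across models."""
    names = list(model_probs)
    if not names:
        raise ValueError("at least one model is required")
    n_classes = len(model_probs[names[0]])
    if any(len(model_probs[name]) != n_classes for name in names):
        raise ValueError("all models must score the same number of classes")
    if weights is None:
        # Equal weighting is an assumption; the abstract gives no weights.
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * model_probs[name][c] for name in names) / total
        for c in range(n_classes)
    ]


def predict_label(avg_probs: List[float], labels: List[str]) -> str:
    """Return the label whose averaged probability is highest."""
    best = max(range(len(avg_probs)), key=lambda i: avg_probs[i])
    return labels[best]


# Hypothetical softmax outputs for one image from the three backbones
# named in the abstract; the numbers are made up for illustration.
LABELS = ["class_a", "class_b", "class_c"]
example_probs = {
    "DenseNet169": [0.6, 0.3, 0.1],
    "ResNet50": [0.5, 0.4, 0.1],
    "EfficientNetV2": [0.2, 0.7, 0.1],
}
fused = soft_vote(example_probs)
prediction = predict_label(fused, LABELS)
```

Passing a `weights` dictionary lets a stronger backbone contribute more to the fused score. Whether the proposed system uses equal weights, learned weights, or a different fusion rule is not stated in the abstract.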
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.













