Data is the Fuel for AI in Dermatology
In the age of artificial intelligence, data serves as the crucial foundation for innovation. Just as fuel powers engines, datasets drive AI model training, determining their accuracy, reliability, and effectiveness. When developing AI-powered solutions for skin disease detection—whether AI Skin scanner mobile apps, cloud-based platforms, or diagnostic software—the first and most essential step is data collection and preparation. Without high-quality, diverse, and well-labelled images of skin conditions, even the most advanced neural networks will fail to perform effectively in real-world applications.
This article provides a comprehensive review of 10 publicly available skin disease datasets that can serve as valuable resources for AI research. Whether you are a data scientist, healthcare AI developer, or entrepreneur in the digital health sector, this guide will help you navigate the landscape of dermatology datasets, understand their strengths and limitations, and make informed decisions for your AI model development.
For each dataset, we analyze:
- The number of images and variety of skin conditions covered
- The source and labelling quality of the data
- The image types (clinical vs. dermoscopic) and resolution quality
- The dataset’s licensing terms and accessibility
- Key advantages and drawbacks
Furthermore, we explore why publicly available datasets often fall short of commercial AI solutions, including issues like limited image diversity, class imbalances, and legal constraints. We also discuss the critical next steps in AI development—data preprocessing, model training, regulatory compliance, and deployment.
Finally, we introduce Skinive.Cloud, an AI-powered skin analysis API engine that provides an alternative to building an AI model from scratch. With access to a large proprietary dataset of millions of images, CE mark certification, and white-label API integration, Skinive.Cloud allows Skin Health & Beauty businesses to implement AI-driven skin analysis solutions quickly, cost-effectively, and without regulatory roadblocks.
If you’re looking to develop a dermatology AI solution, this article is your starting point. Read on to discover the best datasets for your project and learn how to accelerate your AI development with industry-leading technology.
Top 10 Open-Source Skin Disease Datasets
1. ISIC Archive
- URL: https://www.isic-archive.com
- Number of Images: 85,000+
- Disease Categories: Melanoma, basal cell carcinoma, squamous cell carcinoma, and benign skin lesions
- Data Collection & Labeling: Dermatologists and oncologists
- Image Type & Quality: High-resolution dermoscopic images
- Usage Conditions: Free for research use
- Strengths: Large dataset, expert annotations, widely used in AI research
- Limitations: Imbalanced classes (more benign lesions than malignant cases)
2. HAM10000
- URL: https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000
- Number of Images: 10,015
- Disease Categories: 7 skin conditions, including melanoma and dermatofibroma
- Data Collection & Labeling: Dermatologists
- Image Type & Quality: High-resolution dermoscopic images
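- Usage Conditions: Freely downloadable (hosted on Kaggle); released under a non-commercial Creative Commons license (CC BY-NC 4.0), so check the terms before commercial use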
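- Strengths: Well-labeled dataset and a standard benchmark for lesion classification
- Limitations: Limited number of images and strongly imbalanced classes (melanocytic nevi make up roughly two-thirds of the images)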
3. DermaMNIST
- URL: https://medmnist.com
- Number of Images: 10,015 (HAM10000 images resized to 28×28 pixels)
- Disease Categories: 7 skin conditions
- Data Collection & Labeling: Derived from HAM10000, so it inherits that dataset's dermatologist labels
- Image Type & Quality: Lower-resolution dermoscopic images
- Usage Conditions: Open-access
- Strengths: Lightweight dataset ideal for quick experiments
- Limitations: Lower image resolution affects model accuracy
4. SD-198
- URL: https://derm.cs.sfu.ca
- Number of Images: 6,584
- Disease Categories: 198 skin conditions
- Data Collection & Labeling: Compiled by the dataset's authors (Sun et al., 2016) from the DermQuest online atlas
- Image Type & Quality: Clinical images (macro photos)
- Usage Conditions: Request-based access
- Strengths: Wide variety of conditions
- Limitations: Limited public access
5. PAD-UFES-20
- URL: https://www.kaggle.com/datasets/mahdavi1202/skin-cancer
- Description: A dataset from the Federal University of Espírito Santo with real-world clinical images.
- Size: 2,298 images.
- Categories: 6 disease types (three skin cancers and three benign or pre-malignant conditions).
- Annotations: Metadata with demographic information.
- Availability: Publicly available.
- Best for: General dermatology AI applications.
6. PH² Dataset
- URL: https://www.fc.up.pt/addi/ph2%20database.html
- Description: A dermoscopic dataset for melanoma analysis.
- Size: 200 images.
- Categories: Common nevi, atypical nevi, and melanoma.
- Annotations: Pixel-level segmentation masks.
- Availability: Available upon request.
- Best for: Segmentation and melanoma classification research.
7. Derm7pt Dataset
- URL: https://github.com/jeremykawahara/derm7pt
- Description: Focuses on the seven-point melanoma checklist criteria.
- Size: 1,011 cases, each with both a clinical and a dermoscopic image.
- Categories: Melanoma and non-melanoma skin cancer.
- Annotations: Detailed feature annotations.
- Availability: Free for research use.
- Best for: Explainable AI and feature-based classification.
8. Fitzpatrick 17K
- URL: https://github.com/mattgroh/fitzpatrick17k
- Description: A dataset addressing skin tone diversity in AI models.
- Size: 16,577 images.
- Categories: Covers a broad range of skin conditions.
- Annotations: Labeled with Fitzpatrick skin types.
- Availability: Available via Google Dataset Search.
- Best for: Reducing AI bias in skin disease detection.
9. BCN20000
- URL: https://paperswithcode.com/dataset/bcn-20000
- Description: A dataset for skin cancer classification collected at the Hospital Clínic de Barcelona.
- Size: 19,424 images (as reported in the original 2019 paper).
- Categories: 8 types of skin lesions.
- Annotations: Diagnosed by dermatologists.
- Availability: Free for academic use.
- Best for: AI model training for clinical dermatology.
10. SIIM-ISIC Melanoma Classification Dataset
- URL: https://www.kaggle.com/competitions/siim-isic-melanoma-classification
- Description: A Kaggle-hosted dataset designed for melanoma classification challenges.
- Size: 33,126 images.
- Categories: Melanoma vs. benign lesions.
- Annotations: Binary classification labels.
- Availability: Available on Kaggle.
- Best for: Benchmarking AI models in melanoma detection.
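Benchmarking a binary melanoma classifier on a dataset like this usually comes down to how well the model ranks malignant lesions above benign ones. The sketch below computes the area under the ROC curve (the metric used by the Kaggle competition) directly from its pairwise definition. The function name, labels, and scores are illustrative assumptions, not part of any dataset or library.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for melanoma, 0 for benign (hypothetical encoding).
    scores: model outputs, higher meaning "more likely melanoma".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up predictions: 4 of the 6 positive/negative
# pairs are ordered correctly.
labels = [0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for a sanity check; for tens of thousands of images a library implementation would be the more practical choice.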
▶️ Video: How to Build a World-Class ML Model for Melanoma Detection
If you’re looking to apply AI techniques in dermatology, check out the YouTube video “How to Build a World-Class ML Model for Skin Cancer Detection.” It’s an excellent resource for learning about advanced machine learning strategies in skin disease diagnosis.
The Next Steps in AI Development for Dermatology
Even with a dataset, AI model training requires:
- Preprocessing & Augmentation: Cleaning and standardizing images.
- Hiring Data Scientists: Skilled professionals to build and fine-tune AI models.
- Computational Resources: High-performance GPUs and cloud computing for training deep learning models.
- Continuous Experimentation: Multiple iterations to achieve optimal accuracy.
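As a rough illustration of the preprocessing and augmentation step listed above, the sketch below scales pixel values to a common range and generates flipped copies of an image. It assumes a grayscale image stored as a nested list of 8-bit values, and all function names are hypothetical; a real pipeline would typically use an imaging library and a wider set of transforms.

```python
def normalize(image):
    """Scale 8-bit pixel values (0-255) to floats in [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]


def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror the image top to bottom."""
    return image[::-1]


def augment(image):
    """Return the original image plus three flipped variants.

    Skin lesion photos have no canonical orientation, so flips are a
    common label-preserving augmentation.
    """
    return [image, hflip(image), vflip(image), vflip(hflip(image))]


# Toy 2x2 "image" with made-up pixel values.
tiny = [[0, 255], [51, 102]]
variants = augment(normalize(tiny))
```

Each training image yields four variants here, which enlarges a small dataset without collecting new photos, although it does not add genuinely new lesions or patients.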
Once the AI model is trained, the next step is to develop a mobile, web, or desktop application with skin analysis functionality. However, before launch, the product must meet the regulatory, quality, and data protection requirements that apply to medical devices, such as CE marking, FDA clearance, ISO 13485, HIPAA, and GDPR.
The entire process, from dataset collection to certification, can take years and cost hundreds of thousands or even millions of dollars.
Why Free Datasets Are Often Insufficient for AI Training
While these publicly available datasets provide a solid foundation for research, they often fall short in real-world applications due to:
- Data Imbalance: Most datasets contain more benign lesions than malignant cases, affecting model training.
- Inconsistent Image Quality: Resolution, lighting, and framing vary widely within and across datasets, which can limit model accuracy.
- Limited Diversity: Public datasets often lack images across different age groups, ethnicities, and skin types.
- Legal & Ethical Restrictions: Using some datasets in commercial applications may require additional permissions.
For commercial applications, it is often necessary to collect and label data independently, ensuring high-quality, diverse, and legally compliant datasets.
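One common way to soften the data imbalance described above is to weight each class inversely to its frequency during training. The sketch below shows the usual "balanced" weighting formula, weight = N / (K × n_c), where N is the number of samples, K the number of classes, and n_c the count for class c. The function name and the label counts are illustrative assumptions and do not come from any of the datasets listed.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class so that rare classes count more in the loss."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Made-up label distribution: 90 benign and 10 malignant lesions.
labels = ["benign"] * 90 + ["malignant"] * 10
weights = balanced_class_weights(labels)
# The malignant class gets a weight of 5.0 and the benign class about 0.56.
```

Weights like these can be passed to most training frameworks as per-class loss weights. They rebalance the training signal but cannot replace the additional malignant examples that an imbalanced dataset is missing.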
A Faster and More Cost-Effective Solution: Skinive.Cloud
Skinive.Cloud offers an AI-powered skin analysis API with significant advantages:
- Built on a massive dataset (3+ million images) verified by dermatologists and oncologists.
- CE-marked and GDPR-compliant (medical-grade software), ready for commercial use.
- White-label solution: Easily customizable for your brand.
- Seamless API integration into mobile, web, and desktop applications.
- Continuously improving AI models without additional development costs.
- Cost-effective: Avoid the high costs of developing your own AI solution.
Beyond Technology: Expert Support for Your Business
At Skinive, we provide not just technical support but also business consultation to help you achieve your goals. We have extensive experience in integrating AI-powered skin analysis into various industries, including:
- Health & Beauty Apps (like AI Skin Scanner app)
- Telemedicine Platforms (EMR/EHR Systems)
- E-commerce for Skincare Products
- Insurance Companies
- Hospitals & Diagnostic Labs
- Beauty Clinics & SPAs
Start Your AI-Powered Skin Analysis Journey Today
Instead of spending years on research, development, and certification, you can integrate Skinive.Cloud today and bring an AI-driven skin analysis solution to market faster and more affordably.
🔗 Learn more at Skinive.Cloud
📞 Schedule a call with our sales team today!