Impact on healthcare. A case study on Diabetic Retinopathy.

Abstract

We developed an automatic screening/diagnostic system for diabetic retinopathy using an ensemble of deep neural networks followed by a random forest classifier. Our system has a sensitivity of 95% and a specificity of 65%.


Problem overview

Diabetic retinopathy (DR), a major microvascular complication of diabetes, has a significant impact on the world's health systems. In Mexico alone this disease affects more than 11 million people [1]. Globally, the number of people with DR is projected to grow from 126.6 million in 2010 to 191.0 million by 2030, and the number with vision-threatening diabetic retinopathy (VTDR) is estimated to increase from 37.3 million to 56.3 million if prompt action is not taken. Despite growing evidence documenting the effectiveness of routine DR screening and early treatment, DR frequently leads to poor visual functioning and represents the leading cause of blindness in working-age populations. DR has been neglected in health-care research and planning in many low-income countries, where access to trained eye-care professionals and tertiary eye-care services may be inadequate. Both the demand for and the supply of services may be a problem. Rates of compliance with diabetes medications and annual eye examinations may be low, for reasons that are multifactorial [2].

Motivation

With the intention of developing an automatic diagnostic system for the screening of patients with possible diabetic retinopathy, we used recent advances in computer vision and deep learning to train an ensemble of neural networks to detect this disease and its level of progression.

Model overview

Data

For training and validation, about 85,000 high-resolution images were used, each one consisting of a digital slit lamp capture labeled by a clinician who rated the severity of the disease. Each image is labeled as [0] no DR, [1] mild DR, [2] moderate DR, [3] severe DR or [4] proliferative DR. The per-class representation in the dataset is as follows:

Class               Number of images
No DR               62,920
Mild DR             5,650
Moderate DR         12,440
Severe DR           2,020
Proliferative DR    1,690

The data was randomly divided into training (90%) and test (10%) sets. Results on the test set were used for early stopping during training and to choose some hyperparameters of the neural networks.
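A minimal sketch of such a random 90/10 split is shown here. It is an illustration rather than the code used in the study: the function name `split_train_test`, the fixed seed, and the synthetic image ids are assumptions, and only the class counts come from the table above.

```python
import random

def split_train_test(image_ids, test_fraction=0.10, seed=0):
    """Shuffle image ids and split them into (train, test) lists."""
    ids = list(image_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]

# Class counts from the table above; the ids themselves are synthetic placeholders.
class_counts = {0: 62920, 1: 5650, 2: 12440, 3: 2020, 4: 1690}
image_ids = [f"class{c}_{i}" for c, n in class_counts.items() for i in range(n)]

train_ids, test_ids = split_train_test(image_ids)
print(len(train_ids), len(test_ids))
```

Because the split is purely random, the rarer classes (severe and proliferative DR) are represented in the test set only in proportion to their share of the data.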

An example image from original data.

Preprocessing

The eye is detected and the image is rescaled and adjusted so that the eye is always in the center with a fixed size. RGB channels are locally normalized with a moving gaussian kernel in order to highlight local image variability. This allows the model to be agnostic to global light intensity and other factors depending on the particular camera used.
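The local normalization step can be sketched as follows. The exact formula is not given in the text, so this example assumes a common variant: subtract a Gaussian-weighted local mean and divide by the Gaussian-weighted local standard deviation, channel by channel. The function names, the `sigma` value, and the `eps` constant are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(channel, sigma):
    """Separable Gaussian blur of a 2-D array, using reflect padding."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(channel, radius, mode="reflect")
    # Blur along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def local_normalize(image, sigma=10.0, eps=1e-6):
    """Normalize each channel of an H x W x C image by its local mean and std."""
    image = image.astype(np.float64)
    out = np.empty_like(image)
    for c in range(image.shape[2]):
        channel = image[..., c]
        local_mean = gaussian_blur(channel, sigma)
        local_var = gaussian_blur((channel - local_mean) ** 2, sigma)
        out[..., c] = (channel - local_mean) / np.sqrt(local_var + eps)
    return out
```

Under this assumption, multiplying the whole image by a constant or adding a constant offset leaves the output essentially unchanged, which is one way to obtain the insensitivity to global light intensity described above.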

This image shows the final preprocessed image from a Proliferative DR study, as used for neural network training.

Neural Networks

Several neural networks were trained using different architectures (InceptionV3, ResNet50). The training leveraged transfer learning from an ImageNet model and was done in stages, starting from the top-most layers and gradually decreasing the learning rate. Training each model took two weeks on a 2-GPU server.
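The staged training can be pictured as a simple plan in which each stage makes more layers trainable, starting from the top of the network, and lowers the learning rate. The sketch below only generates such a plan. The number of stages, the layer counts, and the learning rates are hypothetical, since the text does not state them.

```python
def staged_finetune_plan(n_layers, n_stages=3, base_lr=1e-3, lr_decay=0.1):
    """Return one dict per stage: which layers are trainable and at what learning rate.

    Layers are indexed from input (0) to output (n_layers - 1). Each stage
    trains the top-most layers from 'first_trainable_layer' upward.
    """
    plan = []
    for stage in range(1, n_stages + 1):
        n_trainable = round(n_layers * stage / n_stages)
        plan.append({
            "stage": stage,
            "first_trainable_layer": n_layers - n_trainable,
            "learning_rate": base_lr * lr_decay ** (stage - 1),
        })
    return plan

# Hypothetical example: a 50-layer network fine-tuned in three stages.
for step in staged_finetune_plan(n_layers=50):
    print(step)
```

In a framework such as Keras, a plan like this would typically be applied by setting `layer.trainable = False` on the frozen layers before each stage and recompiling the model with that stage's learning rate.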

After training, a neural network can evaluate preprocessed images. This image shows a heatmap of where damage is detected for a patient with Proliferative DR.

Random Forest

We trained a Random Forest to combine the results of the different neural networks on both eyes of the patient with other statistics from the images, to predict the final probabilities that a particular image corresponds to a certain level of DR. This stage assigns to each image a vector with the probabilities of each class.
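One way to picture the input to this stage is a single feature vector per image that concatenates the class probabilities from every network for both eyes, followed by the extra image statistics. The sketch below builds such a vector. The function name, the ordering of the features, and the example statistic are assumptions, not a description of the actual pipeline.

```python
def build_features(eye_probs, other_eye_probs, eye_stats, other_eye_stats):
    """Concatenate network outputs for both eyes with extra image statistics.

    eye_probs / other_eye_probs: one 5-element probability vector per network,
        for the image being classified and for the patient's other eye.
    eye_stats / other_eye_stats: extra numeric image statistics (hypothetical).
    """
    features = []
    for probs in list(eye_probs) + list(other_eye_probs):
        if len(probs) != 5:
            raise ValueError("expected one probability per DR level (0 to 4)")
        features.extend(probs)
    features.extend(eye_stats)
    features.extend(other_eye_stats)
    return features

# Hypothetical example: three networks per eye and one statistic per image.
uniform = [0.2, 0.2, 0.2, 0.2, 0.2]
row = build_features([uniform] * 3, [uniform] * 3, [0.41], [0.39])
print(len(row))
```

Rows like this, paired with the clinician labels, could then be used to fit a random forest, for example scikit-learn's `RandomForestClassifier`, whose `predict_proba` method returns one probability per class.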

Label aggregation

Most guidelines recommend annual screening for those with no retinopathy or mild diabetic retinopathy, repeat examination in 6 months for moderate diabetic retinopathy, and an ophthalmologist referral for treatment evaluation within a few weeks to months for severe or proliferative diabetic retinopathy [3].

Following other studies such as [3], we define a negative case as no-DR or mild-DR, and a positive case as moderate, severe or proliferative DR. The vector of probabilities is therefore simplified into the probability of being a positive DR case. We can now build a ROC curve to choose the threshold for our prediction: each threshold yields a model with a different sensitivity and specificity. Figure X shows the different possibilities. Among these we chose a model with 95% sensitivity and a corresponding 65% specificity, so that it serves as a good first screening layer in a diagnostic pipeline.
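The aggregation and the threshold choice can be sketched in a few lines. This is an illustration under simple assumptions: the positive probability is taken as the sum of the moderate, severe and proliferative probabilities, and the chosen threshold is the highest one that still meets the target sensitivity. The function names and the toy data are hypothetical.

```python
def positive_probability(class_probs):
    """P(moderate, severe or proliferative DR) from a 5-class probability vector."""
    return sum(class_probs[2:])

def roc_points(scores, is_positive):
    """Return (threshold, sensitivity, specificity) for every distinct score."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_positive) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, is_positive) if not y and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points

def pick_threshold(scores, is_positive, target_sensitivity=0.95):
    """Highest threshold whose sensitivity still meets the target."""
    ok = [p for p in roc_points(scores, is_positive) if p[1] >= target_sensitivity]
    return max(ok, key=lambda p: p[0])

# Toy data (not from the study): positive-case scores and ground-truth labels.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
threshold, sensitivity, specificity = pick_threshold(scores, labels)
print(threshold, sensitivity, specificity)
```

Picking the highest admissible threshold gives the best specificity available at the required sensitivity, because specificity can only increase as the threshold rises.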

In a similar fashion, we created a Red alert using only severe and proliferative DR as positive cases and targeting a sensitivity of 90%. The two alerts, Yellow (the screening model described above) and Red, have the following statistics:

Class                                Yellow alert                       Red alert
No DR                                18%                                1%
Mild DR                              57%                                2%
Moderate DR                          90%                                38%
Severe DR                            98%                                89%
Proliferative DR                     98%                                91%
No DR or Mild DR                     35% (general specificity = 65%)    1%
Moderate, Severe or Proliferative    95% (general sensitivity)          50%
Severe or Proliferative              98%                                90%

Table 1. The probability of triggering the Yellow or Red alert when the patient has a given level of retinopathy. The Red alert is only likely to be triggered with Moderate, Severe or Proliferative DR. The Yellow alert is more conservative and detects 95% of all positive cases. In combination, the two alerts can be very useful for the early detection of diabetic retinopathy.
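Per-class trigger rates like those in Table 1 can be computed by counting, for each true class, how often an alert's score reaches its threshold. The sketch below assumes that each alert's score is the sum of the probabilities of its positive classes (classes 2 to 4 for Yellow, 3 to 4 for Red). The thresholds and the toy data are hypothetical.

```python
from collections import defaultdict

CLASS_NAMES = ["No DR", "Mild DR", "Moderate DR", "Severe DR", "Proliferative DR"]

def alert_score(class_probs, first_positive_class):
    """Summed probability of all classes at or above first_positive_class."""
    return sum(class_probs[first_positive_class:])

def trigger_rates(true_classes, class_probs, first_positive_class, threshold):
    """Fraction of images in each true class whose alert score reaches the threshold."""
    fired = defaultdict(int)
    total = defaultdict(int)
    for y, probs in zip(true_classes, class_probs):
        total[y] += 1
        if alert_score(probs, first_positive_class) >= threshold:
            fired[y] += 1
    return {CLASS_NAMES[y]: fired[y] / total[y] for y in sorted(total)}

# Toy data (not from the study): true classes and predicted probability vectors.
true_classes = [0, 0, 2, 4]
class_probs = [
    [0.90, 0.05, 0.03, 0.01, 0.01],
    [0.40, 0.10, 0.30, 0.10, 0.10],
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.00, 0.00, 0.10, 0.20, 0.70],
]
yellow = trigger_rates(true_classes, class_probs, first_positive_class=2, threshold=0.3)
red = trigger_rates(true_classes, class_probs, first_positive_class=3, threshold=0.5)
print(yellow)
print(red)
```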

Further steps for improving model performance:

  • More robust labeling, following the example of [3], would likely decrease the prediction error. To achieve this we will collaborate with a team of ophthalmologists on systematic, robust diagnosis and localization of lesions.
  • During the Random Forest stage, the inclusion of additional patient data (such as glucose levels, age, etc.) would be very valuable.
  • Currently the model uses an ensemble of 3 neural networks. Increasing this to at least 10 could prove very effective in improving the accuracy of the model. In addition, working with larger (higher-resolution) images could allow us to detect smaller lesions. Both changes mainly require more computing power during training.

External References

  1. Federación Mexicana de Diabetes, A.C. (2016). La Retinopatía Diabética se convertirá en la principal causa de baja visión en México. http://fmdiabetes.org/la-retinopatia-diabetica-se-convertira-la-principal-causa-baja-vision-mexico/
  2. Zheng, Y., He, M., & Congdon, N. (2012). The worldwide epidemic of diabetic retinopathy. Indian Journal of Ophthalmology, 60(5), 428–431. http://doi.org/10.4103/0301-4738.100542
  3. Gulshan, V., Peng, L., Coram, M., et al. (2016). Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA, 316(22), 2402–2410. http://jamanetwork.com/journals/jama/fullarticle/2588763
