Course notes: AI for Medicine Specialization
The best way to gain a deeper understanding of machine learning is to apply it to different areas, and one of the most exciting of these is medicine. This specialization teaches, for example, how to diagnose pneumonia from unstructured data or identify heart attack risk based on a patient’s lab results.
It is a three-course specialization.
- The first course focuses on building machine learning models for diagnosis, the process of determining which disease or condition explains a person’s symptoms, signs, and medical results.
- The second course focuses on predicting the future health of a patient, known as prognosis.
- The third course covers AI for treatment, that is, the process of medical care, and information extraction. It focuses on machine learning models that help identify the effect of a particular treatment on a patient.
Prerequisites for this course are as follows:
- Basics of deep learning, including supervised learning, convolutional networks, and loss functions.
- Basics of Python.
Starting with week 1 of the Medical Image Diagnosis course:
The following are three examples of medical diagnosis tasks where deep learning has achieved incredible performance.
1. Dermatology: The branch of medicine that deals with the skin. One of the tasks dermatologists perform is to look at a mole on the skin and identify whether it is cancerous. Early detection can have an enormous impact on skin cancer outcomes.
2. Ophthalmology: The branch of medicine that deals with the diagnosis and treatment of eye disorders. One such disorder is diabetic retinopathy (DR), which is damage to the retina caused by diabetes and a major cause of blindness.
Studies show that attempts to replicate diabetic retinopathy screening programs in low- and middle-income countries have not been successful. These countries face several challenges, such as:
- There are 70 million people with diabetes in India and an equal number of people with pre-diabetes or undiagnosed diabetes.
- Detecting DR is currently a manual, tiring process that requires a trained clinician. The primary care infrastructure in countries like India is in its infancy, and standard retinal cameras are too costly.
Hence there is an urgent need to develop cost-effective screening and treatment pathways that can cover the majority of the population with diabetes.
3. Histopathology: The medical specialty involving the examination of tissues under a microscope. One of the tasks pathologists perform is to look at scanned microscopic images of tissue, called whole slide images, and determine the extent to which a cancer has spread. This is important for planning treatment and for predicting the course of the disease and the chance of recovery.
While the branches mentioned above have achieved amazing results with deep learning, data scientists face a few challenges when training algorithms on medical images:
- Class imbalance: Medical data sets rarely contain an equal number of disease and non-disease examples. This reflects the real world, where we see far more examples of normal conditions than of diseased ones. It creates a problem for the learning algorithm, which tends to predict a low probability of disease for everyone and so misidentifies the examples that do have a disease.
Solution: 1) Modify the loss function so that positive and negative examples contribute equally to the total loss, for example by weighting each class inversely to its frequency. 2) Resample the data set so that it includes an equal number of positive and negative examples. Resampling may leave out some of the negative examples and repeat some of the positive ones in order to reach equal counts.
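Both ideas can be sketched in a few lines of NumPy. This is only an illustration: the function names, the 0/1 label encoding, and the choice of weights (each class weighted by the frequency of the other class) are assumptions, not something fixed by the notes above.

```python
import numpy as np

def weighted_bce_contributions(y_true, y_prob, eps=1e-7):
    """Return the (positive, negative) contributions to a weighted
    binary cross-entropy loss. Labels are assumed to be 0 or 1."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() never receives exactly 0 or 1.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)

    freq_pos = y_true.mean()                 # fraction of positive examples
    w_pos, w_neg = 1.0 - freq_pos, freq_pos  # weight each class by the other's frequency

    pos_loss = -(w_pos * y_true * np.log(y_prob)).sum()
    neg_loss = -(w_neg * (1.0 - y_true) * np.log(1.0 - y_prob)).sum()
    return pos_loss, neg_loss

def resample_balanced(labels, n_per_class, seed=0):
    """Return shuffled indices with n_per_class positives and negatives.
    A class with too few examples is sampled with repetition."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    pos_pick = rng.choice(pos_idx, size=n_per_class, replace=len(pos_idx) < n_per_class)
    neg_pick = rng.choice(neg_idx, size=n_per_class, replace=len(neg_idx) < n_per_class)
    return rng.permutation(np.concatenate([pos_pick, neg_pick]))

# Toy data: one diseased example among eight, model predicts 0.5 for everyone.
labels = np.array([1, 0, 0, 0, 0, 0, 0, 0])
probs = np.full(8, 0.5)
pos_loss, neg_loss = weighted_bce_contributions(labels, probs)
balanced_idx = resample_balanced(labels, n_per_class=4)
```

With these weights the single positive example and the seven negative examples contribute the same amount to the total loss, and the resampled index set repeats the positive example while dropping some negatives.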
- Multitask: In the real world, one image may help identify several diseases. For example, an x-ray image can help identify pneumonia, a mass, or edema (excess fluid in the lungs), but a single algorithm may not perform well at identifying all of them.
Solution: One option is to train a separate model for each disease. The other is to train a single model that learns features common to several diseases, and to add up the losses calculated for each class. Here we can also account for data imbalance by calculating a weighted loss over the positive and negative examples of each class, as shown in the image below.
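As a rough sketch of the multi-task weighted loss in NumPy: each column of the label matrix is one disease, the weights are computed per disease, and the per-disease losses are summed. The function name, the array layout (examples by classes), and the frequency-based weights are assumptions made for this example.

```python
import numpy as np

def multitask_weighted_loss(y_true, y_prob, eps=1e-7):
    """Weighted binary cross-entropy summed over classes.

    y_true, y_prob: arrays of shape (n_examples, n_classes), labels 0 or 1.
    Returns (total_loss, per_class_loss).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)

    # Per-class frequency of positives, shape (n_classes,).
    freq_pos = y_true.mean(axis=0)
    w_pos, w_neg = 1.0 - freq_pos, freq_pos

    # Broadcasting applies each class's weights down its own column.
    per_example = -(w_pos * y_true * np.log(y_prob)
                    + w_neg * (1.0 - y_true) * np.log(1.0 - y_prob))
    per_class = per_example.mean(axis=0)
    return per_class.sum(), per_class

# Toy labels for three hypothetical findings: pneumonia, mass, edema.
y_true = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 1],
                   [0, 0, 0]])
y_prob = np.full(y_true.shape, 0.5)
total_loss, per_class_loss = multitask_weighted_loss(y_true, y_prob)
```

Summing the per-class losses gives a single number to minimize, so one network with one output per disease can be trained on all the tasks at once.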
- Data set size: Real-world medical data sets are small, typically ranging from 10 thousand to 100 thousand examples, while accurate results require large amounts of data.
Solution: In medicine, convolutional neural networks are used to process 2D images such as x-rays and 3D images such as CT scans. Various architectures such as DenseNet, Inception, and ResNet have been found to work well on medical data sets. The standard approach is to try multiple models on the desired task and see which ones work best. These models need large amounts of training data, and the medical field lacks enough images to train them sufficiently. Hence one of the most widely accepted ways of training is transfer learning. In transfer learning, the network is first trained on a larger, general data set to learn generic features such as edges. For example, learning to identify edges in an image of a penguin can also help identify edges in an x-ray image. This training on general images is known as pre-training. The trained network can then be used as the starting point for the medical imaging task by copying over the learned features, and trained further to interpret x-rays or other medical images. This second step is known as fine-tuning.
Generate more samples: Apply contrast changes, rotation, flipping, zooming in and out, and other transformations that add variation to the existing data set. The goal is to increase the number of samples while preserving their labels. This technique is known as data augmentation.
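A small NumPy sketch of data augmentation follows. It assumes square grayscale images with pixel values in [0, 1], and it uses only flips, 90-degree rotations, and a contrast change because these need no extra libraries. The `augment` function and its parameter ranges are illustrative choices, and whether a given transformation really preserves the label has to be judged per task.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped, rotated, contrast-adjusted copy of image.

    Assumes a square 2D array with values in [0, 1].
    """
    out = image.astype(float)
    if rng.random() < 0.5:
        out = np.fliplr(out)                      # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # rotate by 0, 90, 180 or 270 degrees
    contrast = rng.uniform(0.8, 1.2)              # scale contrast around the mean
    mean = out.mean()
    return np.clip((out - mean) * contrast + mean, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((8, 8))   # stand-in for a real x-ray
label = 1                    # the label is carried over unchanged

# Five new training samples generated from one labelled image.
augmented = [(augment(image, rng), label) for _ in range(5)]
```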
When training on a data set, we split the data into train and test subsets.
As the name suggests, the train set is used to develop and select the model. It is further split into a training set and a validation set: the training set is used to develop the model, and the validation set is used for hyperparameter tuning and for evaluating the model’s performance during development. The test set is used to test the correctness of the final model.
Three key challenges arise when training and testing the algorithm:
1. Patient Overlap: Deep learning networks can memorize the training data, so when the same patient appears in both the training and test sets they can perform deceptively well on the test data. For example, the training set may contain an x-ray image of a patient wearing a necklace, and the test set another x-ray of the same patient with the same necklace. Such extra information lets the network recall the memorized label and predict accurately on the test set without having learned the disease.
Solution: Ensure that a patient’s x-ray images occur in only one of the sets. This can be achieved by splitting the data by patient, so that all of a patient’s images end up in the same set.
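Splitting by patient can be sketched with the standard library alone. The record layout (pairs of patient id and image name), the function name, and the default fractions are assumptions made for this example.

```python
import random

def split_by_patient(records, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Split (patient_id, image_name) records into train/val/test sets
    so that each patient's images land in exactly one set."""
    patient_ids = sorted({pid for pid, _ in records})
    random.Random(seed).shuffle(patient_ids)

    # The fractions apply to patients, not to images.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    n_val = max(1, round(len(patient_ids) * val_fraction))
    test_ids = set(patient_ids[:n_test])
    val_ids = set(patient_ids[n_test:n_test + n_val])

    splits = {"train": [], "val": [], "test": []}
    for pid, image in records:
        if pid in test_ids:
            splits["test"].append((pid, image))
        elif pid in val_ids:
            splits["val"].append((pid, image))
        else:
            splits["train"].append((pid, image))
    return splits

# Toy data: 10 hypothetical patients with 3 x-ray images each.
records = [(f"patient_{i % 10}", f"xray_{i}.png") for i in range(30)]
splits = split_by_patient(records)
train_patients = {pid for pid, _ in splits["train"]}
val_patients = {pid for pid, _ in splits["val"]}
test_patients = {pid for pid, _ in splits["test"]}
```

Because the shuffle is done over patient ids rather than over images, no patient can appear in more than one of the three sets.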
2. Set Sampling: Randomly sampling the test set from the data set may produce an imbalanced sample in which diseased examples are absent, so that testing does not give a reliable measure of the algorithm’s performance. This is a common issue in the medical field, where data sets are usually small, and it makes sampling the test and validation sets more difficult.
Solution: Sample the test set so that at least x% of its examples come from the minority class, and follow the same strategy for the validation set.
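One way to sketch this sampling strategy with the standard library is shown below. It assumes labels are 0 or 1 with 1 as the minority class, and it uses x = 50% purely as an example value. The function name and signature are hypothetical.

```python
import math
import random

def sample_with_minority_fraction(labels, set_size, minority_fraction=0.5, seed=0):
    """Pick set_size indices so that the given fraction is minority class (label 1).

    Returns (chosen_indices, remaining_indices).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]

    n_pos = math.ceil(set_size * minority_fraction)
    n_neg = set_size - n_pos
    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("not enough examples to build the requested set")

    chosen = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    chosen_set = set(chosen)
    remaining = [i for i in range(len(labels)) if i not in chosen_set]
    return chosen, remaining

# Toy data set: 20 diseased examples out of 100.
labels = [1] * 20 + [0] * 80

# Sample the test set first, then the validation set from what is left.
test_idx, rest = sample_with_minority_fraction(labels, set_size=10)
rest_labels = [labels[i] for i in rest]
val_local, train_local = sample_with_minority_fraction(rest_labels, set_size=10, seed=1)
val_idx = [rest[i] for i in val_local]       # map back to original indices
train_idx = [rest[i] for i in train_local]
```

Whatever remains after the test and validation sets are drawn becomes the training set, which is then still imbalanced and can be handled with the weighted loss or resampling described earlier.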
3. Ground Truth (Reference Standard): A major question when testing a model is how to determine the correct label for an example. The right label is called the ground truth in machine learning and the reference standard in medicine. For chest x-rays, differentiating between diseases can be complex and may lead to disagreement between experts.
Solution: In such cases we can use consensus voting, where we form a group of experts and take the majority vote. The other method is to use a more definitive test to confirm the result.
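Consensus voting by majority can be sketched as follows. Returning `None` on a tie is an assumption made for this example, since the notes do not say how ties should be resolved. In practice a tie might be sent for a more definitive test.

```python
from collections import Counter

def majority_vote(expert_labels):
    """Return the label chosen by most experts, or None if the top vote is tied."""
    if not expert_labels:
        raise ValueError("at least one expert label is required")
    ranked = Counter(expert_labels).most_common()
    # A tie between the two most common labels means there is no majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

# Three hypothetical radiologists reading the same chest x-ray.
consensus = majority_vote(["pneumonia", "pneumonia", "normal"])
tied = majority_vote(["pneumonia", "normal"])
```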
Some of the research papers related to the study of AI in medicine.
Stay tuned for week 2’s notes ☺️