“It is algorithms, not data sets, that will prove transformative.” This is the captivating sentence that opens the article “Predicting the Future”, recently published in the New England Journal of Medicine. The reason seems related to the importance of machine learning: could you briefly explain it?
We’re beginning to see a new kind of use for computers in medicine. Today, we use computers to apply rules: we get an alert if we try to prescribe ciprofloxacin together with Coumadin. We already know these rules; we just forget them sometimes, because it’s late at night or early in the morning, or because we didn’t know the patient was on Coumadin.
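A rule-based alert of this kind can be sketched as a lookup table of drug pairs that is checked at prescribing time. This is a minimal illustration, not how any particular electronic health record implements it: the rule table, the brand-to-generic synonym map, and the function names (`normalize`, `check_new_prescription`) are all hypothetical.

```python
from typing import Iterable, List

# Hypothetical brand-to-generic map, so "Coumadin" and "warfarin" match the same rule.
SYNONYMS = {"coumadin": "warfarin", "cipro": "ciprofloxacin"}

# Hypothetical rule table: each unordered pair of generic names triggers an alert.
# A production system would draw on a curated interaction knowledge base instead.
INTERACTION_RULES = {
    frozenset({"ciprofloxacin", "warfarin"}): "potential interaction, review before prescribing",
}


def normalize(drug: str) -> str:
    """Lower-case the name and map brand names to generic names."""
    name = drug.strip().lower()
    return SYNONYMS.get(name, name)


def check_new_prescription(new_drug: str, active_meds: Iterable[str]) -> List[str]:
    """Return one alert per active medication that forms a known interacting pair."""
    new = normalize(new_drug)
    alerts = []
    for med in active_meds:
        pair = frozenset({new, normalize(med)})
        if pair in INTERACTION_RULES:
            alerts.append(f"{new_drug} + {med}: {INTERACTION_RULES[pair]}")
    return alerts


print(check_new_prescription("Cipro", ["Coumadin", "metformin"]))
# prints ['Cipro + Coumadin: potential interaction, review before prescribing']
```

The checker can only report what someone has already written into `INTERACTION_RULES`, and it only helps if the active medication list is complete: it reminds us of rules we already know rather than telling us anything new.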
In the not-so-distant future, we will have algorithms that tell us things about our patients that we didn’t already know: whether they are likely to die if we start a given chemotherapy regimen, or their risk of a heart attack in the next 7 days based on their individual medical history. This will be very exciting, not just for medical practice but also for driving new ways of understanding health and disease using very complex data.
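To make the contrast with rule-based alerts concrete, here is a minimal sketch of a learned risk model, assuming logistic regression (the interview does not name a model type) fitted by gradient descent to a synthetic cohort. The features (standardized age, a prior-heart-attack flag), the coefficients used to generate the data, and the function names are all hypothetical; the point is only that the weights are estimated from data rather than written down in advance.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulate_patients(n: int, rng: random.Random):
    """Synthetic cohort with hypothetical features: standardized age and a prior-heart-attack flag."""
    patients = []
    for _ in range(n):
        age_z = rng.gauss(0.0, 1.0)
        prior_mi = 1.0 if rng.random() < 0.2 else 0.0
        # Hypothetical "true" risk model, used only to generate outcomes.
        true_logit = -3.0 + 0.8 * age_z + 1.5 * prior_mi
        event = 1.0 if rng.random() < sigmoid(true_logit) else 0.0
        patients.append(([1.0, age_z, prior_mi], event))  # leading 1.0 is the intercept term
    return patients


def fit_logistic(data, lr: float = 1.0, steps: int = 500):
    """Estimate logistic-regression weights by batch gradient descent on the mean log-loss."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for x, y in data:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad[j] += error * xi / len(data)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def predict_risk(weights, age_z: float, prior_mi: float) -> float:
    """Individualized probability of the event for one patient."""
    return sigmoid(weights[0] + weights[1] * age_z + weights[2] * prior_mi)


weights = fit_logistic(simulate_patients(1000, random.Random(0)))
older_with_prior_mi = predict_risk(weights, age_z=1.5, prior_mi=1.0)
younger_without = predict_risk(weights, age_z=-1.0, prior_mi=0.0)
print(f"{older_with_prior_mi:.3f} vs {younger_without:.3f}")
```

Nobody typed these weights in; because they are learned from one dataset, they also have to be checked on data the model has not seen.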
Why do you think that “letting the data speak for themselves can be problematic”? How can we address this concern?
When we develop algorithms, we worry a lot about “overfitting”: we can predict a given outcome extremely well in one dataset, but because of quirks in that data, the algorithm doesn’t work nearly as well when we transfer it to the real world. Many non-medical data science competitions handle this problem by releasing one set of data to the teams developing the algorithms; after the teams upload their final algorithms, the organizers test them on a completely separate dataset for ‘validation’. This principle of using separate data for model development and validation is extremely important, as anyone who has built prediction algorithms without machine learning already knows.
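The development/validation split can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are synthetic, 20% random label noise stands in for the “quirks”, and the two models (a one-parameter threshold and a hypothetical 1-nearest-neighbor memorizer) were chosen only to contrast a simple model with one that overfits.

```python
import random


def make_data(n: int, noise: float, rng: random.Random):
    """Synthetic 1-D data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y  # random "quirks" that do not carry over to new data
        data.append((x, y))
    return data


def fit_threshold(train):
    """Simple model: choose the cut-point with the best accuracy on the development data."""
    candidates = sorted(x for x, _ in train)
    return max(candidates, key=lambda t: sum((x > t) == (y == 1) for x, y in train))


def nearest_neighbor_predict(train, x: float) -> int:
    """Flexible model: copy the label of the closest development point (memorizes the quirks)."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label


def accuracy(predict, data) -> float:
    return sum(predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)
development = make_data(200, noise=0.2, rng=rng)  # data released to the competitors
validation = make_data(200, noise=0.2, rng=rng)   # held back by the organizers
cut = fit_threshold(development)


def threshold_model(x: float) -> int:
    return 1 if x > cut else 0


def nn_model(x: float) -> int:
    return nearest_neighbor_predict(development, x)


for name, model in [("threshold", threshold_model), ("1-NN", nn_model)]:
    print(f"{name}: development {accuracy(model, development):.2f}, "
          f"validation {accuracy(model, validation):.2f}")
```

The 1-nearest-neighbor model scores 1.00 on its development data by construction, since every point is its own nearest neighbor, so that number says nothing about real-world performance; only the held-out validation score does.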
Is the quality of data related to their quantity?
Ha. Probably inversely.
Correlation doesn’t imply causation: this is a big problem in clinical epidemiology. Should we expect a solution from machine learning?
I doubt it. Some people are doing extremely interesting work in this space, using machine learning to create better risk adjustment methods, propensity scores, and instrumental variables. But ultimately these algorithms find correlations; that’s their strength. It’s unlikely that they will solve any of the fundamental problems of causal inference in observational datasets.
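A small simulation shows why finding correlations is not the same as finding causes. In this hypothetical cohort, disease severity drives both treatment and death, while treatment itself has no effect. The crude comparison makes treatment look harmful; inverse probability weighting with a propensity score removes the spurious association. Here the propensity score is simply the treated fraction within each severity stratum (an assumption for brevity); in practice, and in the machine-learning work mentioned above, it would come from a fitted model. All names and numbers are illustrative.

```python
import random


def simulate(n: int, rng: random.Random):
    """Hypothetical observational cohort: severity drives both treatment and death,
    and treatment itself has no effect on death."""
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        treated = rng.random() < (0.8 if severe else 0.2)  # sicker patients are treated more often
        died = rng.random() < (0.30 if severe else 0.05)   # deliberately ignores `treated`
        rows.append((severe, treated, died))
    return rows


def crude_risk_difference(rows) -> float:
    """Death risk in treated minus untreated, ignoring severity (a pure correlation)."""
    treated = [died for _, t, died in rows if t]
    untreated = [died for _, t, died in rows if not t]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def ipw_risk_difference(rows) -> float:
    """Inverse-probability-weighted risk difference; the propensity score is
    P(treated | severity), estimated here as the treated fraction in each stratum."""
    propensity = {}
    for level in (False, True):
        stratum = [t for severe, t, _ in rows if severe == level]
        propensity[level] = sum(stratum) / len(stratum)
    n = len(rows)
    treated_term = sum(died / propensity[s] for s, t, died in rows if t) / n
    untreated_term = sum(died / (1 - propensity[s]) for s, t, died in rows if not t) / n
    return treated_term - untreated_term


cohort = simulate(20_000, random.Random(42))
crude = crude_risk_difference(cohort)
adjusted = ipw_risk_difference(cohort)
print(f"crude: {crude:+.3f}  IPW-adjusted: {adjusted:+.3f}")
```

The adjustment works only because severity was recorded. If the confounder had not been measured, neither this estimator nor a more flexible machine-learned propensity model could recover the true (null) effect from the observational data alone.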
“Algorithms are only as reliable as the data they are based on.” 
Which areas of medicine do you and Prof. Emanuel think will be disrupted by transforming data into knowledge?
It will certainly help doctors with prognosis: understanding when a patient will die, whether a cancer will metastasize. Predicting the future is something these algorithms are extremely good at.
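How good a prognostic model is at predicting the future is often summarized by its discrimination, for example the C-statistic (equivalent to the area under the ROC curve for a binary outcome). The interview does not name a metric, so this is only a sketch; the function name and the six-patient data are hypothetical.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Probability that a randomly chosen patient who had the event was assigned a higher
    risk than a randomly chosen patient who did not (ties count as one half)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one patient with and one without the event")
    score = 0.0
    for case, control in product(cases, controls):
        if case > control:
            score += 1.0
        elif case == control:
            score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical predicted risks for six patients and whether each later had the event.
risks = [0.90, 0.80, 0.30, 0.10, 0.60, 0.20]
outcomes = [1, 0, 1, 0, 1, 0]
print(c_statistic(risks, outcomes))  # 7 of 9 case-control pairs ranked correctly, about 0.78
```

A value of 0.5 means the model ranks patients no better than chance, and 1.0 means every patient who had the event was ranked above every patient who did not. Discrimination alone does not say whether the predicted probabilities are well calibrated.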
They will also improve diagnostic accuracy, by suggesting high-value tests and reducing the overuse of testing. This is more complicated and further off, because often even doctors don’t have the ‘gold standard’ the machine would need in order to learn how to make a proper diagnosis! This makes it harder to train algorithms.
A related point is that algorithms will take over much of the work of radiologists and anatomical pathologists, who work with digital data that could just as well be sent to a machine instead. Eventually, algorithms will also interpret streaming data from anesthesiology and critical care.
So in 20 years, radiologists won’t exist in anywhere near their current form: they will have to adapt and change, a bit like construction workers, who are doing very different jobs today than they were before mechanization 100 years ago. Bank tellers don’t hand out cash anymore, but they do handle far more complex transactions than they used to. Technology doesn’t always eliminate jobs; sometimes it changes them, and those that adapt can turn out to be big winners.
Obermeyer Z, Emanuel EJ. Predicting the Future – Big Data, Machine Learning, and Clinical Medicine. N Engl J Med 2016;375:1216-9.
Parikh RB, Obermeyer Z, Bates DW. Making Predictive Analytics a Routine Part of Patient Care. Harvard Business Review, April 21, 2016. Last accessed October 14, 2016.