In the paper “Finding the missing link for big biomedical data”, KD Mandl, IS Kohane, and you stated that “big data becomes transformative when disparate data sets can be linked at the individual person level.” What is the situation in the USA?
Healthcare is highly fragmented in the USA: patients are treated at many hospitals and clinics and switch insurers frequently. Because of patient privacy concerns, data are intentionally kept isolated within these various silos, and there is strong reluctance to create a national patient identifier. This makes combining the health information about a person very challenging.
What opportunities does big data offer healthcare research?
The term “big data” can have different meanings in healthcare research. New informatics technologies, such as federated query tools that can search across multiple hospitals, are providing investigators with access to traditional health information, such as electronic health records and administrative claims data, on tens of millions of patients. In this case, where big data refers to the number of patients, investigators have a large enough sample size to see subtle variations in diseases across different demographic and geographic subpopulations, identify small effects of genes, or monitor for rare side effects of medications. Biomedical big data can also mean looking beyond the healthcare system for other sources of data that relate to patient health. For example, grocery shopping patterns obtained from stores might improve models that predict rates of obesity and type 2 diabetes; wearable devices that track exercise might provide insight into response rates to cholesterol-lowering drugs; the physical distance from patients’ homes to hospitals and pharmacies might influence health care utilization and costs; and patients’ Facebook friends might influence their lifestyle choices and compliance with medical treatments.
In the JAMA paper you presented a framework with different sources of health information, including data outside the healthcare system like social media…
Yes, and the first challenge in using big biomedical data effectively is to identify what the potential sources of health information are and to determine the value of linking these together. For example, electronic health records can provide depth by including clinical notes and images about individual patient encounters, while claims data can provide longitudinality with summary billing codes over an extended period of a patient’s medical history. Social media, credit card purchases, census records, and numerous other types of data, despite varying degrees of quality, can help assemble a holistic view of a patient, and, in particular, shed light on social and environmental factors that may be influencing health. However, it is not necessary to link all these types of data. The key is to select the ones that will be most effective in answering the research question.
How should we address growing concerns about privacy?
Privacy and security concerns present a social challenge in linking big biomedical data. As more data are linked, they become increasingly difficult to de-identify. One constructive response would be to regulate what is legal and ethical, to ensure that benefits outweigh risks, and to include patients in the decision-making process. An alternative approach would simply be to put the onus entirely on patients and give them control over their data. However, as has been seen with far less private data, individuals are likely to share their data publicly only to regret it later when those data are used in unanticipated circumstances. It may therefore be timely to convene a public forum whereby the relevant stakeholders, including citizens, the health care community, and commercial data vendors, could meet to frame the policy from which legislation and ultimately technical protections for big biomedical data linkage will devolve.
What are your main research projects in the area of social network analysis?
One of my other research interests is how scientists find collaborators and form teams. This is important because there is a growing consensus that the grand societal challenges of the 21st century will only be solved if experts from diverse backgrounds can come together and work effectively as a group. One of the products of my research is a social networking website for scientists called Profiles Research Networking Software (RNS). The website is used at dozens of universities around the world to help people find experts in different areas. Profiles RNS automatically creates online profiles of investigators by linking various sources of data, such as an institution’s internal human resources system, journal articles from PubMed, and publicly available grant and patent databases. Investigators can log in to Profiles RNS to post additional content or control privacy settings.
Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA. 2014 Jun 25;311(24):2479-80.