The movement toward sharing data from clinical trials has divided the scientific community, and the battle lines were evident at a recent summit sponsored by the Journal. On one side stand many clinical trialists, whose lifeblood — randomized, controlled trials (RCTs) — may be threatened by data sharing. On the other side stand data scientists — many of them hailing from the genetics community, whose sharing of data markedly accelerated progress in that field.
At a time when RCT funding is shrinking, trialists know that sharing data adds substantial costs to clinical trial execution; a requirement to share data might mean that fewer trials, and smaller ones, will be conducted. Many trialists also worry that complex data will be misinterpreted by people who weren’t involved in generating them, and who may therefore produce misleading results. Furthermore, journal publications are the currency of academic advancement. Researchers often invest 5 to 10 years gathering trial data, expecting to write several papers after their primary publication. An expectation that data will be shared quickly may therefore create a disincentive for conducting RCTs.
Data scientists promoting data sharing are joined by some members of the medical community, who point to abundant unpublished studies with negative results as missed learning opportunities and invitations to wasteful repetition of trials. Some proponents see resistance to data sharing as motivated purely by self-interest. As Isaac Kohane, chair of the Department of Biomedical Informatics at Harvard Medical School, recounted, when geneticists began aggregating their data, there were notable holdouts who, fearing being scooped, withheld data and slowed the community’s progress. But as Ewan Birney, a geneticist who codirects the European Bioinformatics Institute, noted at the summit, “once everyone has done it for a little bit of time, you will forget you had these arguments.”
And everyone may have to do it soon. The National Institutes of Health now requires that grant applicants outline a data-sharing plan, as do the Cancer Moonshot, the Gates Foundation, and the Wellcome Trust. But many details need to be worked out, from incentive structures to sustain data generation, to standards for data exchange, to identification of the subset of clinical questions for which sharing is most cost-effective. Indeed, the focus of the data-sharing summit was less about whether to share data and more about how best to do so.
Perhaps the most incisive question, posed by Rory Collins, a University of Oxford epidemiologist and trialist, was the most obvious one: What problem are we trying to solve? The advancement of science depends on the open exchange of ideas and the opportunity to replicate or refute others’ findings. But will data sharing address our current system’s shortcomings in a way that advances science? For example, though it’s troubling when trials provide incomplete information about adverse events, requiring the sharing of individual patient data from every trial might not be the best way to fix that problem. A more effective solution, Collins suggests, may be publishing, alongside the primary trial results, an easily accessible appendix containing adverse-event data in tabular form. Proponents of data sharing also believe it will allow other investigators to generate new insights and hypotheses. But will such insights advance health in a way that justifies the cost?
Preliminary evidence reveals less enthusiasm than anticipated for using shared RCT data. In 2013, GlaxoSmithKline created the website clinicalstudydatarequest.com (CSDR), where data from at least 3049 trials from 13 industry sponsors are currently available. According to the independent panel that reviewed research proposals, 177 proposals were submitted in the first 2 years, most of them for a new study and publication. Yet despite substantial investment by industry sponsors, only four manuscripts have been submitted for publication thus far.1 Brian Strom, one of the panel members, noted that because industry analyzes its data so exhaustively in anticipation of intensive interrogation from the Food and Drug Administration, it’s possible that nonindustry data will yield more new findings. But industry’s resources also far exceed academia’s, so relatively speaking, data sharing’s costs for academics will be far greater.
The more substantial challenge described by investigators seeking to use others’ data, however, seems to be the burden of analyzing data behind a firewall. Rather than receiving data to analyze themselves, investigators submitted statistical queries to CSDR, which the repository’s managers then ran, a time-consuming process. The consensus, according to Strom, was that “true data sharing would be preferable to data access on a dedicated website.”
So what is true data sharing? Does the work required to overcome these challenges, or the outputs achieved, differ when sharing is imposed from the outside rather than motivated by prospectively identified common goals? Clinical investigators, after all, have long collaborated in efforts to address unanswered questions. More than 30 years ago, for instance, Collins led the creation of the Clinical Trial Service Unit, a global consortium of trialists who sought to share data and pool their results. Though it required a tremendous time investment and endless communication among investigators to understand one another’s data sets, there was a shared sense of purpose and pride in the clinically meaningful results. Tamoxifen, for example, was believed to reduce recurrence without improving survival among women with breast cancer, but a planned combined analysis with longer follow-up and more patients showed that it did confer a survival benefit, transforming the standard of care.2
Collins, therefore, firmly believes in data sharing’s benefits, but he recommends considering all the likely hitches. Comparing the relative ease of data sharing in 1995, when a cholesterol-treatment meta-analysis was prospectively planned,3 to current “clunky” and more time-consuming processes, Collins notes, “It is ironic that as a result of the data-sharing agenda and the formalization of the systems, it is now more difficult to get access to the data.” The aggregation of vast genetics databases suggests that these technical and bureaucratic challenges are growing pains. But clinical trial data sets may be sufficiently complex that streamlining data exchange will require extensive input from the data’s generators. Ideally, the trialist community will create uniform standards and data will be collected with those standards in mind.
A greater risk may be to the clinical trialist community. It’s assumed that data sharing will advance the public health, but will the public benefit if there are steep declines in the number and size of clinical trials? Though more “open” science may yield as-yet-unimagined innovations, unplanned and retrospective secondary analyses can only generate, not test, hypotheses. The type of hypothesis testing that can advance treatment of disease will always depend on active and motivated clinical trialists asking questions prospectively.
And though tinkering with data-exclusivity periods and designations of academic credit may reduce the disincentives created by data sharing, I think there is something at stake here that incentives can’t solve: our capacity to rationally weigh trade-offs as we debate how best to advance science. While the recent summit was civil and collaborative, the tenor of the broader data-sharing conversation has framed the matter as one of trialist self-interest versus public good. But such a frame vastly oversimplifies the situation — and tends to entrench people in polarized positions, articulated with righteous indignation.
The indignation of data-sharing advocates arises in part from the claim that the absence of data sharing slows the development of cures. In addition, at a political moment when promises of data democratization overshadow faith in traditional expertise, reservations about data sharing are easily dismissed as elitist — as are the experts who point out misunderstandings of a topic they’ve spent years studying. The value placed on transparency also contributes: any resistance to greater openness is branded as secrecy and deceit. Finally, the deepest (and perhaps most valid) source of moral outrage may be the sentiment that clinical trial data aren’t ours to begin with, that they should belong to the patients who put themselves at risk to participate. And in principle, patients want their data shared.
But patients also want better treatments for their diseases. And though data sharing may sometimes lead to better treatments, it may also divert limited resources to types of research that are less fruitful than RCTs, impeding the evidence generation required for improving care. The irony in the framing of this debate is that to share data in a way that advances knowledge, we must be open to one another’s experience and expertise, setting aside ideology in pursuit of more objective truths. Fulfilling this obligation, as we refine the scientific process, will require not only sharing what we find but also resisting the temptation to demonize those who see different paths to our shared goal.
Article link: http://www.nejm.org/doi/full/10.1056/NEJMp1704482