Science In The Information Age
Monday, March 1, 2010
As part of our monthly series on ethics in science and technology, we'll look at how the Internet has changed access to scientific studies and how the public can benefit and be harmed by it.
The Ethics Center Forum "Who owns the data?" is Wednesday, March 3, at 5:30 p.m. at the Reuben H. Fleet Science Center in Balboa Park.
MAUREEN CAVANAUGH (Host): In this day and age of immediate internet access, some are calling on science to break down the old barriers that separated the public from the latest scientific research. And that argument is substantially strengthened when the research has been funded by public money. Many scientists say they're all for sharing research data, but as with many things, the devil is in the details. Just putting data on the internet doesn't mean that it's good science. And yet, who decides when and how scientific breakthroughs should be released to the public? As part of our monthly series on ethics in science and technology, this morning we're discussing the issue of access to scientific research. I’d like to welcome my guests. Stanley Maloy is professor of biology and the dean of the College of Sciences at SDSU. Professor Maloy, welcome to These Days.
STANLEY MALOY (Dean of College of Sciences, San Diego State University): Good morning, Maureen.
CAVANAUGH: And Philip Bourne is professor of pharmacology at UC San Diego and founding Editor-in-Chief of the open access journal PLoS Computational Biology. Professor Bourne, welcome.
PHILIP BOURNE (Editor-in-Chief, the journal PLoS Computational Biology, UC San Diego): Good morning, Maureen.
CAVANAUGH: And we invite our listeners to join the conversation. When do you think the public should have access to scientific information? Have you had trouble finding research you think is critical to your health or some other endeavor? Call us with your questions and comments. 1-888-895-5727 is the number, that’s 1-888-895-KPBS. Professor Maloy, let me start with you maybe for an overview of this. How has the internet changed the way scientists and the general public access information?
MALOY: Well, in the old days, really the only way you could get a lot of this data was by going to the library at a university where they paid for a lot of volumes, and a lot of people did not have access to that information so the professors were the keepers of the knowledge. That changed with the internet where it’s so easy to post data and everyone can access it immediately.
CAVANAUGH: And I wonder, so with benefits come dangers. What are the dangers of that kind of instant access?
MALOY: So this is the tricky part. It seems obvious to most people that availability of data would be a slam dunk. Why not do it? But there are several reasons that there are problems. One reason is if you release data, there may be privacy issues where some personal aspect of you or your family could be released to the public. There’s the threat of misuse. Someone may take that data to do something nefarious like build a new bioterrorism agent. Somebody may misinterpret your data. An example of that is data on climate change where people may be taking that data and coming to a conclusion that’s not generally accepted. And I think that from the public perspective, there’s one more thing that’s a big thing and that is for many of these discoveries to become valuable to society, industry has to take them and build a product, make a huge investment. So if the release of the data prevents something from being patented, it may ultimately keep us from benefiting from that science.
CAVANAUGH: I understand. So let me ask you, Professor Bourne, among – within the scientific community, what are the ethical issues involved in instant access to scientific data?
BOURNE: Well, I think Stan has begun to address that issue himself to some extent. I think the – there’s a concern that the data that goes out there is – I mean, science is built on a sort of premise of good results and bad results. It advances by the fact that people ultimately distinguish the two. But at the time the data goes out there, it may be that people are not aware that the data is ultimately going to be proved to be negative and, you know, so that, you know, the question is whether you put that data out there so there’s an ethical issue that you have to address. And as scientists, depending on our institutions and our organizations, how we approach that is somewhat different. So, for example, in a public university, we are encouraged to make public essentially all of the information that we have eventually through publication and depositing data on the internet in various places. But at the same time, we have this dilemma whereby we really need to be careful so that that information is not misinterpreted. So I think that is, you know, sort of – But, overall, I think our job is to put as much good data out there as we can.
CAVANAUGH: I’m wondering, Professor Maloy, does the idea of putting that data out there as quickly as possible, does the – does that change according to what that data is, and I’m trying to form this question because last time we talked about the accessibility of scientific research, we had a call from a listener who wanted crucial information, really, about a disease that was being researched and a specific study that was being conducted by a university but couldn’t get her hands on it. And I’m wondering, if it’s medical data and there’s people actually suffering from a disease, does that really make it more important for the information to come out more quickly?
MALOY: You know, in many ways that’s true, that we all want access to medical information that could help us inform our own health decisions. But that kind of information is also some of the trickiest information with respect to privacy. So an example of this is that Jim Watson, a very famous scientist, got the Nobel Prize for the co-discovery of the structure of DNA. His genome was sequenced, all the DNA from his body, and that was released to the public. One of the things that was discovered in there was that he carries a particular mutation in a gene called ApoE. Now that’s a detail that doesn’t matter so much but what matters is that that mutation makes him more likely to get certain kinds of diseases. If it was only Jim Watson that this information affected, that would be okay. He could make the decision to release that or not. But this information may say something about his cousin or his uncle or other people, and ultimately it could even affect their ability to get a job. So that type of information, sometimes you can take a lot of information from different places and, in the aggregate, learn about the individual. So, in fact, the information that you might think is most important to get out there right away may be the information that’s most critical for us to be very, very careful about.
CAVANAUGH: I’m speaking with Professor Stanley Maloy and Professor Philip Bourne, and we’re talking about scientific research, internet access of scientific research and how quickly some data should be made accessible to the public. We’re taking your calls at 1-888-895-5727. Let’s talk with John. He’s calling from La Jolla. Good morning, John. Welcome to These Days.
JOHN (Caller, La Jolla): Hello. I know your conversation’s geared more towards the life science and medical type science but I just wanted to make a comment on Isaac Newton. He actually published quite a bit. He’s one of the greatest physicists of all time and there was a lot of resistance to his new ideas and he wound up deciding to stop publishing and just writing a book like after every 20 – every 20 years he’d write a book and I think – believe he wrote two of them. And so it’s just – it’s interesting that that – In fact, he invented calculus something like 20 years before he actually published the results on calculus just because there was so much resistance to his work. So…
CAVANAUGH: Well, thank you for that, John. Thank you for reminding us that it used to take some time. Professor Bourne, do you want to comment?
BOURNE: Well, I think in some sense we’re starting to see that happen today. There’s the notion of blogging now so scientists are more and more going out there and blogging and making their thoughts and their information available in a way that’s nontraditional. And so ideas that often don’t get published appear in these different kinds of online forums and, ultimately, some of them rise to the surface as very important ideas. And there’s a variety of different publishing models that also support this idea beyond the more conservative traditional journals that we’ve been used to in the past.
CAVANAUGH: Let’s talk a little bit more about what it used to be like and what kind of pressures scientists are facing today. Professor Bourne, in the past, someone could take their research and they could think about it, then they could write a paper and they could have the paper looked at and then it would be finally published. Am I basically right in that?
BOURNE: Yes, that’s correct.
CAVANAUGH: And now, what kind of pressures do scientists have on them to really get this stuff out?
BOURNE: Well, it’s actually changing very slowly but it is – there is signs of change on the horizon. I think the – it’s a balance between reward for doing something and the desire to disseminate. And so that reward still comes very much from publishing in the best journals that have the highest impact factor, which means the prints will get read the most. But that’s starting to change as people are actually using these other mediums as I indicated. So – But it’s a slow process but I think, ultimately, some radical people like myself would actually say that the future of the scientific journal as it currently stands may somewhat be in question in five to ten years.
CAVANAUGH: Really? And what are the drawbacks of publishing what might be called raw data? I want to go to you again, Professor Bourne.
BOURNE: In a sense what I indicated before, and that is that, you know, it can be misinterpreted. You know, some data is straightforward. Other data, you know, requires a particular skill set that could be acquired over a long period of time to really make and ascertain what that data’s really saying. I think as scientists, we also have the responsibility for actually providing what I would call metadata which is essentially data about that data that, you know, essentially gives some indication to a wide audience how it’s best to be used.
CAVANAUGH: And I’m wondering, Professor Maloy, let’s say someone goes out and collects a great deal of data about a certain scientific study and it’s difficult to collect that data and it takes a long time and they get it on the internet as quickly as they can because they’re very public spirited and they want that information out there and then someone else comes along and says, oh, hey, that’s great, and incorporates that into something that becomes valuable in some way, either a medical advance or a technological advance. I’m wondering, is there anything – are there any sort of procedures in science that go back and basically name or – that original data gatherer as the source of this? Or is this brand new?
MALOY: There’s several cases where this has happened recently that are very, very tricky. In one case, some data that had been posted on the web but not yet published, was – someone else published it in a journal. That journal and the National Institutes of Health came back and said you have to retract this article, and they rebuked the people who published that data. The question is, is that enough? So you have to understand why this matters to the scientists. It matters to the scientists because for them to keep doing science, they need to get grants. And to get grants, you have to have these publications that Phil just mentioned and the publications are – it’s critical that you have new data in those publications. So it matters a lot to be the first one to release some critical data like this if we want to keep science going the way we’ve funded it so far.
CAVANAUGH: I understand. Well, you kind of answered the question that I was about to ask already but I’m going to ask it anyway. How has the way this scientist – how has the internet changed the way scientists conduct research? Apparently, for some scientists, they just go on the internet and they pluck out some research. I would imagine that doesn’t happen quite often but how, in general, has it changed the way scientists actually approach collecting data and collecting their research?
MALOY: Well, I think this is a really important question and there’s a group of people who really use computational data to build their research projects. Now Phil is one of the world experts on this so this is an area that Phil might comment on.
CAVANAUGH: Yes, please.
BOURNE: Well, you know, that’s certainly true. I’m in the field of what’s called bioinformatics, which essentially takes a lot of public data and interprets it for, hopefully, improvements in the life sciences, including work on diseases and so forth. And I think, you know, there’s a – the rule here is that we always attribute and there are actually licensing agreements that exist, particularly not so much around the data itself but around the publications associated with that data that allow different kinds of access to it but with attribution. So I think it’s always important to attribute the right people. Now in an online world where you have blogging – blog posts, for example, then that kind of, you know, it becomes more of a problem because, yes, you can – you essentially name someone but, you know, names are very common. There is not necessarily a unique identifier and that’s true of publications as well, I suppose. So there are schemes afoot where we will have a more unique identification so, essentially, all our scholarly output can be attributed to the appropriate – to our, you know, to those who produce it.
CAVANAUGH: And so if I’m understanding you correctly, let’s say a certain amount of research is being undertaken on a certain disease and you will go through the entire internet looking for various research projects, the latest information on that disease, and compile that information?
BOURNE: As – You would try and do that as much as possible but I think it’s important – an element of all of this that is, I think, important to this discussion is the amount of data and publications that’s coming along is just absolutely astronomical. We’re awash in data. We’re awash in publications. Just to give you an example, in the life sciences, there’s an aggregator called PubMed and they’re currently adding 1.2 million scientific publications each year. You know, for us to keep up becomes very difficult. And so, in a sense, we need to filter and, you know, so we can’t look at everything. And, you know, I think the idea of having more accessibility sets the foundation to do better filtering because you’ve got more information and you need to filter so that we’re going to see – It’s not going to be a matter of just reading, it’s going to be using the computer to actually help us decide what to read.
CAVANAUGH: We’re talking about internet access to scientific research and data. We’re taking your calls at 1-888-895-5727. Doug is calling from La Jolla. Good morning, Doug, and welcome to These Days.
DOUG (Caller, La Jolla): Hi there.
DOUG: I have a point and it’s actually in the bioinformatics realm. I used to work as a gene hunter at one of the gene hunting firms and we did these large association studies with lots of family trees and occasionally we did (audio dropout) found issues with parentage where people didn’t actually know who their parents were but thought they did. And this was all very, very sensitive information and you’d have to be incredibly careful about releasing because, as you can imagine, there could be some very big legal ramifications if it was ever able to be traced back to individuals. Now, there was lots of attempts – You know, we only had bar codes, we never had names and stuff like that but it’s, you know, very, very sensitive and very questionable how you release this data out into the wild world of the web. And that’s my comment.
CAVANAUGH: How did you do that, Doug?
DOUG: Well, this was at the early ages and this was corporate so a lot of that stuff was not publicly funded and didn’t go out to the public for analysis but when we did do it, we released the – everything just as bar codes. We tried to make sure that there was no way except for the original scientists to be able to link that back to individual names.
CAVANAUGH: Thank you. Thank you for your call, Doug. That leads me, Professor Maloy, to the difference between public – publicly funded research and then corporate privately funded research. Do we have, since this is going to be an ethics forum, do we have, as the general public, any right to the information that is generated by private funds, do you think, ethically speaking?
MALOY: Well, I think ethics and reality here may be very tricky issues. In reality, data that’s acquired by private funding, private corporations, at this point in time, is their data, it’s not publicly available data. If that data could improve human health then most of us would say that ethically that’s data that should be released because the more people who know about it, the faster we’ll be able to come up with some kind of therapy that will improve health. Now, from a corporate standpoint, though, you’ve invested money, you have to make money. They want to be the ones who make money off of that data and so, as a rule for many companies, that data is considered very privileged internal data.
CAVANAUGH: And, Professor Bourne, how do scientists share data that have been generated through private resources? Is there a great deal of sharing or is that really guarded like life and death?
BOURNE: No, I think there is – generally, that data becomes available when it really doesn’t have any value to the company anymore. And I think you have to understand on both sides—and this is actually true to some extent also in the public sector—that there has to be some kind of business model for which all this takes place. So, I mean, Stan nicely outlined what happens in industry. I mean, I think if you take a very simple-minded view of what happens in a public institution as essentially we’re generating data with federal funds, then that data is then used in part by the private sector and hopefully some useful product comes from it which generates state and federal tax revenue that essentially pays for the research and, you know, we go full circle. So I think we always have to acknowledge that we need that circle to operate.
CAVANAUGH: And is that model operating well today? Or does it need to be looked at?
BOURNE: I think it’s operating fairly well but as we say, there’s, you know, these rapid changes going on and, you know, I think the – certainly the technology and the access is ahead of the legislation and even within our own institutions, private or public, how we go about dealing with the situation.
CAVANAUGH: Right, Professor Maloy, is there any protocol as to when data should be released and why? Is there any sort of procedure that everyone shares?
MALOY: Well, I think that it depends on how the data’s acquired. For a number of types of information, for example, genomic information, the sequence information we spoke about earlier, there are federal guidelines that that data should be released immediately but there are also embargoes present. So the idea is you’ll release the data, people can think about it, but potentially they won’t scoop you and publish before you’ve had a chance to analyze that data. That’s probably the optimal situation. It allows everyone to think about that data, to work with the data, to develop new types of products while at the same time not destroying the benefits for the person who worked so hard to get that data.
CAVANAUGH: Exactly. I’m wondering, Professor Bourne, there are people who try to do what your organization does, especially if they’re challenged by some disease that they have or a family member has, they try to collect and compile all the information, all the research data that is available on that particular disease. I wonder, what are the downsides of doing that? What kind of expertise do you need in order to make a really good assessment of what you’re looking at?
BOURNE: Well, I think it comes down to the authenticity of the information that you’re gathering. And certainly, you know, I think without, you know, just by going to the scientific literature, for example, you know for the most part that that material is being peer reviewed so that a certain number of people have actually looked at it and believed it to be correct, have actually – and then it appears. So there is that kind of gatekeeping, I suppose, but we’re in an interesting time, again brought about by the internet. And now there’s this notion of the so-called out – crowd sourcing and wisdom of crowds where in the case of Wikipedia, for example, you know, many people, many eyeballs look at this information and a picture emerges. And there’s good data to show that some of that is, you know, is highly credible information. I think as scientists we use Wikipedia quite a lot. So, you know, but that doesn’t necessarily have the same kind of authenticity associated with it but it has some. So, you know, I think we need to grapple with these new models of how we review, you know, scientific information.
CAVANAUGH: And that’s exactly what you’re going to be doing at this ethics forum. As you continue this conversation, I wonder, I know that you think, Professor Bourne, that perhaps the journals, the scientific journals, are going to have a run for their money at least in a few years. What do you think might be some of the outcomes as scientific, Professor Maloy, as scientists really sort of churn over this idea that this – the internet access, this fast, public access to their research data is something that’s only going to continue?
MALOY: I think that there are a lot of changes associated with this. One of the changes is that a lot of science is supported by professional societies. So, for example, I’m associated with the American Society for Microbiology. They have a whole set of journals that have functioned on this previous model where you don’t have the open access, and that’s how they get most of their income. That’s how they support their members, that’s how they support their causes. So they need a new business model to be able to change. And they’re going to have to do that very quickly or they’re going to be left out and not be there for the scientists and the people they support. I think there’s one other thing that’s really important and it relates to what Phil just alluded to about the mass marketing of some of these ideas. There are times when people don’t want to accept the data even when the data’s clear. There are two recent cases that are really striking. One is the changes in the regulations for mammography, right. That’s a case where based upon data there was a recommendation for a change and that scared a lot of people who had been told one thing for a long time. Another example, and this is an example in strong support of the need for release of data, is the recent evidence that data saying there was a link between autism and vaccines was completely wrong. That had a profound effect on a lot of people’s lives.
CAVANAUGH: There’s so many aspects to this. I want to thank you both so much for coming in and speaking to us about it, and I want to let everyone know that the conversation will continue. The next Ethics Center Forum "Who Owns The Data?" is this Wednesday at 5:30 in the afternoon at the Reuben H. Fleet Science Center in Balboa Park. The event is free. It’s open to the public, and you can go to KPBS.org/thesedays for more information. Thank you both for being here.
BOURNE: Thank you.
MALOY: Thank you, Maureen.
CAVANAUGH: And if you’d like to post a comment, go online, KPBS.org/thesedays. Coming up, a unique education program that teaches U.S. law south of the border, that’s as These Days continues here on KPBS.