McGill experts open up on open science

"The digital revolution is upon us. Get on this train or be under it."

On June 12, as part of the International Economic Forum of the Americas conference in Montreal, a panel of McGill experts participated in a round-table discussion on open science, the no-barrier approach to scientific research that allows research data and materials to move freely from one research team to another and between disciplines.

l to r: Martha Crago, Doina Precup and Alan Evans at the Open Science round table.

What are the challenges facing the open science movement? How can machine learning experts work alongside scientists to foster concrete results for health care? What are the perspectives for this unprecedented interdisciplinary collaboration? These were some of the questions tackled by the panel.

Participating in the discussion were Guy Rouleau, Director, Montreal Neurological Institute and Hospital; Doina Precup, Associate Professor, School of Computer Science; and Alan Evans, James McGill Professor of Neurology and Neurosurgery, Psychiatry and Biomedical Engineering. The event was moderated by Martha Crago, Vice-Principal, Research and Innovation.

This is a partial transcript of the panel discussion and the Q&A session with the audience that followed. Some of the questions and answers have been edited for brevity.

Martha Crago: What about the threat of being scooped? We’ve heard examples of this: someone publishes some data and someone else publishes even more papers based on that data.

Alan Evans: I think it’s more of a mythology about it. People get concerned that someone is going to take their data and run away with it.

We have changed from the laboratory environment in which you publish one paper a year and you sit very protectively on it, to an environment where lots of people get involved and analyze the data and more science comes out of that. It is good for society.

The question being asked is, is it good for the individual?

I would argue that if you publish one paper a year, you get a certain amount of recognition. But if 100 people use your data and refer back to you on a continuing basis, your profile goes through the roof. That’s true at all levels, whether you are a graduate student or a senior scientist. It’s great for your career if everybody uses your data.

Martha Crago to Doina Precup: You’re in computer science. How do data sharing and intellectual property work for computer science?

Doina Precup: Computer science, specifically machine learning, is a very open environment. We need data in order to build and test. The more open datasets there are out there, the more the research community can develop and perfect our methodology.

It’s very much the credo of the community that the data should be open, the code should be open – so the algorithms are available for everybody to try out. The papers should be open as well, and, in fact, this model of open publication is what we’ve had in the community for the past 20 years.

Martha Crago to Guy Rouleau: You are a member of a university department at the same time you are leading the Montreal Neurological Institute. What will open science do for our criteria for tenure and promotion?

Guy Rouleau: One of the biggest barriers to adopting open science is the methods for promotion. It’s a real and important issue.

We are working with the University of Virginia and with the Chan Zuckerberg Foundation to organize a series of workshops to revisit this and see if there are ways of measuring people for work that is done in the open.

For example, if you publish a paper and your data leads to 200 papers, you’re the key to that, you’re at the heart of that. We should take this into account when doing promotions.

Martha Crago to Doina Precup: How does it work in your world? You got promoted. How do you do that in a world which is very open about its publications, its information and its code?

Doina Precup: Computer Science does take this into account for its tenure and promotion committees. For example, if someone releases a piece of code… we report how many people have downloaded that piece of code. We also sometimes report evidence on other papers that have been published that utilize that piece of code.

The community does recognize that that is a valuable contribution. In some of the subfields of computer science – software engineering, for example – this is considered just as valuable, if not more valuable, than publishing one paper. People really embrace that as a way of contributing to the community.

Martha Crago: [In the medical setting] how do you think we will protect the individual, the patient, in research using this sort of data?

Alan Evans: It’s a very interesting question. There are lots of technical advances in double- and triple-encryption which can allow you to hide the information about an individual.

Obviously, there is always the potential for data to be hacked. It’s important that you de-identify and anonymize the data before it gets put out into the public sphere so that people can’t get back to the personal identifying information.

Our experience is that, in general, patients are very happy to see their data being used for more than just the narrow question that it was originally collected for. Of course, it is good for them and it’s good for society if more information can be gained from that data.

Martha Crago: Does Canada have any particular advantages in this global open science?

Guy Rouleau: The Canadian government is adopting an open government position. They are opening up so that data from the Canadian government will be available.

We have the Structural Genomics Consortium, based in Toronto, which is arguably the world leader as a project in open science. It involves six countries with a budget over $300 million.

And what we’ve done at the Neuro [with the Tanenbaum Open Science Institute], again, we’re the first ones.

There is a lot of momentum in Canada with open science and I think it fits with the way a lot of Canadians think.

Alan Evans: A resounding yes. Canada is large enough to have an impact on the world, but we’re small enough to be organizable. In Canada in general, and Quebec in particular, there is a tremendous collegiality. We share more easily. We are used to working together.

Yes, we have our squabbles but there is a fundamental decency in Canada that encourages people to work together.

You are probably aware that Montreal is one of the capitals of the world of Artificial Intelligence. Suffice it to say that we are sitting on a gold mine, scientifically, of AI and the brain. The next 10-20 years are going to be so exciting for Canada, Quebec and Montreal in particular. The open science premise allows us to promulgate those principles globally.

I’ve been at McGill for 35 years and I’ve never been so excited about the future.

Martha Crago to Doina Precup: You are part of this AI scene in Montreal. What do you think AI brings to the world of neuroscience?

Doina Precup: I think AI in Montreal is very much driven by the desire for us to develop algorithms that aren’t just interesting but that make a positive social impact in many ways. From this point of view, medical applications are really an excellent match.

The fact that there are now open datasets that are available is wonderful because our students have access to that data, we can use it in classes to teach different methodologies.

Question from the audience: Have you ever been worried that in the future some other countries might use your data, now available to the public, to spare their own R&D costs?

Doina Precup: I grew up in Romania, a communist country. We did not have the resources. It cost too much to gather data.

There is tremendous opportunity for people living in developing countries to participate actively in the research process and in the start-up development process with open data. There are tremendously bright people in these countries, well-educated people, who can really benefit.

If they develop something, wonderful. We all benefit.

Alan Evans: Those people who take our data, most of them will be putting results back into the public domain that will benefit us. We all win.

Yes, there will be people who will abuse the system. There will always be people who are going to try to end run it. But, overall, there will be a lot of exchange and that is a good thing.

There is a lot of public data out there, not all from institutions, but individual scientists and scientific groups, who have put their data out into the public domain already. And we have been the beneficiaries of that for many years.

A young scientist at the Neuro, Yasser Iturria-Medina, took that data, a lot of Alzheimer’s Disease data, and he analyzed it in some very creative ways. He came up with a completely new insight into the underpinnings of the disease. He’s actually taken it one step further and, just last week, published a paper on fingerprinting individual profiles that will allow us to customize an intervention for each individual. That’s all coming from public data that we never collected.

On the other hand, we’ve been contributors to open science. Over many years, we collected a particular dataset called Big Brain, which is a very large digital dataset of the human brain. We put it in the public domain and 25,000 groups have downloaded that data around the world. We have no idea what most of them are doing with that data and that’s OK because that data will help generate many, many publications that will be good for society.

Guy Rouleau: When I go around to other institutions, most people think I’m crazy to do this. It is kind of heretical to the way we were raised. However, if you look at young people now, they are already all on board. They believe in this. They say “sharing and propagating information, this is great.”

Question from the audience: Are you trying to change the corporate world?

Guy Rouleau: I don’t think we’re changing the corporate world, but we are adopting things that will have some effect on the corporate world.

This is basically the consequence of the internet and of being able to deal with large data sets and being able to share. What we want to do now would have been impossible 10 years ago. We’re just one manifestation of the changing world due to the increasing interconnectivity of people.

We’re not changing, but tweaking, the corporate world.

Question from the audience: In parallel to open science there are also open scientific contests that are often won by people outside academia. Where will we see a merging of the world of open science and the world of science driven by incentives?

Alan Evans: The practice of science is changing before our eyes.

The data is out there. Who would have thought, as an example, that Wikipedia would have succeeded? If I told you that we were going to build a digital encyclopedia that anybody can write, you would have thought it was a recipe for chaos.

But there are all kinds of checks and balances from the community and good stuff emerges.

The same kind of principles attend open science. All the data will be out there in one form or another. Lots of people will analyze it. Some people will do bad science but the community will correct them. It will bootstrap its way along.

It’s no longer the monolithic paper. Somebody puts a paper out there and it takes so much time for somebody else to collect another dataset and either agree with or refute that data – that takes years.

Now you have a situation in which hundreds of people can analyze that data – and the preponderance is that the good stuff will rise to the top, like Wikipedia.

We’re not selling this, we’re just trying to hang on by our fingertips to something that is happening anyway. The digital revolution is upon us. Get on this train or be under it.