15 October 2009

Open Source Mathematics

This is incredibly important:

On 27 January 2009, one of us — Gowers — used his blog to announce an unusual experiment. The Polymath Project had a conventional scientific goal: to attack an unsolved problem in mathematics. But it also had the more ambitious goal of doing mathematical research in a new way. Inspired by open-source enterprises such as Linux and Wikipedia, it used blogs and a wiki to mediate a fully open collaboration. Anyone in the world could follow along and, if they wished, make a contribution. The blogs and wiki functioned as a collective short-term working memory, a conversational commons for the rapid-fire exchange and improvement of ideas.

The collaboration achieved far more than Gowers expected, and showcases what we think will be a powerful force in scientific discovery — the collaboration of many minds through the Internet.

You can read the details of what happened - and it's inspiring stuff - in the article. But as well as flagging up this important achievement, I wanted to highlight some interesting points it makes:

The process raises questions about authorship: it is difficult to set a hard-and-fast bar for authorship without causing contention or discouraging participation. What credit should be given to contributors with just a single insightful contribution, or to a contributor who is prolific but not insightful? As a provisional solution, the project is signing papers with a group pseudonym, 'DHJ Polymath', and a link to the full working record. One advantage of Polymath-style collaborations is that because all contributions are out in the open, it is transparent what any given person contributed. If it is necessary to assess the achievements of a Polymath contributor, then this may be done primarily through letters of recommendation, as is done already in particle physics, where papers can have hundreds of authors.

The project also raises questions about preservation. The main working record of the Polymath Project is spread across two blogs and a wiki, leaving it vulnerable should any of those sites disappear. In 2007, the US Library of Congress implemented a programme to preserve blogs by people in the legal profession; a similar but broader programme is needed to preserve research blogs and wikis.

These two points are also relevant to free software and other open endeavours. So far, attribution hasn't really been a problem, since everyone who contributes is acknowledged - for example through the discussions around the code. Similarly, preservation is dealt with through the tools for source code management and the discussion lists. But there are crucial questions of long-term preservation - not least for historical purposes - which are not really being addressed, even by the longest-established open projects like GNU.

For example, when I wrote Rebel Code, I often found it hard to track down the original sources for early discussions. Some of them have probably gone for ever, which is tragic. Maybe more thought needs to be given - not least by central repositories and libraries - to how important intellectual moments that have been achieved collaboratively are preserved for posterity to look at and learn from.

Talking of which, the article quoted above has this to say on that subject:

The Polymath process could potentially be applied to even the biggest open problems, such as the million-dollar prize problems of the Clay Mathematics Institute in Cambridge, Massachusetts. Although the collaborative model might deter some people who hope to keep all the credit for themselves, others could see it as their best chance of being involved in the solution of a famous problem.

Outside mathematics, open-source approaches have only slowly been adopted by scientists. One area in which they are being used is synthetic biology. DNA for the design of living organisms is specified digitally and uploaded to an online repository such as the Massachusetts Institute of Technology Registry of Standard Biological Parts. Other groups may use those designs in their laboratories and, if they wish, contribute improved designs back to the registry. The registry contains more than 3,200 parts, deposited by more than 100 groups. Discoveries have led to many scientific papers, including a 2008 study showing that most parts are not primitive but rather build on simpler parts (J. Peccoud et al. PLoS ONE 3, e2671; 2008). Open-source biology and open-source mathematics thus both show how science can be done using a gradual aggregation of insights from people with diverse expertise.

Similar open-source techniques could be applied in fields such as theoretical physics and computer science, where the raw materials are informational and can be freely shared online. The application of open-source techniques to experimental work is more constrained, because control of experimental equipment is often difficult to share. But open sharing of experimental data does at least allow open data analysis. The widespread adoption of such open-source techniques will require significant cultural changes in science, as well as the development of new online tools. We believe that this will lead to the widespread use of mass collaboration in many fields of science, and that mass collaboration will extend the limits of human problem-solving ability.

What's exciting about this - aside from the prospect of openness spreading to all these other areas - is that there's a huge opportunity for the open source community to start, er, collaborating with the scientific one in producing these new kinds of tools that currently don't exist and are unlikely to be produced by conventional software houses (since spontaneously collaborative communities can't actually pay for anything). I can't wait.

Follow me @glynmoody on Twitter or identi.ca.

3 comments:

Anonymous said...

Is that Henry Clay the economist?

Glyn Moody said...

No, apparently it's Landon T. Clay.

Anonymous said...

We have the software to track contributions to the letter. Git and other DVCSs do it.

A git-wiki, a decentralized wiki, could do what you ask. But it doesn't exist in a user-friendly form.

We need your help to bring that into existence.