03 June 2009

Why Chemical Software Will be Open Source

Here's an important post from Mr Open Chemistry, Peter Murray-Rust:


“Chemical software will be Open Source”

This statement expresses both a simple truth (Simple Future, see WP) and an aspiration (Coloured Future – Software shall be free). The latter is what I have been advocating on this blog – the moral, pragmatic, utilitarian value of Open Source. The former simply states that it will happen. IOW a betting person could lay a wager.

The heart of Peter's argument is this:

there is a particular aspect to “Chemoinformatics” - the software that supports the management of chemical compounds, reactions and their measured and computed properties:

There have been no new developments in the last decade

What I mean by this is that there have been no new algorithms or information management strategy to have come out of commercial chemoinformatics manufacturers. Chemical search, heuristic properties and fingerprints, molecule docking are “solved” problems. And advance comes from packaging, integration and parameter_tweaking/machine_learning. Only the last adds to science and since the commercial manufacturers are secretive then we can’t measure this (and I believe this to be mainly pseudoscience in its practice – you can make extravagant plans without independent assessment). So the advances from the manufacturers have been engineering – ease of use, deployability, interoperation with third-party software – but not functionality.

So the Open Source community – the Blue Obelisk – is catching up. I believe that OSCAR is already the best chemical language processing tool, that OPSIN will soon be as good as any commercial name2structure parser and that OSRA will do the same for chemical images.

What this essentially means is that chemoinformatics has become commoditised; and as history has shown us time and again, once that happens, the advantages of open source in terms of aggregated, distributed development kick in. It is proprietary software that does not scale - ironically, given the prevailing wisdom to the contrary - and that therefore always falls behind open source projects once a particular domain has matured.

This is not to say that free software never innovates, as I've discussed elsewhere; simply that in new sectors open source's advantages are less clear than they are in mature ones. Peter's point is that chemoinformatics in particular is ripe for open source to produce better versions of existing tools; and the implication is that as successive areas of science software become similarly mature, so free software offerings will move in and ultimately take over.

Follow me @glynmoody on Twitter or identi.ca.