Showing posts with label peter murray-rust. Show all posts

17 June 2009

The Doctor Who Model of Open Source

I often write of the way in which other domains are learning from open source and its successes. But that's not to say the traffic is all one way: increasingly, the other opens have much to *teach* open source, too.

For example, Peter Murray-Rust is one of the leading exponents of open data and open chemistry, notably through the Blue Obelisk group:


The Internet has brought together a group of chemists/programmers/informaticians who are driven by wanting to do things better, but are frustrated with the Closed systems that chemists currently have to work with. They share a belief in the concepts of Open Data, Open Standards and Open Source (ODOSOS) (but not necessarily Open Access). And they express this in code, data, algorithms, specifications, tutorials, demonstrations, articles and anything that helps get the message across.

Here's an interesting point he raised recently:

How do we sustain Open Source in a distributed world? We are facing this challenge with several of our chemical software creations/packages. People move, institutions change. Open Source does not, of itself, grow and flourish – it needs nurturing. Many packages require a lot of work before they are in a state to be usefully enhanced by the community - “throw it over the wall and it will flourish” does not work.

Many OS projects have clear governance and (at least implicitly) funded management. Examples are Apache, Eclipse, etc. Many others have the “BDFL” - Benevolent Dictator For Life with characters such as R[M]S, Linus, Guido Python, Larry Perl, etc. These command worldwide respect and they have income models which are similar to literary giants. These models don’t (yet?) work for chemistry.

Instead the Blue Obelisk community seems to have evolved a “Doctor Who” model. You’ll recall that every few years something fatal happens to the Doctor and you think he is going to die and there will never be another series. Then he regenerates. The new Doctor has a different personality, a different philosophy (though always on the side of good). It is never clear how long any Doctor will remain unregenerated or who will come after him. And this is a common theme in the Blue Obelisk.

The rest of the post fleshes out this analogy - well worth reading.

Follow me @glynmoody on Twitter or identi.ca.

17 May 2009

Openists of the World, Unite!

As I have observed recently (probably ad nauseam for some readers - apologies, but it needs saying), the openness that lies behind open source, open access and the rest feeds naturally into at least partial solutions for the political malaise affecting many countries, including, notably, the UK.

So it's great to see some of my fellow openists coming to the same conclusions:

I would not normally write about politics on this blog but Non-Brits may not have caught the raw anger of the UK electorate about the betrayal of trust by their elected representatives (members of Parliament). I believe that “web democracy” is now essential for modern government. By web democracy I mean the processes that so many of us have developed in our own work. I am not suggesting that conventional government is replaced by Web processes but that web processes should be used to supplement the process of government and be baked into that process. That is why Net Neutrality matters so much.

Heartening, too, that mainstream media are starting to join the dots, and are realising that the enemies of openness are precisely the ones with something to hide:

An investigation by The Sunday Telegraph has established that backers of a Bill two years ago which aimed to exempt Parliament from the full force of the Freedom of Information Act have benefited from thousands of pounds paid under the second home expenses system.

Openness, everywhere, now.

16 March 2009

Opening Minds about Closed Source

One of the most exciting experiences in blogging is when a post catches fire - metaphorically, of course. Often it happens when you least expect it, as is the case with my rant about Science Commons working with Microsoft, which was thrown off in a fit of pique, without any hope that anybody would pay much attention to it.

Fortunately, it *was* picked up by Bill Hooker, who somehow managed to agree and disagree with me in a long and thoughtful post. That formed a bridge for the idea into the scientific community, where Peter Murray-Rust begged to differ with its thesis.

Given all this healthy scepticism, I was delighted to find that Peter Sefton is not only on my side, but has strengthened my general point by fleshing it out with some details:

Looking at the example here and reading Pablo’s Blog I share Glyn Moody’s concern. They show a chunk of custom XML which gets embedded in a word document. This custom XML is an insidious trick in my opinion as it makes documents non-interoperable. As soon as you use custom XML via Word 2007 you are guaranteeing that information will be lost when you share documents with OpenOffice.org users and potentially users of earlier versions of Word.

He also makes some practical suggestions about how the open world can work with Microsoft:

In conclusion I offer this: I would consider getting our team working with Microsoft (actually I’m actively courting them as they are doing some good work in the eResearch space) but it would be on the basis that:

* The product (eg a document) of the code must be interoperable with open software. In our case this means Word must produce stuff that can be used in and round tripped with OpenOffice.org and with earlier versions, and Mac versions of Microsoft’s products. (This is not as simple as it could be when we have to deal with stuff like Sun refusing to implement import and preservation for data stored in Word fields as used by applications like EndNote.)

The NLM add-in is an odd one here, as on one level it does qualify in that it spits out XML, but the intent is to create Word-only authoring so that rules it out – not that we have been asked to work on that project other than to comment, I am merely using it as an example.

* The code must be open source and as portable as possible. Of course if it is interface code it will only work with Microsoft’s toll-access software but at least others can read the code and re-implement elsewhere. If it’s not interface code then it must be written in a portable language and/or framework.

Great stuff.

Update: Peter has written more on the subject.

14 March 2009

Why We Need Open Data

Despite the good-natured ding-dong he and I are currently engaged in on another matter, Peter Murray-Rust is without doubt one of the key individuals in the open world. He's pretty much the godfather of the term "open data", as he writes:

Open Data has come a long way in the last 2-3 years. In 2006 the term was rarely used - I badgered SPARC and they generously set up a mailing list. I also started a page on Wikipedia in 2006 so it’s 2-and-a-half years old.

The same post gives perhaps the best explanation of why open data is important; it's nominally about open data in science, but its points are valid elsewhere too:

* Science rests on data. Without complete data, science is flawed.

* Many of today's global challenges require scientific data. Climate, Health, Agriculture…

* Scientists are funded to do research and to make the results available to everyone. This includes the data. Funders expect this. So does the world.

* The means of dissemination of data are cheap and universal. There is no technical reason why all the data in all the chemistry research in the world should not be published into the cloud. It’s small compared with movies…

* Data needs cleaning, filtering, repurposing, re-using. The more people who have access to this, the better the data and the better the science.

Open data is still something of a Cinderella in the open world, but as Peter's comments make clear, that's likely to change as more people realise its centrality to the entire open endeavour.

24 January 2009

Seven things people didn't know about me...

...And probably didn't want to. Thanks to that nice Mr Mark Surman, I have been not only tagged but also subjected to fiendishly-clever emotional blackmail in the accompanying email:


I realize this is corny. But corny can be fun. This kind of fun is something I dare you to have.

The rules are:

* Link to your original tagger(s) and list these rules in your post.

* Share seven facts about yourself in the post.

* Tag seven people at the end of your post by leaving their names and the links to their blogs.

* Let them know they’ve been tagged.

Sigh. So, here goes:

1. As a child, I kept frog spawn (still abundant in those far-off days), fascinated by the extraordinary metamorphosis it underwent. Once, among the many froglets that emerged, one had six legs, and two had five (all extra forelimbs).

2. At primary school, I was one of the ugly sisters in “Cinderella”. I still remember the rather fetching pink and lime-green dress that I wore.

3. I spent most of my free time at secondary school playing bridge. Unfortunately, I used the Blue Club system, which, according to Wikipedia, is no longer popular, making it even more of an utter waste of time.

4. I was Senior Wrangler in the 1977 Tripos. Barely anyone knows what that means; even fewer care. 100 years ago, it would have guaranteed me a pampered college fellowship for life. I regard it as a lucky escape.

5. My first post-university job was as a maths supply teacher for 30+ 15-year-olds in Catford, South London, most of whom were larger than me, but rather less interested in mathematics than I was. I lasted two months before escaping to publishing.

6. I was taken off a train at near-gunpoint in Belarus for travelling without a transit visa. At 5 o'clock in the morning. I then had to rush to the immigration office attached to the Grodno border station and get a visa before the waiting train left for Vilnius with all my luggage on board.

7. I am powerless in the presence of honey-roasted cashews. An interesting case of where traditional mathematics breaks down, and 1+1=3.

The rules say I must now pass on this poisoned chalice to others, but unlike Mark I won't add any pressure: please feel free to ignore if you wish, or have already been tagged – I did search, but happily Google is not yet omniscient.

The names below are all key people in the UK world of openness in various ways, and I think it would be interesting to find out more about them. They are (in alphabetical order):

OpenStreetMap's Steve Coast

Open data defender Peter Murray-Rust

Alfresco's John Newton

Sun's Simon Phipps

BT's JP Rangaswami

Boycott Novell's Roy Schestowitz

Open government enthusiast Tom Steinberg

11 May 2008

How Microsoft Uses Open Against Open

To my shame, Peter Murray-Rust put up a reply to my post below in just a few hours, where it had taken me days to answer his original posting. So with this reply to his reply, I'm trying to do better.

Peter includes this disclaimer:

Before diving in I should get a potential conflict of interest out of the way. We are about to receive funding from Microsoft for the OREChem project (see post on Chemistry Repositories). This does not buy an artificial silence on commenting on Microsoft’s practice, any more than if I accept a grant from JISC or EPSRC I will refrain from speaking my mind. Nor do I have to love their products. I currently hate Vista. However I need an MS OS on my machine because it makes it easier to use tools such as LiveMeeting (a system for sharing desktops). I’ve used LiveMeeting once and I liked it. OK, Joe did the driving because he knows his way round better than me, but I can learn it. Not everything MS does is bad and not everything it does is good.

Now, I have not the slightest doubt about Peter's future independence, but I do think it's an interesting comment.

It shows that even such a key defender of openness as Peter finds he "needs an MS OS on my machine because it makes it easier to use tools such as LiveMeeting (a system for sharing desktops)". I presume that Microsoft's money comes without strings, but inevitably its availability will make buying its own software easier. Where a cash-strapped project would cast an interested eye over free alternatives, and be willing to pay the price of grappling with new software, those with enough funding - from Microsoft or elsewhere - may well just opt for the familiar.

This is doubtless happening all over the place in science, which means that many simply forget that there are alternatives to Microsoft's products. Instead - quite understandably - they concentrate on the science. But what this implies is that however open that science may be, however much it pushes forward open access and open data, say, its roots are likely to remain in the arid soil of closed source, and that Microsoft's money has the effect of co-opting supporters of these other kinds of openness in its own battle against the foundational openness of free software.

A Word in Your Ear

A little while back I gave Peter Murray-Rust a hard time for daring to suggest that OOXML might be acceptable for archiving purposes.

Here's his response to that lambasting:


My point is that - at present - we have few alternatives. Authors use Word or LaTeX. We can try to change them - and Peter Sefton (and we) are trying to do this with the ICE system. But realistically we aren’t going to change them any time soon.

My point was that if the authors deposit Word we can do something with it which we cannot do anything with PDF. It may be horrible, but it’s less horrible than PDF. And it exists.

There are two issues here. The first concerns translators between OOXML and ODF. Although in theory that's a good solution, in practice, it's not, because the translators don't work very well. They are essentially a Microsoft fig-leaf so that it can claim using OOXML isn't a barrier to exporting it elsewhere. They probably won't ever work very well because of the proprietary nature of the OOXML format: there's just too much gunk in there ever to convert it cleanly to anything.

The larger question is what needs to be done to convince scientists and others to adopt ODF - or at least a format that can be converted to ODF. I don't have any easy answers. The best thing, obviously, would be for people to start using OpenOffice.org or similar: is that really too much to ask? After all, the thing's free, it's easy to use - what's not to like?

Perhaps we need some concerted campaign within universities to give out free copies of OOo/run short hands-on courses so that people can see this for themselves. Maybe the central problem is that the university world (outside computing, at least) is too addicted to its daily fixes of Windows and Office.

03 May 2008

OOXML? For Pete's Sake, No

Peter Murray-Rust is one of the key figures in the world of open data and open science, and deserves a lot of the credit for making these issues more visible. Here's an interesting post in which he points out that PDF files are not ideal from an archiving viewpoint:


I should make it clear that I am not religiously opposed to PDF, just to the present incarnation of PDF and the mindset that it engenders in publishers, repositarians, and readers. (Authors generally do not use PDF).

He then discusses in detail what the problems are and what solutions might be. Then he drops this clanger:

I’m not asking for XML. I’m asking for either XHTML or Word (or OOXML)

Word? OOXML??? Come on, Peter, you want open formats and you're willing to accept one of the most botched "standards" around, knocked up for purely political reasons, that includes gobs of proprietary elements and is probably impossible for anyone other than Microsoft to implement? *That's* open? I don't think so....

XHTML by all means, and if you want a document format the clear choice is ODF - a tight and widely-implemented standard. Anything but OOXML.

09 March 2008

The World's Leading Anti-Scientific Society

Science is a paradigmatically open endeavour. It proceeds by sharing knowledge freely, allowing others to build on your work. If any domain should display openness in depth, it is science. That seems to have escaped the notice of the American Chemical Society, which pompously declares itself "the world's leading scientific society", as Peter Murray-Rust explains:

CAS identifiers have come to be accepted as a primary identifier system for chemistry - thus caffeine has the CAS number [58-08-2]. This is the only number I can reliably get from CAS without paying (or having my institution or country pay). The number is semantically almost void - it cannot be worked out like an InChI. InChI and CAS serve different purposes - CAS can be related to any substance including mixtures of molecules such as kerosene - InChI is algorithmically derived from the molecular structure and does not apply to mixtures. CAS numbers are frequently used to assert what a substance is and to indicate whether two substances are the same or different. They are commonly used in supplier catalogues and on bottles.

CAS numbers are copyright CAS/ACS who have the legal right to regulate their use - as above. They would make excellent identifiers for the semantic web, except that they are closed. If I want to find out what [67-64-1] is I can only do this by paying CAS - about 6 USD for each lookup (e.g. on STN Easy). This immediately rules it out for any semantic web application which assumes that resolving links is free. Wikipedia tells me that this number corresponds to acetone (nail varnish remover) but they now do not have the freedom to do this. Similarly Pubchem do not use CAS numbers as they have no right to do so. (A number of suppliers and other sources quote CAS numbers, many without explicit permission).

An identifier system for chemistry is extremely valuable (patents, safety, etc.) but can cause great problems when mistakes are made. If compounds are misordered because of mistakes in identifiers serious accidents could occur. An open system of identifiers would be highly valuable in developing the chemical semantic web and increasing quality. The closed and restrictive practices of CAS make it more difficult to create Web 2.0 applications in chemistry.

I do not believe this situation can last. Closed systems on the web cannot survive for many more years unless rigorously enforced by restrictive legal and business processes. The heads of chemistry departments who currently have no concern for informatics in the C21 will retire and a new generation of less conservative chemists will increasingly sweep away the Closed approach. Technology such as robots acting on semantic publications will make human-collected abstracts obsolete.
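To make concrete just how little a CAS number reveals on its own: about the only thing that can be worked out from the number itself is whether its final digit is consistent with the rest. That last digit is a check digit, formed by weighting the other digits 1, 2, 3... from the right and taking the sum modulo 10. Here is a minimal Python sketch of that check, using the two numbers quoted above; the function name `cas_check_digit_ok` is my own invention for illustration, not part of any CAS tool.

```python
import re


def cas_check_digit_ok(cas: str) -> bool:
    """Return True if the final digit matches the CAS check-digit rule."""
    # A CAS number has the shape NNNNNNN-NN-N (first part 2 to 7 digits).
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... counting from the right-hand end.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(match.group(3))


# The two numbers quoted above: caffeine and acetone.
for number in ("58-08-2", "67-64-1"):
    print(number, cas_check_digit_ok(number))
```

Passing this check shows only that a number is well formed. It says nothing about which substance the number names, and that is precisely the lookup that has to be paid for.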

Fortunately, Peter points out that there is a solution:

The use of CAS numbers has been abandoned by organisations such as PubChem for exactly this reason. PubChem now has nearly 20 million substances. It holds records for all compounds that are likely to occur on MSDS. It’s highly respected (although ACS lobbied the US government to limit Pubchem’s activities). It is part of the NIH and now - with the NIH mandate - effectively safe from the ACS. It provides a credible alternative.

We (including Wikipedia) should now switch from using CAS numbers to using PubChem IDs wherever possible. It won’t be a simple transition - certainly we shan’t find 100% overlap. But it will solve all the common substances and therefore 90%+ use of CAS numbers.

We shall need software. We and others are now developing the next generation of chemical informatics software using RDF (Resource Description Framework). RDF allows the description of ambiguities and ontologies. This will allow chemical information to be gleaned directly from authoritative sources using robots. (Of course some of the authorities are currently conservative and do not allow access to their material because of restrictive copyright and licences, but that is starting to change, even in chemistry). As information becomes more open, the CAS system will be increasingly isolated in a world of chemical commerce.

Clearly, it's time to kill off this pernicious closed CAS system, which is damaging science, by boycotting it entirely. And while we're at it, I suggest we might as well get rid of the world's leading *anti*-scientific society too. (Via Open Access News.)

Update: There seems to be some movement as far as using CAS numbers on Wikipedia, but I can't tell whether that's just a one-off, highly limited solution, or part of a larger move to make ACS knowledge freely available to all such open projects. We shall see.

03 December 2007

A Question of Open Chemistry

I've written about open science and open notebook science before, but here's an excellent round-up of open chemistry:

The next generation of professional chemists are far more likely to be in tune with web-based chemistry, treating blogs and social networking sites as professional tools in the same manner as email. For Open Chemistry advocates, the inevitable passage of time may be enough to usher in their revolution.

(Via Open Access News.)

23 November 2007

Openness: Purity of Essence

I wrote a piece for Linux Journal recently warning that Microsoft was beginning to hijack the meaning of the phrase "open source". But the problem is much bigger than this: the other opens face similar pressures, as Peter Murray-Rust notes.

In some ways it's even more serious for fledgling movements like open access and open data: there, the real meaning has barely been established, and so defending it is harder than for open source, which has had a clear definition for some time. Given the importance of labels, this is a matter that needs to be addressed with some urgency before "open access" and "open data" become little more than bland marketing terms.

14 November 2007

Unlocking the Value of Open Innovation

It's a truism that there are more clever people out there than in here, wherever "here" may be. So it makes sense to try to tap into that cleverness - which is precisely what open source and cognate movements attempt to do. Now it looks like business is slowly getting the hang of this:

Barrick’s Unlock the Value program is a unique opportunity for scientific problem solvers. We invite proposals for an economically viable way to recover silver from silica-encapsulated ore. For proposals judged to have merit, Barrick will:

* Fund your research
* Pay you a consulting fee
* Provide resources and expertise
* Help you develop and test your idea

For a method or technology that is successfully implemented, Barrick will pay a performance bonus of $10,000,000.

(Via Peter Murray-Rust.)

13 October 2007

FROG Hops into the Open Source Commons

FROG - FRee Online druG 3D conformation generator - is not a program I was aware of, but it sounds pretty cool:

Frog is an on-line service aimed at generating 3D conformations for drug-like compounds starting from their 1D or 2D descriptions. Given the atomic constitution of the molecules and connectivity information, Frog can identify the different unambiguous isomers corresponding to each compound, and generate single or multiple low-to-medium energy 3D conformations, using an assembly process that does not presently consider ring flexibility. Tests show that Frog is able to generate bioactive conformations close to those observed in crystallographic complexes.

Cooler still, its code is being released under the GPL:

On behalf of the OpenBabel project, I am pleased to announce that Dr. Bruno Villoutreix (INSERM, University of Paris 5) and Dr. Pierre Tufféry (INSERM, University of Paris 7) have generously donated their code to OpenBabel. This code will be incorporated into OpenBabel under the GPL in the coming months, making fast and accurate SMILES-to-3D conformer generation available to the open source community for the first time.

The open source commons just got richer. (Via Peter Murray-Rust.)

04 September 2007

The Right to Roam and the Right to Read

Peter Murray-Rust has been coming out with some cracking posts recently. First, there was the charming story of OUP demanding that he pay $48 to use his own paper, whose copyright he holds, and which is CC-licensed, for teaching purposes.

Now he has a wonderful post contrasting the legally-enshrined right of public access to the wilderness to the lack of a right of public access to academic papers.

18 July 2007

More Parallel Universes

Some while back I wrote a piece called "Parallel Universes" looking at the surprising similarities between the world of open source and open access. So I was interested to see that there's trouble at t'mill over the use and misuse of the term "open access":

I don't know and I don't care what [Nature editor] Maxine means by "open" or "free". I care what the BBB [Budapest-Bethesda-Berlin] Declarations mean. Peter is not defining terms however he likes; he is working with published, widely accepted definitions. He is well within his rights to expect that other people will indeed use the same definitions: that is, after all, the point of having developed and published them. Nature does NOT have "many open access projects and products", it has one (barely) OA journal and the excellent Precedings, together with a number of commendable free-to-read initiatives (blogs, Nature Network, the various free-to-read web special collections, etc). "Open Access" is not a fuzzy buzzword that Maxine is free to define as she sees fit, and if she is going to start abusing it as marketing for Nature then she most certainly does need telling off.

Which is all rather similar to a discussion taking place in the computer world about who has the right to call themselves "open source".

12 September 2006

Open Data: Past, Present and Future

Peter Murray-Rust has an interesting post on the concept of open data, its (short) history and its present status, with some good links. As he notes:

There seem to be several related threads:

* scientific data deemed to belong to the commons (e.g. the human genome)
* infrastructural data essential for scientific endeavour (e.g. GIS)
* data published in scientific articles which are factual and therefore not copyrightable
* data as opposed to software and therefore not covered by OS licenses and potentially capable of being misappropriated. (this is a very general idea)

He points out that "the current usages are sufficiently close that we should try to bring them together", a move that would help open data's future greatly.