Showing posts with label xml. Show all posts

03 April 2008

British Library = National Disgrace

I've noted before that there's something rotten at the heart of the British Library, which insists on locking down knowledge in Microsoft's proprietary formats. Now NoOOXML starts to pull all the threads together:


the company Griffin Brown, of which the BRM convenor Alex Brown is the director, sent out a press release 13 March 08 celebrating the 10th anniversary of XML:

Recent moves by Microsoft to standardise its Office products around XML file formats merely confirms that most valuable business data in the future will be stored in XML. … Alex Brown is convenor of the ISO/IEC DIS 29500 Ballot Resolution Process, and has recently been elected to the panel to advise the British Library on how to handle digital submission of journal articles.

What's the betting those digital submissions end up in OOXML?
(Via Boycott Novell.)

11 February 2008

XML People: Tim B on TimBL

Here's a rather wonderful document by Tim Bray, one of the key people in the XML world, and someone who evidently knows everyone else there:


XML is ten years old today. It feels like yesterday, or a lifetime. I wrote this that year (1998). It’s really long.

It's also really good for its witty pen portraits of XML notables. Here's a sample: Tim B on TimBL:

TimBL is thin, pale, and twitchy, a well-bred British baby-boomer who circumlocutes and temporizes and gets to the point slowly. Englishly, he deplores confrontation and can find a way to paint any blood-feud in the colours of unfortunate misunderstanding. His publications suggest strong idealism, an overriding vision of the future of information space. His detractors say he’s a good second-rate programmer who was at the right place at the right time and got lucky. The McArthur foundation says he’s a genius. I can’t figure out what he’s getting at half the time, or why he does things, but I’ve known a couple of real geniuses and that’s not necessarily a symptom.

However, I take exception to that idea of TimBL being "a good second-rate programmer who was at the right place at the right time and got lucky." Not so much because it's insulting Sir Tim, but because I think it misses the point entirely. Like RMS's, TimBL's greatest contribution is not actually technical: it is ethical.

Had he not put his code into the public domain - after briefly flirting with the idea of licensing it under the GNU GPL - the Web would not have become the greatest invention of the late 20th and early 21st centuries. It is for his inspired altruism that we salute Sir Tim - not for anything so trivial as a markup language.

02 January 2008

Remembrance of Things Past

One of the key issues in the battle between ODF and OOXML is access to documents over long timeframes. It's not just a matter of which format is better now, but which will be better in a hundred years' time (assuming all the computers haven't melted by then).

Against that background, the following is interesting:


After you install Office 2003 SP3, some Microsoft Office Excel 2003, Microsoft Office PowerPoint 2003, Microsoft Office Word 2003, and Corel Draw (.cdr) file formats are blocked. By default, these file formats are blocked because they are less secure. They may pose a risk to you.

Leaving aside the fact that Microsoft is trying to protect you from its own earlier formats, there's an important issue here. Most people will blithely apply this and other Service Packs, trusting in the great god Bill to do the right thing. And then one day, they will need to access some old - but crucially important - file saved in the earlier format. All the previous versions of Microsoft Office may well have been discarded: then what?

Well, you could always edit the registry, bearing in mind:

Warning Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall the operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.

Important These steps may increase your security risk. These steps may also make the computer or the network more vulnerable to attack by malicious users or by malicious software such as viruses. We recommend the process that this article describes to enable programs to operate as they are designed to or to implement specific program capabilities. Before you make these changes, we recommend that you evaluate the risks that are associated with implementing this process in your particular environment. If you decide to implement this process, take any appropriate additional steps to help protect the system. We recommend that you use this process only if you really require this process.

Er, maybe not.

There's no reason to suppose that things will be any different for OOXML, which may - who knows? - turn out to be just as dangerous as those risky old Office formats. And so there you will be, with an XML file legible only in part, with an admixture of effectively random 1s and 0s, a vague memory of its original form and contents, and a deep sadness in your heart.

08 October 2007

ODF - Oh My Word

In the red corner:


So what about the OpenDocument Foundation? We fall into the middle area of trying to perfect the conversion to XML regardless of the fact that our two groups have the world caught between a rock and a hard place.

And in the blue corner:

The OpenDocument Foundation seems to try to clothe themselves in the mantle of the open source community and pontificate on how the big bad vendors treat interoperability. But are they speaking as a non-profit or as a vendor? Take their DaVinci plugin, for example. Where is the source code? Why isn't this open source? Are we to follow the Foundation's claim of 100% interoperability, based on blind faith, without seeing some proof in the form of working code? I've been working on document conversions and document file formats of one kind or another for almost 20 years. I've never seen 100% fidelity conversions of anything but trivial formats. Extraordinary claims require extraordinary evidence. But we have nothing here, just white papers from two guys without a garage.

Ouch. Who would have thought document standards could be such fun?

09 August 2007

Welcome Back, HTML

Younger readers of this blog probably don't remember the golden cyber-age known as Dotcom 1.0, but one of its characteristics was the constant upgrading of the basic HTML specification. And then, in 1999, at HTML4, it stopped, as everyone got excited about XML (remember XML?).

It's been a long time coming, but at last we have HTML5, AKA Web Applications 1.0. Here's a good intro to the subject:

Development of Hypertext Markup Language (HTML) stopped in 1999 with HTML 4. The World Wide Web Consortium (W3C) focused its efforts on changing the underlying syntax of HTML from Standard Generalized Markup Language (SGML) to Extensible Markup Language (XML), as well as completely new markup languages like Scalable Vector Graphics (SVG), XForms, and MathML. Browser vendors focused on browser features like tabs and Rich Site Summary (RSS) readers. Web designers started learning Cascading Style Sheets (CSS) and the JavaScript™ language to build their own applications on top of the existing frameworks using Asynchronous JavaScript + XML (Ajax). But HTML itself grew hardly at all in the next eight years.

Recently, the beast came back to life. Three major browser vendors—Apple, Opera, and the Mozilla Foundation—came together as the Web Hypertext Application Technology Working Group (WhatWG) to develop an updated and upgraded version of classic HTML. More recently, the W3C took note of these developments and started its own next-generation HTML effort with many of the same members. Eventually, the two efforts will likely be merged. Although many details remain to be argued over, the outlines of the next version of HTML are becoming clear.

This new version of HTML—usually called HTML 5, although it also goes under the name Web Applications 1.0—would be instantly recognizable to a Web designer frozen in ice in 1999 and thawed today.

Welcome back, HTML, we've missed you.

24 April 2007

Ars Nova: The Art of Misrepresentation

A nice summary by Rob Weir of Microsoft's increasingly desperate campaign to undermine ODF in every way possible through artful and persistent misrepresentation of the facts. It begins with a real killer opening par:

Tim Anderson has an interesting article up on his ITWriting blog, "Microsoft’s Jean Paoli on the XML document debate". Of course, I treat anything Jean Paoli says on XML with such attention as I usually reserve for listening to the isorhythmic motets of Philippe de Vitry. Like de Vitry, Paoli can be understood on several different levels: What is he saying? And what is he really saying.

16 March 2007

But Is It Cricket?

This raises some interesting issues about what exactly copyright covers:

A cricketing website has found what it hopes is an inventive way to bypass copyright laws to show users action from the Cricket World Cup.

Despite the fact that Sky Television has the exclusive rights to broadcast the live action from the West Indies, Cricinfo.com is using computer animation to provide ball-by-ball coverage to non-Sky viewers.

...

Wisden said it had carefully consulted lawyers before going ahead with the simulations in this week's World Cup. "Cricinfo 3D is based on public domain information gathered by our scorers who record a number of factors such as where the ball pitched, the type of shot played and where the ball goes in the field," said a Wisden statement. "That data is then fed as an xml to anyone who has Cricinfo 3D running on their desktops and the software generates an animation based on this data."

The issue is whether the information about the match is in the public domain, and can thus be fed into a simulation, or whether the rights that Sky has bought cover that information in some way.

I'd say not, because you generally can't copyright (or patent) pure information: for intellectual monopolies to be granted, you need to go beyond the facts to add artistic expression in the case of copyright, or non-obvious inventive steps in the case of patents. Cricinfo 3D seems to be a new artistic interpretation of pure data, independent of Sky's own "artistic" images of the game (i.e., the camera shots they take).

Not that intellectual monopolies are known for their strict adherence to the laws of logic....
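To make the "pure data" point concrete, here is a minimal sketch of what a ball-by-ball record might look like once it has been parsed. The element and attribute names are my own invention, since the Wisden statement does not describe the actual Cricinfo 3D format; the fields simply follow the factors it lists (where the ball pitched, the shot played, where the ball went).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical ball-by-ball record: these tag and attribute names are
# assumptions for illustration, not the real Cricinfo 3D schema.
SAMPLE = """<over number="12">
  <ball n="1" pitch="good-length" shot="cover-drive" field="extra-cover" runs="4"/>
  <ball n="2" pitch="short" shot="pull" field="square-leg" runs="1"/>
  <ball n="3" pitch="full" shot="defence" field="mid-on" runs="0"/>
</over>"""


@dataclass
class Ball:
    number: int
    pitch: str   # where the ball pitched
    shot: str    # type of shot played
    field: str   # where the ball went in the field
    runs: int


def parse_over(xml_text):
    """Turn one <over> element into a list of Ball records."""
    over = ET.fromstring(xml_text)
    return [
        Ball(
            number=int(b.get("n")),
            pitch=b.get("pitch"),
            shot=b.get("shot"),
            field=b.get("field"),
            runs=int(b.get("runs")),
        )
        for b in over.findall("ball")
    ]


def describe(ball):
    """Render one record as text; an animation would consume the same fields."""
    return (
        f"Ball {ball.number}: {ball.shot} off a {ball.pitch} delivery "
        f"to {ball.field}, {ball.runs} run(s)"
    )


if __name__ == "__main__":
    for ball in parse_over(SAMPLE):
        print(describe(ball))
```

Nothing in such a record is a camera image or a commentator's words: it is a list of facts about each delivery, and anything drawn from it is generated afresh by the software on the user's machine.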

08 February 2007

Pipe Dream: Re-wiring the Net

The online world is awash with XML feeds. The great thing about XML is that you can grab it and do stuff with it very easily, because it's basically a structured text file. For example, you can feed one XML stream into another, combine them, and keep on piping them around. A bit like Unix pipes.

Hey, now that's an idea:

Pipes is a hosted service that lets you remix feeds and create new data mashups in a visual programming environment. The name of the service pays tribute to Unix pipes, which let programmers do astonishingly clever things by making it easy to chain simple utilities together on the command line.

What's particularly cool about this new service is the graphical approach, which looks a lot like programming flowcharts. The available pipes are rather limited at the moment - this is still very new - but it's not hard to imagine some very rich stuff coming out of this. Bravo Yahoo. (Via GigaOM.)
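For anyone who wants a feel for the piping idea without a visual editor, here is a rough sketch in Python using only the standard library. It has nothing to do with how Yahoo's service is actually built; the two inline feeds, the function names and the keyword filter are all made up for illustration. Each stage takes a stream of feed items and hands a stream on to the next, much as Unix utilities do on the command line.

```python
import xml.etree.ElementTree as ET
from itertools import chain

# Two tiny RSS documents standing in for real feeds (invented content).
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>ODF news</title><link>http://example.org/a1</link></item>
<item><title>Python tips</title><link>http://example.org/a2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>More ODF coverage</title><link>http://example.org/b1</link></item>
</channel></rss>"""


def items(feed_xml):
    """Source stage: yield each <item> element of an RSS document."""
    yield from ET.fromstring(feed_xml).iter("item")


def keep_matching(item_stream, keyword):
    """Filter stage: pass on items whose title mentions the keyword."""
    for item in item_stream:
        if keyword.lower() in item.findtext("title", default="").lower():
            yield item


def to_feed(item_stream, title):
    """Sink stage: wrap whatever items arrive in a new RSS document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    channel.extend(list(item_stream))
    return ET.tostring(rss, encoding="unicode")


def merged_odf_feed():
    """Combine both feeds, keep the ODF items, and emit one new feed."""
    stream = chain(items(FEED_A), items(FEED_B))
    return to_feed(keep_matching(stream, "odf"), "ODF mashup")


if __name__ == "__main__":
    print(merged_odf_feed())
```

Because the result is itself a feed, it could be fed straight into another pipeline of the same kind, which is the whole attraction.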

30 January 2007

Behind and Beyond Halloween

The publication of the first Halloween memo in 1998 was a pivotal moment in the history of free software. For the first time, it was clear that internally Microsoft was worried by this new threat, despite its outward-facing bravado and rhetoric.

Of course, there was no confirmation from the company that the memos were genuine, so there was always a theoretical possibility that they were faked in some way, although the internal evidence seemed overwhelming. But now, Groklaw reports, we have official proof of their genuine nature. The posting also offers an interesting meditation on how all this feeds into Microsoft's current attempts to "go legit" with the ECMA standardisation of its Office XML formats.

28 December 2006

From ODF to UOF and Back Again

Since both the ODF and UOF office formats are based on XML, it isn't (theoretically) hard to move between them. Nonetheless, it's good to know that someone has actually put together code to do exactly that:

Peking University recently released a program to convert office documents between OpenDocument Format and the Specification for the Chinese office file format based on XML (UOF for short). Both standards are XML office document standards, UOF being a "National Standard of the People's Republic of China". The converter, which took nearly a year to complete, enables users to convert text, spreadsheet and presentation documents between ODF and UOF.
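Why "theoretically" not hard? Because when two formats are both XML, the simplest kind of conversion is just a walk over the document tree, renaming elements as you go. The sketch below shows that idea and nothing more: the tag names on both sides are invented, and are not the real ODF or UOF vocabularies, nor is this how the Peking University converter works.

```python
import xml.etree.ElementTree as ET

# Made-up tag names for illustration only; neither column is the real
# ODF or UOF vocabulary.
A_TO_B = {"document": "doc", "paragraph": "para", "span": "run"}
B_TO_A = {b: a for a, b in A_TO_B.items()}

SOURCE = (
    '<document><paragraph>Hello <span style="bold">world</span>!'
    "</paragraph></document>"
)


def convert(element, mapping):
    """Return a copy of element with every tag renamed through mapping.

    Tags missing from the mapping are copied unchanged; attributes and
    text are carried across as they are.
    """
    out = ET.Element(mapping.get(element.tag, element.tag), dict(element.attrib))
    out.text, out.tail = element.text, element.tail
    for child in element:
        out.append(convert(child, mapping))
    return out


def round_trip(xml_text):
    """Convert A -> B -> A and return both serialised documents."""
    there = convert(ET.fromstring(xml_text), A_TO_B)
    back = convert(there, B_TO_A)
    return (
        ET.tostring(there, encoding="unicode"),
        ET.tostring(back, encoding="unicode"),
    )


if __name__ == "__main__":
    there, back = round_trip(SOURCE)
    print(there)
    print(back == SOURCE)
```

The toy version round-trips cleanly only because every element on one side has an exact twin on the other. Real office formats presumably do not line up so neatly, which may be part of why the actual converter took nearly a year to complete.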

22 December 2006

XXX for XML on its Xth Birthday

Back in the good old Web 1.0 days, XML was really hot. Here's a useful reminder that (a) XML is 10 years old (gosh, doesn't time fly when you're having fun?) and (b) it's still hot.

Last month marked ten years since the World Wide Web Consortium (W3C) Standard Generalized Markup Language (SGML) on the Web Editorial Review Board publicly unveiled the first draft of Extensible Markup Language (XML) 1.0 at the SGML 96 conference. In November 1996, in the same hotel, Tim Bray threw the printed 27-page XML spec into the audience from the stage, from whence it fluttered lightly down; then, he said, "If that had been the SGML spec, it would have taken out the first three rows." The point was made. Although SGML remains in production to this day, as a couple of sessions reminded attendees, the markup community rapidly moved on to XML and never looked back.

Two areas stand out in this report on the conference: XQuery and Darwin Information Typing Architecture (DITA). Here's to the next X.

12 September 2006

Greetings, OpenDocument XML.org

OASIS may not be the grooviest organisation, but it's certainly helped ODF achieve respectability remarkably quickly. Now it's set up something called OpenDocument XML.org:

This is a community-driven site, and the public is encouraged to contribute content. Use this site to:

* Learn. Knowledge Base pages provide reliable background information on OpenDocument.
* Share. OpenDocument Today serves as a community bulletin board and directory where readers post news, ideas, opinions, and recommendations.
* Collaborate. Wiki pages let users work with others online and add new pages to the site.

14 July 2006

Some Microsofties See the OSS Light

I don't know whether this is big enough to call a trend yet, but it's striking that several ex-Microsofties are setting up new companies based around open source. The latest one is Ohloh, whose site explains:

We're mapping the open source world by collecting objective information on open source software. Search our site for the most current software metrics and project information on open source software projects.

eWeek has some details on the ex-Microsoft people involved:

Collison and Jason Allen, a former development manager for XML Web Services at Microsoft and now vice president of engineering at Ohloh, co-founded the new company. Other former Microsoft executives involved in the startup include Paul Maritz, who served as a member of the executive committee and manager of the overall Microsoft company from 1986 to 2000. Maritz is an investor in the company, along with Pradeep Singh, who spent nine years at Microsoft in various management positions and left in 1994 to found Aditi Technologies, an Indian outsourcing company, Collison said.

I think one of the reasons for this move from the dark side can be found in another quotation from the same story:

"unlike 1999 one can do a startup on very thin capital, and that is the way we are going about it," Collison said. "One would have to be insane these days to take a traditional Series A round [of venture capital funding] with the open-source software and outsourcing opportunities that are out there."

In other words, it is the open source infrastructure that makes low-cost startups possible; and once you start using open source yourself, you begin to find that it's rather good, and realise that potential customers might think so too....

16 May 2006

Is the Tide Turning for OpenDocument Format?

Hm, what's this: an analyst starting to say downright nice things about ODF? From the article by Ingrid Marson:

There is a 70 percent probability that ISO will not approve multiple XML document formats [i.e., Microsoft's rival to ODF], according to a research note published by Gartner last week. It also predicted, with the same probability, that "by 2010, ODF (OpenDocument Format) document exchange will be required by 50 percent of government and 20 percent of commercial organizations."

Cynical old dog that I am, these probabilities look a little rosy to me. Nonetheless, what is astonishing is not the numbers themselves, but that Gartner - never one to stick its neck out on open source - made the prediction. Maybe the tide is turning?

Update 1: Hardly a surprise to learn that IBM will be supporting ODF in Lotus Notes, but nonetheless welcome news, since it can only add to the momentum building behind the new standard.

Update 2: The Gartner document can be found here.

Update 3: And now KDE has joined the ODF Alliance.

15 March 2006

Microsoft Goes (a Bit More) Open Source

Many people were amazed back in 2004 when Microsoft released its first open source software, Windows Installer XML (WiX). But this was only the first step in a long journey towards openness that Microsoft is making - and must make - for some time to come.

It must make it because the traditional way of writing software simply doesn't work for the ever-more complex, ever-more delayed projects that Microsoft is engaged upon: Brooks' Law, which states that "Adding manpower to a late software project makes it later," will see to this if nothing else does.

Microsoft itself has finally recognised this. According to another fine story from Mary Jo Foley, who frequently seems to know more about what's happening in the company than Bill Gates does:

Beta testing has been the cornerstone of the software development process for Microsoft and most other commercial software makers for as long as they've been writing software. But if certain powers-that-be in Redmond have their way, betas may soon be a thing of the past for Microsoft, its partners and its customers.

The alternative is to adopt a more fluid approach that is a commonplace in the open source world:

Open source turned the traditional software development paradigm on its head. In the open source world, testers receive frequent builds of products under development. Their recommendations and suggestions typically find their way more quickly into developing products. And the developer community is considered as important to writing quality code as are the "experts" shepherding the process.

One approach to mitigating the effects of Brooks' Law is to change the fashion in which the program is tested. Instead of doing this in a formal way with a few official betas - which tend to slow down the development process - the open source method allows users to make comments earlier and more frequently on multiple builds as they are created, and without hindering the day-to-day working of developers, who are no longer held hostage by artificial beta deadlines that become ends in themselves rather than means.

23 February 2006

The Blogification of the Cyber Union

I suppose it was inevitable that Google would go from being regarded as quite the dog's danglies to being written off as a real dog's breakfast, but I think that people are rather missing the point of the latest service, Google Page Creator.

Despite what many think, Google is not about ultra-cool, Ajaxic, Javascripty, XMLifluous Web 2.0 mashups: the company just wants to make it as easy as possible for people to do things online. Because the easier it is, the more people will turn to Google to do these things - and the more the advertising revenue will follow.

Google's search engine is a case in point, and Blogger is another. As Blogger's home page explains, you can:

Create a blog in 3 easy steps: (1) Create an account (2) Name your blog (3) Choose a template

and then start typing.

Google Page Creator is just the same - you don't even have to choose a name, you just start typing into the Web page template. In other words, it has brought the blog's ease of use to the creation of Web sites.

This blogification of the Internet is a by-product of the extraordinary recent rise of blogs. As we know, new blogs are popping up every second (and old ones popping their clogs only slightly more slowly). This means that for many people, the blog is the new face of the Web. There is a certain poetic justice in this, since the original WorldWideWeb created by Tim Berners-Lee was a browser-editor, not simply a read-only application.

For many Net users, then, the grammar of the blog - the way you move round it and interact with its content - is replacing the older grammar of traditional Web pages. These still exist, but they are being shadowed and complemented by a new set of Web 2.0 pages - the blogs that are being bolted on by sites everywhere. They function as a kind of gloss explaining the old, rather incomprehensible language of Web 1.0 to the inhabitants of the brave new blogosphere.

Even books are being blogified. For example, Go It Alone!, by Bruce Judson, is freely available online, and supported by Google Ads alongside the text (like a blog) that is broken up into small post-like chunks. The only thing missing is the ability to leave comments, and I'm sure that future blogified books (bloks? blooks?) will offer this and many other blog-standard features.

Update: Seems that it's "blook" - and there's even a "Blooker Prize" - about which, more anon.

13 February 2006

XML Made Extravagant and Extraordinary

One of the most interesting areas in the world of open standards is the OpenDocument format, which promises to do to Microsoft Office what GNU/Linux is doing to Windows Server. I'm on various mailing lists related to this, and on one of them, from the standards body OASIS, a press release turned up in my inbox today. It proudly informed me that "its members have approved the Election Markup Language (EML) version 1.0 as an OASIS Standard, a status that signifies the highest level of ratification."

The OASIS press release told me that "EML provides a high-level overview of the processes within an electronic voting system and XML schemas for the various data interchange points between the e-voting processes," but naturally I wanted more than this dry description. So I went off to find out more. And the place I turned to was one of the most extraordinary sites on the Internet: the Cover Pages (hosted, in fact, by the self-same OASIS).

This, basically, is the fount of all wisdom for XML standards. And since XML lies at the heart of open data (and OpenDocument), this makes the Cover Pages one of the central sites for the open world. Naturally, it had all the details on EML. And here to whet your appetite are a few more of the XML Applications listed:

Weather Markup Language
Intrusion Detection Message Exchange Format
Historical Event Markup and Linking
Open Philanthropy Exchange
Green Building XML
Robotic Markup Language
Meaning Definition Language

The only question I have is how one man - since the Cover Pages seem to be the work of Robin Cover - can possibly stay on top of what seems to be all human life, neatly expressed as an XML application. Gaze, wonder and be grateful.

03 February 2006

Open Source's Best-Kept Secret

Ajax is short for Asynchronous Javascript + XML; it enables a Web page to be changed in the browser on the fly, without needing to reload the whole page from the server. This leads to far faster response times, and is behind many of the most interesting developments on the Web today; Gmail is perhaps the most famous example. Essentially it turns the browser into a lightweight platform able to run small apps independently of the operating system (now where have we heard that before?).

The news of an Open Ajax project that will simplify the creation of such sites is therefore welcome. However, what is most interesting about the announcement is not the luminaries who are lining up behind it - IBM, Oracle, Red Hat and Yahoo amongst others - but the fact that it is yet another Eclipse project.

To which most people would probably say, Who? For Eclipse is open source's best-kept secret. It stands in the same relation to Microsoft's Visual Studio development tools as GNU/Linux does to Windows, and OpenOffice.org to Microsoft Office. Where these address respectively the system software and office suite sectors, Eclipse is aimed at developers. It is another example of IBM's largesse in the wake of its Damascene conversion to open source: the project was created when the company released a large dollop of code under the Eclipse Public License.

What's interesting is how Eclipse has followed a very similar trajectory to GNU/Linux: at first it was ignored by software companies, who preferred to stick with their own proprietary rivals to the Microsoft juggernaut. Later, though, they realised that divided they would certainly fall, and so united around a common open standard. The list of "Strategic Members" and "Add-in Providers" reads like a Who's Who of the world's top software companies (bar one).

This illustrates another huge - and unique - strength of open source: the fact that it represents neutral ground that even rival companies can agree to support together. The mutual benefit derived from doing so outweighs any issues of working with traditional enemies.

Even though Eclipse is relatively little known at the moment, at least in the wider world, it is not a particularly bold prediction to see it as becoming the most serious rival to Microsoft's Visual Studio, and the third member of the open source trinity that also includes GNU/Linux and OpenOffice.org.