open...: algorithms

03 May 2011

Do the Maths

Long-time readers of this blog will know that I like to point out that software patents shouldn't be allowed because (among other reasons) software routines are just algorithms, and algorithms are just maths, which is pure knowledge. Well, a splendid chap has gone much further than my vague handwaving, and *shown* this explicitly:

Google has just been ordered to pay $5M for infringing patent 5,893,120 (hereafter "Patent 120"). This patent covers a very simple data structure and the algorithms for manipulating it. In fact much of the text of the patent is a pseudo-code implementation in a Pascal-like language. So I thought I would provide a practical demonstration of what has, until now, been a theoretical proposition; the reduction of a software patent to set of mathematical formulae.

...

Of course a judge isn't going to know the Lambda Calculus from a lump of rock, but that is what expert witnesses are for. Get a professor of mathematics from an internationally recognised university to testify that these are formulae in the Lambda Calculus, and that the Lambda Calculus is part of mathematics, and you have a sound legal proof. The only thing the patent holders could do is find another professor to testify differently.
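
To give a flavour of what "a data structure as formulae in the Lambda Calculus" can look like, here is a minimal sketch in Python. It is not the reduction from the post quoted above (that part is elided here), and the names `PAIR`, `FIRST` and `SECOND` are my own illustrative choices. It shows only the standard Church encoding of a pair, where the "data structure" is nothing more than nested functions:

```python
# Church-encoded pair: a two-field record built purely from functions.
# PAIR, FIRST and SECOND are illustrative names, not taken from any patent or post.
PAIR = lambda a: lambda b: lambda select: select(a)(b)
FIRST = lambda p: p(lambda a: lambda b: a)    # selector that keeps the first field
SECOND = lambda p: p(lambda a: lambda b: b)   # selector that keeps the second field

# A record, and a tiny linked list made by nesting pairs.
record = PAIR("key")("value")
chain = PAIR(1)(PAIR(2)(3))

print(FIRST(record), SECOND(record))   # the two fields of the record
print(FIRST(SECOND(chain)))            # second element of the nested chain
```

Nothing here goes beyond function application, which is the sense in which such structures can be written down as mathematical formulae.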

Of course, that doesn't stop the lawyers from trying to wriggle out by saying that the patent is for the *application* of maths, and therefore is perfectly legitimate, because it leaves the "knowledge" untouched.

But what this conveniently overlooks is that such patents block anyone else from using that maths in the given field (and knowing lawyers, probably in other fields, too). That effectively turns knowledge into an abstract, useless, glass bead game.

If knowledge is to have any relevance in the real world, it must be applicable there, and not just disembodied and theoretical. Thus these software patents - even if "only" on the application of maths - remain monopolies on knowledge itself; and that way lies madness.

Follow me @glynmoody on Twitter or identi.ca.

25 April 2011

Do Creatorless Creations Deserve Copyright?

Copyright has its convenient myths. The principal one is that copyright is intellectual *property*, which taps into our natural tendency to support tangible property. The other, more subtle, is that copyright is necessary to fan the flame of creativity.

In fact copyright inheres in just about anything in fixed form, however banal and trivial - and not just in sonnets and symphonies. But even for these hopeless, quotidian artefacts, there might be some logic to offering the incentive of copyright in the hope that by accident an occasional masterpiece is produced as a result.

But what about this?

This month, Wolfram Alpha’s WolframTones, modestly subtitled “A New Kind Of Music.” (Yes, that would be the same breathtaking humility that led them to originally price the Wolfram Alpha app at a hilarious $50. Fortunately, they subsequently bought a clue.)

It is pretty cool, in a geeky sort of way: music generated by fractally complex cellular automata, in the style of your choice—classical, dance, rock/pop, hip-hop, etcetera. Every composition is unique, and can be downloaded as a ringtone.

That's interesting, but the real kicker is the following:

They lay claim to the copyright on all the generated music, mind you, raising the interesting question of what counts as “fair use”

But this isn't just about "fair use"; it goes to the heart of what exactly we mean by creativity. Why should something produced algorithmically be regarded as creative? If there is any creativity, it's at the level of programming - and programs are already covered by copyright - so why is another layer of protection needed?

Nor is this a unique case, as a recent story of a "robot journalist" writing news stories indicates.

Copyright is designed to encourage creativity; but if output is produced algorithmically, there is no need to provide any incentive, since machines cannot (yet) respond to such things, and the creation of the program that produces the output is already rewarded by copyright in its lines of code. So surely, by logic, such creatorless creations do not need copyright?

08 July 2010

Free Software Coder Bullied over Algorithm

As long-suffering readers of this blog will know, one of the many reasons I am against software patents is that software consists of algorithms, and algorithms are just maths, so a software patent is a patent on knowledge - the purest knowledge there is (a mathematician writes).

Sometimes defenders of software patents deny that software is just algorithms (don't ask me how, but some do). So I was particularly interested to read about this poor hacker being contacted over - you guessed it - algorithms, pure and simple:

Landmark Digital Services owns the patents that cover the algorithm used as the basis for your recently posted “Creating Shazam In Java”. While it is not Landmark’s intention to alienate those in the Open Source and Music Information Retrieval community, Landmark must request that you do not ship, deploy or post the code presented in your post. Landmark also requests that in the future you do not ship, deploy or post any portions or versions of this code in its current state or in any modified state.

As you can see, there is no way of disguising the fact that this claims to be a patent on an *algorithm* - that is, on maths, which is knowledge and therefore unpatentable.

But it gets worse. As the poor chap points out:

I've written some code (100% my own) and implemented my own methods for matching music. There are some key differences with the algorithm Shazam uses.

That is, he didn't copy the code, and it's not even the same approach.

But wait, there's more.

As he notes:

Why does Landmark Digital Services think they hold a patent for the concepts used in my code? Even if my code works pretty different from the Shazam code (from which the patents came).

What they describe in the patent is a system which:
1. Make a series of fingerprints of a media file and/or media sample
(such as audio, but could also be text, video, multimedia, etc)
2. Have a database/hashtable of fingerprints as lookup
3. Compare the set of hashtable hits using their moment in time it happened

This is very vague, basically the only innovative idea is matching the found fingerprints linearly in time. Because the first two steps describe how a hashtable works and creating a hash works. These concepts are not new nor innovative.
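
To make the three steps listed above concrete, here is a minimal sketch in Python. It is neither Landmark's patented method nor the blogger's Java code: the fingerprints are toy strings supplied by hand (real systems derive them from a spectrogram), and the names `build_index` and `best_match` are hypothetical. What it shows is how little is involved: a hash table lookup, plus a tally of how often a track and a sample agree on the same time offset.

```python
from collections import Counter, defaultdict

def build_index(tracks):
    """Step 2: map each fingerprint to the (track, time) places where it occurs.

    `tracks` is {track_name: [(fingerprint, time), ...]}; the fingerprints are
    assumed to have been computed already (step 1 is not implemented here).
    """
    index = defaultdict(list)
    for name, prints in tracks.items():
        for fingerprint, time in prints:
            index[fingerprint].append((name, time))
    return index

def best_match(index, sample):
    """Step 3: vote for (track, offset) pairs; hits that line up in time agree on the offset."""
    votes = Counter()
    for fingerprint, t_sample in sample:
        for name, t_track in index.get(fingerprint, []):
            votes[(name, t_track - t_sample)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count

# Toy data: three fingerprints of the sample line up with track "a" at offset 1.
tracks = {
    "a": [("x", 0), ("y", 1), ("z", 2), ("w", 3)],
    "b": [("y", 0), ("x", 5), ("q", 6)],
}
index = build_index(tracks)
print(best_match(index, [("y", 0), ("z", 1), ("w", 2)]))
```

The only step beyond a plain dictionary lookup is subtracting two timestamps and counting, which is the "matching linearly in time" idea described in the quoted text.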

Moreover:

I've also had contact with other people who have implemented this kind of algorithms. Most notable is Dan Ellis. His implementation can be found here: http://labrosa.ee.columbia.edu/~dpwe/resources/matlab/fingerprint/

He hasn't been contacted (yet), but he isn't planning on taking his MatLab implementation down anyway and has agreed for me to place the link here. This raises another interesting question, why are they targetting me, somebody who hasn't even published the code yet, and not the already published implementation of Dan?!

And if they think its illegal to explain the algorithm, why aren't they going after this guy? http://laplacian.wordpress.com/2009/01/10/how-shazam-works/

This is where I got the idea to implement the algorithm and it is mentioned in my own first post about the Java Shazam.

So, moving to that last site, we find a detailed analysis of the algorithm - which is all pretty obvious. How did he do that?

So I was curious how it worked, and luckily there is a paper [.pdf] written by one of the developers explaining just that. Of course they leave out some of the details, but the basic idea is exactly what you would expect: it relies on fingerprinting music based on the spectrogram.

In other words, the description of the algorithm by the company's programmers shows that it "is exactly what you would expect".

At every level, then, this is an obvious, algorithmic, mathematical approach. And yet someone in Holland - a country that doesn't recognise software patents at all - finds himself under pressure in this manner for some code he wrote independently implementing that general, algorithmic mathematical idea.

Now explain to me how patents promote innovation, please...

Update: Re-reading the post I realise that things are even more ridiculous. Here's what the company wants:

we would like you to refrain from releasing the code at all and to remove the blogpost explaining the algorithm.

Now, you recall that the algorithm is the thing that the company claims to have a patent on. The original idea behind a patent was that in return for its grant, the inventor would *reveal* all the details of his or her invention so that others could use it once the patent had expired, as a quid pro quo. So if the company claims a patent on its invention, it must *by definition* reveal the algorithm.

Against that background, this demand to remove an explanation of the algorithm is simply absurd, and contradicts the very nature of a patent - it's like asking the USPTO not to reveal the patents it grants.

23 December 2009

All Hail the Mighty Algorithm

As long-suffering readers of this blog will know, one of the reasons I regard software patents as dangerous is that software consists of algorithms, and algorithms are simply maths. So allowing software patents is essentially allowing patents on pure knowledge.

Against that background, this looks pretty significant:

Industries, particularly high tech, may be waiting for the U.S. Supreme Court decision, expected this coming spring, in the Bilski case to decide some fundamental questions of when you can patent business methods. But in the meantime, there’s a newly published decision from the Board of Patent Appeals and Interferences that establishes a new test to determine whether a machine or manufactured article that depends on a mathematical algorithm is patentable. The ruling is a big deal because it’s one of the few precedential decisions that the BPAI issues in a given year, and it will have a direct impact on patents involving computers and software.

For a claimed machine (or article of manufacture) involving a mathematical algorithm,

1. Is the claim limited to a tangible practical application, in which the mathematical algorithm is applied, that results in a real-world use (e.g., “not a mere field-of-use label having no significance”)?
2. Is the claim limited so as to not encompass substantially all practical applications of the mathematical algorithm either “in all fields” of use of the algorithm or even in “only one field?”

If the machine (or article of manufacture) claim fails either prong of the two-part inquiry, then the claim is not directed to patent eligible subject matter.

Now, the devil is in the details, and what impact this has will depend upon its interpretation. But what I find significant is that algorithms are foregrounded: the more people concentrate on this aspect, the harder it will be to justify software patents.

04 June 2009

Knuth: Every Algorithm is Sacred

One of my computer heroes, Donald Knuth, has sent a message to the head of the EPO, hoping to convince her that every algorithm is sacred, and should not be delivered up to become the personal, exclusive, proprietary possession of any one person or company:

Basically I remain convinced that the patent policy most fair and most suitable for the world will regard mathematical ideas (such as algorithms) to be not subject to proprietary patent rights. For example, it would be terrible if somebody were to have a patent on an integer, like say 1009, so that nobody would be able to use that number "with further technical effect" without paying for a license. Although many software patents have unfortunately already been granted in the past, I hope that this practice will not continue in future. If Europe leads the way in this, I expect many Americans would want to emigrate so that they could continue to innovate in peace!

06 May 2009

EPO: FSFE Does It by the Numbers

Yesterday I was praising Red Hat's submission to the EPO in its pondering of the patentability of software. Today, it's the FSFE's turn. They've produced a fairly short but sweet document, which has a sentiment close to my heart:

4.(a) Does the activity of programming a computer necessarily involve technical considerations?

No. The reverse is almost invariably true. Any software program is the result of programming, which is in essence combining a series of algorithms, and algorithms are mathematics.

Got it in one.

23 March 2009

Have I Got News for Them

This is just incredible:

Major media companies are increasingly lobbying Google to elevate their expensive professional content within the search engine's undifferentiated slush of results.

Many publishers resent the criteria Google uses to pick top results, starting with the original PageRank formula that depended on how many links a page got. But crumbling ad revenue is lending their push more urgency; this is no time to show up on the third page of Google search results. And as publishers renew efforts to sell some content online, moreover, they're newly upset that Google's algorithm penalizes paid content.

Let's just get this right. The publishers resent the fact that the stuff other than "professional content" is rising to the top of Google searches, because of the PageRank algorithm. But wait, doesn't the algorithm pick out the stuff that has most links - that is, those sources that people for some reason find, you know, more relevant?

So doesn't this mean that the "professional content" isn't, well, so relevant? Which means that the publishers are essentially getting what they deserve because their "professional content" isn't actually good enough to attract people's attention and link love?

And the idea that Google's PageRank is somehow "penalising" paid content by not ignoring the fact that people are reading it less than other stuff, is just priceless. Maybe publishers might want to consider *why* their "professional content" is sinking like a stone, and why people aren't linking to it? You know, little things like the fact it tends to regard itself as above the law - or the algorithm, in this case? (Via MicroPersuasion.)
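
For anyone wondering what "picking out the stuff that has most links" amounts to, here is a minimal sketch of the textbook PageRank iteration in Python. It is the simplified published formula, not Google's production ranking, and the page names and the `links` dictionary are made up for illustration. Each page repeatedly shares its score among the pages it links to, so pages that attract links from well-linked pages float upwards.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    The damping factor of 0.85 is the commonly quoted default.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes an equal share of its score along each outgoing link.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical mini-web: two blogs link to one page; nobody links to the paywalled one.
links = {
    "blog1": ["popular"],
    "blog2": ["popular"],
    "popular": ["blog1"],
    "paywalled": [],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web the page nobody links to ends up near the bottom, without any rule that mentions paid content at all.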

15 November 2008

Of Lawyers and Software Patents

Regular readers will know that I have a bee in my bonnet about the non-patentability of software, largely because software is made up of algorithms, algorithms are maths, and maths is not patentable: QED. So, as you might expect, the following, from a patent attorney, makes me go a funny colour:

Software is not a mathematical equation, nor is it a mathematical language. How anyone who writes software or professes to understand software could argue to the contrary is beyond me. Do people who write software actually think they are sitting down and writing mathematical equations and stringing them together? It is absurd to have such a narrow view of software.

The good news is that I do not intend to rebut this (and the rest of the post) here, because the comments to it, and those on Groklaw discussing it, are so good, and so varied, that it would be superfluous. If you ever come across people who have doubts about the non-patentability of software, just point them towards those comments.

13 October 2008

Symbian's Patently Terrible “Triumph”

Although I've written elsewhere about the recent court case of Symbian v Comptroller General of Patents, noting that it was bad news, I hadn't realised quite how bad the news was until I went through the complete judgment. It's plain that the judges in question, who to their credit tried their level best to understand this mysterious stuff called software, failed to grasp the central issue of what software is. As a result, they have passed down a judgment that is so seriously wrong it will cause a huge amount of damage in the future unless it is revoked by a higher court....

On Open Enterprise blog.

08 October 2008

Bad News on the UK Software Patent Front

Why is there always this Jesuitical casuistry when it comes to software?

We have the following:

what goes on inside a computer can be said to be closer to a mathematical method (which is, of course, not patentable by virtue of art 52(2)(a)) than what goes on inside other machines.

But before that the same judge has said:

It can also be said in favour of Symbian's case that it would be somewhat arbitrary and unfair to discriminate against people who invent programs which improve the performance of computers against those who invent programs which improve the performance of other machines.

Well, no more unfair than not allowing physicists to patent the laws they discover, or mathematicians the theorems they prove. The point is, software is not "closer to a mathematical method", it *is* a mathematical method, or rather a concatenation of them.

All this juridical "on the one hand" and "on the other" in the interests of "balance" does not change this. The current decision is seriously bad news, because it opens the door to even more weaselly patent applications that contort themselves into the magic position to gain the favour of whichever Jesuit is on duty that day.

As a result of which, new software becomes much *harder* and more expensive to write - even to the point of impossibility, if patent thickets get too thick. Hardly what the great and glorious patent system is supposed to do, is it...?

19 March 2008

Court Backslides on UK Software Patents

On Open Enterprise blog.

12 November 2007

Patently Outrageous

Europe does not allow software patents, but that doesn't stop some people - patent lawyers, mostly - from circumventing that clear and specific intention. One of them has not only written a book on how to sneak software patents through the system, but is now challenging an eminently sensible ruling on the subject by the UK authorities last year.

But the bits that stick in my craw are the following sections of the accompanying press release:

High-tech businesses can obtain a European-level monopoly over the distribution of computer disks and internet downloads of programs that configure an apparatus to perform a patented process. Now, in Britain, they cannot.

and

“A lot of people think there is no problem here because disks and downloads are protected by copyright,” noted Nicholas Fox, of Beresford & Co, the patent attorney acting for the high-tech five. “However, that is just not true. Copyright protection only protects code against copying. In contrast, patent protection enables a company to monopolise an invention even if competitors independently come up with the same idea.

Got that? These poor little companies just absolutely must have a monopoly on ideas to stop others from coming up with the same idea *independently*, because, you know, intellectual monopolies - like all monopolies - are just so good for society, and we can't allow other people to have the same ideas on their own without paying, oh my word no, because - heavens! - art and science might actually progress. And we can't have that, can we?

It's sad enough writing a book on how to get around a clear legal statement of intent; but brazenly demanding the right to a monopoly in what amounts to mathematical knowledge (as all software is, embodied in logical operations and algorithms) really takes the biscuit.

18 October 2007

Of Open Source, Open Access and Donald Knuth

I often witter on about open access, assuming people know what I'm talking about. But if you'd like a little historical background, try this, which explains why people interested in open source should also be interested in open access:

Like all things that has to do with the Internet, the computer scientists are ahead of the curve in the flight from the old model of scientific publishing.

In probably one of the biggest shocks of the scientific publishing world, in 2003, the entire editorial board of the prestigious Journal of Algorithms resigned en masse. They subsequently re-formed as the editorial board of a new journal with the similar-sounding name of ACM Transactions of Algorithms.

In a sharply worded letter, the co-founder of the journal (and legendary computer scientist) Donald Knuth, explained the reasons for the mass defection. The reason being that Elsevier had been gouging the subscribers of the Journal of Algorithms for years. It had reached the point where the only defense was to bail ship.

24 September 2007

What Muhammad ibn Musa al-Khwarizmi Knew

Nice to see algorithms getting some respect:

Algorithms, as closely guarded as state secrets, buy and sell stocks and mortgage-backed securities, sometimes with a dispassionate zeal that crashes markets. Algorithms promise to find the news that fits you, and even your perfect mate. You can’t visit Amazon.com without being confronted with a list of books and other products that the Great Algoritmi recommends.

Its intuitions, of course, are just calculations — given enough time they could be carried out with stones. But when so much data is processed so rapidly, the effect is oracular and almost opaque. Even with a peek at the cybernetic trade secrets, you probably couldn’t unwind the computations.

Maybe; but the point is, they are just calculations. Which is why the idea of patenting any of them - as raw algorithms, business methods, or software - is, er, patently ridiculous.

18 July 2007

Seeing the Power of the Visual Commons

I've written before about Microsoft's Photosynth, which draws on the Net's visual commons - Flickr, typically - to create three-dimensional images. Here's another research project that's just as cool - and just as good a demonstration of why every contribution to a commons enriches us all:

What can you do with a million images? In this paper we present a new image completion algorithm powered by a huge database of photographs gathered from the Web. The algorithm patches up holes in images by finding similar image regions in the database that are not only seamless but also semantically valid. Our chief insight is that while the space of images is effectively infinite, the space of semantically differentiable scenes is actually not that large. For many image completion tasks we are able to find similar scenes which contain image fragments that will convincingly complete the image. Our algorithm is entirely data-driven, requiring no annotations or labelling by the user.

One of the most interesting discoveries was the following:

It takes a large amount of data for our method to succeed. We saw dramatic improvement when moving from ten thousand to two million images. But two million is still a tiny fraction of the high quality photographs available on sites like Picasa or Flickr (which has approximately 500 million photos). The number of photos on the entire Internet is surely orders of magnitude larger still. Therefore, our approach would be an attractive web-based application. A user would submit an incomplete photo and a remote service would search a massive database, in parallel, and return results.

In other words, the bigger the commons, the more everyone benefits.

Moreover:

Beyond the particular graphics application, the deeper question for all appearance-based data-driven methods is this: would it be possible to ever have enough data to represent the entire visual world? Clearly, attempting to gather all possible images of the world is a futile task, but what about collecting the set of all semantically differentiable scenes? That is, given any input image can we find a scene that is “similar enough” under some metric? The truly exciting (and surprising!) result of our work is that not only does it seem possible, but the number of required images might not be astronomically large. This paper, along with work by Torralba et al. [2007], suggest the feasibility of sampling from the entire space of scenes as a way of exhaustively modelling our visual world.

But that is only feasible if that "space of scenes" is a commons. (BTW, do check out the paper's sample images - they're amazing.)

07 June 2007

Microsoft, Its Rose and the Canker

Now here's an interesting thing:

Developing the Future is an annual report examining the impact of the software development industry on the UK economy, from both a local and global perspective. The report is a collaborative work with partners from the IT industry and academia. By exploring emerging trends, the report stimulates debate between stakeholders and calls for positive action to support the UK software industry.

It's interesting because:

The second edition of Developing the Future not only comprises original research commissioned by Microsoft on these fascinating themes, it also includes independent articles from luminaries such as Will Hutton, outlining unique perspectives on the massive change now taking place in Britain.

You'd pretty much expect this to be standard Microsoft propaganda, along the lines of its risible TCO "studies"; but you'd be wrong. Developing the Future is an extremely interesting look at major issues affecting UK software development in the near future. It is one of the best-presented digital documents I have seen in a while, with excellent photography, and a nice clean design.

The contents aren't bad either: for the most part, the writing is neutral and fair. Only at one point is it clear that there is a canker at the heart of this rose, when the section on innovation starts wittering on about that mythical beast of "intellectual property", and comes out with this extraordinary self-evident truth:

The lack of intellectual property protection for algorithms, software or enhanced business processes are barriers to innovation.

Creating intellectual monopolies in something as fundamental as algorithms is about as sensible as handing out government monopolies on air and water. It's sad to see an otherwise forward-looking document stuck so firmly in the past, instead of promoting innovation and prosperity in the "Knowledge Economy" through the liberation of its wondrous, non-rivalrous, raw stuff: ideas.

01 March 2007

Undermining Digg

Digg occupies such an emblematic place in the Web 2.0 world that it's important to understand what's really going on with this increasingly powerful site (on the rare occasions that I've had stories dugg, my traffic has been stratospheric for a day or two before sagging inexorably down to its usual footling levels).

So this story from Annalee Newitz on Wired News is at once fascinating and frightening:

I can tell you exactly how a pointless blog full of poorly written, incoherent commentary made it to the front page on Digg. I paid people to do it. What's more, my bought votes lured honest Diggers to vote for it too. All told, I wound up with a "popular" story that earned 124 diggs -- more than half of them unpaid. I also had 29 (unpaid) comments, 12 of which were positive.

Although it's worrying that Digg can be gamed so easily, there's hope too:

Ultimately, however, my story did get buried. If you search for it on Digg, you won't find it unless you check the box that says "also search for buried stories." This didn't happen because the Digg operators have brilliant algorithms, however -- it happened because many people in the Digg community recognized that my blog was stupid. Despite the fact that it was rapidly becoming popular, many commenters questioned my story's legitimacy. Digg's system works only so long as the crowds on Digg can be trusted.

Digg remains a fascinating experiment in progress; let's hope it works out.

19 April 2006

The Euston Manifesto

After the right espousing open source and related open goodness yesterday, today we have the left. More specifically, we have something called The Euston Manifesto (via Compromiso Social por la Ciencia). This may sound a bit like an Ealing Comedy, but it includes the following rather surprising paragraph:

14) Open source.
As part of the free exchange of ideas and in the interests of encouraging joint intellectual endeavour, we support the open development of software and other creative works and oppose the patenting of genes, algorithms and facts of nature. We oppose the retrospective extension of intellectual property laws in the financial interests of corporate copyright holders. The open source model is collective and competitive, collaborative and meritocratic. It is not a theoretical ideal, but a tested reality that has created common goods whose power and robustness have been proved over decades. Indeed, the best collegiate ideals of the scientific research community that gave rise to open source collaboration have served human progress for centuries.

09 April 2006

(Patently) Right

Paul Graham is a master stylist - indeed, one of the best writers on technology around. Reading his latest essay, "Are Software Patents Evil?" is like floating in linguistic cream. And that's the problem. His prose is so seductive that it is too easy to be hypnotised by his gently-rhythmic cadences, too pleasurable to be lulled into a complaisant state, until you find yourself nodding mechanically in agreement - even with ideas that are, alas, fundamentally wrong.

Take his point in this recent essay about algorithms, where he tries to argue that software patents are OK, even when they are essentially algorithms, because hardware is really only an instantiation of an algorithm.

If you allow patents on algorithms, you block anyone from using what is just a mathematical technique. If you allow patents on algorithms of any kind, then you can patent mathematics and its representations of physics (what we loosely call the Laws of Physics are in fact just algorithms for calculating reality).

But let's look at the objection he raises, that hardware is really just an algorithm made physical. Maybe it is; but the point is you have to work out how to make that algorithm physical - and that's what the patent is for, not for the algorithm itself. Note that such a patent does not block anyone else from coming up with different physical manifestations of it. They are simply stopped from copying your particular idea.

It's instructive to look at another area where patents are being hugely abused: in the field of genes. Thanks to a ruling in 1980 that DNA could be patented, there has been a flood of completely insane patent applications, some of which have been granted (mostly in the US, of course). Generally, these concern genes - DNA that codes for particular proteins. The argument is that these proteins do useful things, so the DNA that codes for them can therefore be patented.

The problem is that there is no way of coming up with an alternative to that gene: it is "the" gene for some particular biological function. So the patent on it blocks everyone from using that genomic information, for whatever purpose. What should be patentable - because, let me be clear here, patents do serve a useful purpose when granted appropriately - is the particular use of the protein - not the DNA - the physical instantiation of what is effectively a genomic algorithm.

Allowing patents on a particular industrial use for a protein - not a patent on its function in nature - leaves the door open for others to find other chemicals that can do the same job for the industrial application. It also leaves the DNA as information/algorithm, outside the realm of patents.

This test of whether a patent allows alternative implementations of the underlying idea can be applied fruitfully to the equally-vexed questions of business methods. Amazon's famous "one-click" method of making online purchases is clearly total codswallop as a patent. It is a patent on an idea, and blocks everyone else from implementing that (obvious) idea.

The same can be said about an earlier patent that Oracle applied for, which apparently involved the conversion of one markup language into another. As any programmer will tell you, this is essentially trivial, in the mathematical sense that you can define a set of rules - an algorithm - and the whole drops out automatically. And if you apply the test above - does it block other implementations? - this clearly does, since if such a patent were granted, it would stop everyone else coming up with algorithms for conversions. Worse, there would be no other way to do it, since the process is simply a restatement of the problem.
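
To illustrate what "define a set of rules and the whole drops out automatically" means, here is a minimal sketch in Python. It has nothing to do with the details of Oracle's application, which I haven't reproduced here; the two toy markup rules and the name `convert` are purely hypothetical. The point is that the converter is little more than the rule table itself:

```python
import re

# Hypothetical rule table: each entry maps a pattern in a toy wiki-style
# markup to its HTML equivalent. Adding a rule means adding a row.
RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<b>\1</b>"),   # **bold**  -> <b>bold</b>
    (re.compile(r"_(.+?)_"), r"<i>\1</i>"),         # _italic_  -> <i>italic</i>
]

def convert(text, rules=RULES):
    """Apply every rewrite rule in order; the 'algorithm' is just the rules."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

print(convert("**bold** and _italic_"))
```

Once the rules are written down, the conversion is a loop over them, which is why the process reads like a restatement of the problem.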

I was heartened to see that a blog posting on this case by John Lambert, a lawyer specialising in intellectual property, called forth a whole series of comments that explored the ideas I've sketched out above. I urge you to read it. What's striking is that the posts - rather like this one - are lacking the polish and poise of Graham's writing, but they more than make up for it in the passion they display, and the fact that they are (patently) right.