The clue doesn't need to be in the name

Note: The following post is now very much out-of-date as I no longer refer to the Friends of Charles Darwin blog as The Red Notebook, and the URL scheme has been considerably simplified (as has the URL for this post). But I've left the old post in place, as I still think it makes some valid points.

"What's in a name? That which we call a rose
By any other name would smell as sweet."
     —William Shakespeare, Romeo and Juliet

The name of this, my 500th, Red Notebook post is The clue doesn't need to be in the name. Its uniform resource locator (URL) is:

I had some element of choice over the format of this blog's URLs. For example, I could have adopted a format beginning with the blog's name. But that would have given me a major headache had I ever decided to change the name of the blog.

Similarly, I chose to incorporate the year and month of each post (e.g. /2010/08/) into the URL. With hindsight, I didn't need to do that: my decision was based on an assumption about the WordPress blogging platform, which I later discovered to be incorrect (I won't bore you with the details).

Also thanks to my incorrect assumption, the final part of the URL rather redundantly repeats the year and month, as well as including the day number of the post. By default, WordPress suggests a final URL element based on each post's title (which would have made this post's URL ). But I don't like that format because, as you can see, it can lead to very lengthy URLs, and it creates problems if you ever decide to change the title of the post, which I frequently do between drafts.
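To make the title-based option concrete, here is a rough Python sketch of how a post title might be boiled down into a URL element. The slugify function is a hypothetical illustration, not WordPress's own code, which handles far more cases than this.

```python
import re


def slugify(title: str) -> str:
    """Turn a post title into a lowercase, hyphen-separated URL element.

    Hypothetical helper for illustration only; not WordPress's implementation.
    """
    text = title.lower()
    # Drop straight and curly apostrophes so "doesn't" becomes "doesnt".
    text = text.replace("'", "").replace("\u2019", "")
    # Collapse every remaining run of non-alphanumeric characters into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    # Trim any hyphens left at either end.
    return text.strip("-")


original = slugify("The clue doesn't need to be in the name")
retitled = slugify("The clue need not be in the name")

print(original)              # the-clue-doesnt-need-to-be-in-the-name
print(original == retitled)  # False: a new title means a new URL element
```

Change a few words of the title and the URL element changes with it, which is exactly the problem with baking the title into a permanent link.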

My preferred, date-based format is not without its problems. Although it is unlikely that I will ever change the date of a post, it does mean that I have to introduce an inelegant kludge [or kluge, for any American readers out there] in the infrequent event of my wanting to publish more than one post on the same day (again, I won't bore you with the details).

What I have done with my URLs, therefore, is to try to include some metadata (namely the post's date) in each one, to tell you a little something about the post.

As an IT professional of 24 years, I should have known better.

Computers are pretty good these days at linking things together, categorising, cross-referencing, and doing boring administrative stuff like that. The purpose of a blog post's URL is to give you a permanent link to the post. It doesn't need to contain metadata about the post. That is putting too much responsibility on the URL. By giving it two jobs, I have created an unnecessary conflict of interests.

Note that I talk about a blog post's URL as if there were only one of them. But this simply isn't the case. The U in URL stands for uniform, not unique. There can be literally scores of URLs for any particular blog post. For example, here is a short selection of URLs for this particular post:

Like I said, though, those clever computers keep track of what's what, so all of the above links lead you straight back to this post.
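The many-URLs-to-one-post idea can be sketched in a few lines of Python. Everything here is made up for illustration: the paths, the post number and the resolve function are assumptions, not this blog's real URLs or how WordPress actually keeps track of things.

```python
# Hypothetical data: one post, stored once under an internal number.
POSTS = {
    500: "The clue doesn't need to be in the name",
}

# Several invented paths, all pointing at the same internal number.
ALIASES = {
    "/2010/08/example-date-based-link": 500,
    "/?p=500": 500,
    "/example-title-based-link": 500,
}


def resolve(path: str) -> str:
    """Return the title of the post that a given path leads to."""
    post_id = ALIASES[path]
    return POSTS[post_id]


for path in ALIASES:
    # Every alias prints the same title.
    print(path, "->", resolve(path))
```

None of the paths needs to be unique to the post, and none of them needs to describe it. The lookup table does the work.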

Why all this talk about URLs on a blog which is supposed to be about Darwin and evolution and stuff like that? Well, because I think we make pretty much the same mistake when we try to build metadata and uniqueness (as opposed to universality) into species names.

The purpose of giving a species a name is to enable us to talk about it. Simple as that. It doesn't matter whether I refer to the bushy plant currently in flower at the top of my lane as a dog rose [English], hunderose [Danish], or stenros [Swedish], provided we all know what I'm talking about. Clearly, it would be a lot more convenient if we tried to standardise on a name, and Carl Linnaeus did a nice job of that by coming up with the name Rosa canina.

And, in this age of computers, it should be pretty easy for me to translate (as I just did, thanks to Wikipedia) the name with which I am familiar (dog rose) into the name with which, despite my classical education, I was not (Rosa canina). Juliet was right: whatever you call a rose, it will still be a rose.

But, there is an inherent problem with Linnaeus's method of naming species. It tries to make the names do too much. It tries to give species a universal label which people can refer to (which is a very good thing), but it also tries to classify the species. For example, the name Rosa canina tells us that the dog rose is a species of the genus Rosa, i.e. the roses. This is fine, provided we get our initial genus classification correct. But we frequently don't. So do we fix the problem of an incorrectly classified species by renaming the species, or do we keep the old name that people are already familiar with? The answer is, it depends: we decide on a case-by-case basis. Which is why the International Commission on Zoological Nomenclature recently voted 23 to 4 to reject a petition to rename, amongst other species, the geneticist's favourite fruit-fly Drosophila melanogaster: a decision which would wreak havoc with the massive scientific literature on the species. But the debate continues.

Similarly, convention has it that the scientific names of species should be unique: we should not have different names for the same species. Which is fine until you find out that you have inadvertently given the same species two names, in which case, convention has it, one of them has to go. Which is why the delightfully named 'thunder lizard' dinosaur Brontosaurus excelsus now goes by the much less satisfactory moniker Apatosaurus excelsus.

Worse still is when two different species are given the same 'unique' scientific name. This can happen, say, when scientists initially fail to realise that they are talking about different, closely related species. There is, for example, an ongoing debate about whether the giraffe should be reclassified into several different species.

It's a total mess, basically, and certainly puts my URL problems into perspective.

I don't have any solutions for the mess, but my gut feeling is that we should stop trying to put too much responsibility on species' names. In this age of computers, the clue doesn't need to be in the name; we can cross-reference things and look them up. Computers are very good for that sort of thing.

Species need names to act as universal labels, but those labels do not need to be unique, and they do not need to contain metadata. By all means let's keep using the very useful Linnean species names, but let's stop messing around with them!
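As a rough illustration of labels that are neither unique nor loaded with metadata, here is a minimal Python sketch. The identifiers (taxon-1, taxon-2), the Taxon record and the look_up function are all invented for the example. The point is only that several names can lead to one record, and that the classification can live in the record rather than in the name.

```python
from dataclasses import dataclass


@dataclass
class Taxon:
    """Hypothetical record for one species."""
    preferred_name: str
    genus: str  # the classification is stored here, not read off the label


# Invented internal identifiers; they carry no meaning of their own.
TAXA = {
    "taxon-1": Taxon(preferred_name="Rosa canina", genus="Rosa"),
    "taxon-2": Taxon(preferred_name="Apatosaurus excelsus", genus="Apatosaurus"),
}

# Many labels, in any language, old or new, can point at the same record.
NAME_INDEX = {
    "dog rose": "taxon-1",
    "hunderose": "taxon-1",
    "stenros": "taxon-1",
    "rosa canina": "taxon-1",
    "brontosaurus excelsus": "taxon-2",
    "apatosaurus excelsus": "taxon-2",
}


def look_up(name: str) -> Taxon:
    """Find the record for a name, ignoring case and surrounding spaces."""
    return TAXA[NAME_INDEX[name.strip().lower()]]


print(look_up("Dog rose").preferred_name)        # Rosa canina
print(look_up("Brontosaurus excelsus").genus)    # Apatosaurus
```

If a species is reclassified in a scheme like this, only the genus field of its record changes; every existing label carries on working.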

Postscript (16-Aug-2010): I see that John S Wilkins of Evolving Thoughts has previously written on similar matters, albeit with more philosophical aplomb.

