Sunday, August 17, 2008

Authority control

It's been a while since I got all library-geek in here, probably because I'm neither in school anymore nor working at a library (still) (yet). Two little news items caught my eye recently, though, about the automated matching of terms on the Internet. The first is from Steve Johnson's August 13 column "Hypertext" in the Chicago Tribune:

From the People Are Still Smarter Than Computers Department: After the Russians invaded the Republic of Georgia last week, the Valleywag blog captured Google News displaying alongside its story on the attack a Google Maps image of Savannah and environs. Does that mean they'd also have the details on Gen. Sherman's march to the Black Sea?

The second is from the BBC's website; it's a much longer article, so I'll just post the link - but the city council of Birmingham in England printed up a bunch of leaflets about recycling, and the picture of the city skyline on them was of Birmingham, Alabama. I don't know how the city council got the photo, but I imagine it had something to do with an Internet search that couldn't differentiate the two Birminghams.

For those of you who know this already, sorry to come off condescending, but librarians call this kind of differentiation authority control. You see it in Library of Congress name and subject headings all the time - it tells you whether a book is by this John Smith or that John Smith, or whether the word "records" refers to LPs, archives, or electronic catalog records. So far, anyway, the only way to get authority control over a vast amount of information seems to be to have humans do it. I did read an article in cataloging class about algorithms that would say, okay, when "apple" is near computer words, it's probably talking about Apple computers, and when it's near words about food or farming, it's talking about the fruit. That's certainly not foolproof, but on the other hand, no one is ever going to index the whole Internet by hand. Librarians have definitely tried.
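For the curious, here's roughly what that context-word idea looks like as a few lines of Python. This is a toy sketch, not the method from the article I read: the cue-word lists and the five-word window are my own hypothetical choices, and a real system would learn those associations from lots of text instead of having someone type them in.

```python
import re

# Hypothetical cue words for each sense of "apple". Hand-picked for
# illustration only; a real system would learn these from data.
SENSE_CUES = {
    "Apple (computers)": {"computer", "mac", "macintosh", "software", "laptop", "keyboard"},
    "apple (fruit)": {"fruit", "orchard", "farm", "pie", "tree", "harvest", "eat"},
}

def guess_sense(text, window=5):
    """Guess which sense of 'apple' a text means by counting cue words
    within `window` words on either side of each occurrence."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {sense: 0 for sense in SENSE_CUES}
    for i, word in enumerate(words):
        if word != "apple":
            continue
        # Words just before and just after this occurrence of "apple".
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        for sense, cues in SENSE_CUES.items():
            scores[sense] += sum(1 for w in context if w in cues)
    best = max(scores, key=scores.get)
    # No cue words at all means the context gives no evidence either way.
    return best if scores[best] > 0 else None

print(guess_sense("She bought a new Apple laptop to run her software."))  # Apple (computers)
print(guess_sense("We picked every apple in the orchard for a pie."))     # apple (fruit)
print(guess_sense("You are the apple of my eye."))                        # None
```

Notice that the last sentence gets no answer at all: with no computer words or farm words nearby, the program simply has nothing to go on.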

This could get very metaphysical. The meanings of words are personal and political, and always up for debate. And no system is going to be able to control for metaphorical and other creative uses of words - what would that algorithm do with "apple of my eye"? I can't decide whether this whole thing gives me hope that humans do a superior job of organizing information and that people will recognize it, or whether people will just be content with incorrect and incomplete information. Something tells me the latter is more likely.

1 comment:

Clare said...

This is a fantastic post, my dear. Highly enjoyed it. The word nerd in me was especially titillated, and it was great to get the LD on word searches through the librarian lens.

I only have one apprehension about the topic: "authority control" is a weak name for what you're talking about. It sounds redundant. Authority. Control. Same diff. :)