Technology Thursday
Image by Lyn Millett via Flickr
America's libraries and Google are at odds over the Google Books Settlement. I love libraries - I love the fact of them, I love their history, I love librarians (who, contrary to stereotype, are some of the most creative and intellectually engaged people you could meet), and I love the actuality of libraries. My hometown library is now a must-visit place for us when we're visiting my family because, even though it's a small-town library, it has a fabulous children's area. Some of the work I've done in the past has involved librarians, and librarian war stories are really amazing.
I remember once listening to a senior staff person from one of the major NYC public libraries talk about the complexities they have to deal with in an urban, multi-lingual, heterogeneous environment. Really fascinating stuff that had never come across my radar before. Anyway, I love libraries. The American Library Association's Library Bill of Rights and their issues write-ups are worth revisiting periodically.
But I am also a geek-girl. And I love Google too. Not the way I love libraries, mind you. But I love their mission ("organize the world's information and make it universally accessible and useful" -- funny, actually; that sounds an awful lot like what libraries are trying to do), and my vague sense from talking to various people who work there is that they are doing software right. And I appreciate that. Some very good friends of my husband's and mine work or have worked there over the years. Almost every project I do in my day job involves someone from Google, and they're invariably smart and thoughtful people. As a company, Google is not always in the right, but (and maybe it's just good PR) I do think they try.
But Google and the libraries are not seeing eye-to-eye over Google's plan to scan the world's books and make them searchable. There has been controversy of various sorts over Google's plan to digitize basically every book they can find. First and foremost, of course, are the copyright issues to be resolved. But there are subtler issues as well, as described in this piece that the American Library Association (ALA) and the Association of Research Libraries (ARL) created, called "A Guide for the Perplexed: Libraries and the Google Library Project Settlement." The Google Books Settlement is an arrangement that Google worked out with some authors and publishers. Here's an overview (from Google's perspective). Note, though, that not everyone agrees with Google's perspective on this. Amazon in particular has been very critical. (See here and here for starters.)
There are technical concerns as well. As I said, Google does software well, but they're not perfect. I recently came across this critique of how Google is handling (or mishandling) the metadata associated with the books that it's scanning. (Metadata is data about the books, e.g., the date it was published, the author, the publisher, the number of pages, and so on.) Some dates are wrong, there are classification errors, and so on. Geoff Nunberg, the writer of this post, conjectures that these errors may be attributable to Google's probabilistic automation algorithms, which attempt to classify books automatically based on what they can learn from context rather than (necessarily) simply accepting whatever card catalog data libraries provide. The writer notes:
Google's machine classification will certainly improve, extracting metadata mechanically simply isn't sufficiently reliable for scholarly purposes. After some early back-and-forth, Google decided it did want to acquire the library records for scanned books along with the scans themselves, and now it evidently has them, but I understand the company hasn't licensed them for display or use — hence, presumably, the odd automated stabs at recovering dates from the OCR that are already present in the library records associated with the file.
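For the programmers in the audience, here is a minimal sketch of the trade-off Nunberg is describing. It is purely illustrative: the record shape, the function names, and the "first year-looking number wins" rule are all my own assumptions, not anything Google or the libraries have said they do. The point is only that a date guessed from scanned text can be badly wrong, while a catalog record, when one can be used, is usually the safer source.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical record shape for illustration only; this is not Google's
# or any library's actual schema.
@dataclass
class BookRecord:
    title: str
    catalog_year: Optional[int]  # year from a library catalog record, if usable
    ocr_text: str                # raw text recovered from the scanned pages


# Matches four-digit numbers from 1500 to 2099 that look like years.
YEAR_PATTERN = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def guess_year_from_ocr(ocr_text: str) -> Optional[int]:
    """Naive guess (an assumption): take the first year-like number in the text."""
    match = YEAR_PATTERN.search(ocr_text)
    return int(match.group(1)) if match else None


def resolve_year(record: BookRecord) -> Tuple[Optional[int], str]:
    """Return (year, source), preferring the catalog record over the OCR guess."""
    if record.catalog_year is not None:
        return record.catalog_year, "catalog"
    return guess_year_from_ocr(record.ocr_text), "ocr"


# A made-up book whose text mentions an older year before its copyright line.
text = "A history of the telegraph since 1844. Copyright 1995."
with_catalog = BookRecord("Telegraph History", 1995, text)
without_catalog = BookRecord("Telegraph History", None, text)

print(resolve_year(with_catalog))     # uses the catalog year
print(resolve_year(without_catalog))  # falls back to the naive OCR guess
```

In this toy case the OCR fallback latches onto 1844, the first year mentioned in the text, which is the kind of error that having (and being allowed to use) the catalog record would avoid.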
What Google is doing is bound to be useful to scholars and everyday users, but automation, however cleverly done, has risks. How they choose to deal with these issues on the technical side is just as important as how all of the legal and copyright issues work out. And, by the way, the head of Google Books' Metadata Team weighs in productively in the comment thread of Nunberg's critique--good reading if you keep scrolling down. In any event, for people who love books and love libraries and love technology, it should be interesting to watch this all unfold!
* Post edited to remove implication that Nunberg was more definitive about what was happening technically than he was.
I love books and libraries and Google, but I admit the plan worries me a bit. If Google digitizes as many books as the company aims to, the errors along the way could not only be overlooked, they could be made permanent by virtue of easy access.
Posted by: Katherine | Thursday, September 10, 2009 at 08:27 PM
Lynn, this is a very interesting piece. Geek that I am (I too LOVE libraries and am daily thankful for Google), I've been following the debate over Google's endeavor. You've framed the issues nicely. I am going to read the ALA piece now.
Posted by: Stacy | Saturday, September 12, 2009 at 07:21 AM
I think eventually Google will work out the kinks. I think we are living in a VERY exciting time. To think that at some point all of humanity's knowledge can be accessed at your fingertips on your Kindle or iPod!
How will this change the world? We know that fire, the wheel, the printing press, etc. revolutionized humanity in profound ways. How will the access to all this knowledge change us and affect us? Are information and knowledge different? I think so. You can access information, but knowledge has to be internalized through learning and coupled with experience. This is a very profound topic!
Posted by: MRJ | Sunday, September 13, 2009 at 11:06 PM