The Book-Scanning Project: Recent Updates

I kept having to put little additional notices in the main Book-Scanning Project page. And if you come back to the page after a while, you'll never find all the additions. So now I'm keeping dated updates on this page.

Early August, 2000

Yes, I know, I should have included the Library of Congress in my scanning list. I didn't. There is no good reason. (The bad reason is that back when I was originally testing sources, I had the ISBN checksum algorithm wrong. So every source appeared to be giving less than a 10% hit rate, including the LOC. After I caught the error, I never went back and added a LOC module.)

August 28, 2000

Dan Poirier reports that Amazon's search results page has changed, so the Amazon web searcher has to be jiggered. In that script, at line 73, change re.compile('/Author=([^/"]*)') to re.compile('&field-author=([^/"]*)'). I haven't tried this myself.

August 29, 2000

Radio Shack is giving away free barcode scanners as part of some marketing program I don't understand. Skip the Penguin has put up a page about using the CueCat scanner for your own purposes (including cataloging books). Linux and Windows instructions included. Or see the Lineo page for another Linux CueCat driver.

Note that DigitalConvergence (the makers of CueCat) are being maximally assholic about this. Their lawyers insist that anyone who writes or distributes software which works with the CueCat is violating their intellectual property rights. They have also sued a web site that describes taking apart a CueCat, and (!) an on-line UPC barcode database. In my humble opinion, you should get a CueCat and do whatever you want with it, purely to aggravate DigitalConvergence.

September 23, 2000

Chris Taylor contributes the Java code equivalent to the Javascript quoted on the main page -- the code to convert an EAN barcode to an ISBN.
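For readers who want the same conversion in Python, here is a minimal sketch. It is not Chris's Java code or the Javascript from the main page; the function names are my own, and it assumes a 13-digit "Bookland" EAN with the 978 prefix. It does not verify the EAN's own check digit.

```python
import re


def isbn10_check_digit(first_nine: str) -> str:
    """Return the ISBN-10 check character for a nine-digit string."""
    if not re.fullmatch(r"[0-9]{9}", first_nine):
        raise ValueError("expected exactly nine digits")
    # Weights run 10, 9, ..., 2 across the nine digits.
    total = sum((10 - i) * int(ch) for i, ch in enumerate(first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def ean_to_isbn10(ean: str) -> str:
    """Convert a 978-prefixed 13-digit EAN to a ten-character ISBN."""
    if not re.fullmatch(r"[0-9]{13}", ean):
        raise ValueError("expected a 13-digit EAN")
    if not ean.startswith("978"):
        raise ValueError("only 978-prefixed EANs map to an ISBN-10")
    # Drop the 978 prefix and the trailing EAN check digit,
    # then compute a fresh ISBN check character.
    body = ean[3:12]
    return body + isbn10_check_digit(body)
```

The EAN check digit is thrown away because the ISBN uses a different checksum (mod 11, with X standing for 10), so it has to be recomputed.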

September 29, 2000

Chris Taylor has now also posted Java code for reading CueCat data and doing an Amazon search. He also points out CCScan, which is the CueCat-reading part of this.

October 4, 2000

Steve Alexander has written barcode_to_amazon, a web-page module which decodes CueCat output and looks up the result on Amazon. Written in DTML, but he also includes a Python version (derived from my code and Skip's).

November 6, 2000

John Deters supplies a newer update for the Amazon parser. Change lines 72 and 73 from:
strongex = re.compile('<strong>.*?</strong>', re.DOTALL)
authorex = re.compile('/Author=([^/"]*)')
to:
strongex = re.compile('<b><font face=verdana,arial,helvetica>.*?</font></b>')
authorex = re.compile('&field-author=([^/"]*)')
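To see what the newer author pattern captures, here is a small sketch. The sample link is made up for illustration; I don't have a saved copy of the real Amazon markup, so treat the HTML as an assumption. Only the regular expression comes from the fix above.

```python
import re
from urllib.parse import unquote

# Pattern quoted in the update above.
authorex = re.compile('&field-author=([^/"]*)')

# Hypothetical fragment of a search-results page; the real markup may differ.
sample = (
    '<a href="/exec/obidos/search-handle-url/'
    'index=books&field-author=Doe%2C%20Jane/">Jane Doe</a>'
)

match = authorex.search(sample)
# The capture stops at the first slash or double quote after the field name.
raw_author = match.group(1) if match else None
# The captured value is still URL-encoded, so decode it for display.
author = unquote(raw_author) if raw_author is not None else None
```

The character class `[^/"]*` is what keeps the match from running on into the rest of the URL.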

January 9, 2001

Eric Hellman sends a bunch of links: one is a Javascript page which will take raw CueCat scans and turn them into ISBNs; another is a server which takes ISBNs and returns book data. (Replace the ########## with an ISBN.) He also comments: "the variably placed dashes in the isbn's actually conform to a standard which at one time would give you the country and publisher, but has been messed up a while."

March 13, 2001

I finally incorporated all the search-page parser fixes into the scripts. Amazon, AmazonUK, and Chapters all work correctly as of today. If you get the current version of the shelvescripts.tar.Z package, you can ignore all the fixes prior to this date (the ones listed above).

April 13, 2001

Skip the Penguin points out a site about how to hotwire a CueCat so that it produces raw barcode data. However, in this mode, it does not generate a space between the main barcode number and the supplementary barcode number. (Some other barcode readers behave this way; mine is adjustable.) If you have a non-space-generating scanner, apply the following patches to the corresponding scripts: upcfind.patch, makeisbn.patch
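The idea behind handling a non-space-generating scanner can be sketched as below. This is not the content of the patches; it is an illustration under my own assumptions: the main number is 12 digits (UPC) or 13 digits (EAN), the supplement is 2 or 5 digits, and the name split_barcode is hypothetical.

```python
import re


def split_barcode(raw: str) -> tuple[str, str]:
    """Split a scan into (main, supplement), with or without a space."""
    raw = raw.strip()
    if " " in raw:
        # Space-generating scanners already separate the two parts.
        main, _, supplement = raw.partition(" ")
        return main, supplement.strip()
    if not re.fullmatch(r"[0-9]+", raw):
        raise ValueError("expected digits only")
    # Assumed layouts: 12- or 13-digit main number, 2- or 5-digit supplement.
    # The four possible totals (14, 15, 17, 18) are all distinct.
    for main_len in (13, 12):
        for supp_len in (5, 2):
            if len(raw) == main_len + supp_len:
                return raw[:main_len], raw[main_len:]
    # Anything else is treated as a main number with no supplement.
    return raw, ""
```

Because the combined lengths never collide, the total length alone says where to cut.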

August 15, 2001

Neil K. Guy points out a site which claims to have a good database of UPC barcodes. I haven't tried it, but it might be good for other things besides books.

March 29, 2002

Virgil has emailed me patches which get Amazon and working as of the present time. I haven't tested them, but here they are.

August 19, 2002

Nels Satterlund provides a modification which accepts checksumless ISBNs -- a nine-digit string followed by a question mark. The code inserts the correct checksum digit in the ISBN output. His code replaces the existing mangle() function in the script.
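Nels's code is not reproduced in this text. As a rough illustration of the idea only, here is an independent sketch; the function name fill_isbn_checksum is hypothetical and this is not a drop-in replacement for mangle().

```python
import re


def fill_isbn_checksum(text: str) -> str:
    """Turn 'NNNNNNNNN?' into a full ISBN-10; pass other input through."""
    m = re.fullmatch(r"([0-9]{9})\?", text.strip())
    if m is None:
        # Not a checksumless ISBN, so leave it untouched.
        return text
    nine = m.group(1)
    # ISBN-10 checksum: weights 10 down to 2, result taken mod 11.
    total = sum((10 - i) * int(ch) for i, ch in enumerate(nine))
    check = (11 - total % 11) % 11
    # A check value of 10 is written as the letter X.
    return nine + ("X" if check == 10 else str(check))
```

For example, passing "030640615?" fills in the final digit, while a string that already has ten characters is returned as given.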

Last updated August 19, 2002.
