The Book-Scanning Project: Recent Updates
I kept having to put little additional notices on the main
page. And if you come back to the page after a while,
you'll never find all the additions. So now I'm
keeping dated updates on this page.
Early August, 2000
Yes, I know, I should have included the
Library of Congress
in my scanning list. I didn't. There is no good reason.
(The bad reason is that back when I was originally testing
sources, I had the ISBN checksum algorithm wrong. So every
source appeared to be giving less than a 10% hit rate, including
the LOC. After I caught the error, I never went back and added
a LOC module.)
August 28, 2000
Dan Poirier reports that Amazon's search results page has changed, so
the Amazon web searcher has to be
jiggered. In shelve.py, line 73, change
I haven't tried this myself.
August 29, 2000
Radio Shack is giving away free barcode scanners as part of
some marketing program I don't understand.
Skip the Penguin has put up a page about
using the CueCat scanner
for your own purposes (including cataloging books). Linux and
Windows instructions included.
Or see the
Lineo page for
another Linux CueCat driver.
DigitalConvergence (the makers of CueCat)
are being maximally assholic about this. Their lawyers insist
that anyone who writes or distributes software which works with
the CueCat is violating their intellectual property rights.
They have also sued a web site that
describes taking apart
an on-line UPC barcode database.
In my humble opinion, you should get a CueCat and do whatever you
want with it, purely to aggravate DigitalConvergence.
September 23, 2000
Chris Taylor contributes
Java code equivalent to the
Python code that converts an EAN barcode to an ISBN.
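For readers who want to see what that conversion involves, here is a minimal Python sketch. It is not Chris's Java code or the project's own script; the function names are made up for illustration, and it assumes a 13-digit "Bookland" EAN with the 978 prefix. The idea is to drop the prefix and the EAN check digit, then compute a fresh ISBN-10 check character.

```python
def isbn10_check_digit(body: str) -> str:
    """Return the ISBN-10 check character for a nine-digit body (hypothetical helper)."""
    if len(body) != 9 or not (body.isascii() and body.isdigit()):
        raise ValueError("expected nine digits")
    # Weights run from 10 down to 2; the check value makes the total a multiple of 11.
    total = sum(weight * int(digit) for weight, digit in zip(range(10, 1, -1), body))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def ean_to_isbn10(ean: str) -> str:
    """Convert a 978-prefixed EAN-13 string to an ISBN-10 (hypothetical helper)."""
    if len(ean) != 13 or not (ean.isascii() and ean.isdigit()):
        raise ValueError("expected thirteen digits")
    if not ean.startswith("978"):
        raise ValueError("not a 978-prefixed book barcode")
    body = ean[3:12]  # skip the 978 prefix, drop the EAN check digit
    return body + isbn10_check_digit(body)


if __name__ == "__main__":
    print(ean_to_isbn10("9780306406157"))
```

Note that this sketch does not verify the EAN's own check digit; it only recomputes the ISBN one.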
September 29, 2000
Chris Taylor has now also posted code
for reading CueCat data and doing an Amazon search.
He also points out
which is the CueCat-reading part of this.
October 4, 2000
a web-page module which decodes CueCat output and looks up the result on
Amazon. Written in DTML, but he also includes a Python version (derived
from my code and Skip's).
November 6, 2000
John Deters supplies a newer update for shelve.py's Amazon parser. Change
lines 72 and 73 from:
strongex = re.compile('<strong>.*?</strong>', re.DOTALL)
authorex = re.compile('/Author=([^/"]*)')
to:
strongex = re.compile('<b><font face=verdana,arial,helvetica>.*?</font></b>')
authorex = re.compile('&field-author=([^/"]*)')
January 9, 2001
Eric Hellman sends a bunch of links:
will take raw CueCat scans and turn them to ISBNs.
is a server which takes ISBNs and returns book data. (Replace the
########## with an ISBN.)
He also comments:
"the variably placed dashes in the isbn's actually conform to a
standard which at one time would give you the country and publisher,
but has been messed up a while."
March 13, 2001
I finally incorporated all the search-page parser fixes into the scripts.
Amazon, AmazonUK, and Chapters all work correctly as of today.
If you get the current versions of the scripts,
you can ignore all the fixes prior to this date (the ones listed above).
April 13, 2001
Skip the Penguin points out a site about how
to hotwire a CueCat so that it produces raw barcode data.
However, in this mode, it does not generate a space between
the main barcode number and the supplementary barcode number.
(Some other barcode readers behave this way; mine is adjustable.)
If you have a non-space-generating scanner, patch the upcfind.py and
makeisbn.py scripts with the following patches:
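As a rough illustration of what handling a non-space-generating scanner involves (this is a sketch, not the patches themselves), the Python below splits a run-together scan into its main and supplementary parts. The function name is hypothetical. It assumes the main number is a 12-digit UPC or a 13-digit EAN and that the supplement, if present, is 2 or 5 digits, which makes every total length unambiguous.

```python
from typing import Tuple


def split_scan(raw: str) -> Tuple[str, str]:
    """Split scanner output into (main, supplement). Hypothetical helper.

    Accepts both "main supplement" (space-separated) and the run-together
    form produced by scanners that emit no space.
    """
    text = raw.strip()
    if " " in text:
        # The scanner already separated the two numbers.
        main, _, supplement = text.partition(" ")
        return main, supplement.strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError("expected digits only")
    # Try a 13-digit EAN first, then a 12-digit UPC; the remainder must be
    # empty or a 2- or 5-digit supplement.
    for main_length in (13, 12):
        if len(text) - main_length in (0, 2, 5):
            return text[:main_length], text[main_length:]
    raise ValueError("unrecognized barcode length: %d" % len(text))


if __name__ == "__main__":
    print(split_scan("978030640615790000"))
```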
August 15, 2001
Neil K. Guy points out
which claims to have a good database of UPC barcodes. I haven't tried
them, but they might be good for other things besides books.
March 29, 2002
Virgil has emailed me patches which get Amazon and Amazon.uk working
as of the present time. I haven't tested them, but
here they are.
August 19, 2002
Nels Satterlund provides a modification for makeisbn.py which accepts
checksumless ISBNs -- a nine-digit string followed by a question mark.
The code inserts the correct checksum digit in the ISBN output. Here is
the code -- replace the existing
function in makeisbn.py.
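To show the idea in isolation, here is a minimal Python sketch. It is not Nels's code and not a drop-in replacement for anything in makeisbn.py; the function name is invented for illustration. It treats a ten-character input ending in "?" as nine ISBN digits with the checksum left blank, and fills in the standard ISBN-10 check character.

```python
def complete_isbn(text: str) -> str:
    """Fill in the check character of a 'nine digits plus ?' ISBN. Hypothetical helper.

    Any other input is returned unchanged (after stripping whitespace).
    """
    text = text.strip()
    if len(text) != 10 or not text.endswith("?"):
        return text
    body = text[:9]
    if not (body.isascii() and body.isdigit()):
        raise ValueError("expected nine digits before the question mark")
    # Standard ISBN-10 rule: weight the nine digits 10 down to 2, then pick
    # the check value that makes the weighted total divisible by 11.
    total = sum((10 - position) * int(digit) for position, digit in enumerate(body))
    check = (11 - total % 11) % 11
    return body + ("X" if check == 10 else str(check))


if __name__ == "__main__":
    print(complete_isbn("030640615?"))
```

A check value of 10 is written as "X", which is why the result is a string rather than a number.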
Last updated August 19, 2002.
The Main Story