Crowdsource, Please
by wjw on May 15, 2011
Like every other midlist writer on the planet, I’m striving to get my out-of-print books and stories online so that (a) you can enjoy them, and (b) I can make a few bucks.
To this end, I embarked upon a Cunning Plan. I discovered that my work had been pirated, and was available for free on BitTorrent sites located in the many outlaw server dens of former Marxist countries. So I downloaded my own work from thence with the intention of saving the work of scanning my books: I figured I’d let the pirates do the work, and steal from them. While this seemed karmically sound, there proved to be a couple of problems.
First, the scans were truly dreadful and full of errors. (Even if you’re desperate for my work, I can’t really recommend them.) A lot of time has been spent copy-editing, both by me and by Kathy, which isn’t really so bad, because this would have to be done anyway.
But second, apparently a few of my books were so obscure that they flew under the radar of even the pirates! You can’t imagine how astounded I was when I discovered this.
I could really use some decent scans of some of my books, and I figure some among you must have better scanners and OCR than the piece of crap that’s currently sitting on my shelf.
So I’m willing to trade. Should any of you volunteer to provide scans of Days of Atonement, Angel Station, and Knight Moves, that lucky individual will get a signed, personalized copy of the WJW book of his or her choice (assuming I actually have a copy, of course). Plus, whatever book you scan will spend digital eternity with your name in it, along with my eternal thanks. Sound good?
Crowdsourcing. It’s so 21st Century! You want to do this, right?
Let’s talk.
I’ve been turning public domain books into free ebooks for eight years (at Distributed Proofreaders). I’ve proofread about 70,000 pages for them. I have a scanner, the latest copy of ABBYY FineReader, and great knowledge of common OCR errors, which we term scannos (clown for down, arid for and, that sort of thing). I believe that you, as blog owner, have access to my email address. Feel free to contact me. You’ll have to send me copies of the books you want scanned, as I don’t have paper copies. Old, battered copies are fine, as long as the print is clear. It might be best if I were to send you XHTML files, which you could subject to a second proofreading pass and then convert to the desired formats with Calibre (great free software). Just one proofing isn’t enough to catch all the errors, though I could commit to catching 95% of them.
Scanning the books is easy. Correcting the errors is the tricky part — which is why pirates skip it. The proofing is going to take more time than you expect, even though I’m dang fast at it.
And it’s too late for your first round, but I have a set of sed scripts that automatically corrects the most common OCR errors; wrote them for another author friend who was doing something similar.
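For anyone curious what that kind of automated cleanup might look like, here is a minimal Python sketch. It is not the commenter's sed scripts, which were not posted; the rule list and the names `FIXES`, `SUSPECTS`, `fix_scannos`, and `flag_suspects` are all hypothetical. It applies a few substitutions that are nearly always safe, and only flags scannos that are real words (like "clown" or "arid"), since those need a human to decide.

```python
import re

# Hypothetical rules for illustration; NOT the sed scripts mentioned above.
FIXES = [
    (re.compile(r"\btbe\b"), "the"),
    (re.compile(r"\bTbe\b"), "The"),
    # A digit zero sandwiched between letters is almost always a letter "o".
    (re.compile(r"(?<=[A-Za-z])0(?=[A-Za-z])"), "o"),
    # Stray spaces or tabs before closing punctuation.
    (re.compile(r"[ \t]+([,.;:!?])"), r"\1"),
]

# Real words that often stand in for another word. These are flagged,
# not replaced, because either reading could be correct.
SUSPECTS = {"clown": "down", "arid": "and"}


def fix_scannos(text):
    """Apply the unambiguous substitutions in order."""
    for pattern, replacement in FIXES:
        text = pattern.sub(replacement, text)
    return text


def flag_suspects(text):
    """Return (offset, word, likely_intended) for each suspicious word."""
    return [
        (m.start(), m.group(0), SUSPECTS[m.group(0).lower()])
        for m in re.finditer(r"[A-Za-z]+", text)
        if m.group(0).lower() in SUSPECTS
    ]
```

The split between automatic fixes and flagged words is a design choice: a rule that turned every "arid" into "and" would corrupt any book that mentions a desert.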
I have copies of both Days of Atonement and Angel Station if those would be helpful.
A quick search shows that Knight Moves does have a scan floating around.
I don’t have the digitized books to offer (or copies to scan) but I do have a tip on a great scanner for books– the Plustek Opticbook 3600. Costs (a lot) more than the 30 buck printer/scanner combos you can buy at Wal-Mart, but worth it if you have a lot of books you want to digitize non-destructively.
http://plustek.com/usa/products/opticbook-series/opticbook-3600-plus/introduction.html
I can scan in Knight Moves, if you still need that one.
i’ve a decent copy of knight moves and a scanner. tell me where to send jpgs…
Count me in for proofreading. I do a lot of it and I’m always up for reading your work. And my sister has a document feed scanner (she’s a CPA) that I can borrow. I’ve got both Days of Atonement and Angel Station in paperback, but finding them in the boxes is always a pain. I might have Knight Moves, too; not sure.
I have an actual paperback copy of Days of Atonement somewhere on my shelves. I’d be willing to send it to you free of charge if you’d like to have an original to scan yourself. I entered my email as prompted, feel free to use it to contact me.
I have a copy of Knight Moves. It took a bit of searching to find it. How soon do you need the scans? I’m not going to proof the text though. That’s what http://www.pgdp.net is for. I found out you were searching from Boing Boing.
Greg Weeks
Amazon has KNIGHT MOVES in pb for as low as a penny plus shipping – I’d be happy to get a copy and have it sent if that would be helpful.
I also have a copy of Days of Atonement and Angel Station. Not a lot of free time to scan, however.
I’ve found a torrent containing Knight Moves if you are interested in a pirate copy of it. It is a collection called “Fantasy & Science Fiction Authors UVWYZ – PDF” mentioning the file “Walter Jon Williams – Knight Moves.pdf” in the details. I can download a copy and email if you like, it’s only 136.05 MB for the whole thing so even if it’s a single archive it shouldn’t take too long.
I’ve got copies of both the UK and USA first editions of Angel Station.
What? No, I was just boasting. It’s such a great book. Why hasn’t anyone turned it into a movie?
What file format and resolution would you like?
Scan, convert images to pdf, run Acrobat Pro OCR to put text “underneath” the image?
I have Days of Atonement and Angel Station in hardcover.
(Signed, by the way)
Did you try the bowels of IRC?
I found Knight Moves on one server and forwarded you a copy.
I’ve got a copy of Knight Moves. No scanner though.
I, too, have been a volunteer at Distributed Proofreaders, and I have one of the better book scanners. I have all three books in paperback, which is harder to scan well due to the narrower margins. For this project, I think I could get two of them in hardcover through my library (one ILL, one local) for better quality scans. I’m not a collector of author-signed books; would it be possible to get ebooks rather than paper?
I’m assuming you won’t want to approve this one. Just out of curiosity, I scanned and OCR’d the first two chapters of Knight Moves, which is the one that I don’t think I can get in hardcover from the library. I had to rescan a few pages because part of the page was too close to the edge, and I had to adjust a couple of text area boundaries in FineReader 10. I didn’t actually proofread it, but from just eyeballing the results it looks to me like the OCR is pretty clean. The biggest problem with it is that it’s creating paragraph breaks at some page boundaries. They’re easy to spot because they are paragraph breaks that are not indented.
I’ve only put up the FineReader HTML output of the two chapters I’ve scanned here:
http://www.zuhause.org/knight_moves_ch1-2.htm
If you want me to continue on this, I expect that you’ll want the scanned images as well, and maybe some other output formats from FineReader. It can write Word 2007, RTF, OpenOffice formats. The raw images are about 1MB each. When I create images for proofreading at PGDP, I usually postprocess them down to about 100k, which are not really suitable for OCR, but are usually clean enough for proofreading.
Oh, and I see that it had some issues with punctuation and italics, like on page 9, it didn’t OCR a ? and ! correctly.
And a couple of other pages, 21 and 23, aren’t indenting the beginnings of the paragraphs.
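The "unindented paragraph break" observation above suggests a simple repair pass, sketched here in Python. This is only an illustration under assumptions: it expects the OCR output as a list of paragraph strings with leading whitespace preserved (FineReader's HTML would need converting first), and the function name `merge_unindented_breaks` is made up. It would also wrongly merge genuine paragraphs on pages like 21 and 23 where the indent was lost, so the result still needs a human look.

```python
def merge_unindented_breaks(paragraphs):
    """Join a paragraph onto the previous one when it lacks an indent.

    Assumes real paragraphs start with whitespace, so a paragraph that
    starts flush left is treated as a spurious break at a page boundary.
    """
    merged = []
    for para in paragraphs:
        if merged and para and not para[0].isspace():
            # No indent: treat as a continuation of the previous paragraph.
            merged[-1] = merged[-1].rstrip() + " " + para
        else:
            merged.append(para)
    return merged
```

For example, a paragraph ending mid-sentence at the bottom of a page followed by a flush-left fragment would come back as one paragraph, while properly indented paragraphs pass through unchanged.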
I managed to score a real-life copy of the elusive Solip:System last year, so I’ll give it a shot.
I only own paperbacks of ‘Hardwired’ and ‘Voice of the Whirlwind’, but I’ll ask around the secondhand bookshops of London to try and buy the missing ones for you.
Andrew Robinson, former party leader, Pirate Party UK.
I’ve sourced a copy of Knight Moves that can be scanned, so we’ve got all 3 between us. What’s the best scan format for you guys, or should I get it mailed to someone with a better scanner than my 600dpi flatbed?
Can the proofreading process be easily broken down into manageable chunks for crowdsourcing? If so, we can put the word out and probably get a few thousand eyes on it fairly quickly.
I have an old paperback version of ‘Days of Atonement’ that I can separate from the binding and run through a copy machine (saves to PDF) and then OCR. Do you want the original PDF scans and the text file?
Let the crowd proof it for you 🙂
kto
Either “from there” or “thence” not “from thence.” The former is standard, the middle is archaic, and the latter is poseurishly snooty.
For the Old Earth Books reprint of Clifford Simak’s novel “Way Station” we also used a pirated online copy. Obviously it was proofed, but still …
I have found Knight Moves. The quality is good.
Send me an email and I will send you the PDF. My address is violetagris(at)ymail.com.
Hi Mr. Williams,
You might be interested in the open source book scanner project. A colleague of mine built one for a local library, to digitize out-of-copyright works. It’s surprisingly effective, won’t destroy the original book, and can be built in a weekend by a geek of moderate skill.
http://diybookscanner.org/forum/viewtopic.php?f=1&t=262
http://diybookscanner.org/forum/viewtopic.php?f=3&t=302&start=0
I have just mailed you a scanned copy of Knight Moves.
Hi,
If you can’t find them, I have no problem in retyping… for free.
I have copies of all 3 books (paperback) and would be honored to help the project.
I have Knight Moves here. I will email it to you momentarily for your perusal.
I’m game– I scan stuff for Distributed Proofreaders (Hi, Zora!), and have hard copies of Knight Moves (Tor) and Angel Station (Orbit, UK). And a scanner, ABBYY FineReader, and copies of DP’s pre-proofing and post-proofing processing software. I can scan Angel Station there if there are no other electronic copies sitting on someone’s hard drive already….
I have Knight Moves. But scanning a whole book (without destroying it) is a lot of work for an autograph! Would you consider subdividing the task, putting up a doc where people could “claim” and scan, OCR, and proofread specific pages? I’d be happy to do a few.
I’d also be willing to accept reduced compensation—say your initials on a blank page.
Dammit, and I just finished my own edit of “City On Fire” (finished “Metropolitan” about a week ago.) I think the biggest pain in the ass was fixing all the quotation marks–for some reason, whoever scanned it turned all the double-quotes into singles. ‘The result was that all the dialogue looked like this’, he said, ‘and it’s necessary to go through every sentence.’
PS: Jesus, Walter, you sure do love hyphens! If I never have to paste — again it’ll be too soon!
If you didn’t mind the copy being torn apart, a clean scan would be pretty easy and I have a pretty good way of pulling the text from some kinds of scanned documents. On the other hand, leaving the book intact makes it re-usable but scans aren’t likely to be as clear. Either way, I don’t have any copies of the books and would need to obtain one to scan it.
Oh, and: “Constan-tine” “Con-stantine” It was funny to see all the places where the scan hadn’t realized that something was a hyphenation.
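Leftover line-break hyphens like these can be hunted down semi-automatically. The Python sketch below is one hypothetical approach, not anything used on the actual books: it reports a hyphenated token as suspect when the joined form also appears unhyphenated somewhere in the same text. The function name `find_false_hyphens` is invented for the example.

```python
import re
from collections import Counter


def find_false_hyphens(text):
    """Map each suspect hyphenated token to its likely joined form.

    A token such as "Constan-tine" is suspect if "Constantine" occurs
    elsewhere in the text as a whole word (case-insensitive).
    """
    # Count every run of letters; hyphenated tokens count as two words here,
    # so only genuinely unhyphenated occurrences vouch for the joined form.
    words = Counter(w.lower() for w in re.findall(r"[A-Za-z]+", text))
    suspects = {}
    for match in re.finditer(r"\b([A-Za-z]+)-([A-Za-z]+)\b", text):
        joined = match.group(1) + match.group(2)
        if words[joined.lower()] > 0:
            suspects[match.group(0)] = joined
    return suspects
```

Legitimate compounds such as "well-lit" are left alone because "welllit" never appears on its own, though a proofreader should still confirm each suggestion before replacing anything.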
I’ll clarify: I have hardcopies of those two books which I’d offer up for scanning. I’d offer to scan’em, but my flatbed scanner gave up the ghost some time back.
Of course, you didn’t steal from “the pirates”, and they didn’t steal from you. You may already be aware of that… But I think the confusion between copying and stealing is a bad thing.
Oooh, does this mean I’ll be able to buy an e-copy of Solip:System? Splendid!
Yeah I have Angel Station in pb and have a scanner. Guess he already has the other 3 books by him that I have scanned.
My local library have Angel Station and Knight Moves. Ping me via email if you’d like me to scan them.
No signal. Sorry. Just wanted to say that, after reading the comments to this post, I have to say, you have some very cool fans.
And kudos to you for making a minus into a plus. Congratulations, and good luck with the eBook ‘re-release’ plan. I truly believe it’s time for writers to take back control of their back catalogues, and yours seems like a very intelligent method of implementing that.
I’d be happy to scan any of those three. I have a nice, big, ex-library edition of Days of Atonement and would love a signed WJW edition (I like Angel Station a lot).
Let me know if I can be of assistance.
You must not be looking very hard. I found Knight Moves in under 2 minutes. Looks pretty clean too.
—
I have a little story that I made up, and I’ll tell it to you if you don’t read that much into it. It’s called the Tale of the Pythian Kassandra, and it’s about a priestess of the Delphic Apollo. I like to think of her as a sturdy, big-hipped woman with a straight Grecian nose and a slight mustache, not very bright, good-hearted, a little vague, and new to the job—the priests never chose the Pythia for her brains, you see; they didn’t want the oracle challenging their power. The oracle was open for business only one month out of the year, so we’ll have to picture this story as taking place toward the end of Kassandra’s busy time; the pilgrims have been winding their way up to the temple for weeks now, and Kassandra’s been breathing the inspiring vapors so often she’s half-addled.
Wow, the one day I get boing’d, I’m traveling and have no Internet! This is posted off a borrowed computer in a hotel lobby.
Okay, I’ve now got multiple copies of Knight Moves, so thanks everybody.
I’ll have to get back to y’all later on the other books, after I get home and have time to organize my response.
Might be a good idea to organize ourselves and spread this out. There are two books left to do. Contribute two old, sacrificial books. Cut off the spine with a guillotine and send the pages through the offered document feed scanner. 300 dpi, save as .pngs. One or two people to do the OCR. Two people to proofread each book: two passes. Let WJW do the collecting of email addresses and organizing of the group. The more people involved, the faster we’ll get this done.
I should perhaps mention that I have a new, fast computer and the latest Abbyy. I could do the OCR in under half an hour. But if someone else has the same setup and wants to do it, fine.
If Greg Weeks wants to give the files a final once-over and validate the XHTML, do accept his help. He’s the guy who has organized a lot of the sf at Distributed Proofreaders. You have him to thank for the etexts of H. Beam Piper and Astounding Stories, among other things.
Oh, and since I forgot to say: I plan to purchase these as soon as they’re made available. I’ll only steal it if I can’t get it (…did that make sense?)
I’ve been producing out of print and hard to find ebooks for years and years – up into the 4 digits long since. The thing most people really don’t realize is that proofreading and formatting are the key to a decent ebook… there IS no way to produce a “clean scan” with only a scanner and ABBYY FineReader. Come on, guys – it’s not brain surgery! Even publishing companies hire proofreaders, or used to, though sometimes these days I wonder.
(Just could not resist putting in my two cents worth, having spent way more energy than I wanted to on trying to get groups of volunteer proofers to do a decent job, before I gave up and started working alone.)
I saw a copy of the original post on a private email list, and I have to say it is the most entertaining of its type I’ve seen. In fact, I don’t think I’ve enjoyed an author letter so much in decades. WJW clearly deserves good clean files of all his books and many sales.