Tom Swift Jr. E-books

My Kindle comes with me on work trips, and I wanted to include the Tom Swift Jr. books in my portable library. When I'm traveling, or any time there isn't enough peace and quiet to read articles that demand close attention, these books are a fun diversion. Here is the method I use:
  1. Photograph the pages.
  2. OCR the text using Tesseract.
  3. Typeset the OCR'd text in LaTeX.
  4. Manually compare the paper book to the OCR text to add italics, embedded letters, and anything else that isn't a plain text paragraph.
  5. Convert to HTML using latex2html.
  6. Use calibre to convert the HTML to epub and mobi formats.
  7. Scan the illustrations at high resolution (they're easier to clean that way).
  8. Clean them up in GIMP.
  9. Insert the cleaned illustrations into the LaTeX document.
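For anyone who wants to script the mechanical parts of the steps above, here is a minimal sketch in Python. It is not my actual setup: the file names, the use of pdflatex, the latex2html `-split 0` option, and the name of the HTML page latex2html produces are all assumptions. The text cleanup is a simple heuristic (blank lines separate paragraphs, line-end hyphens get rejoined), and the manual proofreading in step 4 still has to happen by hand.

```python
from __future__ import annotations

import re
import subprocess
from pathlib import Path


def tesseract_cmd(image: Path) -> list[str]:
    """Step 2: `tesseract IMAGE OUTBASE` writes OCR text to OUTBASE.txt."""
    return ["tesseract", str(image), str(image.with_suffix(""))]


def ocr_text_to_latex(text: str) -> str:
    """Step 3 helper: turn raw OCR output into LaTeX paragraphs.

    Assumes blank lines separate paragraphs. Heuristic only: it may also
    merge genuine hyphenated compounds that happen to break at a line end.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Rejoin words split across lines, e.g. "adven-\nture" -> "adventure".
        block = re.sub(r"-[ \t]*\n[ \t]*(?=[a-z])", "", block)
        # Unwrap the remaining hard line breaks into a single line.
        block = " ".join(line.strip() for line in block.splitlines())
        # Escape the simple LaTeX special characters (not \, ~ or ^).
        block = re.sub(r"([&%$#_{}])", r"\\\1", block)
        paragraphs.append(block)
    return "\n\n".join(paragraphs) + "\n"


def build_commands(tex_file: Path, html_file: Path) -> list[list[str]]:
    """The pdf build plus steps 5-6, as command lines (not executed here).

    html_file is whichever page latex2html produced; its name is an assumption.
    """
    return [
        ["pdflatex", str(tex_file)],
        ["latex2html", "-split", "0", str(tex_file)],
        ["ebook-convert", str(html_file), str(tex_file.with_suffix(".epub"))],
        ["ebook-convert", str(html_file), str(tex_file.with_suffix(".mobi"))],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A typical use would be calling `run_all([tesseract_cmd(img)])` for each photo in a (hypothetical) `pages/` folder, pasting the `ocr_text_to_latex` output into the book's `.tex` file for proofreading, and then calling `run_all(build_commands(...))` once the LaTeX source is finished. Keeping the commands as plain lists makes it easy to print them first and check them before running anything.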

This process yields plain text, pdf, epub, and mobi versions. The pdf comes in two forms, illustrated and unillustrated; to save space, the e-book versions have no illustrations. After all that work, I now greatly enjoy having electronic versions of the 33 paper books on my shelf!