Tom Swift Jr. E-books
My Kindle comes with me on work trips, and I wanted to include the Tom
Swift Jr. books in my portable library. When traveling, or really anytime
there isn't enough peace and quiet to read articles requiring great
attention, these books are a fun diversion. Here is the method I use:
- Photograph the pages.
- OCR the text using tesseract.
- Typeset the OCR'd text using LaTeX.
- Manually compare the paper book to the OCR text to restore italics,
embedded letters, and anything else other than plain text paragraphs.
- Convert to HTML using latex2html.
- Use calibre to convert html to epub and mobi formats.
- Scan illustrations at high resolution (easier to clean).
- Use gimp to clean.
- Use LaTeX to insert into document.
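
The text steps above could be scripted roughly as follows. This is only a sketch, not the exact commands I run: the function names and the file names in the examples are made up for illustration, and the only assumptions are that tesseract, latex2html, and calibre's ebook-convert are installed and on the PATH. The functions build the command lines without running them, and escape_latex handles the characters that would otherwise upset LaTeX when raw OCR output is pasted into a document.

```python
from __future__ import annotations

from pathlib import Path

# Characters LaTeX treats specially, mapped to their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def text_to_latex(ocr_text: str) -> str:
    """Turn OCR output into LaTeX paragraphs separated by blank lines."""
    paragraphs = [p.strip() for p in ocr_text.split("\n\n") if p.strip()]
    # Rejoin lines within a paragraph, then escape the result.
    return "\n\n".join(escape_latex(" ".join(p.split())) for p in paragraphs)


def ocr_command(image: Path) -> list[str]:
    """tesseract IMAGE OUTPUTBASE writes the text to OUTPUTBASE.txt."""
    return ["tesseract", str(image), str(image.with_suffix(""))]


def html_command(tex: Path) -> list[str]:
    """latex2html takes the LaTeX source file as its argument."""
    return ["latex2html", str(tex)]


def ebook_command(html: Path, fmt: str) -> list[str]:
    """ebook-convert picks the output format from the file extension."""
    return ["ebook-convert", str(html), str(html.with_suffix("." + fmt))]
```

Each returned list can be handed to subprocess.run(cmd, check=True) to actually run the step, for example once per page photograph for ocr_command and once each with "epub" and "mobi" for ebook_command.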
Doing the above yields plain text, pdf, epub, and mobi formats. I have
two pdf versions, illustrated and unillustrated. To save space, I leave
the illustrations out of the e-book versions. After all that work I now
greatly enjoy having electronic versions of the 33 paper books on my shelf!