EmailDiscussions.com  


The Off-Topic Lounge APPROPRIATE FAMILY-FRIENDLY TOPICS ONLY - READ THE RULES!
This forum is for posting anything (excluding topics prohibited by the forum rules) that's unrelated to email. General discussions, in other words.

6 Jul 2018, 12:54 AM   #1
communicant
Cornerstone of the Community
 
Join Date: Jul 2009
Posts: 837
Seeking advice about scanning paper to editable text

This is completely unrelated to email, and I am aware there are tech-advice forums where it might possibly be more appropriate to direct a query such as this, but there are also several very tech-savvy and helpful members here, and I hope that one of them may be kind enough to offer advice, or at least point me in the right direction.

For a time, back in the 1980s, almost every room of my house had a 286 running DOS, used exclusively for word-processing. The machines were free, given away by a university that would otherwise have thrown them out, so using so many of them was merely an eccentricity, not an extravagance. It was convenient to be able to take up work in any room where I happened to be at the time, using floppy disks to transfer data between machines. In later years, when things got more complicated in the cyber world, I began using Mac laptops for the modern stuff, but I was actually glad to continue using the old machines for some categories of work because of their complete security for confidential material: no internet connection was even remotely possible, so "air-gapped" is an understatement. I tried to back up material regularly on floppy disks, but inevitably got careless and sometimes failed to back up the latest versions of revised projects.

Anyway, one day nature took its course and the hard drive containing the most recent and crucial material (of course) failed. I had just finished a manuscript of more than nine hundred pages (when printed double-spaced), and I had edited and revised it on paper, but had not yet transferred the emendations to any electronic version. I had also printed an extra copy of the manuscript for a purpose that turned out to be unnecessary, and it sits in a box, clean and unmarked, waiting for me to learn how best to use it to keep this situation from becoming more inconvenient than it already is.

I do have an external USB drive that will read 3.5" floppies, but of course the files arrive on a Mac in a messy form because of the antediluvian word-processing program in which the manuscript was written. In any case, I couldn't be absolutely certain that I was using the very latest floppy versions. Before the crash, I would occasionally transfer material from a floppy to a Mac using this drive, after first converting the file to pure ASCII on a DOS machine. The resulting transfer was relatively clean, although paragraphing was sometimes lost, and diacritical marks in other languages, italics and other typographical features had to be restored manually, but that was not prohibitively difficult for a short document.

Now, however, the only sure way of getting back to where I was when the hard drive crashed, without having to retype almost a thousand pages, would be to scan the clean print-out and then take up work on the result on a Mac, just as I would have resumed work on one of the old DOS machines if the crash had not occurred.

I have an HP all-in-one machine that includes a scanner I have never had occasion to use, and I presume it would be adequate for the task, but I have no idea how to scan text into editable form for use on one of the Macs. Can anyone advise me? Should I perhaps scan each page into a PDF? Would that be editable?

Apologies for my ignorance. And yes, I am painfully aware that I am only facing this difficulty because of my careless back-up practices.

Any help or advice that anyone can furnish would be much appreciated.

Many thanks.

6 Jul 2018, 05:38 AM   #2
Adrian Bell
Cornerstone of the Community
 
Join Date: Apr 2001
Location: Darlington, UK
Posts: 925
If you are going to scan the documents so that they will be editable, you will need OCR (Optical Character Recognition) software. This may or may not be included with the software that came with your scanner. If it isn't, and if the HP is a Windows machine, FreeOCR (http://www.paperfile.net/) works well and is free. Commercial software is expensive.
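For the Mac side, a minimal sketch of the OCR step, assuming Tesseract (a free, open-source OCR engine that is not mentioned in this thread) is installed and on the PATH, and that each page has been scanned to a PNG file with zero-padded names such as page_001.png in a folder called scans (both names hypothetical). It runs one `tesseract IMAGE OUTPUTBASE -l LANG` command per page; Tesseract writes OUTPUTBASE.txt beside each image.

```python
"""Batch-OCR scanned page images with the Tesseract command-line tool.

Assumptions (hypothetical, not from the thread): Tesseract is installed
and on PATH, and pages were scanned to PNG files named page_001.png,
page_002.png, ... so that sorting by name keeps them in page order.
"""
import shutil
import subprocess
from pathlib import Path


def ocr_command(image: Path, lang: str = "eng") -> list[str]:
    # CLI form: tesseract IMAGE OUTPUTBASE -l LANG
    # Tesseract appends ".txt" to OUTPUTBASE when writing plain text.
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_folder(folder: Path, lang: str = "eng") -> list[Path]:
    """OCR every PNG in `folder`; return the text files written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    written = []
    for image in sorted(folder.glob("*.png")):
        # check=True stops the batch on the first page Tesseract rejects
        subprocess.run(ocr_command(image, lang), check=True)
        written.append(image.with_suffix(".txt"))
    return written
```

Calling `ocr_folder(Path("scans"))` would return the list of text files to proof-read. For a manuscript with words in other languages, a combined setting such as `lang="eng+fra"` (provided those language packs are installed) may recover more diacritics, and scanning at about 300 dpi is commonly recommended for OCR.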

PDFs are only partially editable; however, OCR software will usually extract the text from a PDF (I know FreeOCR will). This page https://www.cisdem.com/resource/how-...oogle-ocr.html says Google can do it too, but I've not tried it.

As for the format differences between Macs and PCs, I believe plain text or rich text should be universal (I don't have a Mac, though). The main problem I think is the time it is going to take to scan it all.
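Plain text is only "universal" once its character encoding and line endings match what the Mac expects. A minimal sketch for any plain-text files taken off the old floppies, assuming they were written in DOS code page 437 (an assumption: other DOS setups used cp850 or similar) with CR+LF line endings, and assuming paragraphs are separated by blank lines with hard-wrapped lines inside them. It decodes the bytes, normalizes line endings and rejoins the wrapped lines.

```python
"""Turn a plain-text file from a DOS machine into clean UTF-8 for a Mac.

Assumes the DOS side used code page 437 (the common US default); other
setups used cp850 or similar, so pass a different codepage if accents
come out wrong.
"""
from pathlib import Path


def dos_text_to_unicode(raw: bytes, codepage: str = "cp437") -> str:
    text = raw.decode(codepage)
    # DOS ends lines with CR+LF; some programs also add a Ctrl-Z (0x1A) at EOF
    return text.replace("\r\n", "\n").rstrip("\x1a")


def unwrap_paragraphs(text: str) -> str:
    # Assumes paragraphs are separated by blank lines and that lines inside
    # a paragraph were hard-wrapped; joins those lines with single spaces.
    paragraphs = []
    for block in text.split("\n\n"):
        joined = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if joined:
            paragraphs.append(joined)
    return "\n\n".join(paragraphs)


def convert_file(src: Path, dst: Path, codepage: str = "cp437") -> None:
    text = unwrap_paragraphs(dos_text_to_unicode(src.read_bytes(), codepage))
    dst.write_text(text + "\n", encoding="utf-8")
```

For example, `convert_file(Path("CHAPTER1.TXT"), Path("chapter1.txt"))` (file names hypothetical) would write a UTF-8 copy. Rejoining lines will also flatten any intentional line breaks, such as verse or lists, so `dos_text_to_unicode` alone may be the safer choice for those passages. A file already stripped to pure ASCII will have lost its accents before this step, so decoding cannot bring them back.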

I don't know what this antediluvian program is, but LibreOffice can open and save to quite a lot of formats (see https://en.wikipedia.org/wiki/LibreO...d_file_formats).
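The thread doesn't establish whether LibreOffice can read the old DOS word processor's own files, so this minimal sketch assumes the input is plain text (for instance, pages produced by OCR) and uses LibreOffice's headless command-line conversion to turn each .txt file into a word-processor document. The macOS binary path is the default install location; the folder layout and the "converted" subfolder are hypothetical.

```python
"""Batch-convert plain-text files to a word-processor format with LibreOffice."""
import shutil
import subprocess
from pathlib import Path

# Default binary location in a standard macOS install of LibreOffice;
# on Windows/Linux "soffice" is usually found on PATH instead.
MAC_SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def find_soffice() -> str:
    if Path(MAC_SOFFICE).exists():
        return MAC_SOFFICE
    found = shutil.which("soffice")
    if found is None:
        raise RuntimeError("LibreOffice (soffice) not found")
    return found


def convert_command(src: Path, target: str, outdir: Path, soffice: str) -> list[str]:
    # --headless runs without opening a window; --convert-to takes the
    # target extension (e.g. "odt" or "docx"); --outdir sets the folder.
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", str(outdir), str(src)]


def convert_all(folder: Path, target: str = "odt") -> list[Path]:
    soffice = find_soffice()
    outdir = folder / "converted"
    outdir.mkdir(exist_ok=True)
    results = []
    for src in sorted(folder.glob("*.txt")):
        subprocess.run(convert_command(src, target, outdir, soffice), check=True)
        # LibreOffice names the output after the input, with the new extension
        results.append(outdir / f"{src.stem}.{target}")
    return results
```

Plain text carries no italics or other formatting, so those would still have to be restored by hand after conversion.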

Finally, are you sure that it is the hard drive that crashed and not some other part of the PC? If you have a few PCs, you could try swapping the drive into another one; also, hard-drive-to-USB converters are available if the drives are standard. Example: https://www.amazon.co.uk/AGPtek-Driv...+usb+convertor.

6 Jul 2018, 04:24 PM   #3
janusz
The "e" in e-mail
 
Join Date: Feb 2006
Location: EU
Posts: 4,523
Quote:
Originally Posted by Adrian Bell
The main problem I think is the time it is going to take to scan it all.
... and proof-read it. Depending on the OCR program, the font used in the document and the condition of the paper copy, the digit "1", lower-case "l" and upper-case "I" are bound to be mistaken for one another. The same goes for zero ("0") and upper-case "O".
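Following janusz's point, a rough proof-reading aid: a minimal sketch that flags words mixing letters and digits where one of the commonly confused characters (1, l, I, 0, O) is involved, reporting the line number of each. It is a simple heuristic, not a feature of any particular OCR program; it will miss a lone "l" read as "1" in a pure number and will flag legitimate tokens such as "1980s", so it narrows the search rather than replacing a read-through.

```python
import re

# Characters OCR commonly confuses: digit 1 / lower-case l / upper-case I,
# and digit 0 / upper-case O.
CONFUSABLE = set("1lI0O")
TOKEN = re.compile(r"\S+")


def suspicious_tokens(text: str) -> list[tuple[int, str]]:
    """Return (line_number, word) for words that mix letters and digits
    and contain at least one commonly confused character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in TOKEN.findall(line):
            # Ignore surrounding punctuation such as a sentence-final period
            word = token.strip(".,;:!?\"'()[]")
            has_digit = any(c.isdigit() for c in word)
            has_alpha = any(c.isalpha() for c in word)
            if has_digit and has_alpha and CONFUSABLE & set(word):
                hits.append((lineno, word))
    return hits
```

For example, `suspicious_tokens("The c0mpany\nwas founded in l990.")` returns `[(1, "c0mpany"), (2, "l990")]`.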
All times are GMT +9.

 

Copyright EmailDiscussions.com 1998-2013. All Rights Reserved. Privacy Policy