Google, using optical character-recognition technology,
can now scan PDF files, turn them back into text and make them available as
search results via the 'View as HTML' link. By turning images of text into
text, Google expands its already massive index. Google's approach doesn't
obviate the need to consult the original scan, however, if the file contains
images or diagrams. While Google appears to do a good job of converting text, its
HTML conversions omit graphics. Perhaps in time its engineers will be able to isolate graphic
elements in scanned PDFs and insert them into its HTML conversions. One
unfortunate consequence of making scanned text searchable is that personal information such as Social
Security numbers that might have gone unnoticed in scans of court documents may
now be discoverable through a Google search. Public.Resource.org, a project that
aims to make public government documents publicly accessible, recently found about 1,700
documents with Social Security numbers or alien identification numbers out of a
corpus of 2.5 million court documents that go back decades.
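
To illustrate how identifiers of this kind become easy to find once a scan has been turned into text, here is a minimal Python sketch that flags documents containing SSN-like or alien-number-like strings. It is not the method Public.Resource.org used, which the article does not describe. The two patterns and the function names `find_sensitive_numbers` and `flag_documents` are assumptions made for illustration, and simple patterns like these will produce both false positives and misses on real OCR output.

```python
import re

# Assumed patterns, for illustration only: a hyphenated 3-2-4 digit group for
# Social Security numbers, and "A" plus 8 or 9 digits for alien numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
A_NUMBER_PATTERN = re.compile(r"\bA-?\d{8,9}\b")


def find_sensitive_numbers(text):
    """Return SSN-like and A-number-like strings found in converted text."""
    return {
        "ssn_like": SSN_PATTERN.findall(text),
        "a_number_like": A_NUMBER_PATTERN.findall(text),
    }


def flag_documents(documents):
    """Return the names of documents that contain at least one match.

    `documents` maps a document name to its extracted text.
    """
    flagged = []
    for name, text in documents.items():
        hits = find_sensitive_numbers(text)
        if hits["ssn_like"] or hits["a_number_like"]:
            flagged.append(name)
    return flagged
```

Run over the text of each converted file, `flag_documents` would return the names of files worth a manual review before they are published or indexed.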