Presentation of Scanned Clippings in PDF and DjVu
Clippings Defined
Clippings are newspaper and magazine articles that are cut from the issue in which they were printed and then scanned to a TIFF or JPEG image. TIFF is used for black-and-white scans, and JPEG is used for color scans.
Step 1 - Example Scanned Clippings
Here we present four example newspaper clippings that were scanned in black and white. Because TIFF files cannot be viewed natively on the web, we show them in image-only PDF format and in image-only DjVu format. Each clipping is named with an abbreviated code identifying the publication, followed by the date of publication.
Step 2 - OCRed Clippings
The digital clipping files can be run through an OCR (optical character recognition) process, which places the recognized text behind the image. The image plus the hidden, searchable text can be saved in either PDF or DjVu format. Both formats support a hidden text layer that can be searched and copied.
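To illustrate why a hidden text layer makes an image searchable, here is a minimal conceptual sketch. It is not the internal structure of PDF or DjVu; the `Word` class and `search` function are hypothetical names introduced for illustration. The idea is that each recognized word keeps the rectangle it occupies on the scan, so a text match can be mapped back to a region of the image.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR-recognized word and where it sits on the scanned image."""
    text: str
    box: tuple  # (left, top, right, bottom) in image pixels


def search(words, query):
    """Return, for each match of `query`, the boxes of the matching words.

    Matching is case-insensitive and ignores punctuation attached to a word.
    """
    terms = query.lower().split()
    hits = []
    if not terms:
        return hits
    normalized = [w.text.lower().strip(".,;:!?\"'") for w in words]
    for i in range(len(words) - len(terms) + 1):
        if normalized[i:i + len(terms)] == terms:
            hits.append([w.box for w in words[i:i + len(terms)]])
    return hits


# A made-up headline fragment, standing in for real OCR output.
layer = [
    Word("City", (10, 10, 50, 30)),
    Word("council", (55, 10, 120, 30)),
    Word("votes.", (125, 10, 170, 30)),
]
matches = search(layer, "council votes")
```

A viewer can use the returned rectangles to highlight the match on top of the image, while the text itself stays invisible.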
Step 3 - "Matted" Clippings
The OCRed digital clippings can now be placed on a standard letter-size page. This creates margin areas where additional information can be placed, and it centers the clipping when it is printed on letter-size paper. A bounding rectangle can be drawn around the clipping, as demonstrated. If the clipping is oversized, it can be scaled down to fit the letter-size page.
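The placement arithmetic can be sketched as follows. This is a minimal example under stated assumptions, not the tool used to produce the samples: the function name `mat_clipping`, the `Placement` type, and the 72-point (one inch) margin are all hypothetical. It converts the scan size from pixels to points, shrinks the clipping only if it would overflow the area inside the margins, and centers it on the page.

```python
from dataclasses import dataclass

LETTER_W_PT = 612.0  # 8.5 in at 72 points per inch
LETTER_H_PT = 792.0  # 11 in at 72 points per inch


@dataclass
class Placement:
    x: float       # left edge of the clipping on the page, in points
    y: float       # bottom edge of the clipping on the page, in points
    width: float   # placed width in points
    height: float  # placed height in points
    scale: float   # 1.0 means the clipping is shown at its scanned size


def mat_clipping(width_px, height_px, dpi, margin_pt=72.0):
    """Center a scanned clipping on a letter-size page.

    margin_pt is an assumed margin reserved for header and footer content.
    """
    # Physical size of the scan, expressed in points.
    natural_w = width_px / dpi * 72.0
    natural_h = height_px / dpi * 72.0

    # Area available for the clipping once the margins are reserved.
    avail_w = LETTER_W_PT - 2 * margin_pt
    avail_h = LETTER_H_PT - 2 * margin_pt

    # Never enlarge; shrink only when the clipping is oversized.
    scale = min(1.0, avail_w / natural_w, avail_h / natural_h)
    w = natural_w * scale
    h = natural_h * scale

    return Placement((LETTER_W_PT - w) / 2, (LETTER_H_PT - h) / 2, w, h, scale)


small = mat_clipping(1200, 1800, dpi=300)  # 4 in x 6 in scan, fits as-is
large = mat_clipping(3600, 3000, dpi=300)  # 12 in x 10 in scan, must shrink
```

The rectangle described by `x`, `y`, `width`, and `height` is also the bounding rectangle that can be drawn around the clipping.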
Step 4 - Adding information to the header and footer areas
Now we can put additional information into the header and footer areas of the page. We can place:
A collection title
Document information fields
A collection logo
A legal disclaimer
A watermark
The final result (using a disclaimer and DocInfo fields)
Acrobat 3.0 or above required for viewing
Clipping | PDF size (Kb) | DjVu size (Kb)
1 | 227,707 | 150,942
2 | 83,625 | 50,284
3 | 141,156 | 80,044
4 | 110,140 | 57,154
Acrobat 5.0 or above required for viewing
Clipping | JBIG2-Compressed PDF (cVision) size (Kb) | JBIG2-Compressed PDF (Capture) size (Kb)
1 | 165,060 | 172,072
2 | 56,344 | 64,984
3 | 88,373 | 104,786
4 | 85,087 | 88,317
Size Comparison Analysis
Format | Total size | Relative to PDF
PDF | 562,628 | 100%
JBIG2-PDF (cVision) | 394,864 | 70%
JBIG2-PDF (Capture) | 430,159 | 76%
DjVu | 338,424 | 60%
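The totals and percentages in this comparison can be reproduced from the per-clipping sizes listed earlier. The short sketch below sums the four file sizes for each format and expresses each total as a rounded percentage of the plain PDF total; the `compare` function name is introduced here for illustration only.

```python
# Per-clipping file sizes, in the order the four clippings are listed.
sizes = {
    "PDF": [227_707, 83_625, 141_156, 110_140],
    "JBIG2-PDF (cVision)": [165_060, 56_344, 88_373, 85_087],
    "JBIG2-PDF (Capture)": [172_072, 64_984, 104_786, 88_317],
    "DjVu": [150_942, 50_284, 80_044, 57_154],
}


def compare(sizes, baseline="PDF"):
    """Map each format to (total size, percent of the baseline total)."""
    totals = {name: sum(values) for name, values in sizes.items()}
    base = totals[baseline]
    return {name: (total, round(100 * total / base))
            for name, total in totals.items()}


summary = compare(sizes)
for name, (total, percent) in summary.items():
    print(f"{name:22s} {total:>9,d} {percent:>4d}%")
```

On these four clippings, DjVu comes out at roughly 60% of the plain PDF total, and the two JBIG2-compressed PDF variants fall between the two.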