Latest Digital Document Formats for Scanned Manuscripts
September 21, 2003
Introduction
This study uses a 382-page Latin manuscript that was scanned in color by Rutgers University at a resolution of 300 dpi. The objective is to convert these raster-image scans into a digital document format that can be used effectively on the web.
We converted the scans to both PDF and DjVu formats. For PDF, we used the new JPEG2000 compression from Adobe and from Algo Vision LuraTech. We also created MRC-PDF (Mixed Raster Content PDF) files in two ways: with the new "Adaptive" segmenter in Acrobat 6, and with the LuraDocument.jpm segmenter in the brand-new LuraDocument.jpm PDF Compressor product.
The results (ordered by size from largest to smallest)
Acrobat 6.0 viewer or above required for viewing these PDF files
DjVu Web Browser plug-in required for viewing these DjVu files
| Filename | Compression Type | Size (MB) |
|---|---|---|
| | JPEG | |
| | JPEG2000 - Medium | |
| | JPEG2000 - Minimum | |
| | IW44 | |
| | IW44 | |
| | JPEG2000 (default) | |
| | Adaptive PDF 1.5 (JPEG2000 & JBIG2) | |
| | LuraDocument.jpm (JPEG2000 & G4) | |
| | DjVu Segmented (IW44 & JB2) | |
| | DjVu Segmented (IW44 & JB2) | |
* These formats do not exist yet; they are only proposed.
Comparison of Average Page Sizes by Digital Document Type
| File Type | Average Page Size (KB) |
|---|---|
| PDF with JPEG page images | 474 |
| PDF with Acrobat JPEG2000-medium | 400 |
| PDF with Acrobat JPEG2000-minimum | 353 |
| DjVu Photo | 261 |
| PDF with LuraTech JPEG2000 | 223 |
| PDF segmented with Acrobat Adaptive | 141 |
| PDF segmented with LuraDocument.jpm | 75 |
| DjVu Segmented | 44 |
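To make the relative savings easier to read, the short Python sketch below derives two figures from the average page sizes above: an estimated whole-document size for the 382-page manuscript, and each format's size reduction relative to the plain JPEG PDF. The function name `summarize` and the 1 MB = 1024 KB convention are assumptions made for illustration, and the totals are estimates computed from the averages, not the measured file sizes.

```python
# Average page sizes in KB, copied from the comparison table above.
AVG_PAGE_KB = {
    "PDF with JPEG page images": 474,
    "PDF with Acrobat JPEG2000-medium": 400,
    "PDF with Acrobat JPEG2000-minimum": 353,
    "DjVu Photo": 261,
    "PDF with LuraTech JPEG2000": 223,
    "PDF segmented with Acrobat Adaptive": 141,
    "PDF segmented with LuraDocument.jpm": 75,
    "DjVu Segmented": 44,
}

PAGES = 382  # page count stated in the introduction
BASELINE = "PDF with JPEG page images"


def summarize(avg_page_kb, pages=PAGES, baseline=BASELINE):
    """Return (name, KB per page, estimated total MB, reduction factor vs baseline)."""
    base_kb = avg_page_kb[baseline]
    rows = []
    for name, kb in avg_page_kb.items():
        est_total_mb = kb * pages / 1024  # assumption: 1 MB = 1024 KB
        reduction = base_kb / kb          # how many times smaller than the baseline
        rows.append((name, kb, round(est_total_mb, 1), round(reduction, 1)))
    return rows


if __name__ == "__main__":
    for name, kb, total_mb, reduction in summarize(AVG_PAGE_KB):
        print(f"{name:<38} {kb:>4} KB/page  ~{total_mb:>6.1f} MB  {reduction:>5.1f}x vs JPEG")
```

By this estimate, the segmented DjVu file comes to roughly a tenth of the size of the JPEG-based PDF for the same pages.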