|
JRA Software, Publisher
PDFMetamaker 3.0 Upgrade Specification
Overview:
PDFMetamaker 3.0 is a major upgrade to the PDFMetamaker 2.0 product. PDFMetamaker 2.0 is a web server application for metadata entry into PDF files.
PDFMetamaker 3.0 is now both a web server application and a desktop application for the administration and management of metadata schemas and data. This upgrade adds robust metadata management features to the existing metadata entry features of the product.
New Features:
Desktop Application for Metadata Management
Complementing the web-server data entry features of PDFMetamaker is a new desktop application. This application is built with the Qt GUI development SDK and is designed for multi-language support. The initial supported languages are English and Spanish. The application provides an easy-to-understand interface for managing metadata.
Metadata Schema Definition
This upgrade supports metadata schema definition. A field dictionary allows you to define the core attributes of fields, which can then be combined in metadata schemas (sets of fields for specific document collections).
Metadata schemas are stored as XMP-compliant XML files. XMP is the new storage architecture for metadata that Adobe promotes for PDF files. It became available with the upgrade of the PDF format to Version 1.4 two years ago, when Acrobat 5 was introduced.
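The relationship between the field dictionary and a schema can be sketched as follows. This is an illustration only, not the product's data model: the names FieldDef, FIELD_DICTIONARY, build_schema and missing_required, and the sample fields, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Core attributes of one field (hypothetical attribute set)."""
    name: str
    data_type: str = "text"
    required: bool = False


# Hypothetical field dictionary: every field is defined once, by name.
FIELD_DICTIONARY = {
    f.name: f
    for f in (
        FieldDef("Title", required=True),
        FieldDef("Author"),
        FieldDef("DocumentDate", data_type="date"),
        FieldDef("CaseNumber", required=True),
    )
}


def build_schema(field_names):
    """Combine dictionary fields into a schema for one document collection.

    Raises KeyError if a name is not in the field dictionary.
    """
    return [FIELD_DICTIONARY[name] for name in field_names]


def missing_required(schema, record):
    """Return the names of required schema fields that the record leaves empty."""
    return [f.name for f in schema if f.required and not record.get(f.name)]
```

For example, a schema built from "Title", "CaseNumber" and "Author" would flag a record that supplies only a title as missing "CaseNumber".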
Metadata Schema Extensions for SearchPDF
A schema extension defines how metadata fields are used on the Search and Search Results pages of SearchPDF. This will largely automate the creation of custom SearchPDF interfaces. It will also allow the metadata fields shown in the interface to change dynamically, based on the document collection that a user has selected for searching.
Full Support for XML Packet data in PDF files, with XMP Standards Compliance
The management of metadata is facilitated with new features that permit embedded PDF metadata to be accurately duplicated in both the DocInfo dictionary of the PDF and in the XML Packet Data area of the PDF file. This dual-storage approach is recommended by Adobe.
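A minimal sketch of checking that the two storage locations agree is shown below. The key mapping follows the standard DocInfo-to-XMP correspondence from the XMP specification. The function name find_mismatches is hypothetical, and both stores are simplified to flat dictionaries of strings. In a real file, dc:title and dc:creator are structured values, and the two stores use different date formats.

```python
# Standard correspondence between DocInfo keys and XMP properties.
DOCINFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}


def find_mismatches(docinfo, xmp):
    """Return {docinfo_key: (docinfo_value, xmp_value)} for entries that differ.

    Assumption: both inputs are flat dicts of plain strings, which is a
    simplification of how the values are stored in a PDF file.
    """
    mismatches = {}
    for info_key, xmp_key in DOCINFO_TO_XMP.items():
        info_val = docinfo.get(info_key)
        xmp_val = xmp.get(xmp_key)
        if info_val != xmp_val:
            mismatches[info_key] = (info_val, xmp_val)
    return mismatches
```

An empty result means the two copies are duplicates of each other for the mapped keys.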
Embedded document metadata can be synchronized between a document collection and an external database file. The administrative application of PDFMetamaker 3.0 can be used to perform batch updates to the metadata fields in a document collection, and to add and remove fields for the entire collection. Document metadata may also be exported to a spreadsheet for editing.
Folder-level to Document-Level file splitting
Large, folder-level PDF files can be split into multiple smaller, document-level PDF files, with metadata assigned to each individual document.
Conversion management and controls
Database reports will give you an overview of your conversion or metadata assignment progress. You can also monitor the performance of data entry personnel.
Both of these new features enable a scanned document collection to be published on the web immediately, with a minimum of labor, and then progressively refined later on.
Metadata entry enables structured queries and superior document descriptions, but it is a manual process and much slower than physical document scanning. PDFMetamaker lets you add metadata AFTER the document collection is already hosted on the web, with no time or productivity pressure.
User Logon
Only authorized users can use PDFMetamaker. A user table is maintained. Users will log on, and a user log will then be maintained.
"Get Files" Function
"Get Files" is a button that will load a batch of PDF files to be processed in PDFMetamaker. It can be configured either to "get a set number of files from an input folder" or to "get a set number (like 1) of subfolders from the input folder".
Files or folders that are "checked out" by an operator will be reserved and unavailable to other operators. Files or folders that are "checked out" are copied to a temporary folder for processing as a batch.
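The check-out behavior for the "set number of files" mode can be sketched as follows. This is one possible approach under stated assumptions, not the product's mechanism: the function name get_files, the ".lock" marker files used to reserve a PDF, and the folder layout are all hypothetical.

```python
import os
import shutil
from pathlib import Path


def get_files(input_dir, batch_dir, operator, count):
    """Reserve up to `count` PDF files for `operator` and copy them to `batch_dir`.

    Assumption: a file is reserved by creating '<name>.lock' next to it.
    O_EXCL makes the creation fail if another operator already holds the file.
    """
    input_dir, batch_dir = Path(input_dir), Path(batch_dir)
    batch_dir.mkdir(parents=True, exist_ok=True)
    checked_out = []
    for pdf in sorted(input_dir.glob("*.pdf")):
        if len(checked_out) >= count:
            break
        lock = pdf.with_name(pdf.name + ".lock")
        try:
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # already checked out by another operator
        with os.fdopen(fd, "w") as marker:
            marker.write(operator)  # record who holds the reservation
        shutil.copy2(pdf, batch_dir / pdf.name)
        checked_out.append(pdf.name)
    return checked_out
```

A second operator calling the same function skips every file that already has a marker, so two batches never contain the same file.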
"Single Pages" Function
A new option will split all PDF files that are "checked out" from multi-page PDF files to single-page PDF files. This is useful and needed when folder-level PDF files are to be split into document-level PDF files.
New PDF file editing functionality
The single-page PDF files will be navigated using the document navigation controls already present in PDFMetamaker. The "Save and Next", "Save" and "Undo" buttons will not be modifying the metadata stored in the PDF file, as is done in version 2.0, but rather they will be modifying the metadata stored in a database file for the batch.
The operator will navigate past single-page PDF files that are not the first page of a document, until the first page of a document is displayed. The operator will then enter the metadata for this document. The entered metadata will be stored in the JRAMetamaker database.
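Storing entries in a batch database rather than in the PDF file might look like the sketch below. The actual database format is not specified here. The use of SQLite, the table layout, and the function names open_batch_db, save_metadata and first_pages are assumptions made for illustration.

```python
import sqlite3


def open_batch_db(path=":memory:"):
    """Open (or create) a hypothetical batch database with one metadata table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS page_metadata (
               page_file TEXT NOT NULL,
               field     TEXT NOT NULL,
               value     TEXT,
               PRIMARY KEY (page_file, field)
           )"""
    )
    return db


def save_metadata(db, page_file, fields):
    """Save the entries for one first page; saving again overwrites old values."""
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR REPLACE INTO page_metadata VALUES (?, ?, ?)",
            [(page_file, name, value) for name, value in fields.items()],
        )


def first_pages(db):
    """Pages with saved metadata, which mark where each document begins."""
    rows = db.execute(
        "SELECT DISTINCT page_file FROM page_metadata ORDER BY page_file"
    )
    return [row[0] for row in rows]
```

Because only first pages receive metadata, the set of pages present in the table doubles as the document separation information for the batch.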
New Commit Function
When the operator has completed metadata entry for the first page of every document in the folder, the operator will click the "Commit" button. This will release the batch and the operator can immediately create a new batch if desired.
Periodic Batch Updates
A batch update function can be scheduled by the administrator. The Batch Update function will run on the web server at the scheduled frequency, usually at night, and usually just before the search index is updated. The following operations will be performed:
1. The single-page PDF files will be compiled into multipage document PDF files, based on the separation information stored in the database file for the batch.
2. Each document PDF file will get its own metadata based on entries stored in the data file.
3. Each document PDF file will be properly named according to naming rules.
4. The document PDF files will be written into a new folder under the output path.
5. The original folder under the input path will then be deleted or moved to an archive location.
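The grouping and naming steps above can be sketched as follows. This covers only the planning logic, with no PDF files read or written. The function names, the "CaseNumber" naming rule, and the choice to attach any leading pages that precede the first marked page to the first document are all assumptions.

```python
def group_pages(pages, first_pages):
    """Split an ordered page list into documents at each marked first page."""
    documents = []
    for page in pages:
        # Start a new document at every first page. Pages before the first
        # marked page open a document of their own (an assumption).
        if page in first_pages or not documents:
            documents.append([])
        documents[-1].append(page)
    return documents


def plan_output(pages, metadata, name_field="CaseNumber"):
    """Return (output_name, page_files, fields) for each document in the batch.

    `metadata` maps each first-page file to the fields entered for it.
    Hypothetical naming rule: use `name_field`, else the first page's base name.
    """
    plan = []
    for doc_pages in group_pages(pages, set(metadata)):
        fields = metadata.get(doc_pages[0], {})
        base = fields.get(name_field, doc_pages[0].rsplit(".", 1)[0])
        plan.append((base + ".pdf", doc_pages, fields))
    return plan
```

Given four pages with metadata entered on the first and third, the plan contains two documents of two pages each, named from their case numbers.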
New Administrative Interface
Administrative functions such as creating new operator logons, scheduling batch updates, and generating progress and status reports will be available in a graphical interface.
|