PDF WebSearch at MIT
Division 3 of MIT - Lincoln Laboratory uses PDF WebSearch for automated, web-based document management
August 16, 2000
Division 3 of MIT - Lincoln Laboratory, located in Lexington, MA, has implemented an innovative solution for document management on their corporate Intranet. Web-based document management is seamlessly integrated with calendar and project management. Document Management is now fully automated, with users making document submissions directly into the system.
Document management is built around the Acrobat PDF format and PDF WebSearch. Converting all source documents, including paper, to PDF gives the collection a single common format that can be indexed, searched, and displayed in a uniform way.
PDF WebSearch builds the search-index for the growing collection of PDF files. The search-index is updated every 5 minutes. The search-index contains not just the full-text of the PDF collection, but structured index data as well, retrieved from custom index fields that are stored in the PDF documents.
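To make the idea of a combined index concrete, here is a minimal sketch in Python. It is not PDF WebSearch's actual index format or API, which this article does not describe. The function name `build_index`, the tuple layout, and the sample file names and field names are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Build a combined index from (path, fields, text) tuples.

    Hypothetical structure, for illustration only:
      "fields": path -> structured index fields stored with the document
      "words":  word -> set of paths whose full text contains that word
    """
    index = {"fields": {}, "words": defaultdict(set)}
    for path, fields, text in documents:
        index["fields"][path] = dict(fields)
        # Lowercase the text and split it into simple alphanumeric words.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index["words"][word].add(path)
    return index


# Invented sample data standing in for text and fields extracted from PDFs.
docs = [
    ("radar.pdf", {"project": "A", "author": "Smith"}, "Radar antenna test results"),
    ("budget.pdf", {"project": "B", "author": "Jones"}, "Budget results for the quarter"),
]
index = build_index(docs)
```

A scheduled job could call a function like this every few minutes to pick up newly posted files. A production indexer would more likely update the index incrementally than rebuild it each time.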
Documents are retrieved through a combination of structured lookups on the index fields and full-text queries. The two retrieval methods can be used separately or together. In addition to the traditional search screen for document queries, filter-queries are issued from hyperlinks generated throughout the site. Each filter-query presents a subset list of documents, for example all documents related to Project "A" that were posted in the last 30 days. These dynamically created lists replace static HTML pages that once had to be updated manually. Before, posting a single document to the web meant updating up to 5 HTML list-pages!
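A filter-query of the kind described above can be sketched as follows. This is an illustrative Python example and not the query syntax of PDF WebSearch. The function `filter_query`, its parameters, the document dictionaries, and the sample data are all hypothetical.

```python
from datetime import date, timedelta


def filter_query(documents, fields=None, posted_within_days=None, terms=(), today=None):
    """Return documents matching every structured filter and every search term.

    Structured filters and full-text terms are both optional, so the two
    retrieval methods can be used separately or together.
    """
    today = today or date.today()
    fields = fields or {}
    results = []
    for doc in documents:
        # Structured lookup: every requested index field must match exactly.
        if any(doc["fields"].get(k) != v for k, v in fields.items()):
            continue
        # Date filter: keep only documents posted within the window.
        if posted_within_days is not None:
            if doc["posted"] < today - timedelta(days=posted_within_days):
                continue
        # Full-text part: simplified to case-insensitive substring matching.
        text = doc["text"].lower()
        if not all(term.lower() in text for term in terms):
            continue
        results.append(doc)
    # Newest documents first.
    return sorted(results, key=lambda d: d["posted"], reverse=True)


# Invented sample data for illustration.
documents = [
    {"path": "a1.pdf", "fields": {"project": "A"}, "posted": date(2000, 8, 10), "text": "Antenna test report"},
    {"path": "a2.pdf", "fields": {"project": "A"}, "posted": date(2000, 5, 1), "text": "Antenna design notes"},
    {"path": "b1.pdf", "fields": {"project": "B"}, "posted": date(2000, 8, 12), "text": "Antenna budget"},
]

# "All documents related to Project A posted in the last 30 days."
recent_a = filter_query(
    documents, fields={"project": "A"}, posted_within_days=30, today=date(2000, 8, 16)
)
```

Because the list is computed at request time, a hyperlink that encodes these parameters always shows current results, which is what removes the need to maintain list pages by hand.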
It couldn't be easier for users to submit new documents to the system. A submission form is tailored for each user with menu links that match their role in the organization. The submission form asks for the index field data and the source file. Once the user submits a document, the automated process begins. The source document is converted to PDF by activePDF DocConverter. Then the new PDF file and the source document are moved to a web folder. The index data is placed into the PDF file by JRA MetaMaker. The next time the search-index is updated (within a few minutes), the PDF document appears on HTML lists and can be found with search queries.
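The automated steps above can be sketched as a small pipeline. The real system uses activePDF DocConverter and JRA MetaMaker, whose interfaces are not described here, so this Python sketch takes them as caller-supplied functions. `process_submission`, `convert_to_pdf`, and `write_index_fields` are hypothetical names, not calls into either product.

```python
import shutil
from pathlib import Path


def process_submission(source, index_fields, web_folder, convert_to_pdf, write_index_fields):
    """Run one submitted document through the publishing steps.

    convert_to_pdf(source_path) -> path of the new PDF   (stand-in for the converter)
    write_index_fields(pdf_path, fields) -> None          (stand-in for the metadata tool)
    """
    source = Path(source)
    web_folder = Path(web_folder)
    web_folder.mkdir(parents=True, exist_ok=True)

    # Step 1: convert the source document to PDF.
    pdf_path = Path(convert_to_pdf(source))

    # Step 2: move the new PDF and the source document to the web folder.
    published_pdf = Path(shutil.move(str(pdf_path), str(web_folder / pdf_path.name)))
    shutil.move(str(source), str(web_folder / source.name))

    # Step 3: store the index field data inside the published PDF.
    write_index_fields(published_pdf, index_fields)

    # The document becomes searchable at the next scheduled index update.
    return published_pdf
```

Passing the converter and metadata writer in as functions keeps the sketch independent of any particular product and makes the flow easy to test with stand-ins.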
For Division 3 of MIT - Lincoln Laboratory, the ability to highlight search terms within the PDF when a document is opened from the query results was an absolute requirement. PDF WebSearch provides this feature, along with intelligent downloading of hit-pages to the browser, and the ability to navigate between hit-pages. These features provide precision in searching - down to the word level.
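The notion of hit-pages can be illustrated with a short Python sketch. It assumes the text of each page is already available as a string, and it does not reflect how PDF WebSearch itself locates or downloads pages. The function names and sample pages are hypothetical.

```python
import re


def find_hit_pages(pages, terms):
    """Return 1-based numbers of pages containing any term as a whole word."""
    patterns = [
        re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE) for term in terms
    ]
    return [
        number
        for number, text in enumerate(pages, start=1)
        if any(pattern.search(text) for pattern in patterns)
    ]


def next_hit(hit_pages, current):
    """Return the first hit-page after the current page, or None."""
    return next((p for p in hit_pages if p > current), None)


def previous_hit(hit_pages, current):
    """Return the last hit-page before the current page, or None."""
    return next((p for p in reversed(hit_pages) if p < current), None)


# Invented page text for illustration.
pages = ["Intro to radar", "Budget tables", "Radar test results", "Appendix"]
hits = find_hit_pages(pages, ["radar"])
```

With a list of hit-pages in hand, a viewer only needs to fetch those pages, and the next and previous controls reduce to simple lookups in the list.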
The ability to customize PDF WebSearch was essential. Division 3 added their own custom-generated menus at the top of the search and retrieval screens, and modified the layout of data on the Results screen. These customizations allow PDF WebSearch to integrate with the "look and feel" of the other functions in the Intranet portal.
Older legacy documents that exist only in paper form are being scanned and converted to PDF with OCR software that recognizes and creates the searchable text. These PDF files are added to the web folder and co-exist with PDF files created from digital source files.
Division 3 of MIT - Lincoln Laboratory has taken the intranet portal to a new level of usefulness by adding automated document management to the equation with PDF WebSearch.