JRA Software, Publisher
Introduction
SearchPDF 2.0 is the successor to the popular PDF WebSearch 1.1 product, and it features a new JavaScript toolbar. The toolbar provides additional controls for Search Results, such as Hit Page Navigation for DjVu and HTML files, and it permits Search Results Navigation without leaving the document viewing area.
SearchPDF 3.0 development plans are underway. This upgrade will migrate SearchPDF from a classic ASP application to an ASP.NET application. All parameters and settings for custom designs will migrate to a single XML-formatted web.config file. This upgrade will both simplify customizing and add significant new features that are available with Microsoft's .NET platform.
 Advanced PDF Features of SearchPDF 2.0
Search Phrase and Search Word Highlighting
The Search Word “water” is highlighted in yellow
Your search phrase or search words are highlighted in the PDF file. The page automatically opens to the region where the search results are located.
On-demand Page Downloading
The first page of the PDF file opens in order to read the byte-range information for subsequent pages. Then the first hit page automatically opens next. You can see your search results with the download of only two pages. Additional pages will download on demand as you navigate the document. Large PDF documents download and display much faster this way than in normal file viewing, in which every page of the PDF file must be downloaded.
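The on-demand behavior described above resembles HTTP byte-range requests, in which a client asks for only the bytes that make up one page. The following is a minimal sketch of that idea, not the plug-in's actual mechanism. The page table, its offsets and the function names are hypothetical and exist only for illustration.

```python
def range_header(offset, length):
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


# Hypothetical page table: page number -> (byte offset, byte length).
# In the scenario above, this information would come from the first page download.
page_table = {1: (0, 4096), 7: (51200, 8192)}


def headers_for_page(page):
    """Return the request headers needed to fetch a single page."""
    offset, length = page_table[page]
    return range_header(offset, length)
```

With this table, a request for hit page 7 would ask for 8,192 bytes rather than the whole file.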
Hit Page Navigation
Hit page navigation buttons are an integrated feature of the Adobe Acrobat Reader plug-in. Normally "grayed out", they turn black (active) when the full-text of the PDF file is searched. In Acrobat 5, the hit page navigation buttons are docked on the left side of the toolbar. In Acrobat 4, the buttons are on the far right of the toolbar.
A Note About the PDF Highlight Color
The highlight color for PDF files is governed by the highlight color of the operating system that you are using. Therefore no highlight color selection is provided for PDF in JRASearch.
Advanced HTML Features
Search Phrase and Search Word Highlighting
The search word “navigation” is highlighted in yellow
Hit Page Navigation
The Previous Hit, Next Hit, First Hit and Last Hit buttons in the toolbar allow you to navigate Hits in the open HTML file. The page will automatically scroll down to display the next hit.
Hit Navigation Buttons are Highlight Color-Coordinated
The Hit Navigation buttons for HTML are color-coordinated to the user-selected highlight color.
Advanced DjVu Features
Search Phrase and Search Word Highlighting
The search word “navigation” is highlighted in yellow
On-demand Page Downloading
The first hit page opens immediately for DjVu files, much faster than for PDF files. Additional pages will download on demand as you navigate the document. You can navigate hit pages using the JRASearch toolbar, or all pages using the DjVu Plug-in toolbar. Large multi-page DjVu documents download and display much faster this way than in normal file viewing. Plus, using the DjVu Indirect Format, a user can jump to any page on demand without downloading the intervening pages. This is not possible with PDF files.
Hit Page Navigation
The Previous Hit Page, Next Hit Page, First Hit Page and Last Hit Page buttons in the JRASearch toolbar allow you to navigate Hit Pages in the open DjVu file.
Hit Page Zoom
You can set the Hit Page Resolution so that it overrides the default page resolution.
The Hit Page Navigation buttons are color coded to match your selected Highlight Color.
Hit Page Navigation Buttons are Highlight Color-Coordinated
The Hit Page Navigation buttons for DjVu files are color-coordinated to the user-selected highlight color. Highlight colors are usually obvious on pages with a white background, but might not be so obvious on a colored magazine page. Pick a highlight color that works best with the documents you are viewing.
Search Features
Multiple Search-Index Selection
Documents can be indexed into a single search-index or they can be organized into multiple search-indexes. Multiple search-indexes are appropriate when document collections are related but distinct from each other. For example, you could create one index for invoices and one index for receipts. In the model application, four search-indexes are used to separate different document types. Simply check off the collections that you wish to search upon.
Search-index Selection
Canned Queries Under Hyperlinks
Canned Queries are pre-defined search queries that can be executed with a single click of a hyperlink.
Canned Queries can be performed against data fields or against the full-text or both. Most commonly they are performed against category-type fields.
For example, if there is an Author field, then a Canned Query can be "List by Author", which will list all documents having a value for Author, sorted by Author.
Canned Query Hyperlinks
Canned Queries can make useful additions to other pages on your website. For example, a Canned Query using the Date field could be used on a page with a hyperlink which reads: “List documents added in the last 30 days”, or “Documents Published in 1998”.
Users can use the Refine Search feature on the Search Results page to query upon the set of documents returned by the Canned Query.
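A Canned Query hyperlink can be thought of as a search URL with the query already filled in. The sketch below builds such a link in Python. The page name "search.asp" and the parameter names "field" and "value" are assumptions made for illustration and are not the actual JRASearch parameters.

```python
import html
from urllib.parse import urlencode


def canned_query_link(label, base_url, **params):
    """Return an HTML anchor that runs a pre-defined query when clicked."""
    url = f"{base_url}?{urlencode(params)}"
    # Escape "&" and quotes so the URL is safe inside an HTML attribute.
    return f'<a href="{html.escape(url, quote=True)}">{html.escape(label)}</a>'


# Hypothetical page and parameter names.
link = canned_query_link(
    "Documents Published in 1998", "search.asp", field="Date", value="1998"
)
```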
Simple/Advanced Search Option
The JRASearch search screen interface accommodates both beginning and advanced users. It does so by presenting to the beginning user a simple search interface supported by pop-up Help and Examples. The advanced user can click on the "Advanced" button, and then all the advanced search features are displayed. This design approach best serves the needs of both classes of users without the need to compromise between simplicity and power.
Selecting the Hide Advanced button will return to the Simple Search interface and will also reset the Advanced Settings to their defaults for Simple Search.
Simple Search Interface
Advanced Search Interface
Date Range Searching
You can limit your search to documents within a specific date range.
A date can be entered in many ways. If you are entering March 15, 1999, you can enter this as "03/15/1999", "3/15/1999", "3-15-1999", or "19990315". You can enter "1999" or "199903". You can even enter "Mar. 15, 1999" or "March 15, 1999". Date ranges are inclusive of the From and To dates entered.
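The accepted date formats listed above can be normalized to a single form before searching. The following is a minimal sketch, assuming that numeric dates are entered month first and that a partial date such as "1999" or "199903" stands for the whole year or month. The function name and these interpretations are assumptions, not the documented JRASearch behavior.

```python
import calendar
import re
from datetime import date

MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def date_span(text):
    """Return the inclusive (first, last) day covered by `text` as YYYYMMDD strings."""
    s = text.strip()
    if re.fullmatch(r"\d{4}", s):                       # "1999"
        first, last = date(int(s), 1, 1), date(int(s), 12, 31)
    elif re.fullmatch(r"\d{6}", s):                     # "199903"
        year, month = int(s[:4]), int(s[4:])
        first = date(year, month, 1)
        last = date(year, month, calendar.monthrange(year, month)[1])
    elif re.fullmatch(r"\d{8}", s):                     # "19990315"
        first = last = date(int(s[:4]), int(s[4:6]), int(s[6:]))
    elif hit := re.fullmatch(r"(\d{1,2})[/-](\d{1,2})[/-](\d{4})", s):
        month, day, year = map(int, hit.groups())       # "3/15/1999", "3-15-1999"
        first = last = date(year, month, day)
    elif hit := re.fullmatch(r"([A-Za-z]+)\.?\s+(\d{1,2}),?\s+(\d{4})", s):
        month = MONTHS[hit.group(1).lower()[:3]]        # "Mar. 15, 1999"
        first = last = date(int(hit.group(3)), month, int(hit.group(2)))
    else:
        raise ValueError(f"unrecognized date: {text!r}")
    return first.strftime("%Y%m%d"), last.strftime("%Y%m%d")
```

A From date would use the first value of the span and a To date the second, which keeps the range inclusive of both ends.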
Set fuzzy level in advanced options
The advanced user can select the "fuzzy" option when performing a search. The user can specify the level of fuzziness between 1 and 9.
Display Summary Option
Users can choose to display or hide document summaries. A Summary is either a description of the document provided by the author and stored in a summary field or, if no summary is provided, the first 100 or so words of the document.
Turning off summaries provides less information on the results list, but then a greater number of search results can be displayed on one screen.
Set Sort Order based on any Field
By default, the sort order is by Hits, in descending order, with a secondary sort by Title. This is referred to as Relevancy Ranking.
The user can select an alternate sort order if desired. For example, a Sort Order based on Author will group all search results by the same author together in the list.
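The default ordering described above can be expressed as a two-part sort key. This is a sketch only. The record layout (dictionaries with "title", "author" and "hits" keys) is an assumption made for illustration.

```python
def relevancy_order(results):
    """Sort by Hits descending, then by Title ascending (case-insensitive)."""
    return sorted(results, key=lambda r: (-r["hits"], r["title"].lower()))


def author_order(results):
    """Alternate order: group results by Author, then by Title."""
    return sorted(results, key=lambda r: (r["author"].lower(), r["title"].lower()))


# Hypothetical result records.
results = [
    {"title": "Water Rights", "author": "Smith", "hits": 4},
    {"title": "Aquifers", "author": "Jones", "hits": 4},
    {"title": "Rivers", "author": "Smith", "hits": 9},
]
```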
Set Result per Page
The default number of search results to be displayed on the results screen is 5. A user can change the number of results with this setting. Selecting All will display the results all in one long scrolling page, up to a maximum limit of 300.
This setting is used to indicate a preference for scrolling search results either horizontally as separate pages, or vertically as one page.
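The paging behavior described above can be sketched as follows. The default of 5 and the limit of 300 come from the text. The function name and the use of the string "All" as a setting value are assumptions made for illustration.

```python
DEFAULT_PER_PAGE = 5   # default stated above
MAX_RESULTS = 300      # upper limit stated above for the "All" setting


def paginate(results, per_page=DEFAULT_PER_PAGE):
    """Split results into pages; "All" yields one page capped at MAX_RESULTS."""
    if per_page == "All":
        return [results[:MAX_RESULTS]]
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return [results[i:i + per_page] for i in range(0, len(results), per_page)]
```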
Search upon the contents of any field
A normal search will search upon the full-text of the document. With this setting, the user can search just upon a specific field. Selecting the Author field and entering "Smith" will return all documents authored by anyone named Smith.
It is possible to enter both a field query AND a full-text query for precision searching.
Pop-Up Context-Sensitive Help for advanced options
Click the label of any of the advanced options, and a context-sensitive pop-up screen will appear that explains the option in detail.
The advanced search options are Fuzzy, Stemming, Phonic and Natural Language. Thesaurus is an additional search option which can be implemented, but which is not part of the model demonstration.
Publisher Identification Area
The left portion of the search screen is reserved for publisher identification. This identification is constructed in a separate Include file, and may contain both text and graphics elements. This area of the search screen can also be customized for the selection of multiple search libraries, date searching and other field searching options.
Hyperlink Tabs
Hyperlink Tabs can be customized for the application and provide supporting information for the document collection. In the demonstration, links are made to pop-up windows. Hyperlinks can also be created to link the search page to other sections of the web site.
Interactive Help for Searching
Search Help is organized in a fashion that makes it intuitive to find the information you are looking for. The help text is organized into topical sections that are accessed from a drop down choice list.
You can look up the help you need by using any of the following sections:
HOW CAN I SEARCH FOR…
WHAT IS…
WHEN SHOULD I USE…
HOW DO I…
Learning by Example
Examples provide an excellent introduction to the search environment and provide a demonstration of Boolean search commands and syntax. When you click on an Example query, the Example box remains visible while the search results are displayed. This makes it easy to try other examples.
Displays well at any screen resolution
The index card design of the search screen fits nicely on an 800 x 600 display monitor, yet also is attractive at higher resolutions. The interface therefore adapts well to any size monitor. It even displays well at 640 x 480 resolution (14" monitors), which means that the interface is suitable for any monitor in use today.
Search Results Features
Layout is segmented into Toolbar and Body sections
The layout of the search results screen is divided into Toolbar and Body sections. The Toolbar remains visible at all times while the Body section scrolls to display the search results.
In addition, the Publisher Identification is provided at the top of the screen. The Header and Footer sections can easily be customized for the document collection which is being hosted. The Body section displays the actual search results in a dynamic recordset that presents the results of the search query to the user.
Search Results Toolbar
The Toolbar remains visible at all times while the Body section scrolls.
Body Section
The Body Section contains:
The Search Phrase
The number of documents returned
The Search Attributes
The Search Results listing
The Page Navigation Control
The Body Section scrolls to display all the results of your query.
Display search results with document metadata
Metadata (index fields) are used to display meaningful search results to the user. The placement of fields can be specified to complement the collection. Fields can be placed in either columns or rows. Column and row fields are re-sortable. In addition, a summary field can be displayed under column values and before the optional row fields are displayed.
In the model application, the Title and Author and Date fields are displayed in columns, followed by the Hits field, which is generated by JRASearch in response to the search query. The Summary field is displayed below the column values, followed by the Publisher row field.
The Summary Field
The Summary Field will display the contents of a stored metadata field if this is available in a document. This field may contain up to 512 characters. This field can store not only a brief summary of the document but also user comments about the document.
The model application demonstrates the technique of concatenating multiple fields together for display in the summary area. The fields Subject, Summary and ISBN are concatenated in this demonstration. If there is no stored summary field, then the first 100 or so words of the full-text of the document will be displayed.
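The concatenation and fallback logic described above can be sketched as follows. The field names Subject, Summary and ISBN come from the demonstration. The " | " separator, the function name and the exact 100-word cutoff are assumptions made for illustration.

```python
def display_summary(fields, full_text, word_limit=100):
    """Return the text shown in the summary area of one search result."""
    if fields.get("Summary", "").strip():
        # Concatenate the non-empty fields in the order used by the demonstration.
        parts = [fields.get(name, "").strip() for name in ("Subject", "Summary", "ISBN")]
        return " | ".join(part for part in parts if part)
    # No stored summary: fall back to the opening words of the full text.
    return " ".join(full_text.split()[:word_limit])
```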
Summaries Visible
Summaries Hidden
Results Per Page Setting
Search Results are divided into pages for ease of navigation. You can easily modify the number of results in a page even after a query is performed, by using the Results setting in the Toolbar.
Setting the Search Term Highlight Color
The Search Term Highlight Color can be changed for DjVu and HTML documents in the Toolbar.
Results are organized into pages, with AltaVista-like page navigation
Pages can be navigated using Next and Previous buttons, and you can also "jump" to any page.
Re-sort results based on any column or row field
Standard Sort Active Neutral Alternate Sort Active
Clicking the Column Field Title or the Row Field Title, or the sort-status arrow to the right of the title, re-sorts the search results based on that field.
Each field has a Default sort order and an Alternate sort order. For example, the Default sort order for the Hits field is descending order by number of hits, with a secondary sort by Title in ascending order.
A Field Title or the arrow next to it, when clicked repeatedly, will toggle between Standard, Alternate and Neutral states.
Current sort order indicated by status arrows
The Sort Order Status Arrow is located to the right of the field name for both column fields and for row fields.
If the sort-status arrow is pointing to the right, this means the field is not currently used for the sort order (it is neutral).
If the sort-status arrow is pointing up, this means the Standard sort order for the field is being used.
If the sort-status arrow is pointing down, this means the Alternate sort order for the field is being used.
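The three sort states and their arrows can be modeled as a small cycle. The arrow directions come from the text above. The order in which repeated clicks step through the states (Standard, then Alternate, then Neutral) is an assumption, as are the names used here.

```python
from enum import Enum


class SortState(Enum):
    """Each state's value is the direction its status arrow points."""
    NEUTRAL = "right"
    STANDARD = "up"
    ALTERNATE = "down"


# Assumed click order: Standard -> Alternate -> Neutral -> Standard ...
_CYCLE = [SortState.STANDARD, SortState.ALTERNATE, SortState.NEUTRAL]


def next_state(state):
    """Return the state a field moves to when its title or arrow is clicked."""
    return _CYCLE[(_CYCLE.index(state) + 1) % len(_CYCLE)]
```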
Advanced Sort and Display Options for Author and Date Fields
An advanced sorting option is available for Author and Date fields, and an Advanced display option is available for a Date field.
Advanced Author Sort
The Advanced Author Sort, when applied to an Author field, changes the default sort method, which sorts based on the first characters of the field. The default method sorts on an author's first name when the name is entered in the normal format of "firstname middlename lastname".
The Advanced Author Sort will sort based on the author's last name. When there is more than one author, the multiple author names should be separated by commas in the field. Then the Advanced Author Sort will sort on the last name of the first author in the list of authors.
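The last-name rule described above can be sketched as a sort key. This assumes names are entered as "firstname middlename lastname" and that multiple authors are separated by commas, as stated. The function name is hypothetical.

```python
def author_sort_key(author_field):
    """Return the lowercased last name of the first author in the field."""
    first_author = author_field.split(",")[0].strip()
    parts = first_author.split()
    return parts[-1].lower() if parts else ""


authors = ["John Q. Smith, Ann Lee", "Zed Brown", "Mary Adams"]
by_last_name = sorted(authors, key=author_sort_key)
```

A plain sort of the same list would instead order the entries by first name.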
Advanced Date Display
A date field is stored for search purposes in the format "YYYYMMDD". Advanced Date Display permits the date to be reformatted for display on the search results. The display may be changed to one of the following formats:
"MM-DD-YYYY"
"DD-MM-YYYY"
"XYZ-DD-YYYY" (where XYZ is a 3-letter abbreviation of month)
"DD-XYZ-YYYY"
“Year YYYY”
“XYZ. YYYY”
The dashes ( - ) may be replaced by a forward-slash ( / )
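The display formats listed above can be produced from the stored "YYYYMMDD" value with simple string slicing. The following is a minimal sketch. The function name and the way a format is selected by its pattern string are assumptions made for illustration.

```python
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_date(stored, pattern, sep="-"):
    """Reformat a stored YYYYMMDD date using one of the listed display patterns."""
    year, month, day = stored[:4], stored[4:6], stored[6:8]
    abbr = MONTH_ABBR[int(month) - 1]
    layouts = {
        "MM-DD-YYYY": sep.join([month, day, year]),
        "DD-MM-YYYY": sep.join([day, month, year]),
        "XYZ-DD-YYYY": sep.join([abbr, day, year]),
        "DD-XYZ-YYYY": sep.join([day, abbr, year]),
        "Year YYYY": f"Year {year}",
        "XYZ. YYYY": f"{abbr}. {year}",
    }
    return layouts[pattern]
```

Passing sep="/" gives the forward-slash variants.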
Advanced Date Sort
The Advanced Date Sort, when applied to a Date field, changes the default sort method, which is based on the first characters of the field. It permits sorting in chronological order.
Perform a Refine Search when the search method is Boolean
When the default Boolean search method is used, then the Refine Search query may be entered. The Refine Search will narrow your search and will further limit the number of search results.
For example, if your first query is "Author contains Smith" and this produces too many documents, you may realize that the document you have in mind discusses icebergs. Enter "icebergs" in the Refine Search box. Your original query phrase and your Refine Search phrase will look as follows:
Author contains Smith
(Author contains Smith) and (icebergs)
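The combined phrase shown above follows a simple pattern: each part is wrapped in parentheses and the two are joined with "and". A minimal sketch, with a hypothetical function name:

```python
def refine_query(original, refinement):
    """Narrow an existing Boolean query with an additional phrase."""
    refinement = refinement.strip()
    if not refinement:
        return original
    # Parentheses keep each part intact when the two are combined.
    return f"({original}) and ({refinement})"
```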
Perform an Expand Search when the search method is Natural Language
When the Natural Language search method is selected in the advanced options, then an Expand Search query may be entered. Field searching and Boolean operators are disabled when using the Natural Language search option. Instead, all entered words are searched and the results are ordered by the frequency of all the words in the document. Natural Language is an inclusive rather than exclusive selection process. Think of Natural Language searching as keyword searching against the full-text of the document collection.
If your first query was "apples pears" and you did not find exactly the document results you were looking for, you could enter "orchards" as an Expand Search and this would list documents discussing apples, pears and orchards. Your original query phrase and your Expand Search phrase will look as follows:
apples pears
apples pears orchards
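The inclusive, frequency-ordered behavior described above can be illustrated with a simple word count. This is a sketch of the general idea only, not the search engine's actual ranking. The document names, their text and the function names are hypothetical.

```python
import re
from collections import Counter


def expand_search(original, extra):
    """Widen a Natural Language query by appending more words."""
    return f"{original} {extra}".strip()


def natural_language_rank(query, documents):
    """Rank documents by how often any query word occurs in them."""
    words = set(query.lower().split())
    scored = []
    for name, text in documents.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        score = sum(counts[word] for word in words)
        if score:  # inclusive: a match on any one word is enough
            scored.append((score, name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


# Hypothetical documents.
documents = {
    "a.pdf": "apples and pears and apples",
    "b.pdf": "orchards of pears",
    "c.pdf": "nothing relevant",
    "d.pdf": "orchards orchards orchards orchards",
}
```

Here "d.pdf" is not returned by the original query but appears once "orchards" is added, which shows how an Expand Search widens the result set rather than narrowing it.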
Query Phrase is displayed with number of results
The number of documents returned by the query is listed to the right of the Query Phrase.
Documents are displayed in results with a File Type Icon
 DjVu  PDF  HTML  XML  JPEG  TIFF  Generic Container
The File Type Icon is displayed to the right of the Document Title, and identifies the document file type. In Internet Explorer, a Tooltip identifies the HTML or XML file type by name when the cursor is positioned over the icon. PDF and DjVu icons have a Tooltip that asks if the user would like to get a plug-in (below).
Containers support any File Type
Containers are XML files that can contain the metadata for any file type that cannot store it internally. The XML Container can contain only index data, or index data plus either the full text or commentary on the document that is "contained". Examples of documents that can be contained include LuraDocument (LDF) and CAD vector formats like DGN.
File Type Icon is used to get supporting viewers
 Get Acrobat Reader?
 Get DjVu Viewer?
 Get TIFF Viewer?
These file types have supporting viewer plug-ins. Clicking one of these icons will open a new browser window with a page where you can download the free viewers for these file types. Tooltips on each icon serve as reminders.
Other Advanced Features
Advanced Search Options are displayed with search results
Fuzzy: 3 Stemming: ON Phonic: OFF Natural Language: OFF
The Advanced Search Options are displayed for reference at the top of search results.
Non-frames design maximizes screen real estate
By avoiding the use of a left frame for search results and search controls, maximum space is provided for the fields that describe a document, and then for the document itself when it is displayed.
Search results are scalable to any display resolution
The search results recordset will re-scale itself to any display resolution!
Phrase Hit Counting
Normally the number of hits displayed by a query is equal to the number of times that each word in your query occurs in the document. JRASearch reduces the hit count when a search phrase, like "English History", is used. The number of hits will then represent the number of times that the complete phrase occurred in the document.
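The difference between the two counting methods can be shown with a short sketch. The tokenizer and function names are assumptions made for illustration, and the real search engine may split words differently.

```python
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_hits(query, text):
    """Count every occurrence of each query word."""
    words = set(tokenize(query))
    return sum(1 for token in tokenize(text) if token in words)


def phrase_hits(phrase, text):
    """Count only occurrences of the complete phrase."""
    tokens, target = tokenize(text), tokenize(phrase)
    if not target:
        return 0
    size = len(target)
    return sum(1 for i in range(len(tokens) - size + 1) if tokens[i:i + size] == target)


sample = ("English history is rich. The history of the English language "
          "differs from English History.")
```

For this sample, the individual words occur six times in total, but the complete phrase "English History" occurs only twice.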
Navigate Search Results from the Toolbar
The Previous Result and Next Result buttons on the Toolbar permit you to navigate the Search Results list without having to return to the Search Results Page.
The Toolbar displays the current Result number and the total number of Results for your query.
Display Document Information
Click the “i” button on the Toolbar to display the Document Information (metadata) for the currently-open document.
Search Within Document
The Search Within Document feature allows you to perform a Boolean query for the currently open document. This will result in a new set of Hit Pages for the document. This is faster and more powerful than using the default Find feature for an open document.