ME-CE PDF Examples and Application Support
Introduction
ME-CE refers to the Middle Eastern and Central European versions of Acrobat and Acrobat PDF, which add support for the non-Latin languages used in the Middle Eastern and Central European regions. These languages must be supported with Unicode and Unicode-enabled applications.
From a business perspective, these areas are defined by the agreement between Adobe and WinSoft that WinSoft distributes Acrobat 4.x ME & CE in the Middle Eastern and Central European geographic regions.
PDF Examples
The following are three examples of Hebrew PDF and one example of Arabic PDF.
The first example is the 32-page brochure for Acrobat 4.05 ME. Acrobat Reader 4.0 ME does not display its bookmarks correctly. It looks as if the font used for bookmark text cannot handle Unicode Hebrew. The additional single-page examples do not have bookmarks and display fine.
The Arabic example is taken from the WinSoft site.
Use of Acrobat Reader ME in the Web Browser
The Acrobat Reader ME can be downloaded from Adobe using the link to the left. It is available with Arabic, English, French and Hebrew interfaces.
Adobe cautions that you must install Acrobat Reader 4.0 ME as a separate application. It cannot be installed as a user interface option with the Roman or Asian language versions of Acrobat Reader 4.0.
Platform requirements for Acrobat Reader ME are the same as for the English version of Reader.
The WinSoft Unicode fonts needed to display Arabic and Hebrew text are provided only with Acrobat Reader ME. Arabic and Hebrew text will not work with the English version of Reader, which is why Acrobat Reader ME is provided as a separate application. Only one Reader viewer (regular or ME) can be configured for use as the PDF Reader Plug-in in the web browser.
We have been testing Acrobat Reader ME with various PDF files containing Roman text and have found no problems using it as a replacement for the English version of Reader.
WinSoft in Grenoble, France, is the distributor of Acrobat 4.05 ME and CE.
(Note that Adobe continues to be the distributor of the Acrobat 4.05 ME and CE Readers.)
Specific Options for the Middle Eastern Languages in Acrobat 4.05 ME
The main new features are:
* Arabic & Hebrew User Interface for Acrobat Reader
* ME-adapted Forms
* ME-adapted Annotations
Copy/Paste, Find, Annotations, Forms
Copy/Paste
With Adobe Acrobat 4.05 ME you can re-use Arabic and Hebrew text from a PDF file just as you do with English text: you select the desired text, copy it, switch to a text-capable application, and paste. The Arabic/Hebrew text is transferred to the application as text and will be displayed correctly if the application can correctly handle the Arabic or Hebrew script. When you select text across several lines, Adobe Acrobat 4.05 ME guesses whether the selected text is Arabic, Hebrew, or Roman and automatically extends the selection to the end of the lines (on the left or on the right). The selected text also keeps its fonts, size, and style, which are applied when the text is pasted into the new document.
Find
The Find command of Adobe Acrobat 4.05 ME has a specific option for Arabic and Hebrew: "Ignore accents". When it is checked (the default), you can find a string of text whether or not it contains accents (diacritics, vowels). If the option is unchecked, you must type exactly the string you are looking for, including diacritic signs. The Find command also ignores any kashidas included in the text of the PDF file. If you would like to find a word that includes a manual kashida in a specific place, you must type the kashida character in the word you want to find.
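The matching behavior described above can be illustrated with a minimal C++ sketch. This is not Acrobat's implementation: the function names (`isHebrewMark`, `isArabicMark`, `fold`, `findText`) are hypothetical, and the code assumes that "accents" means the Unicode combining marks in the Hebrew and Arabic blocks and that text is already available as UTF-32 code points.

```cpp
#include <string>

// Assumption: Hebrew "accents" are the combining points and cantillation
// marks in U+0591..U+05C7 (punctuation such as maqaf U+05BE is excluded).
bool isHebrewMark(char32_t c) {
    return (c >= 0x0591 && c <= 0x05BD) || c == 0x05BF ||
           c == 0x05C1 || c == 0x05C2 || c == 0x05C4 ||
           c == 0x05C5 || c == 0x05C7;
}

// Assumption: Arabic "accents" are the combining marks U+064B..U+065F
// plus the superscript alef U+0670.
bool isArabicMark(char32_t c) {
    return (c >= 0x064B && c <= 0x065F) || c == 0x0670;
}

// Kashida (tatweel), the Arabic elongation character.
constexpr char32_t kKashida = 0x0640;

// Return a copy of s with the requested characters removed.
std::u32string fold(const std::u32string& s, bool ignoreAccents,
                    bool dropKashida) {
    std::u32string out;
    for (char32_t c : s) {
        if (dropKashida && c == kKashida) continue;
        if (ignoreAccents && (isHebrewMark(c) || isArabicMark(c))) continue;
        out.push_back(c);
    }
    return out;
}

// Kashidas in the text are ignored unless the query itself contains one,
// in which case they must match exactly.
bool findText(const std::u32string& text, const std::u32string& query,
              bool ignoreAccents) {
    const bool dropKashida = query.find(kKashida) == std::u32string::npos;
    const std::u32string t = fold(text, ignoreAccents, dropKashida);
    const std::u32string q = fold(query, ignoreAccents, dropKashida);
    return t.find(q) != std::u32string::npos;
}
```

With "Ignore accents" on, a vocalized word in the text matches an unvocalized query. With it off, the same query fails unless the diacritics are typed.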
Annotations
With Adobe Acrobat 4.05 ME you can type Arabic and Hebrew text in a PDF text annotation exactly as you would with English. A set of Arabic/Hebrew fonts is installed with Adobe Acrobat 4.05 ME, designed to look and print exactly the same on the Macintosh and Windows platforms.
Forms
Adobe Acrobat 4.05 ME makes it easy for you to create, fill in, and submit electronic PDF forms including Arabic or Hebrew fields.
Specific Options for the Central Europe Languages in Acrobat 4.05 CE
The Central Europe (CE) release of Acrobat 4.05 has been adapted by WinSoft to better support the Central European scripts for creating cross-platform PDF documents with embedded Central European fonts, then displaying and printing such multilingual documents on any computer or printer with the highest quality.
Moreover, documents created with Adobe Acrobat Distiller 4.0 using state-of-the-art Unicode Central European fonts support such features as Bookmarks, Find, Copy/Paste, Annotations, and Forms with Central European text strings.
The Acrobat Reader CE 4.05 is currently only available with an English interface.
Copy/Paste, Find, Annotations, Forms
Copy/Paste
With Adobe Acrobat 4.05 CE you can re-use Central European text from a PDF file just as you do with English text: you select the desired text, copy it, switch to a text-capable application, and paste. The Central European text is transferred to the application as text and will be displayed correctly if the application can correctly handle the Central European scripts. The selected text also keeps its fonts, size, and style, which are applied when the text is pasted into the new document.
Find
The Find command of Adobe Acrobat 4.05 CE has a specific option: "Ignore accents". When it is checked (the default), you can find a string of text whether or not it contains accents (diacritics). If the option is unchecked, you must type exactly the string you are looking for, including accents.
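For Central European text, accent-insensitive matching amounts to comparing strings after accented letters have been mapped to their base letters. The following C++ sketch shows one way this could work. It is an illustration only, not the Acrobat CE implementation: `foldCentralEuropean` is a hypothetical name, and the table covers only a sample of lowercase Czech, Hungarian, Polish, and Slovak letters.

```cpp
#include <string>

// Map one accented lowercase letter to its base letter.
// The table is a partial sample chosen for illustration.
char32_t foldLetter(char32_t c) {
    switch (c) {
        case 0x00E1: case 0x0105: return U'a';               // a-acute, a-ogonek
        case 0x0107: case 0x010D: return U'c';               // c-acute, c-caron
        case 0x010F: return U'd';                            // d-caron
        case 0x00E9: case 0x0119: case 0x011B: return U'e';  // e-acute, e-ogonek, e-caron
        case 0x00ED: return U'i';                            // i-acute
        case 0x0142: return U'l';                            // l-stroke
        case 0x0144: case 0x0148: return U'n';               // n-acute, n-caron
        case 0x00F3: case 0x0151: return U'o';               // o-acute, o-double-acute
        case 0x0159: return U'r';                            // r-caron
        case 0x015B: case 0x0161: return U's';               // s-acute, s-caron
        case 0x0165: return U't';                            // t-caron
        case 0x00FA: case 0x016F: case 0x0171: return U'u';  // u-acute, u-ring, u-double-acute
        case 0x00FD: return U'y';                            // y-acute
        case 0x017A: case 0x017C: case 0x017E: return U'z';  // z-acute, z-dot, z-caron
        default: return c;                                   // everything else unchanged
    }
}

// Fold a whole string; with "Ignore accents" on, both the query and the
// document text would be folded before they are compared.
std::u32string foldCentralEuropean(const std::u32string& s) {
    std::u32string out;
    out.reserve(s.size());
    for (char32_t c : s) out.push_back(foldLetter(c));
    return out;
}
```

A production implementation would more likely rely on Unicode normalization data than on a hand-written table. The table form is used here only to keep the sketch self-contained.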
Annotations
With Adobe Acrobat 4.05 CE you can type Central European text in a PDF text annotation exactly as you would with English. A set of Central European fonts is installed with Adobe Acrobat 4.05 CE, designed to look and print exactly the same on the Macintosh and Windows platforms.
Forms
Adobe Acrobat 4.05 CE makes it easy for you to create, fill in, and submit electronic PDF forms including Central European fields.
About Unicode Support in dtSearch 6.0 Search Engine
Accent-insensitive indexing with Unicode support
dtSearch 5 offered the option to create indexes that were "accent-insensitive," meaning that marks such as umlauts did not affect searching. This option has been carried over to dtSearch 6 with Unicode support, so accent-insensitivity works for the entire Unicode character set. An accent-insensitive index maps characters, wherever possible, to the letters A-Z or the digits 0-9. For example, the Arabic-Indic digit 8 (U+0668), the Bengali digit 8 (U+09EE), and the Tamil digit 8 (U+0BEE) would all be indexed and searched as the digit 8 (U+0038).
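The digit mapping in this example can be sketched in a few lines of C++. This is not dtSearch code: `foldDigit` and `foldDigits` are hypothetical names, and only the three scripts mentioned above are handled. It relies on the fact that each of these Unicode blocks stores its digits 0 through 9 in ten consecutive code points.

```cpp
#include <string>

// Map a digit from the Arabic-Indic, Bengali, or Tamil block to the
// corresponding ASCII digit; return every other character unchanged.
char32_t foldDigit(char32_t c) {
    // Code point of digit zero in each supported script.
    static const char32_t zeros[] = {
        0x0660,  // Arabic-Indic digit zero
        0x09E6,  // Bengali digit zero
        0x0BE6   // Tamil digit zero
    };
    for (char32_t zero : zeros) {
        if (c >= zero && c <= zero + 9) {
            return static_cast<char32_t>(U'0' + (c - zero));
        }
    }
    return c;
}

// Apply the mapping to every character of a string, as an indexer might
// do before storing a term.
std::u32string foldDigits(const std::u32string& s) {
    std::u32string out;
    out.reserve(s.size());
    for (char32_t c : s) out.push_back(foldDigit(c));
    return out;
}
```

Applied to U+0668, U+09EE, and U+0BEE, `foldDigit` returns U+0038 in each case, which matches the example in the text.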
Unicode Support
dtSearch 6 builds indexes using the UTF-8 encoding and expects all strings provided in API functions to be UTF-8. This enables Unicode support to be added without any changes to API functions. The Unicode version of the dtSearch Engine runs under any Win32 operating system -- Windows NT, 95, 98, or 2000.
API
Two new API functions, dtssUtf8Encode and dtssUtf8Decode, are provided to facilitate conversion between UTF-8 and wide character strings. Additionally, a set of API classes in dtsfc.cpp and dtsfc.h automatically handle conversion between UTF-8 and wide character strings. Example:
DSearchJob sj;
sj.IndexesToSearch.append(TEXT("c:\\sample\\index"));
sj.Request.set(TEXT("Wide String"));
DSearchResults *res = sj.Execute();
These classes are documented in the C++ Support Classes topics in the dtSearch Engine help file, dtengine.chm.
UTF-8
UTF-8 is an encoding of Unicode text that preserves all information in a Unicode string. Characters 1 through 127 are encoded as the single-byte ASCII values 1 through 127. All other characters are encoded as sequences of two or more bytes, each with a value of 128 or greater. As a result, UTF-8 encoded strings do not contain embedded NULL characters.
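The byte layout described above can be made concrete with a small standalone encoder. This sketch is not the dtSearch `dtssUtf8Encode` function, whose signature is not shown here. `utf8Encode` is a hypothetical helper that takes UTF-32 code points and, for brevity, assumes valid input (no surrogates, nothing above U+10FFFF).

```cpp
#include <string>

// Encode a sequence of Unicode code points as UTF-8 bytes.
std::string utf8Encode(const std::u32string& in) {
    std::string out;
    for (char32_t c : in) {
        if (c < 0x80) {
            // ASCII range: one byte, identical to the code point.
            out.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            // Two bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else if (c < 0x10000) {
            // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xF0 | (c >> 18)));
            out.push_back(static_cast<char>(0x80 | ((c >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

Every byte produced for a non-ASCII character has its high bit set, so a zero byte can appear in the output only if the input itself contains U+0000.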
Support for ME-CE in PDF WebSearch
The PDF File Parser in dtSearch will support the extraction of Unicode text. This feature is still under development by dtSearch Corp. The ability to build indexes using UTF-8 encoding is already finished.
The model Search and Retrieval interface for PDF WebSearch will be translated into Hebrew, Arabic, and French.
An optional search screen feature will be added, similar to the Find command, to search with or without the "Ignore Accents" option. When it is checked (the default), you can find a string of text whether or not it contains accents (diacritics, vowels). If the option is unchecked, you must type exactly the string you are looking for, including accents.
To implement this feature, dual search indexes must be created and maintained. One search index will contain the accents, and one will not. If "Ignore Accents" is checked, the index without accents is searched. If "Ignore Accents" is not checked, the index with accents is searched. Observation has shown that 99% of users prefer to search without accents.
Support for Unicode text in field searches based on PDF DocInfo fields will be tested and verified to function correctly.
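The dual-index selection could be sketched as follows. The `DualIndex` struct, the `selectIndex` function, and any index paths passed to them are hypothetical and are not part of the dtSearch API. The sketch only shows how the "Ignore Accents" checkbox would choose which of the two indexes to search.

```cpp
#include <string>

// Locations of the two indexes built from the same document set.
// Both field names are illustrative assumptions.
struct DualIndex {
    std::string withAccents;     // accent-sensitive index
    std::string withoutAccents;  // accent-insensitive index
};

// Pick the index to search based on the state of the "Ignore Accents"
// checkbox on the search screen.
const std::string& selectIndex(const DualIndex& indexes, bool ignoreAccents) {
    return ignoreAccents ? indexes.withoutAccents : indexes.withAccents;
}
```

The selected path would then be handed to the search job in place of a fixed index path. The cost of this design is that both indexes must be rebuilt whenever the document set changes.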