AN OVERVIEW OF DIFFERENT TOOLS FOR WORD-PROCESSING
OF TAMIL AND A PROPOSAL TOWARDS STANDARDISATION
(part II of the paper)

Dr. K. Kalyanasundaram,
Institute of Physical Chemistry, Swiss Federal Inst. of Technology,
CH-1015 Lausanne, Switzerland

(Invited paper to be presented at the "International Symposium for Tamil Information Processing and Resources on the Internet", National Univ. of Singapore, Singapore, 17-18 May 1997)


In part I, various tools that are available today for Tamil word-processing were reviewed. Herein we attempt to classify them within a unifying framework and use that framework as the basis for proposing a standardisation scheme for Tamil word-processing.

Classification of tools for Tamil Word Processing
The functioning of any word-processor can be divided into two parts: those connected with the 'input' process and those connected with the 'output'. Based on the features of the input and output processes involved, it is possible to classify all word-processing tools into the following categories:
classical typewriter input/direct output;
wytiwyg ("what you type is what you get") input/direct output;
romanized input/interpreted output;
phonetic input/interpreted output.
Table 1 (see the end of this part II) lists examples of different font faces and word-processing software grouped according to the above classification.
Examples of 'direct' tools are font faces used with associated keyboard layouts in typewriter or WYTIWYG format. In direct usage of simple font faces, the 'output' has a one-to-one correspondence with the input: for every keystroke, there is one character output. There is no software interpretation or intervention between keystroke and output; the letter you see on screen is simply the letter stored in the font under the keystroke in question. In other, more elaborate word-processors, the 'input' is 'interpreted' by the software to give the output. The input can be in the form of romanized text or can be phonetically based. In all cases, keyboard editors/managers allow some manipulation of the input process, and the 'font encoding' determines the final 'output'.
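The difference between 'direct' and 'interpreted' input can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key assignments and the romanization rules below are hypothetical (they are not taken from any of the tools reviewed), and Unicode Tamil letters are used merely as stand-ins for the glyphs stored in a font.

```python
# Hypothetical tables for illustration only; not the layout of any real tool.

# Direct input: each keystroke selects exactly one stored glyph.
DIRECT_KEYMAP = {"a": "அ", "k": "க", "m": "ம", "t": "த"}

# Interpreted input: the software reads one or more keystrokes
# (here, romanized syllables) and decides which glyphs to emit.
ROMANIZED_RULES = {
    "a": "அ",
    "ka": "க",
    "ki": "கி",
    "ku": "கு",
    "ma": "ம",
    "k": "க்",
}


def direct_output(keys: str) -> str:
    """One keystroke in, one glyph out; no software interpretation."""
    return "".join(DIRECT_KEYMAP[k] for k in keys)


def interpreted_output(keys: str) -> str:
    """Greedy longest-match interpretation of a romanized key sequence."""
    longest = max(len(rule) for rule in ROMANIZED_RULES)
    out, i = [], 0
    while i < len(keys):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(longest, len(keys) - i), 0, -1):
            chunk = keys[i:i + size]
            if chunk in ROMANIZED_RULES:
                out.append(ROMANIZED_RULES[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for input at position {i}: {keys[i:]!r}")
    return "".join(out)
```

In the direct case the number of outputs always equals the number of keystrokes; in the interpreted case the four keystrokes "kuma" are read as two syllables. A real interpreted word-processor would of course need a far larger rule table than this toy one.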
A primary requirement for any standardisation process is a standard font encoding scheme. Irrespective of the modes of input and output, all word-processing tools must use this unique standard character set. With such a unique font encoding standard, a single Tamil font/word-processor is enough to exchange Tamil documents electronically (including via WWW pages on the Internet). In any implementation of standards, there is a genuine fear among end-users about how much of their current typing habits and word-processor capabilities will have to be sacrificed. Is it possible to find methodologies by which the different typing habits ('input' practices) of end-users, and also the choice between different modes of input, can be guaranteed within a scheme that imposes a standard character set (font encoding scheme)? The answer is yes. The details are elaborated in the following paragraphs.

Keyboard Layouts and associated Editors/Managers
Keyboard editors (or managers) allow access to any character stored in the font face by typing a key on the keyboard. Different keyboard layouts can accommodate the typing preferences of individuals. As end-users/laymen, we do not care where ku is assigned in the reference table. But we would like to know which key on the keyboard we should use to get ku, ki and so on. The keyboard layout is what controls this access pathway, and hence it is an important point that affects the end-user (you and me!!!) drastically.
On both Windows and Mac OS, standard keyboard layout software comes with the system. It allows us to choose different layouts for typing in different European languages. For French, for example, there are at least 4 different keyboard layouts available (French, French-numerical, Swiss French, Canadian French). Here in the French-speaking part of Switzerland, we set our Mac/PCs to use the Swiss-French keyboard layout and switch to the US keyboard layout whenever typing is done in Tamil using the Mylai Tamil font. Switching between keyboard layouts is rather trivial (even our 8-year-old daughter knows how to go from one keyboard to the other). (In the French keyboard used in France, common French letters like e (accent grave), e (accent acute) etc. are in the normal numeral key positions and you need to use shift to get the numerals themselves!!) Different keyboard layouts present different characters at key positions of choice. Thus the same font face can be accessed differently on the keyboard using different layout schemes.
In Tamil, we can have different keyboard editors that allow us to do the same thing. Anjal, for example, can have different keyboard layouts made available to allow Tamil typing corresponding to different typing habits. In fact, Muthu Nedumaran is working towards providing a Mylai keymap typing option in future versions of Anjal. In principle, we can have many Tamil keyboard layouts (as in the case of French mentioned earlier) to satisfy every interest. The Summer Institute of Linguistics of Dallas, Texas, USA already makes available in the public domain a handful of software tools that allow development of dedicated keyboard editors/drivers for the Windows (e.g., KeyMan) and Macintosh (SILKEY) platforms. It is desirable, though, to limit the number of keyboard layouts to make the life of software designers easier. Too many keyboard options mean that the software has to be adapted to handle different schemes.

A proposal Towards Standardisation
In the introductory section of this paper it was pointed out that the Internet is rapidly becoming a main channel/forum for exchange of information worldwide and that, currently, Tamil on the Internet is in a rather messy situation. Why? The Indian Government may have proposed a standard character set (ISCII) and a keyboard layout (INSCRIPT) for Indian languages nearly a decade ago. However, the existence of these standards was not popularised outside India. Due to lack of communication, dedicated software for Tamil word-processing has been developed independently in India and abroad (particularly in the Malaysia/Singapore region, where there is a high concentration of Tamils). Many of these packages have their own novel features. Even if the Tamil-speaking community outside India is smaller than that within India, a major fraction of the former group has access to computers. Tamil computing is fast catching up in this group. Because the word-processing tools currently employed use varying font encoding schemes, browsing Tamil web pages requires prior downloading of nearly as many fonts as there are web pages to browse. Along with the use of particular software come the typing habits of individuals. In order to make information exchange of Tamil materials via the Internet a real pleasure for all of us, it is essential that efforts are taken to unify these different approaches under some umbrella scheme.
It was mentioned earlier that, for a given font, different keyboard layouts can be presented to the end-user through the use of keyboard editors/managers. Given this possibility, one possible approach towards standardisation is to go for a single font encoding (standard character set) to be adopted in all Tamil font faces, word-processors and DTP packages. Each font/DTP package can come with different options of input methods, provided in the form of different keyboard layouts made available under a pull-down menu. The feasibility of having keyboard editors/managers for some of the commonly used input methods has already been shown. Thus the standardisation process reduces to deciding on a standard character set, with some recommendations on possible keyboard layouts that can be provided for the users' choice. Since end-users can continue to work with their own favourite keyboard layout, there will not be any major resistance to the implementation of this standard. The major task will be for the software developers to recast their existing font faces to correspond to the one standard encoding scheme and to provide appropriate keyboard editors/managers for the input methods that are currently used. With the option available for anyone to test out different keyboard input methods, there can be hope of reducing the number of these layouts (weeding out some of the less popular ones).
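Recasting existing material to a single standard encoding is, in the simplest case, a byte-for-byte translation. The Python sketch below assumes a one-to-one correspondence between the glyphs of a legacy font and those of the standard; the code positions shown are invented for illustration and do not belong to any existing font or to the character set proposed here.

```python
# Hypothetical code positions: where a legacy font and the proposed
# standard store the same three glyphs (values invented for illustration).
LEGACY_POSITIONS = bytes([0xA1, 0xA2, 0xA3])
STANDARD_POSITIONS = bytes([0xC1, 0xC5, 0xC9])

# bytes.maketrans builds a 256-entry table; bytes that are not listed
# (for example ASCII text and spaces) pass through unchanged.
LEGACY_TO_STANDARD = bytes.maketrans(LEGACY_POSITIONS, STANDARD_POSITIONS)


def recast(document: bytes) -> bytes:
    """Re-encode a legacy 8-bit document into the standard code positions."""
    return document.translate(LEGACY_TO_STANDARD)
```

One such table per legacy font would be enough to convert existing documents. Where a legacy font stores a glyph that the standard builds from two characters (or the reverse), a plain translation table no longer suffices and a string-level mapping would be needed instead.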
Implementation and large-scale acceptance of the proposed standards in a short span of time is possible if and only if the implementation process does not get bogged down by legal constraints imposed by copyright protection and by the associated high costs of obtaining rights to use the technology involved in the standardisation process. Clearly, any scheme for font encoding or keyboard layout that is not strictly in the public domain can cause problems. Open standards for all the key elements are critical. Recommending possible standards for world-wide Tamil computing that are based on the proprietary materials of a few author(s) amounts to patronage and goes against all free-market practices. To facilitate the process and to avoid ambiguity, it is highly desirable that all key players (software developers) in the field openly declare their consent to work within such an "open standards framework".

A possible standard character set under 8859-X scheme
Currently most of the world's languages are handled under different standard character sets registered with the International Organization for Standardization (ISO) under different headings. ISO 8859-X of ECMA (European Computer Manufacturers Association) is the most popular of these schemes for handling European and other languages of the world. It is currently implemented by the commonly used web browsers. So, in short, it is a proven technique. The standard (default case for most web browsers) is 8859-1, and this supports most of the languages of Europe and Latin America. 8859-2 (aka Latin-2) is designed for Eastern European languages, 8859-3 (aka Latin-3) for South-Eastern Europe, 8859-4 (aka Latin-4) for Scandinavia (also covered by 8859-1), 8859-5 for Cyrillic/Russian, 8859-6 for Arabic, 8859-7 for Greek and 8859-8 for Hebrew; 8859-9 (aka Latin-5) is the same as 8859-1 except that it supports Turkish instead of Icelandic, and 8859-10 (aka Latin-6) is for Eskimo/Scandinavian languages. For those who would like to know more about these standard character sets, there are a couple of web sites providing additional information: ISO Alphabet Soup; Info. on ISO-8859; Internationalisation.
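How the 8859-X scheme works in practice can be seen with the codecs that ship with Python. The sketch below decodes the same two bytes under three existing parts of the standard; the byte values chosen are arbitrary, and no Tamil part is shown because none exists in this scheme (that is what is being proposed here).

```python
# The same 8-bit value means different characters under different parts
# of ISO 8859; the lower half (0x00-0x7F) is ASCII in every part.
SAMPLE = bytes([0x41, 0xE9])

# Codec names as accepted by Python's standard library.
PARTS = ["iso-8859-1", "iso-8859-5", "iso-8859-7"]


def decode_under_each_part(data: bytes) -> dict[str, str]:
    """Decode the same bytes with several ISO 8859 codecs."""
    return {part: data.decode(part) for part in PARTS}
```

Calling decode_under_each_part(SAMPLE) gives a Latin, a Cyrillic and a Greek reading of the upper-half byte, while the first byte is "A" in all three. A reader (or a web browser) therefore has to be told which part a document uses, and a registered Tamil part would be selected in exactly the same way.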
Herein, we would like to discuss a possible standard character set for Tamil for eventual registration under the existing ISO 8859-X scheme. Figure 2 presents one possible standard character set following the general pattern of the 8859-X schemes. The character set contains a minimal set of characters or glyphs that one would need in order to type Tamil texts in a form that will be acceptable to the majority of the Tamil community. It is modelled on the 7-bit Tamil font faces of the classical Tamil typewriter and 'wytiwyg' keyboard layouts. Tamil texts can be written in all of the current writing practices and also in forms corresponding to some recent proposals suggesting reforms in Tamil writing practices.
The introduction of standards for Tamil computing can also be an opportunity to introduce some of the proposed reforms in Tamil writing practices. Any reform/revision should be gradual if it is to gain quick world-wide acceptance. Standards proposing drastic reforms will remain on paper and people will continue to write the way they do now. Hence we have to be very careful in deciding the content of the standard character set, in particular the number of glyphs.
Unicode 2.0 and the ISCII standard contain a minimal set of Tamil character glyphs (basic vowels, consonants and a handful of modifier glyphs that are added to consonants to give the uyirmeis). The actual generation of the uyirmeis is left largely to the software. If there is no standard on the actual number of glyphs to be used in Tamil word-processing, the output will be software dependent, and the whole proposed exercise of standardisation will be useless. As mentioned earlier, most of the uyirmei letters of the Tamil language have their own unique geometric shapes/glyphs. Currently most of the 8-bit word-processors designed for professional publishing houses (Tamil newspapers, magazines,...) keep these unique uyirmei glyphs within the 256 character set slots. If we do not specifically include these unique glyphs (many are not easily obtained using kerning techniques), such minimal schemes necessitate writing Tamil in a new, radically revised form!!! To ensure backward compatibility and ready world-wide acceptance, the proposed scheme includes many of these unique uyirmeis as such. We are fortunate that two other paper presentations at this conference (by Muthu Nedumaran and Anbarasan) will specifically address the Tamil standards as envisaged under the above Unicode and ISCII standards respectively.
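The minimal-set approach of Unicode described above can be made concrete with Python's standard unicodedata module. The sketch builds an uyirmei from a consonant and a modifier (vowel sign) and lists the code points that are actually stored. The helper names are made up for this illustration; the code points and their names are those of the Unicode Tamil block.

```python
import unicodedata

KA = "\u0b95"      # TAMIL LETTER KA
SIGN_I = "\u0bbf"  # TAMIL VOWEL SIGN I
SIGN_U = "\u0bc1"  # TAMIL VOWEL SIGN U


def uyirmei(consonant: str, vowel_sign: str) -> str:
    """Build an uyirmei as a consonant followed by a modifier (vowel sign)."""
    return consonant + vowel_sign


def describe(text: str) -> list[str]:
    """List the Unicode names of the code points that make up the text."""
    return [unicodedata.name(ch) for ch in text]
```

Here describe(uyirmei(KA, SIGN_I)) reports two stored code points for what a reader sees as the single letter ki. Which shape finally appears on screen is decided by the font and the rendering software, which is the software dependence discussed above; a character set that keeps the unique uyirmei glyphs in its own slots stores one code per visible letter instead.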

Acknowledgement
Many of the points discussed in this paper were floated and extensively debated in the internet email discussion forum 'tamil.net' during the last couple of months. I would like to thank Mr. Muthu Nedumaran and Mr. Bala Pillai for making this forum available and Muthu in particular for many fruitful dialogues during the past year. I would like to thank all those who participated in these discussions.

Author
Dr. K. Kalyanasundaram is a native of Madras (oops, Chennai), Tamilnadu, where he had his early school and university education. He attended Loyola College, affiliated to the Univ. of Madras, from where he received his B.Sc and M.Sc degrees in Chemistry. This was followed by doctoral thesis research in the area of photochemistry done at the Radiation Laboratory of the Univ. of Notre Dame in Indiana, USA (received Ph.D in Physical Chemistry in 1976). After spending nearly 27 months in London, UK as a post-doctoral research fellow of the Royal Institution of Great Britain, he moved to his present location of Lausanne, Switzerland in 1979. He is a member of the teaching and research staff of the Chemistry Dept. of the Swiss Federal Inst. of Technology (Ecole Polytechnique Federale, as it is known locally in the French-speaking part of Switzerland). While surfing the Internet for a couple of years, he came across the huge amount of electronic archives of ancient literary classics in English. When he could not find anything worth talking about in Tamil available on the Internet, he floated the idea of a 'Tamil Electronic Library' in 1994 on the soc.culture.tamil newsgroup. To facilitate electronic text archiving he developed a Tamil font called Mylai and has distributed several thousand copies of this font free via the Internet. With this font as the base, he started building a collection of etexts of Tamil literary classics and the Tamil Electronic Library web site. Thus grew his interest in various aspects of Tamil computing and its resources on the Internet.

Table 1: A classification of various tools available for Tamil word-processing.

Name | Author | Type | Platform | Remarks
Font faces
[ananku] | P. Kuppuswamy | direct/ttw classical-1 | windows/mac | 7-bit
tamillasr | George Hart | direct/ttw classical-1 | mac | 7-bit, mac
[saraswathi] | Vijayakumar | direct/ttw classical-1 | windows | 7-bit
TMNews | (dinamani) | direct/ttw classical-2 | windows | 7-bit
[Inscrypt] | (softview computers) | direct/ttw classical-3 | windows | Remington ttw?, 8-bit
mylai | K. Kalyanasundaram | direct/wytiwyg1 | windows/mac/unix | 7-bit
mylai-sri | K. Srinivasan | direct/wytiwyg1 | windows, mac | 7-bit/
palladam | T. Govindaraj | direct/wytiwyg2 | windows, mac | 8-bit
valai-sri | K. Srinivasan | direct/wytiwyg3 | windows | 7-bit
trutamil | Raja Seshadri | direct/wytiwyg4 | windows | 8-bit?
Word processors
anjal/inaimathi | Muthu Nedumaran | interpreted/romanized1 | windows, unix | 8-bit, mac font
adhawin, Adami | K. Srinivasan | interpreted/romanized2 | windows | 8-bit
[nalinam] | Sivaguru Chinniah | interpreted/romanized3 | windows | 8-bit
ITrans | Avinash Chopde | interpreted/romanized4 | Unix/windows | 8-bit Washington tamil font
XLibTamil | G. Swaminathan | interpreted/romanized4 | Unix | ?
madurai | Bala Swaminathan | interpreted/romanized2 | unix/PC | ?
PCTamil | Vasu Ranganathan | interpreted/romanized3 | DOS PC | ?
[IE/tamilfix] | Naa. Govindasamy | interpreted/phonetic1 | windows, mac, unix | 8-bit
[Thunaivan] | Ravindran Paul | interpreted/phonetic2 | windows | 8-bit
[Yarzan] | Shanmugalingam | interpreted/phonetic3 | windows | 8-bit
[Shree Lipi] | Modular/CDAC-DOE | interpreted/phonetic4 | windows | 8-bit, multilingual
[Gamma UniType] | ComStar | interpreted/? | windows | multilingual
bharathi | ? | interpreted/? | DOS PC | ?
venus | ? | interpreted/? | windows | ?

i) 1,2,3... indicate variations of the input method/keyboard layout of a given type.
ii) [....] indicates that the tool is of a "proprietary" nature (not "OPEN").
