#osxchat blog

2004/12/25

OpenVanilla Tibetan Input Method

Author: zonble


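After the friends of #osxchat released version 0.61 of OpenVanilla, a Chinese input method framework for Mac OS X, at ICOS2004 last weekend, a friend online who is a devout follower of Tibetan Buddhism, and who needs to type Tibetan, asked whether OpenVanilla could be used to enter Tibetan on a Macintosh. The answer is now yes: on Christmas Eve I finished writing a Tibetan input method that runs in the OpenVanilla environment. Whether a Tibetan input method written by someone who knows no Tibetan at all is actually usable is a big question, though. A friend who is "still learning Tibetan" has run a brief test, but it really needs more testing by people who know Tibetan and use a Macintosh.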

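In the Unicode specification, Tibetan occupies the range starting at U+0F00. Mac OS X's Unicode support is fairly complete, and Tibetan fonts are not a problem either: the Simplified Chinese fonts bundled with Mac OS X include Tibetan glyphs, and fonts such as Tibetan Machine Uni can be downloaded from the web. This post uses some Unicode Tibetan characters; if you have no Tibetan font installed you will not see them, so you may want to download one. In short, displaying Unicode Tibetan on Mac OS X is not a problem. The problem is how to type it.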

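The Tibetan-input page on The Tibetan & Himalayan Digital Library website describes quite a few input methods, but they mostly target Windows, including a Tibetan keyboard for Keyman, which exists only for Windows (and costs money). On Mac OS X there is Tibetan!, but it works only inside Microsoft Word. Wylie Word should also work in Word for Mac, but the download turns out to be a Windows self-extracting archive.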

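Then there is Jskad, an editor written in Java that supports four Tibetan input methods. It is more usable, but it has serious problems of its own. Unless you run Convert All→Convert to Unicode after typing, the Tibetan in Jskad is not Unicode; the editor builds the Tibetan text out of certain ASCII codes. Documents are saved as RTF, but the Tibetan in that RTF is visible only inside Jskad, and it has to be converted before it can be pasted into other applications. Jskad also, surprisingly, has no print function, so it is clearly not very convenient to use.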

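I do not know Tibetan, but after skimming the documentation for Sambhota Keyboard One (one of the four keyboard layouts Jskad provides), I worked out a rough set of rules. Why start with Sambhota Keyboard One? Because it is the keyboard the friend online asked for, so I studied it first.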

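First, Tibetan has vowels and consonants. A vowel apparently cannot stand alone and must attach to a consonant letter; a bare consonant is pronounced with an "a" sound. For example, "ཀ" is "ka", and adding the "e" sound gives "ke", written "ཀེ". Here "k" and "e" are separate code points. Apple's text engine combines the parts smoothly in programs such as TextEdit, but Microsoft Word has trouble with it, and Safari may behave oddly too. So the first step is to collect the Unicode code points and key positions of the vowels and consonants. Typing a consonant enters the composing state familiar from Chinese input methods and puts the consonant in the buffer. In the composing state, typing another consonant commits the previous consonant, while typing a vowel appends the vowel and commits the two together.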
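The buffering rule in this paragraph can be sketched in a few lines of Python. This is only an illustration and not the actual OpenVanilla module: the class name `Composer` and the tiny key tables are made up for the example, and the real Sambhota layout assigns keys differently.

```python
# Hypothetical key tables for illustration; not the real Sambhota layout.
CONSONANTS = {"k": "\u0F40", "g": "\u0F42"}  # TIBETAN LETTER KA, GA
VOWELS = {"i": "\u0F72", "e": "\u0F7A"}      # TIBETAN VOWEL SIGN I, E


class Composer:
    """Holds at most one pending consonant, as described in the text."""

    def __init__(self):
        self.buffer = ""

    def press(self, key):
        """Handle one key and return the text to commit (may be empty)."""
        if key in CONSONANTS:
            # A new consonant commits whatever consonant was pending.
            committed = self.buffer
            self.buffer = CONSONANTS[key]
            return committed
        if key in VOWELS and self.buffer:
            # A vowel attaches to the pending consonant; commit both.
            committed = self.buffer + VOWELS[key]
            self.buffer = ""
            return committed
        # Any other key (or a vowel with nothing pending) just flushes.
        committed = self.buffer
        self.buffer = ""
        return committed


composer = Composer()
output = "".join(composer.press(k) for k in "kek")
output += composer.press("")  # flush the trailing consonant
print(output)
```

Typing "k", "e", "k" commits "ke" as one unit and leaves the second "k" pending until something flushes it.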

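Second, Tibetan has composite letters built from several letters, called stacking in English. I am not sure whether "composite letters" is the right or precise term. Note that when building Tibetan composition inside a Chinese input method framework, "composing" has two meanings. One is the sense in which Boshiamy types "哈" with "OAO", where a key sequence is combined into one corresponding code point; that is the sense used in the previous paragraph. The other is Tibetan's use of several code points to form a single glyph, which is the sense discussed next.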

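Unicode handles stacks without trouble as well. The letters starting at 0x0F40 are the base letters of a stack, and the letters that are automatically placed beneath the base letter are encoded starting at 0x0F90. Sambhota starts a stack when the "f" key is pressed. So if the first key in the keystroke array is "f", the second key emits its original code point, and every key from the third onward has 0x50 added to its code point. The buffer is committed once there are more than four keys or a vowel is pressed, because stacks appear to consist of at most three letters.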
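The 0x50 offset can be shown with a small Python sketch. The function name `build_stack` is hypothetical, it takes the letters typed after the "f" key, and it leaves out the key-count limit described above. It also assumes the offset is applied only to U+0F40 through U+0F69, since the fixed-form letters at the end of the block do not line up the same way.

```python
SUBJOINED_OFFSET = 0x50  # U+0F40 (KA) + 0x50 = U+0F90 (SUBJOINED KA)


def build_stack(letters):
    """Turn base-form consonants into one stack: first as-is, rest subjoined."""
    if not letters:
        return ""
    stacked = [letters[0]]
    for letter in letters[1:]:
        code = ord(letter)
        # Assumption: only U+0F40..U+0F69 are shifted; the offset does not
        # hold for the fixed-form letters at the end of the block.
        if not 0x0F40 <= code <= 0x0F69:
            raise ValueError(f"no subjoined form assumed for U+{code:04X}")
        stacked.append(chr(code + SUBJOINED_OFFSET))
    return "".join(stacked)


# sa (U+0F66) followed by wa (U+0F5D) becomes sa + subjoined wa (U+0FAD).
print([f"U+{ord(c):04X}" for c in build_stack("\u0F66\u0F5D")])
```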

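Third, Tibetan digits stand on their own and map to the keyboard's 1 through 9. There are also symbols such as separators, spaces, and brackets. When one of these is typed, the buffer and the symbol are committed together.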
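A minimal Python sketch of this rule follows. The Tibetan digits sit at U+0F20 through U+0F29 in Unicode, so a digit key can be mapped arithmetically. The function name `commit_with_symbol` and the two punctuation keys are assumptions made for the example; the real layout may assign them differently.

```python
TIBETAN_DIGIT_ZERO = 0x0F20  # TIBETAN DIGIT ZERO; digits run to U+0F29

# Hypothetical punctuation keys; the real layout may differ.
SYMBOLS = {" ": "\u0F0B", "/": "\u0F0D"}  # tsheg, shad


def commit_with_symbol(buffer, key):
    """Return the pending buffer plus the digit or symbol for `key`."""
    if len(key) == 1 and key in "123456789":
        return buffer + chr(TIBETAN_DIGIT_ZERO + int(key))
    if key in SYMBOLS:
        return buffer + SYMBOLS[key]
    raise KeyError(key)


# A pending ka followed by the "3" key commits ka + TIBETAN DIGIT THREE.
print(commit_with_symbol("\u0F40", "3"))
```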

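Fourth, some odd little circles appear above Tibetan letters. For example, "ཀཽ" can take an additional circle on top and become "ཀཽཾ". Sambhota uses & and % for this. The circle has to sit on a letter, not on a symbol, so I resorted to a rather dirty trick: a variable records whether the last committed key was a letter, and & and % take effect only if it was. This has one problem: after typing a character you might use the arrow keys to move back to an earlier character and add a circle there, but the keys will do nothing. Jskad behaves the same way, though, so perhaps this problem is not really a problem?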
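The flag variable described here might look roughly like the following Python sketch. The class name `MarkGate` is invented for the example, and which of the two keys produces which mark is an assumption; only U+0F7E appears in the example above.

```python
# Hypothetical assignment of the two keys to combining marks.
MARKS = {"&": "\u0F7E", "%": "\u0F83"}  # SIGN RJES SU NGA RO, SIGN SNA LDAN


class MarkGate:
    """Remembers whether the last committed key was a letter."""

    def __init__(self):
        self.last_was_letter = False

    def commit_letter(self, text):
        self.last_was_letter = True
        return text

    def commit_symbol(self, text):
        self.last_was_letter = False
        return text

    def press_mark(self, key):
        """Return the mark only when it would land on a letter."""
        if key in MARKS and self.last_was_letter:
            return MARKS[key]
        return ""


gate = MarkGate()
text = gate.commit_letter("\u0F40\u0F7D") + gate.press_mark("&")
print(text)
```

Because the flag tracks only the last committed key, moving the cursor back to an earlier letter does not re-enable the marks, which is the limitation noted above.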

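So, following these rules and using a pile of if and then statements, I got a first Tibetan input environment working in OpenVanilla. I also took a look at the other two keyboards in Jskad, TCC Keyboard 1 and TCC Keyboard 2. Their rules are much the same as Sambhota's and only the key positions differ, so once the current OV Sambhota module is in better shape, a few changes should be enough to turn it into a TCC Keyboard.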

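The Wylie keyboard will be more troublesome. In the keyboards above, every vowel or consonant is one key per letter, but Wylie uses two keys for some Tibetan letters. For example, on the Sambhota keyboard "x" types "ཙ", whereas the Wylie keyboard uses "ts". Sambhota uses upper and lower case to distinguish long and short sounds, while Wylie marks the long sound by adding an "h". I will probably have to rethink how to do this. Nobody has asked for it yet anyway, so I will leave it for now. But if the Mongolian and Tibetan Affairs Commission, the Tibetan government-in-exile, or Tibetan independence supporters were interested in sponsoring the OpenVanilla input method project, I think all of this could be built very quickly. Heh.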
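One possible way to handle multi-key spellings, which is not something the current module does, is greedy longest-match lookup against a spelling table. The Python sketch below uses a handful of standard Wylie consonant spellings; the function name `wylie_to_letters` is hypothetical, and vowels and stacking are left out.

```python
# A few standard Wylie spellings; a real table would cover every letter.
WYLIE = {
    "k": "\u0F40", "kh": "\u0F41",
    "t": "\u0F4F", "th": "\u0F50",
    "ts": "\u0F59", "tsh": "\u0F5A",
    "s": "\u0F66", "h": "\u0F67",
}
MAX_LEN = max(len(k) for k in WYLIE)


def wylie_to_letters(text):
    """Greedy longest-match conversion of Wylie consonant spellings."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "tsh" wins over "ts" and "t".
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in WYLIE:
                out.append(WYLIE[chunk])
                i += size
                break
        else:
            raise ValueError(f"no Wylie spelling starts at {text[i:]!r}")
    return "".join(out)


print(wylie_to_letters("ts"), wylie_to_letters("tsh"))
```

This converts a finished string. A live input method would also have to keep keys in the buffer until the spelling can no longer grow, since "t" might still turn into "ts" or "tsh".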

༄༅།ༀ་ཧྰ་རེ་ཧུཧྷྰ་རེ་ཧུ་རེ་སྰ་ཧྰ།

I typed this Tibetan in SubEthaEdit, copying it directly from a web page. I have no idea what it means.

The Tibetan input method should be released together with the next version of OpenVanilla.


13 comments:

  • After discussing this with 粽老, the conclusion is that for now only Cocoa's TextEdit displays Tibetan correctly, so every other browser is a lost cause.

    Posted by Blogger Mengjuei Hsieh at 12/25/2004 04:58:00 PM

  • -- because stacks appear to consist of at most three letters.

    The maximum is six; see this PDF:

    http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2624.pdf

    Tibetan BrdaRten Character STRIIM
    Tibetan BrdaRten Character BHRUUM

    Also, as I recall, only Cocoa applications such as TextEdit.app, Mail.app, and iCal display Tibetan correctly. As for the Carbon ones... sigh.

    Posted by Anonymous at 12/25/2004 09:26:00 PM

  • Zonble,

    You work fast. That is great.

    Posted by Blogger Q3Q at 12/25/2004 09:28:00 PM

  • Have you seen this site? They also plan a Tibetan input method for OS X, but it has yet to appear. I do not know when they will finally release it.
    http://web.otani.ac.jp/cri/twrp/TLK/tlkhelp-en/index.html

    Posted by Anonymous at 12/26/2004 11:39:00 AM

  • Pardon me -- I hope no one will mind if I post to this discussion in English, as I do not read/write Chinese, though I have been able to make out approximately what is being discussed by copying/pasting some of the messages into my Wenlin dictionary and browsing with its lookup tool.

    I was alerted to this discussion, and the Tibetan entry project, by a post in the Chinese-Mac mailing list (http://groups.yahoo.com/group/chinese-mac/message/5179). I am an amateur scholar of languages, Chinese/Oriental culture and Buddhism living in Santa Fe, New Mexico, U.S.A. I have long been interested in possibilities for working with text in complex Asian script systems such as Chinese, Sanskrit and Tibetan, and have been following with interest the great expansion of such possibilities in OS X, with its Unicode basis.

    I bought the XenoType OS X Tibetan Language Kit when it appeared a couple years ago; it is very well designed, but works with few applications -- all I've found so far are TextEdit and Nisus Writer Express. And it only works with the single font supplied by XenoType. As in the classic Mac OS, the big problem is the complex routines required to combine characters -- which is why Apple's own Indian Language Kit worked only with SimpleText, and not with AppleWorks.

    Anyway, I was moved to post here when I noticed the picture in the beginning post, wherein an example is shown of entering Tibetan text, based on an image at the Nitartha site. It appears that no one in this discussion has noticed that the text was copied/entered incorrectly: the "ta" character has been confused with the "ha" character. The mantra (which is actually not Tibetan, but Sanskrit written in Tibetan letters) transliterates thus:

    oṁ tā re tuttā re tu re svā hā.

    Here it is in Tibetan letters (written with the XTT TLK), whose stacks unfortunately will decompose in transmission:

    ༀ་ཏྰ་རེ་ཏུཏྟྰ་རེ་ཏུ་རེ་སྭྰ་ཧྰ།

    In the picture at the top of this thread, it has been written as:

    oṁ hā re huhhā re hu re svā hā.

    ༀ་ཧྰ་རེ་ཧུཧྷྰ་རེ་ཧུ་རེ་སྭྰ་ཧྰ།

    I suppose whoever is working on this input method is aware of the difference between "ta" and "ha," but since the error seems to have been repeated, I thought I'd mention it.

    In the classic Mac OS I did a little work with the Otani University Tibetan Language Kit mentioned in one post here; it is very basic but adequate to simple Tibetan text entry, but only in the classic Mac OS. I did once send the publisher a query about their kit, and received no answer; I don't have the feeling they have any plans to recreate it to work in OS X.

    So far as I can tell, the real problem with Tibetan, as with the Indic languages (Sanskrit et al.), is not the lack of input methods, which can be remedied fairly easily -- it is that the way these scripts have been implemented in Unicode, there is no way to ensure cross-platform compatibility. That is, a Sanskrit, Hindi or Tibetan text created on the Macintosh cannot be reliably displayed in Windows, and vice-versa. So far as I can tell, the only way to solve this problem would be for software developers to create their own proprietary cross-platform systems: that is, for instance, if XenoType put out a Windows version of their Tibetan Language Kit, then texts created with their kit on either platform could be transferred to the other platform. In other words, in practical terms the cross-platform situation remains exactly as it was before Unicode.

    But unless XenoType, Nitartha/Sambhota, the Tibetan Computer Company (Tony Duff in Nepal), the University of Virginia, and anyone/everyone else working on Tibetan solutions, on any/all platforms (Mac, Windows, Linux, etc.), can be persuaded to code all their systems identically -- which seems no more likely than getting Micro$oft and Apple to agree on how these complex scripts are to be handled -- working with Tibetan text will remain just as problematical as it was before Unicode. In other words, it appears that in the case of these scripts, Unicode has simply failed to solve the problem that Unicode was developed to solve -- but instead has made the problem permanent and effectively insoluble.

    Of course, if everyone were working in Windows, the problem would be much simpler. But I'm not about to give up my Macintosh. Most Tibetan work is done in Windows, and most who are doing the work seem to assume that everyone is working in Windows. Thus the noted paper at (http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2624.pdf), which correctly states that all the stacks can be accommodated in the current Unicode scheme, but doesn't mention the problem of cross-platform transfer.

    BTW, the Nitartha/Sambhota site has said they're working on a system for OS X for at least two years now. I'm not holding my breath anymore. Their font is beautiful, though.

    Andrew Main

    Posted by Anonymous at 12/27/2004 05:14:00 AM

  • Tibetan Machine Uni is an OpenType font, a so-called smart font with built-in instructions for stacking. That is why Tibetan script is rendered beautifully in Apple's TextEdit and CodingMonkey's SubEthaEdit when this font is used, whereas no stacking occurs with MS Arial Unicode.

    Andrew, do you mean that Windows XP and the Mac OS X Aqua text engine interpret the OpenType instructions differently, so that the Tibetan script ends up looking different?

    Posted by Blogger Q3Q at 12/27/2004 02:33:00 PM

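  • Q3Q,

    Failing to stack vowels and consonants vertically is incorrect display. In Safari, press option command v and look at this page's source :)

    Also, relying on a specific font for display is a stopgap. Cocoa applications can also display Tibetan correctly with STHeiti.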


    Andrew,

    A few months ago I asked an engineer, who hinted that in Mac OS X only the Cocoa text rendering engine works correctly for Tibetan, and that there was a bug in the Carbon text rendering engine.

    First of all, we communicate through browsers; unfortunately none of the browsers on Mac OS X can display Tibetan correctly. Safari displays this page incorrectly, but if you view the source code, you can see that consonants and vowels stack vertically.

    Secondly, a Unicode character is a code point. The current problem is that, given the same data, different platforms and applications display it differently. There are two approaches to this problem: one is to let the platforms and applications provide proper text rendering; the other is to define each composed Tibetan character as a code point. Have a look at this PDF:

    www.unihan.com.cn/News/zmlm.pdf



    Lastly, when Tibetan meets applescript:

    in Script Editor, you may try:

    -- script starts

    -- Tibetan mantra
    property cL : ASCII character 199
    property cR : ASCII character 200
    property mantra : "0F000F0B0F4F0FB00F0B0F620F7A0F0B0F4F0F740F4F0F9F0FB00F0B0F620F7A0F0B0F4F0F740F0B0F620F7A0F0B0F660FAD0FB00F0B0F670FB00F0D"

    tell application "TextEdit"
    make new document at front
    set text 1 of document 1 to (run script (cL & "data utxt" & mantra & cR))
    end tell

    -- script ends


    Posted by Anonymous at 12/27/2004 08:40:00 PM

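  • Brother 悲,

    On this font test page
    http://www.babelstone.co.uk/Test/Tibetan.html
    if your system has Tibetan Machine Uni and Arial Unicode MS installed, you will find that Safari also stacks correctly with Tibetan Machine Uni, while with Arial Unicode MS it does not stack. So the problem is probably not Safari's rendering.

    zonble's Blogger page is evidently set to a sans-serif font, and so is my post on oikos. My guess is that when Safari renders these two pages and looks for a usable Tibetan font that matches sans-serif, it picks Arial Unicode MS and shows unstacked Tibetan, which leads us to think something is wrong with Safari. So removing Arial Unicode MS from the system might be one way to fix Safari's incorrect Tibetan display. What do you think?

    Posted by Blogger Q3Q at 12/27/2004 09:46:00 PM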

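  • Q3Q,

    In my humble opinion, relying on a specific font for display is still a stopgap.

    At first my machine had neither Tibetan Machine Uni nor Arial Unicode MS, yet iCal displayed Tibetan correctly. Can iCal even be set to use a specific font?

    The text-rendering engine is very low-level and should bear full responsibility for displaying every Unicode character correctly. Applications should not, and need not, make the user pick a particular font in order to display Tibetan correctly.

    For cross-platform and cross-application purposes, it would be better to define precomposed Tibetan as Unicode characters, each occupying one code point. See the reasons given in this PDF.

    www.unihan.com.cn/News/zmlm.pdf


    Posted by Anonymous at 12/27/2004 11:08:00 PM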

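  • As far as I know, many Tibetans in Tibet use Sambhota as their first choice for Tibetan input. But Sambhota works only in Word, so for many Tibetans it is almost impossible to write Tibetan on web pages with Sambhota.
    If one wants to build web pages displayed with Sambhota, and to support leaving comments and posting on those pages with the Sambhota input method, is there any ready-made software or code for this? If not, how hard would it be to write?
    I also studied software, but I have little programming experience and am unfamiliar with many tools, so I can hardly solve this on my own, which is why I have come to ask the experts here. I would appreciate any advice, and I would be very grateful to anyone willing to help further!

    Posted by Anonymous at 11/03/2005 10:19:00 PM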

  • I am too tired today. I will reply tomorrow (or over the weekend).

    Posted by Blogger zonble at 11/03/2005 10:55:00 PM

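  • Hello, everyone! May I ask whether Sanskrit and Tibetan share the same origin?
    Also, where can input method software for these scripts be purchased? Thank you for your advice!
    fgfh6786@yahoo.com.tw

    Posted by Anonymous at 2/24/2006 10:21:00 PM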

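  • Hello:
    I am using OS X 10.4 and have the Tibetan Machine Uni font installed, but when I type Tibetan in applications such as TextEdit and InDesign, the stacking goes wrong: everything piles into a heap and does not display properly. Is something broken, or is there a setting I need to change?
    For example: ཇྤི་་ཨོཾ.....

    Thank you!

    Posted by Blogger 老狗 at 9/12/2006 09:33:00 PM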
