Some background on the Tibetan language synthesis within MARY
This is a first system for converting written Tibetan text into speech, so that the text can be understood, for example, by blind people or by people who cannot read.
The system is the result of a student project. None of the people involved speaks any Tibetan, so we were unable to determine whether the output is intelligible, whether the system does the right thing, etc. We strongly invite feedback, but please be kind and constructive.
How to enter Tibetan text
Currently, Tibetan text must be entered in Wylie transliteration, more precisely in the Extended Wylie format as defined at THDL. A number of converters from/to Tibetan script in different encodings are available.
Who built this?
The Tibetan language version of the MARY text-to-speech (TTS) synthesizer was built during a software project at DFKI and the department of Computational Linguistics at Saarland University, in summer 2005.
The project was directed by Marc Schröder and involved the following students:
- Maria Staudte (sentence and word detection; syllable structure parsing)
- Anna Hunecke (pronunciation rules and lexicon)
- Jens Apel (tone assignment; intonation)
- Lars Jungjohann (sound inventory; tone realisation; diphone voice mapping)
As none of the participants speaks any Tibetan, all modules and rules were determined based on the available literature, in particular the following two manuals:
Goldstein, M.C., Gelek Rimpoche & Phuntshog, L. (1991). Essentials of modern literary Tibetan. University of California Press.
Tournadre, N. & Dorje, S. (2003). Manual of Standard Tibetan. Snow Lion.
We now invite experts on the Tibetan language, such as Tibetologists and native speakers, to play around with our system and to give us feedback:
- Does the output sound understandable?
- Does it sound as it is supposed to?
- Which of the voices sound best? Which ones sound worst?
- What important mistakes have we made?
- What should urgently be improved?
- (and not least:) What can you do to help?
Who did you build this for?
We built this system in the hope that it might become useful, e.g., in the Braille Without Borders project in Lhasa. The project is clearly non-commercial, and the resulting system is not to be sold. Organisations or institutions that wish to get a (royalty-free) license for the software should contact us.
State of affairs, Limitations
We are very conscious of the fact that the system is crudely simplified in many ways. In the following, you will find a sketch of the current state of affairs.
Word segmentation
A very crude mechanism for finding words consisting of more than one syllable is implemented, relying on a list of particles such as "la" or "gi", and a lexicon of words. Both the list of particles and the word list are mere proofs of concept, and contain only very few entries. Anyone who can give us access to more exhaustive word lists for Tibetan should contact us.
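To illustrate the idea, here is a minimal sketch in Python. It is not the actual MARY implementation: the names `PARTICLES`, `LEXICON` and `segment` are hypothetical, the two lexicon entries are only examples, and the greedy longest-match strategy is an assumption about how a particle list and a word lexicon could be combined.

```python
# Hypothetical, tiny stand-ins for the particle list and the word lexicon.
PARTICLES = {"la", "gi"}
LEXICON = {("bod", "skad"), ("slob", "grwa")}
MAX_WORD_LEN = max(len(entry) for entry in LEXICON)


def segment(syllables):
    """Group a list of Wylie syllables into words (lists of syllables).

    Assumed strategy: a particle always forms a word of its own; otherwise
    the longest lexicon entry starting at the current position wins, and
    any syllable that matches nothing is treated as a one-syllable word.
    """
    words = []
    i = 0
    while i < len(syllables):
        if syllables[i] in PARTICLES:
            words.append([syllables[i]])
            i += 1
            continue
        # Try the longest candidate first, down to two syllables.
        for n in range(min(MAX_WORD_LEN, len(syllables) - i), 1, -1):
            candidate = tuple(syllables[i:i + n])
            if candidate in LEXICON:
                words.append(list(candidate))
                i += n
                break
        else:
            # No multi-syllable entry matched here.
            words.append([syllables[i]])
            i += 1
    return words
```

With these toy lists, `segment("bod skad la".split())` groups "bod skad" into one word and leaves the particle "la" on its own. The quality of such a scheme depends entirely on the coverage of the two lists, which is why larger word lists matter.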
Numbers and other text normalisations
The current system knows nothing of numbers, abbreviations, etc. These will need to be spelled out in full if they are to be pronounced.
Sanskrit
We have concentrated on core Tibetan, and have ignored all Sanskrit characters in Tibetan (those that are represented by capital letters in Extended Wylie). If you can give us a mapping of Sanskrit letters in Tibetan to their pronunciation, let us know. We have also ignored non-standard syllable structures which seem to stem from Sanskrit, such as "padme" or "karma". To pronounce these, write "pad me" and "kar ma".
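If you want to check your input before sending it to the system, a small pre-filter along the following lines may help. This is only a sketch in Python: the function names are hypothetical, and the test simply follows the description above (a capital letter in Extended Wylie marks a Sanskrit character), not any check that MARY itself performs.

```python
def uses_sanskrit_letters(token):
    """Return True if the token contains a capital letter.

    Assumption taken from the text above: capital letters in Extended
    Wylie stand for Sanskrit characters, which the system ignores.
    """
    return any(ch.isupper() for ch in token)


def split_supported(tokens):
    """Separate tokens into (supported, skipped) lists, keeping order."""
    supported = [t for t in tokens if not uses_sanskrit_letters(t)]
    skipped = [t for t in tokens if uses_sanskrit_letters(t)]
    return supported, skipped
```

For example, `split_supported(["pad", "me", "hUM"])` keeps "pad" and "me" and reports "hUM" as a token that would not be pronounced correctly.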
Pronunciation
The pronunciation module relies on the fact that Tibetan, despite its complex spelling, is reasonably regular in its pronunciation. Rules, in particular those formalised in Goldstein's book, have been implemented in an extensible formalism. If you think you can improve on the current rule set, contact us.
Phrasing
Phrasing is very rudimentary at the moment, relying on punctuation marks such as "/" or "_" to mark ends of sentences or phrases. Within sentences, no phrase breaks after particles are currently inserted. Suggestions for improvement are welcome.
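The punctuation-based approach can be sketched as follows in Python. The names `BOUNDARY` and `split_phrases` are hypothetical, and treating "/" and "_" as the only boundary marks is a simplification of what is described above; the real module may recognise further marks.

```python
import re

# Assumed boundary marks: one or more "/" or "_" characters in a row.
BOUNDARY = re.compile(r"[/_]+")


def split_phrases(text):
    """Split Wylie text into phrases, each a list of syllables.

    Phrase boundaries are placed only at the punctuation marks; no breaks
    are inserted after particles, mirroring the current limitation.
    """
    return [chunk.split() for chunk in BOUNDARY.split(text) if chunk.strip()]
```

Using letter names as placeholder syllables, `split_phrases("ka kha / ga nga _ ca")` yields three phrases: "ka kha", "ga nga" and "ca".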
Tones
While Tibetan may have only two tones that distinguish meaning (high and low), it seems that at least four need to be distinguished phonetically: high, low, and a falling variant of each (corresponding to glottalisation).
Textbooks state that these tones are realised only on the first syllable of a multi-syllabic word. To avoid monotonous high or low stretches, we have implemented a "mid-tone" that is realised on all non-initial syllables. Feedback on this is appreciated.
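In code, the mid-tone rule amounts to something like the following Python sketch. The function name `assign_tones` and the tone labels are hypothetical; how the tone of the first syllable is determined is left to the caller, and the sketch says nothing about how the tones are realised acoustically.

```python
# Assumed labels for the four phonetic tones plus the added mid-tone.
LEXICAL_TONES = {"high", "low", "high-falling", "low-falling"}
MID_TONE = "mid"


def assign_tones(word_syllables, first_syllable_tone):
    """Pair each syllable of one word with a tone label.

    Only the first syllable carries the given lexical tone; every
    non-initial syllable receives the mid-tone.
    """
    if first_syllable_tone not in LEXICAL_TONES:
        raise ValueError(f"unknown tone: {first_syllable_tone}")
    return [
        (syllable, first_syllable_tone if index == 0 else MID_TONE)
        for index, syllable in enumerate(word_syllables)
    ]
```

For a two-syllable word whose first syllable is passed in as "low", the result pairs the first syllable with "low" and the second with "mid".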