marytts.tools.newlanguage
Class LexiconCreator
java.lang.Object
marytts.tools.newlanguage.LexiconCreator
- Direct Known Subclasses:
- CMUDict2MaryFST
public class LexiconCreator
- extends java.lang.Object
The LexiconCreator is the base class for creating the files needed to run the
phonemiser component for a new language. From a list of phonetically transcribed
words, the class will create:
- a lexicon file, efficiently stored as a Finite State Transducer;
- a letter-to-sound prediction file, as a decision tree in MARY format.
The input file is expected to contain data in the following format:
grapheme | ' a l - l o - p h o n e s | (optional-part-of-speech)
The allophones must belong to the allophone set passed to the constructor.
The file's encoding is expected to be UTF-8.
Subclasses of LexiconCreator can override prepareLexicon() to provide data in this format.
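To make the input format above concrete, here is a minimal sketch of splitting one lexicon line into its fields. The class LexiconLineSketch is hypothetical and not part of MARY TTS; it assumes the fields are separated by the "|" character, that the transcription symbols are separated by whitespace, and that the third field, when present, is kept as raw text. The sample lines in the usage note are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of MARY TTS): one parsed lexicon line. */
final class LexiconLineSketch {
    final String grapheme;
    final List<String> symbols;   // whitespace-separated transcription tokens
    final String partOfSpeech;    // null when the optional third field is absent

    private LexiconLineSketch(String grapheme, List<String> symbols, String partOfSpeech) {
        this.grapheme = grapheme;
        this.symbols = symbols;
        this.partOfSpeech = partOfSpeech;
    }

    /** Splits "grapheme | transcription | optional-pos" into its parts. */
    static LexiconLineSketch parse(String line) {
        // Limit -1 keeps trailing empty fields so the field count is reliable.
        String[] fields = line.split("\\|", -1);
        if (fields.length < 2 || fields.length > 3) {
            throw new IllegalArgumentException("Expected 2 or 3 fields: " + line);
        }
        String grapheme = fields[0].trim();
        String transcription = fields[1].trim();
        if (grapheme.isEmpty() || transcription.isEmpty()) {
            throw new IllegalArgumentException("Empty grapheme or transcription: " + line);
        }
        List<String> symbols = Arrays.asList(transcription.split("\\s+"));
        String pos = null;
        if (fields.length == 3 && !fields[2].trim().isEmpty()) {
            pos = fields[2].trim();
        }
        return new LexiconLineSketch(grapheme, symbols, pos);
    }
}
```

For example, LexiconLineSketch.parse("cat | ' k { t") yields the grapheme "cat", five transcription tokens, and no part of speech. Checking each token against the allophone set is left out here, since that depends on the AllophoneSet API.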
- Author:
- marc
- See Also:
AllophoneSet
Constructor Summary
LexiconCreator(AllophoneSet allophoneSet,
java.lang.String lexiconFilename,
java.lang.String fstFilename,
java.lang.String ltsFilename)
Initialise a new lexicon creator.
LexiconCreator(AllophoneSet allophoneSet,
java.lang.String lexiconFilename,
java.lang.String fstFilename,
java.lang.String ltsFilename,
boolean convertToLowercase,
boolean predictStress,
int context)
Initialise a new lexicon creator.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
logger
protected org.apache.log4j.Logger logger
allophoneSet
protected AllophoneSet allophoneSet
lexiconFilename
protected java.lang.String lexiconFilename
fstFilename
protected java.lang.String fstFilename
ltsFilename
protected java.lang.String ltsFilename
convertToLowercase
protected boolean convertToLowercase
predictStress
protected boolean predictStress
context
protected int context
LexiconCreator
public LexiconCreator(AllophoneSet allophoneSet,
java.lang.String lexiconFilename,
java.lang.String fstFilename,
java.lang.String ltsFilename)
- Initialise a new lexicon creator.
Letter-to-sound rules built with this lexicon creator use the following defaults:
they convert graphemes to lowercase before prediction, using the locale given in the allophone set;
they predict stress;
they use a context of 2 characters to the left and to the right of the current character
as predictive features.
- Parameters:
allophoneSet - this specifies the set of phonetic symbols that can be used in the lexicon, and provides the locale of the lexicon
lexiconFilename - where to find the plain-text lexicon
fstFilename - where to create the compressed lexicon FST file
ltsFilename - where to create the letter-to-sound prediction tree.
LexiconCreator
public LexiconCreator(AllophoneSet allophoneSet,
java.lang.String lexiconFilename,
java.lang.String fstFilename,
java.lang.String ltsFilename,
boolean convertToLowercase,
boolean predictStress,
int context)
- Initialise a new lexicon creator.
- Parameters:
allophoneSet - this specifies the set of phonetic symbols that can be used in the lexicon, and provides the locale of the lexicon
lexiconFilename - where to find the plain-text lexicon
fstFilename - where to create the compressed lexicon FST file
ltsFilename - where to create the letter-to-sound prediction tree.
convertToLowercase - if true, letter-to-sound rules built with this lexicon creator will convert graphemes to lowercase before prediction, using the locale given in the allophone set.
predictStress - if true, letter-to-sound rules will predict stress.
context - the number of characters to the left and to the right of the current character that will be used as predictive features.
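As an illustration of what convertToLowercase and context mean for the predictive features, the following sketch collects the characters around one position of a word. It is not the MARY TTS implementation: the class ContextWindowSketch, the method features, and the "_" padding symbol used beyond the word boundaries are all assumptions made for this example.

```java
import java.util.Locale;

/** Hypothetical illustration (not MARY TTS code) of a character context window. */
final class ContextWindowSketch {
    // Assumed padding symbol for positions outside the word.
    static final String PADDING = "_";

    /**
     * Returns the characters from position - context to position + context,
     * with the current character in the middle of the returned array.
     * The position refers to the word after optional lowercasing.
     */
    static String[] features(String word, int position, int context,
                             boolean convertToLowercase, Locale locale) {
        String w = convertToLowercase ? word.toLowerCase(locale) : word;
        if (context < 0 || position < 0 || position >= w.length()) {
            throw new IllegalArgumentException("Invalid position or context");
        }
        String[] window = new String[2 * context + 1];
        for (int offset = -context; offset <= context; offset++) {
            int i = position + offset;
            boolean inside = i >= 0 && i < w.length();
            window[offset + context] = inside ? String.valueOf(w.charAt(i)) : PADDING;
        }
        return window;
    }
}
```

With a context of 2, the first letter of "Hello" with lowercasing enabled gives the window "_", "_", "h", "e", "l": two padded positions to the left, the current character, and two characters to the right.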
prepareLexicon
protected void prepareLexicon()
throws java.io.IOException
- This base implementation does nothing. Subclasses can override this method
to prepare a lexicon in the expected format, which should then be found at
lexiconFilename.
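A rough sketch of the kind of conversion an overriding prepareLexicon() might perform is shown here as a standalone class, so that it runs without MARY TTS on the classpath. Everything in it is hypothetical: the class PrepareLexiconSketch, its method names, and the tab-separated source format ("word<TAB>transcription") are assumptions chosen for illustration. In a real subclass, the output path would be the lexiconFilename field.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical converter (not part of MARY TTS) producing the expected lexicon format. */
final class PrepareLexiconSketch {

    /** Converts an assumed "word<TAB>transcription" line to "word | transcription". */
    static String toLexiconLine(String sourceLine) {
        int tab = sourceLine.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("Expected word<TAB>transcription: " + sourceLine);
        }
        String word = sourceLine.substring(0, tab).trim();
        // Collapse runs of whitespace so symbols are separated by single spaces.
        String transcription = sourceLine.substring(tab + 1).trim().replaceAll("\\s+", " ");
        return word + " | " + transcription;
    }

    /** Writes the converted lines as UTF-8, the encoding the lexicon file is expected to use. */
    static void writeLexicon(List<String> sourceLines, Path lexiconFile) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(lexiconFile, StandardCharsets.UTF_8)) {
            for (String line : sourceLines) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines in the source
                }
                out.write(toLexiconLine(line));
                out.newLine();
            }
        }
    }
}
```

The sketch omits the optional part-of-speech field and does not validate symbols against the allophone set; a subclass would add both if its source data calls for them.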
- Throws:
java.io.IOException
compileFST
protected void compileFST()
throws java.io.IOException
- Throws:
java.io.IOException
testFST
protected void testFST()
throws java.io.IOException
- Throws:
java.io.IOException
compileLTS
protected void compileLTS()
throws java.io.IOException
- Throws:
java.io.IOException
testLTS
protected void testLTS()
throws java.io.IOException,
MaryConfigurationException
- Throws:
java.io.IOException
MaryConfigurationException
createLexicon
public void createLexicon()
throws java.lang.Exception
- Throws:
java.lang.Exception
main
public static void main(java.lang.String[] args)
throws java.lang.Exception
- Parameters:
args -
- Throws:
java.lang.Exception