de.dfki.lt.mary.unitselection.featureprocessors
Class FeatureDefinition

java.lang.Object
  extended by de.dfki.lt.mary.unitselection.featureprocessors.FeatureDefinition

public class FeatureDefinition
extends java.lang.Object

A feature definition object represents the "meaning" of feature vectors. It consists of a list of byte-valued, short-valued and continuous features, identified by name and by index position in the feature vector; the possible values of each byte-valued and short-valued feature (and the corresponding byte and short codes); and, optionally, a weight for each feature and, for continuous features, a weighting function.
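
The following minimal Java sketch makes this layout concrete. The class FeatureDefinitionModel and the feature names ("phoneme", "stressed", "word", "unit_duration") are hypothetical and not part of MARY. The sketch also assumes that global feature indexes list the byte-valued features first, then the short-valued ones, then the continuous ones. That ordering is consistent with the index ranges documented for getNumberOfValues(int), but it is an assumption here.

```java
import java.util.List;

// Hypothetical, minimal model of a feature definition; this is NOT MARY's FeatureDefinition.
// Assumption: global indexes list byte features first, then short, then continuous features.
final class FeatureDefinitionModel {
    private final List<String> byteNames;
    private final List<String> shortNames;
    private final List<String> continuousNames;

    FeatureDefinitionModel(List<String> byteNames, List<String> shortNames,
                           List<String> continuousNames) {
        this.byteNames = byteNames;
        this.shortNames = shortNames;
        this.continuousNames = continuousNames;
    }

    // Total number of features = byte + short + continuous.
    int getNumberOfFeatures() {
        return byteNames.size() + shortNames.size() + continuousNames.size();
    }

    // Maps a global feature index to its name by walking through the three groups in order.
    String getFeatureName(int index) {
        if (index < 0 || index >= getNumberOfFeatures()) {
            throw new IndexOutOfBoundsException("Invalid feature index: " + index);
        }
        if (index < byteNames.size()) {
            return byteNames.get(index);
        }
        index -= byteNames.size();
        if (index < shortNames.size()) {
            return shortNames.get(index);
        }
        return continuousNames.get(index - shortNames.size());
    }

    public static void main(String[] args) {
        FeatureDefinitionModel def = new FeatureDefinitionModel(
                List.of("phoneme", "stressed"), List.of("word"), List.of("unit_duration"));
        System.out.println(def.getNumberOfFeatures()); // 4
        System.out.println(def.getFeatureName(2));     // word
    }
}
```
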

Author:
Marc Schröder

Field Summary
static java.lang.String BYTEFEATURES
           
static java.lang.String CONTINUOUSFEATURES
           
static java.lang.String EDGEFEATURE
           
static java.lang.String EDGEFEATURE_END
           
static java.lang.String EDGEFEATURE_START
           
static java.lang.String NULLVALUE
           
static java.lang.String SHORTFEATURES
           
static char WEIGHT_SEPARATOR
           
 
Constructor Summary
FeatureDefinition(java.io.BufferedReader input, boolean readWeights)
          Create a feature definition object, reading textual data from the given BufferedReader.
FeatureDefinition(java.io.DataInput input)
          Create a feature definition object, reading binary data from the given DataInput.
 
Method Summary
 FeatureVector createEdgeFeatureVector(int unitIndex, boolean start)
          Create a feature vector that marks a start or end of a unit.
static int diff(FeatureVector v1, FeatureVector v2)
          Compares two feature vectors in terms of how many discrete features they have in common.
 boolean equals(FeatureDefinition other)
          Determine whether two feature definitions are equal, regarding both the actual feature definitions and the weights.
 boolean featureEquals(FeatureDefinition other)
          Determine whether two feature definitions are equal, with respect to number, names, and possible values of the three kinds of features (byte-valued, short-valued, continuous).
 java.lang.String featureEqualsAnalyse(FeatureDefinition other)
          An extension of featureEquals(FeatureDefinition) that returns a String rather than a boolean.
 void generateAllDotDescForWagon(java.io.PrintWriter out)
          Export this feature definition in the "all.desc" format which can be read by wagon.
 void generateAllDotDescForWagon(java.io.PrintWriter out, java.util.Set featuresToIgnore)
          Export this feature definition in the "all.desc" format which can be read by wagon.
 void generateFeatureWeightsFile(java.io.PrintWriter out)
          Print this feature definition plus weights to the given PrintWriter (typically a .txt file).
 int getFeatureIndex(java.lang.String featureName)
          Translate between a feature name and a feature index.
 int[] getFeatureIndexArray(java.lang.String[] featureName)
          Translate between an array of feature names and an array of feature indexes.
 java.lang.String getFeatureName(int index)
          Translate between a feature index and a feature name.
 java.lang.String[] getFeatureNameArray(int[] index)
          Translate between an array of feature indexes and an array of feature names.
 java.lang.String getFeatureNames()
          List all feature names, separated by white space, in their order of definition.
 byte getFeatureValueAsByte(int featureIndex, java.lang.String value)
          For the feature with the given index number, translate its String value to its byte value.
 byte getFeatureValueAsByte(java.lang.String featureName, java.lang.String value)
          For the feature with the given name, translate its String value to its byte value.
 short getFeatureValueAsShort(int featureIndex, java.lang.String value)
          For the feature with the given index number, translate its String value to its short value.
 short getFeatureValueAsShort(java.lang.String featureName, java.lang.String value)
          For the feature with the given name, translate its String value to its short value.
 java.lang.String getFeatureValueAsString(int featureIndex, int value)
          For the feature with the given index number, translate its byte or short value to its String value.
 int getNumberOfByteFeatures()
          Get the number of byte features.
 int getNumberOfContinuousFeatures()
          Get the number of continuous features.
 int getNumberOfFeatures()
          Get the total number of features.
 int getNumberOfShortFeatures()
          Get the number of short features.
 int getNumberOfValues(int featureIndex)
          Get the number of possible values for the feature with the given index number.
 java.lang.String[] getPossibleValues(int featureIndex)
          Get the list of possible String values for the feature with the given index number.
 float getWeight(int featureIndex)
          For the feature with the given index, return the weight.
 java.lang.String getWeightFunctionName(int featureIndex)
          Get the name of any weighting function associated with the given feature index.
 boolean isByteFeature(int index)
          Determine whether the feature with the given index number is a byte feature.
 boolean isByteFeature(java.lang.String featureName)
          Determine whether the feature with the given name is a byte feature.
 boolean isContinuousFeature(int index)
          Determine whether the feature with the given index number is a continuous feature.
 boolean isContinuousFeature(java.lang.String featureName)
          Determine whether the feature with the given name is a continuous feature.
 boolean isShortFeature(int index)
          Determine whether the feature with the given index number is a short feature.
 boolean isShortFeature(java.lang.String featureName)
          Determine whether the feature with the given name is a short feature.
 FeatureVector readFeatureVector(int currentUnitIndex, java.io.DataInput input)
          Create a feature vector consistent with this feature definition by reading the data from the given input.
 java.lang.String toFeatureString(FeatureVector fv)
          Convert a feature vector into a String representation.
 FeatureVector toFeatureVector(int unitIndex, java.lang.String featureString)
          Create a feature vector consistent with this feature definition by reading the data from a String representation.
 void writeBinaryTo(java.io.DataOutput out)
          Write this feature definition in binary format to the given output.
 void writeTo(java.io.PrintWriter out, boolean writeWeights)
          Export this feature definition in the text format which can also be read by this class.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BYTEFEATURES

public static final java.lang.String BYTEFEATURES
See Also:
Constant Field Values

SHORTFEATURES

public static final java.lang.String SHORTFEATURES
See Also:
Constant Field Values

CONTINUOUSFEATURES

public static final java.lang.String CONTINUOUSFEATURES
See Also:
Constant Field Values

WEIGHT_SEPARATOR

public static final char WEIGHT_SEPARATOR
See Also:
Constant Field Values

EDGEFEATURE

public static final java.lang.String EDGEFEATURE
See Also:
Constant Field Values

EDGEFEATURE_START

public static final java.lang.String EDGEFEATURE_START
See Also:
Constant Field Values

EDGEFEATURE_END

public static final java.lang.String EDGEFEATURE_END
See Also:
Constant Field Values

NULLVALUE

public static final java.lang.String NULLVALUE
See Also:
Constant Field Values

Constructor Detail

FeatureDefinition

public FeatureDefinition(java.io.BufferedReader input,
                         boolean readWeights)
                  throws java.io.IOException
Create a feature definition object, reading textual data from the given BufferedReader.

Parameters:
input - a BufferedReader from which a textual feature definition can be read.
readWeights - a boolean indicating whether or not to read weights from input. If weights are read, they will be normalized so that they sum to one.
Throws:
java.io.IOException - if a reading problem occurs
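
Normalizing weights so that they sum to one amounts to dividing each weight by the total. The sketch below illustrates this; WeightNormalizer is a hypothetical helper, not MARY code, and rejecting negative or all-zero weights is an assumption (getWeight(int) is documented to return a non-negative weight).

```java
import java.util.Arrays;

// Hypothetical helper (not MARY code): scales raw weights so that they sum to one.
final class WeightNormalizer {
    static float[] normalize(float[] raw) {
        double sum = 0;
        for (float w : raw) {
            // Assumption: weights must be non-negative.
            if (w < 0) {
                throw new IllegalArgumentException("Negative weight: " + w);
            }
            sum += w;
        }
        if (sum == 0) {
            throw new IllegalArgumentException("Weights sum to zero; cannot normalize");
        }
        float[] normalized = new float[raw.length];
        for (int i = 0; i < raw.length; i++) {
            normalized[i] = (float) (raw[i] / sum);
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new float[] {1f, 1f, 2f}))); // [0.25, 0.25, 0.5]
    }
}
```
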

FeatureDefinition

public FeatureDefinition(java.io.DataInput input)
                  throws java.io.IOException
Create a feature definition object, reading binary data from the given DataInput.

Parameters:
input - a DataInputStream or a RandomAccessFile from which a binary feature definition can be read.
Throws:
java.io.IOException - if a reading problem occurs

Method Detail

writeBinaryTo

public void writeBinaryTo(java.io.DataOutput out)
                   throws java.io.IOException
Write this feature definition in binary format to the given output.

Parameters:
out - a DataOutputStream or RandomAccessFile to which the FeatureDefinition should be written.
Throws:
java.io.IOException - if a problem occurs while writing.
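
Because writeBinaryTo and the binary constructor work against the DataOutput and DataInput interfaces, any implementation (DataOutputStream/DataInputStream, RandomAccessFile) can be used. The sketch below shows such a symmetric write/read pair. The layout (an int count followed by one writeUTF string per name) is purely hypothetical and is NOT MARY's actual binary format; BinaryNamesSketch is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical binary layout (NOT MARY's format): an int count, then one
// modified-UTF-8 string per feature name, written and read in the same order.
final class BinaryNamesSketch {
    static void write(List<String> names, DataOutput out) throws IOException {
        out.writeInt(names.size());
        for (String name : names) {
            out.writeUTF(name);
        }
    }

    static List<String> read(DataInput in) throws IOException {
        int count = in.readInt();
        List<String> names = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            names.add(in.readUTF());
        }
        return names;
    }

    // Writes to an in-memory buffer and reads the result back.
    static List<String> roundTrip(List<String> names) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(names, new DataOutputStream(buffer));
            return read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(List.of("phoneme", "word"))); // [phoneme, word]
    }
}
```
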

getNumberOfFeatures

public int getNumberOfFeatures()
Get the total number of features.

Returns:
the number of features

getNumberOfByteFeatures

public int getNumberOfByteFeatures()
Get the number of byte features.

Returns:
the number of byte features

getNumberOfShortFeatures

public int getNumberOfShortFeatures()
Get the number of short features.

Returns:
the number of short features

getNumberOfContinuousFeatures

public int getNumberOfContinuousFeatures()
Get the number of continuous features.

Returns:
the number of continuous features

getWeight

public float getWeight(int featureIndex)
For the feature with the given index, return the weight.

Parameters:
featureIndex - the index number of the feature.
Returns:
a non-negative weight.

getWeightFunctionName

public java.lang.String getWeightFunctionName(int featureIndex)
Get the name of any weighting function associated with the given feature index. For byte-valued and short-valued features, this method will always return null; for continuous features, the method will return the name of a weighting function, or null.

Parameters:
featureIndex - the index number of the feature.
Returns:
the name of a weighting function, or null

getFeatureName

public java.lang.String getFeatureName(int index)
Translate between a feature index and a feature name.

Parameters:
index - a feature index, as could be used to access a feature value in a FeatureVector.
Returns:
the name of the feature corresponding to the index
Throws:
java.lang.IndexOutOfBoundsException - if index < 0 or index >= getNumberOfFeatures()

getFeatureNameArray

public java.lang.String[] getFeatureNameArray(int[] index)
Translate between an array of feature indexes and an array of feature names.

Parameters:
index - an array of feature indexes, as could be used to access a feature value in a FeatureVector.
Returns:
an array with the name of the features corresponding to the index
Throws:
java.lang.IndexOutOfBoundsException - if any of the indexes is < 0 or >= getNumberOfFeatures()

getFeatureNames

public java.lang.String getFeatureNames()
List all feature names, separated by white space, in their order of definition.

Returns:
a String containing all feature names, separated by white space.

isByteFeature

public boolean isByteFeature(java.lang.String featureName)
Determine whether the feature with the given name is a byte feature.

Parameters:
featureName - the name of the feature.
Returns:
true if the feature is a byte feature, false if the feature is not known or is not a byte feature

isByteFeature

public boolean isByteFeature(int index)
Determine whether the feature with the given index number is a byte feature.

Parameters:
index - the index number of the feature.
Returns:
true if the feature is a byte feature, false if the feature is not a byte feature or is invalid

isShortFeature

public boolean isShortFeature(java.lang.String featureName)
Determine whether the feature with the given name is a short feature.

Parameters:
featureName - the name of the feature.
Returns:
true if the feature is a short feature, false if the feature is not known or is not a short feature

isShortFeature

public boolean isShortFeature(int index)
Determine whether the feature with the given index number is a short feature.

Parameters:
index - the index number of the feature.
Returns:
true if the feature is a short feature, false if the feature is not a short feature or is invalid

isContinuousFeature

public boolean isContinuousFeature(java.lang.String featureName)
Determine whether the feature with the given name is a continuous feature.

Parameters:
featureName - the name of the feature.
Returns:
true if the feature is a continuous feature, false if the feature is not known or is not a continuous feature

isContinuousFeature

public boolean isContinuousFeature(int index)
Determine whether the feature with the given index number is a continuous feature.

Parameters:
index - the index number of the feature.
Returns:
true if the feature is a continuous feature, false if the feature is not a continuous feature or is invalid
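
Under the assumption that byte features occupy the lowest indexes, followed by short and then continuous features, the index-based isXxxFeature checks reduce to range tests. This is a hypothetical sketch (FeatureKinds is not a MARY class); invalid indexes simply yield false, matching the return values documented above.

```java
// Hypothetical sketch: classify a global feature index, assuming byte features occupy
// [0, numByte), short features [numByte, numByte + numShort), continuous features the rest.
final class FeatureKinds {
    private final int numByte;
    private final int numShort;
    private final int numContinuous;

    FeatureKinds(int numByte, int numShort, int numContinuous) {
        this.numByte = numByte;
        this.numShort = numShort;
        this.numContinuous = numContinuous;
    }

    // Negative or too-large indexes fall outside every range and yield false.
    boolean isByteFeature(int index) {
        return index >= 0 && index < numByte;
    }

    boolean isShortFeature(int index) {
        return index >= numByte && index < numByte + numShort;
    }

    boolean isContinuousFeature(int index) {
        int first = numByte + numShort;
        return index >= first && index < first + numContinuous;
    }

    public static void main(String[] args) {
        FeatureKinds kinds = new FeatureKinds(2, 1, 3);
        System.out.println(kinds.isShortFeature(2));      // true
        System.out.println(kinds.isContinuousFeature(6)); // false (out of range)
    }
}
```
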

getFeatureIndex

public int getFeatureIndex(java.lang.String featureName)
Translate between a feature name and a feature index.

Parameters:
featureName - a valid feature name
Returns:
a feature index, as could be used to access a feature value in a FeatureVector.
Throws:
java.lang.IllegalArgumentException - if the feature name is unknown.

getFeatureIndexArray

public int[] getFeatureIndexArray(java.lang.String[] featureName)
Translate between an array of feature names and an array of feature indexes.

Parameters:
featureName - an array of valid feature names
Returns:
an array of feature indexes, as could be used to access a feature value in a FeatureVector.
Throws:
java.lang.IllegalArgumentException - if one of the feature names is unknown.
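
Translating between names and indexes in both directions can be sketched with a list (index to name) plus a hash map (name to index). FeatureIndexLookup is a hypothetical class, not MARY's implementation; it only reproduces the documented contract: unknown names raise IllegalArgumentException, and out-of-range indexes raise IndexOutOfBoundsException.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical name <-> index translation (not MARY's implementation).
final class FeatureIndexLookup {
    private final List<String> names;
    private final Map<String, Integer> indexByName = new HashMap<>();

    FeatureIndexLookup(List<String> names) {
        this.names = names;
        for (int i = 0; i < names.size(); i++) {
            indexByName.put(names.get(i), i);
        }
    }

    int getFeatureIndex(String featureName) {
        Integer index = indexByName.get(featureName);
        if (index == null) {
            throw new IllegalArgumentException("Unknown feature: " + featureName);
        }
        return index;
    }

    int[] getFeatureIndexArray(String[] featureNames) {
        int[] result = new int[featureNames.length];
        for (int i = 0; i < featureNames.length; i++) {
            result[i] = getFeatureIndex(featureNames[i]);
        }
        return result;
    }

    // List.get throws IndexOutOfBoundsException when index < 0 or index >= size().
    String getFeatureName(int index) {
        return names.get(index);
    }

    public static void main(String[] args) {
        FeatureIndexLookup lookup = new FeatureIndexLookup(List.of("phoneme", "stressed", "word"));
        System.out.println(lookup.getFeatureIndex("word")); // 2
        System.out.println(lookup.getFeatureName(1));       // stressed
    }
}
```
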

getNumberOfValues

public int getNumberOfValues(int featureIndex)
Get the number of possible values for the feature with the given index number. This method must only be called for byte-valued or short-valued features.

Parameters:
featureIndex - the index number of the feature.
Returns:
for byte-valued and short-valued features, return the number of values.
Throws:
java.lang.IndexOutOfBoundsException - if featureIndex < 0 or featureIndex >= getNumberOfByteFeatures() + getNumberOfShortFeatures().

getPossibleValues

public java.lang.String[] getPossibleValues(int featureIndex)
Get the list of possible String values for the feature with the given index number. This method must only be called for byte-valued or short-valued features. The position in the String array corresponds to the byte or short value of the feature obtained from a FeatureVector.

Parameters:
featureIndex - the index number of the feature.
Returns:
for byte-valued and short-valued features, return the array of String values.
Throws:
java.lang.IndexOutOfBoundsException - if featureIndex < 0 or featureIndex >= getNumberOfByteFeatures() + getNumberOfShortFeatures().

getFeatureValueAsString

public java.lang.String getFeatureValueAsString(int featureIndex,
                                                int value)
For the feature with the given index number, translate its byte or short value to its String value. This method must only be called for byte-valued or short-valued features.

Parameters:
featureIndex - the index number of the feature.
value - the feature value. This must be in the range of acceptable values for the given feature.
Returns:
for byte-valued and short-valued features, return the String representation of the feature value.
Throws:
java.lang.IndexOutOfBoundsException - if featureIndex < 0 or featureIndex >= getNumberOfByteFeatures() + getNumberOfShortFeatures()
java.lang.IndexOutOfBoundsException - if value is not a legal value for this feature
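
Since the position in the possible-values array is the feature's byte or short code, getNumberOfValues and getFeatureValueAsString amount to the array length and an array lookup with a range check. ValueTable and the values "0", "a", "e" are hypothetical; the sketch covers a single discrete feature.

```java
// Hypothetical value table for ONE discrete feature: the array position is the code.
final class ValueTable {
    private final String[] possibleValues;

    ValueTable(String... possibleValues) {
        this.possibleValues = possibleValues;
    }

    int getNumberOfValues() {
        return possibleValues.length;
    }

    // Translates a byte or short code (widened to int) back to its String value.
    String getFeatureValueAsString(int value) {
        if (value < 0 || value >= possibleValues.length) {
            throw new IndexOutOfBoundsException("Illegal value: " + value);
        }
        return possibleValues[value];
    }

    public static void main(String[] args) {
        ValueTable vowel = new ValueTable("0", "a", "e");
        System.out.println(vowel.getNumberOfValues());         // 3
        System.out.println(vowel.getFeatureValueAsString(2)); // e
    }
}
```
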

getFeatureValueAsByte

public byte getFeatureValueAsByte(java.lang.String featureName,
                                  java.lang.String value)
For the feature with the given name, translate its String value to its byte value. This method must only be called for byte-valued features.

Parameters:
featureName - the name of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
Returns:
for byte-valued features, return the byte representation of the feature value.
Throws:
java.lang.IllegalArgumentException - if featureName is not a valid feature name, or if featureName is not a byte-valued feature.
java.lang.IllegalArgumentException - if value is not a legal value for this feature

getFeatureValueAsByte

public byte getFeatureValueAsByte(int featureIndex,
                                  java.lang.String value)
For the feature with the given index number, translate its String value to its byte value. This method must only be called for byte-valued features.

Parameters:
featureIndex - the index number of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
Returns:
for byte-valued features, return the byte representation of the feature value.
Throws:
java.lang.IllegalArgumentException - if featureIndex is not a valid feature index, or if it does not refer to a byte-valued feature.
java.lang.IllegalArgumentException - if value is not a legal value for this feature

getFeatureValueAsShort

public short getFeatureValueAsShort(java.lang.String featureName,
                                    java.lang.String value)
For the feature with the given name, translate its String value to its short value. This method must only be called for short-valued features.

Parameters:
featureName - the name of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
Returns:
for short-valued features, return the short representation of the feature value.
Throws:
java.lang.IllegalArgumentException - if featureName is not a valid feature name, or if featureName is not a short-valued feature.
java.lang.IllegalArgumentException - if value is not a legal value for this feature

getFeatureValueAsShort

public short getFeatureValueAsShort(int featureIndex,
                                    java.lang.String value)
For the feature with the given index number, translate its String value to its short value. This method must only be called for short-valued features.

Parameters:
featureIndex - the index number of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
Returns:
for short-valued features, return the short representation of the feature value.
Throws:
java.lang.IllegalArgumentException - if featureIndex is not a valid feature index, or if it does not refer to a short-valued feature.
java.lang.IllegalArgumentException - if value is not a legal value for this feature
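
The reverse direction, from String value to byte or short code, is a position lookup in the possible-values array. The sketch below is hypothetical (ValueEncoder is not MARY code). It also assumes that codes are non-negative and must fit the signed range of the target type; the real class may handle the byte range differently.

```java
import java.util.Arrays;

// Hypothetical sketch: a value's code is its position in the possible-values array.
// Assumption: codes must fit the signed range of the target type.
final class ValueEncoder {
    static int codeOf(String[] possibleValues, String value) {
        int index = Arrays.asList(possibleValues).indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Illegal value: " + value);
        }
        return index;
    }

    static byte asByte(String[] possibleValues, String value) {
        int c = codeOf(possibleValues, value);
        if (c > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("Code " + c + " does not fit in a byte");
        }
        return (byte) c;
    }

    static short asShort(String[] possibleValues, String value) {
        int c = codeOf(possibleValues, value);
        if (c > Short.MAX_VALUE) {
            throw new IllegalArgumentException("Code " + c + " does not fit in a short");
        }
        return (short) c;
    }

    public static void main(String[] args) {
        String[] vowels = {"0", "a", "e"};
        System.out.println(asByte(vowels, "e"));  // 2
        System.out.println(asShort(vowels, "a")); // 1
    }
}
```
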

featureEquals

public boolean featureEquals(FeatureDefinition other)
Determine whether two feature definitions are equal, with respect to number, names, and possible values of the three kinds of features (byte-valued, short-valued, continuous). This method does not compare any weights.

Parameters:
other - the feature definition to compare to
Returns:
true if all features and values are identical, false otherwise

featureEqualsAnalyse

public java.lang.String featureEqualsAnalyse(FeatureDefinition other)
An extension of featureEquals(FeatureDefinition) that returns a String rather than a boolean.


equals

public boolean equals(FeatureDefinition other)
Determine whether two feature definitions are equal, regarding both the actual feature definitions and the weights. The comparison of weights succeeds if neither definition has weights or if both have exactly the same weights.

Parameters:
other - the feature definition to compare to
Returns:
true if all features, values and weights are identical, false otherwise
See Also:
featureEquals(FeatureDefinition)
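
The weight part of this comparison ("both have no weights, or exactly the same weights") maps directly onto java.util.Arrays.equals for float arrays, which treats two null references as equal and compares values exactly. WeightComparison is a hypothetical sketch; the boolean featuresEqual parameter stands in for the result of featureEquals(FeatureDefinition).

```java
import java.util.Arrays;

// Hypothetical sketch of the weight part of equals(FeatureDefinition).
final class WeightComparison {
    static boolean weightsEqual(float[] a, float[] b) {
        // Arrays.equals treats two null references as equal and compares floats
        // bit-for-bit via Float.floatToIntBits, i.e. with no tolerance.
        return Arrays.equals(a, b);
    }

    // featuresEqual stands in for the result of featureEquals(FeatureDefinition).
    static boolean equalsIncludingWeights(boolean featuresEqual, float[] a, float[] b) {
        return featuresEqual && weightsEqual(a, b);
    }

    public static void main(String[] args) {
        System.out.println(weightsEqual(null, null));             // true
        System.out.println(weightsEqual(null, new float[] {1f})); // false
    }
}
```
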

toFeatureVector

public FeatureVector toFeatureVector(int unitIndex,
                                     java.lang.String featureString)
Create a feature vector consistent with this feature definition by reading the data from a String representation. In that String, the String values for each feature must be separated by white space. For example, this format is created by toFeatureString(FeatureVector).

Parameters:
unitIndex - an index number to assign to the feature vector
featureString - the string representation of a feature vector.
Returns:
the feature vector created from the String.
Throws:
java.lang.IllegalArgumentException - if the feature values listed are not consistent with the feature definition.
See Also:
toFeatureString(FeatureVector)

readFeatureVector

public FeatureVector readFeatureVector(int currentUnitIndex,
                                       java.io.DataInput input)
                                throws java.io.IOException
Create a feature vector consistent with this feature definition by reading the data from the given input.

Parameters:
currentUnitIndex - an index number to assign to the feature vector
input - a DataInputStream or RandomAccessFile to read the feature values from.
Returns:
a FeatureVector.
Throws:
java.io.IOException - if a reading problem occurs

createEdgeFeatureVector

public FeatureVector createEdgeFeatureVector(int unitIndex,
                                             boolean start)
Create a feature vector that marks the start or end of a unit. All feature values are set to the neutral value "0", except for the EDGEFEATURE, which is set to EDGEFEATURE_START if start == true and to EDGEFEATURE_END otherwise.

Parameters:
unitIndex - index of the unit
start - true creates a start vector, false creates an end vector.
Returns:
a feature vector representing an edge.
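
An edge vector is therefore a zero-filled vector with a single marked position. In the sketch below, EdgeVectorSketch and its nested Vector class are hypothetical stand-ins for FeatureVector. It assumes that EDGEFEATURE is a byte feature and that the caller has already obtained the byte codes for EDGEFEATURE_START and EDGEFEATURE_END (for example via getFeatureValueAsByte); Java's zero-initialized arrays supply the neutral value.

```java
// Hypothetical stand-in for FeatureVector plus an edge-vector factory (not MARY code).
final class EdgeVectorSketch {
    static final class Vector {
        final int unitIndex;
        final byte[] byteValues;
        final short[] shortValues;
        final float[] continuousValues;

        Vector(int unitIndex, byte[] byteValues, short[] shortValues, float[] continuousValues) {
            this.unitIndex = unitIndex;
            this.byteValues = byteValues;
            this.shortValues = shortValues;
            this.continuousValues = continuousValues;
        }
    }

    // Assumption: the edge feature is a byte feature at edgeIndex; startCode and endCode
    // are the byte codes of the start and end values.
    static Vector createEdge(int unitIndex, int numByte, int numShort, int numContinuous,
                             int edgeIndex, byte startCode, byte endCode, boolean start) {
        byte[] bytes = new byte[numByte]; // Java zero-initializes: the neutral value 0
        bytes[edgeIndex] = start ? startCode : endCode;
        return new Vector(unitIndex, bytes, new short[numShort], new float[numContinuous]);
    }

    public static void main(String[] args) {
        Vector v = createEdge(7, 3, 2, 1, 1, (byte) 1, (byte) 2, true);
        System.out.println(v.byteValues[1]); // 1
    }
}
```
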

toFeatureString

public java.lang.String toFeatureString(FeatureVector fv)
Convert a feature vector into a String representation.

Parameters:
fv - a feature vector which must be consistent with this feature definition.
Returns:
a String containing the String values of all features, separated by white space.
Throws:
java.lang.IllegalArgumentException - if the feature vector is not consistent with this feature definition
java.lang.IndexOutOfBoundsException - if any value of the feature vector is not consistent with this feature definition
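
toFeatureString and toFeatureVector use a white-space-separated text format. The sketch below shows that round trip at the String level only; FeatureStringSketch is hypothetical. Joining with single spaces and checking only the number of values are simplifying assumptions; the real toFeatureVector also validates each value against the feature definition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical String-level sketch of the white-space-separated feature format.
final class FeatureStringSketch {
    // Joins the String values of all features with single spaces.
    static String toFeatureString(List<String> values) {
        return String.join(" ", values);
    }

    // Splits on runs of white space and checks the count against the definition.
    static List<String> parse(String featureString, int expectedCount) {
        String[] parts = featureString.trim().split("\\s+");
        if (parts.length != expectedCount) {
            throw new IllegalArgumentException(
                    "Expected " + expectedCount + " values, got " + parts.length);
        }
        return Arrays.asList(parts);
    }

    public static void main(String[] args) {
        String s = toFeatureString(List.of("a", "1", "0.25"));
        System.out.println(s);           // a 1 0.25
        System.out.println(parse(s, 3)); // [a, 1, 0.25]
    }
}
```
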

writeTo

public void writeTo(java.io.PrintWriter out,
                    boolean writeWeights)
Export this feature definition in the text format which can also be read by this class.

Parameters:
out - the destination of the data
writeWeights - whether to write weights before every line

generateAllDotDescForWagon

public void generateAllDotDescForWagon(java.io.PrintWriter out)
Export this feature definition in the "all.desc" format which can be read by wagon.

Parameters:
out - the destination of the data

generateAllDotDescForWagon

public void generateAllDotDescForWagon(java.io.PrintWriter out,
                                       java.util.Set featuresToIgnore)
Export this feature definition in the "all.desc" format which can be read by wagon.

Parameters:
out - the destination of the data
featuresToIgnore - a set of Strings containing the names of features that wagon should ignore. Can be null.
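
Wagon's description file is a parenthesized list with one entry per feature. The sketch below assumes the Edinburgh Speech Tools syntax in which a feature's type is "float", "ignore", or the list of its possible values. WagonDescSketch is hypothetical and is not MARY's exporter. It writes values unquoted, so values containing spaces or parentheses would need quoting that this sketch omits.

```java
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical all.desc writer: "( name values... )" for discrete features,
// "( name float )" for continuous ones, "( name ignore )" for ignored ones.
final class WagonDescSketch {
    static void write(PrintWriter out, Map<String, String[]> discrete,
                      Iterable<String> continuous, Set<String> featuresToIgnore) {
        out.println("(");
        for (Map.Entry<String, String[]> e : discrete.entrySet()) {
            String type = ignored(e.getKey(), featuresToIgnore)
                    ? "ignore" : String.join(" ", e.getValue());
            out.println("( " + e.getKey() + " " + type + " )");
        }
        for (String name : continuous) {
            String type = ignored(name, featuresToIgnore) ? "ignore" : "float";
            out.println("( " + name + " " + type + " )");
        }
        out.println(")");
        out.flush();
    }

    // featuresToIgnore may be null, meaning "ignore nothing".
    private static boolean ignored(String name, Set<String> featuresToIgnore) {
        return featuresToIgnore != null && featuresToIgnore.contains(name);
    }

    public static void main(String[] args) {
        Map<String, String[]> discrete = new LinkedHashMap<>();
        discrete.put("phoneme", new String[] {"0", "a", "e"});
        discrete.put("stressed", new String[] {"0", "1"});
        write(new PrintWriter(System.out), discrete, List.of("unit_duration"), Set.of("stressed"));
    }
}
```
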

generateFeatureWeightsFile

public void generateFeatureWeightsFile(java.io.PrintWriter out)
Print this feature definition plus weights to the given PrintWriter (typically a .txt file).

Parameters:
out - the destination of the data

diff

public static int diff(FeatureVector v1,
                       FeatureVector v2)
Compares two feature vectors in terms of how many discrete features they have in common. WARNING: this assumes that the feature vectors were created from the same FeatureDefinition; only the number of features is checked for compatibility.

Parameters:
v1 - A feature vector.
v2 - Another feature vector to compare v1 with.
Returns:
The number of common features.
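
Despite its name, diff counts agreements: the number of discrete (byte and short) positions at which both vectors hold the same value. The sketch below is hypothetical (DiffSketch works on raw arrays rather than FeatureVector). Like the documented method, it checks only the number of features; throwing IllegalArgumentException on a mismatch is an assumption.

```java
// Hypothetical sketch of diff on raw arrays: counts equal discrete values.
final class DiffSketch {
    static int diff(byte[] bytes1, short[] shorts1, byte[] bytes2, short[] shorts2) {
        // Only the number of features is checked, not their names or value sets.
        if (bytes1.length != bytes2.length || shorts1.length != shorts2.length) {
            throw new IllegalArgumentException("Feature vectors have different numbers of features");
        }
        int common = 0;
        for (int i = 0; i < bytes1.length; i++) {
            if (bytes1[i] == bytes2[i]) {
                common++;
            }
        }
        for (int i = 0; i < shorts1.length; i++) {
            if (shorts1[i] == shorts2[i]) {
                common++;
            }
        }
        return common;
    }

    public static void main(String[] args) {
        System.out.println(diff(new byte[] {1, 2, 3}, new short[] {10},
                                new byte[] {1, 0, 3}, new short[] {10})); // 3
    }
}
```
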