au.id.pbw.hyfo.hyph
Class Alphabet

java.lang.Object
  extended by au.id.pbw.hyfo.hyph.Alphabet
All Implemented Interfaces:
Serializable

public class Alphabet
extends Object
implements Serializable

Represents an alphabet based on a set of character classes.

Author:
Peter B. West
See Also:
Serialized Form

Field Summary
protected  int[] char_class2canon
          Maps a character class code to its canonical codepoint.
protected  Map<Integer,Integer> codept2char_class
          Maps a codepoint to a character class code.
protected  HyphenBreak default_hyphen
          A HyphenBreak with default values for this alphabet.
protected  Modifier default_modifier
          The default Modifier for this alphabet.
 int hyphen_codept
          The hyphen character for this alphabet.
 int max_codepoint
          The maximum codepoint of any alphabet character.
 byte min_after
          The minimum number of characters following a hyphen.
 byte min_before
          The minimum number of characters preceding a hyphen.
 int min_codepoint
          The minimum codepoint of any alphabet character.
protected  BreakMinima minima
          The minimum number of characters allowed before and after hyphenation for this Alphabet.
protected  HyphenBreak neutral_hyphen
          A HyphenBreak with neutral values.
 boolean uses_codepoints
          Does this alphabet use codepoints rather than characters?
 
Constructor Summary
Alphabet(int hyphen_codept, BreakMinima minima, Map<Integer,Integer> codept2char_class, int[] char_class2canon, boolean uses_codepoints)
          Creates a new Alphabet from the given mapping of codepoints to character class codes and the array of canonical codepoints indexed by character class code.
 
Method Summary
 int canonical_codept_of_char_class(int char_class)
          Returns the canonical codepoint represented by the given character class char_class, or NUL if no codepoint corresponds to the given class code value.
 int canonicalize(int codept)
          Returns the canonical codepoint equivalent to the argument codepoint in this alphabet.
 Integer char_class_of_codept(int codept)
          Returns the character class code, 0 <= cc_code < number of character classes, corresponding to the given codepoint.
 Map<Integer,Integer> get_codept2char_class_map()
          Gets the Map of codepoint to character class code.
 HyphenBreak get_default_hyphen()
          Gets a reference to the default_hyphen generated for this alphabet.
 Modifier get_default_modifier()
          Gets a reference to the default_modifier for this alphabet.
 HyphenBreak get_neutral_hyphen()
          Gets a reference to the neutral_hyphen generated for this alphabet.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

hyphen_codept

public final int hyphen_codept
The hyphen character for this alphabet.


uses_codepoints

public final boolean uses_codepoints
Does this alphabet use codepoints rather than characters?


minima

protected BreakMinima minima
The minimum number of characters allowed before and after hyphenation for this Alphabet.


min_before

public final byte min_before
The minimum number of characters preceding a hyphen.


min_after

public final byte min_after
The minimum number of characters following a hyphen.


codept2char_class

protected Map<Integer,Integer> codept2char_class
Maps a codepoint to a character class code.


min_codepoint

public final int min_codepoint
The minimum codepoint of any alphabet character.


max_codepoint

public final int max_codepoint
The maximum codepoint of any alphabet character.


char_class2canon

protected int[] char_class2canon
Maps a character class code to its canonical codepoint.


default_modifier

protected final Modifier default_modifier
The default Modifier for this alphabet.


default_hyphen

protected final HyphenBreak default_hyphen
A HyphenBreak with default values for this alphabet. It is used, for example, to represent hyphen positions marked with a '-' character in exception words.


neutral_hyphen

protected final HyphenBreak neutral_hyphen
A HyphenBreak with neutral values. It is effectively a null hyphenation value used at non-hyphenating positions.

Constructor Detail

Alphabet

public Alphabet(int hyphen_codept,
                BreakMinima minima,
                Map<Integer,Integer> codept2char_class,
                int[] char_class2canon,
                boolean uses_codepoints)
Creates a new Alphabet from the given mapping of codepoints to character class codes and the array of canonical codepoints indexed by character class code.

Parameters:
hyphen_codept - The codepoint of the hyphen character for this alphabet.
minima - The BreakMinima representing the minimum number of characters before and after a hyphen.
codept2char_class - A map from Integer codepoint values to corresponding character class codes.
char_class2canon - An int array of canonical codepoint values, indexed by character class code.
uses_codepoints - true if this Alphabet uses codepoints rather than characters.
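
The two lookup structures passed to the constructor can be pictured with a small stand-in. The sketch below is not part of the hyfo library: the class AlphabetTablesSketch, its method names, and the choice to put upper and lower case letters in the same character class are assumptions made for illustration only. It builds a codepoint to class code map from an array of canonical codepoints, which is the relationship the codept2char_class and char_class2canon parameters describe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; it does not use or reproduce the hyfo classes.
class AlphabetTablesSketch {

    // Assumption: one class per letter, indexed by class code, with the
    // lower-case codepoint serving as the canonical form.
    static final int[] CHAR_CLASS2CANON = { 'a', 'b', 'c' };

    // Builds the codepoint -> class code map from the canonical array.
    static Map<Integer, Integer> buildCodept2CharClass(int[] charClass2Canon) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int cc = 0; cc < charClass2Canon.length; cc++) {
            int canon = charClass2Canon[cc];
            map.put(canon, cc);                         // canonical form
            map.put(Character.toUpperCase(canon), cc);  // assumed case variant
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> codept2CharClass =
                buildCodept2CharClass(CHAR_CLASS2CANON);
        // Cast to int so the key is boxed as Integer, not Character.
        System.out.println(codept2CharClass.get((int) 'B'));  // prints 1
        System.out.println(codept2CharClass.size());          // prints 6
    }
}
```

Because the map is keyed by Integer, a char literal must be widened to int before lookup; a Character key would never match.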
Method Detail

get_default_modifier

public Modifier get_default_modifier()
Gets a reference to the default_modifier for this alphabet.

Returns:
a reference to the default Modifier generated for this alphabet instance.

get_default_hyphen

public HyphenBreak get_default_hyphen()
Gets a reference to the default_hyphen generated for this alphabet.

Returns:
a reference to the default HyphenBreak instance for this alphabet.

get_neutral_hyphen

public HyphenBreak get_neutral_hyphen()
Gets a reference to the neutral_hyphen generated for this alphabet.

Returns:
a reference to the neutral HyphenBreak instance for this alphabet.

get_codept2char_class_map

public Map<Integer,Integer> get_codept2char_class_map()
Gets the Map of codepoint to character class code.

Returns:
the map.

char_class_of_codept

public Integer char_class_of_codept(int codept)
Returns the character class code, 0 <= cc_code < number of character classes, corresponding to the given codepoint. Returns null if there is no corresponding character class.

Parameters:
codept - the codepoint.
Returns:
the character class code for this codepoint, or null if there is no corresponding character class.
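
Since the return type is Integer and null signals "no class", callers need to keep the result boxed until it has been checked. The following is a minimal sketch of that pattern using a hypothetical stand-in class (CharClassLookupSketch, with a hand-filled table); it only mirrors the documented contract and is not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in mirroring the documented null-returning lookup.
class CharClassLookupSketch {

    // Assumed sample table standing in for codept2char_class.
    static final Map<Integer, Integer> CODEPT2CHAR_CLASS = new HashMap<>();
    static {
        CODEPT2CHAR_CLASS.put((int) 'a', 0);
        CODEPT2CHAR_CLASS.put((int) 'A', 0);
        CODEPT2CHAR_CLASS.put((int) 'b', 1);
    }

    // Returns the class code, or null if the codepoint has no class.
    static Integer charClassOfCodept(int codept) {
        return CODEPT2CHAR_CLASS.get(codept);
    }

    // Counts the codepoints of a word that belong to the alphabet.
    static int countAlphabetic(String word) {
        int count = 0;
        for (int i = 0; i < word.length(); ) {
            int codept = word.codePointAt(i);
            Integer cc = charClassOfCodept(codept);  // keep boxed: may be null
            if (cc != null) {
                count++;
            }
            i += Character.charCount(codept);  // step over surrogate pairs
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAlphabetic("Ab-ba"));  // prints 4
    }
}
```

Assigning the result directly to an int would unbox it and throw a NullPointerException for any codepoint outside the alphabet.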

canonical_codept_of_char_class

public int canonical_codept_of_char_class(int char_class)
Returns the canonical codepoint represented by the given character class char_class, or NUL if no codepoint corresponds to the given class code value.

Parameters:
char_class - the character class code.
Returns:
the canonical codepoint for char_class, or NUL if no codepoint corresponds to it.

canonicalize

public int canonicalize(int codept)
Returns the canonical codepoint equivalent to the argument codepoint in this alphabet. If the codepoint is not represented in any character class, the unchanged codepoint is returned.

Parameters:
codept - the codepoint to canonicalize.
Returns:
the canonicalized codepoint.
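
One plausible way to read the last two methods together: canonicalize looks up the class code of a codepoint and then indexes the canonical array with it, falling back to the input when no class exists. The sketch below illustrates that reading under stated assumptions. CanonicalizeSketch and its method names are hypothetical, "NUL" is assumed to mean codepoint 0, and case folding is an assumed example of what a character class groups together. The actual implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; a sketch of the documented behaviour only.
class CanonicalizeSketch {
    static final int NUL = 0;  // assumption: "NUL" is codepoint U+0000

    private final Map<Integer, Integer> codept2CharClass = new HashMap<>();
    private final int[] charClass2Canon;

    CanonicalizeSketch(int[] charClass2Canon) {
        this.charClass2Canon = charClass2Canon.clone();
        for (int cc = 0; cc < charClass2Canon.length; cc++) {
            int canon = charClass2Canon[cc];
            codept2CharClass.put(canon, cc);
            codept2CharClass.put(Character.toUpperCase(canon), cc);
        }
    }

    // Canonical codepoint for a class code, or NUL if the code is invalid.
    int canonicalCodeptOfCharClass(int charClass) {
        if (charClass < 0 || charClass >= charClass2Canon.length) {
            return NUL;
        }
        return charClass2Canon[charClass];
    }

    // Canonical form of a codepoint; unchanged if it has no class.
    int canonicalize(int codept) {
        Integer cc = codept2CharClass.get(codept);
        return cc == null ? codept : canonicalCodeptOfCharClass(cc);
    }

    public static void main(String[] args) {
        CanonicalizeSketch alphabet =
                new CanonicalizeSketch(new int[] { 'a', 'b' });
        System.out.println((char) alphabet.canonicalize('B'));  // prints b
        System.out.println((char) alphabet.canonicalize('-'));  // prints -
    }
}
```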


Copyright © 2005-2006 Peter B. West.