au.id.pbw.hyfo.hyph
Class HyphenatedWord

java.lang.Object
  extended by au.id.pbw.hyfo.hyph.HyphenatedWord

public class HyphenatedWord
extends Object

Represents a hyphenated-word, derived from an original word, with its hyphenation possibilities. Internally, the hyphenated-word is held as an array of character class codes derived from the Alphabet in which the hyphenator operates. Corresponding to the boundaries between the characters of the hyphenated-word (including the positions before the first character and after the last) is an array of HyphenBreaks. All entries except those corresponding to a potential breakpoint are null. The non-null HyphenBreaks include the strength of the break possibility between the corresponding codepoints. In the accompanying diagrams, these are represented by single digits giving their (odd-numbered) weight. Nominal weights are used in the diagrams.

The hyphenated-word can be retrieved as the original string. See get_word.

The associated breakpoint information is returned as an array of HyphenBreaks of the same length as the string. Non-null HyphenBreaks correspond to breakpoint opportunities immediately following the corresponding character in the string.
See get_string_breakpoints.

Author:
Peter B. West

Constructor Summary
HyphenatedWord(int[] char_classes, int[] char_indices, String word, HyphenBreak[] breakpoints, Alphabet alphabet)
          Creates a new instance of HyphenatedWord from the given arrays.
HyphenatedWord(String word)
          Creates a NULL instance of HyphenatedWord from the given word.
 
Method Summary
 int get_breakpoint_count()
          Gets the number of breakpoints in this HyphenatedWord.
 HyphenBreak[] get_char_classes_breakpoints()
          Gets the array of hyphenation possibilities corresponding to the codepoints of the original word.
 int[] get_fop_compatible_points()
          Gets the Fop-compatible array of breakpoint positions in the word.
 String get_fop_post_hyphen(int offset)
          Gets the suffix substring of the original word from the given offset, inclusive, to the end of the word.
 String get_fop_pre_hyphen(int offset)
          Gets the prefix substring of the original word up to, but excluding, the given offset.
 HyphenBreak[] get_string_breakpoints()
          Gets the array of hyphenation possibilities corresponding to the String representation of the word being hyphenated.
 String to_fop_string()
          Returns a fully-hyphenated string, mimicking the Fop Hyphenation.toString() method.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HyphenatedWord

public HyphenatedWord(int[] char_classes,
                      int[] char_indices,
                      String word,
                      HyphenBreak[] breakpoints,
                      Alphabet alphabet)
Creates a new instance of HyphenatedWord from the given arrays.

The first argument is an array of int containing the character classes of the characters of the original word. A character class is an integer representing a set of codepoints from the alphabet that are equivalent for the purposes of hyphenation; for example, the characters 'D' and 'd'. For Western alphabets, the upper- and lower-case versions of a character are equivalent for hyphenation.
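The idea of a character class can be sketched as follows. This is a hypothetical illustration, not the hyfo Alphabet implementation: the class name CharClassDemo and its numbering scheme (ASCII letters mapped case-insensitively to 1 through 26, anything else to -1) are assumptions chosen only to show 'D' and 'd' sharing a class.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration only: a toy mapping from codepoints to
 * character classes in which upper- and lower-case ASCII letters share
 * a class. The real Alphabet may assign codes quite differently.
 */
public class CharClassDemo {

    /** Returns the class code for a codepoint, or -1 if it is outside the toy alphabet. */
    static int charClass(int codePoint) {
        int lower = Character.toLowerCase(codePoint);
        if (lower >= 'a' && lower <= 'z') {
            return lower - 'a' + 1; // 'a'/'A' -> 1, ..., 'z'/'Z' -> 26
        }
        return -1; // not a letter of the toy alphabet
    }

    /** Maps each codepoint of the word (not each char) to its class code. */
    static int[] charClasses(String word) {
        return word.codePoints().map(CharClassDemo::charClass).toArray();
    }

    public static void main(String[] args) {
        System.out.println(charClass('D') == charClass('d'));   // true
        System.out.println(Arrays.toString(charClasses("Dad"))); // [4, 1, 4]
    }
}
```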

The second argument is an array of int containing, for each character class, the char offset in the original word of the character from which that class was derived. This array is only relevant if the original word can contain supplementary characters, as indicated by the alphabet argument. Otherwise, it should be null.

For example, if the first character of the word is not a supplementary character and the second character is a supplementary character, requiring two chars for its representation, then the first three entries in the indices array will be 0 (the char offset of the first character of the word), 1 (the char offset of the second character) and 3 (the char offset of the third character). The third character is offset by 2 from the second because the second requires two chars (a surrogate pair) for its representation.
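The offsets in the example above can be computed from a Java String by walking its codepoints. The following is a minimal sketch under that assumption; CharIndicesDemo and charIndices are hypothetical names, and the source does not say that callers of HyphenatedWord derive the array this way.

```java
import java.util.Arrays;

public class CharIndicesDemo {

    /**
     * Returns, for each codepoint in word, the char offset at which it starts.
     * A supplementary character occupies two chars (a surrogate pair), so the
     * offset of the following codepoint advances by Character.charCount(cp) == 2.
     */
    static int[] charIndices(String word) {
        int[] indices = new int[word.codePointCount(0, word.length())];
        int offset = 0;
        for (int i = 0; i < indices.length; i++) {
            indices[i] = offset;
            offset += Character.charCount(word.codePointAt(offset));
        }
        return indices;
    }

    public static void main(String[] args) {
        // 'a', then U+1D538 (a supplementary character), then 'b'
        String word = "a" + new String(Character.toChars(0x1D538)) + "b";
        System.out.println(word.length());                      // 4 chars for 3 codepoints
        System.out.println(Arrays.toString(charIndices(word))); // [0, 1, 3]
    }
}
```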

The third argument is the original word.

The fourth argument is the array of HyphenBreaks corresponding to the character classes of the original word.

N.B. Each element of the breakpoints array represents a breakpoint following the corresponding character class of the original word. The breakpoints array must be the same length as the char_classes array, FIXME and positions which do not correspond to a hyphenation point must be null.

Parameters:
char_classes - the array of character classes representing the word.
char_indices - the array mapping each index in char_classes to the corresponding char offset in the original word string. May be null.
word - the original word being hyphenated.
breakpoints - the array of HyphenBreaks.
alphabet - the Alphabet of characters which are recognized for hyphenation.

HyphenatedWord

public HyphenatedWord(String word)
Creates a NULL instance of HyphenatedWord from the given word. This constructor may be used when no hyphenation can be generated for a word (due to the presence of non-Alphabet characters, for example).

Parameters:
word - the word to hyphenate.

Method Detail

get_char_classes_breakpoints

public HyphenBreak[] get_char_classes_breakpoints()
Gets the array of hyphenation possibilities corresponding to the codepoints of the original word. Each non-null position in the returned array of HyphenBreaks represents a hyphenation possibility immediately preceding the codepoint to which it corresponds.

Returns:
the array of hyphenation possibilities corresponding to the codepoints comprising the original word.

get_string_breakpoints

public HyphenBreak[] get_string_breakpoints()
Gets the array of hyphenation possibilities corresponding to the String representation of the word being hyphenated. Each non-null position in the returned array of HyphenBreaks represents a hyphenation possibility immediately preceding the char to which it corresponds.

Returns:
the array of hyphenation possibilities corresponding to the String representation of the word being hyphenated.
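The relationship between the codepoint-indexed array and the String-indexed array can be sketched as below. This is a hypothetical illustration: Integer weights stand in for HyphenBreak (null meaning no break), the names are invented, and it assumes, following the descriptions of get_char_classes_breakpoints() and get_string_breakpoints(), that a break is attached to the position it immediately precedes and that char offsets come from the char_indices array passed to the constructor.

```java
import java.util.Arrays;

public class StringBreakpointsDemo {

    /**
     * Hypothetical sketch: moves codepoint-indexed breakpoints onto char
     * positions. Integer weights stand in for HyphenBreak; null means "no break".
     * A break preceding codepoint i is placed at that codepoint's char offset.
     * If charIndices is null, codepoints and chars are assumed to coincide.
     */
    static Integer[] toStringBreakpoints(Integer[] cpBreaks, int[] charIndices, int stringLength) {
        Integer[] result = new Integer[stringLength]; // every entry starts as null
        for (int i = 0; i < cpBreaks.length; i++) {
            if (cpBreaks[i] != null) {
                int charOffset = (charIndices == null) ? i : charIndices[i];
                result[charOffset] = cpBreaks[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Word "a" + U+1D538 + "b": 3 codepoints, 4 chars, char indices [0, 1, 3].
        Integer[] cpBreaks = {null, null, 1}; // nominal break of weight 1 before 'b'
        Integer[] strBreaks = toStringBreakpoints(cpBreaks, new int[] {0, 1, 3}, 4);
        System.out.println(Arrays.toString(strBreaks)); // [null, null, null, 1]
    }
}
```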

get_fop_compatible_points

public int[] get_fop_compatible_points()
Gets the Fop-compatible array of breakpoint positions in the word.

This method corresponds to the Fop method Hyphenation.getHyphenationPoints(). In Fop, a breakpoint position is recorded as the index of the character following the breakpoint.

Returns:
an array containing the indices of characters following hyphenation points in the original word.
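Assuming, as get_string_breakpoints() describes, that a non-null entry marks a break immediately preceding its char, the Fop-style positions are simply the indices of the non-null entries. The sketch below illustrates that under hypothetical names, using Integer weights in place of HyphenBreak. It trims its result to the number of breaks; the advice in get_breakpoint_count() suggests the real array may not be trimmed, so callers should not rely on its length.

```java
import java.util.Arrays;

public class FopPointsDemo {

    /**
     * Hypothetical sketch: collects the indices of non-null entries in a
     * char-indexed breakpoint array. Each index is the position of the
     * character following a break, as Fop records it.
     */
    static int[] fopPoints(Object[] stringBreaks) {
        int[] points = new int[stringBreaks.length];
        int count = 0;
        for (int i = 0; i < stringBreaks.length; i++) {
            if (stringBreaks[i] != null) {
                points[count++] = i;
            }
        }
        return Arrays.copyOf(points, count); // keep only the filled entries
    }

    public static void main(String[] args) {
        Integer[] breaks = new Integer[11]; // one slot per char of "hyphenation"
        breaks[2] = 1; // nominal break before 'p': hy-phenation
        breaks[6] = 3; // nominal break before 'a': hyphen-ation
        System.out.println(Arrays.toString(fopPoints(breaks))); // [2, 6]
    }
}
```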

get_breakpoint_count

public int get_breakpoint_count()
Gets the number of breakpoints in this HyphenatedWord.

This method corresponds to the Fop method Hyphenation.length(). It is strongly recommended that this method be used to determine how many entries of the array returned by get_fop_compatible_points() are valid, rather than relying on the length of that array.

Returns:
the number of breakpoints.

get_fop_pre_hyphen

public String get_fop_pre_hyphen(int offset)
Gets the prefix substring of the original word up to, but excluding, the given offset. If the offset is out of range, returns an empty string.

This method is compatible with the Fop method Hyphenation.getPreHyphenText().

Parameters:
offset - the offset of the character following the prefix.
Returns:
the string prefix of the original word up to but excluding the given offset.
See Also:
get_fop_compatible_points(), get_breakpoint_count(), get_fop_post_hyphen( int offset )

get_fop_post_hyphen

public String get_fop_post_hyphen(int offset)
Gets the suffix substring of the original word from the given offset, inclusive, to the end of the word. If the offset is out of range, returns an empty string.

This method is compatible with the Fop method Hyphenation.getPostHyphenText().

Parameters:
offset - the offset of the first character of the suffix within the original word being hyphenated.
Returns:
the string suffix from the given offset, inclusive.
See Also:
get_fop_compatible_points(), get_breakpoint_count(), get_fop_pre_hyphen( int offset )
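The prefix/suffix split performed by get_fop_pre_hyphen() and get_fop_post_hyphen() can be sketched with String.substring. This sketch assumes that an offset is out of range when it is negative or greater than the word's length; the source does not define the range precisely. The class name is hypothetical, and the points and count in main() are nominal values standing in for the results of get_fop_compatible_points() and get_breakpoint_count().

```java
public class PrePostHyphenDemo {

    /** Prefix up to, but excluding, offset; empty string if offset is out of range. */
    static String preHyphen(String word, int offset) {
        if (offset < 0 || offset > word.length()) {
            return ""; // assumed definition of "out of range"
        }
        return word.substring(0, offset);
    }

    /** Suffix from offset, inclusive, to the end; empty string if offset is out of range. */
    static String postHyphen(String word, int offset) {
        if (offset < 0 || offset > word.length()) {
            return ""; // assumed definition of "out of range"
        }
        return word.substring(offset);
    }

    public static void main(String[] args) {
        String word = "hyphenation";
        int[] points = {2, 6}; // nominal Fop-style points
        int count = 2;         // nominal breakpoint count
        for (int i = 0; i < count; i++) {
            System.out.println(preHyphen(word, points[i]) + "- / " + postHyphen(word, points[i]));
        }
        // hy- / phenation
        // hyphen- / ation
    }
}
```

Iterating up to the breakpoint count, rather than points.length, mirrors the recommendation made for get_breakpoint_count().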

to_fop_string

public String to_fop_string()
Returns a fully-hyphenated string, mimicking the Fop Hyphenation.toString() method. Breakpoints are marked with '-', and there is no provision for spelling changes due to hyphenation.

Returns:
the word with hyphenation points marked by '-'.
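A minimal sketch of the marking described above, assuming Fop-style points (the index of the character following each break) in ascending order together with a separate count, as returned by get_fop_compatible_points() and get_breakpoint_count(). The class and method names are hypothetical; as in the description, no spelling changes are applied.

```java
public class FopStringDemo {

    /**
     * Hypothetical sketch: inserts '-' before each character index listed in
     * the first count entries of points (assumed ascending and within the word).
     */
    static String toFopString(String word, int[] points, int count) {
        StringBuilder sb = new StringBuilder(word.length() + count);
        int start = 0;
        for (int i = 0; i < count; i++) {
            sb.append(word, start, points[i]).append('-'); // segment, then the hyphen
            start = points[i];
        }
        sb.append(word, start, word.length()); // remainder after the last break
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFopString("hyphenation", new int[] {2, 6}, 2)); // hy-phen-ation
    }
}
```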


Copyright © 2005-2006 Peter B. West.