au.id.pbw.hyfo.hyph
Class HyphenationTreeBuilder

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by au.id.pbw.hyfo.hyph.HyphenationTreeBuilder
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler

public class HyphenationTreeBuilder
extends DefaultHandler

Builds a HyphenationTree from an input pattern file.

Author:
pbw

Nested Class Summary
static class HyphenationTreeBuilder.Element
          An enum of the elements in a resource file.
 
Field Summary
protected  AlphabetBuilder alpha_builder
          The AlphabetBuilder used to construct the Alphabet instance for use with this tree.
protected  Alphabet alphabet
          The Alphabet for the tree being constructed.
protected  boolean end_seen
          Used by process_text_element(java.lang.StringBuilder, au.id.pbw.hyfo.hyph.TextElement, au.id.pbw.hyfo.hyph.InstanceType).
protected  HyphenDataCache exception_data
          The TernaryTreeDataStore associated with the exceptions tree.
protected  TernaryTree exceptions
          The set of exceptions for this tree.
protected  HashMap<String,HyphenBreak> hyphen_breaks
          A cache of HyphenBreak objects used in constructing the tree.
protected  int hyphen_char
          The codepoint of the hyphen character used with the Alphabet of the HyphenationTree being built.
protected  Map<String,Modifier> modifiers
          The modifiers for this tree.
protected  HyphenDataCache pattern_data
          The TernaryTreeDataStore associated with the patterns tree.
protected  TernaryTree patterns
          The set of patterns for this tree.
protected  boolean start_seen
          Used by process_text_element(java.lang.StringBuilder, au.id.pbw.hyfo.hyph.TextElement, au.id.pbw.hyfo.hyph.InstanceType).
protected  ArrayList<HyphenBreak> weights
          Used by process_text_element(java.lang.StringBuilder, au.id.pbw.hyfo.hyph.TextElement, au.id.pbw.hyfo.hyph.InstanceType).
 
Constructor Summary
HyphenationTreeBuilder()
          Creates a new instance of HyphenationTreeBuilder.
 
Method Summary
protected  TernaryTree build_tree(Map<String,HyphenBreak[]> map, HyphenDataCache data_cache)
          Builds and returns a TernaryTree mapping string keys to arrays of HyphenBreaks.
 void characters(char[] ch, int start, int length)
          Processes a node of characters from the resource file.
 void endElement(String uri, String localName, String qName)
          Processes an end element from the resource file.
 HyphenationTree get_hyphenation_tree(File resource_file)
          Builds and returns a HyphenationTree from the resource file provided.
protected  HyphenBreak get_mod_ref(ModifierReference ref, InstanceType context)
          Gets the HyphenBreak corresponding to the Modifier named by this ModifierReference, in the named context.
protected  Modifier get_modifier(String id)
          Gets a known Modifier with a given id, which is generally the no-break string.
 void ignorableWhitespace(char[] ch, int start, int length)
          Processes ignorable white space from the resource file.
protected  HyphenBreak obtain_hyphen_break(HyphenBreak hyph_break)
          Returns a HyphenBreak equal to the given argument.
protected  void process_instance(InstanceType type, List<PatternElement> list, Alphabet alphabet, Map<String,HyphenBreak[]> map)
          Processes a pattern instance, adding it to the given map.
protected  void process_text_element(StringBuilder text_chars_as_classes, TextElement text_el, InstanceType type)
          Processes a text element from an exception or hyphenation pattern.
protected  void process_text_instance(InstanceType type, TextElement text_el, Alphabet alphabet, Map<String,HyphenBreak[]> map)
          Processes a text instance, adding it to the given map.
 void startElement(String uri, String localName, String qName, Attributes attributes)
          Processes a start element from the resource file.
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

alpha_builder

protected AlphabetBuilder alpha_builder
The AlphabetBuilder used to construct the Alphabet instance for use with this tree.


alphabet

protected Alphabet alphabet
The Alphabet for the tree being constructed. All of the text to be hyphenated by the tree is expressed in this alphabet. Text containing codepoints that are not in this Alphabet cannot be hyphenated by the HyphenationTree being constructed.


hyphen_char

protected int hyphen_char
The codepoint of the hyphen character used with the Alphabet of the HyphenationTree being built.


modifiers

protected Map<String,Modifier> modifiers
The modifiers for this tree.


exceptions

protected TernaryTree exceptions
The set of exceptions for this tree.


exception_data

protected HyphenDataCache exception_data
The TernaryTreeDataStore associated with the exceptions tree.


patterns

protected TernaryTree patterns
The set of patterns for this tree.


pattern_data

protected HyphenDataCache pattern_data
The TernaryTreeDataStore associated with the patterns tree.


hyphen_breaks

protected HashMap<String,HyphenBreak> hyphen_breaks
A cache of HyphenBreak objects used in constructing the tree.


start_seen

protected boolean start_seen
Used by process_text_element(java.lang.StringBuilder, au.id.pbw.hyfo.hyph.TextElement, au.id.pbw.hyfo.hyph.InstanceType).


end_seen

protected boolean end_seen
Used by process_text_element(java.lang.StringBuilder, au.id.pbw.hyfo.hyph.TextElement, au.id.pbw.hyfo.hyph.InstanceType).


weights

protected ArrayList<HyphenBreak> weights
Used by process_text_element(java.lang.StringBuilder, au.id.pbw.hyfo.hyph.TextElement, au.id.pbw.hyfo.hyph.InstanceType).

Constructor Detail

HyphenationTreeBuilder

public HyphenationTreeBuilder()
Creates a new instance of HyphenationTreeBuilder.

Method Detail

get_hyphenation_tree

public HyphenationTree get_hyphenation_tree(File resource_file)
                                     throws SAXException,
                                            IOException,
                                            ParserConfigurationException
Builds and returns a HyphenationTree from the resource file provided.

Parameters:
resource_file - the resource file.
Returns:
the constructed HyphenationTree.
Throws:
SAXException - if an error occurs during the SAX parsing of the input file.
IOException - if an error occurs during the reading of the resource file.
ParserConfigurationException - if an error occurs during the configuration of the SAX parser.
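
The three declared exceptions are the same ones raised by the standard JAXP SAX route (SAXParserFactory.newSAXParser() and SAXParser.parse(...)), so a handler of this kind is presumably driven that way. The following is a minimal sketch of the pattern using only JDK classes. The class PatternCollector and the element name "pattern" are hypothetical and are not taken from the hyfo resource format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <pattern> element.
public class PatternCollector extends DefaultHandler {
    private final List<String> patterns = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Start each element with an empty text buffer.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The default factory is not namespace-aware, so match on qName.
        if ("pattern".equals(qName)) {
            patterns.add(text.toString().trim());
        }
        text.setLength(0);
    }

    // Parses the XML string and returns the collected pattern texts.
    public static List<String> collect(String xml)
            throws SAXException, IOException, ParserConfigurationException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PatternCollector handler = new PatternCollector();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.patterns;
    }
}
```

With the real class, the equivalent call would be new HyphenationTreeBuilder().get_hyphenation_tree(resource_file), with the caller handling the same three checked exceptions.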

ignorableWhitespace

public void ignorableWhitespace(char[] ch,
                                int start,
                                int length)
Processes ignorable white space from the resource file.

Specified by:
ignorableWhitespace in interface ContentHandler
Overrides:
ignorableWhitespace in class DefaultHandler
Parameters:
ch - An array of chars.
start - the start point of the white space in the ch array.
length - the length of the white space in the ch array.

characters

public void characters(char[] ch,
                       int start,
                       int length)
Processes a node of characters from the resource file.

Specified by:
characters in interface ContentHandler
Overrides:
characters in class DefaultHandler
Parameters:
ch - an array of char.
start - the start point of the characters in the ch array.
length - the length of the characters in the ch array.

endElement

public void endElement(String uri,
                       String localName,
                       String qName)
Processes an end element from the resource file.

Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class DefaultHandler
Parameters:
uri - the namespace URI of the element.
localName - the local name of the element.
qName - the qualified name of the element.

startElement

public void startElement(String uri,
                         String localName,
                         String qName,
                         Attributes attributes)
Processes a start element from the resource file.

Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class DefaultHandler
Parameters:
uri - the namespace URI of the element.
localName - the local name of the element.
qName - the qualified name of the element.
attributes - the attributes defined on the element.

get_modifier

protected Modifier get_modifier(String id)
Gets a known Modifier with a given id, which is generally the no-break string.

Parameters:
id - the unique name of the modifier.
Returns:
the modifier associated with the id.

obtain_hyphen_break

protected HyphenBreak obtain_hyphen_break(HyphenBreak hyph_break)
Returns a HyphenBreak equal to the given argument. The purpose is to reduce the object count in the hyphenation tables.

If a HyphenBreak with a key equal to the argument is not already present in the Map hyphen_breaks, the argument is inserted into the map and returned; else the matching element from the map is returned.

Parameters:
hyph_break - the required HyphenBreak.
Returns:
the required HyphenBreak.
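
The insert-or-return behaviour described above is a form of object interning. A minimal sketch follows, assuming a string key identifies equal breaks. The classes BreakInterner and Break are hypothetical stand-ins, since the fields of the real HyphenBreak are not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;

public class BreakInterner {
    // Hypothetical stand-in for HyphenBreak: a key plus a weight.
    public static final class Break {
        public final String key;
        public final int weight;

        public Break(String key, int weight) {
            this.key = key;
            this.weight = weight;
        }
    }

    // Cache of canonical instances, keyed by Break.key.
    private final Map<String, Break> cache = new HashMap<>();

    // Returns the cached instance with the same key, or caches and
    // returns the argument if no such instance exists yet.
    public Break obtain(Break candidate) {
        Break existing = cache.putIfAbsent(candidate.key, candidate);
        return existing != null ? existing : candidate;
    }

    // Number of distinct instances held by the cache.
    public int size() {
        return cache.size();
    }
}
```

Callers keep only the returned reference, so repeated equal breaks share a single object and the duplicates become eligible for garbage collection.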

get_mod_ref

protected HyphenBreak get_mod_ref(ModifierReference ref,
                                  InstanceType context)
                           throws HyphenationException
Gets the HyphenBreak corresponding to the Modifier named by this ModifierReference, in the named context. ModifierReferences occur in either exception or pattern set contexts. The context is used only to provide a more meaningful log message, if required.

Parameters:
ref - the ModifierReference.
context - the InstanceType naming the context.
Returns:
the corresponding HyphenBreak.
Throws:
HyphenationException - if the XML data is invalid.

build_tree

protected TernaryTree build_tree(Map<String,HyphenBreak[]> map,
                                 HyphenDataCache data_cache)
                          throws HyphenationException
Builds and returns a TernaryTree mapping string keys to arrays of HyphenBreaks.

Parameters:
data_cache - the HyphenDataCache managing the data for the tree.
map - a map from a string key to a HyphenBreak array value.
Returns:
the TernaryTree.
Throws:
HyphenationException - if an error occurs while building the tree.
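
For readers unfamiliar with the data structure, the following is a generic textbook ternary search tree built from a map of string keys. It is only an illustration: the internals of the real TernaryTree and HyphenDataCache are not documented on this page, the class TernaryTreeSketch is hypothetical, and an int[] stands in for the HyphenBreak[] values.

```java
import java.util.Map;

public class TernaryTreeSketch {
    // Each node holds one character and three children: lower, equal, higher.
    private static final class Node {
        final char ch;
        Node lo, eq, hi;
        int[] value; // stand-in for a HyphenBreak[] payload; null if no key ends here

        Node(char ch) {
            this.ch = ch;
        }
    }

    private Node root;

    // Inserts or replaces the value stored under a non-empty key.
    public void put(String key, int[] value) {
        if (key.isEmpty()) {
            throw new IllegalArgumentException("empty key");
        }
        root = put(root, key, 0, value);
    }

    private Node put(Node node, String key, int i, int[] value) {
        char c = key.charAt(i);
        if (node == null) {
            node = new Node(c);
        }
        if (c < node.ch) {
            node.lo = put(node.lo, key, i, value);
        } else if (c > node.ch) {
            node.hi = put(node.hi, key, i, value);
        } else if (i < key.length() - 1) {
            node.eq = put(node.eq, key, i + 1, value);
        } else {
            node.value = value;
        }
        return node;
    }

    // Returns the value stored under the key, or null if the key is absent.
    public int[] get(String key) {
        if (key.isEmpty()) {
            return null;
        }
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = key.charAt(i);
            if (c < node.ch) {
                node = node.lo;
            } else if (c > node.ch) {
                node = node.hi;
            } else if (i < key.length() - 1) {
                node = node.eq;
                i++;
            } else {
                return node.value;
            }
        }
        return null;
    }

    // Builds a tree from every entry of the map.
    public static TernaryTreeSketch build(Map<String, int[]> map) {
        TernaryTreeSketch tree = new TernaryTreeSketch();
        for (Map.Entry<String, int[]> entry : map.entrySet()) {
            tree.put(entry.getKey(), entry.getValue());
        }
        return tree;
    }
}
```

Keys that share a prefix share the nodes for that prefix, which suits hyphenation patterns because many of them overlap.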

process_instance

protected void process_instance(InstanceType type,
                                List<PatternElement> list,
                                Alphabet alphabet,
                                Map<String,HyphenBreak[]> map)
                         throws HyphenationException
Processes a pattern instance, adding it to the given map. Because a pattern instance may contain ModifierReferences, the PatternElements are passed in a list.

Parameters:
list - the List containing the PatternElements making up this instance.
type - the type of instance.
alphabet - the Alphabet for the instance.
map - the map to which the instance details are added.
Throws:
HyphenationException - if the XML data is invalid.

process_text_instance

protected void process_text_instance(InstanceType type,
                                     TextElement text_el,
                                     Alphabet alphabet,
                                     Map<String,HyphenBreak[]> map)
                              throws HyphenationException
Processes a text instance, adding it to the given map. A text instance contains no ModifierReferences, so it is simpler to process than a pattern instance, which may contain such references.

Parameters:
type - the type of instance.
text_el - the TextElement being added to the tree.
alphabet - the Alphabet for the instance.
map - the map to which the instance details are added.
Throws:
HyphenationException - if the XML data is invalid.

process_text_element

protected void process_text_element(StringBuilder text_chars_as_classes,
                                    TextElement text_el,
                                    InstanceType type)
                             throws HyphenationException
Processes a text element from an exception or hyphenation pattern. Text element processing is common to process_instance(au.id.pbw.hyfo.hyph.InstanceType, java.util.List, au.id.pbw.hyfo.hyph.Alphabet, java.util.Map) and process_text_instance(au.id.pbw.hyfo.hyph.InstanceType, au.id.pbw.hyfo.hyph.TextElement, au.id.pbw.hyfo.hyph.Alphabet, java.util.Map).

Parameters:
text_chars_as_classes - the StringBuilder in which to construct the string from the text element, with the original characters mapped to character classes.
text_el - the text element to process.
type - the type of element: exception or pattern.
Throws:
HyphenationException - if any errors are detected in the format or content of the text of the exception or pattern.


Copyright © 2005-2006 Peter B. West.