This project has retired. For details please refer to its Attic page.
org.apache.crunch.contrib.text (Apache Crunch 0.10.0 API)

Package org.apache.crunch.contrib.text

Interface Summary
Extractor<T> An interface for extracting a specific data type from a text string that is being processed by a Scanner object.
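
As a rough illustration of the idea behind this interface, the sketch below pulls a typed value out of text that a java.util.Scanner is walking over. The names FieldExtractor, IntFieldExtractor, and getErrorCount are hypothetical and chosen only for this example; they are not the signatures of Crunch's Extractor<T>, and the fall-back-to-default behavior is an assumption.

```java
import java.util.InputMismatchException;
import java.util.NoSuchElementException;
import java.util.Scanner;

// Hypothetical interface for illustration only; NOT Crunch's Extractor<T>.
interface FieldExtractor<T> {
    // Consumes the next token from the scanner and converts it to T.
    T extract(Scanner scanner);
}

// Reads the next token as an int. On failure it returns a default value
// and increments an error counter instead of throwing.
class IntFieldExtractor implements FieldExtractor<Integer> {
    private final int defaultValue;
    private int errorCount = 0;

    IntFieldExtractor(int defaultValue) {
        this.defaultValue = defaultValue;
    }

    @Override
    public Integer extract(Scanner scanner) {
        try {
            return scanner.nextInt();
        } catch (InputMismatchException e) {
            // The token exists but is not an int. Scanner leaves it
            // unconsumed, so skip it before returning the default.
            errorCount++;
            scanner.next();
            return defaultValue;
        } catch (NoSuchElementException e) {
            // No tokens are left in the input.
            errorCount++;
            return defaultValue;
        }
    }

    int getErrorCount() {
        return errorCount;
    }
}
```

With the input "12,abc,7" and a "," delimiter, three successive extract calls on an extractor whose default is -1 would yield 12, -1, and 7, with one error recorded for the unparseable middle field.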

Class Summary
AbstractCompositeExtractor<T> Base class for Extractor instances that delegate the parsing of fields to other Extractor instances, primarily used for constructing composite records that implement the Tuple interface.
AbstractSimpleExtractor<T> Base class for the common case of Extractor instances that construct a single object from a block of text stored in a String, with support for error handling and reporting.
Extractors Factory methods for constructing common Extractor types.
ExtractorStats Records the number and kinds of errors that an Extractor encountered when parsing input data.
Parse Methods for parsing instances of PCollection<String> into PCollections of strongly typed tuples.
Tokenizer Manages a Scanner instance and provides support for returning only a subset of the fields returned by the underlying Scanner.
TokenizerFactory Factory class that constructs Tokenizer instances for input strings that use a fixed set of delimiters, skip patterns, locales, and sets of indices to keep or drop.
TokenizerFactory.Builder A class for constructing new TokenizerFactory instances using the Builder pattern.
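
The Tokenizer description above (a managed Scanner that returns only a subset of fields) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not Crunch's implementation: the class KeepIndicesDemo and the method keepFields are hypothetical names, and the zero-based "keep" index set is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

// Hypothetical helper for illustration only; NOT part of the Crunch API.
class KeepIndicesDemo {

    // Splits `line` on `delimiter` (a regex, as Scanner.useDelimiter expects)
    // and returns only the fields whose zero-based index is in `keep`.
    static List<String> keepFields(String line, String delimiter, Set<Integer> keep) {
        List<String> kept = new ArrayList<>();
        try (Scanner scanner = new Scanner(line).useDelimiter(delimiter)) {
            int index = 0;
            while (scanner.hasNext()) {
                String token = scanner.next();
                if (keep.contains(index)) {
                    kept.add(token);
                }
                index++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Keep the first and third fields of a comma-separated record.
        System.out.println(keepFields("alice,30,NYC,engineer", ",", Set.of(0, 2)));
        // prints [alice, NYC]
    }
}
```

A "drop" variant would invert the membership test. Bundling the delimiter and index set into a reusable object, built once and applied to many input lines, is roughly the role the listing assigns to TokenizerFactory and its Builder.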

