This project has retired. For details please refer to its Attic page.
TokenizerFactory (Apache Crunch 0.8.0 API)

org.apache.crunch.contrib.text
Class TokenizerFactory

java.lang.Object
  extended by org.apache.crunch.contrib.text.TokenizerFactory
All Implemented Interfaces:
Serializable

public class TokenizerFactory
extends Object
implements Serializable

Factory class that constructs Tokenizer instances for input strings that use a fixed set of delimiters, skip patterns, locales, and sets of indices to keep or drop.

See Also:
Serialized Form

Nested Class Summary
static class TokenizerFactory.Builder
          A class for constructing new TokenizerFactory instances using the Builder pattern.
 
Method Summary
static TokenizerFactory.Builder builder()
          Factory method for creating a TokenizerFactory.Builder instance.
 Tokenizer create(String input)
          Returns a Tokenizer instance that wraps the input string and uses the delimiter, skip, and locale settings for this TokenizerFactory instance.
static TokenizerFactory getDefaultInstance()
          Returns a default TokenizerFactory that uses whitespace as a delimiter and does not skip any input fields.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getDefaultInstance

public static TokenizerFactory getDefaultInstance()
Returns a default TokenizerFactory that uses whitespace as a delimiter and does not skip any input fields.

Returns:
The default TokenizerFactory
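
To illustrate what whitespace-delimited tokenizing with no skipped fields looks like, here is a minimal sketch that uses only java.util.Scanner, whose default delimiter is whitespace. It does not call the Crunch API; the class name WhitespaceTokens and the helper tokens are hypothetical names chosen for this example, and the assumption that the default factory behaves like a plain Scanner is taken from the descriptions on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical illustration class; not part of Apache Crunch.
public class WhitespaceTokens {

    // Splits the input on runs of whitespace (Scanner's default delimiter)
    // and returns every field, dropping none.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Spaces and tabs are both treated as delimiters.
        System.out.println(tokens("alpha  beta\tgamma")); // prints [alpha, beta, gamma]
    }
}
```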

create

public Tokenizer create(String input)
Returns a Tokenizer instance that wraps the input string and uses the delimiter, skip, and locale settings for this TokenizerFactory instance.

Parameters:
input - The input string
Returns:
A new Tokenizer instance with appropriate settings
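
The delimiter, locale, and keep-index settings described above can be sketched with java.util.Scanner alone. This is an illustration under stated assumptions, not the Crunch implementation: the class DelimitedFields and its helpers keepFields and firstDouble are hypothetical, and treating field indices as zero-based is an assumption of this sketch rather than something this page states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;

// Hypothetical illustration class; not part of Apache Crunch.
public class DelimitedFields {

    // Splits the input on a delimiter regex and returns only the fields
    // whose index (assumed zero-based here) is in the keep set.
    static List<String> keepFields(String input, String delimiter, Set<Integer> keep) {
        List<String> kept = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiter)) {
            for (int index = 0; scanner.hasNext(); index++) {
                String field = scanner.next();
                if (keep.contains(index)) {
                    kept.add(field);
                }
            }
        }
        return kept;
    }

    // Reads the first field as a double, using the locale's number format.
    static double firstDouble(String input, String delimiter, Locale locale) {
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiter).useLocale(locale)) {
            return scanner.nextDouble();
        }
    }

    public static void main(String[] args) {
        Set<Integer> keep = new HashSet<>(Arrays.asList(0, 2));
        System.out.println(keepFields("a,b,c,d", ",", keep)); // prints [a, c]

        // In the German locale the decimal separator is a comma.
        System.out.println(firstDouble("1,5;2,25", ";", Locale.GERMANY)); // prints 1.5
    }
}
```

The locale matters only for typed reads such as nextDouble; plain string fields come back unchanged regardless of locale.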

builder

public static TokenizerFactory.Builder builder()
Factory method for creating a TokenizerFactory.Builder instance.

Returns:
A new TokenizerFactory.Builder


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.