This project has been retired. For details, please refer to its Attic page.
AvroParquetFileSource (Apache Crunch 0.9.0 API)

org.apache.crunch.io.parquet
Class AvroParquetFileSource<T extends org.apache.avro.generic.IndexedRecord>

java.lang.Object
  extended by org.apache.crunch.io.impl.FileSourceImpl<T>
      extended by org.apache.crunch.io.parquet.AvroParquetFileSource<T>
All Implemented Interfaces:
ReadableSource<T>, Source<T>

public class AvroParquetFileSource<T extends org.apache.avro.generic.IndexedRecord>
extends FileSourceImpl<T>
implements ReadableSource<T>


Nested Class Summary
static class AvroParquetFileSource.Builder<T extends org.apache.avro.generic.IndexedRecord>
          Helper class for constructing an AvroParquetFileSource that only reads a subset of the fields defined in an Avro schema.
 
Field Summary
 
Fields inherited from class org.apache.crunch.io.impl.FileSourceImpl
inputBundle, path, paths, ptype
 
Constructor Summary
AvroParquetFileSource(List<org.apache.hadoop.fs.Path> paths, AvroType<T> ptype)
           
AvroParquetFileSource(List<org.apache.hadoop.fs.Path> paths, AvroType<T> ptype, Class<? extends parquet.filter.UnboundRecordFilter> filterClass)
           
AvroParquetFileSource(List<org.apache.hadoop.fs.Path> paths, AvroType<T> ptype, org.apache.avro.Schema schema)
           
AvroParquetFileSource(List<org.apache.hadoop.fs.Path> paths, AvroType<T> ptype, org.apache.avro.Schema schema, Class<? extends parquet.filter.UnboundRecordFilter> filterClass)
           
AvroParquetFileSource(org.apache.hadoop.fs.Path path, AvroType<T> ptype)
           
AvroParquetFileSource(org.apache.hadoop.fs.Path path, AvroType<T> ptype, org.apache.avro.Schema schema)
           
 
Method Summary
 ReadableData<T> asReadable()
           
static <T extends org.apache.avro.specific.SpecificRecord> AvroParquetFileSource.Builder<T> builder(Class<T> clazz)
           
static AvroParquetFileSource.Builder<org.apache.avro.generic.GenericRecord> builder(org.apache.avro.Schema schema)
           
 Converter<?,?,?,?> getConverter()
          Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.
protected  org.apache.crunch.io.parquet.AvroParquetFileReaderFactory<T> getFileReaderFactory(AvroType<T> ptype)
           
 org.apache.avro.Schema getProjectedSchema()
           
 Iterable<T> read(org.apache.hadoop.conf.Configuration conf)
          Returns an Iterable that contains the contents of this source.
 String toString()
           
 
Methods inherited from class org.apache.crunch.io.impl.FileSourceImpl
configureSource, equals, getBundle, getLastModifiedAt, getPath, getPaths, getSize, getType, hashCode, inputConf, pathsAsString, read
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.crunch.Source
configureSource, getLastModifiedAt, getSize, getType, inputConf
 

Constructor Detail

AvroParquetFileSource

public AvroParquetFileSource(org.apache.hadoop.fs.Path path,
                             AvroType<T> ptype)

AvroParquetFileSource

public AvroParquetFileSource(org.apache.hadoop.fs.Path path,
                             AvroType<T> ptype,
                             org.apache.avro.Schema schema)

AvroParquetFileSource

public AvroParquetFileSource(List<org.apache.hadoop.fs.Path> paths,
                             AvroType<T> ptype)

AvroParquetFileSource

public AvroParquetFileSource(List<org.apache.hadoop.fs.Path> paths,
                             AvroType<T> ptype,
                             org.apache.avro.Schema schema)

AvroParquetFileSource

public AvroParquetFileSource(List<org.apache.hadoop.fs.Path> paths,
                             AvroType<T> ptype,
                             Class<? extends parquet.filter.UnboundRecordFilter> filterClass)

AvroParquetFileSource

public AvroParquetFileSource(List<org.apache.hadoop.fs.Path> paths,
                             AvroType<T> ptype,
                             org.apache.avro.Schema schema,
                             Class<? extends parquet.filter.UnboundRecordFilter> filterClass)
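
The two constructors that accept a filterClass take a Class object rather than a filter instance. A plausible reading, which this page does not confirm, is that the filter is instantiated reflectively at read time, which would require a no-argument constructor. The sketch below models that pattern in plain Java. It does not use the Crunch or Parquet classes; FilterClassSketch, RecordFilter, and EvenOnly are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of configuring a record filter by class; not the Parquet API. */
final class FilterClassSketch {

    /** Hypothetical stand-in for a record filter. */
    interface RecordFilter {
        boolean keep(int value);
    }

    /** Example filter; the no-argument constructor lets it be created reflectively. */
    static final class EvenOnly implements RecordFilter {
        EvenOnly() { }

        @Override
        public boolean keep(int value) {
            return value % 2 == 0;
        }
    }

    /** Instantiates the filter from its class, then returns only the records it keeps. */
    static List<Integer> read(List<Integer> records, Class<? extends RecordFilter> filterClass) {
        RecordFilter filter;
        try {
            filter = filterClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate filter " + filterClass.getName(), e);
        }
        List<Integer> kept = new ArrayList<Integer>();
        for (Integer record : records) {
            if (filter.keep(record)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Prints [2, 4]
        System.out.println(read(Arrays.asList(1, 2, 3, 4, 5), EvenOnly.class));
    }
}
```

Passing a class instead of an instance means the filter can carry no per-call state through its constructor; any parameters would have to be baked into the class itself.
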

Method Detail

getProjectedSchema

public org.apache.avro.Schema getProjectedSchema()

read

public Iterable<T> read(org.apache.hadoop.conf.Configuration conf)
                 throws IOException
Description copied from interface: ReadableSource
Returns an Iterable that contains the contents of this source.

Specified by:
read in interface ReadableSource<T extends org.apache.avro.generic.IndexedRecord>
Parameters:
conf - The current Configuration instance
Returns:
the contents of this Source as an Iterable instance
Throws:
IOException

asReadable

public ReadableData<T> asReadable()
Specified by:
asReadable in interface ReadableSource<T extends org.apache.avro.generic.IndexedRecord>
Returns:
a ReadableData instance containing the data referenced by this ReadableSource.

getFileReaderFactory

protected org.apache.crunch.io.parquet.AvroParquetFileReaderFactory<T> getFileReaderFactory(AvroType<T> ptype)

getConverter

public Converter<?,?,?,?> getConverter()
Description copied from interface: Source
Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.

Specified by:
getConverter in interface Source<T extends org.apache.avro.generic.IndexedRecord>
Overrides:
getConverter in class FileSourceImpl<T extends org.apache.avro.generic.IndexedRecord>

toString

public String toString()
Overrides:
toString in class FileSourceImpl<T extends org.apache.avro.generic.IndexedRecord>

builder

public static <T extends org.apache.avro.specific.SpecificRecord> AvroParquetFileSource.Builder<T> builder(Class<T> clazz)

builder

public static AvroParquetFileSource.Builder<org.apache.avro.generic.GenericRecord> builder(org.apache.avro.Schema schema)
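
Both builder methods return an AvroParquetFileSource.Builder, which the class summary describes as a helper for constructing a source that reads only a subset of the fields defined in an Avro schema. As a rough illustration of what reading a subset of fields means for the records a caller sees, the sketch below models projection in plain Java. It is not the Crunch API: ProjectionSketch and project are hypothetical names, and records are modelled as simple maps rather than Avro records.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of field projection; not part of Crunch, Avro, or Parquet. */
final class ProjectionSketch {

    /** Returns a copy of the record holding only the requested fields, in the requested order. */
    static Map<String, Object> project(Map<String, Object> record, List<String> includedFields) {
        Map<String, Object> projected = new LinkedHashMap<String, Object>();
        for (String field : includedFields) {
            if (!record.containsKey(field)) {
                throw new IllegalArgumentException("No such field in record: " + field);
            }
            projected.put(field, record.get(field));
        }
        return projected;
    }

    public static void main(String[] args) {
        Map<String, Object> full = new LinkedHashMap<String, Object>();
        full.put("id", 7L);
        full.put("name", "ada");
        full.put("email", "ada@example.org");

        // Only "id" and "name" are requested, so "email" is absent from the result.
        // Prints {id=7, name=ada}
        System.out.println(project(full, Arrays.asList("id", "name")));
    }
}
```

In a columnar format such as Parquet, the practical benefit of projection is that columns left out of the projected schema do not need to be read from disk at all, whereas this in-memory model only shows the shape of the result.
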


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.