This project has retired. For details please refer to its Attic page.
HFileOutputFormatForCrunch (Apache Crunch 0.9.0 API)

org.apache.crunch.io.hbase
Class HFileOutputFormatForCrunch

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<Object,org.apache.hadoop.hbase.KeyValue>
          extended by org.apache.crunch.io.hbase.HFileOutputFormatForCrunch

public class HFileOutputFormatForCrunch
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<Object,org.apache.hadoop.hbase.KeyValue>

This is a thin wrapper around HFile.Writer. It only calls HFile.Writer#append(byte[], byte[]) when records are emitted. It supports writing data into a single column family only. Records MUST be sorted by their column qualifier, then by timestamp in descending order. All data are written into a single HFile. HBase's official HFileOutputFormat is not used because it shuffles on the row key only and sorts in memory on the reducer side, so the size of the output HFile is limited by the reducer's memory. Because Crunch supports more complex and flexible MapReduce pipelines, a thin, pure OutputFormat is preferred here.
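The ordering requirement above can be sketched as a Java comparator. This is a minimal, self-contained illustration, not Crunch code: `Cell` is a hypothetical stand-in for `KeyValue`, and comparing qualifiers as unsigned bytes is an assumption modeled on HBase's unsigned lexicographic byte ordering. Since the writer only appends, sorting records into this order before they are emitted is the caller's job.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of the record order this OutputFormat requires: column qualifier
 * ascending, then timestamp descending (newest first).
 * Cell is a hypothetical stand-in for org.apache.hadoop.hbase.KeyValue.
 */
public class QualifierOrder {

    /** Hypothetical minimal record: qualifier, timestamp and value only. */
    public static final class Cell {
        public final byte[] qualifier;
        public final long timestamp;
        public final byte[] value;

        public Cell(byte[] qualifier, long timestamp, byte[] value) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        @Override
        public String toString() {
            return new String(qualifier, StandardCharsets.UTF_8) + "@" + timestamp;
        }
    }

    /** Unsigned lexicographic comparison; a shorter prefix sorts first (assumed order). */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Qualifier ascending, then timestamp descending. */
    public static final Comparator<Cell> REQUIRED_ORDER = (x, y) -> {
        int byQualifier = compareUnsigned(x.qualifier, y.qualifier);
        return byQualifier != 0 ? byQualifier : Long.compare(y.timestamp, x.timestamp);
    };

    /** True if every adjacent pair already respects REQUIRED_ORDER. */
    public static boolean isSorted(List<Cell> cells) {
        for (int i = 1; i < cells.size(); i++) {
            if (REQUIRED_ORDER.compare(cells.get(i - 1), cells.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>(Arrays.asList(
            new Cell(bytes("b"), 1L, bytes("v1")),
            new Cell(bytes("a"), 1L, bytes("v2")),
            new Cell(bytes("a"), 3L, bytes("v3")),
            new Cell(bytes("b"), 2L, bytes("v4"))));
        System.out.println(isSorted(cells));   // false
        cells.sort(REQUIRED_ORDER);            // the caller sorts; the writer will not
        System.out.println(isSorted(cells));   // true
        System.out.println(cells);             // [a@3, a@1, b@2, b@1]
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```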


Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter

Field Summary
static String HCOLUMN_DESCRIPTOR_KEY

Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
BASE_OUTPUT_NAME, PART

Constructor Summary
HFileOutputFormatForCrunch()

Method Summary
org.apache.hadoop.mapreduce.RecordWriter<Object,org.apache.hadoop.hbase.KeyValue> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)

Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

HCOLUMN_DESCRIPTOR_KEY

public static final String HCOLUMN_DESCRIPTOR_KEY
See Also:
Constant Field Values

Constructor Detail

HFileOutputFormatForCrunch

public HFileOutputFormatForCrunch()

Method Detail

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<Object,org.apache.hadoop.hbase.KeyValue> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
    throws IOException, InterruptedException
Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<Object,org.apache.hadoop.hbase.KeyValue>
Throws:
IOException
InterruptedException
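The class description characterizes the record writer returned by getRecordWriter as a thin wrapper that calls HFile.Writer#append(byte[], byte[]) as records are emitted. The sketch below illustrates that streaming design under stated assumptions: `Appender` and `ThinWriter` are hypothetical names, not Crunch or HBase APIs; the key order is supplied as a comparator rather than taken from HBase; and the family and order checks are illustrative only. Because no records are buffered, memory use does not grow with the size of the output, which is the contrast the class description draws with HBase's HFileOutputFormat.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a thin, streaming writer; not a Crunch or HBase API. */
public class ThinWriterSketch {

    /** Hypothetical stand-in for HFile.Writer#append(byte[], byte[]). */
    public interface Appender {
        void append(byte[] key, byte[] value) throws IOException;
    }

    /** Forwards each record immediately; holds no records in memory. */
    public static final class ThinWriter {
        private final byte[] family;                // the only column family accepted
        private final Comparator<byte[]> keyOrder;  // assumed key order, supplied by the caller
        private final Appender appender;
        private byte[] lastKey;                     // previous key, kept only for the order check

        public ThinWriter(byte[] family, Comparator<byte[]> keyOrder, Appender appender) {
            this.family = family.clone();
            this.keyOrder = keyOrder;
            this.appender = appender;
        }

        public void write(byte[] recordFamily, byte[] key, byte[] value) throws IOException {
            if (!Arrays.equals(family, recordFamily)) {
                throw new IOException("only a single column family is supported");
            }
            if (lastKey != null && keyOrder.compare(lastKey, key) > 0) {
                // No sorting happens here: out-of-order input is rejected, not fixed.
                throw new IOException("records must be emitted in sorted order");
            }
            appender.append(key, value);            // forwarded at once, nothing buffered
            lastKey = key.clone();
        }
    }

    /** Unsigned lexicographic byte comparison, used here as the example key order. */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) throws IOException {
        List<String> appended = new ArrayList<>();
        ThinWriter writer = new ThinWriter(bytes("cf"), ThinWriterSketch::compareUnsigned,
            (key, value) -> appended.add(new String(key, StandardCharsets.UTF_8)));
        writer.write(bytes("cf"), bytes("row1"), bytes("v1"));
        writer.write(bytes("cf"), bytes("row2"), bytes("v2"));
        System.out.println(appended);               // [row1, row2]
        try {
            writer.write(bytes("cf"), bytes("row0"), bytes("v0"));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```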


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.