This project has retired. For details please refer to its Attic page.
CBZip2InputStream (Apache Crunch 0.3.0-incubating API)

org.apache.crunch.io.text
Class CBZip2InputStream

java.lang.Object
  extended by java.io.InputStream
      extended by org.apache.crunch.io.text.CBZip2InputStream
All Implemented Interfaces:
Closeable, org.apache.hadoop.io.compress.bzip2.BZip2Constants

public class CBZip2InputStream
extends InputStream
implements org.apache.hadoop.io.compress.bzip2.BZip2Constants

An input stream that decompresses data in the BZip2 format (without the file header characters) so that it can be read like any other stream.
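This Javadoc says the stream expects BZip2 data without the file header characters, but it does not say exactly which bytes the caller must strip. It also does not say how the constructor's blockSize argument relates to the header. For reference, a standard .bz2 file begins with the magic bytes "BZ", the version byte 'h', and a digit '1' to '9' that gives the block size in units of 100,000 bytes (the inherited baseBlockSize constant). The sketch below is a hypothetical helper, Bzip2Header.readBlockSize, that consumes and validates a full four-byte header and reports the block size in bytes. It is an illustration of the file format, not part of the Crunch API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: consumes a standard bzip2 file header ("BZh" followed by '1'..'9'). */
class Bzip2Header {
    // Mirrors BZip2Constants.baseBlockSize (100,000 bytes); defined locally to avoid a Hadoop dependency.
    static final int BASE_BLOCK_SIZE = 100_000;

    /** Reads the 4-byte header from in and returns the block size in bytes. */
    static int readBlockSize(InputStream in) throws IOException {
        int b = in.read();
        int z = in.read();
        int h = in.read();
        int level = in.read();
        // read() returns -1 on a short stream, which also fails these checks.
        if (b != 'B' || z != 'Z' || h != 'h') {
            throw new IOException("not a bzip2 stream: bad magic");
        }
        if (level < '1' || level > '9') {
            throw new IOException("invalid block size digit: " + level);
        }
        // After this call, in is positioned at the first compressed block.
        return (level - '0') * BASE_BLOCK_SIZE;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("BZh9".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readBlockSize(in)); // prints 900000
    }
}
```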

Author:
Keiron Liddle

Field Summary
 
Fields inherited from interface org.apache.hadoop.io.compress.bzip2.BZip2Constants
baseBlockSize, G_SIZE, MAX_ALPHA_SIZE, MAX_CODE_LEN, MAX_SELECTORS, N_GROUPS, N_ITERS, NUM_OVERSHOOT_BYTES, rNums, RUNA, RUNB
 
Constructor Summary
CBZip2InputStream(org.apache.hadoop.fs.FSDataInputStream zStream, int blockSize, long end)
           
 
Method Summary
 long getPos()
          getPos is used by the caller to know when the processing of the current InputSplit is complete.
 long getReadCount()
           
 long getReadLimit()
           
 int read()
           
 void setReadLimit(long readLimit)
           
 
Methods inherited from class java.io.InputStream
available, close, mark, markSupported, read, read, reset, skip
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CBZip2InputStream

public CBZip2InputStream(org.apache.hadoop.fs.FSDataInputStream zStream,
                         int blockSize,
                         long end)
                  throws IOException
Throws:
IOException
Method Detail

getReadLimit

public long getReadLimit()

setReadLimit

public void setReadLimit(long readLimit)

getReadCount

public long getReadCount()
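getReadLimit, setReadLimit and getReadCount carry no description here. Judging only from their names, they appear to expose a count of bytes consumed and an upper bound on that count. The following hypothetical LimitedCountingInputStream, built on java.io.FilterInputStream, shows that general pattern. It is an assumption for illustration only. It does not describe what CBZip2InputStream actually counts (compressed or decompressed bytes) or how, or whether, it enforces the limit.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: counts bytes returned and reports end-of-stream once a limit is reached. */
class LimitedCountingInputStream extends FilterInputStream {
    private long readLimit = Long.MAX_VALUE; // assumed default: no limit
    private long readCount = 0;              // bytes returned to the caller so far (skip() is not counted)

    LimitedCountingInputStream(InputStream in) {
        super(in);
    }

    long getReadLimit() { return readLimit; }
    void setReadLimit(long readLimit) { this.readLimit = readLimit; }
    long getReadCount() { return readCount; }

    @Override
    public int read() throws IOException {
        if (readCount >= readLimit) {
            return -1; // behave as end-of-stream once the limit is hit
        }
        int b = super.read();
        if (b != -1) {
            readCount++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        long remaining = readLimit - readCount;
        if (remaining <= 0) {
            return -1;
        }
        // Never request more bytes than the limit still allows.
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            readCount += n;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        LimitedCountingInputStream in =
            new LimitedCountingInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}));
        in.setReadLimit(3);
        int total = 0;
        while (in.read() != -1) {
            total++;
        }
        System.out.println(total + " " + in.getReadCount()); // prints "3 3"
    }
}
```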

read

public int read()
         throws IOException
Specified by:
read in class InputStream
Throws:
IOException
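read() follows the java.io.InputStream contract. It returns the next decompressed byte as an int in the range 0 to 255, or -1 at end of stream, so a caller can drain the stream with the usual loop. In the sketch below, a ByteArrayInputStream stands in for a CBZip2InputStream, because constructing the real class requires a Hadoop FSDataInputStream. The helper name readAll is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class DrainExample {
    /** Hypothetical helper: reads every remaining byte from in, one read() call at a time. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) { // -1 signals end of stream
            out.write(b);               // b is in the range 0..255
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a CBZip2InputStream; try-with-resources closes it afterwards.
        try (InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8))) {
            System.out.println(new String(readAll(in), StandardCharsets.UTF_8)); // prints hello
        }
    }
}
```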

getPos

public long getPos()
            throws IOException
getPos is used by the caller to know when the processing of the current InputSplit is complete. As each bzip2 block is read, this method keeps returning the start of the InputSplit until it reaches a block that starts at a position >= the end of the current split. At that point, retpos is set up so that, once a record has been read, subsequent getPos() calls return a value > the end of the current split. As a result, only one record is read from that bzip2 block; the remaining records in that block are left for the next map task to read while it processes the next split.

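Returns:
the start of the current InputSplit, or a value greater than the end of the split once a record has been read from a block that starts at or beyond that end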
Throws:
IOException
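The getPos() protocol described above can be illustrated with a small simulation. SplitPosModel, Block and readSplit are hypothetical names. The model simplifies the real stream in two ways: it treats each record as lying entirely within one bzip2 block, and it takes compressed block start offsets as given. The caller keeps reading while getPos() is at most end. Once the reader enters a block that starts at or beyond end, it returns one more record and then reports a value greater than end (the sketch uses end + 1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the getPos() protocol; real records can straddle bzip2 blocks, which is ignored here. */
class SplitPosModel {
    static final class Block {
        final long start;           // compressed offset where the block begins
        final List<String> records; // records assumed to lie wholly inside this block
        Block(long start, List<String> records) { this.start = start; this.records = records; }
    }

    private final List<Block> blocks;
    private final long end;      // end of the current split
    private int blockIdx = 0;
    private int recordIdx = 0;
    private long retPos;         // value reported by getPos()

    SplitPosModel(List<Block> blocks, long splitStart, long end) {
        this.blocks = blocks;
        this.end = end;
        this.retPos = splitStart; // report the split start until the end is crossed
    }

    long getPos() { return retPos; }

    /** Returns the next record, or null when no records remain. */
    String nextRecord() {
        // Advance past exhausted blocks.
        while (blockIdx < blocks.size() && recordIdx >= blocks.get(blockIdx).records.size()) {
            blockIdx++;
            recordIdx = 0;
        }
        if (blockIdx >= blocks.size()) {
            return null;
        }
        Block current = blocks.get(blockIdx);
        String record = current.records.get(recordIdx++);
        if (current.start >= end) {
            retPos = end + 1; // after this one record, getPos() exceeds the split end
        }
        return record;
    }

    /** Caller loop: keep reading while getPos() has not passed the end of the split. */
    static List<String> readSplit(SplitPosModel reader) {
        List<String> out = new ArrayList<>();
        String r;
        while (reader.getPos() <= reader.end && (r = reader.nextRecord()) != null) {
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block(0, Arrays.asList("a1", "a2")),
            new Block(60, Arrays.asList("b1", "b2")),
            new Block(120, Arrays.asList("c1", "c2")));
        System.out.println(readSplit(new SplitPosModel(blocks, 0, 100))); // prints [a1, a2, b1, b2, c1]
    }
}
```

With a split ending at offset 100, the blocks at 0 and 60 are read in full and exactly one record comes from the block at 120. Per the description above, the remaining records of that block are left for the task that processes the next split.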


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.