This project has retired. For details please refer to its Attic page.
Shard (Apache Crunch 0.8.0 API)

org.apache.crunch.lib
Class Shard

java.lang.Object
  extended by org.apache.crunch.lib.Shard

public class Shard
extends Object

Utilities for controlling how the data in a PCollection is balanced across reducers and output files.


Constructor Summary
Shard()
           
 
Method Summary
static <T> PCollection<T> shard(PCollection<T> pc, int numPartitions)
          Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Shard

public Shard()

Method Detail

shard

public static <T> PCollection<T> shard(PCollection<T> pc,
                                       int numPartitions)
Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files. This is useful for map-only jobs that process many input files but write only a small amount of output per task, since such jobs would otherwise produce one small output file per task.

Parameters:
pc - The PCollection<T> to rebalance
numPartitions - The number of output partitions to create
Returns:
A rebalanced PCollection<T> with the same contents as the input
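
Example:
The effect described above can be modelled in plain Java without a cluster. The sketch below is not Crunch's implementation and does not use the Crunch API. ShardSketch and its round-robin assignment are assumptions made for illustration, with each inner list standing in for one output file. It shows only the contract stated here: the contents are unchanged, and the number of partitions is fixed by numPartitions regardless of how many records there are.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not part of Apache Crunch.
final class ShardSketch {

    // Spreads the input over exactly numPartitions buckets, round-robin.
    // Each bucket models one output file; every element lands in exactly one bucket.
    static <T> List<List<T>> shard(List<T> input, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < input.size(); i++) {
            partitions.get(i % numPartitions).add(input.get(i));
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Ten small records, as if emitted by many map tasks.
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add("record-" + i);
        }
        // Rebalance into three partitions and show what each would hold.
        List<List<String>> parts = shard(records, 3);
        for (int p = 0; p < parts.size(); p++) {
            System.out.println("part " + p + ": " + parts.get(p));
        }
    }
}
```

In a real pipeline the equivalent call would be Shard.shard(pc, numPartitions) on a PCollection<T>, per the signature above. How Crunch assigns records to partitions is not specified on this page, so the round-robin order used in the sketch should not be relied on.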


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.