public class Shard extends Object
Utilities for controlling how the data in a PCollection is balanced across reducers
and output files.

| Constructor and Description |
|---|
| Shard() |

| Modifier and Type | Method and Description |
|---|---|
| static <T> PCollection<T> | shard(PCollection<T> pc, int numPartitions): Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files. |

public static <T> PCollection<T> shard(PCollection<T> pc, int numPartitions)
Creates a PCollection<T> that has the same contents as its input argument but will
be written to a fixed number of output files. This is useful for map-only jobs that process
lots of input files but only write out a small amount of output per task.

Parameters:
pc - The PCollection<T> to rebalance
numPartitions - The number of output partitions to create

Returns:
A PCollection<T> with the same contents as the input

Copyright © 2015 The Apache Software Foundation. All Rights Reserved.
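
The contract described for shard (same contents, fixed number of outputs) can be illustrated with a minimal plain-Java sketch. This is not Crunch's implementation and does not use the Crunch API: the class ShardSketch, its list-based shard method, and the round-robin distribution strategy are all assumptions made for illustration, with in-memory lists standing in for a PCollection and its output files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; not part of Apache Crunch. */
public class ShardSketch {

    /**
     * Distributes the elements of input round-robin across exactly
     * numPartitions buckets. Every element lands in exactly one bucket,
     * so the contents are preserved while the number of outputs is fixed.
     * Round-robin is an assumed strategy, not necessarily what Crunch does.
     */
    public static <T> List<List<T>> shard(List<T> input, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        int next = 0;
        for (T element : input) {
            partitions.get(next).add(element);
            next = (next + 1) % numPartitions;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Ten small records, standing in for the output of many map tasks.
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add(i);
        }
        List<List<Integer>> out = shard(records, 3);
        for (int p = 0; p < out.size(); p++) {
            System.out.println("partition " + p + ": " + out.get(p));
        }
    }
}
```

With ten records and numPartitions set to 3, the sketch produces three partitions holding 4, 3, and 3 records. In a real Crunch pipeline the analogous effect would be a fixed number of output files rather than one small file per map task.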