public class Shard extends Object
Utilities for controlling how the data in a PCollection is balanced across reducers and output files.

Constructor and Description |
---|
Shard() |
Modifier and Type | Method and Description |
---|---|
static <T> PCollection<T> | shard(PCollection<T> pc, int numPartitions): Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files. |
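The summary entry above describes what shard does but not how the records get redistributed. The Java sketch below illustrates one way such rebalancing can work. It uses only the standard library and is not Crunch's actual implementation. Each inner list models the records written by one map task. Every record is keyed with a per-task counter that starts at the task's index (a choice made here so that small tasks do not all land in bucket 0). Each key is then assigned to one of numPartitions buckets using Hadoop-style hash partitioning. The class ShardSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stdlib-only sketch of the rebalancing idea behind Shard.shard.
// Assumption: each inner list models the records emitted by one map task, and
// each returned list models one of the fixed number of output files.
// This is NOT Crunch's implementation.
final class ShardSketch {

    // Mimics Hadoop-style hash partitioning: clear the sign bit, then take the modulus.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    static <T> List<List<T>> shard(List<List<T>> taskOutputs, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // One bucket per output "file".
        List<List<T>> partitions = new ArrayList<>(numPartitions);
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int task = 0; task < taskOutputs.size(); task++) {
            // Each task numbers its own records independently; starting at the
            // (hypothetical) task index keeps small tasks from all hitting bucket 0.
            int counter = task;
            for (T record : taskOutputs.get(task)) {
                Integer key = counter++;
                partitions.get(partitionFor(key, numPartitions)).add(record);
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Four "map tasks" of uneven size, rebalanced into two "output files".
        List<List<String>> tasks = List.of(
                List.of("a", "b"), List.of("c"), List.of("d", "e", "f"), List.of("g"));
        List<List<String>> files = shard(tasks, 2);
        for (int i = 0; i < files.size(); i++) {
            System.out.println("part-" + i + ": " + files.get(i));
        }
    }
}
```

Running main prints part-0: [a, d, f] and part-1: [b, c, e, g]. Seven records from four uneven tasks end up in exactly two groups. Keying by a counter rather than by the record value means that duplicate records are still spread across partitions instead of piling into one.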
public static <T> PCollection<T> shard(PCollection<T> pc, int numPartitions)

Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files. This is useful for map-only jobs that process many input files but write out only a small amount of output per task. Without sharding, each map task in such a job writes its own small output file.

Parameters:
pc - The PCollection<T> to rebalance
numPartitions - The number of output partitions to create

Returns:
A PCollection<T> with the same contents as the input

Copyright © 2016 The Apache Software Foundation. All rights reserved.