shard(PCollection<T> pc, int numPartitions)
Creates a PCollection<T> that has the same contents as its input argument but will
be written to a fixed number of output files. This is useful for map-only jobs that process
lots of input files but write only a small amount of output per task.
Parameters:
pc - The PCollection<T> to rebalance
numPartitions - The number of output partitions to create
Returns:
A rebalanced PCollection<T> with the same contents as the input
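Example (conceptual sketch):
The plain-Java sketch below illustrates the effect described above: the records are kept as they are and are only redistributed across a fixed number of partitions. It does not use the library and is not its implementation. The class name ShardSketch, the use of List in place of PCollection<T>, and the round-robin assignment are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardSketch {

    // Hypothetical helper (not part of the library): spreads the records
    // round-robin over numPartitions buckets. Each bucket stands in for
    // one output file. No record is dropped, duplicated, or modified.
    static <T> List<List<T>> shard(List<T> records, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Create the fixed number of (initially empty) partitions.
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        // Assign each record to the next partition in turn.
        int next = 0;
        for (T record : records) {
            partitions.get(next).add(record);
            next = (next + 1) % numPartitions;
        }
        return partitions;
    }
}
```

In this sketch, sharding seven records into three partitions always yields exactly three buckets, however many input files the records originally came from. That is the property that makes the method useful for map-only jobs that would otherwise write one small file per task.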