| Package | Description | 
|---|---|
| org.apache.crunch | Client-facing API and core abstractions. | 
| org.apache.crunch.contrib.bloomfilter | Support for creating Bloom Filters. | 
| org.apache.crunch.contrib.text | |
| org.apache.crunch.fn | Commonly used functions for manipulating collections. | 
| org.apache.crunch.impl.dist | |
| org.apache.crunch.impl.dist.collect | |
| org.apache.crunch.impl.mem | In-memory Pipeline implementation for rapid prototyping and testing. | 
| org.apache.crunch.impl.spark | |
| org.apache.crunch.impl.spark.collect | |
| org.apache.crunch.impl.spark.fn | |
| org.apache.crunch.kafka | |
| org.apache.crunch.kafka.inputformat | |
| org.apache.crunch.lambda | Alternative Crunch API using Java 8 features to allow construction of pipelines using lambda functions and method
 references. | 
| org.apache.crunch.lib | Joining, sorting, aggregating, and other commonly used functionality. | 
| org.apache.crunch.lib.join | Inner and outer joins on collections. | 
| org.apache.crunch.types | Common functionality for business object serialization. | 
| org.apache.crunch.types.avro | Business object serialization using Apache Avro. | 
| org.apache.crunch.types.writable | Business object serialization using Hadoop's Writables framework. | 
| org.apache.crunch.util | An assorted set of utilities. | 
| Modifier and Type | Method and Description | 
|---|---|
| static <T,U> Pair<T,U> | Pair. of(T first,
  U second) | 
| Modifier and Type | Method and Description | 
|---|---|
| <U> PTable<K,Pair<Collection<V>,Collection<U>>> | PTable. cogroup(PTable<K,U> other)Co-group operation with the given table. | 
| <U> PTable<K,Pair<V,U>> | PTable. join(PTable<K,U> other)Perform an inner join on this table and the one passed in as an argument on
 their common keys. | 
| Modifier and Type | Method and Description | 
|---|---|
| int | Pair. compareTo(Pair<K,V> o) | 
| Modifier and Type | Method and Description | 
|---|---|
| <K,V> PTable<K,V> | Pipeline. create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype)Creates a  PTablecontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <K,V> PTable<K,V> | Pipeline. create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype,
      CreateOptions options)Creates a  PTablecontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| PTable<K,V> | PTable. filter(FilterFn<Pair<K,V>> filterFn)Apply the given filter function to this instance and return the resulting
  PTable. | 
| PTable<K,V> | PTable. filter(String name,
      FilterFn<Pair<K,V>> filterFn)Apply the given filter function to this instance and return the resulting
  PTable. | 
| <K,V> PTable<K,V> | PCollection. parallelDo(DoFn<S,Pair<K,V>> doFn,
          PTableType<K,V> type)Similar to the other  parallelDoinstance, but returns aPTableinstance instead of aPCollection. | 
| <K,V> PTable<K,V> | PCollection. parallelDo(String name,
          DoFn<S,Pair<K,V>> doFn,
          PTableType<K,V> type)Similar to the other  parallelDoinstance, but returns aPTableinstance instead of aPCollection. | 
| <K,V> PTable<K,V> | PCollection. parallelDo(String name,
          DoFn<S,Pair<K,V>> doFn,
          PTableType<K,V> type,
          ParallelDoOptions options)Similar to the other  parallelDoinstance, but returns aPTableinstance instead of aPCollection. | 
| Modifier and Type | Method and Description | 
|---|---|
| void | BloomFilterFn. cleanup(Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter) | 
| void | BloomFilterFn. process(S input,
       Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter) | 
| Modifier and Type | Method and Description | 
|---|---|
| static <K,V> Extractor<Pair<K,V>> | Extractors. xpair(TokenizerFactory scannerFactory,
     Extractor<K> one,
     Extractor<V> two)Returns an Extractor for pairs of the given types that uses the given  TokenizerFactoryfor parsing the sub-fields. | 
| Modifier and Type | Method and Description | 
|---|---|
| static <K,V> PTable<K,V> | Parse. parseTable(String groupName,
          PCollection<String> input,
          Extractor<Pair<K,V>> extractor)Parses the lines of the input  PCollection<String>and returns aPTable<K, V>using
 the givenExtractor<Pair<K, V>>. | 
| static <K,V> PTable<K,V> | Parse. parseTable(String groupName,
          PCollection<String> input,
          PTypeFamily ptf,
          Extractor<Pair<K,V>> extractor)Parses the lines of the input  PCollection<String>and returns aPTable<K, V>using
 the givenExtractor<Pair<K, V>>that uses the givenPTypeFamily. | 
| Modifier and Type | Method and Description | 
|---|---|
| Pair<S,T> | PairMapFn. map(Pair<K,V> input) | 
| Pair<V2,V1> | SwapFn. map(Pair<V1,V2> input) | 
| Pair<K,V> | SPairFunction. map(T input) | 
| Pair<K,V> | ExtractKeyFn. map(V input) | 
| Modifier and Type | Method and Description | 
|---|---|
| static <V1,V2> Aggregator<Pair<V1,V2>> | Aggregators. pairAggregator(Aggregator<V1> a1,
              Aggregator<V2> a2)Apply separate aggregators to each component of a  Pair. | 
| static <V1,V2> PType<Pair<V2,V1>> | SwapFn. ptype(PType<Pair<V1,V2>> pt) | 
| Modifier and Type | Method and Description | 
|---|---|
| R | SFunction2. map(Pair<K,V> input) | 
| Pair<S,T> | PairMapFn. map(Pair<K,V> input) | 
| Pair<V2,V1> | SwapFn. map(Pair<V1,V2> input) | 
| void | SFlatMapFunction2. process(Pair<K,V> input,
       Emitter<R> emitter) | 
| Modifier and Type | Method and Description | 
|---|---|
| void | PairMapFn. cleanup(Emitter<Pair<S,T>> emitter) | 
| void | SPairFlatMapFunction. process(T input,
       Emitter<Pair<K,V>> emitter) | 
| static <V1,V2> PType<Pair<V2,V1>> | SwapFn. ptype(PType<Pair<V1,V2>> pt) | 
| Modifier and Type | Method and Description | 
|---|---|
| <K,V> PTable<K,V> | DistributedPipeline. create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype) | 
| <K,V> PTable<K,V> | DistributedPipeline. create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype,
      CreateOptions options) | 
| Modifier and Type | Method and Description | 
|---|---|
| <U> PTable<K,Pair<Collection<V>,Collection<U>>> | PTableBase. cogroup(PTable<K,U> other) | 
| PType<Pair<K,V>> | EmptyPTable. getPType() | 
| PType<Pair<K,V>> | BaseUnionTable. getPType() | 
| PType<Pair<K,V>> | BaseInputTable. getPType() | 
| PType<Pair<K,Iterable<V>>> | BaseGroupedTable. getPType() | 
| PType<Pair<K,V>> | BaseDoTable. getPType() | 
| <U> PTable<K,Pair<V,U>> | PTableBase. join(PTable<K,U> other) | 
| Modifier and Type | Method and Description | 
|---|---|
| <S,K,V> BaseDoTable<K,V> | PCollectionFactory. createDoTable(String name,
             PCollectionImpl<S> chainingCollection,
             CombineFn<K,V> combineFn,
             DoFn<S,Pair<K,V>> fn,
             PTableType<K,V> type) | 
| <S,K,V> BaseDoTable<K,V> | PCollectionFactory. createDoTable(String name,
             PCollectionImpl<S> chainingCollection,
             DoFn<S,Pair<K,V>> fn,
             PTableType<K,V> type,
             ParallelDoOptions options) | 
| PTable<K,V> | PTableBase. filter(FilterFn<Pair<K,V>> filterFn) | 
| PTable<K,V> | PTableBase. filter(String name,
      FilterFn<Pair<K,V>> filterFn) | 
| <K,V> PTable<K,V> | PCollectionImpl. parallelDo(DoFn<S,Pair<K,V>> fn,
          PTableType<K,V> type) | 
| <K,V> PTable<K,V> | PCollectionImpl. parallelDo(String name,
          DoFn<S,Pair<K,V>> fn,
          PTableType<K,V> type) | 
| <K,V> PTable<K,V> | PCollectionImpl. parallelDo(String name,
          DoFn<S,Pair<K,V>> fn,
          PTableType<K,V> type,
          ParallelDoOptions options) | 
| Modifier and Type | Method and Description | 
|---|---|
| <K,V> PTable<K,V> | MemPipeline. create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype) | 
| <K,V> PTable<K,V> | MemPipeline. create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype,
      CreateOptions options) | 
| static <S,T> PTable<S,T> | MemPipeline. tableOf(Iterable<Pair<S,T>> pairs) | 
| static <S,T> PTable<S,T> | MemPipeline. typedTableOf(PTableType<S,T> ptype,
            Iterable<Pair<S,T>> pairs) | 
| Modifier and Type | Method and Description | 
|---|---|
| static <K,V> com.google.common.base.Function<Pair<K,V>,scala.Tuple2<K,V>> | GuavaUtils. pair2tupleFunc() | 
| static <K,V> com.google.common.base.Function<scala.Tuple2<K,V>,Pair<K,V>> | GuavaUtils. tuple2PairFunc() | 
| Modifier and Type | Method and Description | 
|---|---|
| <K,V> PTable<K,V> | SparkPipeline. create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype,
      CreateOptions options) | 
| Modifier and Type | Method and Description | 
|---|---|
| PType<Pair<K,V>> | CreatedTable. getPType() | 
| Modifier and Type | Method and Description | 
|---|---|
| <S,K,V> BaseDoTable<K,V> | SparkCollectFactory. createDoTable(String name,
             PCollectionImpl<S> parent,
             CombineFn<K,V> combineFn,
             DoFn<S,Pair<K,V>> fn,
             PTableType<K,V> type) | 
| <S,K,V> BaseDoTable<K,V> | SparkCollectFactory. createDoTable(String name,
             PCollectionImpl<S> parent,
             DoFn<S,Pair<K,V>> fn,
             PTableType<K,V> type,
             ParallelDoOptions options) | 
| Constructor and Description | 
|---|
| CreatedTable(SparkPipeline pipeline,
            Iterable<Pair<K,V>> contents,
            PTableType<K,V> ptype,
            CreateOptions options) | 
| Modifier and Type | Method and Description | 
|---|---|
| Pair<K,Iterable<V>> | ReduceInputFunction. call(scala.Tuple2<ByteArray,Iterable<byte[]>> kv) | 
| Modifier and Type | Method and Description | 
|---|---|
| scala.Tuple2<S,Iterable<T>> | PairMapIterableFunction. call(Pair<K,List<V>> input) | 
| scala.Tuple2<K,V> | Tuple2MapFunction. call(Pair<K,V> p) | 
| scala.Tuple2<IntByteArray,byte[]> | PartitionedMapOutputFunction. call(Pair<K,V> p) | 
| scala.Tuple2<ByteArray,byte[]> | MapOutputFunction. call(Pair<K,V> p) | 
| Modifier and Type | Method and Description | 
|---|---|
| Iterable<scala.Tuple2<K,V>> | CrunchPairTuple2. call(Iterator<Pair<K,V>> iterator) | 
| Constructor and Description | 
|---|
| FlatMapPairDoFn(DoFn<Pair<K,V>,T> fn,
               SparkRuntimeContext ctxt) | 
| PairFlatMapDoFn(DoFn<T,Pair<K,V>> fn,
               SparkRuntimeContext ctxt) | 
| PairMapFunction(MapFn<Pair<K,V>,S> fn,
               SparkRuntimeContext ctxt) | 
| PairMapIterableFunction(MapFn<Pair<K,List<V>>,Pair<S,Iterable<T>>> fn,
                       SparkRuntimeContext runtimeContext) | 
| PairMapIterableFunction(MapFn<Pair<K,List<V>>,Pair<S,Iterable<T>>> fn,
                       SparkRuntimeContext runtimeContext) | 
| Tuple2MapFunction(MapFn<Pair<K,V>,Pair<K,V>> fn,
                 SparkRuntimeContext ctxt) | 
| Tuple2MapFunction(MapFn<Pair<K,V>,Pair<K,V>> fn,
                 SparkRuntimeContext ctxt) | 
| Modifier and Type | Method and Description | 
|---|---|
| ReadableData<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | KafkaSource. asReadable() | 
| PType<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | KafkaSource. getType() | 
| Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | KafkaSource. inputConf(String key,
         String value) | 
| Iterable<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | KafkaSource. read(org.apache.hadoop.conf.Configuration conf) | 
| Constructor and Description | 
|---|
| KafkaSource(Properties kafkaConnectionProperties,
           Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets)Constructs a Kafka source that will read data from the Kafka cluster identified by the  kafkaConnectionPropertiesand from the specific topics and partitions identified in theoffsets | 
| Modifier and Type | Method and Description | 
|---|---|
| static Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> | KafkaInputFormat. getOffsets(org.apache.hadoop.conf.Configuration configuration)Reads the  configurationto determine which topics, partitions, and offsets should be used for reading data. | 
| Modifier and Type | Method and Description | 
|---|---|
| static void | KafkaInputFormat. writeOffsetsToBundle(Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets,
                    FormatBundle bundle)Writes the start and end offsets for the provided topic partitions to the  bundle. | 
| static void | KafkaInputFormat. writeOffsetsToConfiguration(Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets,
                           org.apache.hadoop.conf.Configuration config)Writes the start and end offsets for the provided topic partitions to the  config. | 
| Modifier and Type | Method and Description | 
|---|---|
| default <U> LTable<K,Pair<Collection<V>,Collection<U>>> | LTable. cogroup(LTable<K,U> other)Cogroup this table with another  LTablewith the same key type, collecting the set of values from
 each side. | 
| default <U> LTable<K,Pair<V,U>> | LTable. join(LTable<K,U> other)Inner join this table to another  LTablewhich has the same key type using a reduce-side join | 
| default <U> LTable<K,Pair<V,U>> | LTable. join(LTable<K,U> other,
    JoinType joinType)Join this table to another  LTablewhich has the same key type using the provideJoinTypeand
 theDefaultJoinStrategy(reduce-side join). | 
| default <U> LTable<K,Pair<V,U>> | LTable. join(LTable<K,U> other,
    JoinType joinType,
    JoinStrategy<K,V,U> joinStrategy)Join this table to another  LTablewhich has the same key type using the providedJoinTypeandJoinStrategy | 
| Modifier and Type | Method and Description | 
|---|---|
| default LTable<K,V> | LTable. filter(SPredicate<Pair<K,V>> predicate)Filter the rows of the table using the supplied predicate. | 
| default <K,V> LTable<K,V> | LCollection. filterMap(SFunction<S,Optional<Pair<K,V>>> fn,
         PTableType<K,V> pType)Combination of a filter and map operation by using a function with  Optionalreturn type. | 
| default <K,V> LTable<K,V> | LCollection. flatMap(SFunction<S,java.util.stream.Stream<Pair<K,V>>> fn,
       PTableType<K,V> pType)Map each element to zero or more output elements using the provided stream-returning function to yield an
  LTable | 
| default LTable<K,V> | LTable. incrementIf(Enum<?> counter,
           SPredicate<Pair<K,V>> condition)Increment a counter for every element satisfying the conditional predicate supplied. | 
| default LTable<K,V> | LTable. incrementIf(String counterGroup,
           String counterName,
           SPredicate<Pair<K,V>> condition)Increment a counter for every element satisfying the conditional predicate supplied. | 
| default <K,V> LTable<K,V> | LCollection. map(SFunction<S,Pair<K,V>> fn,
   PTableType<K,V> pType)Map the elements of this collection 1-1 through the supplied function to yield an  LTable | 
| default <K,V> LTable<K,V> | LCollection. parallelDo(DoFn<S,Pair<K,V>> fn,
          PTableType<K,V> pType) | 
| default <K,V> LTable<K,V> | LCollection. parallelDo(LDoFn<S,Pair<K,V>> fn,
          PTableType<K,V> pType)Transform this LCollection using a Lambda-friendly  LDoFn. | 
| Modifier and Type | Method and Description | 
|---|---|
| static <K,V> Pair<K,V> | PTables. getDetachedValue(PTableType<K,V> tableType,
                Pair<K,V> value)Create a detached value for a table  Pair. | 
| static <K,V> Pair<K,Iterable<V>> | PTables. getGroupedDetachedValue(PGroupedTableType<K,V> groupedTableType,
                       Pair<K,Iterable<V>> value)Created a detached value for a  PGroupedTablevalue. | 
| static <T,U> Pair<PCollection<T>,PCollection<U>> | Channels. split(PCollection<Pair<T,U>> pCollection)Splits a  PCollectionof anyPairof objects into a Pair of
 PCollection}, to allow for the output of a DoFn to be handled using
 separate channels. | 
| static <T,U> Pair<PCollection<T>,PCollection<U>> | Channels. split(PCollection<Pair<T,U>> pCollection,
     PType<T> firstPType,
     PType<U> secondPType)Splits a  PCollectionof anyPairof objects into a Pair of
 PCollection}, to allow for the output of a DoFn to be handled using
 separate channels. | 
| Modifier and Type | Method and Description | 
|---|---|
| static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> | Cogroup. cogroup(int numReducers,
       PTable<K,U> left,
       PTable<K,V> right)Co-groups the two  PTablearguments with a user-specified degree of parallelism (a.k.a, number of
 reducers.) | 
| static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> | Cogroup. cogroup(PTable<K,U> left,
       PTable<K,V> right)Co-groups the two  PTablearguments. | 
| static <U,V> PCollection<Pair<U,V>> | Cartesian. cross(PCollection<U> left,
     PCollection<V> right)Performs a full cross join on the specified  PCollections (using the
 same strategy as Pig's CROSS operator). | 
| static <U,V> PCollection<Pair<U,V>> | Cartesian. cross(PCollection<U> left,
     PCollection<V> right,
     int parallelism)Performs a full cross join on the specified  PCollections (using the
 same strategy as Pig's CROSS operator). | 
| static <K1,K2,U,V> | Cartesian. cross(PTable<K1,U> left,
     PTable<K2,V> right)Performs a full cross join on the specified  PTables (using the same
 strategy as Pig's CROSS operator). | 
| static <K1,K2,U,V> | Cartesian. cross(PTable<K1,U> left,
     PTable<K2,V> right)Performs a full cross join on the specified  PTables (using the same
 strategy as Pig's CROSS operator). | 
| static <K1,K2,U,V> | Cartesian. cross(PTable<K1,U> left,
     PTable<K2,V> right,
     int parallelism)Performs a full cross join on the specified  PTables (using the same
 strategy as Pig's CROSS operator). | 
| static <K1,K2,U,V> | Cartesian. cross(PTable<K1,U> left,
     PTable<K2,V> right,
     int parallelism)Performs a full cross join on the specified  PTables (using the same
 strategy as Pig's CROSS operator). | 
| static <K,V,T> DoFn<Pair<K,Iterable<V>>,T> | DoFns. detach(DoFn<Pair<K,Iterable<V>>,T> reduceFn,
      PType<V> valueType)"Reduce" DoFn wrapper which detaches the values in the iterable, preventing the unexpected behaviour related to
 object reuse often observed when using Avro. | 
| static <K,U,V> PTable<K,Pair<U,V>> | Join. fullJoin(PTable<K,U> left,
        PTable<K,V> right)Performs a full outer join on the specified  PTables. | 
| static <T,N extends Number> | Sample. groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
                              int[] sampleSizes)The most general purpose of the weighted reservoir sampling patterns that allows us to choose
 a random sample of elements for each of N input groups. | 
| static <T,N extends Number> | Sample. groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
                              int[] sampleSizes,
                              Long seed)Same as the other groupedWeightedReservoirSample method, but include a seed for testing
 purposes. | 
| static <K,U,V> PTable<K,Pair<U,V>> | Join. innerJoin(PTable<K,U> left,
         PTable<K,V> right)Performs an inner join on the specified  PTables. | 
| static <K,U,V> PTable<K,Pair<U,V>> | Join. join(PTable<K,U> left,
    PTable<K,V> right)Performs an inner join on the specified  PTables. | 
| static <K,U,V> PTable<K,Pair<U,V>> | Join. leftJoin(PTable<K,U> left,
        PTable<K,V> right)Performs a left outer join on the specified  PTables. | 
| static <K,U,V> PTable<K,Pair<U,V>> | Join. rightJoin(PTable<K,U> left,
         PTable<K,V> right)Performs a right outer join on the specified  PTables. | 
| static <U,V> PCollection<Pair<U,V>> | Sort. sortPairs(PCollection<Pair<U,V>> collection,
         Sort.ColumnOrder... columnOrders)Sorts the  PCollectionofPairs using the specified column
 ordering. | 
| static <X,Y> PTable<X,Collection<Pair<Long,Y>>> | TopList. topNYbyX(PTable<X,Y> input,
        int n)Create a top-list of elements in the provided PTable, categorised by the key of the input table and using the count
 of the value part of the input table. | 
| Modifier and Type | Method and Description | 
|---|---|
| int | Aggregate.PairValueComparator. compare(Pair<K,V> left,
       Pair<K,V> right) | 
| int | Aggregate.PairValueComparator. compare(Pair<K,V> left,
       Pair<K,V> right) | 
| static <K,V> Pair<K,V> | PTables. getDetachedValue(PTableType<K,V> tableType,
                Pair<K,V> value)Create a detached value for a table  Pair. | 
| static <K,V> Pair<K,Iterable<V>> | PTables. getGroupedDetachedValue(PGroupedTableType<K,V> groupedTableType,
                       Pair<K,Iterable<V>> value)Created a detached value for a  PGroupedTablevalue. | 
| void | Aggregate.TopKCombineFn. process(Pair<Integer,Iterable<Pair<K,V>>> input,
       Emitter<Pair<Integer,Pair<K,V>>> emitter) | 
| void | Aggregate.TopKFn. process(Pair<K,V> input,
       Emitter<Pair<Integer,Pair<K,V>>> emitter) | 
| Modifier and Type | Method and Description | 
|---|---|
| static <K,V> PTable<K,V> | PTables. asPTable(PCollection<Pair<K,V>> pcollect)Convert the given  PCollection<Pair<K, V>>to aPTable<K, V>. | 
| void | Aggregate.TopKFn. cleanup(Emitter<Pair<Integer,Pair<K,V>>> emitter) | 
| void | Aggregate.TopKFn. cleanup(Emitter<Pair<Integer,Pair<K,V>>> emitter) | 
| static <K,V,T> DoFn<Pair<K,Iterable<V>>,T> | DoFns. detach(DoFn<Pair<K,Iterable<V>>,T> reduceFn,
      PType<V> valueType)"Reduce" DoFn wrapper which detaches the values in the iterable, preventing the unexpected behaviour related to
 object reuse often observed when using Avro. | 
| static <T,N extends Number> | Sample. groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
                              int[] sampleSizes)The most general purpose of the weighted reservoir sampling patterns that allows us to choose
 a random sample of elements for each of N input groups. | 
| static <T,N extends Number> | Sample. groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
                              int[] sampleSizes,
                              Long seed)Same as the other groupedWeightedReservoirSample method, but include a seed for testing
 purposes. | 
| void | Aggregate.TopKCombineFn. process(Pair<Integer,Iterable<Pair<K,V>>> input,
       Emitter<Pair<Integer,Pair<K,V>>> emitter) | 
| void | Aggregate.TopKCombineFn. process(Pair<Integer,Iterable<Pair<K,V>>> input,
       Emitter<Pair<Integer,Pair<K,V>>> emitter) | 
| void | Aggregate.TopKCombineFn. process(Pair<Integer,Iterable<Pair<K,V>>> input,
       Emitter<Pair<Integer,Pair<K,V>>> emitter) | 
| void | Aggregate.TopKFn. process(Pair<K,V> input,
       Emitter<Pair<Integer,Pair<K,V>>> emitter) | 
| void | Aggregate.TopKFn. process(Pair<K,V> input,
       Emitter<Pair<Integer,Pair<K,V>>> emitter) | 
| static <K,V1,V2,U,V> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
            PTableType<U,V> ptype)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPTable<U, V>. | 
| static <K,V1,V2,U,V> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
            PTableType<U,V> ptype)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPTable<U, V>. | 
| static <K,V1,V2,U,V> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
            PTableType<U,V> ptype)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPTable<U, V>. | 
| static <K,V1,V2,U,V> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
            PTableType<U,V> ptype)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPTable<U, V>. | 
| static <K,V1,V2,U,V> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
            PTableType<U,V> ptype,
            int numReducers)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPTable<U, V>, using
 the given number of reducers. | 
| static <K,V1,V2,U,V> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
            PTableType<U,V> ptype,
            int numReducers)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPTable<U, V>, using
 the given number of reducers. | 
| static <K,V1,V2,U,V> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
            PTableType<U,V> ptype,
            int numReducers)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPTable<U, V>, using
 the given number of reducers. | 
| static <K,V1,V2,U,V> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
            PTableType<U,V> ptype,
            int numReducers)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPTable<U, V>, using
 the given number of reducers. | 
| static <K,V1,V2,T> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
            PType<T> ptype)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPCollection<T>. | 
| static <K,V1,V2,T> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
            PType<T> ptype)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPCollection<T>. | 
| static <K,V1,V2,T> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
            PType<T> ptype)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPCollection<T>. | 
| static <K,V1,V2,T> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
            PType<T> ptype,
            int numReducers)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPCollection<T>, using
 the given number of reducers. | 
| static <K,V1,V2,T> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
            PType<T> ptype,
            int numReducers)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPCollection<T>, using
 the given number of reducers. | 
| static <K,V1,V2,T> | SecondarySort. sortAndApply(PTable<K,Pair<V1,V2>> input,
            DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
            PType<T> ptype,
            int numReducers)Perform a secondary sort on the given  PTableinstance and then apply aDoFnto the resulting sorted data to yield an outputPCollection<T>, using
 the given number of reducers. | 
| static <U,V> PCollection<Pair<U,V>> | Sort. sortPairs(PCollection<Pair<U,V>> collection,
         Sort.ColumnOrder... columnOrders)Sorts the  PCollectionofPairs using the specified column
 ordering. | 
| static <T,U> Pair<PCollection<T>,PCollection<U>> | Channels. split(PCollection<Pair<T,U>> pCollection)Splits a  PCollectionof anyPairof objects into a Pair of
 PCollection}, to allow for the output of a DoFn to be handled using
 separate channels. | 
| static <T,U> Pair<PCollection<T>,PCollection<U>> | Channels. split(PCollection<Pair<T,U>> pCollection,
     PType<T> firstPType,
     PType<U> secondPType)Splits a  PCollectionof anyPairof objects into a Pair of
 PCollection}, to allow for the output of a DoFn to be handled using
 separate channels. | 
| static <T,N extends Number> | Sample. weightedReservoirSample(PCollection<Pair<T,N>> input,
                       int sampleSize)Selects a weighted sample of the elements of the given  PCollection, where the second term in
 the inputPairis a numerical weight. | 
| static <T,N extends Number> | Sample. weightedReservoirSample(PCollection<Pair<T,N>> input,
                       int sampleSize,
                       Long seed)The weighted reservoir sampling function with the seed term exposed for testing purposes. | 
| Constructor and Description | 
|---|
| Result(long count,
      Iterable<Pair<Double,V>> quantiles) | 
| TopKCombineFn(int limit,
             boolean maximize,
             PType<Pair<K,V>> pairType) | 
| TopKFn(int limit,
      boolean ascending,
      PType<Pair<K,V>> pairType) | 
| Modifier and Type | Method and Description | 
|---|---|
| PTable<K,Pair<U,V>> | DefaultJoinStrategy. join(PTable<K,U> left,
    PTable<K,V> right,
    JoinFn<K,U,V> joinFn)Perform a default join on the given  PTableinstances using a user-specifiedJoinFn. | 
| PTable<K,Pair<U,V>> | ShardedJoinStrategy. join(PTable<K,U> left,
    PTable<K,V> right,
    JoinType joinType) | 
| PTable<K,Pair<U,V>> | MapsideJoinStrategy. join(PTable<K,U> left,
    PTable<K,V> right,
    JoinType joinType) | 
| PTable<K,Pair<U,V>> | JoinStrategy. join(PTable<K,U> left,
    PTable<K,V> right,
    JoinType joinType)Join two tables with the given join type. | 
| PTable<K,Pair<U,V>> | DefaultJoinStrategy. join(PTable<K,U> left,
    PTable<K,V> right,
    JoinType joinType) | 
| PTable<K,Pair<U,V>> | BloomFilterJoinStrategy. join(PTable<K,U> left,
    PTable<K,V> right,
    JoinType joinType) | 
| Modifier and Type | Method and Description | 
|---|---|
| void | JoinFn. process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
       Emitter<Pair<K,Pair<U,V>>> emitter)Split up the input record to make coding a bit more manageable. | 
| Modifier and Type | Method and Description | 
|---|---|
| void | LeftOuterJoinFn. cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)Called during the cleanup of the MapReduce job this  DoFnis
 associated with. | 
| void | LeftOuterJoinFn. cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)Called during the cleanup of the MapReduce job this  DoFnis
 associated with. | 
| void | FullOuterJoinFn. cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)Called during the cleanup of the MapReduce job this  DoFnis
 associated with. | 
| void | FullOuterJoinFn. cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)Called during the cleanup of the MapReduce job this  DoFnis
 associated with. | 
| void | RightOuterJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| void | RightOuterJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| void | RightOuterJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| void | LeftOuterJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| void | LeftOuterJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| void | LeftOuterJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| abstract void | JoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| abstract void | JoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| abstract void | JoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| void | InnerJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter) | 
| void | InnerJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter) | 
| void | InnerJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter) | 
| void | FullOuterJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| void | FullOuterJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| void | FullOuterJoinFn. join(K key,
    int id,
    Iterable<Pair<U,V>> pairs,
    Emitter<Pair<K,Pair<U,V>>> emitter)Performs the actual joining. | 
| static <K,U,V,T> PCollection<T> | OneToManyJoin. oneToManyJoin(PTable<K,U> left,
             PTable<K,V> right,
             DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
             PType<T> ptype)Performs a join on two tables, where the left table only contains a single
 value per key. | 
| static <K,U,V,T> PCollection<T> | OneToManyJoin. oneToManyJoin(PTable<K,U> left,
             PTable<K,V> right,
             DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
             PType<T> ptype,
             int numReducers)Supports a user-specified number of reducers for the one-to-many join. | 
| void | JoinFn. process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
       Emitter<Pair<K,Pair<U,V>>> emitter)Split up the input record to make coding a bit more manageable. | 
| void | JoinFn. process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
       Emitter<Pair<K,Pair<U,V>>> emitter)Split up the input record to make coding a bit more manageable. | 
| void | JoinFn. process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
       Emitter<Pair<K,Pair<U,V>>> emitter)Split up the input record to make coding a bit more manageable. | 
| void | JoinFn. process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
       Emitter<Pair<K,Pair<U,V>>> emitter)Split up the input record to make coding a bit more manageable. | 
| Modifier and Type | Field and Description | 
|---|---|
| static TupleFactory<Pair> | TupleFactory. PAIR | 
| Modifier and Type | Method and Description | 
|---|---|
| Pair<K,Iterable<V>> | PGroupedTableType.PairIterableMapFn. map(Pair<Object,Iterable<Object>> input) | 
| Modifier and Type | Method and Description | 
|---|---|
| ReadableSourceTarget<Pair<K,Iterable<V>>> | PGroupedTableType. getDefaultFileSource(org.apache.hadoop.fs.Path path) | 
| <V1,V2> PType<Pair<V1,V2>> | PTypeFamily. pairs(PType<V1> p1,
     PType<V2> p2) | 
| Modifier and Type | Method and Description | 
|---|---|
| Pair<K,Iterable<V>> | PGroupedTableType.PairIterableMapFn. map(Pair<Object,Iterable<Object>> input) | 
| Modifier and Type | Method and Description | 
|---|---|
| <V1,V2> PType<Pair<V1,V2>> | AvroTypeFamily. pairs(PType<V1> p1,
     PType<V2> p2) | 
| static <V1,V2> AvroType<Pair<V1,V2>> | Avros. pairs(PType<V1> p1,
     PType<V2> p2) | 
| Modifier and Type | Method and Description | 
|---|---|
| <V1,V2> PType<Pair<V1,V2>> | WritableTypeFamily. pairs(PType<V1> p1,
     PType<V2> p2) | 
| static <V1,V2> WritableType<Pair<V1,V2>,TupleWritable> | Writables. pairs(PType<V1> p1,
     PType<V2> p2) | 
| Modifier and Type | Method and Description | 
|---|---|
| Iterator<Pair<S,T>> | Tuples.PairIterable. iterator() | 
Copyright © 2017 The Apache Software Foundation. All rights reserved.