This project has retired. For details please refer to its Attic page.
MapsideJoinStrategy (Apache Crunch 0.10.0 API)

org.apache.crunch.lib.join
Class MapsideJoinStrategy<K,U,V>

java.lang.Object
  extended by org.apache.crunch.lib.join.MapsideJoinStrategy<K,U,V>
All Implemented Interfaces:
Serializable, JoinStrategy<K,U,V>

public class MapsideJoinStrategy<K,U,V>
extends Object
implements JoinStrategy<K,U,V>

Utility for doing map side joins on a common key between two PTables.

A map side join is an optimized join that does not use a reducer; instead, one side of the join is loaded into memory and the join is performed in a mapper. An important implication is that the output of the join is not sorted, unlike the output of a conventional (reducer-based) join.
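The idea can be sketched in plain Java, independent of Crunch. This is a minimal illustration and not Crunch's actual implementation: the class name MapsideJoinSketch, the innerJoin helper, and the string output format are all hypothetical. The left side is indexed in a HashMap, the right side is streamed once, and the output follows the order of the streamed side rather than key order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an in-memory (map-side style) inner join.
// It does not use or reproduce any Crunch code.
final class MapsideJoinSketch {

    // 'left' is the small side and is held in memory;
    // 'right' is read once, the way a mapper reads its input.
    static <K, U, V> List<String> innerJoin(List<Map.Entry<K, U>> left,
                                            List<Map.Entry<K, V>> right) {
        // Build the in-memory index; one key may carry several left values.
        Map<K, List<U>> index = new HashMap<>();
        for (Map.Entry<K, U> e : left) {
            index.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // Stream the right side and emit one row per matching left value.
        List<String> out = new ArrayList<>();
        for (Map.Entry<K, V> e : right) {
            List<U> matches = index.get(e.getKey());
            if (matches == null) {
                continue; // inner join: unmatched right rows are dropped
            }
            for (U u : matches) {
                out.add(e.getKey() + "=(" + u + "," + e.getValue() + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> left =
            List.of(Map.entry("b", "blue"), Map.entry("a", "amber"));
        List<Map.Entry<String, Long>> right =
            List.of(Map.entry("b", 2L), Map.entry("c", 3L),
                    Map.entry("a", 1L), Map.entry("b", 4L));

        // Prints [b=(blue,2), a=(amber,1), b=(blue,4)]
        System.out.println(innerJoin(left, right));
    }
}
```

No shuffle or sort step takes place, so the rows come out in the order the right side was read (b, a, b) rather than grouped or sorted by key. The memory cost is proportional to the size of the indexed side, which is why the smaller table belongs there.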

Instances of this class should be created via the create() or create(boolean) factory methods. The deprecated public constructors remain only for backwards compatibility with older versions of Crunch, in which the right-side table was loaded into memory, and will be removed in a future release.

See Also:
Serialized Form

Constructor Summary
MapsideJoinStrategy()
          Deprecated. Use the create() factory method instead
MapsideJoinStrategy(boolean materialize)
          Deprecated. Use the create(boolean) factory method instead
 
Method Summary
static <K,U,V> MapsideJoinStrategy<K,U,V> create()
          Create a new MapsideJoinStrategy instance that will load its left-side table into memory, and will materialize the contents of the left-side table to disk before running the in-memory join.
static <K,U,V> MapsideJoinStrategy<K,U,V> create(boolean materialize)
          Create a new MapsideJoinStrategy instance that will load its left-side table into memory.
 PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)
          Join two tables with the given join type.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MapsideJoinStrategy

@Deprecated
public MapsideJoinStrategy()
Deprecated. Use the create() factory method instead

Constructs a new instance of the MapsideJoinStrategy, materializing the right-side join table to disk before the join is performed.


MapsideJoinStrategy

@Deprecated
public MapsideJoinStrategy(boolean materialize)
Deprecated. Use the create(boolean) factory method instead

Constructs a new instance of the MapsideJoinStrategy. If the materialize argument is true, then the right-side join PTable will be materialized to disk before the in-memory join is performed. If it is false, then Crunch can optionally read and process the data from the right-side table without having to run a job to materialize the data to disk first.

Parameters:
materialize - Whether or not to materialize the right-side table before the join
Method Detail

create

public static <K,U,V> MapsideJoinStrategy<K,U,V> create()
Create a new MapsideJoinStrategy instance that will load its left-side table into memory, and will materialize the contents of the left-side table to disk before running the in-memory join.

The smaller of the two tables to be joined should be provided as the left-side table of the created join strategy instance.


create

public static <K,U,V> MapsideJoinStrategy<K,U,V> create(boolean materialize)
Create a new MapsideJoinStrategy instance that will load its left-side table into memory.

If the materialize parameter is true, then the left-side PTable will be materialized to disk before the in-memory join is performed. If it is false, then Crunch can optionally read and process the data from the left-side table without having to run a job to materialize the data to disk first.

Parameters:
materialize - Whether or not to materialize the left-side table before the join

join

public PTable<K,Pair<U,V>> join(PTable<K,U> left,
                                PTable<K,V> right,
                                JoinType joinType)
Description copied from interface: JoinStrategy
Join two tables with the given join type.

Specified by:
join in interface JoinStrategy<K,U,V>
Parameters:
left - left table to be joined
right - right table to be joined
joinType - type of join to perform
Returns:
joined tables


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.