public class MapsideJoinStrategy<K,U,V> extends Object implements JoinStrategy<K,U,V>
Performs map side joins between two PTables.
A map side join is an optimized join which doesn't use a reducer; instead, one side of the join is loaded into memory and the join is performed in a mapper. An important implication of this style of join is that its output is not sorted, unlike the output of a conventional (reducer-based) join.
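The idea can be illustrated without Crunch. The following is a minimal plain-Java sketch, not the Crunch implementation: the class MapsideJoinSketch and its pair and innerJoin helpers are hypothetical names introduced only for this illustration. It indexes the left side in memory and then streams the right side past it, the way a mapper would, so the output follows the arrival order of the right-side records rather than key order.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical stand-in for what a map side join does.
public class MapsideJoinSketch {

  // Hypothetical helper that builds an immutable (key, value) pair.
  public static <A, B> Map.Entry<A, B> pair(A a, B b) {
    return new SimpleImmutableEntry<>(a, b);
  }

  // Inner join with the (small) left side held entirely in memory.
  public static <K, U, V> List<Map.Entry<K, Map.Entry<U, V>>> innerJoin(
      List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right) {
    // Build phase: index every left-side value by its key.
    Map<K, List<U>> leftByKey = new HashMap<>();
    for (Map.Entry<K, U> l : left) {
      leftByKey.computeIfAbsent(l.getKey(), k -> new ArrayList<>()).add(l.getValue());
    }
    // Probe phase: stream the right side once; no shuffle or sort is involved.
    List<Map.Entry<K, Map.Entry<U, V>>> out = new ArrayList<>();
    for (Map.Entry<K, V> r : right) {
      for (U u : leftByKey.getOrDefault(r.getKey(), Collections.emptyList())) {
        out.add(pair(r.getKey(), pair(u, r.getValue())));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, String>> left =
        Arrays.asList(pair(1, "one"), pair(2, "two"));
    List<Map.Entry<Integer, String>> right =
        Arrays.asList(pair(2, "deux"), pair(3, "trois"), pair(1, "un"));
    // Key 2 is emitted before key 1 because the right side delivered it first.
    System.out.println(innerJoin(left, right)); // prints [2=two=deux, 1=one=un]
  }
}
```

Because the whole left side must fit in the in-memory index, the sketch also shows why the smaller table belongs on the side that is loaded into memory.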
Instances of this class should be created via the create() or create(boolean) factory methods. The deprecated public constructors remain available for backwards compatibility with older versions of Crunch, in which the right-side table was loaded into memory; they will be removed in a future release.
Constructor and Description |
---|
MapsideJoinStrategy() Deprecated. Use the create() factory method instead. |
MapsideJoinStrategy(boolean materialize) Deprecated. Use the create(boolean) factory method instead. |
Modifier and Type | Method and Description |
---|---|
static <K,U,V> MapsideJoinStrategy<K,U,V> | create() Create a new MapsideJoinStrategy instance that will load its left-side table into memory, and will materialize the contents of the left-side table to disk before running the in-memory join. |
static <K,U,V> MapsideJoinStrategy<K,U,V> | create(boolean materialize) Create a new MapsideJoinStrategy instance that will load its left-side table into memory. |
PTable<K,Pair<U,V>> | join(PTable<K,U> left, PTable<K,V> right, JoinType joinType) Join two tables with the given join type. |
@Deprecated public MapsideJoinStrategy()
Deprecated. Use the create() factory method instead.
Constructs a new MapsideJoinStrategy, materializing the right-side join table to disk before the join is performed.
@Deprecated public MapsideJoinStrategy(boolean materialize)
Deprecated. Use the create(boolean) factory method instead.
Constructs a new MapsideJoinStrategy. If the materialize argument is true, then the right-side join PTable will be materialized to disk before the in-memory join is performed. If it is false, then Crunch can optionally read and process the data from the right-side table without having to run a job to materialize the data to disk first.
materialize - Whether or not to materialize the right-side table before the join
public static <K,U,V> MapsideJoinStrategy<K,U,V> create()
Create a new MapsideJoinStrategy instance that will load its left-side table into memory, and will materialize the contents of the left-side table to disk before running the in-memory join.
The smaller of the two tables to be joined should be provided as the left-side table of the created join strategy instance.
public static <K,U,V> MapsideJoinStrategy<K,U,V> create(boolean materialize)
Create a new MapsideJoinStrategy instance that will load its left-side table into memory.
If the materialize parameter is true, then the left-side PTable will be materialized to disk before the in-memory join is performed. If it is false, then Crunch can optionally read and process the data from the left-side table without having to run a job to materialize the data to disk first.
materialize - Whether or not to materialize the left-side table before the join
public PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)
Join two tables with the given join type (description from interface JoinStrategy).
Specified by join in interface JoinStrategy<K,U,V>.
left - left table to be joined
right - right table to be joined
joinType - type of join to perform
Copyright © 2016 The Apache Software Foundation. All rights reserved.