Hive Joins

Join Syntax

Hive supports the following syntax for joining tables:

join_table:
    table_reference [INNER] JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition] (as of Hive 0.10)

table_reference:
    table_factor
  | join_table

table_factor:
    tbl_name [alias]
  | table_subquery alias
  | ( table_references )

join_condition:
    ON expression

See Select Syntax for the context of this join syntax.

Version 0.13.0+: Implicit join notation

Implicit join notation is supported starting with Hive 0.13.0 (see HIVE-5558 ). This allows the FROM clause to join a comma-separated list of tables, omitting the JOIN keyword. For example:

SELECT *
FROM table1 t1, table2 t2, table3 t3
WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';

Version 0.13.0+: Unqualified column references

Unqualified column references are supported in join conditions, starting with Hive 0.13.0 (see HIVE-6393 ). Hive attempts to resolve these against the inputs to a Join. If an unqualified column reference resolves to more than one table, Hive will flag it as an ambiguous reference.

For example:

CREATE TABLE a (k1 string, v1 string);
CREATE TABLE b (k2 string, v2 string);

SELECT k1, v1, k2, v2
FROM a JOIN b ON k1 = k2; 

Version 2.2.0+: Complex expressions in ON clause

Complex expressions in ON clause are supported, starting with Hive 2.2.0 (see HIVE-15211 , HIVE-15251 ). Prior to that, Hive did not support join conditions that are not equality conditions.

In particular, syntax for join conditions was restricted as follows:

join_condition:
ON equality_expression ( AND equality_expression )*
equality_expression:
expression = expression

Examples

Some salient points to consider when writing join queries are as follows:

MapJoin Restrictions

Join Optimization

Predicate Pushdown in Outer Joins

See Hive Outer Join Behavior for information about predicate pushdown in outer joins.

Enhancements in Hive Version 0.11

See Join Optimization for information about enhancements to join optimization introduced in Hive version 0.11.0. The use of hints is de-emphasized in the enhanced optimizations (HIVE-3784 and related JIRAs).

首页