LanguageManual Explain

EXPLAIN Syntax

Hive 提供了一个EXPLAIN命令,该命令显示了查询的执行计划。该语句的语法如下:

EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query

HIVE 0.14.0 通过HIVE-5961支持AUTHORIZATION。 Hive 2.3.0 通过HIVE-11394支持VECTORIZATION。 Hive 3.2.0 通过HIVE-17683支持LOCKS

AST 已从HIVE-13533中的 EXPLAIN EXTENDED 中删除,并在HIVE-15932中作为单独的命令恢复。

EXPLAIN语句中使用EXTENDED会产生有关计划中运算符的更多信息。这通常是物理信息,例如文件名。

Hive 查询将转换为阶段序列(它更是有向非循环图)。这些阶段可以是 map/reduce 阶段,或者甚至可以是执行元存储或文件系统操作(如移动和重命名)的阶段。说明输出分为三个部分:

  • 查询的抽象语法树

  • 计划不同阶段之间的依赖性

  • 每个阶段的描述

这些阶段本身的描述显示了一系列操作符,以及与操作符相关联的元数据。元数据可能包含诸如 FilterOperator 的过滤器表达式或 SelectOperator 的选择表达式或 FileSinkOperator 的输出文件名之类的内容。

Example

例如,请考虑以下EXPLAIN查询:

EXPLAIN
FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;

该语句的输出包含以下部分:

  • 依赖图
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2

这表明阶段 1 是根阶段,阶段 2 完成后执行阶段 2,阶段 2 完成后执行阶段 0.

  • 每个阶段的计划
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        src
            Reduce Output Operator
              key expressions:
                    expr: key
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: rand()
                    type: double
              tag: -1
              value expressions:
                    expr: substr(value, 4)
                    type: string
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(UDFToDouble(VALUE.0))
          keys:
                expr: KEY.0
                type: string
          mode: partial1
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                name: binary_table

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/67494501/106593589.10001
          Reduce Output Operator
            key expressions:
                  expr: 0
                  type: string
            sort order: +
            Map-reduce partition columns:
                  expr: 0
                  type: string
            tag: -1
            value expressions:
                  expr: 1
                  type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE.0)
          keys:
                expr: KEY.0
                type: string
          mode: final
          Select Operator
            expressions:
                  expr: 0
                  type: string
                  expr: 1
                  type: double
            Select Operator
              expressions:
                    expr: UDFToInteger(0)
                    type: int
                    expr: 1
                    type: double
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                    name: dest_g1

  Stage: Stage-0
    Move Operator
      tables:
            replace: true
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                name: dest_g1

在此示例中,有 2 个 Map/缩减阶段(阶段 1 和阶段 2)和 1 个与文件系统相关的阶段(阶段 0)。阶段 0 基本上将结果从临时目录移动到与表 dest_g1 对应的目录。
Sort order表示键表达式中用于排序的列数。每个“ +”代表一列以升序排序,每个“ -”代表一列以降序排序。

map/reduce 阶段本身包含 2 个部分:

  • 从表别名到 Map 运算符树的 Map–此 Map 告诉 Map 器要调用哪个运算符树,以便处理特定表中的行或上一个 Map/归约阶段的结果。在以上示例的 Stage-1 中,src 表中的行由植根于 Reduce Output 运算符的运算符树处理。类似地,在阶段 2 中,阶段 1 的结果行由植根于另一个 Reduce Output 运算符的另一个运算符树处理。这些 Reduce 输出运算符中的每一个都根据元数据中显示的标准将数据分区到 Reducer。

  • 还原运算符树–这是运算符树,用于处理 map/reduce 作业的缩减程序上的所有行。例如,在 Stage-1 中,Reducer Operator Tree 正在执行部分聚合,而 Stage-2 中的 Reducer Operator Tree 从 Stage-1 中计算出的部分聚合中计算最终聚合。

CBO 条款

CBO 子句输出由 Calcite 优化器生成的计划。它可以选择包括有关使用 Calcite 默认成本模型和用于联接重新排序的成本模型的计划成本的信息。从 Hive 版本 4.0.0(HIVE-17503/HIVE-21184)开始。

语法:EXPLAIN [FORMATTED] CBO [COST | JOINCOST]

    • COST *选项打印使用 Calcite 默认成本模型计算的计划和成本。
    • JOINCOST *选项打印使用用于联接重新排序的成本模型计算的计划和成本。

例如,我们可以执行以下语句:

EXPLAIN CBO
WITH customer_total_return AS
(SELECT sr_customer_sk AS ctr_customer_sk,
  sr_store_sk AS ctr_store_sk,
  SUM(SR_FEE) AS ctr_total_return
  FROM store_returns, date_dim
  WHERE sr_returned_date_sk = d_date_sk
    AND d_year =2000
  GROUP BY sr_customer_sk, sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1, store, customer
WHERE ctr1.ctr_total_return > (SELECT AVG(ctr_total_return)*1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  AND s_store_sk = ctr1.ctr_store_sk
  AND s_state = 'NM'
  AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100

该查询将被优化,并且 Hive 产生以下输出:

CBO PLAN:
HiveSortLimit(sort0=[$0], dir0=[ASC], fetch=[100])
  HiveProject(c_customer_id=[$1])
    HiveJoin(condition=[AND(=($3, $7), >($4, $6))], joinType=[inner], algorithm=[none], cost=[not available])
      HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], cost=[not available])
        HiveProject(c_customer_sk=[$0], c_customer_id=[$1])
          HiveFilter(condition=[IS NOT NULL($0)])
            HiveTableScan(table=[[default, customer]], table:alias=[customer])
        HiveJoin(condition=[=($3, $1)], joinType=[inner], algorithm=[none], cost=[not available])
          HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2])
            HiveAggregate(group=[{1, 2}], agg#0=[sum($3)])
              HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available])
                HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14])
                  HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7), IS NOT NULL($3))])
                    HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns])
                HiveProject(d_date_sk=[$0])
                  HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))])
                    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
          HiveProject(s_store_sk=[$0])
            HiveFilter(condition=[AND(=($24, _UTF-16LE'NM'), IS NOT NULL($0))])
              HiveTableScan(table=[[default, store]], table:alias=[store])
      HiveProject(_o__c0=[*(/($1, $2), 1.2)], ctr_store_sk=[$0])
        HiveAggregate(group=[{1}], agg#0=[sum($2)], agg#1=[count($2)])
          HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2])
            HiveAggregate(group=[{1, 2}], agg#0=[sum($3)])
              HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available])
                HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14])
                  HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7))])
                    HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns])
                HiveProject(d_date_sk=[$0])
                  HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))])
                    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])

反过来,我们可以执行以下命令:

EXPLAIN CBO COST
WITH customer_total_return AS
(SELECT sr_customer_sk AS ctr_customer_sk,
  sr_store_sk AS ctr_store_sk,
  SUM(SR_FEE) AS ctr_total_return
  FROM store_returns, date_dim
  WHERE sr_returned_date_sk = d_date_sk
    AND d_year =2000
  GROUP BY sr_customer_sk, sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1, store, customer
WHERE ctr1.ctr_total_return > (SELECT AVG(ctr_total_return)*1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  AND s_store_sk = ctr1.ctr_store_sk
  AND s_state = 'NM'
  AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100

它将产生一个类似的计划,但是每个运算符的成本将被嵌入到运算符 Descriptors 的旁边:

CBO PLAN:
HiveSortLimit(sort0=[$0], dir0=[ASC], fetch=[100]): rowcount = 100.0, cumulative cost = {2.395588892021712E26 rows, 1.197794434438787E26 cpu, 0.0 io}, id = 1683
  HiveProject(c_customer_id=[$1]): rowcount = 1.1977944344387866E26, cumulative cost = {2.395588892021712E26 rows, 1.197794434438787E26 cpu, 0.0 io}, id = 1681
    HiveJoin(condition=[AND(=($3, $7), >($4, $6))], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 1.1977944344387866E26, cumulative cost = {1.1977944575829254E26 rows, 4.160211553874922E10 cpu, 0.0 io}, id = 1679
      HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 2.3144135067474273E18, cumulative cost = {2.3144137967122499E18 rows, 1.921860676139634E10 cpu, 0.0 io}, id = 1663
        HiveProject(c_customer_sk=[$0], c_customer_id=[$1]): rowcount = 7.2E7, cumulative cost = {2.24E8 rows, 3.04000001E8 cpu, 0.0 io}, id = 1640
          HiveFilter(condition=[IS NOT NULL($0)]): rowcount = 7.2E7, cumulative cost = {1.52E8 rows, 1.60000001E8 cpu, 0.0 io}, id = 1638
            HiveTableScan(table=[[default, customer]], table:alias=[customer]): rowcount = 8.0E7, cumulative cost = {8.0E7 rows, 8.0000001E7 cpu, 0.0 io}, id = 1055
        HiveJoin(condition=[=($3, $1)], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 2.1429754692105807E11, cumulative cost = {2.897408225471977E11 rows, 1.891460676039634E10 cpu, 0.0 io}, id = 1661
          HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2]): rowcount = 6.210443022113779E9, cumulative cost = {7.544327346205959E10 rows, 1.891460312135634E10 cpu, 0.0 io}, id = 1685
            HiveAggregate(group=[{1, 2}], agg#0=[sum($3)]): rowcount = 6.210443022113779E9, cumulative cost = {6.92328304399458E10 rows, 2.8327405501500005E8 cpu, 0.0 io}, id = 1654
              HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 6.2104430221137794E10, cumulative cost = {6.2246082040067795E10 rows, 2.8327405501500005E8 cpu, 0.0 io}, id = 1652
                HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14]): rowcount = 4.198394835000001E7, cumulative cost = {1.4155904670000002E8 rows, 2.8311809440000004E8 cpu, 0.0 io}, id = 1645
                  HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7), IS NOT NULL($3))]): rowcount = 4.198394835000001E7, cumulative cost = {9.957509835000001E7 rows, 1.15182301E8 cpu, 0.0 io}, id = 1643
                    HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns]): rowcount = 5.759115E7, cumulative cost = {5.759115E7 rows, 5.7591151E7 cpu, 0.0 io}, id = 1040
                HiveProject(d_date_sk=[$0]): rowcount = 9861.615, cumulative cost = {92772.23000000001 rows, 155960.615 cpu, 0.0 io}, id = 1650
                  HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))]): rowcount = 9861.615, cumulative cost = {82910.615 rows, 146099.0 cpu, 0.0 io}, id = 1648
                    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim]): rowcount = 73049.0, cumulative cost = {73049.0 rows, 73050.0 cpu, 0.0 io}, id = 1043
          HiveProject(s_store_sk=[$0]): rowcount = 230.04000000000002, cumulative cost = {2164.08 rows, 3639.04 cpu, 0.0 io}, id = 1659
            HiveFilter(condition=[AND(=($24, _UTF-16LE'NM'), IS NOT NULL($0))]): rowcount = 230.04000000000002, cumulative cost = {1934.04 rows, 3409.0 cpu, 0.0 io}, id = 1657
              HiveTableScan(table=[[default, store]], table:alias=[store]): rowcount = 1704.0, cumulative cost = {1704.0 rows, 1705.0 cpu, 0.0 io}, id = 1050
      HiveProject(_o__c0=[*(/($1, $2), 1.2)], ctr_store_sk=[$0]): rowcount = 6.900492246793088E8, cumulative cost = {8.537206083312463E10 rows, 2.2383508777352882E10 cpu, 0.0 io}, id = 1677
        HiveAggregate(group=[{1}], agg#0=[sum($2)], agg#1=[count($2)]): rowcount = 6.900492246793088E8, cumulative cost = {8.468201160844533E10 rows, 2.1003410327994267E10 cpu, 0.0 io}, id = 1675
          HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2]): rowcount = 6.900492246793088E9, cumulative cost = {8.381945007759619E10 rows, 2.1003410327994267E10 cpu, 0.0 io}, id = 1686
            HiveAggregate(group=[{1, 2}], agg#0=[sum($3)]): rowcount = 6.900492246793088E9, cumulative cost = {7.69189578308031E10 rows, 3.01933587615E8 cpu, 0.0 io}, id = 1673
              HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 6.900492246793088E10, cumulative cost = {6.915590405316087E10 rows, 3.01933587615E8 cpu, 0.0 io}, id = 1671
                HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14]): rowcount = 4.66488315E7, cumulative cost = {1.50888813E8 rows, 3.01777627E8 cpu, 0.0 io}, id = 1667
                  HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7))]): rowcount = 4.66488315E7, cumulative cost = {1.042399815E8 rows, 1.15182301E8 cpu, 0.0 io}, id = 1665
                    HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns]): rowcount = 5.759115E7, cumulative cost = {5.759115E7 rows, 5.7591151E7 cpu, 0.0 io}, id = 1040
                HiveProject(d_date_sk=[$0]): rowcount = 9861.615, cumulative cost = {92772.23000000001 rows, 155960.615 cpu, 0.0 io}, id = 1650
                  HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))]): rowcount = 9861.615, cumulative cost = {82910.615 rows, 146099.0 cpu, 0.0 io}, id = 1648
                    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim]): rowcount = 73049.0, cumulative cost = {73049.0 rows, 73050.0 cpu, 0.0 io}, id = 1043

AST 子句

输出查询的抽象语法树。

Example:

EXPLAIN AST
FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;

Outputs:

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key))))

DEPENDENCY 子句

EXPLAIN语句中使用DEPENDENCY会产生有关计划中 Importing 的额外信息。它显示了 Importing 的各种属性。例如,对于如下查询:

EXPLAIN DEPENDENCY
  SELECT key, count(1) FROM srcpart WHERE ds IS NOT NULL GROUP BY key

产生以下输出:

{"input_partitions":[{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}

Importing 既包含表又包含分区。请注意,即使在查询中未访问任何分区,该表仍然存在。

如果通过视图访问表,则依赖项将显示父项。考虑以下查询:

CREATE VIEW V1 AS SELECT key, value from src;
EXPLAIN DEPENDENCY SELECT * FROM V1;

产生以下输出:

{"input_partitions":[],"input_tables":[{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1]"}]}

如上所述,Importing 包含视图 V1 和视图 V1 引用的表'src'。

如果通过多个父级访问一个表,则显示所有输出。

CREATE VIEW V2 AS SELECT ds, key, value FROM srcpart WHERE ds IS NOT NULL;
CREATE VIEW V4 AS
  SELECT src1.key, src2.value as value1, src3.value as value2
  FROM V1 src1 JOIN V2 src2 on src1.key = src2.key JOIN src src3 ON src2.key = src3.key;
EXPLAIN DEPENDENCY SELECT * FROM V4;

产生以下输出。

{"input_partitions":[{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}

可以看出,src 是通过父 v1 和 v4 访问的。

授权条款

EXPLAIN语句中使用AUTHORIZATION会显示需要授权才能执行查询和授权失败的所有实体(如果存在)。例如,对于如下查询:

EXPLAIN AUTHORIZATION
  SELECT * FROM src JOIN srcpart;

产生以下输出:

INPUTS: 
  default@srcpart
  default@src
  default@srcpart@ds=2008-04-08/hr=11
  default@srcpart@ds=2008-04-08/hr=12
  default@srcpart@ds=2008-04-09/hr=11
  default@srcpart@ds=2008-04-09/hr=12
OUTPUTS: 
  hdfs://localhost:9000/tmp/.../-mr-10000
CURRENT_USER: 
  navis
OPERATION: 
  QUERY
AUTHORIZATION_FAILURES: 
  Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]

使用FORMATTED关键字,它将以 JSON 格式返回。


"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}

LOCKS 子句

这对于了解系统将获得哪些锁以运行指定的查询很有用。从 Hive 版本 3.2.0(HIVE-17683)开始。

For example

EXPLAIN LOCKS UPDATE target SET b = 1 WHERE p IN (SELECT t.q1 FROM source t WHERE t.a1=5)

将产生这样的输出。

LOCK INFORMATION:
default.source -> SHARED_READ
default.target.p=1/q=2 -> SHARED_READ
default.target.p=1/q=3 -> SHARED_READ
default.target.p=2/q=2 -> SHARED_READ
default.target.p=2/q=2 -> SHARED_WRITE
default.target.p=1/q=3 -> SHARED_WRITE
default.target.p=1/q=2 -> SHARED_WRITE

EXPLAIN FORMATTED LOCKS <sql>

还支持将生成 JSON 编码的输出。

VECTORIZATION 条款

将详细信息添加到 EXPLAIN 输出中,以显示为什么未对 Map and Reduce 工作进行矢量化处理。从 Hive 版本 2.3.0(HIVE-11394)开始。

语法:[仅用于说明] [摘要|运算符|表达|详细信息]

    • ONLY *选项禁止显示大多数非矢量化元素。
    • SUMMARY *(默认)显示 PLAN 的矢量化信息(已启用矢量化)以及 Map and Reduce 工作的摘要。
    • OPERATOR *显示用于运算符的向量化信息。例如。过滤器矢量化。包括摘要的所有信息。
    • EXPRESSION *显示表达式的向量化信息。例如。谓词表达。包括摘要和操作符的所有信息。
    • DETAIL *显示详细级别的向量化信息。它包含摘要,运算符和表达式的所有信息。

可选子句的默认值不是 ONLY 和 SUMMARY。

有关更多详细信息和示例,请参见HIVE-11394

ANALYZE 子句

用实际的行数 Comments 计划。从 Hive 2.2.0 开始(HIVE-14362)

格式为:(估算的行数)/(实际的行数)

Example:

对于下表扫描;估计是 500 行;但实际上扫描仅产生 13 行。

[...]
              TableScan [TS_13] (rows=500/13 width=178)
                Output:["key","value"]
[...]

用户级别的解释输出

自 Hive 1.1.0 中的HIVE-8600以来,我们支持在 set hive.log.explain.output == true(默认值为 false )后,在 log4j INFO 级别对任何查询提供用户级解释扩展输出。

自 Hive 3.1.0 中的HIVE-18469以来,对任何查询的用户级解释扩展输出将在 set hive.server2.webui.explain.output = true (默认值为 false )后显示在 WebUI/Drilldown/Query Plan 中。

从 Hive 1.2.0 中的HIVE-9780开始,我们在 Tez 用户上支持 Hive 的用户级解释。在 set hive.explain.user = true (默认为 false )之后,如果发送了以下查询,则用户可以看到更加清晰易读的操作树。

自 Hive 3.0.0 中的HIVE-11133以来,我们支持 Hive on Spark 用户的用户级解释。 Hive-on-Spark 使用单独的配置 hive.spark.explain.user ,默认情况下设置为 false。

EXPLAIN select sum(hash(key)), sum(hash(value)) from src_orc_merge_test_part where ds='2012-01-03' and ts='2012-01-03+14:46:31'
Plan optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
   Fetch Operator
      limit:-1
      Stage-1
         Reducer 2
         File Output Operator [FS_8]
            compressed:false
            Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
            table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"}
            Group By Operator [GBY_6]
            |  aggregations:["sum(VALUE._col0)","sum(VALUE._col1)"]
            |  outputColumnNames:["_col0","_col1"]
            |  Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
            |<-Map 1 [SIMPLE_EDGE]
               Reduce Output Operator [RS_5]
                  sort order:
                  Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                  value expressions:_col0 (type: bigint), _col1 (type: bigint)
                  Group By Operator [GBY_4]
                     aggregations:["sum(_col0)","sum(_col1)"]
                     outputColumnNames:["_col0","_col1"]
                     Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                     Select Operator [SEL_2]
                        outputColumnNames:["_col0","_col1"]
                        Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE
                        TableScan [TS_0]
                           alias:src_orc_merge_test_part
                           Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE