LanguageManual Explain
EXPLAIN Syntax
Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:
EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
AUTHORIZATION is supported from Hive 0.14.0 via HIVE-5961. VECTORIZATION is supported from Hive 2.3.0 via HIVE-11394. LOCKS is supported from Hive 3.2.0 via HIVE-17683.
AST was removed from EXPLAIN EXTENDED in HIVE-13533 and restored as a separate command in HIVE-15932.
The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information, like file names.
A Hive query gets converted into a sequence of stages (it is more accurately a directed acyclic graph). These stages may be map/reduce stages, or they may even be stages that do metastore or file system operations like move and rename. The explain output has three parts:
The Abstract Syntax Tree for the query
Dependencies between the different stages of the plan
The description of each of the stages
The description of the stages itself shows a sequence of operators with the metadata associated with them. The metadata may comprise things like the filter expressions for a FilterOperator, the select expressions for a SelectOperator, or the output file names for a FileSinkOperator.
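Because the stage dependencies form a DAG, any topological order of the stages is a valid execution order. As a hedged illustration (plain Python using the standard library, not Hive code), the dependency section of the example below can be modeled and sorted like this:

```python
# Illustrative sketch: model a STAGE DEPENDENCIES section as a DAG and
# compute a valid execution order with a topological sort. The stage names
# and edges mirror the example EXPLAIN output in this page.
from graphlib import TopologicalSorter

# stage -> set of stages it depends on, as printed by EXPLAIN
deps = {
    "Stage-1": set(),        # "Stage-1 is a root stage"
    "Stage-2": {"Stage-1"},  # "Stage-2 depends on stages: Stage-1"
    "Stage-0": {"Stage-2"},  # "Stage-0 depends on stages: Stage-2"
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # → ['Stage-1', 'Stage-2', 'Stage-0']
```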
Example
As an example, consider the following EXPLAIN query:
EXPLAIN
FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;
The output of this statement contains the following parts:
- The dependency graph
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
This shows that Stage-1 is the root stage, Stage-2 is executed after Stage-1 is done, and Stage-0 is executed after Stage-2 is done.
- The plans of each stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
src
Reduce Output Operator
key expressions:
expr: key
type: string
sort order: +
Map-reduce partition columns:
expr: rand()
type: double
tag: -1
value expressions:
expr: substr(value, 4)
type: string
Reduce Operator Tree:
Group By Operator
aggregations:
expr: sum(UDFToDouble(VALUE.0))
keys:
expr: KEY.0
type: string
mode: partial1
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
name: binary_table
Stage: Stage-2
Map Reduce
Alias -> Map Operator Tree:
/tmp/hive-zshao/67494501/106593589.10001
Reduce Output Operator
key expressions:
expr: 0
type: string
sort order: +
Map-reduce partition columns:
expr: 0
type: string
tag: -1
value expressions:
expr: 1
type: double
Reduce Operator Tree:
Group By Operator
aggregations:
expr: sum(VALUE.0)
keys:
expr: KEY.0
type: string
mode: final
Select Operator
expressions:
expr: 0
type: string
expr: 1
type: double
Select Operator
expressions:
expr: UDFToInteger(0)
type: int
expr: 1
type: double
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
name: dest_g1
Stage: Stage-0
Move Operator
tables:
replace: true
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
name: dest_g1
In this example there are 2 map/reduce stages (Stage-1 and Stage-2) and 1 file system related stage (Stage-0). Stage-0 basically moves the results from a temporary directory to the directory corresponding to the table dest_g1.
Sort order indicates the number of columns in the key expressions that are used for sorting. Each '+' represents one column sorted in ascending order, and each '-' represents one column sorted in descending order.
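The sort order encoding can be decoded mechanically. As a small illustrative helper (plain Python, not part of Hive):

```python
# Illustrative sketch: decode a "sort order" string from EXPLAIN output,
# where each '+' is one key column sorted ascending and each '-' is one
# key column sorted descending.
def decode_sort_order(sort_order: str) -> list:
    mapping = {"+": "ASC", "-": "DESC"}
    return [mapping[c] for c in sort_order]

print(decode_sort_order("+"))    # one ascending key column, as in the example
print(decode_sort_order("++-"))  # two ascending columns, one descending
```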
A map/reduce stage itself has two parts:
A mapping from table alias to Map Operator Tree – this mapping tells the mappers which operator tree to call in order to process the rows from a particular table or the results of a previous map/reduce stage. In Stage-1 of the example above, the rows from the src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
A Reduce Operator Tree – this is the operator tree that processes all the rows on the reducers of the map/reduce job. In Stage-1, for example, the Reduce Operator Tree performs a partial aggregation, whereas the Reduce Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.
The CBO Clause
The CBO clause outputs the plan generated by the Calcite optimizer. It can optionally include information about the cost of the plan using Calcite's default cost model and the cost model used for join reordering. Since Hive release 4.0.0 (HIVE-17503 / HIVE-21184).
Syntax: EXPLAIN [FORMATTED] CBO [COST|JOINCOST]
- The COST option prints the plan and the cost calculated using Calcite's default cost model.
- The JOINCOST option prints the plan and the cost calculated using the cost model used for join reordering.
For instance, we can execute the following statement:
EXPLAIN CBO
WITH customer_total_return AS
(SELECT sr_customer_sk AS ctr_customer_sk,
sr_store_sk AS ctr_store_sk,
SUM(SR_FEE) AS ctr_total_return
FROM store_returns, date_dim
WHERE sr_returned_date_sk = d_date_sk
AND d_year =2000
GROUP BY sr_customer_sk, sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1, store, customer
WHERE ctr1.ctr_total_return > (SELECT AVG(ctr_total_return)*1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
AND s_store_sk = ctr1.ctr_store_sk
AND s_state = 'NM'
AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100
The query will be optimized and Hive will produce the following output:
CBO PLAN:
HiveSortLimit(sort0=[$0], dir0=[ASC], fetch=[100])
HiveProject(c_customer_id=[$1])
HiveJoin(condition=[AND(=($3, $7), >($4, $6))], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], cost=[not available])
HiveProject(c_customer_sk=[$0], c_customer_id=[$1])
HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[default, customer]], table:alias=[customer])
HiveJoin(condition=[=($3, $1)], joinType=[inner], algorithm=[none], cost=[not available])
HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2])
HiveAggregate(group=[{1, 2}], agg#0=[sum($3)])
HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available])
HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14])
HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7), IS NOT NULL($3))])
HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns])
HiveProject(d_date_sk=[$0])
HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))])
HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
HiveProject(s_store_sk=[$0])
HiveFilter(condition=[AND(=($24, _UTF-16LE'NM'), IS NOT NULL($0))])
HiveTableScan(table=[[default, store]], table:alias=[store])
HiveProject(_o__c0=[*(/($1, $2), 1.2)], ctr_store_sk=[$0])
HiveAggregate(group=[{1}], agg#0=[sum($2)], agg#1=[count($2)])
HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2])
HiveAggregate(group=[{1, 2}], agg#0=[sum($3)])
HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available])
HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14])
HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7))])
HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns])
HiveProject(d_date_sk=[$0])
HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))])
HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
In turn, we can execute the following command:
EXPLAIN CBO COST
WITH customer_total_return AS
(SELECT sr_customer_sk AS ctr_customer_sk,
sr_store_sk AS ctr_store_sk,
SUM(SR_FEE) AS ctr_total_return
FROM store_returns, date_dim
WHERE sr_returned_date_sk = d_date_sk
AND d_year =2000
GROUP BY sr_customer_sk, sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1, store, customer
WHERE ctr1.ctr_total_return > (SELECT AVG(ctr_total_return)*1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
AND s_store_sk = ctr1.ctr_store_sk
AND s_state = 'NM'
AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100
It will produce a similar plan, but the cost of each operator will be embedded next to the operator descriptors:
CBO PLAN:
HiveSortLimit(sort0=[$0], dir0=[ASC], fetch=[100]): rowcount = 100.0, cumulative cost = {2.395588892021712E26 rows, 1.197794434438787E26 cpu, 0.0 io}, id = 1683
HiveProject(c_customer_id=[$1]): rowcount = 1.1977944344387866E26, cumulative cost = {2.395588892021712E26 rows, 1.197794434438787E26 cpu, 0.0 io}, id = 1681
HiveJoin(condition=[AND(=($3, $7), >($4, $6))], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 1.1977944344387866E26, cumulative cost = {1.1977944575829254E26 rows, 4.160211553874922E10 cpu, 0.0 io}, id = 1679
HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 2.3144135067474273E18, cumulative cost = {2.3144137967122499E18 rows, 1.921860676139634E10 cpu, 0.0 io}, id = 1663
HiveProject(c_customer_sk=[$0], c_customer_id=[$1]): rowcount = 7.2E7, cumulative cost = {2.24E8 rows, 3.04000001E8 cpu, 0.0 io}, id = 1640
HiveFilter(condition=[IS NOT NULL($0)]): rowcount = 7.2E7, cumulative cost = {1.52E8 rows, 1.60000001E8 cpu, 0.0 io}, id = 1638
HiveTableScan(table=[[default, customer]], table:alias=[customer]): rowcount = 8.0E7, cumulative cost = {8.0E7 rows, 8.0000001E7 cpu, 0.0 io}, id = 1055
HiveJoin(condition=[=($3, $1)], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 2.1429754692105807E11, cumulative cost = {2.897408225471977E11 rows, 1.891460676039634E10 cpu, 0.0 io}, id = 1661
HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2]): rowcount = 6.210443022113779E9, cumulative cost = {7.544327346205959E10 rows, 1.891460312135634E10 cpu, 0.0 io}, id = 1685
HiveAggregate(group=[{1, 2}], agg#0=[sum($3)]): rowcount = 6.210443022113779E9, cumulative cost = {6.92328304399458E10 rows, 2.8327405501500005E8 cpu, 0.0 io}, id = 1654
HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 6.2104430221137794E10, cumulative cost = {6.2246082040067795E10 rows, 2.8327405501500005E8 cpu, 0.0 io}, id = 1652
HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14]): rowcount = 4.198394835000001E7, cumulative cost = {1.4155904670000002E8 rows, 2.8311809440000004E8 cpu, 0.0 io}, id = 1645
HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7), IS NOT NULL($3))]): rowcount = 4.198394835000001E7, cumulative cost = {9.957509835000001E7 rows, 1.15182301E8 cpu, 0.0 io}, id = 1643
HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns]): rowcount = 5.759115E7, cumulative cost = {5.759115E7 rows, 5.7591151E7 cpu, 0.0 io}, id = 1040
HiveProject(d_date_sk=[$0]): rowcount = 9861.615, cumulative cost = {92772.23000000001 rows, 155960.615 cpu, 0.0 io}, id = 1650
HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))]): rowcount = 9861.615, cumulative cost = {82910.615 rows, 146099.0 cpu, 0.0 io}, id = 1648
HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim]): rowcount = 73049.0, cumulative cost = {73049.0 rows, 73050.0 cpu, 0.0 io}, id = 1043
HiveProject(s_store_sk=[$0]): rowcount = 230.04000000000002, cumulative cost = {2164.08 rows, 3639.04 cpu, 0.0 io}, id = 1659
HiveFilter(condition=[AND(=($24, _UTF-16LE'NM'), IS NOT NULL($0))]): rowcount = 230.04000000000002, cumulative cost = {1934.04 rows, 3409.0 cpu, 0.0 io}, id = 1657
HiveTableScan(table=[[default, store]], table:alias=[store]): rowcount = 1704.0, cumulative cost = {1704.0 rows, 1705.0 cpu, 0.0 io}, id = 1050
HiveProject(_o__c0=[*(/($1, $2), 1.2)], ctr_store_sk=[$0]): rowcount = 6.900492246793088E8, cumulative cost = {8.537206083312463E10 rows, 2.2383508777352882E10 cpu, 0.0 io}, id = 1677
HiveAggregate(group=[{1}], agg#0=[sum($2)], agg#1=[count($2)]): rowcount = 6.900492246793088E8, cumulative cost = {8.468201160844533E10 rows, 2.1003410327994267E10 cpu, 0.0 io}, id = 1675
HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2]): rowcount = 6.900492246793088E9, cumulative cost = {8.381945007759619E10 rows, 2.1003410327994267E10 cpu, 0.0 io}, id = 1686
HiveAggregate(group=[{1, 2}], agg#0=[sum($3)]): rowcount = 6.900492246793088E9, cumulative cost = {7.69189578308031E10 rows, 3.01933587615E8 cpu, 0.0 io}, id = 1673
HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available]): rowcount = 6.900492246793088E10, cumulative cost = {6.915590405316087E10 rows, 3.01933587615E8 cpu, 0.0 io}, id = 1671
HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14]): rowcount = 4.66488315E7, cumulative cost = {1.50888813E8 rows, 3.01777627E8 cpu, 0.0 io}, id = 1667
HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7))]): rowcount = 4.66488315E7, cumulative cost = {1.042399815E8 rows, 1.15182301E8 cpu, 0.0 io}, id = 1665
HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns]): rowcount = 5.759115E7, cumulative cost = {5.759115E7 rows, 5.7591151E7 cpu, 0.0 io}, id = 1040
HiveProject(d_date_sk=[$0]): rowcount = 9861.615, cumulative cost = {92772.23000000001 rows, 155960.615 cpu, 0.0 io}, id = 1650
HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))]): rowcount = 9861.615, cumulative cost = {82910.615 rows, 146099.0 cpu, 0.0 io}, id = 1648
HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim]): rowcount = 73049.0, cumulative cost = {73049.0 rows, 73050.0 cpu, 0.0 io}, id = 1043
The AST Clause
Outputs the query's Abstract Syntax Tree.
Example:
EXPLAIN AST
FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;
Outputs:
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key))))
The DEPENDENCY Clause
The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, a query like:
EXPLAIN DEPENDENCY
SELECT key, count(1) FROM srcpart WHERE ds IS NOT NULL GROUP BY key
produces the following output:
{"input_partitions":[{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}
The inputs contain both the tables and the partitions. Note that the table is present even if none of its partitions were accessed in the query.
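Since the dependency report is JSON, a client can extract the referenced tables and partitions programmatically. A minimal sketch (plain Python, using an abbreviated payload modeled on the output above):

```python
# Illustrative sketch: parse an EXPLAIN DEPENDENCY JSON payload and list
# the input tables and partitions. Field names follow the sample output.
import json

raw = ('{"input_partitions":[{"partitionName":"default@srcpart@ds=2008-04-08/hr=11"}],'
       '"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}')

dep = json.loads(raw)
tables = [t["tablename"] for t in dep["input_tables"]]
partitions = [p["partitionName"] for p in dep["input_partitions"]]
print(tables)      # → ['default@srcpart']
print(partitions)  # → ['default@srcpart@ds=2008-04-08/hr=11']
```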
If a table is accessed via a view, the dependencies show the parent. Consider the following queries:
CREATE VIEW V1 AS SELECT key, value from src;
EXPLAIN DEPENDENCY SELECT * FROM V1;
They produce the following output:
{"input_partitions":[],"input_tables":[{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1]"}]}
As seen above, the inputs contain the view v1 and the table 'src' that the view v1 refers to.
If a table is accessed via multiple parents, all of them are shown:
CREATE VIEW V2 AS SELECT ds, key, value FROM srcpart WHERE ds IS NOT NULL;
CREATE VIEW V4 AS
SELECT src1.key, src2.value as value1, src3.value as value2
FROM V1 src1 JOIN V2 src2 on src1.key = src2.key JOIN src src3 ON src2.key = src3.key;
EXPLAIN DEPENDENCY SELECT * FROM V4;
This produces the following output:
{"input_partitions":[{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}
As can be seen, src is accessed via the parents v1 and v4.
The AUTHORIZATION Clause
The use of AUTHORIZATION in the EXPLAIN statement shows all entities that need to be authorized to execute the query, and authorization failures if any exist. For example, a query like:
EXPLAIN AUTHORIZATION
SELECT * FROM src JOIN srcpart;
produces the following output:
INPUTS:
default@srcpart
default@src
default@srcpart@ds=2008-04-08/hr=11
default@srcpart@ds=2008-04-08/hr=12
default@srcpart@ds=2008-04-09/hr=11
default@srcpart@ds=2008-04-09/hr=12
OUTPUTS:
hdfs://localhost:9000/tmp/.../-mr-10000
CURRENT_USER:
navis
OPERATION:
QUERY
AUTHORIZATION_FAILURES:
Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]
With the FORMATTED keyword, it is returned in JSON format:
"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}
The LOCKS Clause
This is useful to understand which locks the system will acquire to run the specified query. Since Hive release 3.2.0 (HIVE-17683).
For example, the query
EXPLAIN LOCKS UPDATE target SET b = 1 WHERE p IN (SELECT t.q1 FROM source t WHERE t.a1=5)
will produce output like this:
LOCK INFORMATION:
default.source -> SHARED_READ
default.target.p=1/q=2 -> SHARED_READ
default.target.p=1/q=3 -> SHARED_READ
default.target.p=2/q=2 -> SHARED_READ
default.target.p=2/q=2 -> SHARED_WRITE
default.target.p=1/q=3 -> SHARED_WRITE
default.target.p=1/q=2 -> SHARED_WRITE
EXPLAIN FORMATTED LOCKS <sql> is also supported; it produces JSON-encoded output.
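The plain-text LOCK INFORMATION section has a simple "entity -> LOCK_TYPE" line format, so it can also be parsed directly. A hedged sketch (plain Python, not Hive code; the sample lines are abbreviated from the output above):

```python
# Illustrative sketch: parse LOCK INFORMATION lines into (entity, lock type)
# pairs and pick out the entities that will take a write lock.
lock_output = """default.source -> SHARED_READ
default.target.p=1/q=2 -> SHARED_READ
default.target.p=1/q=2 -> SHARED_WRITE"""

locks = [tuple(line.split(" -> ")) for line in lock_output.splitlines()]
writes = [entity for entity, kind in locks if kind == "SHARED_WRITE"]
print(writes)  # → ['default.target.p=1/q=2']
```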
The VECTORIZATION Clause
Adds detail to the EXPLAIN output showing why Map and Reduce work is not vectorized. Since Hive release 2.3.0 (HIVE-11394).
Syntax: EXPLAIN VECTORIZATION [ONLY] [SUMMARY|OPERATOR|EXPRESSION|DETAIL]
- The ONLY option suppresses most non-vectorization elements.
- SUMMARY (default) shows vectorization information for the plan (whether vectorization is enabled) and a summary of Map and Reduce work.
- OPERATOR shows vectorization information for operators, e.g. Filter vectorization. It includes all information of SUMMARY.
- EXPRESSION shows vectorization information for expressions, e.g. predicate expressions. It includes all information of SUMMARY and OPERATOR.
- DETAIL shows detail-level vectorization information. It includes all information of SUMMARY, OPERATOR, and EXPRESSION.
The defaults for the optional clauses are "not ONLY" and SUMMARY.
For more details and examples, see HIVE-11394.
The ANALYZE Clause
Annotates the plan with actual row counts. Since Hive 2.2.0 (HIVE-14362).
The format is: (estimated row count)/(actual row count)
Example:
For the following table scan, the estimate was 500 rows, but in reality the scan only produced 13 rows:
[...]
TableScan [TS_13] (rows=500/13 width=178)
Output:["key","value"]
[...]
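The estimated/actual pair can be extracted from an operator line to quantify how far off the statistics were. A minimal sketch (plain Python, not part of Hive; the line is taken from the example above):

```python
# Illustrative sketch: pull the (estimated)/(actual) row counts out of an
# EXPLAIN ANALYZE operator line, following the "rows=500/13" format above.
import re

line = "TableScan [TS_13] (rows=500/13 width=178)"
m = re.search(r"rows=(\d+)/(\d+)", line)
estimated, actual = int(m.group(1)), int(m.group(2))

# A ratio far from 1.0 flags a stale or missing statistic on the table.
print(estimated, actual, estimated / actual)
```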
User-level Explain Output
Since HIVE-8600 in Hive 1.1.0, extended user-level explain output is supported for any query at the log4j INFO level after setting hive.log.explain.output=true (default: false).
Since HIVE-18469 in Hive 3.1.0, extended user-level explain output for any query is shown in the WebUI / Drilldown / Query Plan after setting hive.server2.webui.explain.output=true (default: false).
Since HIVE-9780 in Hive 1.2.0, user-level explain is supported for Hive on Tez users. After setting hive.explain.user=true (default: false), a much more clearly readable operator tree is shown when a query such as the following is explained.
Since HIVE-11133 in Hive 3.0.0, user-level explain is also supported for Hive on Spark users. Hive on Spark uses a separate configuration, hive.spark.explain.user, which is false by default.
EXPLAIN select sum(hash(key)), sum(hash(value)) from src_orc_merge_test_part where ds='2012-01-03' and ts='2012-01-03+14:46:31'
Plan optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 2
File Output Operator [FS_8]
compressed:false
Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"}
Group By Operator [GBY_6]
| aggregations:["sum(VALUE._col0)","sum(VALUE._col1)"]
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
|<-Map 1 [SIMPLE_EDGE]
Reduce Output Operator [RS_5]
sort order:
Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
value expressions:_col0 (type: bigint), _col1 (type: bigint)
Group By Operator [GBY_4]
aggregations:["sum(_col0)","sum(_col1)"]
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
Select Operator [SEL_2]
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE
TableScan [TS_0]
alias:src_orc_merge_test_part
Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE