Hive Schema Tool

Metastore Schema Verification

Version

Introduced in Hive 0.12.0. See HIVE-3764.

Hive now records the schema version in the metastore database and verifies that the metastore schema version is compatible with the Hive binaries that are going to access the metastore. Note that the Hive properties that implicitly create or alter the existing schema are disabled by default, so Hive will not attempt to change the metastore schema implicitly. When you execute a Hive query against a metastore with an old schema, it will fail to access the metastore:

$ build/dist/bin/hive -e "show tables"
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

The log will contain an error about the missing version information:

...
Caused by: MetaException(message:Version information not found in metastore. )
...

By default, the configuration property hive.metastore.schema.verification is false, and the metastore implicitly writes the schema version if it does not match. To enable strict schema verification, you need to set this property to true in hive-site.xml.
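For reference, a minimal hive-site.xml fragment enabling strict verification could look like the following (only the single property element is shown; the rest of the configuration file is omitted):

<property>
  <!-- Fail instead of implicitly recording the schema version when versions do not match -->
  <name>hive.metastore.schema.verification</name>
  <value>true</value>
</property>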

For general information about the metastore, see Hive Metastore Administration.

Hive Schema Tool

Version

Introduced in Hive 0.12.0. See HIVE-5301. (Also see HIVE-5449 for bug fixes.)

The Hive distribution now includes an offline tool for Hive metastore schema manipulation. This tool can be used to initialize the metastore schema for the current Hive version. It can also upgrade the schema from an older version to the current one. It tries to find the current schema from the metastore if it is available; this applies to future upgrades such as 0.12.0 to 0.13.0. For upgrades from older releases such as 0.7.0 or 0.10.0, you can specify the schema version of the existing metastore as a command line option to the tool.

The schematool figures out the SQL scripts required to initialize or upgrade the schema and then executes those scripts against the backend database. The metastore database connection information, such as the JDBC URL, JDBC driver and database credentials, is extracted from the Hive configuration. You can provide alternate database credentials if needed.
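For example, a command along the following lines could initialize a MySQL-backed metastore while overriding the connection settings on the command line using the -url, -driver, -userName and -passWord options described below; the host, database name and credentials here are illustrative placeholders, not values from this document:

$ schematool -dbType mysql -initSchema \
    -url jdbc:mysql://dbhost.example.com/hive_metastore \
    -driver com.mysql.jdbc.Driver \
    -userName hiveuser -passWord hivepassword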

The schematool Command

The schematool command invokes the Hive schema tool with these options:

$ schematool -help
usage: schemaTool
 -dbType <databaseType>             Metastore database type
 -driver <driver>                   Driver name for connection
 -dryRun                            List SQL scripts (no execute)
 -help                              Print this message
 -info                              Show config and schema details
 -initSchema                        Schema initialization
 -initSchemaTo <initTo>             Schema initialization to a version
 -metaDbType <metaDatabaseType>     Used only if upgrading the system catalog for hive
 -passWord <password>               Override config file password
 -upgradeSchema                     Schema upgrade
 -upgradeSchemaFrom <upgradeFrom>   Schema upgrade from a version
 -url <url>                         Connection url to the database
 -userName <user>                   Override config file user name
 -verbose                           Only print SQL statements
(Additional catalog-related options added in the Hive 3.0.0 (HIVE-19135) release are below.)
 -createCatalog <catalog>       Create catalog with given name
 -catalogLocation <location>        Location of new catalog, required when adding a catalog
 -catalogDescription <description>  Description of new catalog
 -ifNotExists                       If passed then it is not an error to create an existing catalog
 -moveDatabase <database>                     Move a database between catalogs.  All tables under it would still be under it as part of new catalog. Argument is the database name. Requires --fromCatalog and --toCatalog parameters as well
 -moveTable  <table>                Move a table to a different database.  Argument is the table name. Requires --fromCatalog, --toCatalog, --fromDatabase, and --toDatabase 
 -toCatalog  <catalog>              Catalog a moving database or table is going to.  This is required if you are moving a database or table.
 -fromCatalog <catalog>             Catalog a moving database or table is coming from.  This is required if you are moving a database or table.
 -toDatabase  <database>            Database a moving table is going to.  This is required if you are moving a table.
 -fromDatabase <database>           Database a moving table is coming from.  This is required if you are moving a table.

dbType is required and can be one of:

derby|mysql|postgres|oracle|mssql

Version

dbType "mssql" was added in Hive 0.13.1 with HIVE-6862.

Usage Examples

  • Initialize to the current schema for a new Hive setup:
$ schematool -dbType derby -initSchema
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting metastore schema initialization to 0.13.0
Initialization script hive-schema-0.13.0.derby.sql
Initialization script completed
schemaTool completed
  • Get schema information:
$ schematool -dbType derby -info
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Hive distribution version:       0.13.0
Metastore schema version:        0.13.0
schemaTool completed
  • Attempt to get schema information from an old metastore:
$ schematool -dbType derby -info
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Hive distribution version:       0.13.0
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
*** schemaTool failed ***

Since the older metastore does not store the version information, the tool reports an error retrieving it.

  • Upgrade the schema from a 0.10.0 release by specifying the 'from' version:
$ schematool -dbType derby -upgradeSchemaFrom 0.10.0
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting upgrade metastore schema from version 0.10.0 to 0.13.0
Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
Completed upgrade-0.10.0-to-0.11.0.derby.sql
Upgrade script upgrade-0.11.0-to-0.12.0.derby.sql
Completed upgrade-0.11.0-to-0.12.0.derby.sql
Upgrade script upgrade-0.12.0-to-0.13.0.derby.sql
Completed upgrade-0.12.0-to-0.13.0.derby.sql
schemaTool completed
  • An upgrade dry run can be used to list the scripts required for the given upgrade:
$ build/dist/bin/schematool -dbType derby -upgradeSchemaFrom 0.7.0 -dryRun
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting upgrade metastore schema from version 0.7.0 to 0.13.0
Upgrade script upgrade-0.7.0-to-0.8.0.derby.sql
Upgrade script upgrade-0.8.0-to-0.9.0.derby.sql
Upgrade script upgrade-0.9.0-to-0.10.0.derby.sql
Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
Upgrade script upgrade-0.11.0-to-0.12.0.derby.sql
Upgrade script upgrade-0.12.0-to-0.13.0.derby.sql
schemaTool completed

This is useful if you just want to find out all the scripts required for the schema upgrade.

  • Move a database and the tables under it from the default Hive catalog to a custom Spark catalog:
build/dist/bin/schematool -moveDatabase db1 -fromCatalog hive -toCatalog spark
  • Move a table from the Hive catalog to the Spark catalog:
# Create the desired target database in spark catalog if it doesn't already exist.
beeline ... -e "create database if not exists newdb";
schematool -moveDatabase newdb -fromCatalog hive -toCatalog spark

# Now move the table to target db under the spark catalog.
schematool -moveTable table1 -fromCatalog hive -toCatalog spark  -fromDatabase db1 -toDatabase newdb
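
The catalog options added in Hive 3.0.0 can also create the target catalog before databases or tables are moved into it. A sketch of such a command is shown below; the catalog name, location and description are illustrative placeholders rather than values from this document:

# Create a "spark" catalog if it does not already exist. The location is a
# placeholder warehouse path and should be replaced with the real one.
schematool -dbType derby -createCatalog spark \
  -catalogLocation hdfs://namenode:8020/user/spark/warehouse \
  -catalogDescription "Catalog for Spark tables" \
  -ifNotExists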