Hive Schema Tool
Metastore Schema Verification
Version
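Introduced in Hive 0.12.0. See HIVE-3764.
Hive now records the schema version in the metastore database and verifies that the metastore schema version is compatible with the Hive binaries that are going to access the metastore. Note that the Hive properties for implicitly creating or altering the existing schema are disabled by default, so Hive will not attempt to change the metastore schema implicitly. When you run a Hive query against an old schema, it fails to access the metastore: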
$ build/dist/bin/hive -e "show tables"
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
The log will contain an error saying that the version information was not found:
...
Caused by: MetaException(message:Version information not found in metastore. )
...
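By default, the configuration property hive.metastore.schema.verification is false, and the metastore implicitly writes the schema version if it does not match. To enable strict schema verification, set this property to true in hive-site.xml.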
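As a minimal sketch of the setting described above, the shell snippet below prints the property element that would go inside the `<configuration>` element of hive-site.xml. The function name `print_schema_verification_property` is hypothetical; only the property name and the value `true` come from the text.

```shell
#!/bin/sh
# Print the <property> element that turns on strict schema verification.
# The output is meant to be pasted inside <configuration> in hive-site.xml.
# (Hypothetical helper name; not part of Hive.)
print_schema_verification_property() {
  cat <<'EOF'
<property>
  <name>hive.metastore.schema.verification</name>
  <value>true</value>
</property>
EOF
}

print_schema_verification_property
```

The snippet only prints the fragment and does not modify any file, so the result can be reviewed before hive-site.xml is edited by hand.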
For general information about the metastore, see Hive Metastore Administration.
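The Hive Schema Tool
The Hive distribution now includes an offline tool for Hive metastore schema manipulation. The tool can initialize the metastore schema for the current Hive version, and it can upgrade the schema from an older version to the current one. It tries to find the current schema version from the metastore if that information is available, which applies to future upgrades such as 0.12.0 to 0.13.0. When upgrading from older releases such as 0.7.0 or 0.10.0, you can specify the schema version of the existing metastore as a command line option to the tool.
The schematool figures out the SQL scripts required to initialize or upgrade the schema and then executes those scripts against the backend database. The metastore database connection information, such as the JDBC URL, JDBC driver, and database credentials, is extracted from the Hive configuration. You can provide alternate database credentials if needed.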
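To illustrate the credential override mentioned above, here is a rough sketch that assembles a schematool command line and prints it instead of running it. The helper `build_schematool_cmd` and the example user and password values are hypothetical; the `-dbType`, `-userName`, and `-passWord` options are the ones listed in the tool's help output on this page.

```shell
#!/bin/sh
# Assemble (but do not execute) a schematool command line.
# Arguments: <dbType> <action> [user] [password]
# When user/password are omitted, schematool would fall back to the
# credentials found in the Hive configuration.
# (Hypothetical helper; values containing spaces are not handled.)
build_schematool_cmd() {
  db_type="$1"
  action="$2"
  user="${3:-}"
  password="${4:-}"
  cmd="schematool -dbType $db_type $action"
  if [ -n "$user" ]; then
    cmd="$cmd -userName $user"
  fi
  if [ -n "$password" ]; then
    cmd="$cmd -passWord $password"
  fi
  printf '%s\n' "$cmd"
}

# Credentials taken from the Hive configuration:
build_schematool_cmd derby -info
# Credentials overridden on the command line (example values only):
build_schematool_cmd mysql -info hiveuser example-password
```

Keep in mind that a password passed as a command line argument can be visible to other users on the same host through the process list.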
The schematool Command
The schematool command invokes the Hive schema tool with these options:
$ schematool -help
usage: schemaTool
-dbType <databaseType> Metastore database type
-driver <driver> Driver name for connection
-dryRun List SQL scripts (no execute)
-help Print this message
-info Show config and schema details
-initSchema Schema initialization
-initSchemaTo <initTo> Schema initialization to a version
-metaDbType <metaDatabaseType> Used only if upgrading the system catalog for hive
-passWord <password> Override config file password
-upgradeSchema Schema upgrade
-upgradeSchemaFrom <upgradeFrom> Schema upgrade from a version
-url <url> Connection url to the database
-userName <user> Override config file user name
-verbose Only print SQL statements
Additional catalog-related options, added in the Hive 3.0.0 release (HIVE-19135), are listed below.
-createCatalog <catalog> Create catalog with given name
-catalogLocation <location> Location of new catalog, required when adding a catalog
-catalogDescription <description> Description of new catalog
-ifNotExists If passed then it is not an error to create an existing catalog
-moveDatabase <database> Move a database between catalogs. All tables under it would still be under it as part of new catalog. Argument is the database name. Requires --fromCatalog and --toCatalog parameters as well
-moveTable <table> Move a table to a different database. Argument is the table name. Requires --fromCatalog, --toCatalog, --fromDatabase, and --toDatabase
-toCatalog <catalog> Catalog a moving database or table is going to. This is required if you are moving a database or table.
-fromCatalog <catalog> Catalog a moving database or table is coming from. This is required if you are moving a database or table.
-toDatabase <database> Database a moving table is going to. This is required if you are moving a table.
-fromDatabase <database> Database a moving table is coming from. This is required if you are moving a table.
The dbType is required and can be one of:
derby|mysql|postgres|oracle|mssql
Version
The dbType "mssql" was added in Hive 0.13.1 with HIVE-6862.
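A wrapper script might check the dbType value before calling the tool. The sketch below assumes only the five values listed above are acceptable; the function name `is_supported_db_type` is hypothetical, and this is not how schematool itself validates the option.

```shell
#!/bin/sh
# Return 0 if the argument is one of the dbType values listed on this page,
# 1 otherwise. (Hypothetical helper; the list is copied from the text above
# and may not match every Hive release.)
is_supported_db_type() {
  case "$1" in
    derby|mysql|postgres|oracle|mssql) return 0 ;;
    *) return 1 ;;
  esac
}

for t in derby mssql sqlite; do
  if is_supported_db_type "$t"; then
    echo "$t: supported"
  else
    echo "$t: not supported"
  fi
done
```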
Usage Examples
- Initialize to the current schema for a new Hive setup:
$ schematool -dbType derby -initSchema
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 0.13.0
Initialization script hive-schema-0.13.0.derby.sql
Initialization script completed
schemaTool completed
- Get schema information:
$ schematool -dbType derby -info
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Hive distribution version: 0.13.0
Metastore schema version: 0.13.0
schemaTool completed
- Attempt to get schema information with an older metastore:
$ schematool -dbType derby -info
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Hive distribution version: 0.13.0
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
*** schemaTool failed ***
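Since the older metastore does not store version information, the tool reports an error when retrieving it.
- Upgrade the schema from release 0.10.0 by specifying the 'from' version: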
$ schematool -dbType derby -upgradeSchemaFrom 0.10.0
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting upgrade metastore schema from version 0.10.0 to 0.13.0
Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
Completed upgrade-0.10.0-to-0.11.0.derby.sql
Upgrade script upgrade-0.11.0-to-0.12.0.derby.sql
Completed upgrade-0.11.0-to-0.12.0.derby.sql
Upgrade script upgrade-0.12.0-to-0.13.0.derby.sql
Completed upgrade-0.12.0-to-0.13.0.derby.sql
schemaTool completed
- An upgrade dry run can be used to list the scripts required for a given upgrade:
$ build/dist/bin/schematool -dbType derby -upgradeSchemaFrom 0.7.0 -dryRun
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting upgrade metastore schema from version 0.7.0 to 0.13.0
Upgrade script upgrade-0.7.0-to-0.8.0.derby.sql
Upgrade script upgrade-0.8.0-to-0.9.0.derby.sql
Upgrade script upgrade-0.9.0-to-0.10.0.derby.sql
Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
Upgrade script upgrade-0.11.0-to-0.12.0.derby.sql
Upgrade script upgrade-0.12.0-to-0.13.0.derby.sql
schemaTool completed
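This is useful if you just want to find out all the scripts required for the schema upgrade.
- Move a database and the tables under it from the default Hive catalog to a custom Spark catalog: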
build/dist/bin/schematool -moveDatabase db1 -fromCatalog hive -toCatalog spark
- Move a table from the Hive catalog to the Spark catalog:
# Create the desired target database in spark catalog if it doesn't already exist.
beeline ... -e "create database if not exists newdb";
schematool -moveDatabase newdb -fromCatalog hive -toCatalog spark
# Now move the table to target db under the spark catalog.
schematool -moveTable table1 -fromCatalog hive -toCatalog spark -fromDatabase db1 -toDatabase newdb
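Returning to the upgrade examples earlier in this section: the output suggests that an upgrade runs one script per release step, named `upgrade-<from>-to-<to>.<dbType>.sql`. The sketch below reproduces that naming pattern for illustration only. The function `list_upgrade_scripts` and the `VERSIONS` variable are hypothetical, the release list is copied from the dry-run output above, and this is not the tool's actual implementation.

```shell
#!/bin/sh
# Ordered release list, taken from the dry-run example on this page.
VERSIONS="0.7.0 0.8.0 0.9.0 0.10.0 0.11.0 0.12.0 0.13.0"

# Print one upgrade script name per release step between <from> and <to>.
# Arguments: <from-version> <to-version> <dbType>
# Prints nothing if <from> is not in the list or does not precede <to>.
list_upgrade_scripts() {
  from="$1"
  to="$2"
  db_type="$3"
  prev=""
  started=0
  for v in $VERSIONS; do
    # Once the starting version has been passed, emit the step prev -> v.
    if [ "$started" -eq 1 ]; then
      echo "upgrade-$prev-to-$v.$db_type.sql"
    fi
    if [ "$v" = "$from" ]; then
      started=1
    fi
    if [ "$v" = "$to" ]; then
      break
    fi
    prev="$v"
  done
}

list_upgrade_scripts 0.10.0 0.13.0 derby
```

For a real upgrade, the `-dryRun` option shown earlier remains the reliable way to see which scripts the tool would run.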