Type Qualifiers in Hive

Intro

Hive will need to support some kind of type qualifiers/parameters in its type metadata to be able to enforce type features such as decimal precision/scale or char/varchar length and collation. This involves changes to the PrimitiveTypeEntry/TypeInfo/ObjectInspectors, and possibly changes to the metastore.
My impression is that the actual enforcement of the type qualifiers should be done by the ObjectInspectors, Converters, and cast operations rather than by every expression. For example, it should be fine to compute col * col when col is a decimal(2) column holding 99, even though the result (9801) does not fit in decimal(2). The operation would only fail if you tried to cast the result to decimal(2) or insert it into a decimal(2) column.
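As a rough illustration of enforcing the qualifier only at the cast/insert boundary, here is a minimal Java sketch. The class `DecimalCastSketch` and its method `castToDecimal` are hypothetical names invented for this example and are not Hive APIs. It also assumes that an overflow raises an exception and that rounding to the target scale uses HALF_UP; a real implementation might instead return NULL or round differently.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of Hive: checks precision/scale only at cast time.
final class DecimalCastSketch {
    private DecimalCastSketch() {}

    /**
     * Returns the value adjusted to the target scale if it fits in
     * decimal(precision, scale); throws ArithmeticException otherwise.
     */
    static BigDecimal castToDecimal(BigDecimal value, int precision, int scale) {
        // Assumption: values are rounded (HALF_UP) to the target scale first.
        BigDecimal scaled = value.setScale(scale, RoundingMode.HALF_UP);
        // After rounding, the total number of digits must not exceed the precision.
        if (scaled.precision() > precision) {
            throw new ArithmeticException(
                value + " does not fit in decimal(" + precision + "," + scale + ")");
        }
        return scaled;
    }
}
```

With this sketch, multiplying two decimal(2) values of 99 with plain `BigDecimal.multiply` succeeds and yields 9801; only a subsequent `castToDecimal(product, 2, 0)` is rejected.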

Initial prototype work

There is some initial work on this in a first patch for HIVE-4844. There is a BaseTypeParams object to represent type parameters, with VarcharTypeParams as a varchar-specific subclass containing the string length. The PrimitiveTypeEntry/TypeInfo/ObjectInspectors are augmented to contain this BaseTypeParams object if the column/expression has type parameters. Additional PrimitiveTypeEntry/TypeInfo/ObjectInspectors factory methods that take a BaseTypeParams parameter were also needed.
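To make the shape of this concrete, the following Java sketch shows one way such classes could fit together. Only the names BaseTypeParams and VarcharTypeParams come from the description above; their members, and the classes `SketchPrimitiveTypeInfo` and `SketchTypeInfoFactory`, are assumptions made for illustration and may not match the actual HIVE-4844 patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative shapes only; the real classes in the patch may differ.
abstract class BaseTypeParams {
    /** Throws IllegalArgumentException if the parameters are invalid for the type. */
    abstract void validate();

    /** Renders the qualifier suffix, for example "(50)". */
    abstract String toQualifierString();
}

final class VarcharTypeParams extends BaseTypeParams {
    final int length;

    VarcharTypeParams(int length) { this.length = length; }

    @Override
    void validate() {
        if (length < 1) {
            throw new IllegalArgumentException("varchar length must be positive: " + length);
        }
    }

    @Override
    String toQualifierString() { return "(" + length + ")"; }
}

// Hypothetical stand-in for a TypeInfo that optionally carries type parameters.
final class SketchPrimitiveTypeInfo {
    final String baseTypeName;
    final BaseTypeParams params; // null for unparameterized types

    SketchPrimitiveTypeInfo(String baseTypeName, BaseTypeParams params) {
        this.baseTypeName = baseTypeName;
        this.params = params;
    }

    String getQualifiedName() {
        return params == null ? baseTypeName : baseTypeName + params.toQualifierString();
    }
}

// Hypothetical factory with the extra overload that accepts type parameters.
final class SketchTypeInfoFactory {
    private static final Map<String, SketchPrimitiveTypeInfo> CACHE = new ConcurrentHashMap<>();

    private SketchTypeInfoFactory() {}

    static SketchPrimitiveTypeInfo getPrimitiveTypeInfo(String baseTypeName) {
        return getPrimitiveTypeInfo(baseTypeName, null);
    }

    static SketchPrimitiveTypeInfo getPrimitiveTypeInfo(String baseTypeName, BaseTypeParams params) {
        if (params != null) {
            params.validate();
        }
        // Assumption: instances are cached per qualified name, so varchar(50) is shared.
        String key = params == null ? baseTypeName : baseTypeName + params.toQualifierString();
        return CACHE.computeIfAbsent(key, k -> new SketchPrimitiveTypeInfo(baseTypeName, params));
    }
}
```

Keeping the parameters in a separate object means unparameterized types carry a null reference and existing factory callers are unaffected, while parameterized callers use the new overload.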

Some issues/questions from this:

MetaStore Changes

There are a few different options here:

No metastore changes

The type qualifiers could simply be stored as part of the type string for a column, for example varchar(50) or decimal(10,2). The qualifiers would be validated when creating or altering the column, and they would need to be parsed when creating TypeInfo/ObjectInspectors. This approach has the advantage that no additional metastore changes would be needed. The drawback is that it would be more difficult to query these type attributes if someone is querying the metastore directly, since the type string has to be parsed.
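The parsing step could look roughly like the Java sketch below. The class `QualifiedTypeString` and its regular expression are hypothetical and written only for this example; the sketch handles just primitive type names with integer qualifiers and does not attempt nested types such as array<varchar(10)>.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for type strings such as "varchar(50)" or "decimal(10,2)".
final class QualifiedTypeString {
    // Group 1: base type name. Group 2 (optional): comma-separated integer qualifiers.
    private static final Pattern TYPE_PATTERN = Pattern.compile(
        "\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*(?:\\(\\s*(\\d+(?:\\s*,\\s*\\d+)*)\\s*\\))?\\s*");

    final String baseType;
    final List<Integer> qualifiers; // empty when the type has no qualifiers

    private QualifiedTypeString(String baseType, List<Integer> qualifiers) {
        this.baseType = baseType;
        this.qualifiers = Collections.unmodifiableList(qualifiers);
    }

    /** Parses a type string; throws IllegalArgumentException if it is malformed. */
    static QualifiedTypeString parse(String typeString) {
        Matcher m = TYPE_PATTERN.matcher(typeString);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unparseable type string: " + typeString);
        }
        List<Integer> quals = new ArrayList<>();
        if (m.group(2) != null) {
            for (String part : m.group(2).split(",")) {
                quals.add(Integer.parseInt(part.trim()));
            }
        }
        return new QualifiedTypeString(m.group(1).toLowerCase(Locale.ROOT), quals);
    }
}
```

For instance, `QualifiedTypeString.parse("decimal(10, 2)")` yields the base type `decimal` with qualifiers 10 and 2, while `parse("int")` yields an empty qualifier list. Anyone querying the metastore directly would need equivalent parsing logic, which is the cost noted above.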

Add additional columns to COLUMNS_V2 table in metastore

This approach would be similar to the column attributes that some DBMS catalogs expose in INFORMATION_SCHEMA.COLUMNS, such as those listed below:

<pre>
Field                     Type                 Null  Key  Default  Extra
CHARACTER_MAXIMUM_LENGTH  bigint(21) unsigned  YES        NULL
CHARACTER_OCTET_LENGTH    bigint(21) unsigned  YES        NULL
NUMERIC_PRECISION         bigint(21) unsigned  YES        NULL
NUMERIC_SCALE             bigint(21) unsigned  YES        NULL
CHARACTER_SET_NAME        varchar(32)          YES        NULL
COLLATION_NAME            varchar(32)          YES        NULL
</pre>

We could add new columns to the COLUMNS_V2 table for any type qualifiers we are trying to support (initially this looks like CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, and NUMERIC_SCALE). The advantage is that these parameters are easier to query than with the first approach. The drawback is that types with no parameters would still have these columns, set to null.

New table with type qualifiers in metastore

Rather than changing the COLUMNS_V2 table, we could add a new table to hold the type qualifier information. This would mean no additions to the existing COLUMNS_V2 table, and non-parameterized types would have no rows in the new table. However, it would require an extra query against the new table any time we fetch column metadata from the metastore.
