22.2.7 MySQL 分区如何处理 NULL

在 MySQL 中进行分区并不能禁止NULL作为分区 table 达式的值,无论它是列值还是用户提供的 table 达式的值。即使允许使用NULL作为必须以其他方式产生整数的 table 达式的值,也要记住NULL不是数字,这一点很重要。 MySQL 的分区实现将NULL视为小于任何非NULL值,就像ORDER BY一样。

这意味着对NULL的处理在不同类型的分区之间会有所不同,并且如果您不为此做好准备,可能会产生您不希望的行为。在这种情况下,我们将在本节中讨论每种 MySQL 分区类型在确定应存储行的分区时如何处理NULL值,并为每个实例提供示例。

使用 RANGE 分区处理 NULL. 如果将行插入到由RANGE分区的 table 中,使得用于确定分区的列值为NULL,则该行将插入最低的分区。考虑一下名为p的数据库中的这两个 table,该数据库创建如下:

mysql> CREATE TABLE t1 (
    ->     c1 INT,
    ->     c2 VARCHAR(20)
    -> )
    -> PARTITION BY RANGE(c1) (
    ->     PARTITION p0 VALUES LESS THAN (0),
    ->     PARTITION p1 VALUES LESS THAN (10),
    ->     PARTITION p2 VALUES LESS THAN MAXVALUE
    -> );
Query OK, 0 rows affected (0.09 sec)

mysql> CREATE TABLE t2 (
    ->     c1 INT,
    ->     c2 VARCHAR(20)
    -> )
    -> PARTITION BY RANGE(c1) (
    ->     PARTITION p0 VALUES LESS THAN (-5),
    ->     PARTITION p1 VALUES LESS THAN (0),
    ->     PARTITION p2 VALUES LESS THAN (10),
    ->     PARTITION p3 VALUES LESS THAN MAXVALUE
    -> );
Query OK, 0 rows affected (0.09 sec)

您可以对INFORMATION_SCHEMA数据库中的PARTITIONStable 使用以下查询来查看由这两个CREATE TABLE语句创建的分区:

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH
     >   FROM INFORMATION_SCHEMA.PARTITIONS
     >   WHERE TABLE_SCHEMA = 'p' AND TABLE_NAME LIKE 't_';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| t1         | p0             |          0 |              0 |           0 |
| t1         | p1             |          0 |              0 |           0 |
| t1         | p2             |          0 |              0 |           0 |
| t2         | p0             |          0 |              0 |           0 |
| t2         | p1             |          0 |              0 |           0 |
| t2         | p2             |          0 |              0 |           0 |
| t2         | p3             |          0 |              0 |           0 |
+------------+----------------+------------+----------------+-------------+
7 rows in set (0.00 sec)

(有关此 table 的更多信息,请参见第 24.16 节“ INFORMATION_SCHEMA PARTITIONStable”。)现在,让我们在用作分区键的列中包含一行包含NULL的单行填充这些 table,并使用一对SELECT语句验证是否插入了这些行:

mysql> INSERT INTO t1 VALUES (NULL, 'mothra');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t2 VALUES (NULL, 'mothra');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t1;
+------+--------+
| id   | name   |
+------+--------+
| NULL | mothra |
+------+--------+
1 row in set (0.00 sec)

mysql> SELECT * FROM t2;
+------+--------+
| id   | name   |
+------+--------+
| NULL | mothra |
+------+--------+
1 row in set (0.00 sec)

通过针对INFORMATION_SCHEMA.PARTITIONS重新运行上一个查询并检查输出,可以查看用于存储插入行的分区。

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH
     >   FROM INFORMATION_SCHEMA.PARTITIONS
     >   WHERE TABLE_SCHEMA = 'p' AND TABLE_NAME LIKE 't_';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| t1         | p0             |          1 |             20 |          20 |
| t1         | p1             |          0 |              0 |           0 |
| t1         | p2             |          0 |              0 |           0 |
| t2         | p0             |          1 |             20 |          20 |
| t2         | p1             |          0 |              0 |           0 |
| t2         | p2             |          0 |              0 |           0 |
| t2         | p3             |          0 |              0 |           0 |
+------------+----------------+------------+----------------+-------------+
7 rows in set (0.01 sec)

您还可以通过删除以下分区,然后重新运行SELECT语句,来证明这些行存储在每个 table 的最低分区中:

mysql> ALTER TABLE t1 DROP PARTITION p0;
Query OK, 0 rows affected (0.16 sec)

mysql> ALTER TABLE t2 DROP PARTITION p0;
Query OK, 0 rows affected (0.16 sec)

mysql> SELECT * FROM t1;
Empty set (0.00 sec)

mysql> SELECT * FROM t2;
Empty set (0.00 sec)

(有关ALTER TABLE ... DROP PARTITION的更多信息,请参见第 13.1.8 节“ ALTER TABLE 语句”。)

对于使用 SQL 函数的 table 达式进行分区,也以这种方式对待NULL。假设我们使用诸如此类的CREATE TABLE语句定义一个 table:

CREATE TABLE tndate (
    id INT,
    dt DATE
)
PARTITION BY RANGE( YEAR(dt) ) (
    PARTITION p0 VALUES LESS THAN (1990),
    PARTITION p1 VALUES LESS THAN (2000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

与其他 MySQL 函数一样,YEAR(NULL)返回NULLdt列值为NULL的行被视为分区 table 达式评估为小于任何其他值的值,因此将其插入分区p0

使用 LIST 分区处理 NULL. 当且仅当使用包含NULL的值列 table 定义了分区之一时,被LIST分区的 table 才允许NULL值。相反的是,用LIST分区的 table 未在值列 table 中显式使用NULL,该 table 将拒绝为分区 table 达式生成NULL值的行,如以下示例所示:

mysql> CREATE TABLE ts1 (
    ->     c1 INT,
    ->     c2 VARCHAR(20)
    -> )
    -> PARTITION BY LIST(c1) (
    ->     PARTITION p0 VALUES IN (0, 3, 6),
    ->     PARTITION p1 VALUES IN (1, 4, 7),
    ->     PARTITION p2 VALUES IN (2, 5, 8)
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO ts1 VALUES (9, 'mothra');
ERROR 1504 (HY000): Table has no partition for value 9

mysql> INSERT INTO ts1 VALUES (NULL, 'mothra');
ERROR 1504 (HY000): Table has no partition for value NULL

只能将c1值介于08之间的行插入ts1NULL落在该范围之外,就像数字9一样。我们可以创建 tablets2ts3,其值列 table 包含NULL,如下所示:

mysql> CREATE TABLE ts2 (
    ->     c1 INT,
    ->     c2 VARCHAR(20)
    -> )
    -> PARTITION BY LIST(c1) (
    ->     PARTITION p0 VALUES IN (0, 3, 6),
    ->     PARTITION p1 VALUES IN (1, 4, 7),
    ->     PARTITION p2 VALUES IN (2, 5, 8),
    ->     PARTITION p3 VALUES IN (NULL)
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE ts3 (
    ->     c1 INT,
    ->     c2 VARCHAR(20)
    -> )
    -> PARTITION BY LIST(c1) (
    ->     PARTITION p0 VALUES IN (0, 3, 6),
    ->     PARTITION p1 VALUES IN (1, 4, 7, NULL),
    ->     PARTITION p2 VALUES IN (2, 5, 8)
    -> );
Query OK, 0 rows affected (0.01 sec)

在定义用于分区的值列 table 时,您可以(并且应该)将NULL视为其他任何值。例如,VALUES IN (NULL)VALUES IN (1, 4, 7, NULL)均有效,VALUES IN (1, NULL, 4, 7)VALUES IN (NULL, 1, 4, 7)等均有效。您可以在 tablets2ts3的每个 table 中插入具有NULL的列c1的行:

mysql> INSERT INTO ts2 VALUES (NULL, 'mothra');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO ts3 VALUES (NULL, 'mothra');
Query OK, 1 row affected (0.00 sec)

通过对INFORMATION_SCHEMA.PARTITIONS发出适当的查询,您可以确定使用了哪些分区来存储刚插入的行(与前面的示例一样,我们假设分区 table 是在p数据库中创建的):

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH
     >   FROM INFORMATION_SCHEMA.PARTITIONS
     >   WHERE TABLE_SCHEMA = 'p' AND TABLE_NAME LIKE 'ts_';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| ts2        | p0             |          0 |              0 |           0 |
| ts2        | p1             |          0 |              0 |           0 |
| ts2        | p2             |          0 |              0 |           0 |
| ts2        | p3             |          1 |             20 |          20 |
| ts3        | p0             |          0 |              0 |           0 |
| ts3        | p1             |          1 |             20 |          20 |
| ts3        | p2             |          0 |              0 |           0 |
+------------+----------------+------------+----------------+-------------+
7 rows in set (0.01 sec)

如本节前面所示,您还可以通过删除分区并执行SELECT来验证用于存储行的分区。

使用 HASH 和 KEY 分区处理 NULL. 对于HASHKEY分区的 table,NULL的处理方式有所不同。在这些情况下,任何产生NULL值的分区 table 达式都将被视为其返回值为零。我们可以通过检查创建由HASH分区的 table 并用包含适当值的记录填充 table 对文件系统的影响来验证此行为。假设您具有使用以下语句创建的 tableth(也在p数据库中):

mysql> CREATE TABLE th (
    ->     c1 INT,
    ->     c2 VARCHAR(20)
    -> )
    -> PARTITION BY HASH(c1)
    -> PARTITIONS 2;
Query OK, 0 rows affected (0.00 sec)

可以使用此处显示的查询查看属于该 table 的分区:

mysql> SELECT TABLE_NAME,PARTITION_NAME,TABLE_ROWS,AVG_ROW_LENGTH,DATA_LENGTH
     >   FROM INFORMATION_SCHEMA.PARTITIONS
     >   WHERE TABLE_SCHEMA = 'p' AND TABLE_NAME ='th';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| th         | p0             |          0 |              0 |           0 |
| th         | p1             |          0 |              0 |           0 |
+------------+----------------+------------+----------------+-------------+
2 rows in set (0.00 sec)

每个分区的TABLE_ROWS为 0.现在将两行插入到c1列值为NULL和 0 的th中,并验证是否插入了这些行,如下所示:

mysql> INSERT INTO th VALUES (NULL, 'mothra'), (0, 'gigan');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM th;
+------+---------+
| c1   | c2      |
+------+---------+
| NULL | mothra  |
+------+---------+
|    0 | gigan   |
+------+---------+
2 rows in set (0.01 sec)

回想一下,对于任何整数* N *,NULL MOD N的值始终为NULL。对于被HASHKEY分区的 table,将此结果用于确定正确的分区为0。再次检查INFORMATION_SCHEMA.PARTITIONStable,我们可以看到两行都插入了分区p0

mysql> SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH
     >   FROM INFORMATION_SCHEMA.PARTITIONS
     >   WHERE TABLE_SCHEMA = 'p' AND TABLE_NAME ='th';
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| th         | p0             |          2 |             20 |          20 |
| th         | p1             |          0 |              0 |           0 |
+------------+----------------+------------+----------------+-------------+
2 rows in set (0.00 sec)

通过在 table 的定义中使用PARTITION BY KEY代替PARTITION BY HASH重复最后一个示例,您可以验证NULL对于这种类型的分区也被视为 0.