10.8.5 The binary Collation Compared to _bin Collations
This section describes how the binary
collation for binary strings compares to _bin
collations for nonbinary strings.
Binary strings (as stored using the BINARY
, VARBINARY
, and BLOB
data types) have a character set and collation named binary
. Binary strings are sequences of bytes and the numeric values of those bytes determine comparison and sort order. See Section 10.10.8, “The Binary Character Set”.
Nonbinary strings (as stored using the CHAR
, VARCHAR
, and TEXT
data types) have a character set and collation other than binary
. A given nonbinary character set can have several collations, each of which defines a particular comparison and sort order for the characters in the set. One of these is the binary collation, indicated by a _bin
suffix in the collation name. For example, the binary collation for utf8
and latin1
is named utf8_bin
and latin1_bin
, respectively.
The binary
collation differs from _bin
collations in several respects, discussed in the following sections:
Binary strings are sequences of bytes. For the binary
collation, comparison and sorting are based on numeric byte values. Nonbinary strings are sequences of characters, which might be multibyte. Collations for nonbinary strings define an ordering of the character values for comparison and sorting. For _bin
collations, this ordering is based on numeric character code values, which is similar to ordering for binary strings except that character code values might be multibyte.
A nonbinary string has a character set and is automatically converted to another character set in many cases, even when the string has a _bin
collation:
When assigning column values to another column that has a different character set:
UPDATE t1 SET utf8_bin_column=latin1_column; INSERT INTO t1 (latin1_column) SELECT utf8_bin_column FROM t2;
When assigning column values for
INSERT
orUPDATE
using a string literal:SET NAMES latin1; INSERT INTO t1 (utf8_bin_column) VALUES ('string-in-latin1');
When sending results from the server to a client:
SET NAMES latin1; SELECT utf8_bin_column FROM t2;
For binary string columns, no conversion occurs. For cases similar to those preceding, the string value is copied byte-wise.
Collations for nonbinary character sets provide information about lettercase of characters, so characters in a nonbinary string can be converted from one lettercase to another, even for _bin
collations that ignore lettercase for ordering:
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_bin;
mysql> SELECT LOWER('aA'), UPPER('zZ');
+-------------+-------------+
| LOWER('aA') | UPPER('zZ') |
+-------------+-------------+
| aa | ZZ |
+-------------+-------------+
The concept of lettercase does not apply to bytes in a binary string. To perform lettercase conversion, the string must first be converted to a nonbinary string using a character set appropriate for the data stored in the string:
mysql> SET NAMES binary;
mysql> SELECT LOWER('aA'), LOWER(CONVERT('aA' USING utf8mb4));
+-------------+------------------------------------+
| LOWER('aA') | LOWER(CONVERT('aA' USING utf8mb4)) |
+-------------+------------------------------------+
| aA | aa |
+-------------+------------------------------------+
Nonbinary strings have PAD SPACE
behavior for all collations, including _bin
collations. Trailing spaces are insignificant in comparisons:
mysql> SET NAMES utf8 COLLATE utf8_bin;
mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
| 1 |
+------------+
For binary strings, all bytes are significant in comparisons, including trailing spaces:
mysql> SET NAMES binary;
mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
| 0 |
+------------+
CHAR(
columns store nonbinary strings N
)N
characters long. For inserts, values shorter than N
characters are extended with spaces. For retrievals, trailing spaces are removed.
BINARY(
columns store binary strings N
)N
bytes long. For inserts, values shorter than N
bytes are extended with 0x00
bytes. For retrievals, nothing is removed; a value of the declared length is always returned.
mysql> CREATE TABLE t1 (
a CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin,
b BINARY(10)
);
mysql> INSERT INTO t1 VALUES ('x','x');
mysql> INSERT INTO t1 VALUES ('x ','x ');
mysql> SELECT a, b, HEX(a), HEX(b) FROM t1;
+------+------------+--------+----------------------+
| a | b | HEX(a) | HEX(b) |
+------+------------+--------+----------------------+
| x | x | 78 | 78000000000000000000 |
| x | x | 78 | 78200000000000000000 |
+------+------------+--------+----------------------+