$strLenBytes (aggregation)

Definition

$strLenBytes
New in version 3.4.

Returns the number of UTF-8 encoded bytes in the specified string.

{ $strLenBytes: <string expression> }

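The argument can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions.

If the argument resolves to a value of null or refers to a missing field, $strLenBytes returns an error.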
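One way to avoid this error is to substitute a default string before measuring. The following is a minimal sketch, assuming that an empty string is an acceptable stand-in for null or missing values in your data. The variable name safeLengthStage and the field name are illustrative only; $ifNull is the standard aggregation operator.

```javascript
// Illustrative $project stage: $ifNull replaces a null or missing
// "name" with "" so that $strLenBytes always receives a string.
const safeLengthStage = {
  $project: {
    name: 1,
    length: { $strLenBytes: { $ifNull: ["$name", ""] } }
  }
};

// Assumed usage in the shell (collection name is an example):
// db.food.aggregate([ safeLengthStage ])
```

With this stage, documents that lack the field report a length of 0 instead of aborting the pipeline. Whether that is the right behavior depends on the application.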

Behavior

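The $strLenBytes operator counts the number of UTF-8 encoded bytes in a string, where each character may use between one and four bytes.

For example, US-ASCII characters are encoded using one byte. Characters with diacritics and other Latin alphabetic characters (that is, Latin characters outside the English alphabet) are encoded using two bytes. Chinese, Japanese, and Korean characters typically require three bytes, and characters in other Unicode planes (emoji, mathematical symbols, and so on) require four bytes.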
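These byte widths can be checked outside MongoDB. The following is a minimal sketch in plain JavaScript (Node.js or a modern browser), assuming the standard TextEncoder API is available. The helper name utf8ByteLength is hypothetical and is not part of MongoDB.

```javascript
// Hypothetical helper: counts the UTF-8 encoded bytes in a string,
// which is the same quantity $strLenBytes reports.
function utf8ByteLength(str) {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength("a"));         // 1: US-ASCII letter
console.log(utf8ByteLength("\u00e9"));    // 2: é, Latin letter with a diacritic
console.log(utf8ByteLength("寿"));         // 3: CJK character
console.log(utf8ByteLength("\u{1F600}")); // 4: emoji outside the Basic Multilingual Plane
```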

The $strLenBytes operator differs from the $strLenCP operator, which counts the code points in the specified string regardless of how many bytes each character uses.
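The difference between the two measurements can be illustrated in plain JavaScript. This is a sketch under the assumption that TextEncoder is available and that iterating a string yields one element per code point (standard behavior in modern JavaScript). The helper name measure is hypothetical.

```javascript
// Hypothetical helper: reports both measurements for one string.
function measure(str) {
  return {
    // UTF-8 byte count, the quantity $strLenBytes returns
    bytes: new TextEncoder().encode(str).length,
    // Code point count, the quantity $strLenCP returns;
    // Array.from iterates a string by code point, not by UTF-16 unit.
    codePoints: Array.from(str).length
  };
}

console.log(measure("cafeteria"));           // { bytes: 9, codePoints: 9 }
console.log(measure("caf\u00e9t\u00e9ria")); // { bytes: 11, codePoints: 9 }
console.log(measure("寿司"));                 // { bytes: 6, codePoints: 2 }
```

The two counts agree only when every character in the string is US-ASCII.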

Example	Results	Notes
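`{ $strLenBytes: "abcde" }`	`5`	Each character is encoded using one byte.
`{ $strLenBytes: "Hello World!" }`	`12`	Each character is encoded using one byte.
`{ $strLenBytes: "cafeteria" }`	`9`	Each character is encoded using one byte.
`{ $strLenBytes: "cafétéria" }`	`11`	`é` is encoded using two bytes.
`{ $strLenBytes: "" }`	`0`	Empty strings return 0.
`{ $strLenBytes: "$€λG" }`	`7`	`€` is encoded using three bytes. `λ` is encoded using two bytes.
`{ $strLenBytes: "寿司" }`	`6`	Each character is encoded using three bytes.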

Example

Single-Byte and Multibyte Character Set

A collection named food contains the following documents:

{ "_id" : 1, "name" : "apple" }
{ "_id" : 2, "name" : "banana" }
{ "_id" : 3, "name" : "éclair" }
{ "_id" : 4, "name" : "hamburger" }
{ "_id" : 5, "name" : "jalapeño" }
{ "_id" : 6, "name" : "pizza" }
{ "_id" : 7, "name" : "tacos" }
{ "_id" : 8, "name" : "寿司" }

The following operation uses the $strLenBytes operator to calculate the length of each name value:

db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "length": { $strLenBytes: "$name" }
      }
    }
  ]
)

The operation returns the following results:

{ "_id" : 1, "name" : "apple", "length" : 5 }
{ "_id" : 2, "name" : "banana", "length" : 6 }
{ "_id" : 3, "name" : "éclair", "length" : 7 }
{ "_id" : 4, "name" : "hamburger", "length" : 9 }
{ "_id" : 5, "name" : "jalapeño", "length" : 9 }
{ "_id" : 6, "name" : "pizza", "length" : 5 }
{ "_id" : 7, "name" : "tacos", "length" : 5 }
{ "_id" : 8, "name" : "寿司", "length" : 6 }

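The documents with _id: 3 and _id: 5 each contain a character with a diacritic (é and ñ respectively) that requires two bytes to encode. The document with _id: 8 contains two Japanese characters, each encoded using three bytes. As a result, for the documents with _id: 3, _id: 5, and _id: 8, the length is greater than the number of characters in name.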
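To find such documents directly, the byte count can be compared with the code point count. The following is a sketch rather than part of the original example: the variable name multibytePipeline and the field name hasMultibyte are hypothetical, and it assumes the food collection shown above.

```javascript
// Illustrative pipeline: flags names whose UTF-8 byte count
// differs from their code point count, then keeps only those.
const multibytePipeline = [
  {
    $project: {
      name: 1,
      hasMultibyte: {
        $ne: [ { $strLenBytes: "$name" }, { $strLenCP: "$name" } ]
      }
    }
  },
  { $match: { hasMultibyte: true } }
];

// Assumed usage in the shell:
// db.food.aggregate(multibytePipeline)
```

Against the sample data, this would be expected to keep only the documents with _id: 3, _id: 5, and _id: 8, since those are the names that contain multibyte characters.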
