$substrBytes (aggregation)

Definition

$substrBytes
- New in version 3.4.

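Returns the substring of a string. The substring starts with the character at the specified UTF-8 byte index (zero-based) in the string and continues for the specified number of bytes.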

{ $substrBytes: [ <string expression>, <byte index>, <byte count> ] }

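Field	Type	Description
`string expression`	string	The string from which the substring is extracted. `string expression` can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions. If the argument resolves to null or refers to a missing field, $substrBytes returns an empty string. If the argument does not resolve to a string or null and does not refer to a missing field, $substrBytes returns an error.
`byte index`	number	Indicates the starting point of the substring. `byte index` can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0). `byte index` cannot refer to a starting index located in the middle of a multi-byte UTF-8 character.
`byte count`	number	Can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0). `byte count` cannot result in an ending index that is in the middle of a UTF-8 character.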

Behavior

The $substrBytes operator uses the indexes of UTF-8 encoded bytes, where each code point, or character, may be encoded using between one and four bytes.

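For example, US-ASCII characters are encoded using one byte. Characters with diacritic marks and additional Latin alphabetical characters (i.e. Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese, and Korean characters typically require three bytes, and other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.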
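As a rough check of these byte widths, the following sketch measures the UTF-8 length of one sample character from each group. It runs in plain JavaScript (Node.js or a browser), not inside the aggregation pipeline. `utf8ByteLength` is a hypothetical helper name, and the sample characters are chosen only for illustration.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// One illustrative character per group (assumed samples, not from the manual).
const samples = {
  ascii: "a",          // US-ASCII letter
  diacritic: "\u00e9", // Latin small letter e with acute
  cjk: "\u5bff",       // a CJK ideograph
  emoji: "\u{1F600}"   // an emoji outside the Basic Multilingual Plane
};

for (const [label, ch] of Object.entries(samples)) {
  console.log(label, utf8ByteLength(ch));
}
```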

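It is important to be mindful of the content in the string expression, because providing a byte index or byte count that falls in the middle of a UTF-8 character results in an error.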

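$substrBytes differs from $substrCP in that $substrBytes counts the bytes of each character, whereas $substrCP counts code points, or characters, regardless of how many bytes a character uses.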
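To make the difference between the two ways of counting concrete, the sketch below compares code point counts with byte counts and converts a code point index into the byte offset at which that character starts. It is plain JavaScript meant to run outside the server; the helper names are hypothetical and the code illustrates only the counting, not how MongoDB implements either operator.

```javascript
// Hypothetical helper: number of code points in the string.
function codePointLength(str) {
  return Array.from(str).length; // Array.from iterates by code point
}

// Hypothetical helper: number of UTF-8 bytes in the string.
function utf8Length(str) {
  return new TextEncoder().encode(str).length;
}

// Hypothetical helper: byte offset at which the code point at cpIndex begins.
function byteOffsetOfCodePoint(str, cpIndex) {
  const prefix = Array.from(str).slice(0, cpIndex).join("");
  return utf8Length(prefix);
}

const word = "caf\u00e9t\u00e9ria"; // "cafétéria" with precomposed accents
console.log(codePointLength(word), utf8Length(word));
console.log(byteOffsetOfCodePoint(word, 4)); // offset of the first "t"
```

An offset computed this way always falls on a character boundary, so it would be a safe candidate for `byte index`.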

Example	Results
`{ $substrBytes: [ "abcde", 1, 2 ] }`	`"bc"`
`{ $substrBytes: [ "Hello World!", 6, 5 ] }`	`"World"`
`{ $substrBytes: [ "cafétéria", 0, 5 ] }`	`"café"`
`{ $substrBytes: [ "cafétéria", 5, 4 ] }`	`"tér"`
`{ $substrBytes: [ "cafétéria", 7, 3 ] }`	Error message: `"Error: Invalid range, starting index is a UTF-8 continuation byte."`
`{ $substrBytes: [ "cafétéria", 3, 1 ] }`	Error message: `"Error: Invalid range, ending index is in the middle of a UTF-8 character."`
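The rows in this table can be reproduced outside the server with a small emulation of the boundary rules. The sketch below is plain JavaScript and only approximates the behavior described on this page. `substrBytesSketch` is a hypothetical name, the error texts are shortened, and the function assumes that the index and count are already non-negative integers.

```javascript
// Hypothetical emulation of the rules described above; not MongoDB's implementation.
function substrBytesSketch(str, byteIndex, byteCount) {
  // null or a missing field yields an empty string.
  if (str === null || str === undefined) return "";
  if (typeof str !== "string") throw new TypeError("argument must resolve to a string");

  const bytes = new TextEncoder().encode(str);
  // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
  const isContinuation = (i) => i < bytes.length && (bytes[i] & 0xc0) === 0x80;

  if (isContinuation(byteIndex)) {
    throw new RangeError("starting index is a UTF-8 continuation byte");
  }
  const end = Math.min(byteIndex + byteCount, bytes.length);
  if (isContinuation(end)) {
    throw new RangeError("ending index is in the middle of a UTF-8 character");
  }
  return new TextDecoder().decode(bytes.subarray(byteIndex, end));
}

const cafeteria = "caf\u00e9t\u00e9ria"; // "cafétéria" with precomposed accents
console.log(substrBytesSketch("abcde", 1, 2));
console.log(substrBytesSketch(cafeteria, 0, 5));
console.log(substrBytesSketch(cafeteria, 5, 4));
```

Calling the sketch with `(cafeteria, 7, 3)` or `(cafeteria, 3, 1)` throws, mirroring the two error rows.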

Example

Single-Byte Character Set

Consider an inventory collection with the following documents:

{ "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" }
{ "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" }
{ "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }

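The following operation uses the $substrBytes operator to separate the quarter value (which contains only single-byte US-ASCII characters) into a yearSubstring and a quarterSubstring. The quarterSubstring field represents the rest of the string, starting at the specified byte index that follows the yearSubstring. Its byte count is calculated by subtracting the byte index from the length of the string, which is obtained with $strLenBytes.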

db.inventory.aggregate(
  [
    {
      $project: {
        item: 1,
        yearSubstring: { $substrBytes: [ "$quarter", 0, 2 ] },
        quarterSubstring: {
          $substrBytes: [
            "$quarter", 2, { $subtract: [ { $strLenBytes: "$quarter" }, 2 ] }
          ]
        }
      }
    }
  ]
)

The operation returns the following results:

{ "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubstring" : "Q1" }
{ "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubstring" : "Q4" }
{ "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubstring" : "Q2" }
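The byte count arithmetic used for the second field can be traced by hand with a plain JavaScript stand-in. `splitQuarter` is a hypothetical helper, not a MongoDB API. It mirrors the two `$substrBytes` calls under the assumption that the input contains only single-byte characters, as the quarter values here do.

```javascript
// Hypothetical stand-in for the pipeline's arithmetic; assumes single-byte input.
function splitQuarter(quarter) {
  const bytes = new TextEncoder().encode(quarter);
  const decoder = new TextDecoder();
  const byteIndex = 2;                        // where the quarter part starts
  const byteCount = bytes.length - byteIndex; // string length in bytes minus byte index
  return {
    yearSubstring: decoder.decode(bytes.subarray(0, byteIndex)),
    quarterSubstring: decoder.decode(bytes.subarray(byteIndex, byteIndex + byteCount))
  };
}

for (const quarter of ["13Q1", "13Q4", "14Q2"]) {
  console.log(quarter, splitQuarter(quarter));
}
```

For "13Q1" the string is 4 bytes long, so the byte count is 4 - 2 = 2 and the second substring covers bytes 2 and 3.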

Single-Byte and Multibyte Character Set

A collection named food contains the following documents:

{ "_id" : 1, "name" : "apple" }
{ "_id" : 2, "name" : "banana" }
{ "_id" : 3, "name" : "éclair" }
{ "_id" : 4, "name" : "hamburger" }
{ "_id" : 5, "name" : "jalapeño" }
{ "_id" : 6, "name" : "pizza" }
{ "_id" : 7, "name" : "tacos" }
{ "_id" : 8, "name" : "寿司 sushi" }

The following operation uses the $substrBytes operator to create a three-byte menuCode from the name value:

db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "menuCode": { $substrBytes: [ "$name", 0, 3 ] }
      }
    }
  ]
)
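Because some of these names contain multibyte characters, it can help to confirm that a fixed byte count ends on a character boundary before running such a pipeline. The following plain JavaScript sketch checks the boundary rule described on this page for each name. `isCharBoundary` is a hypothetical helper and the check runs outside the server.

```javascript
// Hypothetical helper: true if byteOffset does not fall inside a multibyte character.
function isCharBoundary(str, byteOffset) {
  const bytes = new TextEncoder().encode(str);
  if (byteOffset >= bytes.length) return true; // at or past the end of the string
  return (bytes[byteOffset] & 0xc0) !== 0x80;  // 10xxxxxx marks a continuation byte
}

// Names from the food collection, with non-ASCII characters written as escapes.
const names = [
  "apple", "banana", "\u00e9clair", "hamburger",
  "jalape\u00f1o", "pizza", "tacos", "\u5bff\u53f8 sushi"
];

// Names for which a three-byte prefix would end in the middle of a character.
const unsafe = names.filter((name) => !isCharBoundary(name, 3));
console.log(unsafe);
```

With a byte count of 2 instead of 3, the last name would fail this check, because its first character occupies three bytes.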