$bucketAuto (aggregation)

Definition

$bucketAuto
New in version 3.4.

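Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are determined automatically in an attempt to distribute the documents evenly across the specified number of buckets.

Each bucket is represented as a document in the output. The document for each bucket contains:

- An _id object that specifies the bounds of the bucket.
  - The _id.min field specifies the inclusive lower bound for the bucket.
  - The _id.max field specifies the upper bound for the bucket. This bound is exclusive for all buckets except the final bucket in the series, where it is inclusive.
- A count field that contains the number of documents in the bucket. The count field is included by default when the output document is not specified.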

The $bucketAuto stage has the following form:

{
  $bucketAuto: {
      groupBy: <expression>,
      buckets: <number>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
      },
      granularity: <string>
  }
}

Field	Type	Description
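`groupBy`	expression	An expression to group documents by. To specify a field path, prefix the field name with a dollar sign `$` and enclose it in quotes.
`buckets`	integer	A positive 32-bit integer that specifies the number of buckets into which input documents are grouped.
`output`	document	Optional. A document that specifies the fields to include in the output documents in addition to the `_id` field. To specify a field to include, you must use accumulator expressions: `<outputfield1>: { <accumulator>: <expression1> }, ...`. If you specify `output`, the default `count` field is not included in the output documents. To include it, explicitly specify a `count` expression as part of the `output` document: `output: { <outputfield1>: { <accumulator>: <expression1> }, ..., count: { $sum: 1 } }`.
`granularity`	string	Optional. A string that specifies the preferred number series to use, so that the calculated boundary edges end on preferred round numbers or their powers of 10. Available only if all `groupBy` values are numeric and none of them is `NaN`. The supported values of `granularity` are: `"R5"`, `"R10"`, `"R20"`, `"R40"`, `"R80"`, `"1-2-5"`, `"E6"`, `"E12"`, `"E24"`, `"E48"`, `"E96"`, `"E192"`, `"POWERSOF2"`.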
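
As a small sketch of the `output` rule above, the pipeline below re-adds `count` next to a second accumulator. The `titles` output field, and the use of the `price` and `title` fields from the `artwork` example later on this page, are illustrative choices and not requirements of the stage.

```javascript
// A pipeline definition only; it does not connect to a database by itself.
const pipeline = [
  {
    $bucketAuto: {
      groupBy: "$price",               // field path: "$" prefix, in quotes
      buckets: 4,
      output: {
        count: { $sum: 1 },            // re-adds the otherwise omitted count
        titles: { $push: "$title" }    // illustrative extra output field
      }
    }
  }
];

console.log(JSON.stringify(pipeline, null, 2));
```

In the shell, a pipeline like this would typically be passed as `db.artwork.aggregate(pipeline)`.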

Behavior

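There may be fewer than the specified number of buckets if:

- The number of input documents is less than the specified number of buckets.
- The number of unique values of the groupBy expression is less than the specified number of buckets.
- The granularity has fewer intervals than the number of buckets.
- The granularity is not fine enough to distribute the documents evenly into the specified number of buckets.

If the groupBy expression refers to an array or document, the values are arranged using the same ordering as in $sort before the bucket boundaries are determined.

How evenly documents are distributed across buckets depends on the cardinality, or the number of unique values, of the groupBy field. If the cardinality is not high enough, the $bucketAuto stage may not distribute the results evenly across buckets.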
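
The effect of low cardinality can be illustrated with a simplified model. The function below is a hypothetical sketch, not the server's actual algorithm: it assumes numeric values, no granularity, and that equal values always stay in the same bucket. The name `sketchBucketAuto` is invented for this illustration.

```javascript
// Simplified model of even bucketing (an assumption-laden sketch,
// NOT MongoDB's implementation). Numeric values only, no granularity.
function sketchBucketAuto(values, buckets) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const base = Math.floor(n / buckets);   // minimum target size per bucket
  const remainder = n % buckets;          // first buckets take one extra
  const result = [];
  let i = 0;
  for (let b = 0; b < buckets && i < n; b++) {
    const target = Math.max(1, base + (b < remainder ? 1 : 0));
    const start = i;
    i = Math.min(n, i + target);
    // Assumption: documents with equal values are never split apart.
    while (i < n && sorted[i] === sorted[i - 1]) i++;
    result.push({
      _id: { min: sorted[start], max: sorted[i - 1] },
      count: i - start
    });
  }
  // Upper bound of each bucket is the lower bound of the next one
  // (exclusive); the last bucket keeps its own largest value (inclusive).
  for (let b = 0; b + 1 < result.length; b++) {
    result[b]._id.max = result[b + 1]._id.min;
  }
  return result;
}

// Only two unique values, so four requested buckets collapse to two.
console.log(JSON.stringify(sketchBucketAuto([1, 1, 1, 2, 2], 4)));
// [{"_id":{"min":1,"max":2},"count":3},{"_id":{"min":2,"max":2},"count":2}]
```

With eight distinct values and four buckets, the same sketch yields four buckets of two documents each.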

Granularity

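$bucketAuto accepts an optional granularity parameter, which ensures that the boundaries of all buckets adhere to a specified preferred number series. Using a preferred number series gives more control over where the bucket boundaries are set within the range of values of the groupBy expression. A series can also help set bucket boundaries logarithmically and evenly when the range of the groupBy expression scales exponentially.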

Renard Series

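The Renard number series are sets of numbers derived by taking the 5th, 10th, 20th, 40th, or 80th root of 10, then including the various powers of that root that fall between 1.0 and 10.0 (10.3 in the case of R80).

Set granularity to R5, R10, R20, R40, or R80 to restrict bucket boundaries to values in the series. When the groupBy values fall outside the 1.0 to 10.0 (10.3 for R80) range, the values of the series are multiplied by a power of 10.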

Example

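The R5 series is based on the fifth root of 10, which is about 1.58, and includes the successive powers of this root (rounded) until 10 is reached. The R5 series is derived as follows:

10^(0/5) = 1
10^(1/5) = 1.584 ≈ 1.6
10^(2/5) = 2.511 ≈ 2.5
10^(3/5) = 3.981 ≈ 4.0
10^(4/5) = 6.309 ≈ 6.3
10^(5/5) = 10

The same approach is applied to the other Renard series to offer finer granularity, that is, more intervals between 1.0 and 10.0 (10.3 for R80).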
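
The R5 derivation can be reproduced with a few lines of JavaScript. This is a minimal sketch: the function name `renardSketch` is hypothetical, and rounding every power to one decimal place is an assumption that matches the R5 values listed here but may not match the standardized tables for the finer series.

```javascript
// Powers of the n-th root of 10 from 10^(0/n) to 10^(n/n), rounded to
// `digits` decimals. Verified here only against the R5 values on this page.
function renardSketch(n, digits = 1) {
  const values = [];
  for (let i = 0; i <= n; i++) {
    values.push(Number(Math.pow(10, i / n).toFixed(digits)));
  }
  return values;
}

console.log(renardSketch(5)); // [ 1, 1.6, 2.5, 4, 6.3, 10 ]
```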

E Series

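The E number series are similar to the Renard series in that they subdivide the interval from 1.0 to 10.0 by the 6th, 12th, 24th, 48th, 96th, or 192nd root of 10, with a particular relative error.

Set granularity to E6, E12, E24, E48, E96, or E192 to restrict bucket boundaries to values in the series. When the groupBy values fall outside the 1.0 to 10.0 range, the values of the series are multiplied by a power of 10. To learn more about the E series and their respective relative errors, see preferred number series.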
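
To get a feel for how much finer each E series is, the sketch below computes only the ideal step ratio between neighboring values, the n-th root of 10. It does not reproduce the published E series tables, whose rounded values differ slightly from the ideal ones; `stepRatio` is a hypothetical helper name.

```javascript
// Ideal ratio between consecutive values of a series with n steps per decade.
function stepRatio(n) {
  return Math.pow(10, 1 / n);
}

for (const n of [6, 12, 24, 48, 96, 192]) {
  // A smaller ratio means boundaries that sit closer together.
  console.log(`E${n}: ratio ~ ${stepRatio(n).toFixed(3)}`);
}
```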

1-2-5 Series

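The 1-2-5 series behaves like a three-value Renard series, if such a series existed.

Set granularity to 1-2-5 to restrict bucket boundaries to the various powers of the third root of 10, rounded to one significant digit.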

Example

The following values are part of the 1-2-5 series: 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, and so on.
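
The values above can be reproduced from the definition, powers of the third root of 10 rounded to one significant digit. The following is a minimal sketch; `oneTwoFiveSketch` and its decade parameters are hypothetical names chosen for this illustration.

```javascript
// Powers of the cube root of 10, each rounded to one significant digit,
// covering decades 10^minExp through 10^maxExp.
function oneTwoFiveSketch(minExp, maxExp) {
  const values = [];
  for (let k = minExp * 3; k <= maxExp * 3; k++) {
    const power = Math.pow(10, k / 3);            // e.g. 2.154..., 4.641...
    values.push(Number(power.toPrecision(1)));    // -> 2, 5
  }
  return values;
}

console.log(oneTwoFiveSketch(-1, 3));
// [ 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000 ]
```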

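Powers of Two Series

Set granularity to POWERSOF2 to restrict bucket boundaries to numbers that are a power of two.

Example

The following numbers adhere to the power of two series:

2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
and so on.

A common occurrence is that various computer components, such as memory, adhere to the POWERSOF2 set of preferred numbers:

1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on.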
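
For completeness, the series can be generated directly. The helper name `powersOfTwo` is hypothetical and used only for this sketch.

```javascript
// First `count` powers of two, starting at 2^0.
function powersOfTwo(count) {
  return Array.from({ length: count }, (_, n) => 2 ** n);
}

console.log(powersOfTwo(12));
// [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 ]
```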

Comparing Different Granularities

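The following operation demonstrates how specifying different values for granularity affects how $bucketAuto determines bucket boundaries. The documents in the things collection have _id values numbered from 0 to 99:

{ _id: 0 }
{ _id: 1 }
...
{ _id: 99 }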

Different values for granularity are substituted into the following operation:

db.things.aggregate( [
  {
    $bucketAuto: {
      groupBy: "$_id",
      buckets: 5,
      granularity: <granularity>
    }
  }
] )

The results in the following table show how different values for granularity yield different bucket boundaries:

Granularity	Results	Notes
No granularity	{ "_id": { "min": 0, "max": 20 }, "count": 20 } { "_id": { "min": 20, "max": 40 }, "count": 20 } { "_id": { "min": 40, "max": 60 }, "count": 20 } { "_id": { "min": 60, "max": 80 }, "count": 20 } { "_id": { "min": 80, "max": 99 }, "count": 20 }	
R20	{ "_id": { "min": 0, "max": 20 }, "count": 20 } { "_id": { "min": 20, "max": 40 }, "count": 20 } { "_id": { "min": 40, "max": 63 }, "count": 23 } { "_id": { "min": 63, "max": 90 }, "count": 27 } { "_id": { "min": 90, "max": 100 }, "count": 10 }	
E24	{ "_id": { "min": 0, "max": 20 }, "count": 20 } { "_id": { "min": 20, "max": 43 }, "count": 23 } { "_id": { "min": 43, "max": 68 }, "count": 25 } { "_id": { "min": 68, "max": 91 }, "count": 23 } { "_id": { "min": 91, "max": 100 }, "count": 9 }	
1-2-5	{ "_id": { "min": 0, "max": 20 }, "count": 20 } { "_id": { "min": 20, "max": 50 }, "count": 30 } { "_id": { "min": 50, "max": 100 }, "count": 50 }	The specified number of buckets exceeds the number of intervals in the series.
POWERSOF2	{ "_id": { "min": 0, "max": 32 }, "count": 32 } { "_id": { "min": 32, "max": 64 }, "count": 32 } { "_id": { "min": 64, "max": 128 }, "count": 36 }	The specified number of buckets exceeds the number of intervals in the series.
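
The counts in the table can be cross-checked by applying the boundary rule from the Definition section (lower bound inclusive, upper bound exclusive except for the last bucket). The sketch below assumes the _id values run from 0 through 99, which is the range the min values and counts in the table imply; `countPerBucket` is a hypothetical helper.

```javascript
// Counts how many values fall into each bucket given sorted boundaries.
// Every bucket is [min, max) except the last one, which is [min, max].
function countPerBucket(values, boundaries) {
  const counts = new Array(boundaries.length - 1).fill(0);
  const last = counts.length - 1;
  for (const v of values) {
    for (let b = 0; b <= last; b++) {
      const min = boundaries[b];
      const max = boundaries[b + 1];
      const inBucket = v >= min && (b === last ? v <= max : v < max);
      if (inBucket) {
        counts[b]++;
        break;
      }
    }
  }
  return counts;
}

// Assumption: _id values 0..99, as implied by the table.
const ids = Array.from({ length: 100 }, (_, i) => i);

console.log(countPerBucket(ids, [0, 20, 40, 63, 90, 100])); // R20: [ 20, 20, 23, 27, 10 ]
console.log(countPerBucket(ids, [0, 20, 43, 68, 91, 100])); // E24: [ 20, 23, 25, 23, 9 ]
console.log(countPerBucket(ids, [0, 20, 50, 100]));         // 1-2-5: [ 20, 30, 50 ]
console.log(countPerBucket(ids, [0, 32, 64, 128]));         // POWERSOF2: [ 32, 32, 36 ]
```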

Example

Consider a collection artwork with the following documents:

{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
    "price" : NumberDecimal("199.99"),
    "dimensions" : { "height" : 39, "width" : 21, "units" : "in" } }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
    "price" : NumberDecimal("280.00"),
    "dimensions" : { "height" : 49, "width" : 32, "units" : "in" } }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
    "price" : NumberDecimal("76.04"),
    "dimensions" : { "height" : 25, "width" : 20, "units" : "in" } }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
    "price" : NumberDecimal("167.30"),
    "dimensions" : { "height" : 24, "width" : 36, "units" : "in" } }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
    "price" : NumberDecimal("483.00"),
    "dimensions" : { "height" : 20, "width" : 24, "units" : "in" } }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
    "price" : NumberDecimal("385.00"),
    "dimensions" : { "height" : 30, "width" : 46, "units" : "in" } }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch",
    "price" : NumberDecimal("159.00"),
    "dimensions" : { "height" : 24, "width" : 18, "units" : "in" } }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
    "price" : NumberDecimal("118.42"),
    "dimensions" : { "height" : 24, "width" : 20, "units" : "in" } }

Single Facet Aggregation

In the following operation, input documents are grouped into four buckets according to the values in the price field:

db.artwork.aggregate( [
   {
     $bucketAuto: {
         groupBy: "$price",
         buckets: 4
     }
   }
] )

The operation returns the following documents:

{
  "_id" : {
    "min" : NumberDecimal("76.04"),
    "max" : NumberDecimal("159.00")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : NumberDecimal("159.00"),
    "max" : NumberDecimal("199.99")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : NumberDecimal("199.99"),
    "max" : NumberDecimal("385.00")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : NumberDecimal("385.00"),
    "max" : NumberDecimal("483.00")
  },
  "count" : 2
}

Multi-Faceted Aggregation

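The $bucketAuto stage can be used within the $facet stage to process multiple aggregation pipelines on the same set of input documents from artwork.

The following aggregation pipeline groups the documents from the artwork collection into buckets based on price, year, and the calculated area: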

db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucketAuto: {
            groupBy: "$price",
            buckets: 4
          }
        }
      ],
      "year": [
        {
          $bucketAuto: {
            groupBy: "$year",
            buckets: 3,
            output: {
              "count": { $sum: 1 },
              "years": { $push: "$year" }
            }
          }
        }
      ],
      "area": [
        {
          $bucketAuto: {
            groupBy: {
              $multiply: [ "$dimensions.height", "$dimensions.width" ]
            },
            buckets: 4,
            output: {
              "count": { $sum: 1 },
              "titles": { $push: "$title" }
            }
          }
        }
      ]
    }
  }
] )

The operation returns the following document:

{
  "area" : [
    {
      "_id" : { "min" : 432, "max" : 500 },
      "count" : 3,
      "titles" : [
        "The Scream",
        "The Persistence of Memory",
        "Blue Flower"
      ]
    },
    {
      "_id" : { "min" : 500, "max" : 864 },
      "count" : 2,
      "titles" : [
        "Dancer",
        "The Pillars of Society"
      ]
    },
    {
      "_id" : { "min" : 864, "max" : 1568 },
      "count" : 2,
      "titles" : [
        "The Great Wave off Kanagawa",
        "Composition VII"
      ]
    },
    {
      "_id" : { "min" : 1568, "max" : 1568 },
      "count" : 1,
      "titles" : [
        "Melancholy III"
      ]
    }
  ],
  "price" : [
    {
      "_id" : { "min" : NumberDecimal("76.04"), "max" : NumberDecimal("159.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("159.00"), "max" : NumberDecimal("199.99") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("199.99"), "max" : NumberDecimal("385.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("385.00"), "max" : NumberDecimal("483.00") },
      "count" : 2
    }
  ],
  "year" : [
    { "_id" : { "min" : null, "max" : 1913 }, "count" : 3, "years" : [ 1902 ] },
    { "_id" : { "min" : 1913, "max" : 1926 }, "count" : 3, "years" : [ 1913, 1918, 1925 ] },
    { "_id" : { "min" : 1926, "max" : 1931 }, "count" : 2, "years" : [ 1926, 1931 ] }
  ]
}
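
To see where the area boundaries come from, the computed groupBy values can be reproduced outside the database. The sketch below copies the heights and widths from the artwork documents above; it only evaluates the `$multiply` expression by hand and does not emulate the stage itself. Two works share the area 480, which appears to be why the first area bucket holds three documents instead of two.

```javascript
// Heights and widths copied from the artwork documents on this page.
const dimensions = [
  { title: "The Pillars of Society", height: 39, width: 21 },
  { title: "Melancholy III", height: 49, width: 32 },
  { title: "Dancer", height: 25, width: 20 },
  { title: "The Great Wave off Kanagawa", height: 24, width: 36 },
  { title: "The Persistence of Memory", height: 20, width: 24 },
  { title: "Composition VII", height: 30, width: 46 },
  { title: "The Scream", height: 24, width: 18 },
  { title: "Blue Flower", height: 24, width: 20 }
];

// Same arithmetic as { $multiply: [ "$dimensions.height", "$dimensions.width" ] }.
const areas = dimensions
  .map(d => ({ title: d.title, area: d.height * d.width }))
  .sort((a, b) => a.area - b.area);

console.log(areas.map(a => a.area));
// [ 432, 480, 480, 500, 819, 864, 1380, 1568 ]
```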
