$graphLookup (aggregation)

New in version 3.4.

Definition

  • $graphLookup

    • Performs a recursive search on a collection, with options for restricting the search by recursion depth and query filter.

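The $graphLookup search process is summarized below:

  • Input documents flow into the $graphLookup stage of an aggregation operation.

  • $graphLookup targets the search to the collection designated by the from parameter (see below for the full list of search parameters).

  • For each input document, the search begins with the value designated by startWith.

  • $graphLookup matches the startWith value against the field designated by connectToField in other documents in the from collection.

  • For each matching document, $graphLookup takes the value of connectFromField and checks every document in the from collection for a matching connectToField value. For each match, $graphLookup adds the matching document in the from collection to the array field named by the as parameter.

This step continues recursively until no more matching documents are found, or until the operation reaches the recursion depth specified by the maxDepth parameter. $graphLookup then appends the array field to the input document. $graphLookup returns results after completing its search on all input documents.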
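
The search process above can be pictured as a breadth-first traversal. The following is a minimal in-memory sketch of that idea in plain JavaScript, not MongoDB's actual implementation. The function name graphLookupSketch is hypothetical, startWith is passed as an already-evaluated value rather than an expression, restrictSearchWithMatch is left out, and the sample data is borrowed from the employees example later on this page.

```javascript
// In-memory illustration of the traversal; NOT how the server implements it.
function graphLookupSketch(inputDoc, fromDocs, opts) {
  const { startWith, connectFromField, connectToField, as,
          maxDepth = Infinity, depthField } = opts;

  // Treat scalars and arrays uniformly; missing values contribute nothing.
  const toArray = (v) => (v === undefined || v === null ? [] : [].concat(v));

  const visited = new Set();          // indexes of fromDocs already collected
  const results = [];
  let frontier = toArray(startWith);  // values to match at the current depth

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    fromDocs.forEach((doc, i) => {
      if (visited.has(i)) return;
      const targets = toArray(doc[connectToField]);
      if (!targets.some((t) => frontier.includes(t))) return;
      visited.add(i);
      results.push(depthField ? { ...doc, [depthField]: depth } : { ...doc });
      // The connectFromField values of this match seed the next level.
      next.push(...toArray(doc[connectFromField]));
    });
    frontier = next;
  }
  // Append the collected documents to a copy of the input document.
  return { ...inputDoc, [as]: results };
}

const employees = [
  { _id: 1, name: "Dev" },
  { _id: 2, name: "Eliot", reportsTo: "Dev" },
  { _id: 3, name: "Ron", reportsTo: "Eliot" },
  { _id: 4, name: "Andrew", reportsTo: "Eliot" },
  { _id: 5, name: "Asya", reportsTo: "Ron" },
  { _id: 6, name: "Dan", reportsTo: "Andrew" }
];

const asya = employees[4];
const out = graphLookupSketch(asya, employees, {
  startWith: asya.reportsTo,
  connectFromField: "reportsTo",
  connectToField: "name",
  as: "reportingHierarchy"
});
console.log(out.reportingHierarchy.map((d) => d.name).join(", "));
// prints "Ron, Eliot, Dev"
```

The sketch returns matches in traversal order only because of how it is written; the real stage makes no ordering guarantee for the as array.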

$graphLookup has the following prototype form:

{
   $graphLookup: {
      from: <collection>,
      startWith: <expression>,
      connectFromField: <string>,
      connectToField: <string>,
      as: <string>,
      maxDepth: <number>,
      depthField: <string>,
      restrictSearchWithMatch: <document>
   }
}

$graphLookup takes a document with the following fields:

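Field: Description
from: Target collection for the $graphLookup operation to search, recursively matching the connectFromField to the connectToField. The from collection cannot be sharded and must be in the same database as any other collections used in the operation. For information, see Sharded Collections.
startWith: Expression that specifies the value of the connectFromField with which to start the recursive search. Optionally, startWith may be an array of values, each of which is followed individually through the traversal process.
connectFromField: Field name whose value $graphLookup uses to recursively match against the connectToField of other documents in the collection. If the value is an array, each element is followed individually through the traversal process.
connectToField: Field name in other documents against which to match the value of the field specified by the connectFromField parameter.
as: Name of the array field added to each output document. Contains the documents traversed in the $graphLookup stage to reach the document.
    Note: Documents returned in the as field are not guaranteed to be in any order.
maxDepth: Optional. Non-negative integer specifying the maximum recursion depth.
depthField: Optional. Name of the field to add to each traversed document in the search path. The value of this field is the recursion depth for the document, represented as a NumberLong. Recursion depth starts at zero, so the first lookup corresponds to depth zero.
restrictSearchWithMatch: Optional. A document specifying additional conditions for the recursive search. The syntax is identical to query filter syntax.
    Note: You cannot use any aggregation expression in this filter. For example, a query document such as { lastName: { $ne: "$lastName" } } will not work in this context to find documents whose lastName value differs from the lastName value of the input document, because "$lastName" acts as a string literal, not a field path.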
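
Since the filter cannot refer to fields of the input document, one possible workaround is to trim the as array in a later stage, where aggregation expressions are allowed. The sketch below assumes a hypothetical people collection whose documents carry name, lastName, and friends fields; the collection, the field names, and the connections array name are illustrative only.

```javascript
// Hypothetical collection and field names, for illustration only.
const pipeline = [
  {
    $graphLookup: {
      from: "people",
      startWith: "$friends",
      connectFromField: "friends",
      connectToField: "name",
      as: "connections"
      // restrictSearchWithMatch: { lastName: { $ne: "$lastName" } }
      // would compare lastName against the literal string "$lastName".
    }
  },
  {
    // $filter is an aggregation expression, so "$lastName" here is a
    // field path that resolves against the input document.
    $addFields: {
      connections: {
        $filter: {
          input: "$connections",
          as: "person",
          cond: { $ne: [ "$$person.lastName", "$lastName" ] }
        }
      }
    }
  }
];

// In the shell: db.people.aggregate(pipeline)
```

This is not equivalent to restricting the search itself: restrictSearchWithMatch prunes the traversal as it runs, whereas the $filter step only removes documents from the finished result, so documents reached through a filtered-out person still appear.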

Considerations

Sharded Collections

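The collection specified in from cannot be sharded. However, the collection on which you run the aggregate() method can be sharded. That is, in the following: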

db.collection.aggregate([
   { $graphLookup: { from: "fromCollection", ... } }
])

  • collection can be sharded.

  • fromCollection cannot be sharded.

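To join multiple sharded collections, consider:

  • Modifying client applications to perform manual lookups instead of using the $graphLookup aggregation stage.

  • If possible, using an embedded data model that removes the need to join collections.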
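
As a rough illustration of the first option, a client application could walk the graph itself by issuing one query per level. The following sketch assumes a collection object that exposes find(filter).toArray(), as the Collection class of the MongoDB Node.js driver does. The function name manualGraphLookup and the depth property it adds are hypothetical, and error handling is omitted.

```javascript
// Client-side traversal sketch: one $in query per recursion level.
async function manualGraphLookup(collection, startValues, connectFromField,
                                 connectToField, maxDepth = Infinity) {
  const seenIds = new Set();   // _id values already collected
  const results = [];
  let frontier = [].concat(startValues ?? []);

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    // Fetch every document whose connectToField matches a frontier value.
    const docs = await collection
      .find({ [connectToField]: { $in: frontier } })
      .toArray();

    const next = [];
    for (const doc of docs) {
      const key = String(doc._id);
      if (seenIds.has(key)) continue;   // already reached at a lower depth
      seenIds.add(key);
      results.push({ ...doc, depth });
      const value = doc[connectFromField];
      if (value !== undefined && value !== null) next.push(...[].concat(value));
    }
    frontier = next;
  }
  return results;
}

// Possible usage inside an async function, with `db` a connected driver Db:
//   const airports = db.collection("airports");
//   const reachable = await manualGraphLookup(airports, ["JFK"], "connects", "airport", 2);
```

Because each level is an ordinary find() query, this approach works when the searched collection is sharded, at the cost of one round trip per level.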

Max Depth

Setting the maxDepth field to 0 is equivalent to a non-recursive $graphLookup search stage.

Memory

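The $graphLookup stage must stay within the 100 megabyte memory limit. If allowDiskUse: true is specified for the aggregate() operation, the $graphLookup stage ignores the option. If there are other stages in the aggregate() operation, the allowDiskUse: true option is in effect for those other stages.

For more information, see Aggregation Pipeline Limits.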
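
As a small sketch of how the option is passed, the pipeline below follows $graphLookup with $unwind and $sort stages. It reuses the employees collection from the examples on this page; the extra stages are chosen here only to show a stage that can benefit from allowDiskUse.

```javascript
const pipeline = [
  {
    // Ignores allowDiskUse and must fit within the memory limit.
    $graphLookup: {
      from: "employees",
      startWith: "$reportsTo",
      connectFromField: "reportsTo",
      connectToField: "name",
      as: "reportingHierarchy"
    }
  },
  { $unwind: "$reportingHierarchy" },
  // A $sort stage may write temporary files when allowDiskUse is true.
  { $sort: { "reportingHierarchy.name": 1 } }
];

const options = { allowDiskUse: true };

// In the shell: db.employees.aggregate(pipeline, options)
```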

Views and Collation

If performing an aggregation that involves multiple views, such as with $lookup or $graphLookup, the views must have the same collation.

Examples

Within a Single Collection

A collection named employees has the following documents:

{ "_id" : 1, "name" : "Dev" }
{ "_id" : 2, "name" : "Eliot", "reportsTo" : "Dev" }
{ "_id" : 3, "name" : "Ron", "reportsTo" : "Eliot" }
{ "_id" : 4, "name" : "Andrew", "reportsTo" : "Eliot" }
{ "_id" : 5, "name" : "Asya", "reportsTo" : "Ron" }
{ "_id" : 6, "name" : "Dan", "reportsTo" : "Andrew" }

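The following $graphLookup operation recursively matches on the reportsTo and name fields in the employees collection, returning the reporting hierarchy for each person: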

db.employees.aggregate( [
   {
      $graphLookup: {
         from: "employees",
         startWith: "$reportsTo",
         connectFromField: "reportsTo",
         connectToField: "name",
         as: "reportingHierarchy"
      }
   }
] )

The operation returns the following results:

{
   "_id" : 1,
   "name" : "Dev",
   "reportingHierarchy" : [ ]
}
{
   "_id" : 2,
   "name" : "Eliot",
   "reportsTo" : "Dev",
   "reportingHierarchy" : [
      { "_id" : 1, "name" : "Dev" }
   ]
}
{
   "_id" : 3,
   "name" : "Ron",
   "reportsTo" : "Eliot",
   "reportingHierarchy" : [
      { "_id" : 1, "name" : "Dev" },
      { "_id" : 2, "name" : "Eliot", "reportsTo" : "Dev" }
   ]
}
{
   "_id" : 4,
   "name" : "Andrew",
   "reportsTo" : "Eliot",
   "reportingHierarchy" : [
      { "_id" : 1, "name" : "Dev" },
      { "_id" : 2, "name" : "Eliot", "reportsTo" : "Dev" }
   ]
}
{
   "_id" : 5,
   "name" : "Asya",
   "reportsTo" : "Ron",
   "reportingHierarchy" : [
      { "_id" : 1, "name" : "Dev" },
      { "_id" : 2, "name" : "Eliot", "reportsTo" : "Dev" },
      { "_id" : 3, "name" : "Ron", "reportsTo" : "Eliot" }
   ]
}
{
   "_id" : 6,
   "name" : "Dan",
   "reportsTo" : "Andrew",
   "reportingHierarchy" : [
      { "_id" : 1, "name" : "Dev" },
      { "_id" : 2, "name" : "Eliot", "reportsTo" : "Dev" },
      { "_id" : 4, "name" : "Andrew", "reportsTo" : "Eliot" }
   ]
}

The following table provides a traversal path for the document { "_id" : 5, "name" : "Asya", "reportsTo" : "Ron" }:

Start value: The reportsTo value of the document: { ... "reportsTo" : "Ron" }
Depth 0: { "_id" : 3, "name" : "Ron", "reportsTo" : "Eliot" }
Depth 1: { "_id" : 2, "name" : "Eliot", "reportsTo" : "Dev" }
Depth 2: { "_id" : 1, "name" : "Dev" }

The output generates the hierarchy Asya -> Ron -> Eliot -> Dev.
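
To see the effect of maxDepth and depthField on this same collection, the variation below limits the search to the initial lookup. The output field names directManager and level are hypothetical choices made for this sketch.

```javascript
const pipeline = [
  {
    $graphLookup: {
      from: "employees",
      startWith: "$reportsTo",
      connectFromField: "reportsTo",
      connectToField: "name",
      maxDepth: 0,            // stop after the first lookup (no recursion)
      depthField: "level",    // records the recursion depth of each match
      as: "directManager"
    }
  }
];

// In the shell: db.employees.aggregate(pipeline)
```

With maxDepth set to 0, the document for Asya should carry only Ron's document in directManager, with level set to NumberLong(0), since the traversal stops at depth 0 in the table above.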

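Across Multiple Collections

Like $lookup, $graphLookup can access another collection in the same database.

In the following example, a database contains two collections:

  • A collection airports with the following documents: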
{ "_id" : 0, "airport" : "JFK", "connects" : [ "BOS", "ORD" ] }
{ "_id" : 1, "airport" : "BOS", "connects" : [ "JFK", "PWM" ] }
{ "_id" : 2, "airport" : "ORD", "connects" : [ "JFK" ] }
{ "_id" : 3, "airport" : "PWM", "connects" : [ "BOS", "LHR" ] }
{ "_id" : 4, "airport" : "LHR", "connects" : [ "PWM" ] }
  • A collection travelers with the following documents:
{ "_id" : 1, "name" : "Dev", "nearestAirport" : "JFK" }
{ "_id" : 2, "name" : "Eliot", "nearestAirport" : "JFK" }
{ "_id" : 3, "name" : "Jeff", "nearestAirport" : "BOS" }

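For each document in the travelers collection, the following aggregation operation looks up the nearestAirport value in the airports collection and recursively matches the connects field to the airport field. The operation specifies a maximum recursion depth of 2.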

db.travelers.aggregate( [
   {
      $graphLookup: {
         from: "airports",
         startWith: "$nearestAirport",
         connectFromField: "connects",
         connectToField: "airport",
         maxDepth: 2,
         depthField: "numConnections",
         as: "destinations"
      }
   }
] )

The operation returns the following results:

{
   "_id" : 1,
   "name" : "Dev",
   "nearestAirport" : "JFK",
   "destinations" : [
      { "_id" : 3,
        "airport" : "PWM",
        "connects" : [ "BOS", "LHR" ],
        "numConnections" : NumberLong(2) },
      { "_id" : 2,
        "airport" : "ORD",
        "connects" : [ "JFK" ],
        "numConnections" : NumberLong(1) },
      { "_id" : 1,
        "airport" : "BOS",
        "connects" : [ "JFK", "PWM" ],
        "numConnections" : NumberLong(1) },
      { "_id" : 0,
        "airport" : "JFK",
        "connects" : [ "BOS", "ORD" ],
        "numConnections" : NumberLong(0) }
   ]
}
{
   "_id" : 2,
   "name" : "Eliot",
   "nearestAirport" : "JFK",
   "destinations" : [
      { "_id" : 3,
        "airport" : "PWM",
        "connects" : [ "BOS", "LHR" ],
        "numConnections" : NumberLong(2) },
      { "_id" : 2,
        "airport" : "ORD",
        "connects" : [ "JFK" ],
        "numConnections" : NumberLong(1) },
      { "_id" : 1,
        "airport" : "BOS",
        "connects" : [ "JFK", "PWM" ],
        "numConnections" : NumberLong(1) },
      { "_id" : 0,
        "airport" : "JFK",
        "connects" : [ "BOS", "ORD" ],
        "numConnections" : NumberLong(0) } ]
}
{
   "_id" : 3,
   "name" : "Jeff",
   "nearestAirport" : "BOS",
   "destinations" : [
      { "_id" : 2,
        "airport" : "ORD",
        "connects" : [ "JFK" ],
        "numConnections" : NumberLong(2) },
      { "_id" : 3,
        "airport" : "PWM",
        "connects" : [ "BOS", "LHR" ],
        "numConnections" : NumberLong(1) },
      { "_id" : 4,
        "airport" : "LHR",
        "connects" : [ "PWM" ],
        "numConnections" : NumberLong(2) },
      { "_id" : 0,
        "airport" : "JFK",
        "connects" : [ "BOS", "ORD" ],
        "numConnections" : NumberLong(1) },
      { "_id" : 1,
        "airport" : "BOS",
        "connects" : [ "JFK", "PWM" ],
        "numConnections" : NumberLong(0) }
   ]
}

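The following table provides a traversal path for the recursive search, up to depth 2, where the starting airport is JFK:

Start value: The nearestAirport value from the travelers collection: { ... "nearestAirport" : "JFK" }
Depth 0: { "_id" : 0, "airport" : "JFK", "connects" : [ "BOS", "ORD" ] }
Depth 1: { "_id" : 1, "airport" : "BOS", "connects" : [ "JFK", "PWM" ] }
         { "_id" : 2, "airport" : "ORD", "connects" : [ "JFK" ] }
Depth 2: { "_id" : 3, "airport" : "PWM", "connects" : [ "BOS", "LHR" ] }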

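With a Query Filter

The following example uses a collection of documents that contain people's names along with arrays of their friends and their hobbies. An aggregation operation finds one particular person and traverses her network of connections to find people who list golf among their hobbies.

A collection named people contains the following documents: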

{
  "_id" : 1,
  "name" : "Tanya Jordan",
  "friends" : [ "Shirley Soto", "Terry Hawkins", "Carole Hale" ],
  "hobbies" : [ "tennis", "unicycling", "golf" ]
}
{
  "_id" : 2,
  "name" : "Carole Hale",
  "friends" : [ "Joseph Dennis", "Tanya Jordan", "Terry Hawkins" ],
  "hobbies" : [ "archery", "golf", "woodworking" ]
}
{
  "_id" : 3,
  "name" : "Terry Hawkins",
  "friends" : [ "Tanya Jordan", "Carole Hale", "Angelo Ward" ],
  "hobbies" : [ "knitting", "frisbee" ]
}
{
  "_id" : 4,
  "name" : "Joseph Dennis",
  "friends" : [ "Angelo Ward", "Carole Hale" ],
  "hobbies" : [ "tennis", "golf", "topiary" ]
}
{
  "_id" : 5,
  "name" : "Angelo Ward",
  "friends" : [ "Terry Hawkins", "Shirley Soto", "Joseph Dennis" ],
  "hobbies" : [ "travel", "ceramics", "golf" ]
}
{
  "_id" : 6,
  "name" : "Shirley Soto",
  "friends" : [ "Angelo Ward", "Tanya Jordan", "Carole Hale" ],
  "hobbies" : [ "frisbee", "set theory" ]
}

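The following aggregation operation uses three stages:

  • $match matches on documents with a name field containing the string "Tanya Jordan". Returns one output document.

  • $graphLookup connects the output document's friends field with the name field of other documents in the collection to traverse Tanya Jordan's network of connections. This stage uses the restrictSearchWithMatch parameter to find only documents in which the hobbies array contains golf. Returns one output document.

  • $project shapes the output document. The names listed in connections who play golf are taken from the name field of the documents listed in the input document's golfers array.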

db.people.aggregate( [
  { $match: { "name": "Tanya Jordan" } },
  { $graphLookup: {
      from: "people",
      startWith: "$friends",
      connectFromField: "friends",
      connectToField: "name",
      as: "golfers",
      restrictSearchWithMatch: { "hobbies" : "golf" }
    }
  },
  { $project: {
      "name": 1,
      "friends": 1,
      "connections who play golf": "$golfers.name"
    }
  }
] )

The operation returns the following document:

{
   "_id" : 1,
   "name" : "Tanya Jordan",
   "friends" : [
      "Shirley Soto",
      "Terry Hawkins",
      "Carole Hale"
   ],
   "connections who play golf" : [
      "Joseph Dennis",
      "Tanya Jordan",
      "Angelo Ward",
      "Carole Hale"
   ]
}

Additional Resources

Webinar: Working with Graph Data in MongoDB