Load Balancing with Varnish

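Varnish is a powerful HTTP load balancer (reverse proxy) that is also very good at caching. When running multiple TSDs, Varnish comes in handy for distributing HTTP traffic across them. Bear in mind that write traffic does not use the HTTP protocol by default, so you can only use Varnish for read queries. Using Varnish will help you easily scale the read capacity of your TSD cluster.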

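The following is a sample Varnish configuration recommended for use with OpenTSDB. It uses a slightly customized load balancing strategy to achieve the best cache hit rate at the TSD level. This configuration requires at least Varnish 2.1.0, but Varnish 3.0 or later is strongly recommended.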

This sample configuration is for two backends, named foo and bar. You need to substitute at least the host names.

# VCL configuration for OpenTSDB.

backend foo {
    .host = "foo";
    .port = "4242";
    .probe = {
        .url = "/version";
        .interval = 30s;
        .timeout = 10s;
        .window = 5;
        .threshold = 3;
    }
}

backend bar {
    .host = "bar";
    .port = "4242";
    .probe = {
        .url = "/version";
        .interval = 30s;
        .timeout = 10s;
        .window = 5;
        .threshold = 3;
    }
}

# The `client' director will select a backend based on `client.identity'.
# It's normally used to implement session stickiness but here we abuse it
# to try to send pairs of requests to the same TSD, in order to achieve a
# higher cache hit rate.  The UI sends queries first with a "&json" at the
# end, in order to get meta-data back about the results, and then it sends
# the same query again with "&png".  If the second query goes to a different
# TSD, then that TSD will have to fetch the data from HBase again.  Whereas
# if it goes to the same TSD that served the "&json" query, it'll hit the
# cache of that TSD and produce the PNG directly without using HBase.
#
# Note that we cannot use the `hash' director here, because otherwise Varnish
# would hash both the "&json" and the "&png" requests identically, and it
# would thus serve a cached JSON response to a "&png" request.
director tsd client {
    { .backend = foo; .weight = 100; }
    { .backend = bar; .weight = 100; }
}

sub vcl_recv {
    set req.backend = tsd;
    # Make sure we hit the same backend based on the URL requested,
    # but ignore some parameters before hashing the URL.
    set client.identity = regsuball(req.url, "&(o|ignore|png|json|html|y2?range|y2?label|y2?log|key|nokey)\b(=[^&]*)?", "");
}

sub vcl_hash {
    # Remove the `ignore' parameter from the URL we hash, so that two
    # identical requests modulo that parameter will hit Varnish's cache.
    hash_data(regsuball(req.url, "&ignore\b(=[^&]*)?", ""));
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (hash);
}

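On many Linux distributions (including Debian and Ubuntu), you need to put the configuration above in /etc/varnish/default.vcl. We also recommend tweaking the command-line parameters of varnishd to use a memory-backed cache of about 1GB, if you can afford it. On Debian/Ubuntu systems, this is done by editing /etc/default/varnish to make sure that -s malloc,1G is passed to varnishd.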

Read more about Varnish:

Note

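If you are using Varnish 2.x (which is not recommended; we strongly encourage you to migrate to 3.x), you have to replace each function call hash_data(foo); with set req.hash += foo; in the VCL configuration above.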
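
As an illustration of that substitution, the vcl_hash subroutine from the configuration above might look roughly like this on Varnish 2.x. This is a sketch that only applies the replacement described in the note; it has not been checked against every 2.x release, and the rest of the configuration is assumed to stay unchanged.

```vcl
sub vcl_hash {
    # Varnish 2.x: append to req.hash instead of calling hash_data().
    # As before, drop the `ignore' parameter from the URL being hashed.
    set req.hash += regsuball(req.url, "&ignore\b(=[^&]*)?", "");
    if (req.http.host) {
        set req.hash += req.http.host;
    } else {
        set req.hash += server.ip;
    }
    return (hash);
}
```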