When it comes to time series data, many terms are used loosely, which can lead to confusion. This page is a glossary that defines terms related to the use of OpenTSDB.
Cardinality is a mathematical term defined as the number of elements in a set. In database lingo, it's often used to refer to the number of unique items in an index. With regard to OpenTSDB, it can refer to:
- The number of unique time series for a given metric
- The number of unique tag values associated with a tag name
Due to the nature of the OpenTSDB storage schema, metrics with higher cardinality may take longer to return results during query execution than those with lower cardinality. For example, suppose metric foo has the tag name datacenter, and there are 100 possible values for datacenter. Then we have metric bar with the tag host and 50,000 possible values for host. Metric bar has a higher cardinality than foo: 50,000 possible time series for bar and only 100 for foo.
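The two meanings of cardinality can be checked on sample data. The following is a minimal Python sketch, not part of OpenTSDB: the helper names `series_cardinality` and `tag_value_cardinality` and the generated tag values (`dc0`, `web0`, and so on) are assumptions made for illustration.

```python
from collections import defaultdict

def series_cardinality(points):
    """Count unique time series (distinct tag combinations) per metric."""
    series = defaultdict(set)
    for metric, tags in points:
        # A frozenset of (name, value) pairs identifies one tag combination.
        series[metric].add(frozenset(tags.items()))
    return {metric: len(combos) for metric, combos in series.items()}

def tag_value_cardinality(points, tag_name):
    """Count unique values seen for one tag name across all points."""
    return len({tags[tag_name] for _, tags in points if tag_name in tags})

# Hypothetical sample mirroring the foo/bar example in the text.
points = [("foo", {"datacenter": f"dc{i}"}) for i in range(100)]
points += [("bar", {"host": f"web{i}"}) for i in range(50_000)]

counts = series_cardinality(points)
print(counts)  # {'foo': 100, 'bar': 50000}
```

Because each metric here carries a single tag, the number of unique time series equals the number of unique tag values. With several tags per metric, the series count can grow up to the product of the per-tag value counts.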
An OpenTSDB compaction takes multiple columns in an HBase row and merges them into a single column to reduce disk space. This is not to be confused with HBase compactions where multiple edits to a region are merged into one. OpenTSDB compactions can occur periodically for a TSD after data has been written, or during a query.
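The idea of merging many columns into one can be pictured with a toy model. This Python sketch is only a conceptual illustration under stated assumptions: it represents a row as a dictionary mapping a time offset to a value, and the function `compact_row` is hypothetical. It does not reproduce OpenTSDB's actual byte-level qualifier or value encoding.

```python
def compact_row(columns):
    """Merge many single-value columns of one row into a single column.

    `columns` maps a time offset (the column qualifier in this toy model)
    to one value. The result holds exactly one column whose qualifier is
    the ordered tuple of offsets and whose value is the matching tuple of
    values.
    """
    offsets = sorted(columns)
    merged_qualifier = tuple(offsets)
    merged_value = tuple(columns[offset] for offset in offsets)
    return {merged_qualifier: merged_value}

# Three columns written at different offsets within the same row.
row = {0: 8.0, 1800: 7.5, 900: 9.0}
compacted = compact_row(row)
print(compacted)  # {(0, 900, 1800): (8.0, 9.0, 7.5)}
```

The row still carries the same three measurements, but as one column instead of three, which is where the space saving in this model comes from.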
Each of the metrics above can be recorded as a number at a specific time. For example, we could record that Sue worked 8 hours at the end of each day. Or that "mylogo.jpg" was downloaded 400 times in the past hour. Thus a datapoint consists of:
- A metric
- A numeric value
- A timestamp when the value was recorded
- One or more sets of tags
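The four parts of a datapoint can be sketched as a small record type. This is a minimal Python sketch, not OpenTSDB code: the `DataPoint` class, its field names, the metric name `hours.worked`, and the `employee` tag are all hypothetical. The `to_put_line` method renders the point in the style of the line-based `put` command of OpenTSDB's telnet-style interface (metric, timestamp, value, then tag pairs); treat the exact formatting as an assumption to verify against the OpenTSDB documentation.

```python
from dataclasses import dataclass

@dataclass
class DataPoint:
    metric: str      # name of the measurement, e.g. "hours.worked"
    value: float     # the numeric measurement
    timestamp: int   # assumed here to be Unix epoch seconds
    tags: dict       # tag name -> tag value, at least one pair

    def __post_init__(self):
        # The definition above calls for one or more tags.
        if not self.tags:
            raise ValueError("a datapoint needs at least one tag")

    def to_put_line(self) -> str:
        """Render as 'put <metric> <timestamp> <value> <tagk=tagv ...>'."""
        tag_text = " ".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"put {self.metric} {self.timestamp} {self.value} {tag_text}"

# Sue worked 8 hours (hypothetical metric and tag names).
point = DataPoint("hours.worked", 8, 1356998400, {"employee": "sue"})
line = point.to_put_line()
print(line)  # put hours.worked 1356998400 8 employee=sue
```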
A metric is simply the name of a quantitative measurement. Metrics include things like:
- hours worked by an employee
- webserver downloads of a file
- snow accumulation in a region
Notice that the metric did not include a specific number or a time. That is because a metric is just a label for what you are measuring. The actual measurements are called datapoints, as you'll see later.
Unfortunately OpenTSDB requires metrics to be named as a single, long word without spaces. Thus metrics are usually recorded using "dotted notation". For example, the metrics above would have names like:
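As an illustration of dotted notation, the sketch below turns a descriptive phrase into a single word joined by dots. The helper `to_dotted_name` and the names it produces are hypothetical; they show one possible naming scheme, not names or rules defined by OpenTSDB.

```python
import re

def to_dotted_name(description: str) -> str:
    """Join the lowercase words of a phrase with dots, dropping spaces."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return ".".join(words)

names = [to_dotted_name(text) for text in
         ("hours worked", "webserver downloads", "snow accumulation")]
print(names)  # ['hours.worked', 'webserver.downloads', 'snow.accumulation']
```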
A collection of two or more data points for a single metric and group of tag name/value pairs.
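The definition above implies that a time series is identified by its metric plus its full set of tag name/value pairs. A minimal Python sketch of that grouping follows; the function `group_into_series`, the tuple layout of each point, and the sample metric and tag names are assumptions for illustration only.

```python
from collections import defaultdict

def group_into_series(points):
    """Group (metric, timestamp, value, tags) tuples into time series.

    The key is the metric plus the sorted tag pairs, so points that share
    both land in the same series regardless of tag ordering.
    """
    series = defaultdict(list)
    for metric, timestamp, value, tags in points:
        key = (metric, tuple(sorted(tags.items())))
        series[key].append((timestamp, value))
    for samples in series.values():
        samples.sort()  # order each series by timestamp
    return dict(series)

points = [
    ("hours.worked", 1357084800, 7, {"employee": "sue"}),
    ("hours.worked", 1356998400, 8, {"employee": "sue"}),
    ("hours.worked", 1356998400, 6, {"employee": "bob"}),
    ("hours.worked", 1357084800, 9, {"employee": "bob"}),
]
series = group_into_series(points)
print(len(series))  # 2
```

Both series share the metric `hours.worked`; only the tag value differs, which is enough to make them distinct time series.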
Timestamps are simply the absolute time when a value for a given metric was recorded.
A value represents the actual numeric measurement of the given metric. One of our employees, Sue, worked 8 hours yesterday, so the value would be 8. There were 1,024 downloads of logo.jpg from our webserver in the past hour, so the value would be 1,024. And 12 inches of snow fell in New England today, so the value would be 12.