Elasticsearch for Time Series Analysis
Example: Monitoring CPU Usage
Let’s use the same example to monitor CPU usage with Elasticsearch.
Schema Setup:
PUT /cpu_usage
{
  "mappings": {
    "properties": {
      "server_id": { "type": "keyword" },
      "location": { "type": "keyword" },
      "usage": { "type": "float" },
      "timestamp": { "type": "date" }
    }
  }
}
Inserting Data:
POST /cpu_usage/_doc/1
{
  "server_id": "server1",
  "location": "us-west",
  "usage": 55.3,
  "timestamp": "2023-01-01T00:00:00Z"
}

POST /cpu_usage/_doc/2
{
  "server_id": "server2",
  "location": "us-east",
  "usage": 47.6,
  "timestamp": "2023-01-01T01:00:00Z"
}

POST /cpu_usage/_doc/3
{
  "server_id": "server1",
  "location": "us-west",
  "usage": 60.1,
  "timestamp": "2023-01-01T02:00:00Z"
}
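Indexing one document per request is fine for three readings, but larger batches are usually sent through the `_bulk` endpoint, which takes newline-delimited JSON: an action line followed by a source line for each document. The following is a minimal sketch in Python, assuming the same three readings; the helper name `build_bulk_body` is hypothetical, and the code only builds the request body without sending anything.

```python
import json

# The three sample readings from this example, keyed by document ID.
DOCS = {
    "1": {"server_id": "server1", "location": "us-west", "usage": 55.3,
          "timestamp": "2023-01-01T00:00:00Z"},
    "2": {"server_id": "server2", "location": "us-east", "usage": 47.6,
          "timestamp": "2023-01-01T01:00:00Z"},
    "3": {"server_id": "server1", "location": "us-west", "usage": 60.1,
          "timestamp": "2023-01-01T02:00:00Z"},
}


def build_bulk_body(index, docs):
    """Return an NDJSON body: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: index this document into `index` under `doc_id`.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body("cpu_usage", DOCS)
print(body)
```

The resulting string can be sent as the body of `POST /_bulk` with the `Content-Type: application/x-ndjson` header.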
Querying Data
Using Elasticsearch Query DSL:
POST /cpu_usage/_search
{
  "size": 0,
  "aggs": {
    "cpu_usage_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "hour"
      },
      "aggs": {
        "average_usage": {
          "avg": {
            "field": "usage"
          }
        },
        "by_server": {
          "terms": {
            "field": "server_id"
          },
          "aggs": {
            "average_usage_per_server": {
              "avg": {
                "field": "usage"
              }
            }
          }
        }
      }
    }
  }
}
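Output (abbreviated to the aggregations section; a full response also includes metadata such as took, _shards, and hits):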
{
  "aggregations": {
    "cpu_usage_over_time": {
      "buckets": [
        {
          "key_as_string": "2023-01-01T00:00:00.000Z",
          "key": 1672531200000,
          "doc_count": 1,
          "average_usage": {
            "value": 55.3
          },
          "by_server": {
            "buckets": [
              {
                "key": "server1",
                "doc_count": 1,
                "average_usage_per_server": {
                  "value": 55.3
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2023-01-01T01:00:00.000Z",
          "key": 1672534800000,
          "doc_count": 1,
          "average_usage": {
            "value": 47.6
          },
          "by_server": {
            "buckets": [
              {
                "key": "server2",
                "doc_count": 1,
                "average_usage_per_server": {
                  "value": 47.6
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2023-01-01T02:00:00.000Z",
          "key": 1672538400000,
          "doc_count": 1,
          "average_usage": {
            "value": 60.1
          },
          "by_server": {
            "buckets": [
              {
                "key": "server1",
                "doc_count": 1,
                "average_usage_per_server": {
                  "value": 60.1
                }
              }
            ]
          }
        }
      ]
    }
  }
}
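To see where the bucket keys and averages in this output come from, the same grouping can be reproduced locally. The following is a plain-Python sketch of the idea (group by hour, then by server, then average), not a description of how Elasticsearch computes aggregations internally; the function names `hour_bucket_key` and `hourly_averages` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# (server_id, usage, timestamp) for the three sample documents.
READINGS = [
    ("server1", 55.3, "2023-01-01T00:00:00Z"),
    ("server2", 47.6, "2023-01-01T01:00:00Z"),
    ("server1", 60.1, "2023-01-01T02:00:00Z"),
]


def hour_bucket_key(timestamp):
    """Return the start of the timestamp's UTC hour in epoch milliseconds."""
    # Swap the trailing "Z" for an explicit offset so older Python versions parse it.
    moment = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    start = moment.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return int(start.timestamp()) * 1000


def hourly_averages(readings):
    """Group readings by hour and by server, averaging usage at both levels."""
    buckets = defaultdict(lambda: defaultdict(list))
    for server_id, usage, timestamp in readings:
        buckets[hour_bucket_key(timestamp)][server_id].append(usage)

    result = {}
    for key in sorted(buckets):
        # Flatten all readings in this hour for the hour-level average.
        all_values = [u for values in buckets[key].values() for u in values]
        result[key] = {
            "doc_count": len(all_values),
            "average_usage": sum(all_values) / len(all_values),
            "by_server": {
                server: sum(values) / len(values)
                for server, values in buckets[key].items()
            },
        }
    return result


summary = hourly_averages(READINGS)
for key, bucket in summary.items():
    print(key, bucket)
```

Each key is the start of an hour expressed in epoch milliseconds, which is why the first bucket's key of 1672531200000 corresponds to 2023-01-01T00:00:00Z.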
InfluxDB vs Elasticsearch for Time Series Analysis
Time series analysis is a crucial component in many fields, from monitoring server performance to tracking financial markets. Two of the most popular databases for handling time series data are InfluxDB and Elasticsearch. Both have strengths and weaknesses, and understanding them can help you choose the right tool for your specific needs.
In this article, we will explore InfluxDB and Elasticsearch in detail, focusing on their capabilities for time series analysis, with examples and outputs to illustrate their usage.