Monitoring Service Check: es_backup
Overview
Elasticsearch has integrated backup (snapshot) functionality that dumps indices into a repository such as the file system. It keeps the backup metadata within Elasticsearch itself. This check uses that metadata to verify that the last backup succeeded and is not too old.
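The decision described above can be sketched roughly as follows. This is not the source of check_es_backup, only an illustration: the function name `evaluate_backup` is hypothetical, and it assumes the state and end time (in epoch seconds) of the most recent snapshot have already been extracted from the snapshot metadata. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL); treating a threshold as reached when the age is equal to it is also an assumption.

```shell
# Hypothetical sketch, NOT the actual check_es_backup implementation.
# Usage: evaluate_backup STATE END_EPOCH NOW_EPOCH WARN_SECONDS CRIT_SECONDS
evaluate_backup() {
    local state=$1 end_epoch=$2 now_epoch=$3 warn=$4 crit=$5
    # Age of the last backup in seconds.
    local age=$(( now_epoch - end_epoch ))

    # A backup that did not finish successfully is critical regardless of age.
    if [ "$state" != "SUCCESS" ]; then
        echo "CRITICAL: last backup state is ${state}"
        return 2
    fi
    # Compare the age against the critical threshold first, then the warning one.
    if [ "$age" -ge "$crit" ]; then
        echo "CRITICAL: last backup finished ${age}s ago"
        return 2
    fi
    if [ "$age" -ge "$warn" ]; then
        echo "WARNING: last backup finished ${age}s ago"
        return 1
    fi
    echo "OK: last backup finished ${age}s ago"
    return 0
}
```

For example, `evaluate_backup SUCCESS 1700000000 "$(date +%s)" 86400 172800` would compare a backup that ended at the given timestamp against thresholds of one and two days. In the Elasticsearch snapshot API response, the corresponding values are found in the `state` and `end_time_in_millis` fields of each snapshot.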
Technical Details
Implementation
Global base check | No |
---|---|
Puppet Profiles using this check | profile_elasticsearch |
Production Level | equals host production_level |
Check Plugin
Script / Plugin Name | check_es_backup |
---|---|
Plugin Package | nagios-plugins-elasticsearch |
CheckCommand Name | es_backup |
Source Upstream Link | https://git.vshn.net/vshn/nagios-plugins-elasticsearch |
Documentation Link | https://git.vshn.net/vshn/nagios-plugins-elasticsearch/blob/master/check_es_backup |
List of Variables
Icinga2 variable | Configured in | Description |
---|---|---|
es_backup_prefix | | Prefix of the backup name (can be '*') |
es_backup_repository | | Repository of the backup |
es_backup_warn | | Warning threshold in seconds (last backup # seconds ago) |
es_backup_crit | | Critical threshold in seconds (last backup # seconds ago) |
Troubleshooting
Check critical with GCS repository
This usually happens after a full cluster restart, because the credentials have to be reloaded afterwards.
If the check is critical, you can verify whether access to the GCS repository is broken with:
    # gcs_dev is the value of es_backup_repository in the check config
    # replace "dev" with the stage that's throwing the error (dev, test, prod, qual, etc.)
    # the URL is quoted so that the shell does not try to expand * and ?
    curl 'localhost:9200/_snapshot/gcs_dev/*?pretty'

If the repository can't be accessed, it will return JSON like this:

    {
      "error" : {
        "root_cause" : [
          {
            "type" : "repository_exception",
            "reason" : "[gcs_dev] Unexpected exception when loading repository data"
          }
        ],
        "type" : "repository_exception",
        "reason" : "[gcs_dev] Unexpected exception when loading repository data",
        "caused_by" : {
          "type" : "storage_exception",
          "reason" : "401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/dev-es-es-snapshots/o/index-10?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object.",
          "caused_by" : {
            "type" : "google_json_response_exception",
            "reason" : "401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/dev-es-es-snapshots/o/index-10?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object."
          }
        }
      },
      "status" : 500
    }

    # If the issue is about anonymous access, you can run the following curl on one of the nodes
    curl -XPOST localhost:9200/_nodes/reload_secure_settings | jq
    # This will reload the credentials again
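To avoid reading the error response by eye, the anonymous-access case shown above could be detected with a small helper like the one below. This is only a sketch: the function name `is_gcs_auth_failure` is hypothetical, and it assumes that a `repository_exception` followed by a `401 Unauthorized` reason is a sufficient indicator of missing credentials.

```shell
# Hypothetical helper, not part of nagios-plugins-elasticsearch.
# Reads an Elasticsearch response body on stdin and returns 0 if it looks
# like the GCS anonymous-access failure, 1 otherwise.
is_gcs_auth_failure() {
    local body
    body=$(cat)
    case $body in
        # Assumption: repository_exception followed by 401 Unauthorized
        # means the GCS credentials are not loaded.
        *"repository_exception"*"401 Unauthorized"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could then be combined with the commands above, for example `curl -s 'localhost:9200/_snapshot/gcs_dev/*' | is_gcs_auth_failure && curl -XPOST localhost:9200/_nodes/reload_secure_settings`, so that the credentials are only reloaded when the repository actually reports the 401 error.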