Monitoring Service Check: es_backup

Overview

Elasticsearch has integrated backup functionality that dumps indexes to the file system and keeps the backup metadata within Elasticsearch itself. This check uses that metadata to verify that the last backup succeeded and is not too old.

Technical Details

Implementation

Global base check: No
Puppet profiles using this check: profile_elasticsearch
Production level: equals host production_level

Check Plugin

Script / plugin name: check_es_backup
Plugin package: nagios-plugins-elasticsearch
CheckCommand name: es_backup
Source upstream link: https://git.vshn.net/vshn/nagios-plugins-elasticsearch
Documentation link: https://git.vshn.net/vshn/nagios-plugins-elasticsearch/blob/master/check_es_backup

List of Variables

Icinga2 variable        Configured in    Description
es_backup_prefix        -                Prefix of the backup name (can be '*')
es_backup_repository    -                Repository of the backup
es_backup_warn          -                Warning threshold in seconds (last backup # seconds ago)
es_backup_crit          -                Critical threshold in seconds (last backup # seconds ago)
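
To illustrate how the two thresholds relate, the following sketch maps the age of the last backup (in seconds) to a Nagios-style state. The function name classify_backup_age and the inclusive comparisons are assumptions made for this example; they are not taken from check_es_backup itself.

```shell
# Hypothetical helper, not part of check_es_backup.
# Arguments: age of the last backup, warning threshold, critical threshold
# (all in seconds). Prints the state and returns the matching exit code.
classify_backup_age() {
  age=$1
  warn=$2
  crit=$3
  if [ "$age" -ge "$crit" ]; then
    echo "CRITICAL"
    return 2
  elif [ "$age" -ge "$warn" ]; then
    echo "WARNING"
    return 1
  fi
  echo "OK"
  return 0
}
```

For example, with es_backup_warn set to 86400 and es_backup_crit set to 172800, calling classify_backup_age 90000 86400 172800 prints WARNING, because the backup is older than one day but younger than two.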

Troubleshooting

Check critical with GCS repository

This usually happens after a full cluster restart, because the credentials have to be reloaded afterwards.

If the check is critical, verify whether access to the GCS repository is broken:

# gcs_dev is the value of es_backup_repository in the check config
# replace "dev" with the stage that's throwing the error (dev, test, prod, qual, etc.)
# the URL is quoted so the shell does not try to expand "*" and "?"
curl 'localhost:9200/_snapshot/gcs_dev/*?pretty'

If the repository can't be accessed, the request returns JSON like this:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "repository_exception",
        "reason" : "[gcs_dev] Unexpected exception when loading repository data"
      }
    ],
    "type" : "repository_exception",
    "reason" : "[gcs_dev] Unexpected exception when loading repository data",
    "caused_by" : {
      "type" : "storage_exception",
      "reason" : "401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/dev-es-es-snapshots/o/index-10?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object.",
      "caused_by" : {
        "type" : "google_json_response_exception",
        "reason" : "401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/dev-es-es-snapshots/o/index-10?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object."
      }
    }
  },
  "status" : 500
}
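
To tell the credential problem apart from other repository failures, the response can be classified with a small filter. This is a minimal sketch: the function name classify_repo_response and the three labels are made up for this example, and it only does a plain text match on the strings shown in the response above.

```shell
# Hypothetical helper: reads a _snapshot response on stdin and prints
#   auth_error              if it contains the "401 Unauthorized" message
#   other_repository_error  if it is a different repository_exception
#   ok                      otherwise
classify_repo_response() {
  response=$(cat)
  case $response in
    *'401 Unauthorized'*) echo "auth_error" ;;
    *'repository_exception'*) echo "other_repository_error" ;;
    *) echo "ok" ;;
  esac
}
```

It could be used as curl -s 'localhost:9200/_snapshot/gcs_dev/*' | classify_repo_response. Only the auth_error case is expected to be fixed by reloading the credentials.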

# If the error reports an anonymous caller, run the following curl on one of the nodes.
# It reloads the secure settings, which include the repository credentials.
curl -s -XPOST localhost:9200/_nodes/reload_secure_settings | jq .
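
The reload and the follow-up verification can be combined into one step. The sketch below is an assumption-laden example: the ES_URL variable and the function name reload_and_verify are hypothetical, and success is judged only by the absence of repository_exception in the response.

```shell
# Assumption: Elasticsearch is reachable at ES_URL (defaults to localhost:9200).
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: reload the secure settings, then query the repository
# again. Argument: repository name (the value of es_backup_repository).
reload_and_verify() {
  repo=$1
  # Reload the secure settings on all nodes.
  curl -s -XPOST "$ES_URL/_nodes/reload_secure_settings" >/dev/null || return 1
  # A repository_exception in the response means access is still broken.
  if curl -s "$ES_URL/_snapshot/$repo/*" | grep -q 'repository_exception'; then
    echo "repository $repo still failing"
    return 1
  fi
  echo "repository $repo accessible"
}
```

Running reload_and_verify gcs_dev on one of the nodes prints whether the repository can be read again. If it still fails, the cause is probably something other than stale credentials.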
