»Telemetry

The Vault server process collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten-second interval and are retained in memory for one minute.

To view the raw data, send a signal to the Vault process: USR1 on Unix-style operating systems, BREAK on Windows. When the Vault process receives this signal, it dumps the current telemetry information to its stderr.
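
As a minimal sketch of the Unix case, the Python helper below sends USR1 to a process identified by its PID. The function name `request_telemetry_dump` is illustrative and not part of Vault. The dump appears on the Vault process's stderr, not in the script's output, and the Windows BREAK case is deliberately left out.

```python
import os
import signal


def request_telemetry_dump(pid):
    """Send SIGUSR1 to the process with the given PID (Unix only).

    The caller needs permission to signal that process. The telemetry
    dump is written to the target's stderr, not returned here.
    """
    if not hasattr(signal, "SIGUSR1"):
        # SIGUSR1 does not exist on Windows; this sketch does not cover BREAK.
        raise NotImplementedError("SIGUSR1 is not available on this platform")
    os.kill(pid, signal.SIGUSR1)
```

A typical call would pass the PID of the running Vault server, obtained however your process supervisor exposes it.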

This telemetry information can be used for debugging or otherwise getting a better view of what Vault is doing.

Telemetry information can also be streamed directly from Vault to a range of metrics aggregation solutions, as described in the telemetry stanza documentation.

The following is an example telemetry dump snippet:

[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.expire.num_leases': 5100.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.num_goroutines': 39.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.sys_bytes': 222746880.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.malloc_count': 109189192.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.free_count': 108408240.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.heap_objects': 780953.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_runs': 232.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.alloc_bytes': 72954392.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_pause_ns': 150293024.000
[2017-12-19 20:37:50 +0000 UTC][S] 'vault.merkle.flushDirty': Count: 100 Min: 0.008 Mean: 0.027 Max: 0.183 Stddev: 0.024 Sum: 2.681 LastUpdated: 2017-12-19 20:37:59.848733035 +0000 UTC m=+10463.692105920
[2017-12-19 20:37:50 +0000 UTC][S] 'vault.merkle.saveCheckpoint': Count: 4 Min: 0.021 Mean: 0.054 Max: 0.110 Stddev: 0.039 Sum: 0.217 LastUpdated: 2017-12-19 20:37:57.048458148 +0000 UTC m=+10460.891835029
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.alloc_bytes': 73326136.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.sys_bytes': 222746880.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.malloc_count': 109195904.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.free_count': 108409568.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.heap_objects': 786342.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_pause_ns': 150293024.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.expire.num_leases': 5100.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.num_goroutines': 39.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_runs': 232.000
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.route.rollback.consul-': Count: 1 Sum: 0.013 LastUpdated: 2017-12-19 20:38:01.968471579 +0000 UTC m=+10465.811842067
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.consul-': Count: 1 Sum: 0.073 LastUpdated: 2017-12-19 20:38:01.968502743 +0000 UTC m=+10465.811873131
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.pki-': Count: 1 Sum: 0.070 LastUpdated: 2017-12-19 20:38:01.96867005 +0000 UTC m=+10465.812041936
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.route.rollback.auth-app-id-': Count: 1 Sum: 0.012 LastUpdated: 2017-12-19 20:38:01.969146401 +0000 UTC m=+10465.812516689
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.identity-': Count: 1 Sum: 0.063 LastUpdated: 2017-12-19 20:38:01.968029888 +0000 UTC m=+10465.811400276
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.database-': Count: 1 Sum: 0.066 LastUpdated: 2017-12-19 20:38:01.969394215 +0000 UTC m=+10465.812764603
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.barrier.get': Count: 16 Min: 0.010 Mean: 0.015 Max: 0.031 Stddev: 0.005 Sum: 0.237 LastUpdated: 2017-12-19 20:38:01.983268118 +0000 UTC m=+10465.826637008
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.merkle.flushDirty': Count: 100 Min: 0.006 Mean: 0.024 Max: 0.098 Stddev: 0.019 Sum: 2.386 LastUpdated: 2017-12-19 20:38:09.848158309 +0000 UTC m=+10473.691527099

You'll note that log entries are prefixed with the metric type as follows:

  • [C] is a counter
  • [G] is a gauge
  • [S] is a summary
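
To make the line format concrete, here is a rough Python sketch that parses one dump line into a dictionary. The regular expressions are inferred from the sample dump above rather than from a format specification, and the function name `parse_dump_line` is hypothetical. The sample contains no counter line, so the sketch assumes counters carry the same `Count:`/`Sum:` style fields as summaries.

```python
import re

# Maps the single-letter marker to the metric type named in the list above.
TYPE_NAMES = {"C": "counter", "G": "gauge", "S": "summary"}

LINE_RE = re.compile(
    r"^\[(?P<interval>[^\]]+)\]"   # interval start timestamp
    r"\[(?P<kind>[CGS])\]\s+"      # metric type marker
    r"'(?P<name>[^']+)':\s+"       # quoted metric name
    r"(?P<rest>.*)$"               # gauge value or aggregate fields
)

# Numeric aggregate fields; LastUpdated is a timestamp and is skipped.
FIELD_RE = re.compile(r"(Count|Min|Mean|Max|Stddev|Sum):\s+([-+0-9.eE]+)")


def parse_dump_line(line):
    """Parse one telemetry dump line, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = {
        "interval": match["interval"],
        "type": TYPE_NAMES[match["kind"]],
        "name": match["name"],
    }
    if match["kind"] == "G":
        # Gauges carry a single numeric value.
        entry["value"] = float(match["rest"])
    else:
        # Assumption: counters use the same field layout as summaries.
        entry["fields"] = {k: float(v) for k, v in FIELD_RE.findall(match["rest"])}
    return entry
```

Feeding it the `vault.barrier.get` line from the sample yields a summary entry whose `fields` include `Count`, `Min`, `Mean`, `Max`, `Stddev`, and `Sum`.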

The following sections describe available Vault metrics. The metrics interval can be assumed to be 10 seconds when manually triggering metrics output using the signals described above. Some high-cardinality gauges, like vault.secret.kv.count, are emitted every 10 minutes, or at an interval configured in the telemetry stanza.

Some Vault metrics come with additional labels describing the measurement in more detail, such as the namespace in which an operation takes place, or the auth method used to create a token. In the in-memory telemetry, or other telemetry engines that do not support labels, this additional information is incorporated into the metric name. In the tables below, each such metric name is followed by the list of labels it supports, in the order in which they appear when flattened.

»Audit Metrics

These metrics relate to auditing.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.audit.log_request | Duration of time taken by all audit log requests across all audit log devices | ms | summary |
| vault.audit.log_response | Duration of time taken by audit log responses across all audit log devices | ms | summary |
| vault.audit.log_request_failure | Number of audit log request failures. NOTE: This is a particularly important metric. Any non-zero value here indicates that there was a failure to make an audit log request to any of the configured audit log devices; when Vault cannot log to any of the configured audit log devices it ceases all user operations, and you should begin troubleshooting the audit log devices immediately if this metric continually increases. | failures | counter |
| vault.audit.log_response_failure | Number of audit log response failures. NOTE: This is a particularly important metric. Any non-zero value here indicates that there was a failure to receive a response to a request made to one of the configured audit log devices; when Vault cannot log to any of the configured audit log devices it ceases all user operations, and you should begin troubleshooting the audit log devices immediately if this metric continually increases. | failures | counter |

NOTE: In addition, there are audit metrics for each enabled audit device, represented as vault.audit.<type>.log_request and vault.audit.<type>.log_response. For example, if a file audit device is enabled, its metrics would be vault.audit.file.log_request and vault.audit.file.log_response.
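
One way to act on the "continually increases" guidance for the failure counters is to compare successive readings. The Python sketch below is only an illustration: the function name and the `min_increases` threshold are hypothetical, and it assumes the readings are cumulative counter values collected by your monitoring system at regular intervals.

```python
def failures_keep_increasing(readings, min_increases=3):
    """Return True if the counter rose in each of the last `min_increases` intervals.

    `readings` is a list of cumulative counter values, oldest first.
    Both the name and the default threshold are illustrative choices.
    """
    if len(readings) < min_increases + 1:
        # Not enough history to judge a trend.
        return False
    recent = readings[-(min_increases + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

With readings `[0, 1, 2, 3]` the check fires, while `[0, 1, 1, 2]` does not, because the counter stayed flat for one interval.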

»Core Metrics

These metrics represent operational aspects of the running Vault instance.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.barrier.delete | Duration of time taken by DELETE operations at the barrier | ms | summary |
| vault.barrier.get | Duration of time taken by GET operations at the barrier | ms | summary |
| vault.barrier.put | Duration of time taken by PUT operations at the barrier | ms | summary |
| vault.barrier.list | Duration of time taken by LIST operations at the barrier | ms | summary |
| vault.cache.hit | Number of times a value was retrieved from the LRU cache. | cache hit | counter |
| vault.cache.miss | Number of times a value was not in the LRU cache. This results in a read from the configured storage. | cache miss | counter |
| vault.cache.write | Number of times a value was written to the LRU cache. | cache write | counter |
| vault.cache.delete | Number of times a value was deleted from the LRU cache. This does not count cache expirations. | cache delete | counter |
| vault.core.active | Has value 1 when the Vault node is active, and 0 when the node is in standby. | bool | gauge |
| vault.core.activity.fragment_size | Number of entities or tokens (depending on the "type" label) observed by the local node. | tokens | counter |
| vault.core.activity.segment_write | Duration of time taken writing activity log segments to storage. | ms | summary |
| vault.core.check_token | Duration of time taken by token checks handled by Vault core | ms | summary |
| vault.core.fetch_acl_and_token | Duration of time taken by ACL and corresponding token entry fetches handled by Vault core | ms | summary |
| vault.core.handle_request | Duration of time taken by requests handled by Vault core | ms | summary |
| vault.core.handle_login_request | Duration of time taken by login requests handled by Vault core | ms | summary |
| vault.core.leadership_setup_failed | Duration of time taken by cluster leadership setup failures which have occurred in a highly available Vault cluster. This should be monitored and alerted on for overall cluster leadership status. | ms | summary |
| vault.core.leadership_lost | Duration of time taken by cluster leadership losses which have occurred in a highly available Vault cluster. This should be monitored and alerted on for overall cluster leadership status. | ms | summary |
| vault.core.expiration_time_epoch | Time as epoch (seconds since Jan 1 1970) at which the license will expire. | s | gauge |
| vault.core.mount_table.num_entries | Number of mounts in a particular mount table. This metric is labeled by table type (auth or logical) and whether or not the table is replicated (local or not) | objects | summary |
| vault.core.mount_table.size | Size of a particular mount table. This metric is labeled by table type (auth or logical) and whether or not the table is replicated (local or not) | objects | summary |
| vault.core.post_unseal | Duration of time taken by post-unseal operations handled by Vault core | ms | summary |
| vault.core.pre_seal | Duration of time taken by pre-seal operations | ms | summary |
| vault.core.seal-with-request | Duration of time taken by requested seal operations | ms | summary |
| vault.core.seal | Duration of time taken by seal operations | ms | summary |
| vault.core.seal-internal | Duration of time taken by internal seal operations | ms | summary |
| vault.core.step_down | Duration of time taken by cluster leadership step downs. This should be monitored and alerted on for overall cluster leadership status. | ms | summary |
| vault.core.unseal | Duration of time taken by unseal operations | ms | summary |
| vault.core.unsealed | Has value 1 when Vault is unsealed, and 0 when Vault is sealed. | bool | gauge |
| vault.metrics.collection (cluster,gauge) | Time taken to collect usage gauges, labeled by gauge type. |  | summary |
| vault.metrics.collection.interval (cluster,gauge) | Current value of the usage gauge collection interval. |  | summary |
| vault.metrics.collection.error (cluster,gauge) | Errors while collecting usage gauges, labeled by gauge type. |  | counter |
| vault.rollback.attempt.<mountpoint> | Time taken to perform a rollback operation on the given mount point. The mount point name has its forward slashes / replaced by -. For example, a rollback operation on the auth/token backend would be reported as vault.rollback.attempt.auth-token-. | ms | summary |
| vault.route.create.<mountpoint> | Time taken to dispatch a create operation to a backend, and for that backend to process it. The mount point name has its forward slashes / replaced by -. For example, a create operation to ns1/secret/ would have corresponding metric vault.route.create.ns1-secret-. The number of samples of this metric, and the corresponding ones for other operations below, indicates how many operations were performed per mount point. | ms | summary |
| vault.route.delete.<mountpoint> | Time taken to dispatch a delete operation to a backend, and for that backend to process it. | ms | summary |
| vault.route.list.<mountpoint> | Time taken to dispatch a list operation to a backend, and for that backend to process it. | ms | summary |
| vault.route.read.<mountpoint> | Time taken to dispatch a read operation to a backend, and for that backend to process it. | ms | summary |
| vault.route.rollback.<mountpoint> | Time taken to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. | ms | summary |
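
The mount point naming rule used by the vault.rollback.attempt and vault.route metrics above can be sketched in a few lines of Python. The helper names are illustrative, not part of Vault, and the sketch assumes mount paths are written with their trailing slash, as in the ns1/secret/ example.

```python
def flatten_mount_point(mount_point):
    """Replace every forward slash in a mount path with a hyphen."""
    return mount_point.replace("/", "-")


def route_metric(operation, mount_point):
    """Build the flattened name of a vault.route.<operation> metric."""
    return f"vault.route.{operation}.{flatten_mount_point(mount_point)}"


def rollback_attempt_metric(mount_point):
    """Build the flattened name of a vault.rollback.attempt metric."""
    return f"vault.rollback.attempt.{flatten_mount_point(mount_point)}"
```

For instance, `route_metric("create", "ns1/secret/")` gives `vault.route.create.ns1-secret-`, matching the example in the table.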

»Runtime Metrics

These metrics collect information from Vault's Go runtime, such as memory usage information.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.runtime.alloc_bytes | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. | bytes | gauge |
| vault.runtime.free_count | Number of freed objects | objects | gauge |
| vault.runtime.heap_objects | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. | objects | gauge |
| vault.runtime.malloc_count | Cumulative count of allocated heap objects | objects | gauge |
| vault.runtime.num_goroutines | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. | goroutines | gauge |
| vault.runtime.sys_bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. | bytes | gauge |
| vault.runtime.total_gc_pause_ns | The total garbage collector pause time since Vault was last started | ns | gauge |
| vault.runtime.gc_pause_ns | Total duration of the last garbage collection run | ns | sample |
| vault.runtime.total_gc_runs | Total number of garbage collection runs since Vault was last started | operations | gauge |
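
Some of these gauges are easier to interpret in combination. The Python sketch below derives two such values; the function names are illustrative. It relies on the Go runtime's definition that the number of live heap objects equals cumulative allocations minus cumulative frees, so the first result should track vault.runtime.heap_objects closely, though the gauges may not be sampled at exactly the same instant.

```python
def live_heap_objects(malloc_count, free_count):
    """Objects still on the heap: cumulative allocations minus frees."""
    return malloc_count - free_count


def mean_gc_pause_ns(total_gc_pause_ns, total_gc_runs):
    """Average pause per garbage collection run since the process started."""
    if total_gc_runs == 0:
        # No GC run has happened yet, so there is no pause to average.
        return 0.0
    return total_gc_pause_ns / total_gc_runs
```

Using the 20:37:50 values from the sample dump, `live_heap_objects(109189192, 108408240)` gives 780952, one less than the reported heap_objects gauge of 780953.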

»Policy Metrics

These metrics report measurements of the time spent performing policy operations.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.policy.get_policy | Time taken to get a policy | ms | summary |
| vault.policy.list_policies | Time taken to list policies | ms | summary |
| vault.policy.delete_policy | Time taken to delete a policy | ms | summary |
| vault.policy.set_policy | Time taken to set a policy | ms | summary |

»Token, Identity, and Lease Metrics

These metrics cover measurement of token, identity, and lease operations, and counts of the number of such objects managed by Vault.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.expire.fetch-lease-times | Time taken to fetch lease times | ms | summary |
| vault.expire.fetch-lease-times-by-token | Time taken to fetch lease times by token | ms | summary |
| vault.expire.num_leases | Number of all leases which are eligible for eventual expiry | leases | gauge |
| vault.expire.num_irrevocable_leases | Number of leases that cannot be revoked automatically | leases | gauge |
| vault.expire.leases.by_expiration (cluster,gauge,expiring,namespace) | Number of leases set to expire, grouped by a time interval. This time interval and total number of time intervals are configurable via lease_metrics_epsilon and num_lease_metrics_buckets in the telemetry stanza of a Vault server configuration. The default values for these are 1hr and 168 respectively, so the metric will report the number of leases that will expire each hour from the current time to a week from the current time. One can additionally group lease expiration by namespace by setting add_lease_metrics_namespace_labels to true in the config file (default is false). | leases | gauge |
| vault.expire.lease_expiration | Count of lease expirations | leases | counter |
| vault.expire.job_manager.total_jobs | Total pending revocation jobs | leases | sample |
| vault.expire.job_manager.queue_length | Total pending revocation jobs by auth method | leases | sample |
| vault.expire.lease_expiration.time_in_queue | Time taken for lease to get to the front of the revoke queue | ms | summary |
| vault.expire.lease_expiration.error | Count of lease expiration errors | errors | counter |
| vault.expire.revoke | Time taken to revoke a token | ms | summary |
| vault.expire.revoke-force | Time taken to forcibly revoke a token | ms | summary |
| vault.expire.revoke-prefix | Time taken to revoke tokens on a prefix | ms | summary |
| vault.expire.revoke-by-token | Time taken to revoke all secrets issued with a given token | ms | summary |
| vault.expire.renew | Time taken to renew a lease | ms | summary |
| vault.expire.renew-token | Time taken to renew a token which does not need to invoke a logical backend | ms | summary |
| vault.expire.register | Time taken for register operations | ms | summary |
| vault.expire.register-auth | Time taken for register authentication operations which create lease entries without lease ID | ms | summary |
| vault.identity.num_entities | Number of identity entities stored in Vault | entities | gauge |
| vault.identity.entity.active.monthly (cluster, namespace) | Number of distinct entities that created a token during the past month, per namespace. Only available if client count is enabled. Reported at the start of each month. | entities | gauge |
| vault.identity.entity.active.partial_month (cluster) | Total number of distinct entities that created a token during the current month. Only available if client count is enabled. Reported periodically within each month. | entities | gauge |
| vault.identity.entity.active.reporting_period (cluster, namespace) | Number of distinct entities that created a token in the past N months, as defined by the client count default reporting period. Only available if client count is enabled. Reported at the start of each month. | entities | gauge |
| vault.identity.entity.alias.count (cluster, namespace, auth_method, mount_point) | Number of identity entity aliases stored in Vault, grouped by the auth mount that created them. This gauge is computed every 10 minutes. | aliases | gauge |
| vault.identity.entity.count (cluster, namespace) | Number of identity entities stored in Vault, grouped by namespace. | entities | gauge |
| vault.identity.entity.creation (cluster, namespace, auth_method, mount_point) | Number of identity entities created, grouped by the auth mount that created them. | entities | counter |
| vault.identity.upsert_entity_txn | Time taken to insert a new or modified entity into the in-memory database, and persist it to storage. | ms | summary |
| vault.identity.upsert_group_txn | Time taken to insert a new or modified group into the in-memory database, and persist it to storage. This operation is performed on group membership changes. | ms | summary |
| vault.token.count (cluster, namespace) | Number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. | token | gauge |
| vault.token.count.by_auth (cluster, namespace, auth_method) | Number of service tokens that were created by a particular auth method. | tokens | gauge |
| vault.token.count.by_policy (cluster, namespace, policy) | Number of service tokens that have a particular policy attached. If a token has more than one policy, it is counted in each policy gauge. | tokens | gauge |
| vault.token.count.by_ttl (cluster, namespace, creation_ttl) | Number of service tokens, grouped by the TTL range they were assigned at creation. | tokens | gauge |
| vault.token.create | The time taken to create a token | ms | summary |
| vault.token.create_root | Number of created root tokens. Does not decrease on revocation. | tokens | counter |
| vault.token.createAccessor | The time taken to create a token accessor | ms | summary |
| vault.token.creation (cluster, namespace, auth_method, mount_point, creation_ttl, token_type) | Number of service or batch tokens created. | tokens | counter |
| vault.token.lookup | The time taken to look up a token | ms | summary |
| vault.token.revoke | Time taken to revoke a token | ms | summary |
| vault.token.revoke-tree | Time taken to revoke a token tree | ms | summary |
| vault.token.store | Time taken to store an updated token entry without writing to the secondary index | ms | summary |
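
The default bucketing described for vault.expire.leases.by_expiration (168 intervals of one hour, covering one week) can be checked with a short Python sketch. The function names are hypothetical, and the floor-division rule in `bucket_index` is an assumption made for illustration; the description above does not state how Vault assigns a lease to an interval.

```python
from datetime import timedelta

DEFAULT_EPSILON = timedelta(hours=1)  # default lease_metrics_epsilon
DEFAULT_NUM_BUCKETS = 168             # default num_lease_metrics_buckets


def lease_metrics_window(epsilon=DEFAULT_EPSILON, num_buckets=DEFAULT_NUM_BUCKETS):
    """Total time span covered by all expiration intervals."""
    return epsilon * num_buckets


def bucket_index(time_until_expiry, epsilon=DEFAULT_EPSILON, num_buckets=DEFAULT_NUM_BUCKETS):
    """Zero-based interval for a lease, or None if it falls outside the window.

    Assumption: interval i covers [i * epsilon, (i + 1) * epsilon).
    """
    if time_until_expiry < timedelta(0):
        return None
    index = time_until_expiry // epsilon
    return index if index < num_buckets else None
```

With the defaults, `lease_metrics_window()` equals `timedelta(days=7)`, which is the one-week horizon the description mentions.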

»Resource Quota Metrics

These metrics relate to rate limit and lease count quotas. Each metric comes with a label "name" identifying the specific quota.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.quota.rate_limit.violation | Total number of rate limit quota violations | quota | counter |
| vault.quota.lease_count.violation | Total number of lease count quota violations | quota | counter |
| vault.quota.lease_count.max | Total maximum amount of leases allowed by the lease count quota | lease | gauge |
| vault.quota.lease_count.counter | Total current amount of leases generated by the lease count quota | lease | gauge |
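
As a small illustration of how the two lease count gauges can be combined, the Python sketch below computes how much of a quota is in use. The function name is hypothetical, and treating a non-positive maximum as "no usable limit" is an assumption of this sketch.

```python
def lease_quota_utilization(counter, maximum):
    """Fraction of a lease count quota in use, or None without a usable limit.

    `counter` is a vault.quota.lease_count.counter reading and `maximum`
    is the matching vault.quota.lease_count.max reading for the same quota.
    """
    if maximum <= 0:
        return None
    return counter / maximum
```

A quota with 750 current leases and a maximum of 1000 is at 0.75, which could feed an alert before violations start to appear.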

»Merkle Tree and Write Ahead Log Metrics

These metrics relate to internal operations on Merkle Trees and Write Ahead Logs (WAL).

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.merkle.flushDirty | Time taken to flush any dirty pages to cold storage | ms | summary |
| vault.merkle.flushDirty.num_pages | Number of pages flushed | pages | gauge |
| vault.merkle.saveCheckpoint | Time taken to save the checkpoint | ms | summary |
| vault.merkle.saveCheckpoint.num_dirty | Number of dirty pages at checkpoint | pages | gauge |
| vault.wal.deleteWALs | Time taken to delete a Write Ahead Log (WAL) | ms | summary |
| vault.wal.gc.deleted | Number of Write Ahead Logs (WAL) deleted during each garbage collection run | WAL | gauge |
| vault.wal.gc.total | Total number of Write Ahead Logs (WAL) on disk | WAL | gauge |
| vault.wal.loadWAL | Time taken to load a Write Ahead Log (WAL) | ms | summary |
| vault.wal.persistWALs | Time taken to persist a Write Ahead Log (WAL) | ms | summary |
| vault.wal.flushReady | Time taken to flush a ready Write Ahead Log (WAL) to storage | ms | summary |
| vault.wal.flushReady.queue_len | Size of the write queue in the WAL system | WAL | summary |

»Replication Metrics

These metrics relate to Vault Enterprise Replication. The following metrics are not available in telemetry unless replication is in an unhealthy state: vault.replication.fetchRemoteKeys, vault.replication.merkleDiff, and vault.replication.merkleSync.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.logshipper.streamWALs.missing_guard | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found | missing guards | counter |
| vault.logshipper.streamWALs.guard_found | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found | found guards | counter |
| vault.logshipper.streamWALs.scanned_entries | Number of entries scanned in the buffer before the right one was found. | scanned entries | summary |
| vault.logshipper.buffer.length | Current length of the log shipper buffer | buffer entries | gauge |
| vault.logshipper.buffer.size | Current size in bytes of the log shipper buffer | bytes | gauge |
| vault.logshipper.buffer.max_length | Maximum length of the log shipper buffer | buffer entries | gauge |
| vault.logshipper.buffer.max_size | Maximum size in bytes of the log shipper buffer | bytes | gauge |
| vault.replication.fetchRemoteKeys | Time taken to fetch keys from a remote cluster participating in replication prior to Merkle Tree based delta generation | ms | summary |
| vault.replication.merkleDiff | Time taken to perform a Merkle Tree based delta generation between the clusters participating in replication | ms | summary |
| vault.replication.merkleSync | Time taken to perform a Merkle Tree based synchronization using the last delta generated between the clusters participating in replication | ms | summary |
| vault.replication.merkle.commit_index | The last committed index in the Merkle Tree. | sequence number | gauge |
| vault.replication.wal.last_wal | The index of the last WAL | sequence number | gauge |
| vault.replication.wal.last_dr_wal | The index of the last DR WAL | sequence number | gauge |
| vault.replication.wal.last_performance_wal | The index of the last Performance WAL | sequence number | gauge |
| vault.replication.fsm.last_remote_wal | The index of the last remote WAL | sequence number | gauge |
| vault.replication.wal.gc | Time taken to complete one run of the WAL garbage collection process | ms | summary |
| vault.replication.rpc.server.auth_request | Duration of time taken by auth request | ms | summary |
| vault.replication.rpc.server.bootstrap_request | Duration of time taken by bootstrap request | ms | summary |
| vault.replication.rpc.server.conflicting_pages_request | Duration of time taken by conflicting pages request | ms | summary |
| vault.replication.rpc.server.echo | Duration of time taken by echo | ms | summary |
| vault.replication.rpc.server.forwarding_request | Duration of time taken by forwarding request | ms | summary |
| vault.replication.rpc.server.guard_hash_request | Duration of time taken by guard hash request | ms | summary |
| vault.replication.rpc.server.persist_alias_request | Duration of time taken by persist alias request | ms | summary |
| vault.replication.rpc.server.persist_persona_request | Duration of time taken by persist persona request | ms | summary |
| vault.replication.rpc.server.stream_wals_request | Duration of time taken by stream wals request | ms | summary |
| vault.replication.rpc.server.sub_page_hashes_request | Duration of time taken by sub page hashes request | ms | summary |
| vault.replication.rpc.server.sync_counter_request | Duration of time taken by sync counter request | ms | summary |
| vault.replication.rpc.server.upsert_group_request | Duration of time taken by upsert group request | ms | summary |
| vault.replication.rpc.client.conflicting_pages | Duration of time taken by client conflicting pages request | ms | summary |
| vault.replication.rpc.client.fetch_keys | Duration of time taken by client fetch keys request | ms | summary |
| vault.replication.rpc.client.forward | Duration of time taken by client forward request | ms | summary |
| vault.replication.rpc.client.guard_hash | Duration of time taken by client guard hash request | ms | summary |
| vault.replication.rpc.client.persist_alias | Duration of time taken by client persist alias request | ms | summary |
| vault.replication.rpc.client.register_auth | Duration of time taken by client register auth request | ms | summary |
| vault.replication.rpc.client.register_lease | Duration of time taken by client register lease request | ms | summary |
| vault.replication.rpc.client.stream_wals | Duration of time taken by client stream wals request | ms | summary |
| vault.replication.rpc.client.sub_page_hashes | Duration of time taken by client sub page hashes request | ms | summary |
| vault.replication.rpc.client.sync_counter | Duration of time taken by client sync counter request | ms | summary |
| vault.replication.rpc.client.upsert_group | Duration of time taken by client upsert group request | ms | summary |
| vault.replication.rpc.client.wrap_in_cubbyhole | Duration of time taken by client wrap in cubbyhole request | ms | summary |
| vault.replication.rpc.dr.server.echo | Duration of time taken by DR echo request | ms | summary |
| vault.replication.rpc.dr.server.fetch_keys_request | Duration of time taken by DR fetch keys request | ms | summary |
| vault.replication.rpc.standby.server.echo | Duration of time taken by standby echo request | ms | summary |
| vault.replication.rpc.standby.server.register_auth_request | Duration of time taken by standby register auth request | ms | summary |
| vault.replication.rpc.standby.server.register_lease_request | Duration of time taken by standby register lease request | ms | summary |
| vault.replication.rpc.standby.server.wrap_token_request | Duration of time taken by standby wrap token request | ms | summary |

»Secrets Engines Metrics

These metrics relate to the supported secrets engines.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| database.Initialize | Time taken to initialize a database secrets engine across all database secrets engines | ms | summary |
| database.<name>.Initialize | Time taken to initialize a database secrets engine for the named database secrets engine <name>, for example: database.postgresql-prod.Initialize | ms | summary |
| database.Initialize.error | Number of database secrets engine initialization operation errors across all database secrets engines | errors | counter |
| database.<name>.Initialize.error | Number of database secrets engine initialization operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.Initialize.error | errors | counter |
| database.Close | Time taken to close a database secrets engine across all database secrets engines | ms | summary |
| database.<name>.Close | Time taken to close a database secrets engine for the named database secrets engine <name>, for example: database.postgresql-prod.Close | ms | summary |
| database.Close.error | Number of database secrets engine close operation errors across all database secrets engines | errors | counter |
| database.<name>.Close.error | Number of database secrets engine close operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.Close.error | errors | counter |
| database.CreateUser | Time taken to create a user across all database secrets engines | ms | summary |
| database.<name>.CreateUser | Time taken to create a user for the named database secrets engine <name> | ms | summary |
| database.CreateUser.error | Number of user creation operation errors across all database secrets engines | errors | counter |
| database.<name>.CreateUser.error | Number of user creation operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.CreateUser.error | errors | counter |
| database.RenewUser | Time taken to renew a user across all database secrets engines | ms | summary |
| database.<name>.RenewUser | Time taken to renew a user for the named database secrets engine <name>, for example: database.postgresql-prod.RenewUser | ms | summary |
| database.RenewUser.error | Number of user renewal operation errors across all database secrets engines | errors | counter |
| database.<name>.RenewUser.error | Number of user renewal operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.RenewUser.error | errors | counter |
| database.RevokeUser | Time taken to revoke a user across all database secrets engines | ms | summary |
| database.<name>.RevokeUser | Time taken to revoke a user for the named database secrets engine <name>, for example: database.postgresql-prod.RevokeUser | ms | summary |
| database.RevokeUser.error | Number of user revocation operation errors across all database secrets engines | errors | counter |
| database.<name>.RevokeUser.error | Number of user revocation operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.RevokeUser.error | errors | counter |
| vault.secret.kv.count (cluster, namespace, mount_point) | Number of entries in each key-value secrets engine. | paths | gauge |
| vault.secret.lease.creation (cluster, namespace, secret_engine, mount_point, creation_ttl) | Counts the number of leases created by secrets engines. | leases | counter |

»Storage Backend Metrics

These metrics relate to the supported storage backends.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.azure.put | Duration of a PUT operation against the Azure storage backend | ms | summary |
| vault.azure.get | Duration of a GET operation against the Azure storage backend | ms | summary |
| vault.azure.delete | Duration of a DELETE operation against the Azure storage backend | ms | summary |
| vault.azure.list | Duration of a LIST operation against the Azure storage backend | ms | summary |
| vault.cassandra.put | Duration of a PUT operation against the Cassandra storage backend | ms | summary |
| vault.cassandra.get | Duration of a GET operation against the Cassandra storage backend | ms | summary |
| vault.cassandra.delete | Duration of a DELETE operation against the Cassandra storage backend | ms | summary |
| vault.cassandra.list | Duration of a LIST operation against the Cassandra storage backend | ms | summary |
| vault.cockroachdb.put | Duration of a PUT operation against the CockroachDB storage backend | ms | summary |
| vault.cockroachdb.get | Duration of a GET operation against the CockroachDB storage backend | ms | summary |
| vault.cockroachdb.delete | Duration of a DELETE operation against the CockroachDB storage backend | ms | summary |
| vault.cockroachdb.list | Duration of a LIST operation against the CockroachDB storage backend | ms | summary |
| vault.consul.put | Duration of a PUT operation against the Consul storage backend | ms | summary |
| vault.consul.transaction | Duration of a Txn operation against the Consul storage backend | ms | summary |
| vault.consul.get | Duration of a GET operation against the Consul storage backend | ms | summary |
| vault.consul.delete | Duration of a DELETE operation against the Consul storage backend | ms | summary |
| vault.consul.list | Duration of a LIST operation against the Consul storage backend | ms | summary |
| vault.couchdb.put | Duration of a PUT operation against the CouchDB storage backend | ms | summary |
| vault.couchdb.get | Duration of a GET operation against the CouchDB storage backend | ms | summary |
| vault.couchdb.delete | Duration of a DELETE operation against the CouchDB storage backend | ms | summary |
| vault.couchdb.list | Duration of a LIST operation against the CouchDB storage backend | ms | summary |
| vault.dynamodb.put | Duration of a PUT operation against the DynamoDB storage backend | ms | summary |
| vault.dynamodb.get | Duration of a GET operation against the DynamoDB storage backend | ms | summary |
| vault.dynamodb.delete | Duration of a DELETE operation against the DynamoDB storage backend | ms | summary |
| vault.dynamodb.list | Duration of a LIST operation against the DynamoDB storage backend | ms | summary |
| vault.etcd.put | Duration of a PUT operation against the etcd storage backend | ms | summary |
| vault.etcd.get | Duration of a GET operation against the etcd storage backend | ms | summary |
| vault.etcd.delete | Duration of a DELETE operation against the etcd storage backend | ms | summary |
| vault.etcd.list | Duration of a LIST operation against the etcd storage backend | ms | summary |
| vault.gcs.put | Duration of a PUT operation against the Google Cloud Storage storage backend | ms | summary |
| vault.gcs.get | Duration of a GET operation against the Google Cloud Storage storage backend | ms | summary |
| vault.gcs.delete | Duration of a DELETE operation against the Google Cloud Storage storage backend | ms | summary |
| vault.gcs.list | Duration of a LIST operation against the Google Cloud Storage storage backend | ms | summary |
| vault.gcs.lock.unlock | Duration of an UNLOCK operation against the Google Cloud Storage storage backend in HA mode | ms | summary |
| vault.gcs.lock.lock | Duration of a LOCK operation against the Google Cloud Storage storage backend in HA mode | ms | summary |
| vault.gcs.lock.value | Duration of a VALUE operation against the Google Cloud Storage storage backend in HA mode | ms | summary |
| vault.mssql.put | Duration of a PUT operation against the MS-SQL storage backend | ms | summary |
| vault.mssql.get | Duration of a GET operation against the MS-SQL storage backend | ms | summary |
| vault.mssql.delete | Duration of a DELETE operation against the MS-SQL storage backend | ms | summary |
| vault.mssql.list | Duration of a LIST operation against the MS-SQL storage backend | ms | summary |
| vault.mysql.put | Duration of a PUT operation against the MySQL storage backend | ms | summary |
| vault.mysql.get | Duration of a GET operation against the MySQL storage backend | ms | summary |
| vault.mysql.delete | Duration of a DELETE operation against the MySQL storage backend | ms | summary |
| vault.mysql.list | Duration of a LIST operation against the MySQL storage backend | ms | summary |
| vault.postgres.put | Duration of a PUT operation against the PostgreSQL storage backend | ms | summary |
| vault.postgres.get | Duration of a GET operation against the PostgreSQL storage backend | ms | summary |
| vault.postgres.delete | Duration of a DELETE operation against the PostgreSQL storage backend | ms | summary |
| vault.postgres.list | Duration of a LIST operation against the PostgreSQL storage backend | ms | summary |
| vault.s3.put | Duration of a PUT operation against the Amazon S3 storage backend | ms | summary |
| vault.s3.get | Duration of a GET operation against the Amazon S3 storage backend | ms | summary |
| vault.s3.delete | Duration of a DELETE operation against the Amazon S3 storage backend | ms | summary |
| vault.s3.list | Duration of a LIST operation against the Amazon S3 storage backend | ms | summary |
| vault.spanner.put | Duration of a PUT operation against the Google Cloud Spanner storage backend | ms | summary |
| vault.spanner.get | Duration of a GET operation against the Google Cloud Spanner storage backend | ms | summary |
| vault.spanner.delete | Duration of a DELETE operation against the Google Cloud Spanner storage backend | ms | summary |
| vault.spanner.list | Duration of a LIST operation against the Google Cloud Spanner storage backend | ms | summary |
| vault.spanner.lock.unlock | Duration of an UNLOCK operation against the Google Cloud Spanner storage backend in HA mode | ms | summary |
| vault.spanner.lock.lock | Duration of a LOCK operation against the Google Cloud Spanner storage backend in HA mode | ms | summary |
| vault.spanner.lock.value | Duration of a VALUE operation against the Google Cloud Spanner storage backend in HA mode | ms | summary |
| vault.swift.put | Duration of a PUT operation against the Swift storage backend | ms | summary |
| vault.swift.get | Duration of a GET operation against the Swift storage backend | ms | summary |
| vault.swift.delete | Duration of a DELETE operation against the Swift storage backend | ms | summary |
| vault.swift.list | Duration of a LIST operation against the Swift storage backend | ms | summary |
| vault.zookeeper.put | Duration of a PUT operation against the ZooKeeper storage backend | ms | summary |
| vault.zookeeper.get | Duration of a GET operation against the ZooKeeper storage backend | ms | summary |
| vault.zookeeper.delete | Duration of a DELETE operation against the ZooKeeper storage backend | ms | summary |
| vault.zookeeper.list | Duration of a LIST operation against the ZooKeeper storage backend | ms | summary |
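
The storage backend timings above are reported as summaries, so in a stderr telemetry dump they appear as `[S]` lines like the ones in the example snippet at the top of this page. The following is a minimal Python sketch, not an official tool, that pulls the backend, operation, and mean/max latency out of such a line. The function name `parse_storage_summary`, the `KNOWN_BACKENDS` set, and the sample line (including its values) are illustrative assumptions; only the line layout is taken from the example dump.

```python
import re

# Backend name segments taken from the metric names in the table above.
KNOWN_BACKENDS = {
    "azure", "cassandra", "cockroachdb", "consul", "couchdb", "dynamodb",
    "etcd", "gcs", "mssql", "mysql", "postgres", "s3", "spanner", "swift",
    "zookeeper",
}

# Matches the start of a summary ("[S]") line in the telemetry dump format.
SUMMARY_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[S\] '(?P<name>[^']+)': "
    r"Count: (?P<count>\d+) Min: (?P<min>[\d.]+) "
    r"Mean: (?P<mean>[\d.]+) Max: (?P<max>[\d.]+)"
)


def parse_storage_summary(line):
    """Return a dict for a vault.<backend>.<operation> summary line, else None."""
    m = SUMMARY_RE.match(line)
    if m is None:
        return None
    parts = m.group("name").split(".")
    # Lock metrics such as vault.gcs.lock.lock have a two-part operation.
    if len(parts) < 3 or parts[0] != "vault" or parts[1] not in KNOWN_BACKENDS:
        return None
    return {
        "backend": parts[1],
        "operation": ".".join(parts[2:]),
        "count": int(m.group("count")),
        "mean_ms": float(m.group("mean")),
        "max_ms": float(m.group("max")),
    }


# Hypothetical sample line: the values are invented for illustration only.
sample = (
    "[2017-12-19 20:37:50 +0000 UTC][S] 'vault.consul.put': "
    "Count: 12 Min: 0.500 Mean: 1.250 Max: 3.000 Stddev: 0.700 "
    "Sum: 15.000 LastUpdated: 2017-12-19 20:37:59 +0000 UTC"
)
result = parse_storage_summary(sample)
print(result["backend"], result["operation"], result["mean_ms"])
```

Lines for metrics that are not storage backend timings, such as `vault.merkle.flushDirty`, return `None` in this sketch so they can be skipped by the caller.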

»Integrated Raft Storage Health

These metrics relate to Raft-based integrated storage.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.raft.apply | Number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Raft servers. | raft transactions / interval | counter |
| vault.raft.barrier | Number of times the node has started the barrier, that is, the number of times it has issued a blocking call to ensure that all pending queued operations have been applied to the node's FSM. | blocks / interval | counter |
| vault.raft.candidate.electSelf | Time to request a vote from a peer. | ms | summary |
| vault.raft.commitNumLogs | Number of logs processed for application to the FSM in a single batch. | logs | gauge |
| vault.raft.commitTime | Time to commit a new entry to the Raft log on the leader. | ms | timer |
| vault.raft.compactLogs | Time to trim the logs that are no longer needed. | ms | summary |
| vault.raft.delete | Time to delete a file from Raft's underlying storage. | ms | summary |
| vault.raft.delete_prefix | Time to delete files under a prefix from Raft's underlying storage. | ms | summary |
| vault.raft.fsm.apply | Number of logs committed since the last interval. | commit logs / interval | summary |
| vault.raft.fsm.applyBatch | Time to apply a batch of logs. | ms | summary |
| vault.raft.fsm.applyBatchNum | Number of logs applied in batch. | ms | summary |
| vault.raft.fsm.enqueue | Time to enqueue a batch of logs for the FSM to apply. | ms | timer |
| vault.raft.fsm.restore | Time taken by the FSM to restore its state from a snapshot. | ms | summary |
| vault.raft.fsm.snapshot | Time taken by the FSM to record the current state for the snapshot. | ms | summary |
| vault.raft.fsm.store_config | Time to store the configuration. | ms | summary |
| vault.raft.get | Time to retrieve a file from Raft's underlying storage. | ms | summary |
| vault.raft.leader.dispatchLog | Time for the leader to write log entries to disk. | ms | timer |
| vault.raft.leader.dispatchNumLogs | Number of logs committed to disk in a batch. | logs | gauge |
| vault.raft.list | Time to retrieve a list of keys from Raft's underlying storage. | ms | summary |
| vault.raft.peers | Number of peers in the Raft cluster configuration. | peers | gauge |
| vault.raft.put | Time to persist a key in Raft's underlying storage. | ms | summary |
| vault.raft.replication.appendEntries.log | Number of logs replicated to a node, to bring it up to speed with the leader's logs. | logs appended / interval | counter |
| vault.raft.replication.appendEntries.rpc | Time taken by the append entries RPC to replicate the log entries of a leader node onto its follower node(s). | ms | timer |
| vault.raft.replication.heartbeat | Time taken to invoke appendEntries on a peer, which is done periodically so that the peer does not time out. | ms | timer |
| vault.raft.replication.installSnapshot | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer |
| vault.raft.restore | Number of times the restore operation has been performed by the node. Here, restore refers to the action of Raft consuming an external snapshot to restore its state. | operation invoked / interval | counter |
| vault.raft.restoreUserSnapshot | Time taken by the node to restore the FSM state from a user's snapshot. | ms | timer |
| vault.raft.rpc.appendEntries | Time taken to process an append entries RPC call from a node. | ms | timer |
| vault.raft.rpc.appendEntries.processLogs | Time taken to process the outstanding log entries of a node. | ms | timer |
| vault.raft.rpc.appendEntries.storeLogs | Time taken to add any outstanding logs for a node, since the last appendEntries was invoked. | ms | timer |
| vault.raft.rpc.installSnapshot | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer |
| vault.raft.rpc.processHeartbeat | Time taken to process a heartbeat request. | ms | timer |
| vault.raft.rpc.requestVote | Time taken to complete the requestVote RPC call. | ms | summary |
| vault.raft.snapshot.create | Time taken to initialize the snapshot process. | ms | timer |
| vault.raft.snapshot.persist | Time taken to dump the current snapshot taken by the node to the disk. | ms | timer |
| vault.raft.snapshot.takeSnapshot | Total time involved in taking the current snapshot (creating one and persisting it) by the node. | ms | timer |
| vault.raft.state.follower | Number of times the node has entered the follower mode. This happens when a new node joins the cluster or after the end of a leader election. | follower state entered / interval | counter |
| vault.raft.transition.heartbeat_timeout | Number of times the node has transitioned to the Candidate state after receiving no heartbeat messages from the last known leader. | timeouts / interval | counter |
| vault.raft.transition.leader_lease_timeout | Number of times a quorum of nodes could not be contacted. | contact failures | counter |
| vault.raft.verify_leader | Number of times the node checks whether it is still the leader or not. | checks / interval | counter |
| vault.raft-storage.delete | Time to insert log entry to delete path. | ms | timer |
| vault.raft-storage.get | Time to retrieve value for path from FSM. | ms | timer |
| vault.raft-storage.put | Time to insert log entry to persist path. | ms | timer |
| vault.raft-storage.list | Time to list all entries under the prefix from the FSM. | ms | timer |
| vault.raft-storage.transaction | Time to insert operations into a single log. | ms | timer |
| vault.raft-storage.entry_size | The total size of a Raft entry during log application in bytes. | bytes | sample |

»Integrated Raft Storage Leadership Changes

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.raft.leader.lastContact | Measures the time since the leader was last able to contact the follower nodes when checking its leader lease | ms | summary |
| vault.raft.state.candidate | Increments whenever raft server starts an election | Elections | counter |
| vault.raft.state.leader | Increments whenever raft server becomes a leader | Leaders | counter |

Why they're important: Normally, your Raft cluster should have a stable leader. Frequent elections or leadership changes likely indicate network issues between the Raft nodes, or that the Raft servers themselves are unable to keep up with the load.

What to look for: For a healthy cluster, you're looking for a lastContact lower than 200ms, leader > 0 and candidate == 0. Deviations from this might indicate flapping leadership.
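
The rule of thumb above can be sketched as a small check. This is an illustrative Python helper, not part of Vault: the function name, its parameters, and the idea of feeding it one `lastContact` value per check are assumptions (the text does not say whether the mean or the maximum of the summary should be compared against 200 ms).

```python
def leadership_warnings(last_contact_ms, leader, candidate,
                        max_last_contact_ms=200.0):
    """Return a list of warnings for values that deviate from the healthy pattern.

    last_contact_ms: a vault.raft.leader.lastContact value, in milliseconds
    leader:          the vault.raft.state.leader count
    candidate:       the vault.raft.state.candidate count
    """
    warnings = []
    if last_contact_ms >= max_last_contact_ms:
        warnings.append(
            "lastContact is %.1f ms, expected below %.1f ms"
            % (last_contact_ms, max_last_contact_ms)
        )
    if leader <= 0:
        warnings.append("leader is %d, expected > 0" % leader)
    if candidate != 0:
        warnings.append("candidate is %d, expected 0" % candidate)
    return warnings


# A healthy reading produces no warnings; a flapping one produces several.
print(leadership_warnings(35.0, leader=1, candidate=0))
print(leadership_warnings(250.0, leader=0, candidate=2))
```

An empty list means all three conditions hold. Any entries are a prompt to look at network health between the Raft nodes and at server load.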

»Integrated Raft Storage Automated Snapshots

These metrics relate to the Enterprise feature Raft Automated Snapshots.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.autosnapshots.total.snapshot.size | For storage_type=local, space on disk used by saved snapshots | bytes | gauge |
| vault.autosnapshots.percent.maxspace.used | For storage_type=local, percent used of maximum allocated space | percentage | gauge |
| vault.autosnapshots.save.errors | Increments whenever an error occurs trying to save a snapshot | n/a | counter |
| vault.autosnapshots.save.duration | Measures the time taken saving a snapshot | ms | summary |
| vault.autosnapshots.last.success.time | Epoch time (seconds since 1970/01/01) of last successful snapshot save | n/a | gauge |
| vault.autosnapshots.snapshot.size | Measures the size in bytes of snapshots | bytes | summary |
| vault.autosnapshots.rotate.duration | Measures the time taken to rotate (i.e. delete) old snapshots to satisfy configured retention | ms | summary |
| vault.autosnapshots.snapshots.in.storage | Number of snapshots in storage | n/a | gauge |

»Metric Labels

| Label | Description | Example |
| --- | --- | --- |
| auth_method | Auth method type. | userpass |
| cluster | The cluster name from which the metric originated; set in the configuration file, or automatically generated when a cluster is created | vault-cluster-d54ad07 |
| creation_ttl | Time-to-live value assigned to a token or lease at creation. This value is rounded up to the next-highest bucket; the available buckets are 1m, 10m, 20m, 1h, 2h, 1d, 2d, 7d, and 30d. Any longer TTL is assigned the value +Inf. | 7d |
| mount_point | Path at which an auth method or secret engine is mounted. | auth/userpass/ |
| namespace | A namespace path, or root for the root namespace | ns1 |
| policy | A single named policy | default |
| secret_engine | The secret engine type. | aws |
| token_type | Identifies whether the token is a batch token or a service token. | service |
| peer_id | Unique identifier of a peer. | node-1 |
| snapshot_config_name | For automated snapshots, the name of the configuration | config1 |
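
The rounding described for the creation_ttl label can be illustrated with a short Python sketch. This is not Vault's implementation: the function name is hypothetical, and treating a TTL that exactly equals a bucket boundary as belonging to that bucket is an assumption, since the description only says the value is rounded up.

```python
# Bucket labels and their upper bounds in seconds, from the creation_ttl
# description: 1m, 10m, 20m, 1h, 2h, 1d, 2d, 7d, and 30d.
TTL_BUCKETS = [
    ("1m", 60),
    ("10m", 10 * 60),
    ("20m", 20 * 60),
    ("1h", 60 * 60),
    ("2h", 2 * 60 * 60),
    ("1d", 24 * 60 * 60),
    ("2d", 2 * 24 * 60 * 60),
    ("7d", 7 * 24 * 60 * 60),
    ("30d", 30 * 24 * 60 * 60),
]


def creation_ttl_label(ttl_seconds):
    """Return the label of the smallest bucket that can hold ttl_seconds."""
    for label, upper_bound in TTL_BUCKETS:
        # Assumption: a TTL equal to the boundary falls into that bucket.
        if ttl_seconds <= upper_bound:
            return label
    # Anything longer than 30 days overflows into +Inf.
    return "+Inf"


print(creation_ttl_label(45))              # a 45-second TTL
print(creation_ttl_label(3 * 24 * 3600))   # a 3-day TTL
print(creation_ttl_label(31 * 24 * 3600))  # a 31-day TTL
```

Under these assumptions a 3-day TTL is reported with `creation_ttl=7d`, so a lease count in a given bucket covers every TTL above the previous bucket's bound and up to its own.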