» Terraform Enterprise Active/Active

When your organization requires increased reliability or performance from Terraform Enterprise that your current single application instance cannot provide, it is time to scale to the Active/Active architecture. Please refer to Terraform Enterprise Active/Active for more information on moving to Active/Active.

» Active/Active Admin Commands

The Active/Active operational mode disables use of the Replicated Admin Console. To accommodate this, there are now admin commands to facilitate configuration changes, safe application stops, and producing support bundles. There is a new container in the TFE architecture to support this called "tfe-admin". You must login to a node in the Active/Active cluster using SSH to run these commands from the command line.

» Commands

Note that tfe-admin is an alias for replicated admin, and can be used interchangeably.

» support-bundle

tfe-admin support-bundle

This command generates a support bundle for all nodes. The support bundles will be uploaded to the same object store bucket that is used to store Terraform state files. The support bundles for a specific run of the admin command will all be uploaded to a directory with the same JobID (a timestamp in RFC3339 format). If you are sending a support bundle to HashiCorp Support, package and send all associated bundles to ensure that we have all the necessary information.

Example upload structure

support-bundles
└── 2020-11-10T02:03:05Z
    ├── 10.0.0.5
    │   └── replicated-support702524260.tar.gz
    └── 10.0.0.6
        └── replicated-support577188727.tar.gz

» node-drain

tfe-admin node-drain

This command will quiesce the current node and remove it from service. It will allow current work to complete and safely stop the node from picking up any new jobs from the Redis queue, allowing the application to be safely stopped. Currently, it only affects localhost (it does not support running on one node to drain other nodes). Currently, there is no reverse drain command - a restart is needed to restore the node.

» app-config

tfe-admin app-config -k <KEY> -v <VALUE>

This command allows you to use the CLI to make real-time application changes, such as capacity_concurrency. You must provide both an allowable <KEY> (setting name) and <VALUE> (new setting value). Run replicatedctl app-config export for a complete list of the current app-config settings.

For the configuration changes to take effect, you must restart the Terraform Enterprise application on each node instance. To restart Terraform Enterprise: 1. Run replicatedctl app stop to stop the application. 2. Run replicatedctl app status to confirm the application is stopped. 3. Run replicatedctl app start to start the application.

# shortcut: tfe-app-config <KEY> <VALUE>
tfe-app-config ()
{
        tfe-admin app-config -k "$1" -v "$2"
}

» list-nodes

tfe-admin list-nodes

This command lists the IP addresses of all active nodes in the active/active installation. Nodes send a heartbeat every 5 seconds to signal that they are active. If Terraform Enterprise does not receive a heartbeat from a node within 30 seconds, it considers the node inactive and removes the node from the list.

» Other Supporting Commands

There are additional commands available for checking status and troubleshooting directly on nodes. You can use them to confirm successful installation or to check on the status of a running node as part of troubleshooting activities. Also, there are additional command aliases available that allow you to run more abbreviated versions of commands like just support-bundle. Run an alias command with no parameters to see the list of available command aliases.

» Commands

» health-check

tfe-admin health-check
(alias health-check)

This command tests and reports on the status of the major TFE services. Each will be listed as PASS or FAIL. If any are marked as FAIL, your TFE implementation is NOT healthy and additional action must be taken.

» replicated status

replicatedctl system status

Displays status info on the Replicated sub-system. Key values to note are that status values return as "ready". This reports on the status of the system on the node instance that it is run on.

» tfe application status

replicatedctl app status

Displays status info on the TFE application. Key values to note are that State and DesiredState are both "started" and IsTransitioning is false. This reports on the status of the application on the node instance that it is run on.

» Upgrading TFE or Patching TFE Node Instances

The mechanism used to upgrade the TFE node instances is to fully repave the instances (destroy and rebuild entirely). This is another reason why using automation to build the instances is important. Currently, the safest way to perform and upgrade is to shut down all node instances, rebuild one node to validate a successful upgrade, and then scale to additional nodes (currently max 2).

These are the steps required to repave the node instances:

  • Run the node-drain command as described previously on each node to complete active work and stop new work from being processed.
  • Update the instance build configuration such as setting a new ReleaseSequence to upgrade versions and/or make any other alterations such as patching the base image used for building the instances.
  • Follow the instructions in Terraform Enterprise Active/Active to scale down to zero nodes and proceed through scaling up to one node, validating success, and then scaling additional nodes.

If planned and orchestrated efficiently, the total downtime for the repaving will be the amount of time it has taken to build one node as processing will resume as soon as the first node is functional.