Deploying a Cosmos validator

May 25, 2023

Purpose of this Guide

Running a validator can be attractive for various reasons: participating in the rise of cryptocurrencies, generating some extra income, or just playing with a new technology.

However, this activity requires an array of skills and entails a number of risks that should be mitigated by applying some key principles. Since blockchains operate in real time, it is paramount to ensure a steady quality of service so as to avoid downtime and slashing: there is money involved, and often not only the node operator’s.

The guide below aims to teach how to implement these principles in order to achieve the desired performance and reliability, without diving into too many technical details.

It is strongly advised to start by running a node on a testnet: as there is no actual money involved, it is appropriate for experimenting, learning and possibly crashing everything without any actual consequences.

Pay extra attention to the items in bold and to the warning signs ⚠.

If you have questions or some instructions are unclear, please feel free to join our Discord — we’ll be happy to assist.

1. Hardware Requirements

A node is, at a high level, a local copy of the blockchain that needs to be synchronized with the other nodes worldwide.

A validator node participates in the chain consensus, i.e. in the calculation of new blocks. As such, it cannot fall behind, otherwise it would miss blocks and eventually be excluded: that’s a jail event, which usually comes with slashing, in which a small portion of its delegated funds is burned.

To avoid this and achieve appropriate performance, a node should run on a server that meets some key specifications:

CPU: typically 4 cores at least, sometimes 8 or more.
Memory: 8GB is the usual minimum, sometimes 16 or more are required.
Disk: a node writes and reads heavily, so the disk should be fast and large enough. NVMe storage is highly recommended (regular SSDs are often too slow), with a capacity ranging from 200GB to several TB. This depends on the purpose of the node and the pruning options (i.e. whether the node removes old blocks or, on the contrary, retains everything).
Whatever the selected option, the disk usage will gradually increase.
Bandwidth: by nature, blockchains make heavy use of the internet, and a fast connection is necessary (download AND upload). Typically a fiber-optic connection is required: you can’t run a node on ADSL, for example, or on a high-latency connection.

⚠ Check the required specs in the documentation, Discord or Telegram.
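
As a quick sanity check before installing anything, the server can be compared against the minimums listed above. The sketch below is only an illustration: the thresholds (4 cores, 8GB, 200GB) are the low end of the ranges mentioned in this section, the function names are hypothetical, and it assumes a Linux machine where nproc, /proc/meminfo and df are available. Always prefer the figures from the chain’s documentation.

```shell
#!/bin/sh
# Hypothetical helper: compare this server with the minimum specs above.
# The thresholds are assumptions taken from the low end of the guide's ranges.
MIN_CORES=4
MIN_MEM_GB=8
MIN_DISK_GB=200

# meets_min ACTUAL MINIMUM -> succeeds when ACTUAL >= MINIMUM
meets_min() {
  [ "$1" -ge "$2" ] 2>/dev/null
}

# report LABEL ACTUAL MINIMUM -> prints one status line
report() {
  if meets_min "$2" "$3"; then
    echo "OK   $1: $2 (minimum $3)"
  else
    echo "LOW  $1: $2 (minimum $3)"
  fi
}

cores="$(nproc 2>/dev/null)"
# MemTotal is in kB; round up so that an "8GB" machine reporting 7.7 GiB passes.
mem_gb="$(awk '/^MemTotal:/ { printf "%d", ($2 + 1048575) / 1048576 }' /proc/meminfo 2>/dev/null)"
# Free space (in GB) on the filesystem holding the home directory.
disk_gb="$(df -Pk "${HOME:-/}" | awk 'NR == 2 { printf "%d", $4 / 1048576 }')"

report "CPU cores" "$cores" "$MIN_CORES"
report "Memory (GB)" "$mem_gb" "$MIN_MEM_GB"
report "Free disk (GB)" "$disk_gb" "$MIN_DISK_GB"
```

A LOW line does not necessarily rule the server out, but it is a hint to check the chain’s requirements before going further.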

The fastest option is to rent a cloud VM (there are many providers to choose from), which is usually scalable and very quick to deploy. Dedicated servers are another option, or even a home server (which is actually the best solution from the decentralization perspective, with the caveat that reliability may be lower due to internet or power outages with no backup).

2. Software

Nodes typically run on Linux, and the below instructions are specifically meant for Debian-based systems like Ubuntu. The commands should be adjusted for other distributions.

2.1 Logging into the server — Option 1 ( ⚠ not ideal)

A preferred option will be described in Section 3, after having created a new user with limited privileges.

It is highly recommended to use an SSH key to authenticate instead of a password that could be brute-forced.

This section explains how to achieve this.

The below instructions are extremely simplified. For more details you can check this page, or this one if you use Windows and PuTTY.

On the computer that you will use to log into the node server (assuming it runs Linux), type the following:

$ ssh-keygen -t ed25519

Hit Enter on the next prompts — set up your email, passphrase, etc. or not as you see fit.

Then copy the public key to the server, for example with ssh-copy-id root@server_ip, replacing server_ip with the actual IP.

This will add the SSH public key to the /root/.ssh/authorized_keys file on the node server, allowing you to log in without having to type a password (if you specified a passphrase, you will need to enter it however).

Verify that everything works by logging in as root, e.g. with ssh root@server_ip.

No password should be requested, meaning that the SSH key authentication works.

If so, we’ll then disable the password login for user root on this server.
⚠⚠ after the below command you will not be able to log into the server as root anymore.

⚠ One important detail: if you lose or overwrite your private key (id_ed25519, the file without the .pub extension), you’ll be locked out of the server. And if someone else obtains it and knows the server’s IP, they’ll have full access to said server.

You can also disable password authentication altogether, so that any password-based login attempt is rejected.
As root, edit /etc/ssh/sshd_config and set the line PasswordAuthentication to “no”.

2.2 Installing the required packages

The node needs some specific software. As root, run:

Most distributions ship with all the other important packages preinstalled. If some are actually missing, the errors are usually quite explicit so you’ll just need to install them.

It also requires Go.
⚠ The required Go version depends on the chain; check the chain’s documentation to select the correct one.

See here how to download and install.

3. Creating a User

You should not run the node as root. There are multiple reasons for that, but in short: don’t.
Instead, let us create a user with limited privileges that will run this node.
Let’s call this user validator.


Set a password, preferably a strong one. This is the password you will need to type when requesting elevated privileges later (i.e., to become root).

The above allows the validator user to access the go package that we installed above, necessary to install and run the node.

The cd command, executed right after switching to the user, takes you directly to the user's home directory.

You will now be the validator user, in its home at /home/validator/, and can proceed with the install. The shell prompt should look like this:

When logged in with this restricted user, you cannot install or uninstall software through apt or dpkg, modify system files, or control services.
Hence whatever you do, you shouldn’t be able to damage the operating system too much.

3.1 Logging into the server — Option 2 (preferred)

If you haven’t selected the Option 1 defined in step 2.1, and now that a non-root user has been created, you can use it to log into the server, and from there acquire the elevated privileges required to manage it.
The first steps are identical:

Try to log in through this user from another terminal with:

You should be logged in as validator, without a password being asked.
Next, we need to allow this user to acquire root privileges, as root:

Now switch to the root user with:

And type the user password you defined earlier.

Once you are root, you can proceed with securing the server access by preventing the root user from logging in: if anyone tries to brute-force the root password, they will never succeed, as the server will simply not allow this user to connect remotely.

Edit the line “PermitRootLogin” — uncomment it by removing the initial # and switch the value to “no”:

You can also disable the password authentication with:

Save and exit the file, then restart the ssh service:

Now if you attempt to log into the server as root, you will remain locked out.
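
The two sshd_config changes described above can also be scripted rather than edited by hand. The following is a minimal sketch, not the exact commands of this guide: set_sshd_option is a hypothetical helper that rewrites (or appends) one option in a given file and keeps a .bak copy. It assumes a sed that accepts -i.bak and -E, which is the case for GNU sed on Debian and Ubuntu.

```shell
#!/bin/sh
# set_sshd_option FILE KEY VALUE
# Hypothetical helper: sets "KEY VALUE" in FILE, replacing an existing
# (possibly commented-out) line, or appending the option if it is absent.
set_sshd_option() {
  file="$1"; key="$2"; value="$3"
  if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
    sed -i.bak -E "s/^[#[:space:]]*${key}[[:space:]].*/${key} ${value}/" "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

# Typical use, as root (left commented out on purpose):
#   set_sshd_option /etc/ssh/sshd_config PermitRootLogin no
#   set_sshd_option /etc/ssh/sshd_config PasswordAuthentication no
#   sshd -t && systemctl restart ssh   # validate the file, then restart
```

Keep your current session open while testing a new login from another terminal. Note that some images ship drop-in files under /etc/ssh/sshd_config.d/ that may override these values.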

4. Installing and Initializing the Node

We’ll take the example of a Cosmos node.

4.1.1 Installation — Basic Method

This is the simple method. The next section shows how to use Cosmovisor, which is highly recommended.

Let’s start with cloning the chain Git repository, from which we will build and install the node. The appropriate repository is to be found in the chain documentation (which should be read).


v23.3.0 is the current Cosmos version at the time of this writing, but chain upgrades are rather frequent. You need to verify the actual version in use.
The appropriate version should of course be selected for the concerned network.

Once this command has completed, the node will be installed.

⚠ If the above command terminated with an error, carefully read the output as it explains what went wrong and gives you hints to resolve the issue.
If it is related to a missing package, you need to exit this user and return to being root by typing exit or hitting Ctrl+d. You can then install the required package with apt install <package_name>.
Once done, type su validator and cd gaia, then attempt the installation again.

4.1.2 Installation with Cosmovisor — Recommended Approach

Cosmovisor is a tool developed by the Cosmos SDK team that allows nodes to be upgraded in a seamless and nearly completely automated fashion.
As it is important to upgrade the node on time to avoid being jailed, it is highly recommended to use Cosmovisor.

The full documentation is here for reference.

Let’s start with installing it:

⚠ Sometimes a specific version is required. Check the chain’s documentation and install the recommended version if there is one.

At the time of writing, the latest version is 1.7.1, which works reliably.

Now we’ll create the relevant directories:


And copy the current binary to that first directory:

For now, that’s it. Let’s continue with the setup.
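
For reference, the directory layout that Cosmovisor expects can be prepared along these lines. This is a sketch under assumptions: prepare_cosmovisor is a hypothetical helper, and the paths in the usage comment (~/.gaia and ~/go/bin/gaiad) are the ones used elsewhere in this guide and must be adapted to your chain.

```shell
#!/bin/sh
# prepare_cosmovisor DAEMON_HOME BINARY
# Hypothetical helper: creates the genesis and upgrades folders under
# DAEMON_HOME/cosmovisor and copies the current binary into genesis/bin.
prepare_cosmovisor() {
  daemon_home="$1"; binary="$2"
  mkdir -p "$daemon_home/cosmovisor/genesis/bin" \
           "$daemon_home/cosmovisor/upgrades" || return 1
  cp "$binary" "$daemon_home/cosmovisor/genesis/bin/"
}

# Typical use, as the validator user (left commented out on purpose):
#   prepare_cosmovisor "$HOME/.gaia" "$HOME/go/bin/gaiad"
```

Future upgrade binaries then go under cosmovisor/upgrades/<upgrade_name>/bin, as shown in the upgrade section further down.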

4.2 Initialization

Let’s init the node:

The binary, moniker and chain id are of course to be adjusted accordingly.

This will create the following under the validator home:


priv_validator_key.json is the most important file here. It identifies your validator node and should be safely copied and stored elsewhere.
⚠⚠ If you lose it, you lose your validator. There is no coming back from this.
⚠⚠⚠ Lastly, never, ever run two nodes with the same priv_validator_key.json simultaneously. This will cause you to double-sign and be tombstoned (= permanently jailed), in addition to being slashed, typically by 5% of your total stake. Delegators will NOT be happy.

4.3 Creating the validator key

This essentially creates a wallet that will be linked to the validator and used for all operations: voting, withdrawing the rewards, etc.
Let’s call this key validator_key.

You will be prompted to set a password (that will be required each time the key is accessed through a vote, withdrawal, etc.), and the output will display the seed phrase (mnemonic) for this wallet.
This seed phrase will never appear again so make sure you save it somewhere safe.

Note: if you are running a testnet node, you can skip the password step by specifying --keyring-backend test instead. That way, no password will be asked for any operation.
If you select that, you will need to append this flag to each subsequent command, otherwise they will fail with “key not found”.
⚠ Do not select this option on a mainnet node.

New files will appear, e.g.:


The add command displayed the wallet address along with the mnemonic, but you can also retrieve it by typing:

And to obtain your validator address, the “valoper”, type:

4.4 Configuring and Starting the Node

Before moving on to the creation of the validator, the node needs to be started and synchronized.
In most cases, the chain will have started long ago and catching up from block 1 would take ages.
A couple of mechanisms exist to bootstrap a node from a recent height, allowing it to quickly get in sync.
But first, some configuration is required so that the node can connect to the rest of the chain.
With your favorite editor, e.g. nano or vim, open ~/.gaia/config/config.toml and update the following items (we’ll assume here that you will use the default peer-to-peer and RPC ports 26656 and 26657):

In the [p2p] section:
external_address = "your_actual_ip:26656" → this allows other nodes to connect to yours. You can change the default ports (26656 for the p2p, 26657 for the rpc), but if so, make sure to change them everywhere in that file.
seeds = "…" → find one or several seeds (in the documentation or the chain’s Discord / Telegram) and add them here, comma-separated. Otherwise you can find persistent_peers and list them in the line that follows, but seeds are preferable.

You can also update the below limits so that the node can connect to more peers:
max_num_inbound_peers = 40
max_num_outbound_peers = 40

Note: you can adjust these values as you see fit. To ensure that the node remains synchronized with the chain, enough outbound peers are required. However, the more peers, the higher the resource usage, in particular bandwidth.
You can run tests to find the right values.

In the [tx_index] section:
If the node is purely a validator, and except for special cases like Terra Classic, the index is not required. Disabling it helps reduce the disk usage.
indexer = "null"

Now edit ~/.gaia/config/app.toml:

At the top of the file:
Usually it is required to specify something here:
minimum-gas-prices = "…" → check the documentation, Discord or Telegram for the correct value.
pruning = "default" → you can customize this, including setting "everything", but unless you know what you are doing it is best left as it is.

In the [grpc] section:
You probably won’t have much use for this.
enable = false

In the [state-sync] section:
If not the case already:
snapshot-interval = 0
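
The edits to config.toml and app.toml listed above can also be applied from the command line. The sketch below is an illustration only: set_toml is a hypothetical helper built on awk, and it takes the section name into account because some keys (such as enable) appear in several sections. The values in the usage comment are the ones from this guide, with your_actual_ip left as a placeholder.

```shell
#!/bin/sh
# set_toml FILE SECTION KEY VALUE
# Hypothetical helper: rewrites "KEY = ..." inside [SECTION]. Use "" as
# SECTION for keys at the top of the file, before any section header.
# VALUE is written verbatim, so string values must include their quotes.
set_toml() {
  file="$1"; tmp="$(mktemp)"
  awk -v section="$2" -v key="$3" -v value="$4" '
    /^\[/ { current = $0; sub(/^\[/, "", current); sub(/\].*$/, "", current) }
    current == section && $0 ~ ("^" key "[ \t]*=") { print key " = " value; next }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Typical use (left commented out on purpose):
#   CONFIG="$HOME/.gaia/config/config.toml"
#   APP="$HOME/.gaia/config/app.toml"
#   set_toml "$CONFIG" p2p external_address '"your_actual_ip:26656"'
#   set_toml "$CONFIG" p2p max_num_inbound_peers 40
#   set_toml "$CONFIG" p2p max_num_outbound_peers 40
#   set_toml "$CONFIG" tx_index indexer '"null"'
#   set_toml "$APP" grpc enable false
#   set_toml "$APP" state-sync snapshot-interval 0
```

The helper only rewrites keys that already exist, which is normally the case for files generated by the init command. Review the result with a diff before starting the node.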

Lastly, we’ll create a systemd service to control the node:

As root:

4.4.1 NOT USING Cosmovisor:

Paste the following:

[Unit]
Description=Cosmos node
After=network.target

[Service]
Type=simple
Restart=on-failure
RestartSec=5
User=validator
ExecStart=/home/validator/go/bin/gaiad start
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

4.4.2 USING Cosmovisor — which is, again, recommended:

[Unit] 
Description=Cosmos node 
After=network.target  

[Service] 
Type=simple 
Restart=on-failure 
RestartSec=5 
User=validator 
ExecStart=/home/validator/go/bin/cosmovisor run start
LimitNOFILE=65535
Environment="DAEMON_NAME=gaiad"
Environment="DAEMON_HOME=/home/validator/.gaia"
Environment="DAEMON_ALLOW_DOWNLOAD_BINARIES=false"
Environment="DAEMON_RESTART_AFTER_UPGRADE=true"
Environment="UNSAFE_SKIP_BACKUP=true"

[Install]
WantedBy=multi-user.target

The Environment items tell Cosmovisor how to behave. They are all self-explanatory except maybe these two:

DAEMON_ALLOW_DOWNLOAD_BINARIES=false

This means that Cosmovisor won’t be allowed to download new binaries automatically. This is out of caution: if an issue occurs when downloading the binary, Cosmovisor would restart anyway. In addition, if the chain repository were somehow compromised and a malicious binary pushed, this prevents the node from automatically running it.
While not a likely situation, it is best to keep it set to “false”.

UNSAFE_SKIP_BACKUP=true

By default, Cosmovisor makes a full backup of the data folder when an upgrade takes place: if the upgrade fails and the data becomes corrupted, the backup can be restored and the process attempted again.
However, this can take a long time and use up a lot of disk space.
This flag disables the backup.

Of course update the user and binary path accordingly.

Then save and exit the file.

4.4.3 Upgrading the Binary

When a new binary becomes available, it needs to be installed and the node restarted at the target height — not before!

For an upgrade to binary version v24.0.0 (just an example. Always replace the user and binary path accordingly):

  • Without Cosmovisor:


When the upgrade height is reached, your node will stop and the log will display an error like panic: UPGRADE "v24" NEEDED at height: …
At this point — and not before — you should stop your node and copy over the binary you built to the production directory:

$ mv ~/gaia/build/gaiad ~/go/bin/

Alternatively, and if you do not want to build the new binary in advance, you can run the commands above and finish with make install instead of make build — this automatically places the new binary in the production directory.

Then restart the node. Blocks will be produced again once over 66% of the voting power has upgraded.

  • With Cosmovisor:

    If you set the flag "DAEMON_ALLOW_DOWNLOAD_BINARIES=true" in your service file, in most cases you won't have to do anything: Cosmovisor will download the new version when the upgrade height is reached and restart the node.


    Otherwise, you'll need to prepare the switch. First off, you need to find out the upgrade name, which appears in the governance proposal. But more simply, you can run:

    $ gaiad query upgrade plan


    Which will return something like this:


    plan:
      height: "26050900"
      info: '{"binaries": {"darwin/amd64": "https://github.com/cosmos/gaia/releases/download/v24.0.0/gaiad-v24.0.0-darwin-amd64?checksum=sha256:baaa3580defa20afbb7fc982ca38559930ac178032e0d517805d5c2faf103d61",
        "darwin/arm64": "https://github.com/cosmos/gaia/releases/download/v24.0.0/gaiad-vv24.0.0-darwin-arm64?checksum=sha256:f63214251b9f814e17977aa2b34b5e8119e3ad5d0b02936e3bca3be56d5b0cd9",
        "linux/amd64": "https://github.com/cosmos/gaia/releases/download/v24.0.0/gaiad-v24.0.0-linux-amd64?checksum=sha256:7c61b1d29b68749cf0d6ef5170b264a6050f42815e86f2ffdc154a4c04661047"}}'
      name: v24
      time: "0001-01-01T00:00:00Z"


    The name is what you need here.
    You can now either build the new binary, or download a prebuilt version from the links above (in most cases you need the linux/amd64 version).


    $ cd ~/gaia
    $ git fetch
    $ git checkout v24.0.0
    $ make install


    Then create the Cosmovisor upgrade folder using the upgrade name and copy over the new binary into it:

    $ mkdir -p ~/.gaia/cosmovisor/upgrades/v24/bin
    $ cp ~/go/bin/gaiad ~/.gaia/cosmovisor/upgrades/v24/bin
    
    

And that’s it. When the height is reached, Cosmovisor will automatically swap the binary and restart.

This is particularly useful when the upgrade time isn’t convenient, e.g. occurring in the middle of the night — although it’s always best to be around in case something goes wrong.

4.5 Firewall

For the node to work, it must obviously be able to communicate with the others. However, it is important to implement minimal security on the server by opening only the relevant ports.

In our case, only the peer-to-peer port, default 26656, should be open to incoming connections. The RPC port 26657 is for your own use and should not be accessible to the rest of the world.

Managing the firewall depends on the cloud provider — on Digital Ocean for example, you can configure it directly from the user interface. Basically and for the moment, you should only allow inbound connections to ports 22 (SSH) and 26656.

If you don’t have a user interface to manage the firewall, you can configure it with ufw:
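
A minimal ufw sketch, assuming the default SSH port 22 and p2p port 26656 mentioned above. The run wrapper is a hypothetical convenience: by default it only prints the commands, so that the rules can be reviewed first. Run the script as root with DRY_RUN=0 to actually apply them.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
SSH_PORT=22      # assumption: default SSH port
P2P_PORT=26656   # assumption: default p2p port

# run CMD ARGS... -> prints the command in dry-run mode, executes it otherwise
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

firewall_rules() {
  run ufw default deny incoming
  run ufw default allow outgoing
  run ufw allow "${SSH_PORT}/tcp"
  run ufw allow "${P2P_PORT}/tcp"
  run ufw enable
}

firewall_rules
```

The RPC port 26657 is deliberately left closed. If you changed the SSH or p2p port, adjust the two variables before applying the rules, otherwise you may lock yourself out.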

Without exiting the current session, open a new terminal and ssh into the server. This is to verify that the firewall isn’t misconfigured and lets you connect to the machine.

Now we can prepare the node to allow it to start and quickly synchronize with the rest of the chain.

4.6 Running the Node

4.6.1 Using a Quicksync Snapshot

This is an archive containing the data of a node up to a certain height, which can be downloaded and extracted in ~/.gaia/, replacing the current data folder.
Many teams provide such snapshots, including us at https://tools.highstakes.ch/snapshots.
Follow the instructions to deploy it, paying attention to the possible presence of a wasm folder.

There are multiple sizes available, depending on the start and end blocks in the snapshots and their type (full, archive, pruned, containing the index or not).
Once deployed, the node will start catching up from the last block of the snapshot, which should ideally be recent.

4.6.2 Using Statesync

This is a little more complicated. You need to find an RPC node and specify some details in config.toml.
In short, these RPCs are nodes that are configured to take regular snapshots of their data (not the same kind as above) and serve them to the network.

  • The RPC: basically an IP and a port. There are plenty, or very few, depending on the network. You can ask in Discord or Telegram for the information.

  • Trusted height and hash: once this RPC is obtained, you can query it to get the current height:

  • With this current height, for example 15427931, you can get the trusted height and its trusted hash — basically, the oldest block from which your node will accept to restore a snapshot from the RPC.
    Typically we take a height that is 2000 to 5000 blocks below the current one, in this case say 15426000. Now we can get its hash:

  • This returns something like “D56FF01D27CD36D24EA97F069BCA5F81E9EE7AA8401B109CDD990E6EB7B5C232”

  • Now onto editing .gaia/config/config.toml, in the [statesync] section:


  • ⚠ You need to include “http://” (or “https://” if relevant) and the port. You must also specify 2 RPC nodes, but it can be the same one twice.


  • It is also recommended to add the RPC peer to the persistent_peers item in config.toml. This peer is in the form “151192fd3ac51c0a9b1fae3a64ae7cbd3df161c@rpc_IP:rpc_P2P_PORT”.
    While there is no real rule here, without it the node can take a very long time to discover an available snapshot, or even fail to do so.

  • Save and exit the file, then start the node as root with:

  • You can monitor its activity with:

  • which should output a lot of text. If everything works fine you will see something like:

Followed by:

And once the node has started syncing:

The process can end in error, for example:

  • Something saying “Tree must be empty” → You may have attempted this earlier. Remove everything in .gaia/data/ except priv_validator_state.json.

→ restart the statesync configuration, selecting an earlier trusted height.

  • You can check the status of the node by running:

  • Which should return something like the following:

If “catching_up” is “true”, you should wait some more.
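
The statesync preparation described in this section (current height, trusted height, trusted hash, then the [statesync] settings) can be sketched as follows. This is an illustration under assumptions: the RPC address is a placeholder to be replaced, the helper names are hypothetical, the 2000-block offset is the low end of the range suggested above, and curl and jq must be installed. The /block endpoint and the enable, rpc_servers, trust_height and trust_hash keys are those of the standard Tendermint/CometBFT RPC and config.toml.

```shell
#!/bin/sh
# Placeholder endpoint: replace with a real RPC (scheme and port included).
RPC="${RPC:-http://rpc_IP:rpc_PORT}"
OFFSET=2000   # assumption: how far below the current height to trust

# trust_height LATEST OFFSET -> height to use as trusted height
trust_height() {
  echo $(( $1 - $2 ))
}

# latest_height -> current height according to the RPC
latest_height() {
  curl -s "$RPC/block" | jq -r '.result.block.header.height'
}

# block_hash HEIGHT -> hash of the block at HEIGHT
block_hash() {
  curl -s "$RPC/block?height=$1" | jq -r '.result.block_id.hash'
}

# statesync_settings RPC HEIGHT HASH -> lines for the [statesync] section
statesync_settings() {
  cat <<EOF
enable = true
rpc_servers = "$1,$1"
trust_height = $2
trust_hash = "$3"
EOF
}

# Typical use (left commented out on purpose, it needs a reachable RPC):
#   latest="$(latest_height)"
#   height="$(trust_height "$latest" "$OFFSET")"
#   statesync_settings "$RPC" "$height" "$(block_hash "$height")"
```

The printed lines are meant to be copied into the [statesync] section of config.toml. The same RPC is listed twice, as allowed above; other keys of that section are left at their existing values.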

4.7 Verifying the Performance

Earlier we installed glances, which is a tool to display key information about the server usage and performance. You can use it to quickly check if the server runs normally or is undersized for the task at hand.

[Figure: glances output for a server running 2 (backup) nodes simultaneously.]

The iowait is an important item. In a very simplified way: when it’s high (steadily or often above 8% is a problem, give or take), it means that the disk has a hard time keeping up: there is data to be written (blocks, in our case), but the disk can’t write it fast enough, so it piles up.
This causes the node to fall behind and miss blocks, with a result that can look like this in Mintscan:

[Figure: missed blocks as displayed in Mintscan. A pretty extreme example.]

⚠ If the node is synchronized and you still have a high iowait, cpu or memory usage, it may mean that you should upgrade the server to better specs.
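
If glances is not at hand, the iowait share can also be estimated directly from /proc/stat. The sketch below is an assumption-laden illustration, not a replacement for proper monitoring: iowait_pct and sample_iowait are hypothetical helpers, and the calculation takes two samples of the aggregate cpu line and divides the increase of the iowait counter by the increase of all counters.

```shell
#!/bin/sh
# iowait_pct LINE1 LINE2
# LINE1 and LINE2 are two "cpu ..." lines from /proc/stat taken some seconds
# apart. Fields 2 to 9 are user, nice, system, idle, iowait, irq, softirq
# and steal; iowait is field 6.
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { t = 0; for (i = 2; i <= 9; i++) t += $i; total[NR] = t; iow[NR] = $6 }
    END {
      d = total[2] - total[1]
      if (d > 0) printf "%.1f\n", 100 * (iow[2] - iow[1]) / d
      else print "0.0"
    }'
}

# sample_iowait [SECONDS] -> iowait percentage over the interval (default 5s)
sample_iowait() {
  first="$(head -n 1 /proc/stat)"
  sleep "${1:-5}"
  second="$(head -n 1 /proc/stat)"
  iowait_pct "$first" "$second"
}

# Typical use (left commented out on purpose):
#   sample_iowait 10
```

A value that stays above the rough 8% mark mentioned earlier over several samples points to a disk that cannot keep up.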

4.8 Tweaking the node parameters!

A common issue with a PoS node is disk usage, which can quickly become a problem and reach tens or even hundreds of GB.
There are some options to drastically limit the growth of the data folder, albeit with some trade-offs.

In config.toml:

discard_abci_responses = true

This has no particular downside.

indexer = "null"

A validator node doesn't usually require it except in a few occasions (e.g. Terra Classic for its oracle)

in app.toml:

pruning = "custom"

pruning-keep-recent = "10" 
pruning-interval = "13"

min-retain-blocks = 10

The pruning options define how the node will remove older entries from its database instead of keeping a very large history.
The above instructs it to keep only the 10 latest blocks, and to run a sweep every 13 blocks.
The interval can be changed: it is recommended to use a prime number, and pruning should not occur too often (it takes resources), but not at too large an interval either, otherwise each run may be heavy and cause the node to miss blocks.
Somewhere between 10 and 30 is usually fine.

The last line tells the node to also prune the blockstore.

This combination of parameters will ensure that you have the smallest node possible — but it will be a validator only, not fit for much else. In addition, this is fairly resource-intensive (in particular with fast chains like Injective) and may adversely affect the uptime.
You need to monitor the performance and adjust these settings accordingly until you find the right compromise.

5. Registering as a Validator

Once the node is operational and synchronized, we can move on to the final step: becoming a validator.
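
Before going further, it is worth confirming that the node really is in sync. The sketch below is one possible way, assuming the default RPC port 26657 on localhost: is_synced is a hypothetical helper that reads the JSON returned by the /status endpoint on standard input and succeeds only when catching_up is false.

```shell
#!/bin/sh
# is_synced -> reads the /status JSON on stdin, succeeds if catching_up is false
is_synced() {
  grep -Eq '"catching_up"[[:space:]]*:[[:space:]]*false'
}

# wait_until_synced [SECONDS] -> polls the local RPC until the node is in sync
wait_until_synced() {
  until curl -s localhost:26657/status | is_synced; do
    echo "still catching up..."
    sleep "${1:-30}"
  done
  echo "node is synchronized"
}

# Typical use (left commented out on purpose, it needs a running node):
#   wait_until_synced 60
```

Creating the validator while the node is still catching up would make it miss blocks from the start, so it is best to wait for the confirmation.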

5.1 Funding the Wallet

You will need some tokens in the wallet you created earlier to pay for the transaction fees and create a validator. You can either use a faucet, or purchase tokens and transfer them to the wallet.
You don’t need many for the next steps, 2–5 should be enough.

5.2 Creating the Validator

Then you can create the validator, by passing the following command — adjust to your situation (binary, fees, etc.).
⚠ Some items are permanent, i.e. cannot be edited later (commission-max-rate, commission-max-change-rate, min-self-delegation, pubkey).

$ gaiad tx staking create-validator \
    --amount 1000000uatom \
    --chain-id=cosmoshub-4 \
    --pubkey=$(gaiad tendermint show-validator) \
    --moniker="Your custom moniker" \
    --website="https://your_website.com" \
    --details="A brief description of your team/project" \
    --identity your_keybase_identity \
    --security-contact "your_contact_email" \
    --commission-rate="0.05" \
    --commission-max-rate="0.20" \
    --commission-max-change-rate="0.01" \
    --min-self-delegation="1" \
    --from validator_key

Newer versions of the Cosmos chains use a different method for creating a validator, which requires creating a file with the same details as above:

cat <<EOF > validator.json
{
  "pubkey": $(gaiad tendermint show-validator),
  "amount": "1000000uatom",
  "moniker": "$YOUR_MONIKER_HERE",
  "identity": "<your keybase identity>",
  "website": "your_website.com",
  "security": "contact@yourdomain.com",
  "details": "a brief description of your team",
  "commission-rate": "0.1",
  "commission-max-rate": "0.2",
  "commission-max-change-rate": "0.01"
}
EOF

Followed by:

gaiad tx staking create-validator validator.json --from validator_key --gas auto --gas-adjustment 1.3 --gas-prices 0.005uatom -y

⚠ If you created the key with --keyring-backend test, then make sure to add this flag to the above command.

You can set the commission items to whatever you like: 0.05 means 5%, “commission-max-rate” is self-explanatory, and commission-max-change-rate="0.01" means that you can only increase the commission by 1 point every 24h.
This is reassuring for the stakers as it means that you can’t bump the commission instantly to the maximum you defined (which can absolutely be configured to 100%!).
The identity flag is optional, but this is usually how the explorers retrieve and display the team logo.
Website, details and security contact are also optional.
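
To make the effect of commission-max-change-rate concrete, the small sketch below computes how many daily increases are needed to move from one commission to another. days_to_reach is a hypothetical helper; it works in hundredths of a percent to avoid floating-point surprises and assumes one maximum increase per 24h, as described above.

```shell
#!/bin/sh
# days_to_reach CURRENT TARGET MAX_CHANGE
# Number of daily increases of at most MAX_CHANGE needed to go from CURRENT
# to TARGET (all three expressed as fractions, e.g. 0.05 for 5%).
days_to_reach() {
  awk -v c="$1" -v t="$2" -v m="$3" 'BEGIN {
    c = int(c * 10000 + 0.5); t = int(t * 10000 + 0.5); m = int(m * 10000 + 0.5)
    print int((t - c + m - 1) / m)
  }'
}

# From 5% to the 20% maximum, 1 point at a time:
days_to_reach 0.05 0.20 0.01
```

With the values used in this guide, the script prints 15: reaching the maximum rate takes at least 15 days, which gives delegators time to react.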

If the command succeeded, you can run $ curl localhost:26657/status | jq again: the “voting power” part should now indicate “1”.

5.3 Joining the Active Set

Your validator is now registered and operational, but you may not actually participate in the consensus yet, as your voting power is too low, and therefore you do not earn rewards.

In order to join the active set, you should either find delegators or stake your own funds so that you have more voting power than the current last validator in the set.

This is actually the hardest part. You need to provide a stable quality of service, have a competitive commission, and do some marketing and communication to make the community aware of your presence and start having holders trust you with their stake.
There are plenty of ways to try and stand out, but that is entirely up to you.

The quality of service is essential. If you miss blocks, then you accrue fewer rewards and so do your delegators; and if you miss too many, you end up being jailed and slashed.
Delegators may also redelegate to another validator.

⚠ This is why implementing a monitoring solution is the next step in this journey. Such a tool will constantly evaluate the performance of the node and alert you whenever a problem arises, giving you the opportunity to resolve it before it visibly affects your quality of service.

This will be the subject of an upcoming article.
