How To Safely Migrate Your Ethereum 2.0 Validator Client

This last installment will be about safely migrating my validator from one virtual service provider (AWS) to another (Digital Ocean). Learn how to prevent slashing.
by Coogan Brennan, February 27, 2021
This article is the last in a four-part series on how to run your own Eth2 validator. If you are new to this series, be sure to check out Part 1: Getting Started, Part 2: Setting Up Your Client and Part 3: Installing Metrics and Analyzing P&L.

Back in November 2020, I set up an Ethereum 2.0 validator client on Amazon Web Services (AWS). Since then, the Beacon chain has launched and I’ve written two additional pieces documenting the journey. This included registering for Infura as an Ethereum 1.0 endpoint, installing and setting up Teku as an Eth2 client, and analyzing my node’s metrics.

This last installment will be about safely migrating my validator from one virtual service provider (AWS) to another (Digital Ocean). Having the same validator keys on two different instances could result in the slashing and freezing of my staked ether, which would be not good.

Also, Proof of Stake blockchains present a unique trust issue to new clients syncing. We’ll discuss this and ways to solve it.

Here we go!

  1. Initializing new instance
  2. Syncing and Weak Subjectivity
  3. Slashing Prevention
  4. Migration

Initializing New Instance

In Part 1, I mentioned that I was considering using my 8GB RAM Raspberry Pi but didn’t want to have to worry about my internet connection, making sure the site power is still on, overheating and speed, or my dog kicking over my laptop when I’m away. After an 80-hour power outage in Texas last week, I’m really glad that I decided to go with a cloud-based service instead.

Some readers were dubious about my decision to spin up an AWS 16GB node for a single validator and those readers have been proven correct. Before genesis, it was difficult to definitively say much about the upper-end network load. And there are still concerns about the increased load brought on by the merge of Ethereum 1.0 and 2.0 scheduled for this year.

AWS is great for large fleets of validators. For my single validator, however, I’ve decided to migrate to Digital Ocean. (See Part 2 for the virtual / local hosting discussion I had).

Another reason for me bulking up the validator, perhaps unnecessarily, may lie in my concept of what a miner is in a major blockchain: huge server farms in mainland China or, in 2017, dozens of GPUs filling a friend’s garage with suffocating heat.

Matt Garnett helped correct this bias by reminding me about the original design premise for Ethereum 2.0: security of the network uncoupled from enormous amounts of computing power. He pointed out the “raspberry pi” computing unit benchmark proposed by Gavin Wood in 2015. Even in 2019, Justin Drake spoke publicly about his hope for Eth2 validators to run on the new Raspberry Pi Model 4.

Digital Ocean Droplet

Using Mara Schmiedt and Collin Meyers’ Validator Guide, I purchased the “Basic” Droplet (Digital Ocean’s terminology for an instance) with 8GB of RAM and 160GB of disk space. For current network conditions, it’s probably overkill on disk space, but the 4GB RAM Droplet is not powerful enough.

If you choose Ubuntu 20.04 as the operating system, you can follow Part 2’s setup (you can also follow Somer Esat’s excellent installation tutorials). You do not have to specify networking rules for P2P exchange because, as far as I can tell, Digital Ocean leaves those ports open by default. For that reason, it’s very important to set up SSH and disable root login.
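
As a rough way to confirm that root login really is disabled, here is a minimal sketch. It assumes the standard OpenSSH PermitRootLogin directive and the usual /etc/ssh/sshd_config location; the function name root_login_disabled is hypothetical, and files pulled in through an Include directive are not followed.

```shell
# Hypothetical helper: succeed only if the first uncommented PermitRootLogin
# line in the given sshd config file is set to "no".
# Assumption: only this one file is inspected; Include'd files are ignored.
root_login_disabled() {
    config_file="${1:-/etc/ssh/sshd_config}"
    # sshd keeps the first value it reads for a keyword, so check the first match only.
    first_match=$(grep -Ei '^[[:space:]]*PermitRootLogin[[:space:]]+' "$config_file" | head -n 1)
    setting=$(printf '%s\n' "$first_match" | awk '{ print tolower($2) }')
    [ "$setting" = "no" ]
}
```

Calling root_login_disabled with no argument checks the default file. For the authoritative answer, sudo sshd -T prints the effective configuration that sshd would actually use.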

(The one weird quirk I found with Digital Ocean was adding an SSH key after deploying the instance. This seems to be a common issue with folks, so if you’re having trouble with this, here’s a canonical thread I found to be helpful.)

You’ll also want to set up SSH because we’re going to use scp to export a few things from Instance 1 (AWS) to Instance 2 (DO):

  • Validator Keys
  • Current Network State
  • Slashing Protection

The validator keys are needed to run our Teku client as the validator on the new instance. The Current Network State and Slashing Protection are crucial for making our migration as safe and fast as possible.

Syncing

Proof of Stake, the new consensus mechanism for Ethereum 2.0, has significant differences compared to Proof of Work. One of those is the concept of finality: when the Beacon chain finalizes an epoch, it’s taking a snapshot of all the activity and balances on the network. That snapshot, called a checkpoint, might as well be its own genesis block. The network is not going back.

It’s a common misunderstanding that Proof of Work also offers this finality. In fact, Proof of Work chains, like Bitcoin, never fully guarantee the chain won’t be reorganized. It’s more that, over time, the probability of a chain reorganization becomes successively smaller with each block confirmation. At a certain point, the probability of a reorganization for a particular block becomes infinitesimally small. This is why, on Bitcoin and Ethereum 1.0, a transaction is considered “safely included in the chain” only after a certain number of blocks are confirmed after the one containing it.
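
To put a rough number on that shrinking probability: the Bitcoin whitepaper models the race between honest miners and an attacker as a random walk. Assuming the attacker controls a fraction q of the hash power and honest miners control p = 1 - q, the chance that an attacker who is already z blocks behind ever catches up is:

```latex
q_z =
\begin{cases}
1 & \text{if } p \le q \\
\left( \dfrac{q}{p} \right)^{z} & \text{if } p > q
\end{cases}
```

With q = 0.1 and z = 6, for example, that is (0.1/0.9)^6 = 1/531441, or roughly 0.0002%. This simplified figure ignores blocks the attacker may mine while the confirmations accumulate, but it shows the shape of the guarantee: the number keeps shrinking and never reaches zero.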

There is a security weakness in finality, though. From Teku docs:

If ⅓ of validators withdraw their stake and continue signing blocks and attestations, they can form a chain which conflicts with the finalized state. If your node is far enough behind the chain head to not be aware that they’ve withdrawn their funds, the exited validators can trick you into following the wrong chain.

https://docs.teku.consensys.net/en/latest/Concepts/Weak-Subjectivity/

Well-behaved validators who have successfully and properly exited the chain can sell their private keys on the black market to a malicious actor. (There is no financial disincentive for them to do this, as their funds have safely exited the protocol.) That malicious actor can then amass enough keys to find validators coming back online after quite a bit of time and commit a Sybil attack. Meredith Baxter has an excellent explanation of this attack.

What’s the solution? Weak Subjectivity Checkpoints. These are pointers to a relatively recent network state confirmed by a majority of validators. If a node with relatively scarce network information wants to sync to the Beacon chain, they can start with the genesis block and the weak subjectivity checkpoint. As the node communicates with other peers, they can check to make sure they haven’t been led astray by making sure they end up with the correct network state reflected in the weak subjectivity checkpoints.

Where does one get these checkpoints? That’s a tricky question. Teku Product Lead Ben Edgington shares this insight:

It’s up to the user to set their trust level and act accordingly. One suggestion is for client teams to set the checkpoints since they are implicitly trusted by their users in any case. As a client dev, I don’t really like this, but I suppose it’s the reality. If a bunch of block explorers, the EF, all the client teams, a few exchanges, some staking services, are all advertising the same checkpoint you’re very unlikely to go wrong. Having a diversity of inputs is good to avoid cartels.

Teku provides the start-up flag --ws-checkpoint which accepts the checkpoint for syncing.
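
The point about a diversity of inputs can be sketched as a small check: collect the checkpoint advertised by several sources and only proceed if they all match. This sketch assumes the <BLOCK_ROOT>:<EPOCH_NUMBER> format that, to my understanding, --ws-checkpoint expects (verify against the docs for your Teku version). Both function names are hypothetical.

```shell
# Succeed only if the argument looks like <BLOCK_ROOT>:<EPOCH_NUMBER>:
# a 0x-prefixed 32-byte hex root, a colon, and a decimal epoch number.
valid_ws_checkpoint() {
    printf '%s\n' "$1" | grep -Eq '^0x[0-9a-fA-F]{64}:[0-9]+$'
}

# Succeed only if at least one checkpoint was given and every checkpoint
# is well-formed and identical (hex case is ignored when comparing).
checkpoints_agree() {
    [ "$#" -gt 0 ] || return 1
    first=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    for checkpoint in "$@"; do
        valid_ws_checkpoint "$checkpoint" || return 1
        [ "$(printf '%s' "$checkpoint" | tr 'A-F' 'a-f')" = "$first" ] || return 1
    done
}
```

For example, checkpoints_agree "$FROM_EXPLORER" "$FROM_CLIENT_TEAM" "$FROM_EXCHANGE" succeeds only when all three values are well-formed and identical. The variable names are placeholders for values you gather yourself.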

Another option with Teku is --initial-state. This is only available on Teku right now and requires a path or URL to an SSZ-encoded state file. It can reduce the sync time to mere seconds, which is fantastic, particularly if you’re concerned about validator downtime.

For now, the best sources for --initial-state are your own. The best use “is if you’re maintaining a number of nodes and need to spin up new ones from time to time,” according to Teku Blockchain Protocol Engineer Adrian Sutton. This is what I’ll use when switching on my new validator instance. Later in this post, I’ll show you how to export it safely from Teku.

Slashing

The last concept to discuss before migration is the ever-dreadful-sounding slashing. Slashing is the financial disincentive against validators submitting bad data to the network. The penalty is forfeiture of a portion of your stake and being politely escorted to the door. Beaconcha.in shows there have been 133 validators slashed as of this writing (although, curiously, Beaconscan lists 132?).

Needless to say, we want to avoid being slashed. Luckily, slashing is only for behavior that violates the protocol. We don’t get slashed simply for inactivity.

However, a common reason for slashing is the exact circumstance I’m attempting now: An individual inadvertently running the same validator key on two different instances. This appears to the network as a validator acting maliciously, as they could appear to be attesting to two different network states.

Luckily, slashing protection exists in the form of EIP-3076: “A standard format for transferring a key’s signing history allows validators to easily switch between clients without the risk of signing conflicting messages.” It’s a JSON file with a list of all the blocks and attestations the client has made. It’s exported by one client and consumed by another in a separate process from actually running the node. In Teku, we will export our slashing protection file from our first instance using the command:

teku slashing-protection export --to=FILENAME --data-path=PATH/TO/TEKU/DATA-PATH

We’ll then send the slashing protection file to the new instance and feed it into the new validator client with the following command:

teku slashing-protection import --data-path=PATH/TO/TEKU/DATA-PATH --from=FILENAME

We do this before turning our validator on for the first time to prevent our client from accidentally submitting slashable activity.
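
Before trusting an exported file, it may be worth a quick look inside it. The sketch below assumes the field names from the EIP-3076 specification (a metadata object with interchange_format_version, and a data list with one entry per pubkey). It is only a text-level check, not a JSON parser, and the function name is hypothetical.

```shell
# Hypothetical helper: succeed only if FILE is non-empty, looks like an
# EIP-3076 interchange file, and mentions the given validator PUBKEY.
# Assumption: a plain text search is good enough for a sanity check.
interchange_mentions_key() {
    file="$1"
    pubkey="$2"
    [ -s "$file" ] || return 1
    grep -q '"interchange_format_version"' "$file" || return 1
    # -F treats the key as a fixed string; -i ignores hex case differences.
    grep -qiF "$pubkey" "$file"
}
```

For example, interchange_mentions_key FILENAME 0xYOUR_VALIDATOR_PUBKEY should succeed for the file produced by the export command. If it fails, the export probably did not capture the signing history for the key you are about to migrate.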

(For more information about EIP-3076 and slashing, please check out Ethereum Cat Herders’ recent episode interviewing Sacha Saint-Leger, Michael Sproul and Danny Ryan about its development and implementation.)

Migration

With those two concepts out of the way, let’s get down to the nitty-gritty. It’s not hard to do this; I just had to triple-check that I had the steps correct. I would advise anyone attempting it to do the same! Jumbling them up would be problematic, to say the least. Here’s the rundown:

  1. Download initial state from the first Teku node (AWS)
  2. Stop first Teku node (AWS)
  3. Export slashing protection data from first Node (AWS)
  4. Transfer initial state and slashing protection data from first Teku node (AWS) to second Teku node (DO)
  5. Import slashing protection data to second Teku node (DO)
  6. Start second Teku node (DO) using initial state from first Teku node (AWS)*

*For the extra-paranoid, Meredith Baxter suggests starting the second Teku client with --p2p-enabled=false while the client is consuming the initial-state to prevent communication with other nodes. If you do this, be sure to restart the second Teku client without --p2p-enabled=false to allow you to communicate with the network.
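
Because the order matters so much, one option is to wrap each step in a shell function and run them through a tiny runner that stops at the first failure. This is only a sketch of that idea; run_in_order and the step names in the usage note are hypothetical, not part of Teku.

```shell
# Run each named command or function in order. Stop at the first failure so
# that a later step (such as starting the new node) never runs after an
# earlier step has failed.
run_in_order() {
    for step in "$@"; do
        echo "==> $step"
        "$step" || { echo "step failed: $step" >&2; return 1; }
    done
}
```

Usage would look something like run_in_order download_state stop_old_node export_slashing_protection, where each name is a function you define around the commands for that step.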

Here are the commands for each of the steps, broken down and detailed:

1) Download initial state from the first Teku Node (AWS)

This has to be done while the first Teku node is still running. Download your current network state from your Teku client’s API by entering the following API call in the first node’s terminal (we’re assuming Teku is running in the background):

curl -X GET "http://localhost:5051/teku/v1/beacon/blocks/finalized/state" --output initial-state.ssz

This will download the initial state as initial-state.ssz into whatever directory you’re currently in.
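
One thing to watch for: without --fail, curl will happily save an HTTP error message as if it were the state file. A minimal sketch of a more defensive download follows. It reuses the endpoint from the command above; the function names and the "does not start with a brace" heuristic are assumptions for illustration.

```shell
# Hypothetical wrapper around the curl call above. --fail makes curl exit
# non-zero on an HTTP error instead of saving the error body as the "state".
download_finalized_state() {
    out="${1:-initial-state.ssz}"
    curl --fail --silent --show-error -X GET \
        "http://localhost:5051/teku/v1/beacon/blocks/finalized/state" \
        --output "$out"
}

# Cheap sanity check: the file exists, is non-empty, and does not start with
# '{' (which would suggest a JSON error message rather than binary SSZ).
state_file_looks_ok() {
    [ -s "$1" ] || return 1
    [ "$(head -c 1 "$1")" != "{" ]
}
```

Running download_finalized_state && state_file_looks_ok initial-state.ssz before stopping the node gives some confidence that the file is usable, though it does not validate the SSZ contents.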

2) Stop first Teku node (AWS)

I’m assuming you have a similar setup to Part 2 (Ubuntu 20.04), specifically the systemd service we set up for Teku. If that’s the case, stop the first node with:

sudo systemctl stop teku

Double check it has indeed stopped by running:

sudo systemctl status teku
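
If you would rather have a check you can script than read the status output by eye, here is a small sketch built on systemctl is-active, which prints a one-word state such as active or inactive. The function name is hypothetical, and the optional second argument exists only so the function can be tried out without systemd.

```shell
# Succeed only when the unit is reported as fully stopped.
# "deactivating" and any other state are treated as NOT stopped, to be safe.
service_is_stopped() {
    unit="$1"
    # Assumption: an alternative status command can be injected for testing.
    status_cmd="${2:-systemctl is-active}"
    # is-active exits non-zero for stopped units, so ignore the exit code
    # and look at the printed state instead.
    state=$($status_cmd "$unit" 2>/dev/null) || true
    case "$state" in
        inactive|failed) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, service_is_stopped teku || echo "Teku is still running!" makes a reasonable guard to put in front of the export step.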

3) Export slashing protection data from first node (AWS)

Now that the node has stopped, we need to export the slashing protection data. To do so, run the following command:

sudo teku slashing-protection export --to=slashing-protection --data-path .local/share/teku/

This exports the slashing protection data to a file named slashing-protection.

4) Transfer initial state and slashing protection data from first Teku node (AWS) to second Teku node (DO)

Since both of my service providers have SSH set up, we can use scp to copy the network state and slashing protection from one to the other. There might be a way to pipe these, but the stuff I read suggested scp doesn’t allow piping (if someone can find that, let me know!).

Here are the commands to transfer the two files out of the first node (AWS) to our desktop, assuming you don’t change the filenames from above:

scp -i PATH/TO/SSH/KEY USER@AWS_INFO.REGION.amazonaws.com:/home/ubuntu/initial-state.ssz ~/Desktop

scp -i PATH/TO/SSH/KEY USER@AWS_INFO.REGION.amazonaws.com:/home/ubuntu/slashing-protection ~/Desktop

Here are the commands to transfer the files from your Desktop to your second node:

scp -i PATH/TO/SSH/KEY ~/Desktop/slashing-protection USER@DO_NODE_IP:~

scp -i PATH/TO/SSH/KEY ~/Desktop/initial-state.ssz USER@DO_NODE_IP:~
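
Since these two files are what keep the migration safe, it seems prudent to confirm they arrived intact. A minimal sketch, assuming sha256sum (or shasum on older macOS) is available; the function names are hypothetical.

```shell
# Print only the SHA-256 hex digest of a file. Falls back to shasum on
# systems that lack sha256sum.
file_digest() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{ print $1 }'
    else
        shasum -a 256 "$1" | awk '{ print $1 }'
    fi
}

# Succeed only if both files exist and have identical digests.
same_digest() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    [ "$(file_digest "$1")" = "$(file_digest "$2")" ]
}
```

Across machines, run file_digest on each node and compare the output; same_digest is handy when both copies are on the same machine. On the piping question, OpenSSH's scp has a -3 option that routes a copy between two remote hosts through the local machine, which may remove the Desktop stop, though it was not tried for this migration.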

5) Import slashing protection data to second Teku node (DO)

Before we start our second node, we need to feed in the slashing protection to make sure we don’t get slashed.

sudo teku slashing-protection import --data-path=/var/lib/teku --from=./slashing-protection

You should get a success message from the client when it’s done.

6) Start second Teku node (DO) using initial state from first Teku node (AWS)

We use the same systemd service script for Teku that we used previously with one exception:

ExecStart=/home/ubuntu/teku-20.11.1/bin/teku --network=mainnet --eth1-endpoint=INFURA_ETH1_HTTP_ENDPOINT_GOES_HERE --initial-state=FILENAME --validator-keys=/home/ubuntu/validator_key_info/KEYSTORE-M_123456_789_ABCD.json:/home/ubuntu/validator_key_info/validator_keys/KEYSTORE-M_123456_789_ABCD.txt --rest-api-enabled=true --rest-api-docs-enabled=true --metrics-enabled --validators-keystore-locking-enabled=false --data-base-path=/var/lib/teku

The --initial-state=FILENAME flag is where we put in the location of the initial-state.ssz file we transferred in from our first node.

Previous versions of Teku required us to remove this flag from the systemd service once the node had started, but newer versions can ignore it after the first run. Check your version and adjust accordingly.

Once we have altered and saved our systemd file, we reload systemd to pick up the changes, then start Teku and cross our fingers!

sudo systemctl daemon-reload

sudo systemctl start teku

I quadruple-checked to make sure my first node was not running before running the start script, FYI!

As before, check to make sure Teku is booting up and running okay:

sudo systemctl status teku

If you get any errors, you can get more details by running:

sudo journalctl -f -u teku.service
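
Beyond the logs, the node itself can report whether it has caught up. The sketch below assumes your Teku version exposes the standard Beacon node API endpoint /eth/v1/node/syncing on the same REST port (5051) used earlier. The parsing is a simple text match rather than a real JSON parser, and both function names are hypothetical.

```shell
# Extract the is_syncing flag from a /eth/v1/node/syncing JSON response.
# Assumption: the field appears once, as "is_syncing": true or false.
parse_is_syncing() {
    printf '%s\n' "$1" \
        | grep -Eo '"is_syncing"[[:space:]]*:[[:space:]]*(true|false)' \
        | grep -Eo '(true|false)$'
}

# Hypothetical wrapper: succeed while the local node reports it is syncing,
# fail once it has caught up, and return 2 if the API cannot be reached.
node_is_syncing() {
    response=$(curl --fail --silent "http://localhost:5051/eth/v1/node/syncing") || return 2
    [ "$(parse_is_syncing "$response")" = "true" ]
}
```

Since the node started from a recent state file, node_is_syncing should start failing (meaning the node is in sync) fairly soon after start-up.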

Once you’re satisfied it’s running okay, run the following command to make sure it restarts if anything happens:

sudo systemctl enable teku

Thus concludes the final installment of this first series. I’m sure there will be more excitement as developments continue and we’ll do our best to update this and provide more resources when needed.

Happy staking!

Thank you: Aditya Asgaonkar, Meredith Baxter, Ben Edgington, Adrian Sutton, Alex Tudorache, and James Beck.