If you’ve ever used OpenStack Nova, you’ve probably at least seen our plethora of confusing commands that involve migrating instances away from a failing (or failed) compute host. If you are brave, you’ve tried to use one or more of them, and have almost certainly chosen poorly. That is because we have several commands with similar names that do radically different things. Sorry about that.
Below I am going to try to clarify the current state of things, hopefully for the better so that people can choose properly and not be surprised (or at least, surprised again, after the initial surprise that caused you to google for this topic). Note that we have discussed changing some of these names at some point in the future. That may happen (and may reduce or increase confusion), but this post is just aimed at helping understand the current state of things.
At the heart of this issue, are the three(ish) core operations that Nova can do (server-side) to move instances around at your request. It helps to understand these first, before trying to understand everything that you can do from the client.
1. Cold Migration
Cold migration in nova means:
- Shutting down the instance (if necessary)
- Copying the image from the source host to the destination host
- Reassigning the instance to the destination host in the database
- Starting it back up (if it was running before)
This process is nice because it works without shared storage, and even lets you check it out on the destination host before telling the source host to completely forget about it. It is, however, rather disruptive as the instance has to be powered off to be moved.
Also note that the “resize” command in Nova is exactly the same as a cold migration, except that we start up the instance with more (or less) resources than it had before. Otherwise the process is identical. Cold migration (and resize) are usually operations granted to regular users.
2. Live Migration
Live migration is what it sounds like, and what most people think of when they hear the term: moving the instance from one host to another without the instance noticing (or needing to be powered off). This process is typically admin-only, requires a lot of planets to be aligned, but is very useful if tested and working properly.
Evacuate is the confusing one. If you look at the actual english definition of evacuate in this context, it basically says (paraphrased):
To remove someone (or something) from a dangerous situation to avoid harm and reach safety.
So when people see the evacuate command in Nova, they usually think “this is a thing I should do to my instances when I need to take host down for maintenance, or if a host signals a pre-failure situation.” Unfortunately, this is not at all what Nova (at the server side) means by “evacuate”.
An evacuation of an instance is done (and indeed only allowed) if the compute host that the instance is running on is marked as down. That means the failure has already happened. The core of the evacuate process in nova is actually rebuild, which in many cases is a destructive operation. So if we were to state Nova’s definition of evacuate in english words, it would be:
After a compute host has failed, rebuild my instance from the original image in another place, keeping my name, uuid, network addresses, and any other allocated resources that I had before.
Unless your instance is volume-backed*, the evacuation process is destructive. Further, you can only initiate this process after the compute host the instance is running on is down. This is clearly a much different definition from the english word, which implies an action taken to avoid failure or loss.
Footnote: In the case of volume-backed instances, the root disk of the instance is usually in a common location such as on a SAN device. In this case, the root disk is not destroyed, but any other instance state is recreated (which includes memory, ephemeral disk, swap disk, etc).
Now that you understand the fundamental operations that the server side of Nova can perform, we should talk about the client. Unfortunately, the misnamed server-side evacuate operation is further confused by some additional things in the client.
In the client, if you want to initiate any of the above operations, there is a straightforward command that maps to each:
||Cold Migration||Power off and move|
||Cold Migration (with resize)||Power off, move, resize|
||Live Migration||Move while running|
||Evacuation||Rebuild somewhere else|
The really confusing bit comes into view because the client has a few extra commands to help automate some things.
nova host-evacuate command does not translate directly to a server-side operation, but is more of a client-side macro or “meta operation.” When you call this command, you provide a hypervisor hostname, which the client uses to list and trigger evacuate operations on each instance running on that hypervisor. You would use this command post-failure (just like the single-instance evacuate command) to trigger evacuations of all the instances on a failed compute host.
The nova host-servers-migrate command also does not directly translate to a server-side operation, but like the above, issues a cold migration request for each instance running on the supplied hypervisor hostname. You would use this command pre-failure (just like the single-instance migrate command) to trigger migrations of all the instances on a still-running compute host.
Ready for the biggest and most confusing one, saved for last? The client also has a command called
host-evacuate-live. You might be thinking: “Evacuate in nova means that the compute host is already down, how could we migrate anything live off of a dead host?” — and you would be correct. Unfortunately, this command does not do nova-style evacuations at all, but rather reflects the english definition of the word evacuate. Like its sibling above, this is a client-side meta command, that lists all instances running on the compute host, but triggers
live-migration operations for each one of them. This too is a pre-failure command to get instances migrated off of a compute host before a failure or maintenance event occurs.
In tabular format to mirror the above:
||Evacuation||Run evacuate (rebuild elsewhere)
on all instances on host
||Cold Migration||Run migrate on all
instances on host
||Live Migration||Run live migration on all
instances on host
Hopefully the above has helped demystify or clarify the meanings of these highly related but very different operations. Unfortunately, I can’t demystify the question of “how did the naming of these commands come to be so confusing in the first place?”