Upgrading Nova to Kilo with minimal downtime

Starting in Icehouse, Nova gained the ability to do partial live upgrades. This first step meant that control services (which are mostly stateless) could be upgraded along with database schema before any of the compute nodes. After that step was done, individual compute nodes could be upgraded one-by-one, even migrating workloads off to newer compute nodes in order to facilitate hardware or platform upgrades in the process.

In the Kilo cycle, Nova made a concerted effort to break that initial atomic chunk of work into two pieces: the database schema upgrades and the code upgrades of the control services. It’s our first stab at this, so it’s not guaranteed to be perfect, but initial testing shows that it worked.

What follows is a high-level guide for doing a rolling Nova upgrade, using Juno-to-Kilo as the example. It’s not detailed enough to blindly follow, but is more intended to give an overview of the steps involved. It’s also untested and not something you should do on a production machine — test this procedure in your environment first and prove (to yourself) that it works.

The following steps also make some assumptions:

  • You’re using nova-network. If you’re using neutron, you are probably okay to do this, but you will want to use care around the compute-resident neutron agent(s) if you’re running them. If you’re installing system-level packages and dependencies, it may be difficult to upgrade Nova or Neutron packages without upgrading both.
  • You’re running non-local conductor (i.e. you have nova-conductor services running and [conductor]/use_local=False in your config). The conductor is a major part of insulating the newer and older services in a meaningful way. Without it, none of this will work.

Step 0: Prepare for what is coming

In order to have multiple versions of nova code running, there is an additional price in the form of extra RPC traffic between the compute nodes and the conductors. Compute nodes will start receiving data they don’t understand and they will start kicking that data back to conductor for help translating it into a format they understand. That may mean you want to start up some extra conductor workers to handle this load. How many additional workers you will need depends on the characteristics of your workload and there is really no rule of thumb to go by here. Also, if you plan to convert your compute nodes fairly quickly, you may need only a little extra overhead. If you have some stubborn compute nodes that will continue to run older code for a long time, they will be a constant source of additional traffic until they’re upgraded.

Further, as soon as you start running Kilo code, the upgraded services will be doing some online data migrations. That will generate some additional load on your database. As with the additional conductor load, the amount and impact depends on how active your cloud is and how much data needs to be migrated.

Step 1: Upgrade the schema

For this, you’ll need to get a copy of Kilo code installed somewhere. This should be a mostly temporary location that has access to the database and won’t affect any other running things. Once you’ve done that, you should be able to apply the schema updates:

$ nova-manage db sync

This should complete rather quickly as it does no invasive data migration or examination.

You should grab the code of whatever you’re going to deploy and run the database sync from that. If you’re installing from pip, use the same package to do this process. If you’re deploying distro packages, use those. Just be careful, regardless of where you do this, to avoid service disruption. It’s probably best to spin up a VM or other sandbox environment from which to perform this action.

Step 2: Pin the compute RPC version

This step ensures that everyone in the cloud will speak the same version of the compute RPC API. Right now, it won’t change anything, but once you start upgrading services, it will ensure that newer services will send messages that are compatible with the old ones.

In nova.conf, set the following pin:

[upgrade_levels]
compute = juno

You should do this on any node that could possibly talk to a compute node. That includes the compute nodes themselves, as they do talk to other compute nodes as well. If you’re not sure which services talk to compute nodes, just be safe and do this everywhere.

You don’t technically need to restart all your services after you’ve made this change, since it’s really mostly important for the newer code. However, it wouldn’t hurt to make sure that everything is happy with this version pin in place before you proceed.

I’ll also point out here that juno is an alias for 3.35. We try to make sure the aliases are there for the given releases, but this doesn’t always happen and it sometimes becomes invalid after changes are backported. This obviously is not a nice user experience, but it is what it is at this point. You can see the aliases, and history, defined in the compute/rpcapi.py file.

Step 3: Upgrade the control services

This is the first step where you actually deploy new code. Make sure that you don’t accidentally overwrite the changes you made in step 2 to your nova.conf, or that your new one includes the version pin. Nova, by convention, supports running a new release with the old release’s config file so you should be able to leave that in place for now.

In this step, you will upgrade everything but the compute nodes. This means nova-api, nova-scheduler, nova-conductor, nova-consoleauth, nova-network, and nova-cert. In reality, this needs to be done fairly atomically. So, shut down all of the affected services, roll the new code, and start them back up. This will result in some downtime for your API, but in reality, it should be easy to quickly perform the swap. In later releases, we’ll reduce the pain felt here by eliminating the need for the control services to go together.

Step 4: Watch and wait

At this point, you’ve got control services running on newer code with compute nodes running old stuff. Hopefully everything is working, and your compute nodes are slamming your conductors with requests for help with the newer versions of things.

Things to be on the lookout for are messages in the compute logs about receiving messages for an unsupported version, as well as version-related failures in the nova-api or nova-conductor logs. This example from the compute log is what you would see, along with some matching messages on the sending-side of calls that expect to receive a response:

Exception during message handling: Endpoint does not support RPC version 4.0. Attempted method: build_and_run_instance

If you see these messages, it means that either you set the pin to an incorrect value, or you missed restarting one of the services to pick up the change. In general, it’s the sender who sent the bad message, so if you see this on a compute node, suspect a conductor or api service as the culprit. Not all messages that the senders send expect a response, so trying to find the bad sender by matching up a compute error with an api error, for example, will not always be possible.

If everything looks good at this point, then you can proceed to the next step.

Step 5: Upgrade computes

This step may take an hour or a month, depending on your requirements. Each compute node can be upgraded independently to the new code at this point. When you do, it will just stop needing to ask conductor to translate things.

Don’t unpin the compute version just yet, even on upgraded nodes. If you do any resize/migrate/etc operations, a newer compute will have to talk to an older one, and the version pin needs to remain in place in order for that to work.

When you upgrade your last compute node, you’re technically done. However, the steps after 5 include some cleanup and homework before you can really declare completion and have that beer you’re waiting for.

Step 6: Drop the version pins

Once all the services are running the new code, you can remove (or comment out) the compute line in the upgrade_levels section and restart your services. This will cause all the services to start sending kilo-level messages. ¬†You could set this to “kilo” instead of commenting it out, but it’s better to leave it unset so that the newest version is always sent. If we were to backport something that was compatible with all the rest of kilo, but you had a pin set, you might be excluded from an important bug fix.

Because all of your services are new enough to accept old and new messages, you can stage the restarts of your services however you like in order to apply this change. It does not need to be atomic.

Step 7: Perform online data migrations

This step is your homework. There is a due date, but it’s a long way off. So, it’s more like a term project. You don’t have to do it now, but you will have to do it before you graduate to Liberty. If you’re responsible and mindful, you’ll get this out of the way early.

If you’re a seasoned stacker, you probably remember previous upgrades where the “db sync” phase was long, painful, and intense on the database. In Kilo, we’ve moved to making those schema updates (hopefully) lightweight, and have moved the heavy lifting to code that can execute at runtime. In fact, when you completed Step 3, you already had some data migrations happening in the background as part of normal operation. As instances are loaded from and saved to the database, those conversions will happen automatically. However, not everything will be migrated this way.

Before you will be able to move to Liberty, you will have to finish all your homework. That means getting all your data migrated to the newer formats. In Kilo, there is only one such migration to be performed and there is a new nova-manage command to help you do it. The best way to do this is to run small chunks of the upgrade over time until all of the work is done. The size of the chunks you should use depend on your infrastructure and your tolerance for the work being done. If you want to do ten instances at a time, you’d do this over and over:

$ nova-manage migrate_flavor_data --max-number 10

If you have lots of un-migrated instances, you should see something like this:

10 instances matched query, 10 completed

Once you run the command enough times, you should get to the point where it matches zero instances, at which point you know you’re done. If you start getting to the point where you have something like this:

7 instances matched query, 0 completed

…then you still have work to do. Instances that are in a transitional state (such as in the middle of being resized, or in ERROR state) are normally not migrated. Let these instances complete their transition and re-run the migration. Eventually you should be able to get to zero.

NOTE: The invocation of this migration function is actually broken in the Kilo release. There are a couple of backport patches proposed that will fix it, but it’s likely not fixed in your packages if you’re reading this soon after the release. Until then, you have a pass to not work on your homework until your distro pulls in the fixes[1][2].

Summary and Next Steps

If you’ve gotten this far, then you’ve upgraded yourself from Juno to Kilo with the minimal amount of downtime allowed by the current technology. It’s not perfect yet, but it’s a lot better than having to schedule the migration at a time where you can tolerate a significant outage window for database upgrades, and where you can take every node in your cluster offline for an atomic code deployment.

Going forward, you can expect this process to continue to get easier. Ideally we will continue to reduce the number of services that need to be upgraded together, including even partial upgrades of individual services. For example, right now you can’t really upgrade your API nodes separate from your conductors, and certainly not half of your conductors before the other half. However, that reality does exist in the future, and will allow a much less impactful transition.

As I said at the beginning, this is new stuff. It should work, and it does in our gate testing. However, be diligent about testing it on non-production systems and file bugs against the project if you find gaps and issues.

Category(s): OpenStack
Tags: , , , ,

5 Responses to Upgrading Nova to Kilo with minimal downtime

    Will Auld says:

    How is gate testing for a divided upgrade (controllers first then compute nodes) being done? Grenade will not currently do this on its own. Details will be helpful.

    Thanks,

    Will

    • We have a nova partial-ncpu job that upgrades all the control services but leaves the compute service on the older code. Grenade has hacky support for this today, which is how we do it. That means we get coverage for Kilo->Master right now, for example. In general, we catch most upgrade issues that way, but things do slip through. We have plans to extend grenade to do multiple nodes, which will let us do many more things, but it’s a lot of work to get there.

      More discussion is probably best done on one of the mailing lists if you have other questions.

      Thanks!

      -Dan

  1. “fixes[1][2]” link to the same gerrit review?

One Response in another blog/article

  1. […] able to upgrade regularly can be important! To that end, here’s a walk through of¬†upgrading Nova to Kilo with minimal […]