Start and Monitor Image Pre-cache Operations in Nova

When you boot an instance in Nova, you provide a reference to an image. In many cases, once Nova has selected a host, the virt driver on that node downloads the image from Glance and uses it as the basis for the root disk of your instance. If your nodes are using a virt driver that supports image caching, then that image only needs to be downloaded once per node, which means the first instance to use that image causes it to be downloaded (and thus has to wait). Subsequent instances based on that image will boot much faster as the image is already resident.

If you manage an application that involves booting a lot of instances from the same image, you know that the time-to-boot for those instances could be vastly reduced if the image is already resident on the compute nodes you will land on. If you are trying to minimize the latency of rolling out a new image, this becomes a critical calculation. For years, people have asked for or proposed solutions in Nova to allow some sort of image pre-caching, but those discussions have always stalled in detail hell. Some people have resorted to hacks such as booting host-targeted tiny instances ahead of time, injecting image files directly into Nova’s cache directory, or local code modifications. Starting in the Ussuri release, such hacks will no longer be necessary.

Image pre-caching in Ussuri

Nova’s now-merged image caching feature includes a very lightweight, no-promises way to request that an image be cached on a group of hosts (defined by a host aggregate). In order to avoid some of the roadblocks that have plagued previous attempts, the new API does not attempt to provide a rich status result, nor a way to poll for or check on the status of a caching operation. There is also no scheduling, persistence, or reporting of which images are cached where. Asking Nova to cache one or more images on a group of hosts is similar in effect to booting an instance from each of those images on each of those hosts, but without the overhead that goes along with it. That means images cached as part of such a request are subject to the same expiry timer as any other. If you want them to remain resident on the nodes permanently, you must re-request them before the expiry timer would purge them; each re-request simply refreshes the purge timestamp on hosts where the image is already resident.
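
The expiry timer in question is the compute node's normal image cache cleanup. As a rough sketch (option names as of Ussuri, where these moved into the [image_cache] group of nova.conf; the values shown are the defaults), the relevant knobs look like this:

[image_cache]
# How often (in seconds) the image cache manager periodic task runs.
manager_interval = 2400
# Purge cached base images that no instance on the host is using...
remove_unused_base_images = True
# ...once they have gone unused for at least this long (in seconds).
remove_unused_original_minimum_age_seconds = 86400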

Obviously, for a large cloud, some way to monitor the status of the caching process is required, especially if you are waiting for it to complete before starting a rollout. The subject of this post is to demonstrate how that can be done with notifications.

Example setup

Before we can talk about how to kick off and monitor a caching operation, we need to set up the basic elements of a deployment. That means we need some compute nodes, and for those nodes to be in an aggregate that represents the group that will be the target of our pre-caching operation. In this example, I have a 100-node cloud with numbered nodes that look like this:

$ nova service-list --binary nova-compute
+--------------+--------------+
| Binary | Host |
+--------------+--------------+
| nova-compute | guaranine1 |
| nova-compute | guaranine2 |
| nova-compute | guaranine3 |
| nova-compute | guaranine4 |
| nova-compute | guaranine5 |
| nova-compute | guaranine6 |
| nova-compute | guaranine7 |
.... and so on ...
| nova-compute | guaranine100 |
+--------------+--------------+

In order to request that an image be pre-cached on these nodes, I need to put some of them into an aggregate. Since there are so many of them, I will do that programmatically, like this:

$ nova aggregate-create my-application
+----+-----------------+-------------------+-------+----------+--------------------------------------+
| Id | Name | Availability Zone | Hosts | Metadata | UUID |
+----+-----------------+-------------------+-------+----------+--------------------------------------+
| 2 | my-application | - | | | cf6aa111-cade-4477-a185-a5c869bc3954 |
+----+-----------------+-------------------+-------+----------+--------------------------------------+
$ for i in $(seq 1 95); do nova aggregate-add-host my-application guaranine$i; done
... lots of noise ...
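
By the way, the same setup can be driven from Python. Here is a rough sketch using python-novaclient; the credential values are placeholders for your own cloud, and microversion 2.81 is requested because that is the one that adds the image pre-cache API:

# A minimal sketch, assuming python-novaclient and keystoneauth1 are
# installed. All credential values below are placeholders.
from keystoneauth1 import loading, session
from novaclient import client as nova_client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://keystone.example.com/identity/v3',
    username='admin', password='secret', project_name='admin',
    user_domain_name='Default', project_domain_name='Default')
sess = session.Session(auth=auth)

# Microversion 2.81 adds the aggregate image pre-cache API.
nova = nova_client.Client('2.81', session=sess)

# Create the aggregate and add the first 95 hosts to it.
aggregate = nova.aggregates.create('my-application', None)
for i in range(1, 96):
    nova.aggregates.add_host(aggregate, 'guaranine%d' % i)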

Now that I have done that, I am able to request that an image be pre-cached on all the nodes within that aggregate by using the nova aggregate-cache-images command:

$ nova aggregate-cache-images my-application c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2

If all goes to plan, sometime in the future all of the hosts in that aggregate will have fetched the image into their local cache and will be able to use it for subsequent instance creation. By default that happens largely sequentially to avoid storming Glance (the [image_cache]/precache_concurrency option controls how many hosts download in parallel), and with so many hosts and a decently-sized image, it could take a while. If I am waiting to deploy my application until all the compute hosts have the image, I need some way of monitoring the process.
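
Under the hood, the CLI command above is a single REST call (POST /os-aggregates/{aggregate_id}/images, added in compute API microversion 2.81), which returns immediately; the actual downloads happen asynchronously on the computes. Reusing the nova client handle from the earlier sketch, the Python equivalent is roughly:

# A sketch reusing the `nova` and `aggregate` objects from the earlier
# example; cache_images() is the novaclient binding added alongside
# the aggregate-cache-images CLI command.
image_id = 'c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2'
nova.aggregates.cache_images(aggregate, [image_id])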

Monitoring progress

Many of the OpenStack services emit notifications to the messaging bus (e.g., RabbitMQ), and Nova is no exception. That means that whenever things happen, Nova sends information about them to a queue on that bus (if so configured), which you can use to receive asynchronous information about the system.

The image pre-cache operation sends start and end versioned notifications, as well as progress notifications for each host in the aggregate, which allows you to follow along. Ensure that you have set [notifications]/notification_format=versioned in your config file in order to receive these. A sample intermediate notification looks like this:

{
    'index': 68,
    'total': 95,
    'images_failed': [],
    'uuid': 'ccf82bd4-a15e-43c5-83ad-b23970338139',
    'images_cached': ['c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2'],
    'host': 'guaranine68',
    'id': 1,
    'name': 'my-application',
}

This tells us that host guaranine68 just completed its cache operation for one image in the my-application aggregate. It was host 68 of 95 total. Since the image ID we used is in the images_cached list, that means it was either successfully downloaded on that node, or was already present. If the image failed to download for some reason, it would be in the images_failed list.

In order to demonstrate what this might look like, I wrote some example code. It is not intended to be production-ready, but it should provide a template for writing something of your own that connects to the bus and monitors a cache operation. You would run this before kicking off the process; it waits for a cache operation to begin, prints information about progress, and then exits with a non-zero status code if any errors were detected. For the above example invocation, the output looks like this:

$ python image_cache_watcher.py
Image cache started on 95 hosts
Aggregate 'my-application' host 95: 100% complete (8 errors)
Completed 94 hosts, 8 errors in 2m31s
Errors from hosts:
guaranine2
guaranine3
guaranine4
guaranine5
guaranine6
guaranine7
guaranine8
guaranine9
Image c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2 failed 8 times

In this case, I intentionally broke the image download on eight of the hosts for demonstration purposes.
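
If you would rather build such a watcher yourself, a minimal skeleton using oslo.messaging might look like the sketch below. The broker URL and the listener pool name are assumptions to adapt to your deployment, and note that versioned notification payloads arrive wrapped in a 'nova_object.data' envelope around the fields shown earlier:

# A minimal sketch, not production-ready: listens for the Ussuri
# aggregate.cache_images.* versioned notifications and prints progress.
import time

from oslo_config import cfg
import oslo_messaging


class CacheProgressEndpoint(object):
    # oslo.messaging invokes info() for INFO-priority notifications.
    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        if not event_type.startswith('aggregate.cache_images.'):
            return
        # Versioned payloads carry their fields inside an envelope.
        data = payload['nova_object.data']
        if event_type == 'aggregate.cache_images.progress':
            print('Aggregate %r host %s: %d/%d done, failed: %s' % (
                data['name'], data['host'], data['index'],
                data['total'], data['images_failed']))


# Point the URL at the same bus Nova publishes to. Using a pool gives
# this watcher its own queue so it does not steal notifications from
# other consumers of the topic.
transport = oslo_messaging.get_notification_transport(
    cfg.CONF, url='rabbit://guest:guest@rabbit.example.com:5672/')
targets = [oslo_messaging.Target(topic='versioned_notifications')]
listener = oslo_messaging.get_notification_listener(
    transport, targets, [CacheProgressEndpoint()],
    executor='threading', pool='image-cache-watcher')
listener.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    listener.stop()
    listener.wait()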

Future

The image caching functionality in Nova may gain more features in the future, but for now, it is a best-effort sort of thing. With just a little bit of scripting, Ussuri operators should be able to kick off and monitor image pre-cache operations and substantially improve time-to-boot performance for their users.
