All your DB are belong to conductor

Well, it’s done. Hopefully.

Over the last year, Nova has had a goal of removing direct database access from nova-compute. This has a lot of advantages, especially around security and rolling upgrade abilities, but also brings some complexity and change. Much of this is made possible by utilizing the new nova-conductor service to proxy requests to the database over RPC on behalf of components that are not allowed to talk to the database directly. I authored many of the changes to either use conductor to access the database, or refactor things to not require it at all. I also had the distinct honor of committing the final patch to functionally disable the database module within the compute service. This will help ensure that folks doing testing between Grizzly-3 and the release will hit a reasonable (and reportable) error message, even if their compute nodes still have access to the database.

Security-wise, nova-compute nodes are the most likely targets for any sort of attack, since they run the untrusted customer workloads. Escaping from a VM or compromising one of the services that runs there previously meant full access to the database, and thus the cluster. By removing the ability (and need) to connect directly to the database, it is significantly easier for an administrator to limit the exposure caused by a compromised compute node. In the future, the gain realized from things like trusted RPC messaging will be even greater, as access to information about individual instances from a given host can be limited by conductor on a need-to-know basis.

From an upgrade point of view, decoupling nova-compute from the database also decouples it from the schema. That means that rolling upgrades can be supported through RPC API versioning without worrying about old code accessing new database schemas directly. No additional modeling is added between the database and the compute nodes, but having the RPC layer there provides a much better way to provide a stable N and N+1 interface.

Of course, neither of the above points imply that your cluster is now secure, or that you can safely do a rolling upgrade from Folsom to Grizzly or Grizzly to Havana. This no-db-compute milestone is one (major) step along the path to enabling both, but there’s still plenty of work to do. Since nova is large and complex, there is also no guarantee that all the direct database accesses have been removed. Since we recently started gating on full tempest runs, the fact that the disabling patch passed all the tests is a really good sign. However, it is entirely likely that a few more things needing attention will shake out of the testing that folks will do between Grizzly-3 and the release.

Let the bug reporting commence!

Category(s): Codemonkeying, Linux, OpenStack
Tags: , , , ,

One Response in another blog/article

  1. […] a new service, nova-conductor. Some previous posts about this service can be found here and here. This post is intended to provide some additional insight into how this service should be deployed […]