A brief overview of Nova’s new object model (Part 1)

As discussed at the Havana summit, I have been working with Chris Behrens (and others) on the unified-object-model blueprint for Nova. The core bits of it made their way into the tree a while ago and work is underway to implement the Instance object and convert existing code to use it. This unifies the direct-to-database query methods, as well as the mirrored conductor RPC interfaces into a single versioned object-oriented API. It aims to address a few problems for us:

  1. Letting SQLAlchemy objects escape the DB API layer has caused us a lot of problems because they can’t be sent over RPC efficiently. The new object model is self-serializing.
  2. Objects in the database aren’t versioned (although the schema itself is). This means that sending a primitive representation of it over RPC runs the risk of old code breaking on new schema, or vice versa. The new object model is versioned for both interface methods and data format.
  3. Database isolation (no-db-compute) results in mirroring a bunch of non-OO interfaces in nova-conductor for use by isolated services like nova-compute. The new object model entirely hides the fact that object operations may be going direct or over RPC to achieve the desired result.

Hopefully the first two items above are fairly obvious, but the third may deserve a little explanation. Currently, we have things in the nova/db/sqlalchemy/api.py like the following:

@require_context
def instance_get_by_uuid(context, uuid, columns_to_join=None):
    return _instance_get_by_uuid(context, uuid,
            columns_to_join=columns_to_join)

@require_context
def _instance_get_by_uuid(context, uuid, session=None, columns_to_join=None):
    result = _build_instance_get(context, session=session,
                                 columns_to_join=columns_to_join).\
                filter_by(uuid=uuid).\
                first()

    if not result:
        raise exception.InstanceNotFound(instance_id=uuid)

    return result

def _build_instance_get(context, session=None, columns_to_join=None):
    query = model_query(context, models.Instance, session=session,
                        project_only=True).\
            options(joinedload_all('security_groups.rules')).\
            options(joinedload('info_cache'))
    if columns_to_join is None:
        columns_to_join = ['metadata', 'system_metadata']
    for column in columns_to_join:
        query = query.options(joinedload(column))
    #NOTE(alaski) Stop lazy loading of columns not needed.
    for col in ['metadata', 'system_metadata']:
        if col not in columns_to_join:
            query = query.options(noload(col))
    return query

This is more complicated than it needs to be, but basically we’ve got the instance_get_by_uuid() method at the top, which calls a couple of helpers below to build the SQLAlchemy query that actually hits the database. This is the interface that is used all over nova-api to fetch an instance object from the database by UUID, and used to be used in nova-compute to do the same. In Grizzly, we introduced a new service called nova-conductor, which took on the job of proxying access to these database interfaces over RPC so that services like nova-compute could be isolated from the database. That means we got a new set of versioned RPC interfaces such as the one mirroring the above in nova/conductor/api.py:

def instance_get_by_uuid(self, context, instance_uuid,
                         columns_to_join=None):
    return self._manager.instance_get_by_uuid(context, instance_uuid,
                                              columns_to_join)

I’ll spare you the details, but this turns into an RPC call to the nova-conductor service, which in turn makes the DB API call above, serializes and returns the result. This was a big win in terms of security in that the least-trusted nova-compute services weren’t able to talk directly to the database, and potentially also brought scalability benefits of not having every compute node hold a connection to the database server. However, it meant that we had to add a new API to conductor for every database API, and while those were versioned, it didn’t really solve our problem with versioning the actual data format of what gets returned from those calls.

What we really want is everything using the same interface to talk to the database, whether it can go direct or is required to make an RPC trip. Ideally, services that can talk to the database and those that can’t should be able to pass objects they retrieved from the database to each other over RPC without a lot of fuss. When nova-api pulls an object with the first interface above and wants to pass it to nova-compute which is required to use the second, a horrific serialization process must take place to enable that to happen.

Enter the Unified Object Model. It does all of the above and more. It even makes coffee. (okay, it doesn’t make coffee — yet).

Continued in Part 2.

Category(s): OpenStack
Tags: , , , , , , ,

Leave a Reply

Your email address will not be published. Required fields are marked *

 

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>