Don’t touch my Schiit!

As a work-from-home-er and a music fan, having good audio in my home office is important. Until recently, I’ve used my laptop to feed a two-channel amp and some decent speakers. A large collection of digital audio, combined with MPD, means I basically have a personal radio station with no commercials that only plays stuff I like. I just turn on the amp when I want to hear it, and turn it off when I don’t. It’s awesome.

The main downside to this setup is that if I want to watch a YouTube video on my laptop, or do anything else that requires sound, I have to turn on the amp, pause the music, and unpause when I’m done. Something dedicated would be nice. I set out to build something like volumio, which turns a Raspberry Pi into a dedicated audio device. Since the audio out on the Pi is really sub-standard, I decided to step up my game. So, I bought a Schiit Modi.

This thing is beautiful and highly regarded as a top-notch async USB DAC at the (relatively low) $100 price point. That’s not too much to ask for a well-designed, beautiful piece of audio gear, even if it is just a USB sound card.

Unfortunately, no matter what I do, I can’t seem to get it to play nice with the Pi. Occasional skips and pops plague the audio, despite trying all the tricks for getting it to work. Since a regular cheapo USB sound card works fine on the Pi, I was suspicious of the Modi. Feeding it from my laptop works fine and the audio is fantastic. However, whilst plugging and unplugging during all my testing, I encountered a nasty flaw.

If I walked across the room and then touched the beautiful metal case of the Modi, the audio would mute and slowly fade back. Occasionally, the Modi would actually jump off the USB bus entirely, and when it came back, audio would be distorted until I power cycled it. Being also into RF-related toys, I recognized this as some sort of static sensitivity, which often means design issues around grounding. Not cool.

Given that I was having so much trouble with my Modi and the Pi (even though others seem okay with the combination), I decided to email Schiit and ask if the static issue was expected, thinking maybe I had a defective unit. To my horror, I got this short response from “Nick T”:

Yep, Modi can be static-sensitive.
The solution: avoid touching it.

I was more than a little surprised. Don’t touch it? Now, I can imagine some situations where static could be affecting the audio path, but the USB side should be totally solid. I can’t think of any reason why jumping off the USB bus is reasonable. Imagine if your printer or external disk did that! I replied and made sure I was clear about the USB side of the issue. Again, I got a short response from Nick:

Yes, it can be static-sensitive in some systems.
The solution: move it where you won’t touch it.

Okay, Nick, I get it. The thing is so beautiful, it needs to be in a glass case. Form over function, right? No thanks, this Schiit is going back.

Schiit charges a 15% restocking fee on the Modi, so by the time I pay for the original shipping and return shipping, I’ll have paid for over a quarter of the device itself. That’s okay, maybe these guys can use the extra money to include some star washers in future products (that’s a grounding joke).


A Manual Tune Button for the LDG AT-7000

The LDG AT-7000 automatic tuner is very popular among Icom HF radio owners, especially those with IC-7000 and IC-706 radios. It’s small, simple, and works really well. The problem is, it’s so simple that it has no controls of its own. Since it depends on the radio to tell it when to tune or go into bypass, you can’t use it with any radio that doesn’t support an AH-4 tuner. I have mostly Icom radios, so this is not a problem, but occasionally I want to be able to use the thing with another brand, which isn’t possible. Or is it?

The AH-4 tuning control signals from the radio are super simple, which means it’s pretty easy to put a manual tuning button on the device. If you provide it power somehow, it suddenly becomes a versatile works-with-anything tuner. Here is my finished product:

[Photo IMG_6435: the finished tuner with the new manual tune button]

I started at Surplus Gizmos to find a momentary button. I wanted something that would be small and unobtrusive, as well as look like it belonged there in the first place. As usual, they had the perfect thing. It’s a small momentary pushbutton that uses just a thin black plunger, requiring only a small hole to be drilled in the case.

Disassembly of the AT-7000 starts easily, but quickly becomes a chore with lots of little nuts holding connectors to the chassis. Every single one has a star washer, so be sure not to lose any of these as they’re important for good bonds between the connectors and the chassis. Specifically, this last one on the SO-239 for the radio hits a diode before coming all the way off:

[Photo IMG_6430: the nut on the radio-side SO-239 that hits a diode]

To get it out, slide the board partially out of the chassis, which moves the diode just enough to pull the nut off and free it from the case. On the other side, the connections to the mini-DIN are accessible. The tune button needs to connect the “start” pin to ground in order to control the thing, so solder a jumper across the back of these pins:

[Photo IMG_6428: the jumper on the back of the mini-DIN pins]

While the board is out, the hole for the button should be drilled to avoid getting filings everywhere. Once that is done, the board can go back in the chassis and all the tiny fittings can be re-mounted. I found plenty of space to mount my button securely in the back left corner above the radio control connector, which keeps the jumpers short:

[Photo IMG_6434: the button mounted in the back left corner, above the radio control connector]

At this point, the cover can go back on the case. The only remaining thing that needs to be done is to build a cable for the tuner so that you can power it without plugging it into an Icom radio. The connector is the same as a PS/2 connector, so cutting one off of an old device is the easiest way. By the pinout in the manual, pins 1 and 4 are +12V and Ground respectively.

Controlling the tuner is basically the same as it is when connected to an Icom radio, except that the radio doesn’t automatically transmit for you during the process. Pressing the button in various patterns controls the mode:

  • Less than 500ms: Bypass mode
  • Between 500ms and 2500ms: Memory tune
  • More than 2500ms: Full tune
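
For the curious, these press patterns are simple enough to drive from software. Here is a rough sketch of what that might look like from a Raspberry Pi GPIO pin, assuming you wired a transistor across the button; none of this is part of the mod above, and the pin number is made up:

import time
import RPi.GPIO as GPIO

TUNE_PIN = 17  # hypothetical GPIO pin driving a transistor across the button

GPIO.setmode(GPIO.BCM)
GPIO.setup(TUNE_PIN, GPIO.OUT, initial=GPIO.LOW)

def press(seconds):
    # Close the "button" for the given duration, then release it
    GPIO.output(TUNE_PIN, GPIO.HIGH)
    time.sleep(seconds)
    GPIO.output(TUNE_PIN, GPIO.LOW)

press(0.25)  # less than 500ms: bypass mode
# press(1.5) for a memory tune, press(3.0) for a full tune

GPIO.cleanup()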

You have to get the radio to transmit a carrier at a few watts in order for the tuner to have a signal to work with, so put your radio in CW, RTTY, FM, or AM mode and hold down PTT while triggering the appropriate tune cycle. Keep the carrier up until the tuning has finished.

The added benefit of this tune button is that you can now reset the tuner’s memory without taking the case off. Holding down the button for a few seconds while powering up the tuner will erase all the stored tune memories.


A brief overview of Nova’s new object model (Part 3)

In parts one and two, I talked about the reasoning for developing an object model inside of Nova, and showed a sample implementation for a toy object. In this part, I will examine pieces of some “real” objects that are currently under development in the Nova tree.

The biggest object in Nova is (and probably always will be) Instance. It’s fairly complicated though, so let’s start with something a little simpler, such as the SecurityGroup object. Here is the field definition:

class SecurityGroup(base.NovaObject):
    fields = {
        'id': int,
        'name': str,
        'description': str,
        'user_id': str,
        'project_id': str,
    }

There is an integral ID and a few strings, so it is pretty simple. There are two ways to query for these objects, by ID or by name:

@base.remotable_classmethod
def get(cls, context, secgroup_id):
    db_secgroup = db.security_group_get(context, secgroup_id)
    return cls._from_db_object(cls(), db_secgroup)

@base.remotable_classmethod
def get_by_name(cls, context, project_id, group_name):
    db_secgroup = db.security_group_get_by_name(context,
                                                project_id,
                                                group_name)
    return cls._from_db_object(cls(), db_secgroup)

Both of these methods follow a pattern common to many of the other objects: query the database for the SQLAlchemy model, then pass it to a generic function (not shown here) that constructs the new object. Both are decorated with remotable_classmethod, which makes them callable at the class level, either directly or across RPC. Querying for a security group would look something like this:

from nova.objects import security_group
secgroup = security_group.SecurityGroup.get(context, 1234)
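
The _from_db_object() helper isn’t shown here, but conceptually it just copies each declared field from the database model into the object and resets the change tracking. A rough, hypothetical sketch of the idea (not the actual Nova implementation):

@staticmethod
def _from_db_object(secgroup, db_secgroup):
    # Copy each declared field from the database model into the object
    for field in SecurityGroup.fields:
        setattr(secgroup, field, db_secgroup[field])
    # Fresh from the database, so nothing is dirty
    secgroup.obj_reset_changes()
    return secgroup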

Unlike the fictitious example in Part 2, there is another way to query for security group objects: as a collection, based on some common attribute (often the project ID, for example). The objects framework provides a way to easily define an object that is a list of objects, such that the list can be queried directly or over RPC in the same way, and such that the list itself has inbuilt serialization, which handles serializing the objects contained within. See the SecurityGroupList object:

class SecurityGroupList(base.ObjectListBase, base.NovaObject):
    @base.remotable_classmethod
    def get_all(cls, context):
        return _make_secgroup_list(
            context, cls(),
            db.security_group_get_all(context))

    @base.remotable_classmethod
    def get_by_project(cls, context, project_id):
        return _make_secgroup_list(
            context, cls(),
            db.security_group_get_by_project(
                context, project_id))

    @base.remotable_classmethod
    def get_by_instance(cls, context, instance):
        return _make_secgroup_list(
            context, cls(),
            db.security_group_get_by_instance(
                context, instance.uuid))

The first line shows that this special object is not only a NovaObject, but also an ObjectListBase, which provides the special list behavior. Note that the order of inheritance is important: the base classes must appear in the order shown.

The ObjectListBase definition assumes a single field of “objects” and handles typical list-like behaviors, such as iterating over the things in the objects field and membership (i.e. “in”) tests. Thus, all you need to do is fill out the “foo.objects” list, like the _make_secgroup_list() helper function does:

def _make_secgroup_list(context, secgroup_list, db_secgroup_list):
    secgroup_list.objects = []
    for db_secgroup in db_secgroup_list:
        secgroup = SecurityGroup._from_db_object(
            SecurityGroup(), db_secgroup)
        secgroup._context = context
        secgroup_list.objects.append(secgroup)
    secgroup_list.obj_reset_changes()
    return secgroup_list

This method simply populates the “objects” list of the SecurityGroupList object with SecurityGroup objects it constructs from the raw database models provided. It uses the same _from_db_object() helper method as the SecurityGroup object itself. You can use the result of this just like a real list:

secgroups = security_group.SecurityGroupList.get_all(context)
for secgroup in secgroups:
    print secgroup.name

The massive Instance object is similar to what we’ve seen in the previous examples and in the SecurityGroup example above. There is a base Instance object, and an InstanceList object to provide an implementation for all the ways we can query for multiple instances at once. It’s too big to show here, but here is a subset of the field definition:

class Instance(base.NovaObject):
    fields = {
        'id': int,
        'user_id': obj_utils.str_or_none,
        'project_id': obj_utils.str_or_none,
        'launch_index': obj_utils.int_or_none,
        'scheduled_at': obj_utils.datetime_or_str_or_none,
        'launched_at': obj_utils.datetime_or_str_or_none,
        'terminated_at': obj_utils.datetime_or_str_or_none,
        'locked': bool,
        'access_ip_v4': obj_utils.ip_or_none(4),
        'access_ip_v6': obj_utils.ip_or_none(6),
        # ...
    }

Finally, an object with some interesting fields! We see the usual integral ID field at the top, but notice that most of the other fields use “or none” helpers from the utils module. Since many of the fields in the instance can be empty (nullable=True in the database definition), we need to handle “either a string or None” in cases such as user_id. The utils module provides some helpers for the datetime and IP address fields, which return datetime.datetime and netaddr.IPAddress objects respectively. Just like the int and str type functions, these take a string and convert it into the complex type when someone does something like this:

inst = instance.Instance()
inst.access_ip_v4 = '1.2.3.4'  # Stored as netaddr.IPAddress('1.2.3.4')
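
As an illustration, a helper like ip_or_none() could be built as a closure over the IP version. This is just a sketch of the idea, not necessarily what the utils module actually does:

import netaddr

def ip_or_none(version):
    # Return a type function that coerces strings to IPAddress,
    # but passes None through untouched
    def validator(value):
        if value is None:
            return None
        return netaddr.IPAddress(value, version=version)
    return validator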

These fields with complicated data types bring us to our first concrete example of something needing special handling during serialization and deserialization. The Instance object contains methods like _attr_scheduled_at_to_primitive() and _attr_scheduled_at_from_primitive() that handle converting the datetime objects to and from strings properly. Handlers (and handler-builders) for these types are provided in the utils module. The IP address fields provide a useful example for illustration, such as this serialization method for the IPv4 address:

    def _attr_access_ip_v4_to_primitive(self):
        if self.access_ip_v4 is not None:
            return str(self.access_ip_v4)
        else:
            return None

This gets called by the object’s serialization method when it encounters the complex IPv4 address field. Although not obvious to the layer above us, the netaddr.IPAddress object can serialize itself through simple string coercion, so we do just that. However, since the field could be None, we have to be careful not to coerce that case, which would yield the string “None” instead of None itself. Luckily, we need no special deserialization, because the result of the above string coercion is sufficient to pass to the field’s type function, which the deserialization routine will try if no special handler is provided.

In a subsequent part, I will talk about advanced topics like lazy-loading, versioning, and object nesting.


A brief overview of Nova’s new object model (Part 2)

In Part 1, I described the problems that the Unified Object Model aims to solve within Nova. Next, I’ll describe how the behind-the-scenes infrastructure achieves some of the magic that makes it easier for developers to implement their objects.

The first concept to understand is the registry of objects. This registry contains a database of objects that we know about, and for each, what data is contained within and what methods are implemented. In Nova, simply inheriting from the NovaObject base class registers your object through some metaclass magic:

class NovaObject(object):
    """Base class and object factory.

    This forms the base of all objects that can be remoted or instantiated
    via RPC. Simply defining a class that inherits from this base class
    will make it remotely instantiatable. Objects should implement the
    necessary "get" classmethod routines as well as "save" object methods
    as appropriate.
    """
    __metaclass__ = NovaObjectMetaclass

In order to make your object useful, you need to do a few other things in most cases:

  1. Declare the data fields and their types
  2. Provide serialization and de-serialization routines for any non-primitive fields
  3. Provide classmethods to query for your object
  4. Provide a save() method to write your changes back to the database

Notice that nowhere in the list is “provide an RPC API”. That’s one of the many magical powers that you get for free, simply by inheriting from NovaObject and registering your object.

To declare your fields, you need something like the following in your class:

fields = {'foo': int,
          'bar': str,
          'baz': my_other_type_fn,
         }

This magic description of your fields declares their names and the data types they should have. The key of each pair is, of course, the field name, and the value is a function that can coerce data into the proper format and/or raise an exception if that is not possible. Thus, if I set the “foo” attribute to the string “1”, the integer 1 will actually be stored. If I try to store the string “abc” into the same attribute, I’ll get a ValueError, as you would expect.
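
To make that concrete, here is roughly how the coercion behaves, using the fictitious fields above:

obj = MyObj()
obj.foo = '1'    # coerced by int(); the integer 1 is what gets stored
print obj.foo    # prints 1
obj.bar = 123    # coerced by str(); stored as '123'
obj.foo = 'abc'  # int('abc') raises ValueError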

The next step is (de-)serialization routines for our attributes. Our “foo” and “bar” attributes are primitives, so we can ignore those, but our “baz” attribute is presumably something more complex, which requires a little more careful handling. So, we define a couple of specially-named methods in our object, which will be called when serialization or de-serialization of that attribute is required:

def _attr_baz_from_primitive(self, value):
    return somehow_deserialize_this(value) # Do something smart

def _attr_baz_to_primitive(self):
    return somehow_serialize_this(self.baz) # Do something smart

Now that our object has a data format and the ability to (de-)serialize itself, we probably need some methods to query the object. Assuming our “foo” attribute is a unique key that we can query by, we will define the following query method:

@remotable_classmethod
def get_by_foo(cls, context, foo):
    # Query the underlying database
    data = query_database_by_foo(foo)

    # Create an instance of our object
    obj = cls()
    obj.foo = data['foo']
    obj.bar = data['bar']
    obj.baz = data['baz']

    # Reset the dirty flags so the caller sees this as clean
    obj.obj_reset_changes()

    return obj

The above example papers over the part about querying the database. Right now, the object implementations in Nova use the old DB API to do this part, but eventually, the dirty work could reside here in the object methods themselves.

Now, there is some magic here. If I am inside of nova-api (or some other part of nova with direct access to the database) and I call the above classmethod, the decorator is a no-op and the code within the method runs as you would expect: it queries the database and returns the resulting object. If, however, I am in nova-compute and I call the same method, the decorator actually remotes the call through conductor, executes the method there, and returns the result to me over RPC. Either way, the use of the object is exactly the same:

obj = MyObj.get_by_foo(context, 123)
print obj.foo # Prints 123
print obj.bar # Prints the value of bar
# etc...

Now, before we’re done, we need to make sure that changes to our object can be put back into the database. Since a “save” happens on an instance, we define a regular instance method, but decorate it as “remotable” like this:

@remotable
def save(self, context):
    # Iterate through items that have changed
    updates = {}
    for field in self.obj_what_changed():
        updates[field] = getattr(self, field)

    # Actually save them to the database
    save_things_to_db_by_foo(self.foo, updates)

    # Reset the changes so that the object is clean now
    self.obj_reset_changes()

This implementation checks to see which of the attributes of the object have been modified, constructs a dictionary of changes, and calls a database method to update those values. This pattern is very common in Nova and should be recognizable by people used to using DB API methods.

Now that we have all of these things built into our object, we can use it from anywhere in nova like this:

obj = MyObj.get_by_foo(context, 123)
obj.bar = 'hey, this is neat!'
obj.save()

One more bit of magic to note is the “sticky context”. Since you queried the object with a context, the object hides that context within itself so that you don’t have to provide it to the save() method (or any other instance methods) for the lifetime of the object. You can, of course, pass a different context to save() if you need to for some reason, but if you don’t, it will use the one you queried it with.
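
In other words, with the same fictitious object as before (other_context here is just some other context you might have around):

obj = MyObj.get_by_foo(context, 123)  # the context is stashed on the object
obj.bar = 'updated'
obj.save()               # no context needed; the stashed one is used
obj.save(other_context)  # ...but you can still pass one explicitly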

Nifty, huh? In Part 3, I will break from the world of fictitious objects and examine a real one that is already in the Nova tree, as well as fill out some of the other implementation details required.


A brief overview of Nova’s new object model (Part 1)

As discussed at the Havana summit, I have been working with Chris Behrens (and others) on the unified-object-model blueprint for Nova. The core bits of it made their way into the tree a while ago, and work is underway to implement the Instance object and convert existing code to use it. This unifies the direct-to-database query methods, as well as the mirrored conductor RPC interfaces, into a single versioned, object-oriented API. It aims to address a few problems for us:

  1. Letting SQLAlchemy objects escape the DB API layer has caused us a lot of problems because they can’t be sent over RPC efficiently. The new object model is self-serializing.
  2. Objects in the database aren’t versioned (although the schema itself is). This means that sending a primitive representation of it over RPC runs the risk of old code breaking on new schema, or vice versa. The new object model is versioned for both interface methods and data format.
  3. Database isolation (no-db-compute) results in mirroring a bunch of non-OO interfaces in nova-conductor for use by isolated services like nova-compute. The new object model entirely hides the fact that object operations may be going direct or over RPC to achieve the desired result.

Hopefully the first two items above are fairly obvious, but the third may deserve a little explanation. Currently, we have things in nova/db/sqlalchemy/api.py like the following:

@require_context
def instance_get_by_uuid(context, uuid, columns_to_join=None):
    return _instance_get_by_uuid(context, uuid,
            columns_to_join=columns_to_join)

@require_context
def _instance_get_by_uuid(context, uuid, session=None, columns_to_join=None):
    result = _build_instance_get(context, session=session,
                                 columns_to_join=columns_to_join).\
                filter_by(uuid=uuid).\
                first()

    if not result:
        raise exception.InstanceNotFound(instance_id=uuid)

    return result

def _build_instance_get(context, session=None, columns_to_join=None):
    query = model_query(context, models.Instance, session=session,
                        project_only=True).\
            options(joinedload_all('security_groups.rules')).\
            options(joinedload('info_cache'))
    if columns_to_join is None:
        columns_to_join = ['metadata', 'system_metadata']
    for column in columns_to_join:
        query = query.options(joinedload(column))
    #NOTE(alaski) Stop lazy loading of columns not needed.
    for col in ['metadata', 'system_metadata']:
        if col not in columns_to_join:
            query = query.options(noload(col))
    return query

This is more complicated than it needs to be, but basically we’ve got the instance_get_by_uuid() method at the top, which calls a couple of helpers below to build the SQLAlchemy query that actually hits the database. This is the interface that is used all over nova-api to fetch an instance object from the database by UUID, and used to be used in nova-compute to do the same. In Grizzly, we introduced a new service called nova-conductor, which took on the job of proxying access to these database interfaces over RPC so that services like nova-compute could be isolated from the database. That means we got a new set of versioned RPC interfaces such as the one mirroring the above in nova/conductor/api.py:

def instance_get_by_uuid(self, context, instance_uuid,
                         columns_to_join=None):
    return self._manager.instance_get_by_uuid(context, instance_uuid,
                                              columns_to_join)

I’ll spare you the details, but this turns into an RPC call to the nova-conductor service, which in turn makes the DB API call above, serializes and returns the result. This was a big win in terms of security in that the least-trusted nova-compute services weren’t able to talk directly to the database, and potentially also brought scalability benefits of not having every compute node hold a connection to the database server. However, it meant that we had to add a new API to conductor for every database API, and while those were versioned, it didn’t really solve our problem with versioning the actual data format of what gets returned from those calls.

What we really want is everything using the same interface to talk to the database, whether it can go direct or is required to make an RPC trip. Ideally, services that can talk to the database and those that can’t should be able to pass objects they retrieved from the database to each other over RPC without a lot of fuss. When nova-api pulls an object with the first interface above and wants to pass it to nova-compute which is required to use the second, a horrific serialization process must take place to enable that to happen.

Enter the Unified Object Model. It does all of the above and more. It even makes coffee. (okay, it doesn’t make coffee — yet).

Continued in Part 2.


A tool for watching Zuul and Jenkins

In my work on OpenStack Nova, I often have multiple patches in flight somewhere on the CI system. When patches are first submitted (or resubmitted), they go into Zuul’s “check” queue for a first pass of the tests. After a patch is approved, it goes into the “gate” queue, which is a serialized merge process across all the projects. Keeping track of one’s patches as they flow through the system can be done simply by waiting for Jenkins to report the job results back into Gerrit, and/or for the resulting email notification.

I like to keep close watch on my patches, both to know when they’re close to merging, and to know early when they’re failing a test. Catching a failure early and pushing a fix kills the job currently in progress and starts over with the new patch, which is a more efficient use of resources and lowers the total amount of time before Jenkins votes on the patch.

Since Zuul provides information about what’s going on, you can go to the status page and see all the queues, jobs, etc. The problem with this is that the information from Gerrit (specifically, owner and commit title) isn’t merged with the view, making it hard to find your patch in a sea of competing ones.

To make this a little easier on the eyes, I wrote a very hacky text “dashboard” that merges the information from Gerrit and Zuul, and provides a periodically-refreshed view of what is going on. After contributions and ideas from several other folks, it now supports things like watching an entire project, as well as your own contributions, your own starred reviews, etc. Here is what it looked like at one point on the day of this writing:

[Screenshot Selection_021: the dashboard in action]

The above was generated with the following command:

python dash.py -u danms -p openstack/nova -r 30 -s -O OR -o danms

Basically, the above says: “Show me any patches owned by danms, or in the project openstack/nova, or starred by danms, refreshed every 30 seconds”. This provides me a nice dashboard of everything going on in Nova, with my own patches highlighted for easier viewing.

Patches of my own are highlighted in green, unless they’re already failing some tests, in which case they’re red. If they are in the gate queue and dependent on something that is also failing tests, they will be yellow (meaning: maybe failing, depending on where the failure was introduced).

You can see the gate queue at the top, which has fifteen items in it, seven of which match the current set of view filters, as well as the jobs and their queue positions. Below that is the (unordered) check queue, which has 58 items in it. Each job shows the review number, revision number, title, time-in-queue, and the percentage of test jobs that have finished running. Note that since some jobs take much longer than others, the completion percentage doesn’t climb linearly throughout the life of the job.

The dashboard will also provide a little bit of information about Zuul’s status, when appropriate, such as when it enters queue-only mode prior to a restart, or is getting behind on processing events. This helps quickly identify why a patch might have been waiting for a long time without a vote.

If you’re interested in using the dashboard, you can get the code on github.


Dan’s Partial Summary of the Nova Track

Last week, OpenStack developers met in Portland, OR for the Havana Design Summit. Being mostly focused on Nova development, I spent almost all of my time in that track. Below are some observations not yet covered by other folks.

Baremetal

After we worked hard to get the baremetal driver landed in Nova for the Grizzly release, it looks like the path forward is actually to kick it out to a separate project. Living entirely underneath the nova/virt hierarchy brings some challenges with it, and those were certainly felt by developers and reviewers while trying to get all of that code merged in the first place. The consensus in the room seemed to be that baremetal (as it is today) will remain in Havana, but be deprecated, and then removed in the “I” release. This will give deployers time to plan their migration. The virt driver will become a small client of the new service, hopefully reducing the complexity that has to remain in Nova itself.

Live Upgrade

Almost the entire first day was dedicated to the idea of “Live Upgrade” or “Rolling Upgrade”. As OpenStack deployments get larger and more complicated, the need to avoid downtime while upgrading the code becomes very important. The discussions on Monday circled around how we can make that happen in Nova.

One of the critical decisions that came out of those discussions was the need for a richer object format within Nova, and one that can be easily passed over RPC between the various sub-components. In Grizzly, as we moved away from direct database access for much of Nova, we started converting any and all objects to Python primitives. This brought with it a large and inefficient function to convert rich objects to primitives in a general way, and also mostly eliminated the ability to lazy-load additional data from those objects if needed. Further, the structure of the primitives was entirely dependent on the database schema, which is a problem for live upgrade as older nodes may not understand newer schema.

Once we have smarter objects that could potentially insulate the components from the actual database schema, we need to have the ability for the services to speak an older version of the actual RPC protocol until all the components have been upgraded. We’ve had backwards compatibility in the RPC server ends for a while, but being able to clamp to the lowest common version is important for making the transition graceful.

Moving State From Compute to Conductor

Another enemy of a graceful upgrade process is state contained on the compute nodes. Likely the biggest example of this is the various resize and migration tasks that are tracked by nova-compute. Since these are user-initiated and often require user input to finish, it’s likely that any real upgrade will need to gracefully handle situations where these operations are in progress. Further, for various reasons, there are several independent code paths in nova-compute that all accomplish the same basic thing in different ways. The “offline” resize/migrate operations follow a different path from the “live” migrate function, which is also different from the post-failure rebuild/evacuate operation.

Most everyone in the room agreed that the various migrate-related operations needed to be cleaned up and refactored to share as much code as possible, while still achieving the desired result. Further, the obvious choice of moving the orchestration of these processes to conductor provides a good opportunity to start fresh in the pursuit of that goal. This also provides an opportunity to move state out of the compute nodes (of which there are many) to the conductor (of which there are relatively few).

Since nova-conductor will likely house this critical function in the future, the question of how to deal with the fact that it is currently optional in Grizzly came up. Due to a bug in eventlet which can result in a deadlock under load, it is not feasible for many large installations to make the leap just yet. However, expecting that the issue will be resolved before Havana, it may be possible to promote nova-conductor to “not optional” status by then.

Virt Drivers

There was a lot of activity around new and updated virtualization drivers for Nova over the course of the week. There was good involvement from VMware surrounding their driver, both in terms of feature parity with other drivers, as well as new features such as exposing support for clustered resources as Nova host aggregates.

The Hyper-V session was similar, laying out plans to support new virtual disk formats and operations, as well as more complicated HA-related operations, similar to those of VMware.

The final session on the last day was a presentation by some folks at HP that had a proof-of-concept implementation of an oVirt driver for OpenStack. It sounded like this could provide an interesting migration path for folks that have existing oVirt resources and applications dependent on the “Pet VM” strategy to move gracefully to OpenStack.


All your DB are belong to conductor

Well, it’s done. Hopefully.

Over the last year, Nova has had a goal of removing direct database access from nova-compute. This has a lot of advantages, especially around security and rolling upgrade abilities, but also brings some complexity and change. Much of this is made possible by utilizing the new nova-conductor service to proxy requests to the database over RPC on behalf of components that are not allowed to talk to the database directly. I authored many of the changes to either use conductor to access the database, or refactor things to not require it at all. I also had the distinct honor of committing the final patch to functionally disable the database module within the compute service. This will help ensure that folks doing testing between Grizzly-3 and the release will hit a reasonable (and reportable) error message, even if their compute nodes still have access to the database.

Security-wise, nova-compute nodes are the most likely targets for any sort of attack, since they run the untrusted customer workloads. Escaping from a VM or compromising one of the services that runs there previously meant full access to the database, and thus the cluster. By removing the ability (and need) to connect directly to the database, it is significantly easier for an administrator to limit the exposure caused by a compromised compute node. In the future, the gain realized from things like trusted RPC messaging will be even greater, as access to information about individual instances from a given host can be limited by conductor on a need-to-know basis.

From an upgrade point of view, decoupling nova-compute from the database also decouples it from the schema. That means that rolling upgrades can be supported through RPC API versioning without worrying about old code accessing new database schemas directly. No additional modeling is added between the database and the compute nodes, but having the RPC layer there gives us a much better way to provide a stable interface between N and N+1 versions.

Of course, neither of the above points implies that your cluster is now secure, or that you can safely do a rolling upgrade from Folsom to Grizzly or Grizzly to Havana. This no-db-compute milestone is one (major) step along the path to enabling both, but there’s still plenty of work to do. Since Nova is large and complex, there is also no guarantee that all the direct database accesses have been removed. Since we recently started gating on full Tempest runs, the fact that the disabling patch passed all the tests is a really good sign. However, it is entirely likely that a few more things needing attention will shake out of the testing that folks will do between Grizzly-3 and the release.

Let the bug reporting commence!


Managing ECX 2013 Logistics with Drupal

If you know me, you know that one of my favorite events each year is the Eagle Cap Extreme Sled Dog Race. No, I’m not a big fan of dogs, or sleds, but when you put the two together, you get a really fun annual event in the wilderness of Eastern Oregon. The race runs 200 miles through the Wallowa Mountains near Joseph, OR, and is far from any commercial communications infrastructure. Each year, I go to great effort and expense to travel to the other side of the state with lots of gear and help the other hams provide excellent communications facilities in the woods for a few days, where there would otherwise be none.

The communications team is headed up by an excellent guy with finely-honed organizational skills suitable for running a group responsible for life-safety operations like this. Last year, we discussed a way to make things better, by logging all events and personnel in an electronic system. This would provide a digital record of the entire race, as well as a way to display more in-depth information to the administrative folks at HQ, and potentially to the spectating public. The net control and administrative folks run the race from the community center in Joseph, OR, which has commercial power, heat, and an internet connection, so an electronic system like this is possible, as long as it doesn’t become a liability.

We settled on a Drupal-based system, which could be made to provide almost all of what we needed out of the box with things like Views and CCK. Being web-based meant that it was easy to access from multiple devices, and to collaborate on the design and implementation ahead of time. The only non-standard thing we really needed was facilitated by a small module I wrote to provide some additional fields to a few Views queries.

We smoke-tested this system over the summer at the Hells Canyon Relay Race, a shorter and slightly less complicated event, but one with many of the same challenges and requirements as ECX. The goal was to have the system accessible in two ways:

  1. The net control folks had to have everything local in the building. This was Eastern Oregon, not Manhattan, and internet access reliable enough to depend on for something like this was not available. To provide this, we ran the MySQL and Apache/PHP servers on a Linux laptop, with a local web browser. This allowed someone to sit at the laptop and operate as an island, if necessary, but also for other laptops in the room to connect to the system as well.
  2. Anyone outside the room that needed access to the system connected to one of my colocated servers to do so. This machine received replication updates from the master copy of the database on the laptop server to keep it in sync, and was marked read-only to avoid anyone inserting something that net control wouldn’t be able to see.

This provided a reasonably robust setup, avoiding the need for external folks to come into the system over the temporary internet connection to the net control building. I used a persistent SSH tunnel to the external server from the laptop, which allowed MySQL traffic in both directions if necessary.

Of course, during the HCR event, the net control station’s internet connection never went down, but the folks working there wouldn’t have noticed if it had, since they were working on a local copy of the data at all times. Organizationally, the system was a big success, and things looked good for its use in ECX this year. There was only one concern: what if the net control building got hit by a missile and someone external needed to take over net control responsibilities and start modifying the data on the external server? Since that copy was marked read-only, this wouldn’t be possible.

For ECX, I’ve now set things up with MySQL multi-master replication. This allows both copies to be writable in both places, effectively allowing either to become an island if necessary. As long as the two systems can see each other, anything added to one is also added to the other. If they become separated for a period of time, they’re still functional, and they sync back up as soon as they’re able to talk again. While this is rather nightmarish for a bank or stock market, it’s actually exactly how we want the system to behave in our scenario.
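
For reference, the relevant my.cnf pieces for a two-master setup look something like this (the server IDs and log names are illustrative, not the actual ECX configuration):

# On the net control laptop:
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
auto-increment-increment = 2   # each master uses every other key...
auto-increment-offset    = 1   # ...starting from a different offset

# On the colocated server:
[mysqld]
server-id                = 2
log-bin                  = mysql-bin
auto-increment-increment = 2
auto-increment-offset    = 2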

ECX 2013 is only a few weeks away, so we’ll be running this at full scale pretty soon!


9600 baud packet on a Kenwood TK-840

The Kenwood TK-840 is a nice commercial UHF radio that is starting to go for $50-$100 on eBay because it is not narrow-band capable. It is happy in the ham bands, has a good screen and excellent rubber-covered buttons, and is quite small and rugged.

While not frequency-agile or field-programmable, it is more than adequate for a fixed installation, such as a remote base or digital mode transceiver. However, not much is available “out there” on how to interface it to a high-speed TNC. While you could use the well-documented mic and speaker jacks for 1200 baud, 9600 baud and faster require low-level access to the radio’s internals.

This rig is similar to (but much newer than) the oft-used Kenwood TK-805, for which documents about general interfacing are available. One commonly-circulated document only describes the high-level audio connections, which aren’t suitable for high-speed stuff. However, you can follow its instructions to remove the speaker jack, jumper the proper traces to enable the internal speaker, and route a cable through the resulting hole in the case for interfacing.

The service manual can be found on repeater-builder, which shows the various boards and the signals on each of the inter-board connectors. In order to make high-speed packet work, you need access to the modulator for TX audio, the detector output for RX audio, ground, and of course PTT to transmit. In the manual, these signals are listed as DI (external modulator input), DEO (detector output), E (earth) and PTT respectively. If you want to power your TNC from the radio, you also need SB (switched battery).

On the main TX/RX board of the radio, on the left side (if facing the front panel), there is a small group of three connectors, two small and one large eight pin socket labeled CN2. The pins on the large connector are numbered from right to left, with the right-most pin being #1 and the left-most being #8. DEO is pin 1, DI is pin 4, and PTT is pin 7.

Since the pins aren’t exposed on the bottom side of the board, I carefully soldered to the top of each as they leave the board and enter the socket. It takes a steady hand and a good eye, as these pins are tiny. The nice thing about the older TK-805 is that all the components are larger and easier to solder to.

To the left of CN2 (above, in the picture) is the external alarm socket, which contains labeled pins for E (ground) and SB (switched battery). I soldered to the top of each pin here to gain access.

With everything buttoned up, I adjusted the TNC for the appropriate amount of drive to get about 3kHz of deviation. This took quite a bit of drive compared to the amateur radio I had been using with the same TNC for testing, but the Kantronics KPC-9612+ has plenty of oomph to accomplish the task. The radio appears to perform quite well with minimal additional tweaking.
