A brief overview of Nova’s new object model (Part 3)

In parts one and two, I talked about the reasoning for developing an object model inside of Nova, and showed a sample implementation for a toy object. In this part, I will examine parts of some “real” objects that are currently under development in the Nova tree.

The biggest object in Nova is (and probably always will be) Instance. It’s fairly complicated though, so let’s start with something a little simpler, such as the SecurityGroup object. Here is the field definition:

class SecurityGroup(base.NovaObject):
    fields = {
        'id': int,
        'name': str,
        'description': str,
        'user_id': str,
        'project_id': str,
        }

There is an integral ID and a few strings, so it is pretty simple. There are two ways to query for those objects, by name or by ID:

@base.remotable_classmethod
def get(cls, context, secgroup_id):
    db_secgroup = db.security_group_get(context, secgroup_id)
    return cls._from_db_object(cls(), db_secgroup)

@base.remotable_classmethod
def get_by_name(cls, context, project_id, group_name):
    db_secgroup = db.security_group_get_by_name(context,
                                                project_id,
                                                group_name)
    return cls._from_db_object(cls(), db_secgroup)

Both of these methods use the same pattern as many of the other objects, which is to query the database for the SQLAlchemy model, and then pass that to a generic function (not shown here) that constructs the new object. Both of these are decorated with remotable_classmethod, which makes them callable from across RPC and at a class level. Querying for a security group would look something like this:

from nova.objects import security_group
secgroup = security_group.SecurityGroup.get(context, 1234)

Unlike the fictitious example in Part 2, there is another way to query for security group objects, which is by a collection based on some common attribute. This is often done by project ID, for example. The objects framework provides a way to easily define an object that is a list of objects, such that the list can be queried directly or over RPC in the same way, and so that the list itself contains inbuilt serialization, which handles the serialization of the objects contained within. See the SecurityGroupList object:

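class SecurityGroupList(base.ObjectListBase, base.NovaObject):
    @base.remotable_classmethod
    def get_all(cls, context):
        return _make_secgroup_list(
            context, cls(),
            db.security_group_get_all(context))

    @base.remotable_classmethod
    def get_by_project(cls, context, project_id):
        return _make_secgroup_list(
            context, cls(),
            db.security_group_get_by_project(
                context, project_id))

    @base.remotable_classmethod
    def get_by_instance(cls, context, instance):
        return _make_secgroup_list(
            context, cls(),
            db.security_group_get_by_instance(
                context, instance.uuid))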

The first line shows that this special object is not only a NovaObject, but also an ObjectListBase, which provides the special list behavior. Note that the order of inheritance is important, so they must be in the order shown.

The ObjectListBase definition assumes a single field of “objects” and handles typical list-like behaviors like iteration of the things in the objects field, as well as membership (i.e. contains) operations. Thus, all you need to do is fill out the “foo.objects” list, like the _make_secgroup_list() helper function does:

def _make_secgroup_list(context, secgroup_list, db_secgroup_list):
    secgroup_list.objects = []
    for db_secgroup in db_secgroup_list:
        secgroup = SecurityGroup._from_db_object(
            SecurityGroup(), db_secgroup)
        secgroup._context = context
        # Add the constructed object to the list's "objects" field
        secgroup_list.objects.append(secgroup)
    return secgroup_list

This method simply populates the “objects” list of the SecurityGroupList object with SecurityGroup objects it constructs from the raw database models provided. It uses the same _from_db_object() helper method as the SecurityGroup object itself. You can use the result of this just like a real list:

secgroups = security_group.SecurityGroupList.get_all(context)
for secgroup in secgroups:
    print secgroup.name
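
To make the list-like behavior more concrete, here is a minimal sketch of how a base class could delegate iteration, length, indexing and membership to a single "objects" field. The class names ListLikeBase and NameList are invented for illustration and this is not the actual ObjectListBase code; it is written for Python 3.

```python
class ListLikeBase:
    """Hypothetical stand-in for ObjectListBase: delegates to self.objects."""

    def __init__(self):
        self.objects = []

    def __iter__(self):
        return iter(self.objects)

    def __len__(self):
        return len(self.objects)

    def __getitem__(self, index):
        return self.objects[index]

    def __contains__(self, item):
        return item in self.objects


class NameList(ListLikeBase):
    """Toy list object; a real one would also carry serialization."""


names = NameList()
names.objects = ['default', 'webservers']
listed = [name for name in names]   # iteration walks the objects field
```

With only the "objects" field filled in, the toy list supports for loops, len(), indexing and the "in" operator.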

The massive Instance object is similar to what we’ve seen in previous examples, and the SecurityGroup example above. There is a base Instance object, and an InstanceList object to provide an implementation for all the ways we can query for multiple instances at once. It’s too big to show here, but here is a subset of the field definition:

class Instance(base.NovaObject):
    fields = {
        'id': int,
        'user_id': obj_utils.str_or_none,
        'project_id': obj_utils.str_or_none,
        'launch_index': obj_utils.int_or_none,
        'scheduled_at': obj_utils.datetime_or_str_or_none,
        'launched_at': obj_utils.datetime_or_str_or_none,
        'terminated_at': obj_utils.datetime_or_str_or_none,
        'locked': bool,
        'access_ip_v4': obj_utils.ip_or_none(4),
        'access_ip_v6': obj_utils.ip_or_none(6),
 . . . }

Finally, an object with some interesting fields! We see the usual integral ID field at the top, but notice that most of the other fields use “or none” helpers from the utils module. Since many of the fields in the instance can be empty (nullable=True in the database definition), we need to handle “either a string or None” in cases such as user_id. The utils module provides some helpers for datetime and IP address fields, which return datetime.datetime and netaddr.IPAddress objects respectively. Just like the int and str type functions, these take a string and convert it into the complex type when someone does something like this:

inst = instance.Instance()
inst.access_ip_v4 = ''  # Stored as netaddr.IPAddress('')
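
As a rough sketch of what such "or none" type functions might look like, the following helpers pass None through untouched and otherwise coerce the value. The function bodies and the timestamp format are assumptions made for illustration, not the contents of Nova's utils module.

```python
import datetime


def str_or_none(value):
    """Coerce to str, but let None pass through untouched."""
    return None if value is None else str(value)


def int_or_none(value):
    """Coerce to int, but let None pass through untouched."""
    return None if value is None else int(value)


def datetime_or_str_or_none(value):
    """Accept None, a datetime, or a string in an assumed timestamp format."""
    if value is None or isinstance(value, datetime.datetime):
        return value
    # Assumed format for this sketch only.
    return datetime.datetime.strptime(value, '%Y-%m-%dT%H:%M:%S')
```

Each helper has the same shape as int or str: it takes a raw value and either returns the coerced result or raises an exception.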

These fields with complicated data types bring us to our first concrete example of something needing special handling during serialization and deserialization. The Instance object contains methods like _attr_scheduled_at_to_primitive() and _attr_scheduled_at_from_primitive() that handle converting the datetime objects to and from strings properly. Handlers (and handler-builders) for these types are provided in the utils module. The IP address fields provide a useful example for illustration, such as this serialization method for the IPv4 address:

    def _attr_access_ip_v4_to_primitive(self):
        if self.access_ip_v4 is not None:
            return str(self.access_ip_v4)
        else:
            return None

This gets called by the object’s serialization method when it encounters the complex IPv4 address field. Although not obvious to the layer above us, the netaddr.IPAddress object can serialize itself through simple string coercion, so we do just that. However, since the field could be None, we want to be sure not to convert that to a string, resulting in the string “None” instead of None itself. Luckily, we need no special deserialization because the result of the above string coercion is sufficient to pass to the field’s type function itself, which the deserialization routine will try if no special handler is provided.
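
The round trip described above can be sketched with Python's standard ipaddress module standing in for netaddr (a deliberate swap so the example has no external dependencies). The helper names and the sample address are assumptions for illustration only.

```python
import ipaddress


def ip_or_none(version):
    """Build a type function for an IPv4 or IPv6 field that may be None."""
    ip_class = {4: ipaddress.IPv4Address, 6: ipaddress.IPv6Address}[version]

    def coerce(value):
        return None if value is None else ip_class(str(value))
    return coerce


def ip_to_primitive(value):
    """Serialize an address to a string, keeping None as None."""
    return None if value is None else str(value)


coerce_v4 = ip_or_none(4)
address = coerce_v4('192.0.2.10')       # documentation-range sample address
wire_value = ip_to_primitive(address)   # plain string, safe to send over RPC
restored = coerce_v4(wire_value)        # the type function doubles as deserializer
```

Because the serialized form is exactly what the type function accepts, no separate deserialization handler is needed in this sketch.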

In a subsequent part, I will talk about advanced topics like lazy-loading, versioning, and object nesting.

Posted in OpenStack

A brief overview of Nova’s new object model (Part 2)

In Part 1, I described the problems that the Unified Object Model aims to solve within Nova. Next, I’ll describe how the infrastructure behind the scenes achieves some of the magic of making things easier for developers to implement their objects.

The first concept to understand is the registry of objects. This registry contains a database of objects that we know about, and for each, what data is contained within and what methods are implemented. In Nova, simply inheriting from the NovaObject base class registers your object through some metaclass magic:

class NovaObject(object):
    """Base class and object factory.

    This forms the base of all objects that can be remoted or instantiated
    via RPC. Simply defining a class that inherits from this base class
    will make it remotely instantiatable. Objects should implement the
    necessary "get" classmethod routines as well as "save" object methods
    as appropriate.
    """
    __metaclass__ = NovaObjectMetaclass
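
For readers unfamiliar with the trick, here is a minimal sketch of how a metaclass can register every subclass by name. It uses Python 3 metaclass syntax rather than the `__metaclass__` attribute shown above, and RegistryMeta, BaseObject and class_by_name are invented names, not Nova's implementation.

```python
class RegistryMeta(type):
    """Records every subclass by name in a shared dictionary."""

    registry = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Skip the base class itself; only register classes that inherit.
        if bases:
            RegistryMeta.registry[name] = cls


class BaseObject(metaclass=RegistryMeta):
    """Hypothetical stand-in for NovaObject."""


class MyObj(BaseObject):
    fields = {'foo': int, 'bar': str}


def class_by_name(name):
    """Look up a registered class, as a deserializer might need to."""
    return RegistryMeta.registry[name]
```

Simply defining MyObj is enough to make it discoverable by name, which is the property the registry relies on.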

In order to make your object useful, you need to do a few other things in most cases:

  1. Declare the data fields and their types
  2. Provide serialization and de-serialization routines for any non-primitive fields
  3. Provide classmethods to query for your object
  4. Provide a save() method to write your changes back to the database

Notice that nowhere in the list is “provide an RPC API”. That’s one of the many magical powers that you get for free, simply by inheriting from NovaObject and registering your object.

To declare your fields, you need something like the following in your class:

fields = {'foo': int,
          'bar': str,
          'baz': my_other_type_fn,
          }

This magic description of your fields describes the names and data types they should have. The key of each pair is, of course, the field name, and the value is a function that can coerce data into the proper format and/or raise an exception if that is not possible. Thus, if I set the “foo” attribute to a string of “1”, the integer 1 will actually be stored. If I try to store the string “abc” into the same attribute, I’ll get a ValueError, as you would expect.
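
The coercion behavior can be illustrated with a small sketch that runs each assignment through the field's type function. CoercingObject is a made-up class for this example and only approximates what the real base class does.

```python
class CoercingObject:
    """Toy object whose attributes are coerced by per-field type functions."""

    fields = {'foo': int, 'bar': str}

    def __setattr__(self, name, value):
        if name in self.fields:
            # The type function converts the value or raises an exception.
            value = self.fields[name](value)
        super().__setattr__(name, value)


obj = CoercingObject()
obj.foo = '1'          # stored as the integer 1

try:
    obj.foo = 'abc'    # int('abc') raises ValueError
    rejected = False
except ValueError:
    rejected = True
```

After the failed assignment, obj.foo still holds the previously stored integer.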

The next step is (de-)serialization routines for our attributes. Our “foo” and “bar” attributes are primitives, so we can ignore those, but our “baz” attribute is presumably something more complex, which requires a little more careful handling. So, we define a couple of specially-named methods in our object, which will be called when serialization or de-serialization of that attribute is required:

def _attr_baz_from_primitive(self, value):
    return somehow_deserialize_this(value) # Do something smart

def _attr_baz_to_primitive(self):
    return somehow_serialize_this(self.baz) # Do something smart

Now that our object has a data format and the ability to (de-)serialize itself, we probably need some methods to query the object. Assuming our “foo” attribute is a unique key that we can query by, we will define the following query method:

@remotable_classmethod
def get_by_foo(cls, context, foo):
    # Query the underlying database
    data = query_database_by_foo(foo)

    # Create an instance of our object
    obj = cls()
    obj.foo = data['foo']
    obj.bar = data['bar']
    obj.baz = data['baz']

    # Reset the dirty flags so the caller sees this as clean
    obj.obj_reset_changes()

    return obj

The above example papers over the part about querying the database. Right now, the objects implementations in Nova use the old DB API to do this part, but eventually, the dirty work could reside here in the object methods themselves.

Now, there is some magic here. If I am inside of nova-api (or some other part of nova with direct access to the database) and I call the above classmethod, the decorator is a no-op and the code within the method runs as you would expect, queries the database, and returns the resulting object. If, however, I am in nova-compute and I call the above method, the decorator actually remotes the call through conductor, executes the method there, and returns the result to me over RPC. Either way, the use of the object is exactly the same in both cases:

obj = MyObj.get_by_foo(context, 123)
print obj.foo # Prints 123
print obj.bar # Prints the value of bar
# etc...
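
The "no-op locally, remoted elsewhere" behavior can be sketched with a decorator that checks for an indirection hook on the class. Everything here (FakeConductor, the indirection_api attribute, the object_class_action method) is a simplified assumption for illustration, and the fake conductor just runs the method in-process instead of making an RPC call.

```python
import functools


class FakeConductor:
    """Hypothetical RPC proxy; here it simply runs the method in-process."""

    def __init__(self):
        self.calls = []

    def object_class_action(self, cls, method_name, context, args):
        self.calls.append(method_name)
        # Call the undecorated function, as the remote side would.
        return getattr(cls, method_name).__wrapped__(cls, context, *args)


class BaseObject:
    indirection_api = None   # None means "we may touch the database"


def remotable_classmethod(fn):
    """Run fn directly, or forward the call when an indirection API is set."""
    @functools.wraps(fn)
    def wrapper(cls, context, *args):
        if cls.indirection_api is not None:
            return cls.indirection_api.object_class_action(
                cls, fn.__name__, context, args)
        return fn(cls, context, *args)
    return classmethod(wrapper)


class MyObj(BaseObject):
    @remotable_classmethod
    def get_by_foo(cls, context, foo):
        obj = cls()
        obj.foo = foo
        return obj


local = MyObj.get_by_foo('ctx', 123)       # direct path
MyObj.indirection_api = FakeConductor()
remote = MyObj.get_by_foo('ctx', 123)      # forwarded path
```

The caller's code is identical on both paths; only the class-level hook decides where the method body runs.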

Now, before we’re done, we need to make sure that changes to our object can be put back into the database. Since a “save” happens on an instance, we define a regular instance method, but decorate it as “remotable” like this:

@remotable
def save(self, context):
    # Iterate through items that have changed
    updates = {}
    for field in self.obj_what_changed():
        updates[field] = getattr(self, field)

    # Actually save them to the database
    save_things_to_db_by_foo(self.foo, updates)

    # Reset the changes so that the object is clean now
    self.obj_reset_changes()

This implementation checks to see which of the attributes of the object have been modified, constructs a dictionary of changes, and calls a database method to update those values. This pattern is very common in Nova and should be recognizable by people used to using DB API methods.
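
The change tracking that save() relies on can be sketched as a set of field names that is updated on every assignment. TrackedObject is a toy written for this example; the method names mirror the ones used above, but the bodies are assumptions rather than Nova's code.

```python
class TrackedObject:
    """Toy object that remembers which fields were set since the last reset."""

    fields = {'foo': int, 'bar': str}

    def __init__(self):
        # Bypass __setattr__ so the bookkeeping set itself is not tracked.
        object.__setattr__(self, '_changed', set())

    def __setattr__(self, name, value):
        if name in self.fields:
            value = self.fields[name](value)
            self._changed.add(name)
        object.__setattr__(self, name, value)

    def obj_what_changed(self):
        return set(self._changed)

    def obj_reset_changes(self):
        self._changed.clear()


obj = TrackedObject()
obj.foo = 123
obj.obj_reset_changes()        # a freshly loaded object starts out clean
obj.bar = 'hey, this is neat!'
updates = {field: getattr(obj, field) for field in obj.obj_what_changed()}
```

Only "bar" ends up in the updates dictionary, which is what a save() method would hand to the database layer.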

Now that we have all of these things built into our object, we can use it from anywhere in nova like this:

obj = MyObj.get_by_foo(context, 123)
obj.bar = 'hey, this is neat!'
obj.save()

One more bit of magic to note is the “sticky context”. Since you queried the object with a context, the object hides the context within itself so that you don’t have to provide it to the save() method (or any other instance methods) for the lifetime of the object. You can, of course, pass a different context to save if you need to for some reason, but if you don’t it will use the one you queried it with.
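
A minimal sketch of the sticky context idea, assuming the object simply stores the context it was created with and falls back to it in save(). StickyObject and its attributes are hypothetical names chosen for this illustration.

```python
class StickyObject:
    """Toy object that remembers the context it was loaded with."""

    def __init__(self, context=None):
        self._context = context
        self.saved_with = None

    def save(self, context=None):
        # Fall back to the remembered context when none is passed.
        if context is None:
            context = self._context
        if context is None:
            raise RuntimeError('no context available for save()')
        self.saved_with = context


obj = StickyObject('query-context')
obj.save()                          # uses the remembered context
first = obj.saved_with
obj.save('other-context')           # an explicit context wins
second = obj.saved_with
```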

Nifty, huh? In Part 3, I will break from the world of fictitious objects and examine a real one that is already in the Nova tree, as well as fill out some of the other implementation details required.

Posted in OpenStack

A brief overview of Nova’s new object model (Part 1)

As discussed at the Havana summit, I have been working with Chris Behrens (and others) on the unified-object-model blueprint for Nova. The core bits of it made their way into the tree a while ago and work is underway to implement the Instance object and convert existing code to use it. This unifies the direct-to-database query methods, as well as the mirrored conductor RPC interfaces into a single versioned object-oriented API. It aims to address a few problems for us:

  1. Letting SQLAlchemy objects escape the DB API layer has caused us a lot of problems because they can’t be sent over RPC efficiently. The new object model is self-serializing.
  2. Objects in the database aren’t versioned (although the schema itself is). This means that sending a primitive representation of it over RPC runs the risk of old code breaking on new schema, or vice versa. The new object model is versioned for both interface methods and data format.
  3. Database isolation (no-db-compute) results in mirroring a bunch of non-OO interfaces in nova-conductor for use by isolated services like nova-compute. The new object model entirely hides the fact that object operations may be going direct or over RPC to achieve the desired result.
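
To illustrate the second point, here is a rough sketch of a self-describing primitive that carries the object's name and version alongside its data. The dictionary keys and the compatibility rule (same major version, minor version no newer than ours) are assumptions for this example, not the format Nova uses.

```python
def to_primitive(name, version, data):
    """Wrap field data with the object's name and version."""
    return {'name': name, 'version': version, 'data': dict(data)}


def is_compatible(primitive, supported_version):
    """Assumed rule: same major version, minor version we already know."""
    sent = tuple(int(part) for part in primitive['version'].split('.'))
    ours = tuple(int(part) for part in supported_version.split('.'))
    return sent[0] == ours[0] and sent[1] <= ours[1]


wire = to_primitive('MyObj', '1.2', {'foo': 1, 'bar': 'x'})
```

A receiver can then check the version before trusting the data layout, instead of guessing from the schema.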

Hopefully the first two items above are fairly obvious, but the third may deserve a little explanation. Currently, we have things in nova/db/sqlalchemy/api.py like the following:

def instance_get_by_uuid(context, uuid, columns_to_join=None):
    return _instance_get_by_uuid(context, uuid,
                                 columns_to_join=columns_to_join)

def _instance_get_by_uuid(context, uuid, session=None, columns_to_join=None):
    result = _build_instance_get(context, session=session,
                                 columns_to_join=columns_to_join).\
                filter_by(uuid=uuid).\
                first()

    if not result:
        raise exception.InstanceNotFound(instance_id=uuid)

    return result

def _build_instance_get(context, session=None, columns_to_join=None):
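    query = model_query(context, models.Instance, session=session,
                        project_only=True).\
            options(joinedload_all('security_groups.rules')).\
            options(joinedload('info_cache'))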
    if columns_to_join is None:
        columns_to_join = ['metadata', 'system_metadata']
    for column in columns_to_join:
        query = query.options(joinedload(column))
    #NOTE(alaski) Stop lazy loading of columns not needed.
    for col in ['metadata', 'system_metadata']:
        if col not in columns_to_join:
            query = query.options(noload(col))
    return query

This is more complicated than it needs to be, but basically we’ve got the instance_get_by_uuid() method at the top, which calls a couple of helpers below to build the SQLAlchemy query that actually hits the database. This is the interface that is used all over nova-api to fetch an instance object from the database by UUID, and used to be used in nova-compute to do the same. In Grizzly, we introduced a new service called nova-conductor, which took on the job of proxying access to these database interfaces over RPC so that services like nova-compute could be isolated from the database. That means we got a new set of versioned RPC interfaces such as the one mirroring the above in nova/conductor/api.py:

def instance_get_by_uuid(self, context, instance_uuid,
                         columns_to_join=None):
    return self._manager.instance_get_by_uuid(context, instance_uuid,
                                              columns_to_join)

I’ll spare you the details, but this turns into an RPC call to the nova-conductor service, which in turn makes the DB API call above, serializes and returns the result. This was a big win in terms of security in that the least-trusted nova-compute services weren’t able to talk directly to the database, and potentially also brought scalability benefits of not having every compute node hold a connection to the database server. However, it meant that we had to add a new API to conductor for every database API, and while those were versioned, it didn’t really solve our problem with versioning the actual data format of what gets returned from those calls.

What we really want is everything using the same interface to talk to the database, whether it can go direct or is required to make an RPC trip. Ideally, services that can talk to the database and those that can’t should be able to pass objects they retrieved from the database to each other over RPC without a lot of fuss. When nova-api pulls an object with the first interface above and wants to pass it to nova-compute which is required to use the second, a horrific serialization process must take place to enable that to happen.

Enter the Unified Object Model. It does all of the above and more. It even makes coffee. (okay, it doesn’t make coffee — yet).

Continued in Part 2.

Posted in OpenStack

A tool for watching Zuul and Jenkins

In my work on OpenStack Nova, I often have multiple patches in flight somewhere on the CI system. When patches are first submitted (or resubmitted) they go into Zuul’s “check” queue for a first pass of the tests. After a patch is approved, it goes into the “gate” queue, which is a serialized merge process across all the projects. Keeping track of one’s patches as they flow through the system can be done simply by waiting for Jenkins to report the job results back into Gerrit and/or for the resulting email notification.

I like to keep a close watch on my patches, both to know when they’re close to merging and to know early when they’re failing a test. Catching something early and pushing a fix will kill the job currently in progress and start over with the new patch. This is a more efficient use of resources and lowers the total amount of time before Jenkins will vote on the patch in such a case.

Since Zuul provides information about what’s going on, you can go to the status page and see all the queues, jobs, etc. The problem with this is that the information from Gerrit (specifically owner and commit title) isn’t merged with the view, making it hard to find your patch in a sea of competing ones.

To make this a little easier on the eyes, I wrote a very hacky text “dashboard” that merges the information from Gerrit and Zuul, and provides a periodically-refreshed view of what is going on. After contributions and ideas from several other folks, it now supports things like watching an entire project, as well as your own contributions, your own starred reviews, etc. Here is what it looked like at one point on the day of this writing:



The above was generated with the following command:

python dash.py -u danms -p openstack/nova -r 30 -s -O OR -o danms

Basically, the above says: “Show me any patches owned by danms, or in the project openstack/nova, or starred by danms, refreshed every 30 seconds”. This provides me a nice dashboard of everything going on in Nova, with my own patches highlighted for easier viewing.
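
The "owned by, or in the project, or starred" logic can be sketched as a small predicate over change records. This is not the code from dash.py; the matches function, its parameters and the record fields are all invented for illustration.

```python
def matches(change, owner=None, project=None, starred_ids=(), operator='OR'):
    """Return True if a change passes the active filters."""
    checks = []
    if owner is not None:
        checks.append(change['owner'] == owner)
    if project is not None:
        checks.append(change['project'] == project)
    if starred_ids:
        checks.append(change['id'] in starred_ids)
    if not checks:
        return True   # no filters means show everything
    return any(checks) if operator == 'OR' else all(checks)


# Made-up sample records
changes = [
    {'id': 1, 'owner': 'danms', 'project': 'openstack/cinder'},
    {'id': 2, 'owner': 'someone', 'project': 'openstack/nova'},
    {'id': 3, 'owner': 'someone', 'project': 'openstack/glance'},
]
shown = [c['id'] for c in changes
         if matches(c, owner='danms', project='openstack/nova')]
```

With the OR operator a change is shown if any single filter matches, which is what makes the combined "my patches plus the whole project" view possible.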

Patches of my own are highlighted in green, unless they’re already failing some tests, in which case they’re red. If they are in the gate queue and dependent on something that is also failing tests, they will be yellow (meaning: maybe failing, depending on where the failure was introduced).
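
The coloring rules above can be written down as a small function. This is a sketch of the rules as described in the text, with a hypothetical function name and parameters, not the dashboard's actual code.

```python
def highlight_color(is_mine, failing, dependency_failing=False):
    """Pick a highlight color for a patch, or None for no highlight."""
    if not is_mine:
        return None
    if failing:
        return 'red'
    if dependency_failing:
        # Maybe failing, depending on where the failure was introduced.
        return 'yellow'
    return 'green'
```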

You can see the gate queue at the top, which has fifteen items in it, seven of which are matching the current set of view filters, as well as the jobs and their queue positions. Below that is the (unordered) check queue, which has 58 items in it. Each job shows the review number, revision number, title, time-in-queue, and the percentage of test jobs that are finished running. Note that since some jobs take much longer than others, the completion percentage doesn’t climb linearly throughout the life of the job.

The dashboard will also provide a little bit of information about Zuul’s status, when appropriate, such as when it enters queue-only mode prior to a restart, or is getting behind on processing events. This helps quickly identify why a patch might have been waiting for a long time without a vote.

If you’re interested in using the dashboard, you can get the code on GitHub.

Posted in OpenStack

Dan’s Partial Summary of the Nova Track

Last week, OpenStack developers met in Portland, OR for the Havana Design Summit. Being mostly focused on Nova development, I spent almost all of my time in that track. Below are some observations not yet covered by other folks.


Baremetal

After working hard to get the baremetal driver landed in Nova for the Grizzly release, it looks like the path forward is to actually kick it out to a separate project. Living entirely underneath the nova/virt hierarchy brings some challenges with it, and those were certainly felt by developers and reviewers while trying to get all of that code merged in the first place. The consensus in the room seemed to be that baremetal (as it is today) will remain in Havana, but be deprecated, and then removed in the I release. This will provide deployers time to plan their migration. The virt driver will become a small client of the new service, hopefully reducing the complexity that has to remain in Nova itself.

Live Upgrade

Almost the entire first day was dedicated to the idea of “Live Upgrade” or “Rolling Upgrade”. As OpenStack deployments get larger and more complicated, the need to avoid downtime while upgrading the code becomes very important. The discussions on Monday circled around how we can make that happen in Nova.

One of the critical decisions that came out of those discussions was the need for a richer object format within Nova, and one that can be easily passed over RPC between the various sub-components. In Grizzly, as we moved away from direct database access for much of Nova, we started converting any and all objects to Python primitives. This brought with it a large and inefficient function to convert rich objects to primitives in a general way, and also mostly eliminated the ability to lazy-load additional data from those objects if needed. Further, the structure of the primitives was entirely dependent on the database schema, which is a problem for live upgrade as older nodes may not understand newer schema.

Once we have smarter objects that could potentially insulate the components from the actual database schema, we need to have the ability for the services to speak an older version of the actual RPC protocol until all the components have been upgraded. We’ve had backwards compatibility in the RPC server ends for a while, but being able to clamp to the lowest common version is important for making the transition graceful.
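
Clamping to the lowest common version can be sketched as picking the minimum over the versions that the running services report. The function names and the sample version strings are made up for illustration; how versions are actually discovered is outside this sketch.

```python
def version_tuple(version):
    """Turn '2.14' into (2, 14) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def lowest_common_version(service_versions):
    """Pick the newest RPC version that every running service understands."""
    return min(service_versions, key=version_tuple)


# Made-up versions reported by three services mid-upgrade
clamp = lowest_common_version(['2.27', '2.9', '2.14'])
```

Comparing as tuples matters here: a plain string comparison would wrongly rank '2.14' below '2.9'.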

Moving State From Compute to Conductor

Another enemy for a graceful upgrade process is state contained on the compute nodes. Likely the biggest example of this is the various resize and migration tasks that are tracked by nova-compute. Since these are user-initiated and often require user input to finish, it’s likely that any real upgrade will need to gracefully handle situations where these operations are in progress. Further, for various reasons, there are several independent code paths in nova-compute that all accomplish the same basic thing in different ways. The “offline” resize/migrate operations follow a different path from the “live” migrate function, which is also different from the post-failure rebuild/evacuate operation.

Most everyone in the room agreed that the various migrate-related operations needed to be cleaned up and refactored to share as much code as possible, while still achieving the desired result. Further, the obvious choice of moving the orchestration of these processes to conductor provides a good opportunity to start fresh in the pursuit of that goal. This also provides an opportunity to move state out of the compute nodes (of which there are many) to the conductor (of which there are relatively few).

Since nova-conductor will likely house this critical function in the future, the question of how to deal with the fact that it is currently optional in Grizzly came up. Due to a bug in eventlet which can result in a deadlock under load, it is not feasible for many large installations to make the leap just yet. However, expecting that the issue will be resolved before Havana, it may be possible to promote nova-conductor to “not optional” status by then.

Virt Drivers

There was a lot of activity around new and updated virtualization drivers for Nova over the course of the week. There was good involvement from VMware surrounding their driver, both in terms of feature parity with other drivers, as well as new features such as exposing support for clustered resources as Nova host aggregates.

The Hyper-V session was similar, laying out plans to support new virtual disk formats and operations, as well as more complicated HA-related operations, similar to those of VMware.

The final session on the last day was a presentation by some folks at HP that had a proof-of-concept implementation of an oVirt driver for OpenStack. It sounded like this could provide an interesting migration path for folks that have existing oVirt resources and applications dependent on the “Pet VM” strategy to move gracefully to OpenStack.

Posted in OpenStack

All your DB are belong to conductor

Well, it’s done. Hopefully.

Over the last year, Nova has had a goal of removing direct database access from nova-compute. This has a lot of advantages, especially around security and rolling upgrade abilities, but also brings some complexity and change. Much of this is made possible by utilizing the new nova-conductor service to proxy requests to the database over RPC on behalf of components that are not allowed to talk to the database directly. I authored many of the changes to either use conductor to access the database, or refactor things to not require it at all. I also had the distinct honor of committing the final patch to functionally disable the database module within the compute service. This will help ensure that folks doing testing between Grizzly-3 and the release will hit a reasonable (and reportable) error message, even if their compute nodes still have access to the database.

Security-wise, nova-compute nodes are the most likely targets for any sort of attack, since they run the untrusted customer workloads. Escaping from a VM or compromising one of the services that runs there previously meant full access to the database, and thus the cluster. By removing the ability (and need) to connect directly to the database, it is significantly easier for an administrator to limit the exposure caused by a compromised compute node. In the future, the gain realized from things like trusted RPC messaging will be even greater, as access to information about individual instances from a given host can be limited by conductor on a need-to-know basis.

From an upgrade point of view, decoupling nova-compute from the database also decouples it from the schema. That means that rolling upgrades can be supported through RPC API versioning without worrying about old code accessing new database schemas directly. No additional modeling is added between the database and the compute nodes, but having the RPC layer there provides a much better way to provide a stable N and N+1 interface.

Of course, neither of the above points implies that your cluster is now secure, or that you can safely do a rolling upgrade from Folsom to Grizzly or Grizzly to Havana. This no-db-compute milestone is one (major) step along the path to enabling both, but there’s still plenty of work to do. Since Nova is large and complex, there is also no guarantee that all the direct database accesses have been removed. Since we recently started gating on full tempest runs, the fact that the disabling patch passed all the tests is a really good sign. However, it is entirely likely that a few more things needing attention will shake out of the testing that folks will do between Grizzly-3 and the release.

Let the bug reporting commence!

Posted in Codemonkeying, Linux, OpenStack

Managing ECX 2013 Logistics with Drupal

If you know me, you know that one of my favorite events each year is the Eagle Cap Extreme Sled Dog Race. No, I’m not a big fan of dogs, or sleds, but when you put the two together, you get a really fun annual event in the wilderness of Eastern Oregon. The race runs 200 miles through the Wallowa Mountains near Joseph, OR, and is far from any commercial communications infrastructure. Each year, I go to great effort and expense to travel to the other side of the state with lots of gear and help the other hams provide excellent communications facilities in the woods for a few days, where there would otherwise be none.

The communications team is headed up by an excellent guy with finely-honed organizational skills suitable for running a group responsible for life-safety operations like this. Last year, we discussed a way to make things better, by logging all events and personnel in an electronic system. This would provide a digital record of the entire race, as well as a way to display more in-depth information to the administrative folks at HQ, and potentially to the spectating public. The net control and administrative folks run the race from the community center in Joseph, OR, which has commercial power, heat, and an internet connection, so an electronic system like this is possible, as long as it doesn’t become a liability.

We settled on a Drupal-based system, which could be made to provide almost all of what we needed out of the box with things like Views and CCK. Being web-based meant that it was easy to access from multiple devices, and to collaborate on the design and implementation ahead of time. The only non-standard thing we really needed was facilitated by a small module I wrote to provide some additional fields to a few Views queries.

We smoke-tested this system over the summer at the Hells Canyon Relay Race, a shorter and slightly less complicated event, but with many of the same challenges and requirements of ECX. The goal was to have the system accessible in two ways:

  1. The net control folks had to have everything local in the building. This was Eastern Oregon, not Manhattan, and internet access reliable enough to depend on for something like this was not available. To provide this, we ran the MySQL and Apache/PHP servers on a Linux laptop, with a local web browser. This allowed someone to sit at the laptop and operate as an island, if necessary, but also for other laptops in the room to connect to the system as well.
  2. Anyone outside the room that needed access to the system connected to one of my colocated servers to do so. This machine received replication updates from the master copy of the database on the laptop server to keep it in sync, and was marked read-only to avoid anyone inserting something that net control wouldn’t be able to see.

This provided a reasonably robust setup, avoiding the need for external folks to come into the system over the temporary internet connection to the net control building. I used a persistent SSH tunnel to the external server from the laptop, which allowed MySQL traffic in both directions if necessary.

Of course, during the HCR event, the net control station’s internet connection never went down, but the folks working there wouldn’t have even noticed if it had, since they appeared to be working only on a local copy of the data at all times. Organizationally, the system was a big success, and things looked good for its use in ECX this year. There was only one concern: what if the net control building got hit by a missile and someone external needed to take over net control responsibilities and start to modify the data on the external server? Since that copy was marked read-only, this wouldn’t be possible.

For ECX, I’ve now set things up with MySQL multi-master replication. This allows both copies to be writable in both places, effectively allowing either to become an island if necessary. As long as the two systems can see each other, anything added to one is also added to the other. If they become separated for a period of time, they’re still functional, and they sync back up as soon as they’re able to talk again. While this is rather nightmarish for a bank or stock market, it’s actually exactly how we want the system to behave in our scenario.
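The exact settings aren't given here, but a two-node multi-master pair is commonly configured along the following lines. This is only a sketch under assumptions: the server IDs, log name, and offsets are illustrative values, not the ones used for ECX. Each server gets a unique ID and a binary log, and the auto-increment settings are staggered so that rows inserted on both sides while they are separated never collide on primary keys when the two sync back up.

```ini
# my.cnf on the net control laptop (illustrative values)
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
auto_increment_increment = 2   # step by the number of writable servers
auto_increment_offset    = 1   # this server issues IDs 1, 3, 5, ...

# my.cnf on the colocated server (illustrative values)
[mysqld]
server-id                = 2
log-bin                  = mysql-bin
auto_increment_increment = 2
auto_increment_offset    = 2   # this server issues IDs 2, 4, 6, ...
```

Each server is then pointed at the other as its replication source (with CHANGE MASTER TO on MySQL versions of this era), which over an SSH tunnel would mean each side connects to a forwarded local port.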

ECX 2013 is only a few weeks away, so we’ll be running this at full scale pretty soon!

Posted in Linux, Radio

9600 baud packet on a Kenwood TK-840

The Kenwood TK-840 is a nice commercial UHF radio that is starting to go for $50-$100 on eBay due to the fact that it is not narrow-band capable. It is happy in the ham bands, has a good screen, excellent rubber-covered buttons, and is quite small and rugged.

While not frequency-agile or field-programmable, it is more than adequate for a fixed installation, such as a remote base or digital mode transceiver. However, not much is available “out there” on how to interface it to a high-speed TNC. While you could use the well-documented mic and speaker jacks for 1200 baud, 9600 baud and faster require low-level access to the radio’s internals.

This rig is similar to (but much newer than) the oft-used Kenwood TK-805, for which there are documents available about general interfacing. This one is pretty common, but it actually only describes high-level audio connections, which aren’t suitable for high-speed stuff. However, you can follow those instructions to remove the speaker jack, jumper the proper traces to enable the internal speaker, and route a cable through the resulting hole in the case for interfacing.

The service manual can be found on repeater-builder, which shows the various boards and the signals on each of the inter-board connectors. In order to make high-speed packet work, you need access to the modulator for TX audio, the detector output for RX audio, ground, and of course PTT to transmit. In the manual, these signals are listed as DI (external modulator input), DEO (detector output), E (earth) and PTT respectively. If you want to power your TNC from the radio, you also need SB (switched battery).

On the main TX/RX board of the radio, on the left side (if facing the front panel), there is a small group of three connectors: two small ones and one large eight-pin socket labeled CN2. The pins on the large connector are numbered from right to left, with the right-most pin being #1 and the left-most being #8. DEO is pin 1, DI is pin 4, and PTT is pin 7.

Since the pins aren’t exposed on the bottom side of the board, I carefully soldered to the top of each as they leave the board and enter the socket. It takes a steady hand and a good eye, as these pins are tiny. The nice thing about the older TK-805 is that all the components are larger and easier to solder to.

To the left of CN2 (above, in the picture) is the external alarm socket, which contains labeled pins for E (ground) and SB (switched battery). I soldered to the top of each pin here to gain access.

With everything buttoned up, I adjusted the TNC for the appropriate amount of drive to get about 3kHz of deviation. This took quite a bit of drive compared to the amateur radio I had been using with the same TNC for testing, but the Kantronics KPC-9612+ has plenty of oomph to accomplish the task. The radio appears to perform quite well with minimal additional tweaking.

Posted in Hardware, Radio

Field Day 2012

This past weekend was the 2012 ARRL Field Day, which is the biggest amateur radio event of the year in the US. The reason it’s called field day is that you’re supposed to get out into the field and operate on temporary equipment, power, etc. Lots of folks do it from their homes or some other established location, but last year we decided to make a point of getting out and doing it “for real.” This year, we returned to the same spot and did it again.

Unlike our previous trip, the weather did not cooperate this time. A storm was moving in from the Pacific on Friday, which gave us almost constant rain, heavy at times. This made it relatively challenging to get camp set up without getting all of our “inside gear” wet. Luckily, we had two large canopies (like last year) which allowed us to create a dry spot to set up the more sensitive sleeping tents. We were able to keep our sleeping quarters dry and comfortable the entire time, which makes everything else easier.

Starting a fire on the saturated ground was a bit challenging, but we brought dry wood and paper and were able to get it going much quicker than expected. Taylor was even able to enjoy a glass of wine around the fire during one of the breaks in the rain.

Operating the radios in these conditions required a little more care as well, to keep things dry. My large operating tent is really intended to protect from sun, not rain, and thus it was a little leaky during the heavier periods of precipitation. However, some creative use of tarps and other devices allowed us to keep our equipment protected. Luckily, we were able to throw the expensive ones back into the pelican cases at night in case the wind kicked up and blew rain into the tent.

This year we both used IC-7000 radios, but with a set of band-pass filters I quickly assembled the week before the trip. These helped a lot and allowed us to work QRO on different bands without interfering with each other. Power came from a Honda EU2000 inverter generator, which we used to charge our A123 batteries (for the radios) and our single 100 Ah gel cell (for the computers). Again we used FDLog for logging and duplicate checking, over an ad-hoc wireless network.

This year we made 196 contacts, up from 122 last year. Given how much of the time we were away from the radios dealing with the weather, we’re quite happy with the result. We definitely plan to do it again next year, although we might shoot for a less-rainy part of the state than the Coast Range!

Posted in Radio

Low-latency continuous rsync

Okay, so “lowish-latency” would be more appropriate.

I regularly work on systems that are fairly distant, over relatively high-latency links. That means that I don’t want to run my editor there because 300ms between pressing a key and seeing it show up is maddening. Further, with something as large as the Linux kernel, editor integration with cscope is a huge time saver and pushing enough configuration to do that on each box I work on is annoying. Lately, the speed of the notebook I’m working from often outpaces that of the supposedly-fast machine I’m working on. For many tasks, a four-core, two threads per core, 10GB RAM laptop with an Intel SSD will smoke a 4GHz PowerPC LPAR with 2GB RAM.

I don’t really want to go to the trouble of cross-compiling the kernels on my laptop, so that’s the only piece I want to do remotely. Thus, I want to have high-speed access to the tree I’m working on from my local disk for editing, grep’ing, and cscope’ing. But, I want the changes to be synchronized (without introducing any user-perceived delay) to the distant machine in the background for when I’m ready to compile. Ideally, this would be some sort of rsync-like tool that uses inotify to notice changes and keep them synchronized to the remote machine over a persistent connection. However, I know of no such tool and haven’t been sufficiently annoyed to sit down and write one.

One can, however, achieve a reasonable approximation of this by gluing existing components together. The inotifywait tool from the inotify-tools package provides a way to watch a directory and spit out a live list of changed files without much effort. Of course, rsync can handle the syncing for you, but not with a persistent connection. This script mostly does what I want:



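#!/bin/bash
DEST="$1"

if [ -z "$DEST" ]; then exit 1; fi

# Print the path of each file under . as soon as it is closed after a write
inotifywait -r -m -e close_write --format '%w%f' . |\
while read -r file
do
        echo "$file"
        rsync -azvq "$file" "${DEST}/$file"
        echo -n 'Completed at '
        date
done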

That will monitor the local directory and synchronize it to the remote host every time a file changes. I run it like this:

sync.sh dan@myhost.domain.com:my-kernel-tree/

It’s horribly inefficient of course, but it does the job. The latency for edits to show up on the other end, although not intolerable, is higher than I’d like. The boxes I’m working on these days are in Minnesota, and I have to access them over a VPN which terminates in New York. That means packets leave Portland for Seattle, jump over to Denver, Chicago, Washington DC, then up to New York before they bounce back to Minnesota. Initiating an SSH connection every time the script synchronizes a file requires some chatting back and forth over that link, and thus is fairly slow.
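Some of that inefficiency comes from syncing files that never needed to travel at all. Editors may leave swap and backup files in the tree, and each one that gets closed after a write costs a full rsync round trip. As a minimal sketch, assuming bash, the loop could call a pair of helpers like these; should_sync and sync_one are hypothetical names, and the list of patterns is only a guess at what a given editor produces.

```shell
#!/bin/bash
# Sketch only: should_sync and sync_one are hypothetical helper names.

# Return 0 if the path looks like a real file worth syncing, 1 if it
# looks like a vim swap file or a "~" backup file.
should_sync() {
    case "${1##*/}" in          # strip the directory part of the path
        *.swp|*.swo|*~) return 1 ;;
        *) return 0 ;;
    esac
}

# Copy one changed file to the same relative path under the destination.
# Skipped files are treated as success so the calling loop keeps going.
sync_one() {
    local file="$1" dest="$2"
    should_sync "$file" || return 0
    rsync -azq -- "$file" "${dest}/${file}"
}

# Intended use inside the watch loop (not run here):
#   inotifywait -r -m -e close_write --format '%w%f' . |
#   while IFS= read -r file; do sync_one "$file" "$DEST"; done
```

This does nothing about the per-file connection cost; it only avoids paying that cost for files that are not part of the tree.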

Looking at how I might reduce the setup time for the SSH links, I stumbled across an incredibly cool feature available in recent versions of OpenSSH: connection multiplexing. With this enabled, you pay the high setup cost only the first time you connect to a host. Subsequent connections re-use the same tunnel as the first one, making the process nearly instant. To get this enabled for just the host I’m using, I added this to my ~/.ssh/config file:

Host myhost.domain.com
    ControlMaster auto
    ControlPath /tmp/%h%p%r

Now, all I do is ssh to the box each time I boot it (which I would do anyway) and the sync.sh script from above re-uses that connection for file synchronization. It’s still not the same as a shared filesystem, but it’s pretty dang close, especially for a few lines of config and shell scripting. Kernel development on these distant boxes is now much less painful.
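A possible variation, offered as a sketch rather than what I actually run: newer OpenSSH releases also support ControlPersist, which keeps the master connection alive in the background for a while after the first session exits, so the sync keeps working even if that initial login is closed. The ten-minute timeout and the socket location below are assumptions chosen for illustration. Putting the socket under ~/.ssh rather than /tmp keeps it out of a world-writable directory.

```
Host myhost.domain.com
    ControlMaster auto
    # %r = remote user, %h = host, %p = port
    ControlPath ~/.ssh/cm-%r@%h:%p
    # Keep the master open for 10 minutes after the last session closes
    ControlPersist 10m
```

Running ssh -O check myhost.domain.com reports whether a master connection is currently up, and ssh -O exit myhost.domain.com shuts it down.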

Posted in Codemonkeying