The magical sleeping development machine

For years now, I’ve preferred to do all my development on a dedicated machine. This means my desktop (or laptop, depending) is just a glorified terminal-and-editor-running appliance. It probably comes from years of crashing boxes during kernel development, but it is also relevant for something like Nova, where running unit tests will consume all resources. The last thing I want is for unit tests to slow down my email and web browsing activities while I wait for them to complete.

A few years ago, I started automating the sleep schedule of my development machine. There’s really no reason for that box to run all night when I’m not using it, especially since it’s beefy (and thus power-thirsty). I used to sleep the machine on a fixed schedule that mostly matched my work hours. Lately, I’ve been using the idle times of any login sessions to determine idleness, and sleeping the box once they all pass some threshold; a rough sketch of that only-works-for-me hack follows the wake command below. This behavior is pretty handy, because once I wake the machine up, it stays up until I stop using it. When I started doing this, I would just have my always-running workstation wake the development box right before I normally start working, so that it was ready for me in the morning. By putting the MAC address of the development box into /etc/ethers, all I needed to wake it up was:

$ sudo etherwake theobromine
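
For reference, here is a minimal sketch of the sort of idle-based suspend hack described above (the pty access-time heuristic, the threshold, and the suspend command are my assumptions for illustration, not necessarily what the actual script does):

import glob
import os
import subprocess
import time

IDLE_THRESHOLD = 30 * 60  # seconds of inactivity before suspending

# Treat the access time on each pseudo-terminal as the "last activity"
# for that login session.
ptys = glob.glob('/dev/pts/[0-9]*')
idle_times = [time.time() - os.stat(pty).st_atime for pty in ptys]

# Suspend if there are no sessions at all, or if every session has been
# idle longer than the threshold (pm-suspend would work on older systems).
if not ptys or all(idle > IDLE_THRESHOLD for idle in idle_times):
    subprocess.call(['systemctl', 'suspend'])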

The problems with this scheduled-wake approach are:

  1. When I’m traveling or on vacation, the workstation keeps waking the development box for no reason, unless I remember to disable the cron job.
  2. If I’m in another timezone accessing the development box remotely, it’s not online at the right times.
  3. If I need to wake the machine off schedule, I have to ssh to an always-on machine on the network and wake it up.
  4. Once I moved my desktop to a platform that can reliably suspend and resume a graphical environment, I stopped having an obvious place to run the wake script.

Since the development box sleeps according to lack of demand, what I really wanted was a similar demand-based policy for waking it up. Given that I only access the machine over the network, all I really needed was to monitor the network from a machine that is always on, watching for something trying to contact the box. If I know the development box is down and I see such a request, I can issue the wakeup packet on behalf of the demanding machine, without it having to know anything about the sleep schedule. I ended up with a small watcher script that does exactly that. The logging looks like this:

2015-01-30 16:04:43,641 INFO Pinging theobromine [192.168.201.150]: Alive: True (46 sec ago)
2015-01-30 16:04:43,658 INFO theobromine seen alive (since 46 sec ago)

          <sleep occurs>

2015-01-30 16:06:22,710 INFO Pinging theobromine [192.168.201.150]: Alive: False (94 sec ago)
2015-01-30 16:25:21,065 INFO Pinging theobromine [192.168.201.150]: Alive: False (1232 sec ago)

          <attempt to ssh to theobromine from desktop>

2015-01-30 16:26:36,511 DEBUG ARP: 00:01:02:03:04:05 192.168.1.58 -request-> 00:00:00:00:00:00 192.168.1.20
2015-01-30 16:26:36,511 WARNING Waking theobromine
2015-01-30 16:26:36,889 INFO theobromine seen alive (since 1308 sec ago)

Now, any traffic destined for the development box that generates an ARP request will cause the always-on machine to issue a WoL magic packet to wake it up. That means reconnecting via SSH in the morning, making an edit via Emacs/Tramp, or even just a ping. Aside from the delay of a few seconds when the machine needs waking, it almost appears to the user as if it never goes to sleep.
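
Something along these lines would do the job (this is a sketch for illustration, not the actual script I run; the scapy dependency, the addresses, the interface name, and the ping-based liveness check are all placeholders/assumptions):

import binascii
import os
import socket
import subprocess

from scapy.all import ARP, sniff   # assumes scapy is installed; needs root

TARGET_IP = '192.168.1.20'          # the development box (placeholder)
TARGET_MAC = '00:11:22:33:44:55'    # its MAC address (placeholder)
IFACE = 'eth0'

def host_is_alive(ip):
    # Cheap liveness check: a single ping with a one-second timeout
    with open(os.devnull, 'w') as devnull:
        return subprocess.call(['ping', '-c1', '-W1', ip],
                               stdout=devnull) == 0

def send_magic_packet(mac):
    # A WoL magic packet is six 0xff bytes followed by the MAC repeated
    # sixteen times, broadcast to UDP port 9
    raw = binascii.unhexlify(mac.replace(':', ''))
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.sendto(b'\xff' * 6 + raw * 16, ('<broadcast>', 9))
    sock.close()

def handle_arp(pkt):
    # op == 1 is an ARP "who-has" request; pdst is the address being asked for
    if ARP in pkt and pkt[ARP].op == 1 and pkt[ARP].pdst == TARGET_IP:
        if not host_is_alive(TARGET_IP):
            print('Waking %s' % TARGET_IP)
            send_magic_packet(TARGET_MAC)

sniff(filter='arp', prn=handle_arp, iface=IFACE, store=False)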


Multi-room audio with multicast RTP

Our house has speakers in the ceiling of almost every room. This is not something I’d had before, and I was initially skeptical about their usefulness and fidelity. However, I’ve actually been enjoying having my working-hours background music reach beyond just my office. When I leave the room to get coffee or food, it’s nice to have the same music playing in the kitchen and beyond.

Background

I think that very high-end systems have all the speakers in the house wired back to a central location, where a massive multichannel audio system powers them, providing independent audio routing for rooms and other neat things like that. Our speakers, on the other hand, are mostly wired to a feed point in each room. The largest “zone” consists of the living room, stairway, bedroom hallway, my office, and the back deck, all of which are wired to the media console in the living room. The master bedroom and bathroom speakers are wired to a single place in the master bedroom, a pattern repeated in the other bedrooms. The large zone covers a lot of the space I care about during the day, but there are times (especially on the weekends) when we’d like for the other zones to be fed with the same stream.

One solution to this problem would be to reroute the speaker wires from the remote zones to the feed point of the largest main floor zone. This is hard to do because of a few exterior walls, but also would require at least a multichannel amplifier to be effective. We also want to be able to do things like pipe the bedroom TV audio to the bedroom speakers at times, and sending that all the way down to the living room just to be amplified and sent back up is kinda silly.

During my workdays, I have an MPD instance that plays my entire music collection on shuffle, which provides me a custom radio station with no commercials. A solution to the multiple zone problem above should mean that I can hear that stream anywhere in the house. A first thought was to just use icecast and several clients to play that stream in each zone. The downside of that is that the clients would be very out of sync, providing reverb or echo effects at the boundaries of two zones.

Solution

Turns out, PulseAudio has solved this problem for us, and in an amazingly awesome way. Assuming you have the bandwidth, PulseAudio can send an uncompressed stream via RTP from one node to another. It also has the ability to use multicast RTP and send one stream to … anyone that wants to listen.

Keeping to just two rooms for the sake of discussion, below is a diagram of what I’ve got now:

[Diagram: multicast RTP audio layout]

At each feed point, I have a Raspberry Pi, with the excellent HiFiBerry DAC attached. Each of these just needs a standard Raspbian install, with PulseAudio. This provides a quiet, low-power, solid-state source to feed the amplifier and, thus, the speakers at each location. In the server room, I already have a machine with lots of storage that houses the music collection, and runs Ampache to provide multi-catalog management of the MPD player. By running MPD on that machine, along with PulseAudio configured for multicast RTP, this machine effectively becomes the “radio station” for the house.

First, the configuration of the server machine. PulseAudio is probably already installed, so all you need to do is enable the null sink and RTP sender by putting this in /etc/pulse/default.pa:

# Allow local clients (like MPD) to connect to PulseAudio
load-module module-native-protocol-unix
# Suspend the sink quickly when nothing is playing
load-module module-suspend-on-idle timeout=1
# A null sink whose monitor source will feed the RTP sender
load-module module-null-sink sink_name=rtp
# Multicast the monitor of that sink as uncompressed 48kHz stereo RTP
load-module module-rtp-send source=rtp.monitor rate=48000 channels=2 format=s16be

Then configure MPD to use PulseAudio by putting this in /etc/mpd.conf:

audio_output {
 type "pulse"
 name "My Pulse Output"
}

At this point, you should be able to start MPD, add some music and start it playing.
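
For example, with the standard mpc command-line client (assuming your music is already in MPD’s configured music_directory; these commands are just one way to do it):

sudo service mpd start
mpc update        # scan the music directory into MPD's database
mpc add /         # queue the whole collection (recursive add of the root)
mpc play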

Next, get a Raspberry Pi booted to a fresh Raspbian install. If your DAC needs special configuration, do that now. Otherwise, the (awful) integrated audio should work for testing. In /etc/pulse/daemon.conf, set the following things:

; Avoid PulseAudio auto-quitting when idle
exit-idle-time = -1
; Don't use floating-point ops to resample
resample-method = trivial
; Default to 48kHz sampling rate
default-sample-rate = 48000

Next, we configure PulseAudio to listen to multicast RTP and play whatever it finds. In /etc/pulse/default.pa:

load-module module-rtp-recv

Now you should be able to start the daemon and get audio. For debugging, in the foreground:

pulseaudio -v

Within a few seconds, you should see the daemon discover the stream, latch on, and start playing it.

Impressions

At first, I was highly skeptical that streaming uncompressed audio over the network was going to result in satisfactory performance. Obviously, skipping compression is necessary for achieving any sort of realtime playback, but I expected to have issues keeping up with the stream, even on GigE, just from a congestion point of view. I’m happy to say that, for the most part, there really aren’t issues with this, and the audio quality is quite good. It won’t satisfy the audiophile, and I’d never use it for dedicated music listening with high-quality speakers or headphones. However, for background audio while I’m working and general music in the house, it’s more than adequate.

I didn’t know what to expect from PulseAudio’s latency-matching attempts to create a seamless, echo-free transition between zones. After testing it for several days, I can say that I am flat-out amazed. For all intents and purposes, when standing between two zones fed by two different machines from the “radio station” stream, it’s basically impossible to tell that they’re not tied together on the analog side. I haven’t gone so far as to create a single-tone audio file and try to detect beats between two adjacent systems, and I’m sure doing that would make it easier to tell that they’re not perfectly synchronized. However, for casual music listening, it’s very good.

Next Steps

After seeing how well this works for the house’s “radio station” I have some other thoughts. Each of the Raspberry Pi players is located near a TV. If each of those had a capture device, then it would be possible to stream TV audio from either location to the rest of the house for the “watching the news, but doing other stuff” sort of use-case. I figure that in order to make this really useful, I’ll need a web interface that allows me to enable or disable various streams, and control which stream any given player will “subscribe” to. That would let us patch any audio stream to any output easily and dynamically.


Don’t touch my Schiit!

As a work-from-home-er and a music fan, having good audio in my home office is important. Until recently, I’ve used my laptop to feed a two-channel amp and some decent speakers. A large collection of digital audio, combined with MPD means I basically have a personal radio station with no commercials that only plays stuff I like. I just turn on the amp when I want to hear it, turn it off when I don’t. It’s awesome.

Downsides to this setup are generally that if I want to watch a YouTube video on my laptop, or do something else that requires sound, I have to turn on the amp, pause the music, and unpause when I’m done. Something dedicated would be nice. I set out to build something like volumio, which turns a Raspberry Pi into a dedicated audio device. Since the audio out on the Pi is really sub-standard, I decided to step up my game, so I bought a Schiit Modi.

This thing is beautiful and highly-regarded as a top-notch async USB DAC for the (relatively low) $100 price point. Not too much to ask for a well-designed beautiful piece of audio gear, even if it is just a USB sound card.

Unfortunately, no matter what I do, I can’t seem to get it to play nice with the pi. Occasional skips and pops plague the audio, despite trying all the tricks for getting it to work. Since a regular cheapo USB sound card works fine on the pi, I was suspicious. Feeding the Modi from my laptop works fine and the audio is fantastic. However, whilst plugging and unplugging during all my testing, I encountered a nasty flaw.

If I walked across the room and then touched the beautiful metal case of the Modi, the audio would mute and slowly fade back. Occasionally, the Modi would actually jump off the USB bus entirely, and when it came back, audio would be distorted until I power cycled it. Being also into RF-related toys, I recognized this as some sort of static sensitivity, which often means design issues around grounding. Not cool.

Given that I was having so much trouble with my Modi and the pi (even though others seem okay with it), I decided to email Schiit and ask if the static issue was expected, thinking maybe I had a defective unit. To my horror, I got this short response from “Nick T”:

Yep, Modi can be static-sensitive.
The solution: avoid touching it.

I was more than a little surprised. Don’t touch it? Now, I can imagine some situations where static could be affecting the audio path, but the USB side should be totally solid. I can’t think of any reason why jumping off the USB bus is reasonable. Imagine if your printer or external disk did that! I replied and made sure I was clear about the USB side of the issue. Again, I got a short response from Nick:

Yes, it can be static-sensitive in some systems.
The solution: move it where you won’t touch it.

Okay, Nick, I get it. The thing is so beautiful, it needs to be in a glass case. Form over function, right? No thanks, this Schiit is going back.

Schiit charges a 15% restocking fee on the Modi, so by the time I pay for the original shipping and return shipping, I’ll have paid for over a quarter of the device itself. That’s okay, maybe these guys can use the extra money to include some star washers in future products (that’s a grounding joke).


A Manual Tune Button for the LDG AT-7000

The LDG AT-7000 automatic tuner is very popular among Icom HF radio owners, especially those with IC-7000 and IC-706 radios. It’s small, simple, and works really well. The problem is, it’s so simple that it has no controls of its own. Since it depends on the radio to tell it when to tune or go into bypass, you can’t use it with any radio that doesn’t support an AH-4 tuner. I have mostly Icom radios, so this is not a problem, but occasionally I want to be able to use the thing with another brand, which isn’t possible. Or is it?

The AH-4 tuning control signals from the radio are super simple, which means it’s pretty easy to put a manual tuning button on the device. If you provide it power somehow, it suddenly becomes a versatile works-with-anything tuner. Here is my finished product:

[Photo: the finished tuner with the added tune button]

I started at Surplus Gizmos to find a momentary button. I wanted something that would be small and unobtrusive, as well as look like it belonged there in the first place. As usual, they had the perfect thing. It’s a small momentary pushbutton that uses just a thin black plunger, requiring only a small hole to be drilled in the case.

Disassembly of the AT-7000 starts easily, but quickly becomes a chore with lots of little nuts holding connectors to the chassis. Every single one has a star washer, so be sure not to lose any of these, as they’re important for good bonds between the connectors and the chassis. In particular, the last nut, on the SO-239 for the radio, hits a diode before coming all the way off:

[Photo: the SO-239 nut that hits a diode inside the case]

To get it out, slide the board partially out of the chassis, which moves the diode just enough to pull the nut off and free it from the case. On the other side, the connections to the mini-DIN are accessible. The tune button needs to connect the “start” line to ground in order to control the thing, so solder jumpers onto the back of these pins:

[Photo: jumpers soldered to the back of the mini-DIN pins]

While the board is out, the hole for the button should be drilled to avoid getting filings everywhere. Once that is done the board can go back in the chassis and all the tiny fittings can be re-mounted. I found plenty of space to mount my button securely in the back left corner above the radio control connector, which keeps the jumpers short:

[Photo: the button mounted in the back corner above the radio control connector]

At this point, the cover can go back on the case. The only remaining thing that needs to be done is to build a cable for the tuner so that you can power it without plugging it into an Icom radio. The connector is the same as a PS/2 connector, so cutting one off of an old device is the easiest way. By the pinout in the manual, pins 1 and 4 are +12V and Ground respectively.

Controlling the tuner is basically the same as it is when connected to an Icom radio, except that the radio doesn’t automatically transmit for you during the process. Pressing the button in various patterns controls the mode:

  • Less than 500ms: Bypass mode
  • Between 500ms and 2500ms: Memory tune
  • More than 2500ms: Full tune

You have to get the radio to transmit a carrier at a few watts in order for the tuner to have a signal to work with, so put your radio in CW, RTTY, FM, or AM mode and hold down PTT while triggering the appropriate mode. Keep the carrier until the tuning has finished.

The added benefit of this tune button is that you can now reset the tuner’s memory without taking the case off. Holding down the button for a few seconds while powering up the tuner will erase all the stored tune memories.


A brief overview of Nova’s new object model (Part 3)

In parts one and two, I talked about the reasoning for developing an object model inside of Nova, as well as showed a sample implementation for a toy object. In this part, I will examine parts of some “real” objects that are currently under development in the Nova tree.

The biggest object in Nova is (and probably always will be) Instance. It’s fairly complicated though, so let’s start with something a little simpler, such as the SecurityGroup object. Here is the field definition:

class SecurityGroup(base.NovaObject):
    fields = {
        'id': int,
        'name': str,
        'description': str,
        'user_id': str,
        'project_id': str,
    }

There is an integral ID and a few strings, so it is pretty simple. There are two ways to query for those objects, by name or by ID:

@base.remotable_classmethod
def get(cls, context, secgroup_id):
    db_secgroup = db.security_group_get(context, secgroup_id)
    return cls._from_db_object(cls(), db_secgroup)

@base.remotable_classmethod
def get_by_name(cls, context, project_id, group_name):
    db_secgroup = db.security_group_get_by_name(context,
                                                project_id,
                                                group_name)
    return cls._from_db_object(cls(), db_secgroup)

Both of these methods follow a pattern common to many of the other objects: query the database for the SQLAlchemy model, and then pass that to a generic helper (not shown here) that constructs the new object. Both are decorated with remotable_classmethod, which makes them callable at the class level and remotable over RPC. Querying for a security group would look something like this:

from nova.objects import security_group
secgroup = security_group.SecurityGroup.get(context, 1234)

Unlike the fictitious example in Part 2, there is another way to query for security group objects: as a collection based on some common attribute, often the project ID. The objects framework provides a way to easily define an object that is a list of objects, such that the list can be queried directly or over RPC in the same way, and such that the list itself has inbuilt serialization, which handles serializing the objects contained within. See the SecurityGroupList object:

class SecurityGroupList(base.ObjectListBase, base.NovaObject):
    @base.remotable_classmethod
    def get_all(cls, context):
        return _make_secgroup_list(
            context, cls(),
            db.security_group_get_all(context))

    @base.remotable_classmethod
    def get_by_project(cls, context, project_id):
        return _make_secgroup_list(
            context, cls(),
            db.security_group_get_by_project(
                context, project_id))

    @base.remotable_classmethod
    def get_by_instance(cls, context, instance):
        return _make_secgroup_list(
            context, cls(),
            db.security_group_get_by_instance(
                context, instance.uuid))

The first line shows that this special object is not only a NovaObject, but also an ObjectListBase, which provides the special list behavior. Note that the order of inheritance is important, so they must be in the order shown.

The ObjectListBase definition assumes a single field of “objects” and handles typical list-like behaviors like iteration of the things in the objects field, as well as membership (i.e. contains) operations. Thus, all you need to do is fill out the “foo.objects” list, like the _make_secgroup_list() helper function does:

def _make_secgroup_list(context, secgroup_list, db_secgroup_list):
    secgroup_list.objects = []
    for db_secgroup in db_secgroup_list:
        secgroup = SecurityGroup._from_db_object(
            SecurityGroup(), db_secgroup)
        secgroup._context = context
        secgroup_list.objects.append(secgroup)
    secgroup_list.obj_reset_changes()
    return secgroup_list

This helper function simply populates the “objects” list of the SecurityGroupList object with SecurityGroup objects it constructs from the raw database models provided. It uses the same _from_db_object() helper method as the SecurityGroup object itself. You can use the result of this just like a real list:

secgroups = security_group.SecurityGroupList.get_all(context)
for secgroup in secgroups:
    print secgroup.name

The massive Instance object is similar to what we’ve seen in previous examples, and the SecurityGroup example above. There is a base Instance object, and an InstanceList object to provide an implementation for all the ways we can query for multiple instances at once. It’s too big to show here, but here is a subset of the field definition:

class Instance(base.NovaObject):
    fields = {
        'id': int,
        'user_id': obj_utils.str_or_none,
        'project_id': obj_utils.str_or_none,
        'launch_index': obj_utils.int_or_none,
        'scheduled_at': obj_utils.datetime_or_str_or_none,
        'launched_at': obj_utils.datetime_or_str_or_none,
        'terminated_at': obj_utils.datetime_or_str_or_none,
        'locked': bool,
        'access_ip_v4': obj_utils.ip_or_none(4),
        'access_ip_v6': obj_utils.ip_or_none(6),
        # ...
    }

Finally, an object with some interesting fields! We see the usual integral ID field at the top, but notice that most of the other fields use “or none” helpers from the utils module. Since many of the fields in the instance can be empty (nullable=True in the database definition), we need to handle “either a string or None” in cases such as user_id. The utils module provides some helpers for datetime and ip address functions, which return datetime.datetime and netaddr.IPAddress objects respectively. Just like the int and str type functions, these take a string and convert it into the complex type when someone does something like this:

inst = instance.Instance()
inst.access_ip_v4 = '1.2.3.4'  # Stored as netaddr.IPAddress('1.2.3.4')
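
To illustrate the idea, an “or none” helper for IP addresses could be built roughly like this (a sketch of the concept only; the actual obj_utils implementation may differ):

import netaddr

def ip_or_none(version):
    # Return a type function that coerces values to a netaddr.IPAddress of
    # the given IP version, passing None through untouched
    def validator(value):
        if value is None:
            return None
        return netaddr.IPAddress(value, version=version)
    return validator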

These fields with complicated data types bring us to our first concrete example of something needing special handling during serialization and deserialization. The Instance object contains methods like _attr_scheduled_at_to_primitive() and _attr_scheduled_at_from_primitive() that handle converting the datetime objects to and from strings properly. Handlers (and handler-builders) for these types are provided in the utils module. The IP address fields provide a useful example for illustration, such as this serialization method for the IPv4 address:

    def _attr_access_ip_v4_to_primitive(self):
        if self.access_ip_v4 is not None:
            return str(self.access_ip_v4)
        else:
            return None

This gets called by the object’s serialization method when it encounters the complex IPv4 address field. Although not obvious to the layer above us, the netaddr.IPAddress object can serialize itself through simple string coercion, so we do just that. However, since the field could be None, we want to be sure not to coerce that to a string, which would yield the string “None” instead of None itself. Luckily, we need no special deserialization, because the result of the above string coercion is sufficient to pass to the field’s type function, which the deserialization routine will try if no special handler is provided.

In a subsequent part, I will talk about advanced topics like lazy-loading, versioning, and object nesting.


A brief overview of Nova’s new object model (Part 2)

In Part 1, I described the problems that the Unified Object Model aims to solve within Nova. Next, I’ll describe how the infrastructure behind the scenes achieves some of the magic of making things easier for developers to implement their objects.

The first concept to understand is the registry of objects. This registry contains a database of objects that we know about, and for each, what data is contained within and what methods are implemented. In Nova, simply inheriting from the NovaObject base class registers your object through some metaclass magic:

class NovaObject(object):
    """Base class and object factory.

    This forms the base of all objects that can be remoted or instantiated
    via RPC. Simply defining a class that inherits from this base class
    will make it remotely instantiatable. Objects should implement the
    necessary "get" classmethod routines as well as "save" object methods
    as appropriate.
    """
    __metaclass__ = NovaObjectMetaclass

In order to make your object useful, you need to do a few other things in most cases:

  1. Declare the data fields and their types
  2. Provide serialization and de-serialization routines for any non-primitive fields
  3. Provide classmethods to query for your object
  4. Provide a save() method to write your changes back to the database

Notice that nowhere in the list is “provide an RPC API”. That’s one of the many magical powers that you get for free, simply by inheriting from NovaObject and registering your object.

To declare your fields, you need something like the following in your class:

fields = {'foo': int,
          'bar': str,
          'baz': my_other_type_fn,
         }

This magic description of your fields defines their names and the data types they should have. The key of each pair is, of course, the field name, and the value is a function that can coerce data into the proper format, or raise an exception if that is not possible. Thus, if I set the “foo” attribute to the string “1”, the integer 1 is what actually gets stored. If I try to store the string “abc” into the same attribute, I’ll get a ValueError, as you would expect.
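
Concretely, with the hypothetical fields above, the coercion behaves something like this:

obj = MyObj()
obj.foo = '1'    # coerced by int(); the integer 1 is stored
obj.bar = 123    # coerced by str(); the string '123' is stored
obj.foo = 'abc'  # raises ValueError, since 'abc' cannot be coerced to an int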

The next step is (de-)serialization routines for our attributes. Our “foo” and “bar” attributes are primitives, so we can ignore those, but our “baz” attribute is presumably something more complex, which requires a little more careful handling. So, we define a couple of specially-named methods in our object, which will be called when serialization or de-serialization of that attribute is required:

def _attr_baz_from_primitive(self, value):
    return somehow_deserialize_this(value) # Do something smart

def _attr_baz_to_primitive(self):
    return somehow_serialize_this(self.baz) # Do something smart

Now that our object has a data format and the ability to (de-)serialize itself, we probably need some methods to query the object. Assuming our “foo” attribute is a unique key that we can query by, we will define the following query method:

@remotable_classmethod
def get_by_foo(cls, context, foo):
    # Query the underlying database
    data = query_database_by_foo(foo)

    # Create an instance of our object
    obj = cls()
    obj.foo = data['foo']
    obj.bar = data['bar']
    obj.baz = data['baz']

    # Reset the dirty flags so the caller sees this as clean
    obj.obj_reset_changes()

    return obj

The above example papers over the part about querying the database. Right now, the objects implementations in Nova use the old DB API to do this part, but eventually, the dirty work could reside here in the object methods themselves.

Now, there is some magic here. If I am inside of nova-api (or some other part of nova with direct access to the database) and I call the above classmethod, the decorator is a no-op and  the code within the method runs as you would expect, queries the database, and returns the resulting object. If, however, I am in nova-compute and I call the above method, the decorator actually remotes the call through conductor, executes the method there, and returns the result to me over RPC. Either way, the use of the object is exactly the same in both cases:

obj = MyObj.get_by_foo(context, 123)
print obj.foo # Prints 123
print obj.bar # Prints the value of bar
# etc...

Now, before we’re done, we need to make sure that changes to our object can be put back into the database. Since a “save” happens on an instance, we define a regular instance method, but decorate it as “remotable” like this:

@remotable
def save(self, context):
    # Iterate through items that have changed
    updates = {}
    for field in self.obj_what_changed():
        updates[field] = getattr(self, field)

    # Actually save them to the database
    save_things_to_db_by_foo(self.foo, updates)

    # Reset the changes so that the object is clean now
    self.obj_reset_changes()

This implementation checks to see which of the attributes of the object have been modified, constructs a dictionary of changes, and calls a database method to update those values. This pattern is very common in Nova and should be recognizable by people used to using DB API methods.

Now that we have all of these things built into our object, we can use it from anywhere in nova like this:

obj = MyObj.get_by_foo(context, 123)
obj.bar = 'hey, this is neat!'
obj.save()

One more bit of magic to note is the “sticky context”. Since you queried the object with a context, the object hides the context within itself so that you don’t have to provide it to the save() method (or any other instance methods) for the lifetime of the object. You can, of course, pass a different context to save if you need to for some reason, but if you don’t it will use the one you queried it with.
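
In code, that looks something like this (admin_context here is just a hypothetical second context):

obj = MyObj.get_by_foo(context, 123)
obj.bar = 'new value'
obj.save()                # uses the context from get_by_foo()
obj.save(admin_context)   # or pass a different one explicitly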

Nifty, huh? In Part 3, I will break from the world of fictitious objects and examine a real one that is already in the Nova tree, as well as fill out some of the other implementation details required.


A brief overview of Nova’s new object model (Part 1)

As discussed at the Havana summit, I have been working with Chris Behrens (and others) on the unified-object-model blueprint for Nova. The core bits of it made their way into the tree a while ago and work is underway to implement the Instance object and convert existing code to use it. This unifies the direct-to-database query methods, as well as the mirrored conductor RPC interfaces into a single versioned object-oriented API. It aims to address a few problems for us:

  1. Letting SQLAlchemy objects escape the DB API layer has caused us a lot of problems because they can’t be sent over RPC efficiently. The new object model is self-serializing.
  2. Objects in the database aren’t versioned (although the schema itself is). This means that sending a primitive representation of it over RPC runs the risk of old code breaking on new schema, or vice versa. The new object model is versioned for both interface methods and data format.
  3. Database isolation (no-db-compute) results in mirroring a bunch of non-OO interfaces in nova-conductor for use by isolated services like nova-compute. The new object model entirely hides the fact that object operations may be going direct or over RPC to achieve the desired result.

Hopefully the first two items above are fairly obvious, but the third may deserve a little explanation. Currently, we have things in nova/db/sqlalchemy/api.py like the following:

@require_context
def instance_get_by_uuid(context, uuid, columns_to_join=None):
    return _instance_get_by_uuid(context, uuid,
            columns_to_join=columns_to_join)

@require_context
def _instance_get_by_uuid(context, uuid, session=None, columns_to_join=None):
    result = _build_instance_get(context, session=session,
                                 columns_to_join=columns_to_join).\
                filter_by(uuid=uuid).\
                first()

    if not result:
        raise exception.InstanceNotFound(instance_id=uuid)

    return result

def _build_instance_get(context, session=None, columns_to_join=None):
    query = model_query(context, models.Instance, session=session,
                        project_only=True).\
            options(joinedload_all('security_groups.rules')).\
            options(joinedload('info_cache'))
    if columns_to_join is None:
        columns_to_join = ['metadata', 'system_metadata']
    for column in columns_to_join:
        query = query.options(joinedload(column))
    #NOTE(alaski) Stop lazy loading of columns not needed.
    for col in ['metadata', 'system_metadata']:
        if col not in columns_to_join:
            query = query.options(noload(col))
    return query

This is more complicated than it needs to be, but basically we’ve got the instance_get_by_uuid() method at the top, which calls a couple of helpers below to build the SQLAlchemy query that actually hits the database. This is the interface that is used all over nova-api to fetch an instance object from the database by UUID, and used to be used in nova-compute to do the same. In Grizzly, we introduced a new service called nova-conductor, which took on the job of proxying access to these database interfaces over RPC so that services like nova-compute could be isolated from the database. That means we got a new set of versioned RPC interfaces such as the one mirroring the above in nova/conductor/api.py:

def instance_get_by_uuid(self, context, instance_uuid,
                         columns_to_join=None):
    return self._manager.instance_get_by_uuid(context, instance_uuid,
                                              columns_to_join)

I’ll spare you the details, but this turns into an RPC call to the nova-conductor service, which in turn makes the DB API call above, serializes and returns the result. This was a big win in terms of security in that the least-trusted nova-compute services weren’t able to talk directly to the database, and potentially also brought scalability benefits of not having every compute node hold a connection to the database server. However, it meant that we had to add a new API to conductor for every database API, and while those were versioned, it didn’t really solve our problem with versioning the actual data format of what gets returned from those calls.

What we really want is everything using the same interface to talk to the database, whether it can go direct or is required to make an RPC trip. Ideally, services that can talk to the database and those that can’t should be able to pass objects they retrieved from the database to each other over RPC without a lot of fuss. When nova-api pulls an object with the first interface above and wants to pass it to nova-compute which is required to use the second, a horrific serialization process must take place to enable that to happen.

Enter the Unified Object Model. It does all of the above and more. It even makes coffee. (okay, it doesn’t make coffee — yet).

Continued in Part 2.


A tool for watching Zuul and Jenkins

In my work on OpenStack Nova, I often have multiple patches in flight somewhere on the CI system. When patches are first submitted (or resubmitted) they go into Zuul’s “check” queue for a first pass of the tests. After a patch is approved, it goes into the “gate” queue, which is a serialized merge process across all the projects. Keeping track of one’s patches as they flow through the system can be done simply by waiting for Jenkins to report the job results back into Gerrit, and/or for the resulting email notification.

I like to keep close watch of my patches, both to know when they’re close to merging, as well as to know early when they’re failing a test. Catching something early and pushing a fix will kill the job currently in progress and start over with the new patch. This is a more efficient use of resources and lowers the total amount of time before Jenkins will vote on the patch in such a case.

Since Zuul provides information about what’s going on, you can go to the status page and see all the queues, jobs, etc. The problem with this is that the information from gerrit (specifically owner and commit title) isn’t merged with the view, making it hard to find your patch in a sea of competing ones.

To make this a little easier on the eyes, I wrote a very hacky text “dashboard” that merges the information from Gerrit and Zuul, and provides a periodically-refreshed view of what is going on. After contributions and ideas from several other folks, it now supports things like watching an entire project, as well as your own contributions, your own starred reviews, etc. Here is what it looked like at one point on the day of this writing:

[Screenshot: the text dashboard showing the gate and check queues]

The above was generated with the following command:

python dash.py -u danms -p openstack/nova -r 30 -s -O OR -o danms

Basically, the above says: “Show me any patches owned by danms, or in the project openstack/nova, or starred by danms, refreshed every 30 seconds”. This provides me a nice dashboard of everything going on in Nova, with my own patches highlighted for easier viewing.

Patches of my own are highlighted in green, unless they’re already failing some tests, in which case they’re red. If they are in the gate queue and dependent on something that is also failing tests, they will be yellow (meaning: maybe failing, depending on where the failure was introduced).

You can see the gate queue at the top, which has fifteen items in it, seven of which match the current set of view filters, along with the jobs and their queue positions. Below that is the (unordered) check queue, which has 58 items in it. Each entry shows the review number, revision number, title, time in queue, and the percentage of test jobs that have finished running. Note that since some jobs take much longer than others, the completion percentage doesn’t climb linearly throughout the life of the job.

The dashboard will also provide a little bit of information about Zuul’s status, when appropriate, such as when it enters queue-only mode prior to a restart, or is getting behind on processing events. This helps quickly identify why a patch might have been waiting for a long time without a vote.

If you’re interested in using the dashboard, you can get the code on github.


Dan’s Partial Summary of the Nova Track

Last week, OpenStack developers met in Portland, OR for the Havana Design Summit. Being mostly focused on Nova development, I spent almost all of my time in that track. Below are some observations not yet covered by other folks.

Baremetal

After working hard to get the baremetal driver landed in Nova for the Grizzly release, it looks like the path forward is to actually kick it out to a separate project. Living entirely underneath the nova/virt hierarchy brings some challenges with it, and those were certainly felt by developers and reviewers while trying to get all of that code merged in the first place. The consensus in the room seemed to be that baremetal (as it is today) will remain in Havana, but be deprecated, and then removed in the I release. This will provide deployers time to plan their migration. The virt driver will become a small client of the new service, hopefully reducing the complexity that has to remain in Nova itself.

Live Upgrade

Almost the entire first day was dedicated to the idea of “Live Upgrade” or “Rolling Upgrade”. As OpenStack deployments get larger and more complicated, the need to avoid downtime while upgrading the code becomes very important. The discussions on Monday circled around how we can make that happen in Nova.

One of the critical decisions that came out of those discussions was the need for a richer object format within Nova, and one that can be easily passed over RPC between the various sub-components. In Grizzly, as we moved away from direct database access for much of Nova, we started converting any and all objects to Python primitives. This brought with it a large and inefficient function to convert rich objects to primitives in a general way, and also mostly eliminated the ability to lazy-load additional data from those objects if needed. Further, the structure of the primitives was entirely dependent on the database schema, which is a problem for live upgrade as older nodes may not understand newer schema.

Once we have smarter objects that could potentially insulate the components from the actual database schema, we need to have the ability for the services to speak an older version of the actual RPC protocol until all the components have been upgraded. We’ve had backwards compatibility in the RPC server ends for a while, but being able to clamp to the lowest common version is important for making the transition graceful.

Moving State From Compute to Conductor

Another enemy for a graceful upgrade process is state contained on the compute nodes. Likely the biggest example of this is the various resize and migration tasks that are tracked by nova-compute. Since these are user-initiated and often require user input to finish, it’s likely that any real upgrade will need to gracefully handle situations where these operations are in progress. Further, for various reasons, there are several independent code paths in nova-compute that all accomplish the same basic thing in different ways. The “offline” resize/migrate operations follow a different path from the “live” migrate function, which is also different from the post-failure rebuild/evacuate operation.

Most everyone in the room agreed that the various migrate-related operations needed to be cleaned up and refactored to share as much code as possible, while still achieving the desired result. Further, the obvious choice of moving the orchestration of these processes to conductor provides a good opportunity to start fresh in the pursuit of that goal. This also provides an opportunity to move state out of the compute nodes (of which there are many) to the conductor (of which there are relatively few).

Since nova-conductor will likely house this critical function in the future, the question of how to deal with the fact that it is currently optional in Grizzly came up. Due to a bug in eventlet which can result in a deadlock under load, it is not feasible for many large installations to make the leap just yet. However, expecting that the issue will be resolved before Havana, it may be possible to promote nova-conductor to “not optional” status by then.

Virt Drivers

There was a lot of activity around new and updated virtualization drivers for Nova over the course of the week. There was good involvement from VMware surrounding their driver, both in terms of feature parity with other drivers, as well as new features such as exposing support for clustered resources as Nova host aggregates.

The Hyper-V session was similar, laying out plans to support new virtual disk formats and operations, as well as more complicated HA-related operations, similar to those of VMware.

The final session on the last day was a presentation by some folks at HP that had a proof-of-concept implementation of an oVirt driver for OpenStack. It sounded like this could provide an interesting migration path for folks that have existing oVirt resources and applications dependent on the “Pet VM” strategy to move gracefully to OpenStack.


All your DB are belong to conductor

Well, it’s done. Hopefully.

Over the last year, Nova has had a goal of removing direct database access from nova-compute. This has a lot of advantages, especially around security and rolling upgrade abilities, but also brings some complexity and change. Much of this is made possible by utilizing the new nova-conductor service to proxy requests to the database over RPC on behalf of components that are not allowed to talk to the database directly. I authored many of the changes to either use conductor to access the database, or refactor things to not require it at all. I also had the distinct honor of committing the final patch to functionally disable the database module within the compute service. This will help ensure that folks doing testing between Grizzly-3 and the release will hit a reasonable (and reportable) error message, even if their compute nodes still have access to the database.

Security-wise, nova-compute nodes are the most likely targets for any sort of attack, since they run the untrusted customer workloads. Escaping from a VM or compromising one of the services that runs there previously meant full access to the database, and thus the cluster. By removing the ability (and need) to connect directly to the database, it is significantly easier for an administrator to limit the exposure caused by a compromised compute node. In the future, the gain realized from things like trusted RPC messaging will be even greater, as access to information about individual instances from a given host can be limited by conductor on a need-to-know basis.

From an upgrade point of view, decoupling nova-compute from the database also decouples it from the schema. That means that rolling upgrades can be supported through RPC API versioning without worrying about old code accessing new database schemas directly. No additional modeling is added between the database and the compute nodes, but having the RPC layer there gives us a much better way to maintain a stable interface across N and N+1.

Of course, neither of the above points imply that your cluster is now secure, or that you can safely do a rolling upgrade from Folsom to Grizzly or Grizzly to Havana. This no-db-compute milestone is one (major) step along the path to enabling both, but there’s still plenty of work to do. Since nova is large and complex, there is also no guarantee that all the direct database accesses have been removed. Since we recently started gating on full tempest runs, the fact that the disabling patch passed all the tests is a really good sign. However, it is entirely likely that a few more things needing attention will shake out of the testing that folks will do between Grizzly-3 and the release.

Let the bug reporting commence!
