Recently my Icom IC-7000 died during transmit, leaving it appearing totally dead; I resolved that failure as detailed in this recent post. After repairing it with new parts, the radio powers on, but all is not well.
The radio now exhibits a strange symptom related to the Po (power output) and ALC meters. In SSB mode, pressing the mic key shows about half-scale deflection on the power meter with no modulation, regardless of the RF Power setting; this should read zero. With modulation, some meter activity above that static level can be seen, but never full deflection. Further, in FM or RTTY mode, the power meter shows only about 80% deflection when RF Power is set to 100%, where it should show full-scale. Lastly, in SSB mode the ALC meter shows full deflection with no modulation if RF Power is set above about 40%, and zero if it is set below that level. ALC should mirror the modulation input, regardless of the power setting.
At first, this seemed reminiscent of the self-oscillation problem that could occur in the 756 and 746 radios, where the RX line wasn’t fully pulled to ground during transmit, causing similar behavior with power deflection during transmit with no modulation. However, I ruled this out by looking at the current draw on the power supply. In SSB mode with no modulation, the radio would only pull an additional two amps or so, despite the meter showing about 50W output. The power draw would fluctuate as expected with voice peaks, even though the power meter did not show any activity. In FM or RTTY mode, the power supply would show about 22A draw even though the radio claimed it was putting out less than 80% power.
Another very interesting manifestation was that the radio wouldn’t drive an external tuner. Even when connected to a dummy load, the radio would kick off a tune cycle, the tuner would achieve a satisfactory result, but the radio would kick the TUNE indicator off after it was done. My guess was that the confusing power output indications to the CPU yielded a “not a good match” determination by the radio itself.
All of this led me to think that something in the power metering or ALC circuits was not right. The forward and reflected power is sampled on the PA board at the antenna connector and fed to IC960 where it is amplified and fed to the CPU on the main unit via the HFOR and HREF lines. Since the radio was showing zero SWR deflection, and since the ALC and Power meters were based on the HFOR line, I focused there.
There is a check point in the ALC signal on the main board – CP1601. I measured 1.7VDC here during transmit regardless of modulation input or the RF Power setting. This, to me, seemed to be the problem: basically a static invalid feedback signal to the CPU, which it interpreted as power output when there was none in SSB mode, and potentially less than full output in FM or RTTY mode, when there was plenty.
There are a number of capacitors and resistors around IC960 before the HFOR line leaves for the main board, which I tried to test in-circuit. However, I couldn’t get anything like reasonable values for these without taking them out.
But, there’s a 300 ohm resistor in the HFOR line, R960. I tested this in-circuit and it seemed like a dead short. To further test it, I disconnected the ribbon cables going to the main board, which should free up one side of it, and measured again. Still a dead short. So, I pulled it out and replaced it with the closest temporary value I had on hand locally: 200 ohms. To my delight, the radio started behaving normally! SSB with no modulation showed no power deflection, and modulation made it bounce as expected. RTTY and FM showed full-scale at 100%, and otherwise mirrored the RF Power setting as I backed it down, as expected.
After this change, I measured differing voltages at CP1601 depending on drive and power output, which is what I would expect. The radio also happily drives the tuner, and seems to measure the proper SWR when fed with 25 and 100 ohm loads. I expect it is either not achieving full output or over-driving the PA due to the wrong-value resistor I installed. After replacing the temporary unit with the proper value, I will test power output to make sure it’s behaving properly.
I have a bit of a love/hate relationship with my Icom IC-7000. I think that it’s a fantastic radio, with a ton of stuff packed into a very small size. My radio has suffered several failures over its life and has let me down at critical moments. I want to document the most recent failure for posterity, which I do below, but first I’ll cover the background. If you are not interested, skip the history and go straight to where I describe the latest failure and my analysis.
In my early SOTA days, I was hauling it up to mountain peaks with me and operating in all kinds of conditions. I’ve also used it at home, for field day, and in the back country for heavy digital modes like Pactor-3 during events, most of the time with less-than-ideal makeshift antennas. It has definitely seen a lot of action, and has lived life far from the safe environment of an indoor ham shack’s desk top. I think that it packs an incredible amount of functionality and performance into a very small package, and it was definitely a front-runner in that category. That may be why it seems to have a somewhat poor reputation for being prone to failure. I don’t really fault the radio for punching above its weight class, or Icom for pushing the envelope. Like a specialty sports car, you buy it for the performance, not the reliability. In reality, if this thing died, I don’t really know what I’d replace it with; there’s really no equivalent offering today, in my opinion.
The first major issue happened while I was using it in the backcountry to provide Winlink messaging for an event, around 2012 or so. Occasionally, the radio would just go “deaf” on HF for several minutes at a time. It would be sitting there in receive mode showing background band noise (or actual signals) and then suddenly show a zero S-meter and no noise (other than internal receiver noise) for several minutes. This would persist across all bands. Then, as suddenly as it went away, the receiver would come back and it would work fine for a while…until it wouldn’t again. For this, the radio went back to Icom, along with a detailed description of the problem and a video showing the symptoms over the course of several hours and several such events. Unfortunately, the usually-stellar Icom service center let me down on this one. They refused to watch the video and claimed that they were unable to reproduce the problem (I’m guessing because they only tested for a few minutes). They re-seated the ill-fated ribbon cables in the radio, and sent it back to me with a bill for their troubles. They also tried to argue that what I described was my own ignorance: that obviously I had some noise source in the house that was coming and going, despite the problem first manifesting hundreds of miles away, and persisting across all bands simultaneously. But alas. In general, Icom service has been great and has a good reputation for being so. Despite my poor experience with one technician, I’m still an Icom fanboy.
When the radio came back from Icom, it went straight on the shelf. They couldn’t reproduce the problem, which means I couldn’t trust it. I was disappointed and frustrated with the service center, so I just moved on. At that point, I was using my Elecraft KX3 for SOTA stuff, which is, perhaps, the perfect radio for it. The IC-7000 stayed on the shelf for probably two years until I decided to get it out to play in a contest. When I tried to power it on, I got the “click-click of death” behavior. The radio wouldn’t power on, and just energized and then de-energized the internal relay when I pushed the power button. I was shocked, because I literally hadn’t powered it back on since Icom returned it to me and it seemed like they sent it back in worse shape. I tracked this down to a shorted tantalum capacitor in the head, which is a semi-well-known failure in these radios. It doesn’t seem like that would have happened while sitting on a shelf without any power for two years, but I’m not sure. I made that repair and the radio came back to life again. I still didn’t trust it, so it went back on the shelf for a couple more years.
This past year, I did field day for the first time in a while and had a blast. I wanted to use the IC-7000, but didn’t trust it so I used my IC-7200 (which is also a fantastic radio, but not really well-suited for contests). After field day, I decided to get the IC-7000 out and work some HF from home in order to put it through its paces. It held strong for hours of full-power SSB QSOs and I was starting to think that it had been exorcised of its demons. That is, until they popped out again.
The latest failure
At one point while transmitting, the radio shut off and then rebooted. Each time I would transmit, it would shut off, and then it got stuck in some sort of loop. Whenever power was applied to the DC jack, the radio would sit in what seemed like a tight reboot loop of just clicking the relay on and off (it was so fast that the screen never had a chance to come on). I took a break, let the radio sit for a while without power, and when I came back, it powered on and continued to work for some time after that. However, it was short-lived and at one point it shut off while transmitting for what would be the last time. The radio was stone dead, showing no signs of life. No relay clicking, no speaker popping, no significant power draw when plugged in, just … nothing.
Having no desire to spend more money (and frustration) on the Icom service center, I decided to take another shot at diagnosing the problem myself. The shorted capacitor in the head had been self-resolvable, after all. I found some information about the power-up process that the radio goes through, which was helpful in finding a lead to the actual fault. At all times when the radio has power, the “logic unit” (i.e. the CPU module) receives the 14v input voltage through the HV line. This powers a 3.3v regulator on the logic unit itself, which powers the CPU. A pull-up resistor ties the PWRK line, which runs to the control head, up to the 3.3v rail from the regulator. To power the radio on, the control head pulls the PWRK line to ground; the CPU notices this, and powers on the rest of the radio by energizing the main relay through a driver transistor.
Checking the PWRK signal on the head connector (pin 2), I saw that it was at about 2.0v, well below the expected 3.3v. Pulling the cover off of the logic unit, I checked the regulator and found that not only was the output low (about 2.4v right at the regulator), but the input was also about 2.7v instead of the expected 14v from the supply.
My thought was that, like the previous capacitor failure in the head, I was looking for something that is now shorted to ground, either on the high or low side of the regulator. Checking a bunch of the bypass caps on both sides, I found none that seemed to be problematic, so I thought maybe the regulator itself was faulty. I removed it (in pieces, unfortunately) and as soon as I did, the input pad was reading 14v from the supply again.
I thought I was in good shape, so I temporarily soldered a TO-220 3.3v regulator I had in my stash to see if that would solve the problem. Unfortunately, it did the exact same thing, showing the same low input and output voltages.
Still on the path of expecting something was dragging either the input or output to ground, I decided to try to isolate the logic unit from the main unit, to see whether the fault was on one or the other. I powered the logic module while disconnected from the main unit by feeding the regulator’s input directly. Sure enough, the output held at the expected 3.3v, with no voltage drop on the input side. This, I figured, meant something on the main unit was shorted. I put the logic unit back into the main unit, and again fed the logic regulator directly, expecting to see the output be pulled down. However, even seated in the main unit, the regulator behaved properly: no voltage sag on either the input or output.
At this point, I took a break, assuming the fault was in one of the main ICs, like perhaps the CPU itself and that I’d be looking at a more substantial replacement of that entire module, or worse. But, then I had the thought that if we’re really dragging the 14v from the supply down to ~3v, we would have to be pulling a lot of current through something and things should be smoking (and they weren’t). Further, the PWRK line was available on the outside of the radio, which means it really needed to be protected from over-current if something contacted the exposed pins while power was applied.
So, I started tracing the HV line to its source. The logic unit gets it from the main unit, which basically passes it straight through from the PA unit, which is where the DC supply connects to the radio. Here, the HV line is fed from the DC supply through the c line of Bus Line 1, through an RF choke (EP703) and a 4.7 ohm resistor (R723).
This, I assume, is the current-limiting resistor to prevent that always-on HV line from smoking a trace if it contacts ground. So, I found those two components on the PA board to measure them, and luckily found them on the bottom (exposed) surface by the front-loading fuse holder. The choke read zero ohms as expected, but R723 measured (in-circuit) at 52 ohms!
The thinking here is that R723 failed and is showing much higher resistance than expected. Under no load (with the logic unit disconnected), of course the HV line reads full voltage. When applied to the regulator and CPU on the logic unit, the input voltage drops too low to start the CPU and thus no response to the power button input. It’s also possible that the CPU is running, but when the power button is pressed, the low voltage and near zero current provided to the relay gives the impression that nothing is happening.
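As a sanity check on this theory, a quick Ohm's-law estimate shows that a very modest logic-unit draw through a 52 ohm resistor explains the sag, while a proper 4.7 ohm part would barely drop a volt. This is only a sketch: the load current here is inferred from my voltage measurements, not measured directly.

```python
# Sanity check: what load current explains ~2.7v at the regulator input
# if R723 has drifted from its nominal 4.7 ohms to the measured 52 ohms?
SUPPLY_V = 14.0    # nominal HV line voltage from the DC supply
MEASURED_V = 2.7   # voltage observed at the regulator input
R_FAILED = 52.0    # measured (failed) value of R723, in ohms
R_NOMINAL = 4.7    # nominal value of R723, in ohms

# Current implied by the drop across the failed resistor (Ohm's law)
i_load = (SUPPLY_V - MEASURED_V) / R_FAILED   # about 0.22 A

# The same load through a good R723 would drop only about a volt,
# leaving plenty of headroom for the 3.3v regulator.
drop_nominal = i_load * R_NOMINAL

print(f"implied load current: {i_load * 1000:.0f} mA")
print(f"drop across a good R723: {drop_nominal:.2f} V")
```

A couple hundred milliamps is entirely plausible for a CPU module, and nowhere near enough to make anything smoke, which fits what I observed.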
Wanting to actually confirm the fix before making a Digi-Key order, I bypassed R723 with the lowest-value resistor I had on hand, which was 10 ohms. Sure enough, the radio responded to the button stimulus and powered right up!
At this point, I think I’ve resolved this latest failure. I’ve got to replace the regulator I removed (in pieces), which will be easy. Far more difficult will be replacing R723, which is smaller than a pin head. Shown here by the red arrow, pencil eraser for scale:
If that doesn’t go well, I may just formalize my bypass resistor in the circuit and call it good. I didn’t find any reports “out there” of this exact problem manifestation, so hopefully this write-up will help someone else if they suffer the same problem, or provide clues for a solution.
I have a couple of older Kenwood TM-D700 radios that I use for APRS stuff. They’re solid radios, and even have some features not present in their newer siblings, like the TM-D710. But they don’t have integrated GPS devices, they don’t even support SmartBeaconing, and of course they were designed long before Bluetooth was common.
I recently decided to integrate a Bluetooth serial adapter directly into one, which would make it easier to use things like APRSDroid. With Bluetooth serial, APRSDroid can pretty much replace most of the APRS functionality in the radio, but using the inbuilt TNC for the actual modem part. This means a phone or tablet in range of the radio can enable rich mapping, SmartBeaconing, and messaging, without any cabling to the actual radio itself. Bluetooth-based connection is a feature of the current TH-D74 portable radio and it is extremely convenient.
Unlike the newer TM-D710 mobile, the TNC in the D700 is in the body of the radio itself, and thus that is where the Bluetooth serial module needs to go. Luckily, the front nose of the radio is plastic, and provides a good place for us to put the module so that it won’t be shielded by the rest of the case:
The red box roughly indicates the spot where the module will go. Conveniently there is a little extra space for it under the cover.
The module I used is an HC-06, which is available from various places in a lot of different form factors, brands, etc. The only problem is that these modules almost universally work on TTL-style 3.3v serial. The TM-D700 has a proper RS-232 port, which works at bipolar signalling levels of roughly ±12v. Thus a TTL serial converter is required, as well as a voltage divider to lower the 5v output from it to 3.3v for the Bluetooth module. The 3.3v coming from the Bluetooth module is enough to drive the RX side of the TTL level converter without issue.
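For the divider itself, any pair of resistors in roughly a 1:2 ratio works; here is a quick sketch of the math, with example values of 1k over 2k (my illustration, not necessarily the exact values used in this build):

```python
# Resistive divider to drop the TTL converter's 5v TX output to ~3.3v
# for the Bluetooth module's RX pin.
def divider_out(v_in: float, r_top: float, r_bottom: float) -> float:
    """Output voltage of an unloaded two-resistor divider."""
    return v_in * r_bottom / (r_top + r_bottom)

# Example: 1k on top, 2k on the bottom gives two-thirds of the input.
v_out = divider_out(5.0, 1_000, 2_000)
print(f"{v_out:.2f} V")  # about 3.33 V
```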
So, for this to work, we’ll need to find a 5v source inside the radio to power the Bluetooth module and the TTL converter. Then we will place the TTL converter between the serial port on the radio and the serial port of the Bluetooth module. Luckily, the radio has an option slot for the VS-3 “Voice Synthesizer” module, and that module requires 5v from the radio. Since I don’t have that module installed in either of my D700s, it seems like a perfect place to grab it, especially since the radio expects to provide some amount of power (at 5v) to that module. The pins are super tiny, but you can probe them to see which one has the 5v. It’s the second one from the edge closest to the side of the radio. Pin 7 on the schematic, labeled “C5” for “Common 5v”:
There is also a ground at the other end of the blank spot for the module, which should be easy to spot. If you have a VS-3 already installed, you may need to find 5v and ground elsewhere. I grabbed mine from this area and tacked down the wires with a little hot snot to keep them from moving.
I chose a super small TTL converter, one that already had a female DE-9 connector. I removed that connector and just wired between the appropriate pins so it would fit inside the radio.
Now, the easy thing to do here is to just wire the TTL converter to pins 2, 3, and 5 on the board, as they would be if you were connecting externally to the port on the radio. However, this will present two problems. First, you won’t be able to use the serial port externally anymore to do things like program the radio or use the TNC for other purposes. Second, you won’t have any way to access the Bluetooth module’s serial port if you want to do things like change the PIN and human-readable name. However, there’s a good solution for this: hijack some of the unused pins. The schematic shows that several of the DE-9 pins are not connected (NC):
Pins 2 and 3 are the ones we eventually need to get to, and luckily pins 1 and 4 are not connected and right next to the ones we want. So, if we connect to 1 and 4, we can then use a super small loopback “dongle” in the form of a DE-9 female plug with pins 1 and 2 connected, and pins 3 and 4 connected. So, when we want Bluetooth to be operational, we can leave the plug inserted into the front of the radio to loop it through to the serial port; when we want to program the radio with a computer or use the TNC directly, we can remove it. If we need to talk directly to the Bluetooth module for programming, we can make a serial cable that runs to pins 1 and 4 instead of 2 and 3. Further, if we decide we want to have the Bluetooth module connected to the GPS connector (for read-only “Kenwood waypoint mode” operation), we can make a small cable to run from the GPS port to pins 1 and 4 of the DE-9. We can keep the connection to pin 5 (ground) all the time, so no need to account for that. Here’s what that looks like (prior to flux cleanup):
You can see the TTL converter wrapped and hot-snotted into place on the board just below the connector. There is a relatively blank spot on the board here with just some SMT buffer resistors that make a good spot for it.
Otherwise, locating the Bluetooth module is all that is required. On the top side where the TNC module and ribbon cable is located, there is room to place the module under the cover. It probably isn’t necessary to affix it, but a small dab of glue will hold it, antenna side up, in place relatively easily:
It’s not critical to wrap the connection side like I did since the cover that goes over this is plastic, but it helps to secure the wires, the voltage divider resistors, and the bottom of the board in case it comes loose.
After the radio is re-assembled, it’s time to build the loopback. Just a single female DE-9 is all you need for this. Here’s what it looks like (before covering it for safety):
Note that the D700 requires hardware flow control to be working. Since the Bluetooth module doesn’t use (or provide) this, just jumper RTS and CTS (pins 7-CTS and 8-RTS) together to tell the radio it is always “clear to send”.
The GPS connector is right next to it, so a very short connection between pins 1, 4, and 5 on the DE-9 to the GPS connector would enable waypoint mode instead of native TNC mode.
Technically, this is all you have to do to make it work. However, you probably want to change the PIN from the default (which is 1234), as well as change the name to something useful (like “TM-D700” or similar). You need to make up a serial cable, as described, that connects to pins 1 and 4 (instead of 2 and 3 by default). Once you do, you can configure a serial terminal for 9600,8n1 and type the following two commands:
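On the typical HC-06 firmware, those two commands are the PIN and name setters, which take their argument directly with no separator. The `4321` PIN and `TM-D700` name below are example values; substitute your own:

```
AT+PIN4321
AT+NAMETM-D700
```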
Note that the module doesn’t seem to use carriage returns or line feeds to mark the end of a command (as it should), but instead relies on an inter-character delay timeout. So, the best thing to do is type those commands (one at a time) into something else, and then copy/paste them into the serial terminal so that they show up all at once.
And with that, you should be good to go. The Bluetooth module will turn on and off with the radio, so power it on, and pair your Android device. Then configure APRSDroid for Bluetooth, select the proper Bluetooth pairing, choose KISS mode, and enter the following TNC initialization string for the Kenwood:
KISS ON RESTART
Then when you click “Start Tracking” APRSDroid should configure the TNC to go into KISS mode and you should see some positive response (if verbose logging is enabled). Try sending a position and confirm that the radio transmits to verify proper operation.
I recently got a good deal on a condition-unknown Kenwood TK-7160H radio. This is a nice 50W VHF high-band commercial transceiver, and I figured that even if it was a little sick, it could probably be used to add support for the model to CHIRP. Once I had a chance to check it out, I discovered that it was almost entirely deaf — but only when it was in wide (5kHz deviation) mode. In narrow (2.5kHz deviation) mode, it was fine. Everything else about the thing was on point.
I cracked the case and followed the service manual’s description of the signal path to the IF filters. They’re on the bottom side of the board as installed in the chassis, which means it’s hard to examine them directly, but their terminals are sticking up, ripe for the probing.
Sure enough, I could scope the IF signal on either side of the narrow filter, but on one side of the wide filter it was clamped hard to ground. If you are looking at the radio top-down, the wide filter is the one closest to the edge of the board:
The filters each have five pins: three closely-spaced ones on one side, which are ground, and two far-apart ones on the other side, which are the filter input/output pins.
I also noticed that there was some buildup on the otherwise-clean board right around the filters. This looked like crispy flux residue, but also could have been some sort of corrosion. If it was flux, it seemed likely that someone had already been in here to replace filters, potentially only the narrow one since it was healthy?
It is well-known that some radios have issues with their ceramic IF filters. Notably, Kenwood had a rash of much-too-early failures in some of the TM-D710 and TM-V71 radios, which had some people worried about long-term reliability. It has been theorized that this problem is electromigration caused by improper buffering of the filters from the DC bias used in the narrow/wide filter selection logic. However, many radio manufacturers do this across many of their models, and widespread deaths of all those radios don’t seem common enough for this to be some major oversight by the world’s leading radio manufacturers, at least in my opinion. Other theories are that Kenwood specifically got a bad batch of Toko filters within a certain manufacturing window, and these are the ones that are dying early.
Other pictures and video I have seen of this problem revealed the affected filters showing external signs of “green death” leaking out of the filter enclosure, slowly working its way out due to increased pressure. To get a look at the filters themselves I had to remove the board, which required unsoldering the antenna connector at the back:
Once that was free, removing the rest of the screws released the board from the chassis. I flipped it over and immediately noticed a few things. First, the filters were different brands, indicating that likely one had been swapped out and not the other. Second, the wide one was Toko brand. Third, the wide filter was covered in white fuzz, similar to other pictures I had seen of this problem, but the narrow one was not.
Since this radio is intended for Part 90 use, it is likely that someone would have repaired/replaced just the narrow filter, as wide mode would really only be useful to a ham these days. Filters are cheap, about $5 each, and they’re fairly common across these sorts of radios. I ordered three of each and figured I’d swap out the narrow filter in this guy too, since I was in there, just in case. I opted not to do the bypass cap hack job to remove the DC bias. Kenwood’s part numbers for the filters are in the service manual:
Here’s the wide filter desoldered, and the narrow one came out right after it:
After the board was cleaned up, both new filters went back into their spots:
The hot elements are heat sinked to the chassis underneath and mated with thermal paste. I added a little more to each landing before I reassembled it to make sure nothing was thermally compromised by the dis- and re-assembly. Don’t forget to re-solder the antenna connection to the board once it is screwed back down.
After it was all back together, I checked it on the service monitor and sure enough, wide mode was working properly again, with sensitivity down well below -120dBm. Before the fix, even at 0dBm, I could only barely hear the modulation through the static when I held the squelch open.
There isn’t a lot of information out there (at least that I could find) about applying these filter fixes to the commercial line of radios. Most of those probably get fixed in a shop, where radios from the amateur line are more likely to get fixed by the owner. The same theory should apply to the UHF variant (TK-8160) as well as similar models: the TK-7162, TK-8162, TK-7102, TK-7108, TK-8102, TK-8108, TK-7180, and TK-8180.
When you boot an instance in Nova, you provide a reference to an image. In many cases, once Nova has selected a host, the virt driver on that node downloads the image from Glance and uses it as the basis for the root disk of your instance. If your nodes are using a virt driver that supports image caching, then that image only needs to be downloaded once per node, which means the first instance to use that image causes it to be downloaded (and thus has to wait). Subsequent instances based on that image will boot much faster as the image is already resident.
If you manage an application that involves booting a lot of instances from the same image, you know that the time-to-boot for those instances could be vastly reduced if the image is already resident on the compute nodes you will land on. If you are trying to avoid the latency of rolling out a new image, this becomes a critical calculation. For years, people have asked for or proposed solutions in Nova for allowing some sort of image pre-caching to solve this, but those discussions have always stalled in detail hell. Some people have resorted to hacks like booting host-targeted tiny instances ahead of time, direct injection of image files into Nova’s cache directory, or local code modifications. Starting in the Ussuri release, such hacks will no longer be necessary.
Image pre-caching in Ussuri
Nova’s now-merged image caching feature includes a very lightweight and no-promises way to request that an image be cached on a group of hosts (defined by a host aggregate). In order to avoid some of the roadblocks to success that have plagued previous attempts, the new API does not attempt to provide a rich status result, nor a way to poll for or check on the status of a caching operation. There is also no scheduling, persistence, or reporting of which images are cached where. Asking Nova to cache one or more images on a group of hosts is similar to asking those hosts to boot an instance there, but without the overhead that goes along with it. That means that images cached as part of such a request will be subject to the same expiry timer as any other. However, each time an image is pre-cached on a host where it is already resident, its purge timestamp is refreshed; so if you want images to remain resident on the nodes permanently, you just need to re-request them before the expiry timer would have purged them.
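One simple way to keep an image pinned, then, is to re-run the cache request on a schedule well inside your configured expiry window. A nightly crontab entry might look like this (the aggregate name and image UUID match the examples used elsewhere in this post; the actual expiry interval depends on your image cache configuration):

```shell
# Re-request the cached image nightly at 02:00 so its purge timestamp
# stays fresh and the expiry timer never removes it.
0 2 * * * nova aggregate-cache-images my-application c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2
```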
Obviously for a large cloud, status and monitoring of the cache process in some way is required, especially if you are waiting for it to complete before starting a rollout. The subject of this post is to demonstrate how this can be done with notifications.
Before we can talk about how to kick off and monitor a caching operation, we need to set up the basic elements of a deployment. That means we need some compute nodes, and for those nodes to be in an aggregate that represents the group that will be the target of our pre-caching operation. In this example, I have a 100-node cloud with numbered nodes that look like this:
In order to be able to request that an image be pre-cached on these nodes, I need to put some of them into an aggregate. I will do that programmatically since there are so many of them like this:
$ nova aggregate-create my-application
+----+----------------+-------------------+-------+----------+--------------------------------------+
| Id | Name           | Availability Zone | Hosts | Metadata | UUID                                 |
+----+----------------+-------------------+-------+----------+--------------------------------------+
| 2  | my-application | -                 |       |          | cf6aa111-cade-4477-a185-a5c869bc3954 |
+----+----------------+-------------------+-------+----------+--------------------------------------+
$ for i in $(seq 1 95); do nova aggregate-add-host my-application guaranine$i; done
... lots of noise ...
Now that I have done that, I am able to request that an image be pre-cached on all the nodes within that aggregate by using the nova aggregate-cache-images command:
$ nova aggregate-cache-images my-application c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2
If all goes to plan, sometime in the future all of the hosts in that aggregate will have fetched the image into their local cache and will be able to use that for subsequent instance creation. Depending on your configuration, that happens largely sequentially to avoid storming Glance, and with so many hosts and a decently-sized image, it could take a while. If I am waiting to deploy my application until all the compute hosts have the image, I need some way of monitoring the process.
Many of the OpenStack services send notifications via the messaging bus (i.e. RabbitMQ) and Nova is no exception. That means that whenever things happen, Nova sends information about those things to a queue on that bus (if so configured) which you can use to receive asynchronous information about the system.
The image pre-cache operation sends start and end versioned notifications, as well as progress notifications for each host in the aggregate, which allows you to follow along. Ensure that you have set [notifications]/notification_format=versioned in your config file in order to receive these. A sample intermediate notification looks like this:
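Abbreviated, and with an envelope and field names that I am reconstructing from memory (verify against your own bus traffic), a progress notification looks roughly like this:

```json
{
  "event_type": "aggregate.cache_images.progress",
  "payload": {
    "nova_object.data": {
      "name": "my-application",
      "host": "guaranine68",
      "index": 68,
      "total": 95,
      "images_cached": ["c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2"],
      "images_failed": []
    }
  }
}
```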
This tells us that host guaranine68 just completed its cache operation for one image in the my-application aggregate. It was host 68 of 95 total. Since the image ID we used is in the images_cached list, that means it was either successfully downloaded on that node, or was already present. If the image failed to download for some reason, it would be in the images_failed list.
In order to demonstrate what this might look like, I wrote some example code. This is not intended to be production-ready, but it provides a template for writing something of your own to connect to the bus and monitor a cache operation. You would run this before kicking off the process: it waits for a cache operation to begin, prints information about progress, and then exits with a non-zero status code if any errors were detected. For the above example invocation, the output looks like this:
$ python image_cache_watcher.py
Image cache started on 95 hosts
Aggregate 'foo' host 95: 100% complete (8 errors)
Completed 94 hosts, 8 errors in 2m31s
Errors from hosts:
  guaranine2
  guaranine3
  guaranine4
  guaranine5
  guaranine6
  guaranine7
  guaranine8
  guaranine9
Image c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2 failed 8 times
In this case, I intentionally configured eight hosts so that the image download would fail for demonstration purposes.
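The heart of such a watcher is just bookkeeping over the progress payloads. Here is a minimal, transport-agnostic sketch of that part; the payload keys mirror those described above, and actually feeding it messages from the bus (via oslo.messaging or similar) is deliberately left out:

```python
def summarize_progress(payloads):
    """Fold aggregate.cache_images.progress payloads into a summary.

    Each payload is a dict with 'host', 'index', 'total',
    'images_cached' and 'images_failed' keys, as described above.
    Returns (hosts_completed, error_count, failed_hosts).
    """
    completed = 0
    errors = 0
    failed_hosts = []
    for p in payloads:
        completed += 1
        if p['images_failed']:
            errors += len(p['images_failed'])
            failed_hosts.append(p['host'])
        pct = int(100 * p['index'] / p['total'])
        print('Host %s: %d%% complete (%d errors)' % (
            p['host'], pct, errors))
    return completed, errors, failed_hosts
```

A real watcher would call this incrementally as notifications arrive and exit non-zero if `failed_hosts` is non-empty.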
The image caching functionality in Nova may gain more features in the future, but for now, it is a best-effort sort of thing. With just a little bit of scripting, Ussuri operators should be able to kick off and monitor image pre-cache operations and substantially improve time-to-boot performance for their users.
A year ago, I added onboard air to our Jeep. It had always been something I wanted to do (since the last Jeep) and I can say it’s definitely one of the best things I’ve done to it. Not only do I air down more often, knowing that airing up will be quicker and easier, but I ended up with a much better compressor than the cheesy portable one I used to have.
The compressor is an ARB twin, and it’s mounted under the hood of our JK right above the brake booster. It’s wired to the second battery, which means I can run it without the engine running if I want. Especially on the Jeep, it’s fairly easy to open the hood, hook up a hose and go to town.
Even with the high output of the twin-piston compressor, airing up four 35″ tires from about 12psi to 38psi, as well as two 31″ tires from 20psi to 50psi does take quite a while. I have a little device I built that automates the process of airing up a tire to completion (which works great), but in order to do its work it has to open a valve, wait, close a valve, check the pressure, and decide whether or not to open the valve again to keep adding air. This causes the compressor to cycle on and off as the closed valve quickly causes the compressor’s pressure switch to trigger. Thus, while the valve is closed, the compressor is doing no useful work. This same thing happens when I’m switching the hose to the next tire.
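The cycle described above is essentially a bang-bang control loop. This is a hypothetical sketch of that logic, not the device's actual firmware; all the names and timing parameters here are mine:

```python
def inflate_to(target_psi, read_pressure, open_valve, close_valve, wait,
               fill_seconds=10):
    """Repeatedly add air until the settled reading hits the target."""
    while True:
        # With the valve closed, the line pressurizes and the
        # compressor's pressure switch stops it, so no useful
        # work happens during this check.
        psi = read_pressure()
        if psi >= target_psi:
            return psi
        open_valve()
        wait(fill_seconds)   # the compressor does useful work only now
        close_valve()
        wait(2)              # let the gauge reading settle
```

The dead time in the `close_valve()`/check phase is exactly the wasted compressor capacity that motivates adding a tank.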
So, I wanted to add an air tank to the system. This would allow the compressor to run continuously (it’s rated for 100% duty cycle), whether it was filling a tire, or just filling the tank. Synergy makes a bracket for the JK, which allows mounting a Viar 2-gallon tank above the rear axle in some dead space. This means that I needed to run an air line from the compressor in the front to the tank in the back. It also meant that I had an opportunity to plumb in an air outlet at the back of the Jeep, which would be more convenient to access than opening the hood.
But, what kind of air line do you use for on-board air? I’ve used plastic air line to run air in the garage, and plenty of hoses and fittings for tools, but I didn’t really find many clear writeups of what to use on a vehicle and how to do it. Thus, I decided to write this post just to document what I ended up with.
It turns out, the best option for air line on a vehicle is … air line made for a vehicle. Specifically, DOT air brake line. This stuff is rated for some pretty high pressures, and temperatures up to 200C. It’s not super flexible, but it’s not too bad. It seems like a lot of low-flow, high-pressure situations use 1/4″ OD line, but I wanted more volume than that, so I opted for the 3/8″ OD stuff (which is about 1/4″ ID, or about the same as a standard quick connect fitting).
I ordered a few specific parts from a 4×4 vendor:
The Viair tank (VIA-91022)
The Synergy bracket for 2012+ JK (PPM-4022)
The Viair package of 1/4″ MPT pre-sealed plugs for the tank (PPM-4022)
A 1/4″ MPT drain cock (VIA-92835)
The ARB quick connect (ARB-0740112) and dust cover (ARB-0740113)
I have no idea how well the dust-covered quick connect will really hold up, but it has a nice large rubber ring on the sleeve that seems obviously better for cold/dirty/gloved hands than a regular one. If it doesn’t hold up, I’ll put a regular one on there and figure out some sort of cover.
The rest of the line and fittings came from the usual gettin’ spot. Specifically:
I first decided where the drain cock was going to go, and where my lines were going to interface with the tank. I installed the drain and the plugs in the appropriate spots and then mounted the tank and bracket before installing the push-to-connect fittings. I covered the holes for those in the tank with tape while doing the install to avoid getting anything in the tank (and installed the fittings afterwards to avoid damaging those). Synergy tells you to jack up the vehicle by the frame to let the suspension droop and I can say that this is definitely worth the time and makes the process much easier.
Next, I taped one free end of the DOT air line and started scouting my route from the engine bay to the rear. Since the JK has a V6, there are two hot exhaust headers and cats on either side of the engine, and the driver’s side one is pretty much exactly where I would have wanted to go straight down from the compressor. Instead, I ran over to and down the transmission tunnel behind the engine. The air hose is rated for 200C, but the cats could definitely get hot enough to cause problems. There are other wires and things in the tunnel area, so that seemed like a better plan.
I put probably 4ft of the fiberglass heat tube over the line for the trip down the tunnel and over the first part of the transmission and secured the ends with heat shrink. This stuff fits pretty loosely and in addition to blocking a lot of radiant heat, also makes me feel good about abrasion resistance and anything else in this sensitive area.
Over the top of the transmission and the transfer case, I was able to keep the line pretty much right down the middle of the chassis, going over the frame supports to keep it zip-tied down snugly and away from moving or heating parts. Over the evap canister and bracket and right into the air tank via straight fitting it went.
Once I had this run done, I was able to cut the line to length in the engine bay and used a straight push-to-connect fitting and an elbow to interface with the compressor.
After that, I used another section of the air line to go from the tank (via right-angle fitting) over to the rear passenger corner. The ARB bumper has a cover here, which is removed if you install their tire carrier. Since I don’t have that, it seemed like an obviously non-structural place that I could drill through for the line, and one that would be easily replaceable if needed. The metal the bumper (and thus this cover) is made out of is super hard, and it took a lot of drilling on my press to get a suitable hole through it.
Once I did, I was able to mount the bulkhead connector on the cover plate, with a swivel push-to-connect elbow fitting on the bottom to accept the DOT air line, and a fixed elbow on the top to accept the coupler. I used another small section of the fiberglass wrap to protect the line where it sits just below the bracket for the Gobi rack, which provides some resistance to abrasion.
Now I have a quick-connect located on the outside of the vehicle, where it doesn’t interfere with the operation of the tailgate or the roof rack ladder, nor is it facing front or sticking out the side to be caught on anything. It’s also in the middle of the jeep-trailer system, which means making the hose reach all six tires is no longer a challenge.
I recently found out about a bunch of cool backcountry overlanding routes maintained by the folks at Oregon Off-Highway Vehicle Association (OOHVA) called the Oregon Back Country Discovery Route (OBCDR). These routes are hand-picked to provide hundreds of miles of off-road enjoyment through Oregon’s vast outdoor playground. These roads are open to the public, but cultivating and organizing the maps isn’t free. Before I ordered them I did some digging to try to find out what I was going to get for my money — I was hoping that it wouldn’t just be a single large map lacking in enough detail to really see the route. Now that I’ve received my packet, I thought I’d provide some information for other people who might be on the fence about whether the full set of information is worth $155. Spoiler: it is.
On the website, this is about as much detail as you’re able to see:
This is clearly just an overview: you can see that the trail system is very large, but you can’t really see any detail other than the general direction and length of each section. To be honest, this is what had me most concerned before I ordered — I was hoping I wasn’t just going to get a very large version of the above image.
Almost immediately after I ordered the full set of maps, I got an email with a tracking number and then this showed up today:
What’s that? Yep, it’s a shrink-wrapped set of spiral-bound maps. I was impressed.
Once I opened the package, I found each route in its own spiral-bound set printed on pretty decent paper with what looks like a good laser printer. These aren’t thick, glossy pages like you’d find in a book, but they also shouldn’t smudge if they get a little wet, nor tear too easily.
As you peer inside one of the bound manuals, you see that it’s arranged much like one of those large road atlases, where each section of the route is covered by a specific page. At the beginning they lay out all the pages into an index, so you can see the ordering of the pages, as well as which connects to which and by what edge:
It’s things like this that really make it clear that some time and thought has gone into this, and that you’re getting more than just a map from the forest service that is marked up by hand at high scale.
The actual map pages themselves are very easy to read, with excellent contrast and large labels. Some points along the way are marked clearly with latitude and longitude (and the datum!) so you can synchronize your GPS with the map if things get wonky out in the field. Map edges are also labeled with what map they connect to so that it’s easy to know which one to go to next when there are multiple paths off the current page.
The website promises that GPS tracks are available upon request. After receiving my packet in the mail, I sent an email to the maintainer asking about these files and received them within the hour. The tracks are separated one per file, and provide not only paths but also many waypoints along the routes. These loaded right up into Garmin Basecamp and will surely make it easy to follow. When I drive these, I’ll probably try to do most of the navigating electronically, keeping the paper maps safe for emergencies.
Possibly the only thing missing from the information provided is a bit of a tactical overview of each route with logistics (e.g. “be sure to fill up on gas before you leave this area” or “there won’t be a flat spot to camp for 20 miles”). Then again, the research, planning, and figuring-it-out-on-the-fly of those logistics is part of the fun. There are also a number of waypoints marked for things like formal campgrounds, and I even saw a service station indicated when crossing through a town.
So, overall, I’m quite impressed with what I have seen so far. Obviously I haven’t tried following any of these routes yet, but we’ll definitely be out there on some of them this summer. Hopefully the above overview gives you enough of an idea about what you get from OOHVA and you decide to purchase them yourself. At the time of this writing the full set of all the routes was $155, but individual routes are available for as little as $15.
This is a project I did quite a while ago, which has been working well for me ever since. I get a lot of looks when airing up my tires and so I thought I’d do a bit of a historical how-to of my setup in case other people are interested. The exact parts used here may no longer be available, or may have been revised so some improvisation will likely be required.
If you take your vehicle off-road, you probably air down your tires before you get started. That means you have to air them back up when you’re done. This can involve fancy things like on-board air, or un-fancy things like driving slowly to the nearest gas station to use their pump. It’s pretty common, however, to have a little 12v air compressor that you can use to re-inflate your tires. You typically plug it into power, turn it on, and it runs until it overheats or you have inflated all your tires.
The one I’ve got is a pretty cheap dual-cylinder one, apparently from “Q Industries”, model “Q89”. This compressor is re-badged under many names, including SmittyBilt. I got this because it was reasonably quick and pretty cheap. Surprisingly, it hasn’t let me down yet. One of the biggest downsides when I received it was that it came with non-standard fittings. I wanted to be able to plug the thing in (and maybe even mount it) and reach all four tires with the air hose without having to reposition between each one. So, the first thing I did was replace the proprietary fitting with a standard one. Luckily, after removing the original fitting on the manifold between the two cylinders, I found a 1/8″ female NPT fitting and was able to use an elbow and a couple nipples to fit a standard M-style quick connect:
This let me use standard air hoses and chucks to reach as far as I want. However, there was a problem. It was actually an opportunity and it led to a much better hack.
By default, these kinds of compressors come with air chucks that are free-flowing. When not connected to a valve, they just blow waste air out as the compressor runs. When you clamp one down on your valve stem, it creates a seal and the air is forced into the tire. The above quick-connect, however, is meant to operate differently. In a big shop, your compressor runs as needed to fill a tank, and then shuts off. If your fittings (intentionally) leaked air all the time, then your compressor would have to run constantly. Thus, by fitting this cheesy compressor, which expects a “normally open” fitting, with a “normally closed” one, you’re setting the stage for it to explode or destroy itself as it tries to compress the small volume of air in the feed lines to infinity (and beyond).
Thus, the awesome part of this hack is actually a pressure cut-off switch. This causes the compressor to turn on when the pressure in the lines drops below some number, and then cut off once it has built up pressure past a specific point. Just like a shop compressor. Turns out, this is pretty easy to accomplish. Here is a (bad) diagram of how this has to go:
In the center, you’ve got the compressor, which takes power from your battery (red/black coming in on the left) and pumps air out the grey outlet on the top. The air is fed into a manifold with both the quick connect fitting and the pressure switch attached. The pressure switch is normally closed, which means it allows the compressor to be powered until the pressure in the manifold rises above about 100PSI. When it does, it interrupts the power to the compressor. When the pressure drops again (as you start to inflate your tires), the switch allows the compressor to be powered again.
Here’s the rest of the first image above, with the pressure switch attached:
The pressure switch is the black cylindrical device on the left-hand side of the manifold, with white wire leaving the terminals and headed for the main feed.
This compressor has a “small” junction box in the feed wires just before they enter the body of the compressor. This houses a circuit breaker (and a lot of air), but provides a perfect place to interrupt the +12V line and re-route it through the pressure switch. I used 16 gauge wire for this task, which is probably a little light, although the run is short and I’ve never noticed it heating up even after extended use. There doesn’t seem to be any noticeable voltage drop such that performance of the compressor is affected.
Here’s a view of the routing of the wire. Imagine the +12V line entering the box, taking a detour out through the pressure switch and back via the white wire, and then resuming its path into the compressor itself.
When choosing a pressure switch, you’ll want one with a fairly low cut-off limit. This is the amount of pressure you’ll need to develop in order to stop the compressor from running. A very small/cheap compressor probably can’t really make 150PSI so if you have a switch rated for that, it’ll never cut off and likely burn itself up trying. It has been years since I bought mine, but as I recall it is a 90/110PSI switch. That means it turns off when the pressure reaches 110PSI and turns back on when it drops below 90PSI. Anything around this should be fine, and although it’s not exactly what I have, I think this one from Amazon is likely just fine.
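The hysteresis behavior of such a switch is easy to state precisely. As a toy model (using the 90/110 PSI thresholds from my example):

```python
def compressor_powered(psi, was_powered, cut_in=90, cut_out=110):
    """Model of a normally-closed pressure switch with hysteresis."""
    if psi >= cut_out:
        return False      # switch opens, compressor stops
    if psi <= cut_in:
        return True       # switch closes, compressor runs
    return was_powered    # in the dead band, state is unchanged
```

The dead band between the two thresholds is what prevents rapid on/off chatter as the pressure hovers near a single set point.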
So, for probably less than $25 you can make these changes to your cheap compressor and have it behave like the expensive one you have in your shop. Here’s a video of it cycling as I release the pressure with a tire inflator:
Obviously it goes without saying, but it’s your responsibility to make sure that you don’t blow up yourself or your friends using the instructions provided here. But, I hope it helps make your use of this kind of compressor a little less of a hassle.
If you’ve ever used OpenStack Nova, you’ve probably at least seen our plethora of confusing commands that involve migrating instances away from a failing (or failed) compute host. If you are brave, you’ve tried to use one or more of them, and have almost certainly chosen poorly. That is because we have several commands with similar names that do radically different things. Sorry about that.
Below I am going to try to clarify the current state of things, hopefully for the better so that people can choose properly and not be surprised (or at least, surprised again, after the initial surprise that caused you to google for this topic). Note that we have discussed changing some of these names at some point in the future. That may happen (and may reduce or increase confusion), but this post is just aimed at helping understand the current state of things.
At the heart of this issue are the three(ish) core operations that Nova can perform (server-side) to move instances around at your request. It helps to understand these first, before trying to understand everything that you can do from the client.
1. Cold Migration
Cold migration in nova means:
Shutting down the instance (if necessary)
Copying the image from the source host to the destination host
Reassigning the instance to the destination host in the database
Starting it back up (if it was running before)
This process is nice because it works without shared storage, and even lets you check it out on the destination host before telling the source host to completely forget about it. It is, however, rather disruptive as the instance has to be powered off to be moved.
Also note that the “resize” command in Nova is exactly the same as a cold migration, except that we start up the instance with more (or less) resources than it had before. Otherwise the process is identical. Cold migration (and resize) are usually operations granted to regular users.
2. Live Migration
Live migration is what it sounds like, and what most people think of when they hear the term: moving the instance from one host to another without the instance noticing (or needing to be powered off). This process is typically admin-only, requires a lot of planets to be aligned, but is very useful if tested and working properly.
3. Evacuate

Evacuate is the confusing one. If you look at the actual English definition of evacuate in this context, it basically says (paraphrased):
To remove someone (or something) from a dangerous situation to avoid harm and reach safety.
So when people see the evacuate command in Nova, they usually think “this is a thing I should do to my instances when I need to take host down for maintenance, or if a host signals a pre-failure situation.” Unfortunately, this is not at all what Nova (at the server side) means by “evacuate”.
An evacuation of an instance is done (and indeed only allowed) if the compute host that the instance is running on is marked as down. That means the failure has already happened. The core of the evacuate process in nova is actually rebuild, which in many cases is a destructive operation. So if we were to state Nova’s definition of evacuate in plain English, it would be:
After a compute host has failed, rebuild my instance from the original image in another place, keeping my name, uuid, network addresses, and any other allocated resources that I had before.
Unless your instance is volume-backed*, the evacuation process is destructive. Further, you can only initiate this process after the compute host the instance is running on is down. This is clearly a much different definition from the English word, which implies an action taken to avoid failure or loss.
Footnote: In the case of volume-backed instances, the root disk of the instance is usually in a common location such as on a SAN device. In this case, the root disk is not destroyed, but any other instance state is recreated (which includes memory, ephemeral disk, swap disk, etc).
Now that you understand the fundamental operations that the server side of Nova can perform, we should talk about the client. Unfortunately, the misnamed server-side evacuate operation is further confused by some additional things in the client.
In the client, if you want to initiate any of the above operations, there is a straightforward command that maps to each:

nova migrate: Cold Migration (power off and move)
nova resize: Cold Migration with resize (power off, move, resize)
nova live-migration: Live Migration (move while running)
nova evacuate: Evacuate (rebuild somewhere else)
The really confusing bit comes into view because the client has a few extra commands to help automate some things.
The nova host-evacuate command does not translate directly to a server-side operation, but is more of a client-side macro or “meta operation.” When you call this command, you provide a hypervisor hostname, which the client uses to list and trigger evacuate operations on each instance running on that hypervisor. You would use this command post-failure (just like the single-instance evacuate command) to trigger evacuations of all the instances on a failed compute host.
The nova host-servers-migrate command also does not directly translate to a server-side operation, but like the above, issues a cold migration request for each instance running on the supplied hypervisor hostname. You would use this command pre-failure (just like the single-instance migrate command) to trigger migrations of all the instances on a still-running compute host.
Ready for the biggest and most confusing one, saved for last? The client also has a command called host-evacuate-live. You might be thinking: “Evacuate in nova means that the compute host is already down, how could we migrate anything live off of a dead host?” — and you would be correct. Unfortunately, this command does not do nova-style evacuations at all, but rather reflects the english definition of the word evacuate. Like its sibling above, this is a client-side meta command, that lists all instances running on the compute host, but triggers live-migration operations for each one of them. This too is a pre-failure command to get instances migrated off of a compute host before a failure or maintenance event occurs.
In tabular format to mirror the above:

nova host-evacuate: run evacuate (rebuild elsewhere) on all instances on host
nova host-servers-migrate: run migrate on all instances on host
nova host-evacuate-live: run live migration on all instances on host
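All three of these host-* commands share the same client-side shape: list everything on a hypervisor host, then trigger a per-instance server operation for each. A schematic sketch, with list_servers and action standing in for the real novaclient calls:

```python
def host_meta_operation(list_servers, action, host):
    """Client-side 'macro': apply a per-instance operation to
    everything running on the given hypervisor host.

    For host-evacuate-live, action would be live-migration; for
    host-servers-migrate, cold migration; for host-evacuate, the
    post-failure evacuate (rebuild) call.
    """
    results = {}
    for server in list_servers(host=host):
        results[server['id']] = action(server['id'])
    return results
```

The important point is that the server side only ever sees the individual operations; the "host" variants are purely client conveniences.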
Hopefully the above has helped demystify or clarify the meanings of these highly related but very different operations. Unfortunately, I can’t demystify the question of “how did the naming of these commands come to be so confusing in the first place?”
This is a part of a series of posts on the details of how Nova supports live upgrades. It focuses on one of the more complicated pieces of the puzzle: how to do database schema changes with minimal disruption to the rest of the deployment, and with minimal downtime.
In the previous post on objects, I explained how Nova uses objects to maintain a consistent schema for services expecting different versions, in the face of changing persistence. That’s an important part of the strategy, as it eliminates the need to take everything down while running a set of data migrations that could take a long time to apply on even a modest data set.
Additive Schema-only Migrations
In recent cycles, Nova has enforced a requirement on all of our database migrations: they must be additive-only, and change only schema, not data. Previously, it was common for a migration to add a column, move data there, and then drop the old column. Imagine my justification for adding the foobars field to the Flavor object was that I wanted to rename memory_mb. A typical offline schema/data migration might look something like this:
meta = MetaData(bind=migrate_engine)
flavors = Table('flavors', meta, autoload=True)
flavors.create_column(Column('foobars', Integer))
for flavor in flavors.select().execute():
    flavors.update().\
        where(flavors.c.id == flavor.id).\
        values(foobars=flavor.memory_mb).\
        execute()
flavors.drop_column('memory_mb')
Instead of doing the schema change and data manipulation in a database migration like this, we only do the schema bit and save the data part for runtime. But, that means we must also separate the schema expansion (adding the new column) and contraction (removing the old column). So, the first (expansion) part of the migration would be just this:
meta = MetaData(bind=migrate_engine)
flavors = Table('flavors', meta, autoload=True)
flavors.create_column(Column('foobars', Integer))
Once the new column is there, our runtime code can start moving things to the new column. An important point to note here is that if the schema is purely additive and does not manipulate data, you can apply this change to the running database before deploying any new code. In Nova, that means you can be running Kilo, pre-apply the Liberty schema changes and then start upgrading your services in the proper order. Detaching the act of migrating the schema from actually upgrading services lets us do yet another piece at runtime before we start knocking things over. Of course, care needs to be taken to avoid schema-only migrations that require locking tables and effectively paralyzing everything while it’s running. Keep in mind that not all database engines can do the same set of operations without locking things down!
Migrating the Data Live
Consider the above example of effectively renaming memory_mb to foobars on the Flavor object. For this I need to ensure that existing flavors with only memory values are turned into flavors with only foobars values, except I need to maintain the old interface for older clients that don’t yet know about foobars. The first thing I need to do is make sure I’m converting memory to foobars when I load a Flavor, if the conversion hasn’t yet happened:
@classmethod
def get_by_id(cls, context, id):
    flavor = cls(context=context, id=id)
    db_flavor = db.get_flavor(context, id)
    for field in flavor.fields:
        if field not in ['memory_mb', 'foobars']:
            setattr(flavor, field, db_flavor[field])
    if db_flavor['foobars']:
        # NOTE(danms): This flavor has
        # been converted
        flavor.foobars = db_flavor['foobars']
    else:
        # NOTE(danms): Execute hostile takeover
        flavor.foobars = db_flavor['memory_mb']
    return flavor
When we load the object from the database, we have a chance to perform our switcheroo, setting foobars from memory_mb, if foobars is not yet set. The caller of this method doesn’t need to know which records are converted and which aren’t. If necessary, I could also arrange to have memory_mb set as well, either from the old or new value, in order to support older code that hasn’t converted to using Flavor.foobars.
The next important step of executing this change is to make sure that when we save an object that we’ve converted on load, we save it in the new format. That being, memory_mb set to NULL and foobars holding the new value. Since we’ve already expanded the database schema by adding the new column, my save() method might look like this:
Now, since we moved things from memory_mb to foobars in the query method, I just need to make sure we NULL out the old column when we save. I could be more defensive here in case some older code accidentally changed memory_mb, or more efficient and only NULL out memory_mb if it isn’t already. With this change, I’ve moved data from one place in the database to another, at runtime, and without any of my callers knowing that it’s going on.
However, note that there is still the case of older compute nodes. Based on the earlier code, if I merely remove the foobars field from the object during backport, they will be confused to find memory_mb missing. Thus, I really need my backport method to revert to the older behavior for older nodes:
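That backport could look something like this obj_make_compatible hook (the standard versioned-object backport method); note that 1.1 as the version that introduced foobars is an assumption for this example:

```python
def obj_make_compatible(self, primitive, target_version):
    target_version = tuple(int(x) for x in target_version.split('.'))
    if target_version < (1, 1):
        # Old nodes have never heard of foobars; put the value
        # back where they expect to find it.
        primitive['memory_mb'] = primitive.pop('foobars')
```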
With this, nodes that only know about Flavor version 1.0 will continue to see the memory information in the proper field. Note that we need to take extra care in my save() method now, since a Flavor may have been converted on load, then backported, and then save()d.
Cleaning Up the Mess
After some amount of time, all the Flavor objects that are touched during normal operation will have had their foobars columns filled out, and their memory_mb columns emptied. At some point, we want to drop the empty column that we’re no longer using.
In Nova, we want people to be able to upgrade from one release to the other, having to only apply database schema updates once per cycle. That means we can’t actually drop the old column until the release following the expansion. So if the above expansion migration was landed in Kilo, we wouldn’t be able to land the contraction migration until Liberty (or later). When we do, we need to make sure that all the data was moved out of the old column before we drop it and that any nodes accessing the database will no longer assume the presence of that column. So the contraction migration might look like this:
meta = MetaData(bind=migrate_engine)
flavors = Table('flavors', meta, autoload=True)
count = select([func.count()]).select_from(flavors).\
    where(flavors.c.memory_mb != None).\
    execute().scalar()
if count:
    raise Exception('Some Flavors not migrated!')
flavors.drop_column('memory_mb')
Of course, if you do this, you need to make sure that all the flavors will be migrated before the deployer applies this migration. In Nova, we provide nova-manage commands to background-migrate small batches of objects and document the need in the release notes. Active objects will be migrated automatically at runtime, and any that aren’t touched as part of normal operation will be migrated by the operator in the background. The important part to remember is that all of this happens while the system is running. See step 7 here for an example of how this worked in Kilo.
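The background-migration pattern those nova-manage commands use boils down to a small batched loop. A schematic version, with find_unmigrated and the per-record conversion as stand-ins for the real database queries:

```python
def migrate_flavors(find_unmigrated, convert, max_count):
    """Background-migrate up to max_count records.

    Returns (found, done): how many unmigrated records were picked
    up this pass, and how many were actually converted.  The
    operator re-runs this until found == 0.
    """
    done = 0
    records = find_unmigrated(limit=max_count)
    for record in records:
        convert(record)
        done += 1
    return len(records), done
```

Batching with max_count is what lets the operator throttle the load this generates on a live system.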
Doing online migrations, whether during activity or in the background, is not free and can generate non-trivial load. Ideally those migrations would be as efficient as possible, not re-converting data multiple times and not incurring significant overhead checking to see if each record has been migrated every time. However, some extra runtime overhead is almost always better than an extended period of downtime, especially when it can be throttled and done efficiently to avoid significant performance changes.
Online Migrations Are Hard, But Worth It
Applying all the techniques thus far, we have now exposed a trap that is worth explaining. If you have many nodes accessing the database directly, you need to be careful to avoid breaking services running older code while deploying the new ones. In the example above, if you apply the schema updates and then upgrade one service that starts moving things from memory_mb to foobars, what happens to the older services that don’t know about foobars? As far as they know, flavors start getting NULL memory_mb values, which will undoubtedly lead them to failure.
In Nova, we alleviate this problem by requiring most of the nodes (i.e. all the compute services) to use conductor to access the database. Since conductor is always upgraded first, it knows about the new schema before anything else. Since all the computes access the database through conductor with versioned object RPC, conductor knows when an older node needs special attention (i.e. backporting).