SBLIM and their broken ABI

The CMPI standard defines the interface between a conforming provider and the CIMOM that loads it.  It’s a massive set of data structures, types, and functions that lets the CIM server delegate requests to providers, and lets providers make use of broker services.  Like any ABI, it has to be identical everywhere so that any provider binary can be loaded by any CIMOM.

Yesterday, I was debugging a strange crash that I was seeing with Pegasus (as shipped in Fedora 8), which was related to libcmpiutil’s asynchronous indication support.  The indication is triggered using the CBInvokeMethod broker callback function.  Adding debug print statements galore, I tracked it down to where the actual callback was being made.  Since I can’t (and don’t want to) add debug print statements to Pegasus, I re-ran the whole thing in gdb.  Sure enough, the crash was somewhere in Pegasus.  However, I spotted something very strange in the stack trace:

#0  Pegasus::value2CIMValue (data=0x2aaab285e312, type=63920, rc=0x41e00e7c)
    at CMPI_Value.cpp:70
#1  0x00002aaab16b7e2c in mbSetProperty (mb=0x5555558ab1d0,
    ctx=0x5555558ab1d0, cop=0x41e01450, name=0x555555864550 "",
    val=0x2aaab285e312, type=63920) at CMPI_Broker.cpp:562
#2  0x00002aaab285d97d in stdi_trigger_indication ()
    from /usr/lib64/libcmpiutil.so.0
#3  0x00002aaab18ec099 in trigger_indication (context=0x41e01450,
    base_type=<value optimized out>, ns=0x555555829f30 "root/virt")
    at Virt_VirtualSystemManagementService.c:362
<snip>

The stdi_trigger_indication() function is where the CBInvokeMethod() call is made, so why is the next stack frame in the mbSetProperty() function?  I stewed on this a bit and then happened to notice that in cmpift.h, setProperty comes right after invokeMethod.  This was the key to the puzzle.  Somehow, my _CMPIBrokerFT function table was off by a void*, which meant that I neatly executed the next function down, which was sure to fail.  But why?
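To make the "off by a void*" failure concrete, here is a minimal sketch in C.  The two structures and both function names are made up for illustration and are not the real _CMPIBrokerFT; the only assumption carried over is that the caller believes the table header is one pointer larger than the callee laid it out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Every slot in this toy table has the same signature and returns its own name. */
typedef const char *(*slot_fn)(void);

static const char *invoke_method(void) { return "invokeMethod"; }
static const char *set_property(void)  { return "setProperty"; }

/* Layout the broker was compiled with (hypothetical stand-in). */
struct broker_view {
    void   *header;
    slot_fn invokeMethod;
    slot_fn setProperty;
};

/* Layout the provider was compiled with: same slot name,
 * but the header is one void* larger (hypothetical stand-in). */
struct provider_view {
    void   *header;
    void   *extra;
    slot_fn invokeMethod;
};

/* Look up "invokeMethod" the way the provider would: at the offset
 * its own idea of the layout dictates, inside the broker's table. */
const char *call_invoke_method_as_provider(void)
{
    struct broker_view table = { NULL, invoke_method, set_property };
    slot_fn fn;
    size_t off = offsetof(struct provider_view, invokeMethod);

    /* Make sure the read stays inside the broker's table. */
    assert(off + sizeof fn <= sizeof table);

    /* memcpy reads the slot's bytes without pretending the table has another type. */
    memcpy(&fn, (const unsigned char *)&table + off, sizeof fn);
    return fn();
}
```

On the usual platforms where data pointers and function pointers have the same size, calling call_invoke_method_as_provider() returns "setProperty": the lookup is perfectly valid, it just finds the neighbouring slot.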

I noticed that there were two cmpift.h files on my system, one in /usr/include/cmpi, belonging to the sblim-cmpi-devel package, and another in /usr/include/Pegasus/CMPI belonging to the tog-pegasus package.  I thought it was a long shot, but I wrote a tiny program to print the size of the _CMPIBrokerFT structure, and compiled it against each, and sure enough: the sblim version was 8 bytes longer!

Upon further investigation, I found the problem.  The very first member of _CMPIBrokerFT was an unsigned long in the sblim case and an unsigned int in the Pegasus case.  On a 32-bit platform, they are both 4-byte types, but on a 64-bit machine, the unsigned long is 8 bytes.  Because the difference pushed every function pointer out by exactly 8 bytes, one pointer-sized slot, each call through the table still landed neatly on a valid function, just the one after the function I meant to call.  So all my functions still “worked”, but in an off-by-one crash-you-where-you-least-expect-it kinda way.
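The size effect can be reproduced without either header.  The sketch below uses stand-in structures, not the real cmpift.h definitions: only the type of the first member comes from the comparison above, while the `version` and `name` members and all identifiers are assumptions.  The 4-byte `version` field matters, because if a pointer followed the first member directly, alignment padding would hide the difference on a 64-bit machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*fn_ptr)(void);

/* Stand-in for a table whose first member is an unsigned int. */
struct ft_uint_first {
    unsigned int first;
    int          version;        /* assumed 4-byte field */
    const char  *name;           /* assumed */
    fn_ptr       firstFunction;  /* first of the function pointers */
};

/* Same table, but the first member is an unsigned long. */
struct ft_ulong_first {
    unsigned long first;
    int           version;       /* assumed 4-byte field */
    const char   *name;          /* assumed */
    fn_ptr        firstFunction;
};

/* How many bytes further out the first function pointer sits
 * in the unsigned long layout. */
long first_function_shift(void)
{
    return (long)offsetof(struct ft_ulong_first, firstFunction)
         - (long)offsetof(struct ft_uint_first, firstFunction);
}

/* Difference in total structure size between the two layouts. */
long table_size_difference(void)
{
    return (long)sizeof(struct ft_ulong_first)
         - (long)sizeof(struct ft_uint_first);
}

/* Print both layouts side by side. */
void print_layouts(void)
{
    long shift = first_function_shift();

    /* Widening the first member can only push later members outward. */
    assert(shift >= 0);

    printf("unsigned int first:  size %zu, first function at offset %zu\n",
           sizeof(struct ft_uint_first),
           offsetof(struct ft_uint_first, firstFunction));
    printf("unsigned long first: size %zu, first function at offset %zu\n",
           sizeof(struct ft_ulong_first),
           offsetof(struct ft_ulong_first, firstFunction));
    printf("shift: %ld bytes\n", shift);
}
```

On a typical 64-bit Linux build (4-byte int, 8-byte long and pointers) both the shift and the size difference come out as 8 bytes; on a 32-bit build, where int and long are both 4 bytes, they come out as 0, which is why the problem only shows up on 64-bit machines.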

I reported it to the sblim mailing list and got a confirmation that indeed the sblim version of cmpift.h is wrong and needs to change.  Until then, make sure you don’t have the sblim headers on your machine when compiling 64-bit providers to be used under Pegasus or you’ll regret it!
