Recently a friend of mine signed up for a jabber.com account and started using it to chat with me. I have run my own Jabber server for almost ten years now, and until now I had never had problems with the server-to-server (S2S) side of things. For some reason, the jabber.com SRV records sometimes fail to resolve, and that was occasionally killing the S2S connection between jabber.com and my server. The connection seemed to get recycled every so often, which made my server look up the SRV records again. Whenever that lookup failed (which was happening multiple times per day), I was unable to communicate with jabber.com contacts for several minutes, and their status showed as something like “404: Server not found”. The logs of my Openfire server contained entries pointing to the failed DNS lookups.
When I asked what to do about it in the Openfire forums, someone mentioned that they had run into the same issue because of sporadic lookup failures on the jabber.com SRV records. They suggested spoofing the necessary records, fooling my server into connecting to the proper IPs without having to perform an actual lookup.
It is pretty silly that I have to do this, but I made it work by running a local copy of BIND and hosting the jabber.com zone myself internally. That resolved the problem for me. Later, while working on a different project, I noticed that dnsmasq can now serve spoofed SRV records as well, so I switched to using it for the job instead of BIND.
My server is running CentOS 5.x, which has a dnsmasq package available. I installed it via yum:
yum install -y dnsmasq
Next, I edited the /etc/dnsmasq.conf file and added the following lines:
expand-hosts
resolv-file=/etc/resolv.masq
srv-host=_xmpp-server._tcp.jabber.org,hermes.jabber.org,5269,1
srv-host=_xmpp-server._tcp.jabber.com,jabber.com,5269,1
srv-host=_xmpp-server._tcp.jabber.com,denjab2a.jabber.com,5269,1
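Each srv-host line follows dnsmasq's srv-host=<_service>.<_prot>.<domain>,<target>,<port>,<priority> layout (an optional weight may follow the priority). So the two jabber.com lines make dnsmasq answer SRV queries for _xmpp-server._tcp.jabber.com with two equal-priority targets, both on the standard S2S port 5269. If you need to override other domains, a tiny helper like the one below keeps the field order straight. The srv_host_line function is my own hypothetical sketch, not something dnsmasq ships:

```shell
#!/bin/sh
# Hypothetical helper: build one dnsmasq srv-host line from its parts.
# Field order follows dnsmasq's syntax:
#   srv-host=<_service>.<_prot>.<domain>,<target>,<port>,<priority>
srv_host_line() {
    service=$1   # service name without the leading underscore, e.g. xmpp-server
    proto=$2     # protocol without the leading underscore, e.g. tcp
    domain=$3    # the domain whose SRV record is being spoofed
    target=$4    # host that actually accepts S2S connections
    port=$5      # XMPP S2S port, normally 5269
    priority=$6  # lower value is preferred
    printf 'srv-host=_%s._%s.%s,%s,%s,%s\n' \
        "$service" "$proto" "$domain" "$target" "$port" "$priority"
}

# Reproduces the third line of the config above.
srv_host_line xmpp-server tcp jabber.com denjab2a.jabber.com 5269 1
```

The output can be appended straight to /etc/dnsmasq.conf with >>.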
Then I added the following entries to /etc/hosts:
208.68.163.220 hermes.jabber.org
216.24.133.9 denjab2a.jabber.com
216.24.133.14 jabber.com
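dnsmasq serves /etc/hosts by default, so these entries are what answer the address lookups for the SRV targets; the hostnames here must match the targets in the srv-host lines exactly. If you prefer to script the edit, here is a sketch that replaces any existing line for a hostname instead of piling up duplicates. pin_host is a hypothetical helper name, and it assumes a standard whitespace-separated hosts file:

```shell
#!/bin/sh
# Hypothetical helper: pin "IP hostname" in a hosts-format file, first removing
# any existing line that maps the same hostname, so re-running it is harmless.
pin_host() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}   # pass a scratch file instead to try it out
    [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # Drop lines where any hostname field (2nd onward) equals $name exactly.
    awk -v n="$name" '{
        keep = 1
        for (i = 2; i <= NF; i++) if ($i == n) keep = 0
        if (keep) print
    }' "$file" > "$tmp"
    printf '%s %s\n' "$ip" "$name" >> "$tmp"
    # Copy back over the original so its owner and permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, pin_host 216.24.133.9 denjab2a.jabber.com. dnsmasq rereads /etc/hosts when it receives SIGHUP, so signal or restart it after changing the file.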
Note that these host entries can go stale if the servers behind them are ever renumbered, so they may need some occasional babysitting. I also overrode jabber.org, since I saw a few similar error messages in the logs for that domain.
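One way to do that babysitting is to compare the pinned addresses with what an upstream DNS server currently returns. This sketch assumes dig is installed (it comes from the bind-utils package on CentOS) and queries an upstream server directly, because asking dnsmasq would just echo /etc/hosts back. pinned_ip and check_pin are hypothetical names:

```shell
#!/bin/sh
# Print the IP pinned for a hostname in a hosts-format file (first match wins).
# Usage: pinned_ip FILE HOSTNAME
pinned_ip() {
    awk -v n="$2" '$1 !~ /^#/ {
        for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit }
    }' "$1"
}

# Hypothetical staleness check: is the pinned IP among the A records the
# upstream server returns right now? Usage: check_pin FILE HOSTNAME UPSTREAM
check_pin() {
    file=$1; name=$2; upstream=$3
    pinned=$(pinned_ip "$file" "$name")
    live=$(dig +short A "$name" @"$upstream")
    if [ -n "$pinned" ] && printf '%s\n' "$live" | grep -qxF "$pinned"; then
        echo "OK    $name $pinned"
    else
        # echo without quotes collapses multi-line dig output onto one line
        echo "STALE $name pinned=${pinned:-none} live=$(echo $live)"
    fi
}
```

For example, check_pin /etc/hosts jabber.com 1.2.3.4, using one of your real upstream servers. A STALE result can also simply mean the upstream lookup failed, which is the very problem being worked around, so rerun it before editing anything.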
Next, you need to put your own DNS servers in /etc/resolv.masq so that dnsmasq knows where to forward all other queries. Something like the following would work, substituting your own DNS server IP addresses:
nameserver 1.2.3.4
nameserver 5.6.7.8
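Since /etc/resolv.conf is about to be pointed at 127.0.0.1, it can be convenient to seed /etc/resolv.masq from it first. This sketch (make_resolv_masq is a hypothetical helper) copies the existing nameserver lines and drops loopback entries, since the upstream list should only contain real upstream servers:

```shell
#!/bin/sh
# Hypothetical helper: copy upstream nameserver lines from the current resolver
# config into dnsmasq's resolv-file, skipping loopback (127.x, ::1) entries.
# Run it BEFORE changing /etc/resolv.conf to 127.0.0.1.
make_resolv_masq() {
    src=${1:-/etc/resolv.conf}
    dst=${2:-/etc/resolv.masq}
    grep -E '^[[:space:]]*nameserver[[:space:]]' "$src" \
        | grep -vE 'nameserver[[:space:]]+(127\.|::1)' > "$dst"
}
```

Check the resulting file by hand afterwards; if your resolv.conf was generated by DHCP, the servers it lists may change over time.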
Finally, you need to tell your system resolver to send its queries to the local machine, where dnsmasq is running. Set the nameserver in /etc/resolv.conf to localhost:
nameserver 127.0.0.1
Now you can start dnsmasq and configure it to start at boot:
service dnsmasq start
chkconfig dnsmasq on
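Before restarting the XMPP server, it is worth confirming that dnsmasq is actually serving the spoofed records. A sketch, again assuming dig is available; verify_srv is a hypothetical helper that queries the local resolver and labels the fields of each SRV answer (dig +short prints SRV records as priority, weight, port, target):

```shell
#!/bin/sh
# Hypothetical check: ask a resolver (default: local dnsmasq) for a domain's
# XMPP S2S SRV records and print each answer's fields. Returns 1 if none.
verify_srv() {
    domain=$1
    server=${2:-127.0.0.1}
    answers=$(dig +short SRV "_xmpp-server._tcp.$domain" @"$server")
    if [ -z "$answers" ]; then
        echo "FAIL  $domain: no SRV answer from $server"
        return 1
    fi
    echo "OK    $domain"
    # dig +short prints SRV answers as: priority weight port target
    printf '%s\n' "$answers" | awk '{
        printf "      priority=%s weight=%s port=%s target=%s\n", $1, $2, $3, $4
    }'
}
```

For example: for d in jabber.com jabber.org; do verify_srv "$d"; done. Each overridden domain should list the targets and port 5269 from the srv-host lines.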
A restart of Openfire (or whatever XMPP server you’re using) is probably a good idea as well.