Low-latency continuous rsync

Okay, so “lowish-latency” would be more appropriate.

I regularly work on systems that are fairly distant, over relatively high-latency links. That means that I don’t want to run my editor there, because 300ms between pressing a key and seeing it show up is maddening. Further, with something as large as the Linux kernel, editor integration with cscope is a huge time saver, and pushing enough configuration to do that on each box I work on is annoying. Lately, the speed of the notebook I’m working from often outpaces that of the supposedly-fast machine I’m working on. For many tasks, a laptop with four cores (two threads per core), 10GB of RAM, and an Intel SSD will smoke a 4GHz PowerPC LPAR with 2GB of RAM.

I don’t really want to go to the trouble of cross-compiling the kernels on my laptop, so compiling is the only piece I want to do remotely. Thus, I want to have high-speed access to the tree I’m working on from my local disk for editing, grep’ing, and cscope’ing. But I want the changes to be synchronized (without introducing any user-perceived delay) to the distant machine in the background for when I’m ready to compile. Ideally, this would be some sort of rsync-like tool that uses inotify to notice changes and keep them synchronized to the remote machine over a persistent connection. However, I know of no such tool and haven’t been sufficiently annoyed to sit down and write one.

One can, however, achieve a reasonable approximation of this by gluing existing components together. The inotifywait tool from the inotify-tools package provides a way to watch a directory and spit out a live list of changed files without much effort. Of course, rsync can handle the syncing for you, but not with a persistent connection. This script mostly does what I want:

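#!/bin/bash
# Watch the current directory and push each changed file to DEST via rsync.

DEST="$1"

if [ -z "$DEST" ]; then
    echo "usage: $0 [user@]host:path/" >&2
    exit 1
fi

# -m keeps inotifywait running; close_write fires when a file that was
# open for writing is closed, i.e. once an editor has finished saving.
inotifywait -r -m -e close_write --format '%w%f' . |
while IFS= read -r file
do
    echo "$file"
    # Quote the variables so local paths with spaces are not word-split.
    rsync -azq "$file" "${DEST}/$file"
    echo "Completed at $(date)"
done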

That will monitor the current directory (recursively) and copy each file to the remote host every time it is written and closed. I run it from the top of the tree like this:

sync.sh dan@myhost.domain.com:my-kernel-tree/

It’s horribly inefficient of course, but it does the job. The latency for edits to show up on the other end, although not intolerable, is higher than I’d like. The boxes I’m working on these days are in Minnesota, and I have to access them over a VPN which terminates in New York. That means packets leave Portland for Seattle, jump over to Denver, Chicago, Washington DC, then up to New York before they bounce back to Minnesota. Initiating an SSH connection every time the script synchronizes a file requires some chatting back and forth over that link, and thus is fairly slow.
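One way to cut down the number of connections, independent of anything on the SSH side, would be to batch up a burst of saves and hand them to a single rsync run. What follows is only a rough sketch of that idea, not what my script does: the batch_paths function and the BATCH_WINDOW variable are names I made up for illustration, and the one-second window is an arbitrary guess.

```shell
#!/bin/bash
# Sketch only: BATCH_WINDOW and batch_paths are illustrative names.

BATCH_WINDOW=1   # seconds of quiet before a batch is flushed

# Read one path per line on stdin. Whenever BATCH_WINDOW seconds pass with
# no new path (or stdin ends), feed the de-duplicated pending paths to the
# command given as arguments, on that command's stdin.
batch_paths() {
    local path status
    local pending=()
    while true; do
        IFS= read -r -t "$BATCH_WINDOW" path
        status=$?
        if [ "$status" -eq 0 ]; then
            pending+=("$path")
            continue
        fi
        # read failed: either it timed out (status > 128) or stdin ended.
        if [ "${#pending[@]}" -gt 0 ]; then
            printf '%s\n' "${pending[@]}" | sort -u | "$@"
            pending=()
        fi
        # Keep waiting after a timeout; stop once stdin is exhausted.
        [ "$status" -gt 128 ] || break
    done
}

# Usage (assumes DEST is set as in sync.sh):
#   inotifywait -r -m -e close_write --format '%w%f' . |
#       batch_paths rsync -az --files-from=- . "$DEST"
```

With --files-from=- rsync reads the list of files from stdin and treats each entry as relative to the source directory, so one invocation covers the whole batch. The trade-off is that every change waits out the quiet window before it is sent, which adds a little latency of its own.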

Looking at how I might reduce the setup time for the SSH links, I stumbled across an incredibly cool feature available in recent versions of OpenSSH: connection multiplexing. With this enabled, you pay the high setup cost only the first time you connect to a host. Subsequent connections re-use the same tunnel as the first one, making the process nearly instant. To get this enabled for just the host I’m using, I added this to my ~/.ssh/config file:

Host myhost.domain.com
    ControlMaster auto
    ControlPath /tmp/%h%p%r

Now, all I do is ssh to the box each time I boot it (which I would do anyway) and the sync.sh script from above re-uses that connection for file synchronization. It’s still not the same as a shared filesystem, but it’s pretty dang close, especially for a few lines of config and shell scripting. Kernel development on these distant boxes is now much less painful.
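If you would rather not keep an interactive session open just to hold the connection, the master can also be started and poked at from the command line. This is a minimal sketch that assumes the ControlPath setting above is in place for the host; the function names are hypothetical wrappers of my own, while the ssh options themselves (-M, -N, -f, -O) are standard OpenSSH ones.

```shell
#!/bin/bash
# Illustrative helpers (the function names are hypothetical) around
# OpenSSH's multiplexing options. They assume ControlPath is configured
# for the host, as in the ~/.ssh/config snippet above.

# Open a master connection: -M puts ssh in master mode, -N runs no remote
# command, and -f sends ssh to the background after authentication.
master_up() {
    ssh -M -N -f "$1"
}

# Ask the running master whether it is still alive.
master_check() {
    ssh -O check "$1"
}

# Tell the master to exit, which closes the shared connection.
master_down() {
    ssh -O exit "$1"
}

# Usage:
#   master_up myhost.domain.com
#   master_check myhost.domain.com
#   master_down myhost.domain.com
```

Running master_check before starting sync.sh is a quick way to tell whether the script will get the fast path or fall back to a full connection setup.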
