Tuesday, September 25, 2012

Recursive zfs send and receive

I normally keep my ZFS filesystem hierarchy simple, so that the filesystem boundary is the boundary of administrative activity. So migrating a filesystem from one place to another is usually as simple as:

zfs snapshot tank/a@copy
zfs send tank/a@copy | zfs recv cistern/a

However, if you have child filesystems, and clones in particular, that you want to move, then it's slightly more involved. Suppose you have the following:

tank/a/myfiles
tank/a/myfiles@clone
tank/a/myfiles-clone

where myfiles-clone is a clone of the @clone snapshot. I often create these temporarily if someone wants a copy of some data in a slightly different layout. In today's case, it had taken some time to shuffle the files around in the clone and I didn't want to have to do that all over again.
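(For reference, that sort of layout is what you get from taking a snapshot and cloning it, roughly:

zfs snapshot tank/a/myfiles@clone
zfs clone tank/a/myfiles@clone tank/a/myfiles-clone

with the files then rearranged inside the clone.)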

So, ZFS has recursive send and receive. The first thing I learnt is that myfiles-clone isn't really a descendant of myfiles - think of it as more of a sibling. So in this case you start from tank/a and send everything under that. First create a recursive snapshot:

zfs snapshot -r tank/a@copy
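If you want to check that the snapshot was taken on every child, listing the snapshots recursively will show them:

zfs list -r -t snapshot -o name tank/a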

Then, to send the whole lot:

zfs send -R tank/a@copy

I was rather naive and thought that

zfs send -R tank/a@copy | zfs recv cistern/a

would do what I wanted, which was simply to drop everything into cistern/a, and was somewhat surprised that it doesn't work, particularly as ZFS almost always just works and does what you expect.

What went wrong? What I think happens here is that the recursive copy effectively does:

zfs send tank/a@copy | zfs recv cistern/a
zfs send tank/a/myfiles@copy | zfs recv cistern/a
zfs send tank/a/myfiles-clone@copy | zfs recv cistern/a

and attempts to put all the child filesystems in the same place, which fails rather badly. (You can see the hierarchy that would be created on the receiving side by using 'zfs recv -vn'.)
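In other words, a dry run like this shows where everything would land without actually creating anything:

zfs send -R tank/a@copy | zfs recv -vn cistern/a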

The way to solve this is to use the -e or -d options of zfs recv, like so:

zfs send -R tank/a@copy | zfs recv -d cistern/a

or

zfs send -R tank/a@copy | zfs recv -e cistern/a

In both cases, zfs recv uses the name of the source dataset to construct the name at the destination, so the child filesystems get laid out properly.

The difference (read the man page) is that -e simply uses the last part of the source name at the destination. In this example that was fine, but if you start off with a deeper hierarchy it will get flattened (and you could potentially have naming collisions). With -d, it just strips off the beginning (in this case, tank), so the structure of the hierarchy is preserved, although you may end up with extra levels at the destination. If it's not quite right, though, zfs rename can sort it all out.
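To make that concrete, my understanding is that in this example -d would give you

cistern/a/a
cistern/a/a/myfiles
cistern/a/a/myfiles-clone

whereas -e would give you

cistern/a/a
cistern/a/myfiles
cistern/a/myfiles-clone

and an unwanted extra level can be tidied up afterwards with renames along the lines of

zfs rename cistern/a/a/myfiles cistern/a/myfiles
zfs rename cistern/a/a/myfiles-clone cistern/a/myfiles-clone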

1 comment:

Mike Gerdts said...

If you use the -v option to zfs send, you will see that when you send a clone, it sends an incremental stream from the clone's origin (zfs list -o name,origin) to the requested snapshot. The data that predates your @clone snapshot is not included in the send stream unless you are also sending an @copy of that snapshotted file system, and the @copy must be newer than the @clone. You can also use zstreamdump to understand the content of a zfs stream. Note that the first section of output dumps an nvlist of intent; some of the snapshots listed in that initial section may not actually be present in the stream.
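For example, something along the lines of

zfs list -o name,origin tank/a/myfiles-clone
zfs send -R tank/a@copy | zstreamdump

will show the clone's origin and then the contents of the replication stream.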

FWIW, I recognized a variant of this problem when doing some work on support for using zfs archives with zoneadm -a. To simplify my scenario, I made some changes to zfs send so that it now has -r (recursive, as compared to -R, replication) and -c (self-contained) options. Thus:

zfs snapshot -r zones/z1@snap
zfs send -rc zones/z1@snap

will now create a recursive, self-contained archive even if z1 is a clone of z0. If z1 has multiple boot environments (which are implemented as clones), the snapshot:clone relationship is preserved.

My change first appears in Solaris 11 and is not present in illumos.