Saturday, July 22, 2017

Mucking around with IPv6 and illumos zones

The world is running out of IPv4 addresses, and it's time to move to IPv6.

I remember that story being told over and over at conferences in the mid 1990s. Yet, here we are in 2017 and while there has been progress, we're definitely not there yet.

With zones, illumos (and Solaris) give you virtualized application environments (containers is the trendy term - we tend not to use that in the Solarish context because it got polluted by Sun marketing). Those environments (usually) need to be networked, so why not with IPv6.

So here goes with a few notes on the subject.

Shared-IP zones


With zones, the original networking model was a shared-ip stack, where the zone is given a fully configured network that is just a virtual IP configured on an existing interface. All the setup is done in the global zone, which makes it very easy.

(By the way, this was the cause of the limit of 8192 zones per system, because you can only have 8192 virtual addresses on a single physical interface.)

And configuring an IPv4 address is just a case of adding a net section to the zone configuration:

add net
set physical=aggr1
set address=172.18.1.172/24
end

It's exactly the same for IPv6, the only interesting issue is what the IPv6 address would be. Let's start with the link-local address - the one that starts with fe80:: - as you will generally need that even if you don't have a routed IPv6 network. For a physical interface, the IPv6 address is usually derived from the MAC address. We can't use that one, because we're sharing the interface and the global zone has already grabbed it. So the convention here is to construct something from the IPv4 address. It's then guaranteed to be unique in a broadcast network, which is all that matters for a link-local address. So all we have to do is convert the IPv4 address to hex, for example with printf:

printf "%x%x:%x%x\n" 172 18 1 172

which gives ac12:1ac, so the link-local address would be configured as:

add net
set physical=aggr1
set address=fe80::ac12:1ac/10
end

You're pretty much done here, if you do that your global and non-global zones will be able to communicate using IPv6 on the local subnet.

If you had a routable prefix, then the same scheme can be applied. Just put the fragment onto the end of your prefix.

add net
set physical=aggr1
set address=XXXX:XXXX:XXXX:XXXX::ac12:1ac/64
end


Of course, if you're assigned specific IPv6 addresses then you can use those directly. The above scheme is pretty trivial to script, though (and it actually makes it fairly easy to keep your DNS zone files up to date too).

Exclusive-IP zones


For an exclusive-ip zone, you just hand over a network interface to a zone and let it go figure. So it can assign whatever addresses it likes.

In particular, the zone can use the normal MAC address scheme to generate its IPv6 link-local address.

Originally in older Solaris, you needed to use a genuine physical interface. Which limited you a little bit as there are only so many network cards you can jam into a server. OpenSolaris introduced full network virtualization in the form of Crossbow, so any illumos distribution or Solaris 11 can create fully virtualized network stacks and present those to zones in the same way.

In Tribblix, I use zap to manage zones, and it takes care of creating the appropriate vnics and, if appropriate, etherstubs, and wiring things together. I also poke functional /etc/hostname.* and /etc/defaultrouter files into the zone so the networking at least gets configured when the zone boots. Adding IPv6 to the zone is simply a case of creating matching empty /etc/hostname6.* files (one for each vnic) and the IPv6 addresses will get autoconfigured.

There's one wrinkle with exclusive-ip that deserves a whole section, that of restricting the zone to only using addresses that you've set.

Restricting with allowed-address


Remember that an exclusive-ip zone can manage the network interface. So it could allocate the wrong address and generally cause havoc on the network. To prevent this, set the allowed-address property on the interface. For example:

add net
set physical=vnic1
set allowed-address=172.18.1
.172/24
end

The zone manages the interface, but an attempt to configure an invalid address will be thwarted.

You can see what's happening under the hood by running dladm show-linkprop. You'll see that the protection and allowed-ips properties are set.

(As an aside, it would be fantastic if illumos got the configure-allowed-address feature that Solaris 11 has, which would bypass my trickery in having to poke the network setup files in the zone.)

The same thing works for IPv6. The first problem I discovered is that (unlike Solaris 11) illumos won't accept multiple addresses in a list. Initially I thought this was something about IPv6, but it turns out you need to specify each address you want to add in a separate block.

The next question is going to be - what is the IPv6 address going to be? It's derived from the MAC address, so isn't fixed in advance.

The first step is to get the properties of the vnic. Running dladm show-vnic will give you the properties you need, including the MAC address. If you just want the one field, that's fairly easy too.

# dladm show-vnic -p -o MACADDRESS vnic1
2:8:20:c6:71:d

The IPv6 address is made up from that as a EUI-64 address, which is basically the fe80:: prefix, the first 3 octets, then ff:fe, then the last 3 octets. Oh, and the 7th bit gets flipped. And conventionally the leading zero gets suppressed. An ugly way of scripting this in ksh looks like:

/usr/sbin/dladm show-vnic -p -o MACADDRESS $VNIC | \
    /usr/bin/sed 's=:= =g' | read o1 o2 o3 o4 o5 o6
integer -i2 vi=16#$o1
integer -i2 nvi
nvi=$(($vi ^ 2#00000010))
integer -i16 xv=$nvi
no1=${xv/16#/}
if [ "$no1" = "0" ]; then
    no1=""

fi
if [ "$no3" = "0" ]; then
    no3=""

fi
if [ "$no5" = "0" ]; then
    no5=""

fi
printf "fe80::%s%s:%sff:fe%s:%s%s/10" "$no1" "$o2" "$o3" "$o4" "$o5" "$o6"

So, what you want to do is add something like:

add net
set physical=vnic1
set allowed-address=fe80::8:20ff:fec6:71d/10
end

to your zone configuration. And if you have a routable IPv6 address, you'll need to duplicate the block again for that.

In practice, this didn't quite work for me. If you don't set the allowed-address properties then the addresses get configured correctly, but with the protection set the address doesn't come up properly. If you try it then you get:

# ifconfig vnic1 inet6 up
ifconfig: setifflags: SIOCSLIFFLAGS: vnic1: Invalid argument

However, if you explicitly set the address, running ifconfig by hand:

# ifconfig vnic1 inet6 fe80::8:20ff:fec6:71d/10 up

then it works perfectly.

Thursday, July 13, 2017

What gets into Tribblix?

The software available for Tribblix is a bit of an eclectic mix. How do I choose what software to package?

There are actually a number of different reasons why you get a particular package.

The basics


Some packages are just basic,and you expect to find them. Much of the GNU stack comes in this way. And often things like Perl and Python are a foundational requirement for a lot of other tools.

What I want personally


There are a number of areas where I have specific interests - I'm a bit of a magpie when it comes to X11 window managers, for example. And I need to open office documents, so I had to get LibreOffice working. There a few games or emulators that I like. This also explains why some things might not be present too - I have no real interest in video or multimedia, for example, so that's an area with relatively little coverage.

Oh, that looks cool


I'm often interested in new stuff. (Even if it's actually old stuff that's just new to me.) So if I come across a piece of software and think "that looks cool" I'll often try and build and package it. If it works, fine, it ends up in the repository. Even if I might not end up doing anything with it, I've gone to the effort of making a package so it may as well stay there and somebody else might make use of it.

I have a $DAYJOB


Yes, I have a day job (a very good one, thank you very much), and it involves running applications on illumos. If I'm going to evaluate software I'll do it on Tribblix first. Building stuff on Tribblix is much easier than on, say, OmniOS - I have total control of the environment, and many more tools and prerequisite packages to give me a head start. So I can easily screen out any software that simply isn't going to work, and identify any patches or modifications necessary, before heading into the rather more constrained work environment.

Can you make X or Y or Z available


I get requests from users. I'll pretty much always at least try to add the software asked for - the fact that someone's bothered to ask indicates it might be useful, and I might find it interesting as well. This doesn't always work, of course, and I'll have to punt.

I got bored one day


Sometimes I get a bit of free time (no, this doesn't happen very often), and start looking for packages that might be worth adding. Sometimes this involves looking at other systems to see what they ship.

It's a prerequisite for something else


Dependency hell is a fact of life, so a lot of the time is actually spent building prerequisites. This is one reason for speculatively trying things out - it identifies prerequisites, and they're often going to be needed by other packages too. Although what you'll find is a number of packages with no obvious consumers, because the software I wanted didn't actually work. As I mentioned before, though, I'll keep those packages I built, and they might come in useful later.

Thursday, July 06, 2017

Running LX zones with Tribblix

I mentioned a few months ago a little project I had been working on - nicknamed omnitribblix, it's regular Tribblix with the illumos components coming from illumos-omnios (now via OmniOS Community Edition) rather than vanilla illumos-gate.

One of the changes I made in the recent Milestone 20 update was to split out the release packages to give more flexibility.

Thiis allowed me to release a micro update to Milestone 20 (imaginatively called m20.1 or update 1), which updates the illumos bits but shares the same main package repository as the main Milestone 20 release.

And the other thing I can now do is build variant releases. So Tribblix has an LX variant!

You can download the omnitribblix ISO image from the Tribblix download page. It installs, operates, and is packaged just like regular Tribblix. If you don't use LX zones, you probably wouldn't notice the difference.

(It's versioned as m20lx.1 - the update 1 there means that it's a parallel release to the regular Tribblix Milestone 20 update 1.)

You can also update to the LX variant from either the regular Milestone 20 or Milestone 20 update 1 releases, in the normal way. It's a micro update, or sidegrade perhaps, but uses the same upgrade process as regular upgrades.

And, because of the magic of boot environments, if there's a problem you can roll back.

Anyway, once you have omnitribblix installed, how do you create an LX zone? Very easily, in the same way you create and destroy other zones on Tribblix, using the zap utility.

Before you can do that, though, you need a Linux image of some sort to install.

I've been using the same images I use under Docker. So, for example, if I want Alpine then I would go:

docker run alpine uname -a

and then get the name of the container

docker ps -a

and then export that with

docker export romantic_galileo > alpine.tar

Then copy the alpine.tar file to your omnitribblix system. If you want something a bit richer, then Ubuntu will work. But generally exporting a Docker container like this will work, and the image characteristics will be a good fit for a zone.

And then all you do to create the zone is use zap, specifying that it's an lx brand and telling it where the tarball is:

zap create-zone -z alpine -t lx \
-x 10.0.2.99 -I /tmp/alpine.tar

and just zlogin to it as normal.

There are constraints around networking - you have to be exclusive-ip (the -x flag) and zap will create (and destroy) the vnic for you automatically. But the networking in the zone won't actually be configured. (While you specify the IP address in the command, that just tells zap how to configure the network plumbing and the vnic.) You'll have to log in to the zone and use the native tools to identify and configure the network, like so:

/native/sbin/ifconfig -a
/native/sbin/ifconfig znic0 inet 10.0.2.99 up
/native/usr/sbin/route add net default 10.0.2.2

And off you go. Sitting on an illumos box with all its goodness, with access to the wide variety of the Linux ecosystem at your fingertips.

Sunday, June 18, 2017

Tweaking binaries with elfedit

On Solaris and illumos, you can inspect shared objects (binaries and libraries) with elfdump. In the most common case, you're simply looking for what shared libraries you're linked against, in which case it's elfdump -d (or, for those of us who were doing this years before elfdump came into existence, dump -Lv). For example:

% elfdump -d /bin/true

Dynamic Section:  .dynamic
     index  tag                value
       [0]  NEEDED            0x1d6               libc.so.1
       [1]  INIT              0x8050d20          

and it goes on a bit. But basically you're looking at the NEEDED lines to see which shared libraries you need. (The other field that's generally of interest for a shared library is the SONAME field.)

However, you can go beyond this, and use elfedit to manipulate what's present here. You can essentially replicate the above with:

elfedit -r -e dyn:dump /bin/true

Here the -r flag says read-only (we're just looking), and -e says execute the command that follows, which is dyn:dump - or just show the dynamic section.

If you look around, you'll see that the classic example is to set the runpath (which you might see as RPATH or RUNPATH in the dump output). This was used to fix up binaries that had been built incorrectly, or where you've moved the libraries somewhere other than where the binary normally looks for them. Which might look like:

elfedit -e 'dyn:runpath /my/local/lib' prog

This is the first example in the man page, and the standard example wherever you look. (Note the quotes - that's a single command input to elfedit.)

However, another common case I come across is where libtool has completely mangled the link so the full pathname of the library (at build time, no less) has been embedded in the binary (either in absolute or relative form). In other words, rather than the NEEDED section being

libfoo.so.1

it ends up being

/home/ptribble/build/bar/.libs/libfoo.so.1

With this sort of error, no amount of tinkering with RPATH is going to help the binary find the library. Fortunately, elfedit can help us here too.

First you need to work out which element you want to modify. Back to elfedit again to dump out the structure

% elfedit -r -e dyn:dump /bin/baz
     index  tag                value
       [0]  POSFLAG_1         0x1                 [ LAZY ]
       [1]  NEEDED            0x8e2               /home/.../libfoo.so.1

It might be further down, of course. But the entry we want to edit is index number 1. We can narrow down the output just to this element by using the -dynndx flag to the dyn:dump command, for example

elfedit -r -e 'dyn:dump -dynndx 1' /bin/baz

or, equivalently, using dyn:value

elfedit -r -e 'dyn:value -dynndx 1' /bin/baz

And we can actually set the value as well. This requires the -s flag to set a string, but you end up with:

elfedit -e 'dyn:value -dynndx -s 1 libfoo.so.1' /bin/baz

and then if you use elfdump or elfedit or ldd to look at the binary, it should pick up the library correctly.

This is really very simple (the hardest part is having to work out what the index of the right entry is). I didn't find anything when searching that actually describes how simple it is, so I thought it worth documenting for the next time I need it.


Friday, June 09, 2017

On Tribblix Milestone 20

Having released a new update for Tribblix, I thought I would add a little commentary on the progress that's being made and the direction things are going in.

This goes beyond the rather dry release notes and list of what's changed.

The big structural change is that the ISO has been built as a single root archive, rather than the old way with a split-off /usr that's lofi-mounted from a compressed image.

The original reason for doing this (and I experimented with it a while ago) was to allow installation on systems without drivers for the device that you're booting from. This might be a system with only USB3 ports, or I've had problems with laptops where illumos doesn't recognize the CD drive. The boot loader (and BIOS) load the initial boot archive, so if you don't need to ever talk to the media device again you're in much better shape.

While we now have USB3 support, this simplified boot is a good thing in any case, and it allows some neat tricks like iPXE boot.

Another logical change is in the release mechanism itself. I've discussed the Tribblix package repositories before. The snag with the traditional repository layout was that the packages that defined a release were in the main Tribblix repository. So, every time I make a new release I end up having to create a whole new Tribblix repository. Every time I update the illumos packages, I needed a new Tribblix repository. Creating a new one isn't too bad; ongoing support for multiple repositories is a lot of unnecessary work.

The way to fix this is to split out the packages (there are 3 of them) that define the properties of a release into their own separate repo. This allows at least 2 new possibilities:

  1. I can release updated illumos packages without spinning a whole new Tribblix release. It would still use the same upgrade mechanism, but the main Tribblix repo is shared and it's a much lighter release process.
  2. I could create variants or spins. For example, I could create a variant that has LX (see omnitribblix). This would just have a different set of illumos packages but shares everything else. Or I could build a 32-bit or 64-bit only distro.
I haven't yet done either of those things, but it's going to happen.

Behind the scenes I've been gradually working to get more packages - especially those that deliver libraries - built as both 32-bit and 64-bit.

Tribblix is fairly clear that it will continue to support 32-bit and 64-bit hardware, at least for a while. (Whereas both OmniOS and OpenIndiana have effectively dropped 32-bit compatibility, mostly by neglect rather than design.) Of course, there is a reasonable amount of software now that's only 64-bit (anything built with go, for example, or OpenJDK 8), but there's a reasonable chance the people using 32-bit hardware aren't necessarily going to want the latest and greatest applications. (This isn't 100% true, by the way - sometime you have to interoperate with other facilities in the environment.) But eventually we're going to have to make a full 64-bit transition, and it would be good to be ready.

That gives a rough idea of the work that's currently underway. Looking ahead, there are a whole long list of packages that need adding or updating (such is a maintainer's life). The one significant place I have been falling behind is that I haven't updated gcc, so that needs work. And, of course, I'm trying to get SPARC into some sort of reasonable shape. But, overall, Tribblix is now pretty solid and a bit more polish and attention to detail would benefit it greatly.

Wednesday, June 07, 2017

Installing Tribblix on Vultr using iPXE

One of the new features in Tribblix 0m20 is that booting and installing using iPXE now works.

Here's an example of using this functionality to install a server running Tribblix in the Vultr cloud. A similar mechanism ought to work for any other provider that allows iPXE boot.

I'm assuming you have signed up and logged in, then go to deploy a server.

First choose where you want to deploy the server. I'm in the UK, so London is a good choice.


Then the critical bit, selecting the Server Type. The bit you want here is in a slightly confusing location, under the "Upload ISO" tab. But then select the "iPXE" radio button and put in the value http://pkgs.tribblix.org/m20/ipxe.txt


The other key option is Server Size. As with many providers, there's a simple scale. For testing, an instance with 1G of memory is more than adequate.


The deploy it. After a few seconds of installing you can then click the link to manage the server, and then view the console, which uses VNC.

If you're reasonably quick you get to see the initial iPXE screen, and can see it downloading the images:


What you can see here is that it's downloaded the original ipxe script we specified. This looks like:

#!ipxe
dhcp
kernel /m20/platform/i86pc/kernel/amd64/unix
initrd /m20/platform/i86pc/boot_archive
boot
 
Which just says to set up the network using dhcp (this might have already been done, but if you're booting off an ipxe iso it may not have been, so we do it anyway), then download the kernel and the boot archive, then boot from what you've just downloaded.

The kernel and the boot archive are on the iso, I've just unpacked them on the server (so the URL given above for the ipxe script will be reasonably permanent for anybody to use). The only slight tweak I've had to make is that the original boot archive is actually gzip compressed and iPXE can't handle that, so it's been uncompressed. The boot archive also now contains the /usr file system as well, rather than it being split off as before. While I'm sure you could mangle the system to download it and sort things out, it's so much easier to put it inside the boot archive.

Then you get into the normal installer, so log in as jack, su to root, and see what disk(s) are available using the new diskinfo tool. Then you can install Tribblix to that disk:



Don't bother adding additional overlays at this point. It won't work - and you'll get an error about not being able to install overlays (you'll get the error anyway because the installer always tries to add some packages that aren't needed in the live environment). This will be fixed in a future update, but it's relatively harmless.

The other thing you should do before the installation is to change the passwords for root and jack. If you change them before running the installer than the change will propagate to the installed system (because all it's doing is a copy). You really don't want the system to boot up wide open to the internet with the default (and well known) passwords.

Once the (pretty quick) install finishes, it'll look like this:


That's just like a normal install, other than the missing overlays. Then just reboot and you'll soon see the new loader, followed by the system booting.

Due to the missing overlays, you'll get an error about the intrd service failing. You'll have to log in (ssh will work at this point) and then add at least the base overlay:

zap install-overlay base

Plus whatever other overlays you might want. Then you can clear the intrd service and you're good to go.

Friday, June 02, 2017

Tribblix memory requirements

Compared to the other illumos distributions, Tribblix has lower memory requirements.

I'm not talking about crazy stunts like running in 48M; here I'm talking about running a fully fledged system.

I've been doing a bit of testing of the upcoming release, which includes running the install under a range of configurations. The test here is to boot the ISO image in VirtualBox with a range of memory sizes and then install the kitchen sink.
  • The live image won't boot at all on a 256M system
  • The live image will boot on a 512M system, but installing to zfs will fail
  • However, installing to ufs works on a 512M system
  • With 768M, installation to zfs is rather slow
  • With 1G or more, you're fine
The upcoming release is going to be built slightly differently, in that it's no longer a split-off /usr configuration. (I discussed how that worked and those strange zlib files some time ago.) The latest OmniOS is a single image; SmartOS likewise. It's just so much easier to construct, and far more reliable.

That change explains the 256M failure - the ramdisk is about 300M, so it simply won't fit. It's likely to have an impact on the 512M case too - in the old scenario you only paged in the bits of the /usr filesystem as and if you needed them, now it's locked into memory.

On a limited memory system there's a way to make things a bit easier. Simply install the base (no additional overlays) from the installer, then add the rest of the overlays and packages later. The point here is that running from disk doesn't lock up anywhere near as much memory as the full OS being resident in RAM does. And some of the packages in the kitchen sink are rather large, which causes problems.

Once you've got Tribblix installed, how well does it cope? Surprisingly well, to be honest. The Xfce desktop runs quite well in either 512M or 768M of memory. I can run firefox on the 768M system without too many problems (given the way it consumes memory, probably not for a long intensive browsing session), while firefox on a 512M system does run, but it's clearly starting to grind. Java applications work, some smaller ones at least. You need to be realistic in your expectations, but the point is that smaller systems do work.

The most limited systems would tend to be older, possibly 32-bit hardware. I could build a 32-bit only image which would be quite a bit smaller - maybe only two-thirds the size. (And if you really wanted to you could get it even smaller - but then you're in the realms of building custom images using mvi or the like.)

However, the aim of keeping Tribblix viable on smallish systems isn't just to allow the use of old hardware, beneficial though that is. If you're running a service on a cloud or hosting provider then being able to use a 1G server instead of a 2G server will halve your costs, and that's a very good thing to be able to do.