How to Install Ubuntu 10.04 over a LAN

Overview and Requirements

So I’ve volunteered to help Boulder Community Computers set up some automated Ubuntu OS installations over their local area network. Since this is a large part of what I do professionally, it is easy for me to do and should help them be more efficient. Their network layout is going to look something like this, I think:

Internet
   ^
   |
Office LAN (private addresses)<-> Various workstations (DHCP from router)
   ^
   |
Multihomed Netboot Server
   ^
   |
OS Build Network (other private addresses)<->Bare metal target machines needing an OS (DHCP from Netboot Server)

OK, so let’s talk about some of the goals/requirements:

  • I want the build network to be separate from the office network. This will keep things simple, avoid DHCP collisions, prevent users from accidentally provisioning their workstations, and keep the intense network traffic of OS installation off the office LAN
  • I want to mainly download cached objects from the Netboot server to the targets. I want to avoid downloading the OS over the Internet connection repeatedly.
  • I don’t want a lot of ongoing maintenance tasks
  • I would like to be able to do a fully automatic hands-off installation
  • I would also like to be able to boot into a live CD to test compatibility
  • I would like to be able to do an interactive custom install
  • I do want internet access from the OS Build Network (mostly for NTP, but it’s just handy in general)

So overall, Ubuntu can meet all of these goals easily. Here are the details.

Setting up the Multihomed Network Install Server

So we want an Ubuntu server machine with two network interfaces (multihomed) to act as our network install server. This machine will provide DHCP/PXE, TFTP, NFS, HTTP, and HTTP proxy services to the target machines. It will also act as their default gateway, routing their traffic from the OS Build Network out to the Office LAN and then out onto the Internet and back. Here’s how we get this server installed and configured.

Install a basic Ubuntu 10.04 Server amd64 host

I used a VirtualBox VM for this while testing this setup, and we will probably use that in their “production” environment as well. If you use a physical computer, make sure it has 2 network connections. You can select “OpenSSH Server” during the install or install openssh-server later. Once the OS is up and running, we need to configure both of the network interfaces. We’ll be using RFC 1918 private addresses for both. In this case we have a 10.0.0.0/8 network as eth0 which is the Office LAN and a 192.168.0.0/16 network as eth1 which is the OS Build Network. The Office LAN is your typical home or office type network where a router is serving private IP addresses over DHCP and providing Internet access. In VirtualBox I set the first interface to be a bridged interface eth0 and the second interface to an internal only interface eth1. To configure the interfaces, edit /etc/network/interfaces as root as follows.

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp

# The internal build network for OS installation
auto eth1
iface eth1 inet static
address 192.168.8.1
netmask 255.255.255.0

After that edit, activate the config with (as root) ifdown eth1; ifup eth1. Verify it’s working by pinging both of your IP addresses (the 10.X.X.X one and the 192.168.X.X one).

Install and Configure PXE Boot Services

Install dnsmasq via apt-get install dnsmasq. Then as root do a mkdir /var/lib/tftpboot. Edit /etc/dnsmasq.conf to look as follows.

interface=eth1
dhcp-range=192.168.8.100,192.168.8.254,12h
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/var/lib/tftpboot/

Restart it with service dnsmasq restart

The next big step is to make the Debian Installer available as a PXE boot image. We’re basically following this article here, except we’re doing Ubuntu Lucid 10.04 instead of Feisty. Do the following.

cd /var/lib/tftpboot
sudo wget http://archive.ubuntu.com/ubuntu/dists/lucid/main/installer-i386/current/images/netboot/netboot.tar.gz
sudo tar -xzvf netboot.tar.gz
sudo rm netboot.tar.gz

OK, at this point we should be ready to boot our first test target machine into the Debian Installer network boot service OS. PXE boot a machine on the OS Build Network (again, I used a VirtualBox VM). You may have to hit a key like F12 to instruct the machine to boot from a network card. I have to hit F12 then “l” to boot from the LAN on my VirtualBox VM. If all is well, you should see a lovely Ubuntu installer menu like this (you won’t have the menu items containing “10.04” yet, but we’ll add them later).

Ubuntu Installer PXE Menu

So now we can use this to install Ubuntu onto the target. Well, almost. We don’t have Internet access from our OS Build Network working yet, so let’s get that going.

Setting up Routing to the Internet

We want to allow our target machines to connect to the Internet, which means we will configure our Net Boot server as a very basic router and NAT firewall. This is often called Internet Connection Sharing when used for this simple purpose. We’re basically following Internet Connection Sharing Setup docs from the Ubuntu community wiki. As root, add net.ipv4.ip_forward=1 to /etc/sysctl.conf. Then run these commands.

sudo iptables -A FORWARD -i eth1 -o eth0 -s 192.168.0.0/16 -m conntrack --ctstate NEW -j ACCEPT
sudo iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
sudo iptables-save | sudo tee /etc/iptables.sav
sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"

Now add iptables-restore < /etc/iptables.sav to /etc/rc.local as root just above the exit 0 line. This will get us going again when we reboot.

Now you should be able to PXE boot a target, select “Install” from the menu and do an Ubuntu installation. However, it’s going to be a normal install over the Internet, which is fine for doing one or two boxes, but since we want to crank these out en masse, we’ll want to cache the bits and grab them locally.

Caching the Ubuntu Packages with apt-cacher-ng

I tried both squid and apt-cacher for a solution to locally cache the bulk of the Ubuntu binary package payload, but neither worked very well. apt-cacher-ng seems to do exactly what we want and nothing more, and it “just works” with no setup. Install it with apt-get install apt-cacher-ng. After it’s installed, if you PXE boot a target, select “Install” and go through the interactive install, when prompted for a proxy, enter http://192.168.8.1:3142. This is the OS Build Network IP of the Net Boot server and the apt-cacher-ng proxy port. Now all your packages will be cached on the Net Boot server. The first install will download them from the Internet. Subsequent installs will get them from the local cache and thus be much faster and more efficient.

Hands-off Automatic OS Installation

OK, we now have network based OS installs working, but since we’re planning on doing many of these, we want to fully automate this. To do that, we’ll use a preseed file to configure the Debian Installer to not ask any questions. Configure a file at /var/lib/tftpboot/bococo.seed with the content linked here: bococo.seed.

Now we’ll add a new item to the PXE menu to trigger a hands-off automated OS install. Edit /var/lib/tftpboot/ubuntu-installer/i386/boot-screens/text.cfg and add this entry.

label install-10.04-hands-off 
        menu label ^Install 10.04 Hands Off
        kernel ubuntu-installer/i386/linux
        append vga=normal initrd=ubuntu-installer/i386/initrd.gz locale=en_US.UTF-8 debian-installer/keymap=us auto hostname=bococo preseed/url=http://192.168.8.1/bococo.seed -- quiet
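
If you end up maintaining several of these entries (say, one per architecture or per preseed file), it can be less error prone to generate the stanza than to copy and paste that long append line. Here is a minimal sketch, assuming a POSIX shell. handsoff_entry is a hypothetical helper name I made up for illustration, and the idea that other architectures follow the same ubuntu-installer/ARCH/ layout is an assumption you should check against the netboot tarball you actually unpacked.

```shell
#!/bin/sh
# handsoff_entry ARCH HOSTNAME SEED_URL
# Hypothetical helper: prints a PXE menu stanza modelled on the entry above.
# It only writes to stdout; appending it to text.cfg is left to you.
handsoff_entry() {
    arch=$1
    host=$2
    seed_url=$3
    printf 'label install-10.04-hands-off-%s\n' "$arch"
    printf '        menu label Install 10.04 Hands Off (%s)\n' "$arch"
    printf '        kernel ubuntu-installer/%s/linux\n' "$arch"
    printf '        append vga=normal initrd=ubuntu-installer/%s/initrd.gz locale=en_US.UTF-8 debian-installer/keymap=us auto hostname=%s preseed/url=%s -- quiet\n' "$arch" "$host" "$seed_url"
}

# Reproduce the i386 entry from this article (with an arch suffix on the label).
handsoff_entry i386 bococo http://192.168.8.1/bococo.seed
```

Review the output before pasting it into text.cfg. The generated label carries an architecture suffix so that two entries never collide.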

In order to get that file over HTTP, we’ll serve it up with the nginx web server. Install nginx via apt-get install nginx. Then we can edit /etc/nginx/sites-enabled/default and change the location / block to set the new document root to the tftpboot directory so it looks like this.

        location / {
                root   /var/lib/tftpboot;
                index  index.html index.htm;
        }

Restart nginx via service nginx restart. Now we should be able to PXE boot the target and select the “Install 10.04 Hands Off” option and the entire thing should happen automatically.

Network Booting into the Live Image

One final bit of utility functionality we want to provide is a network bootable live Ubuntu image. This will be handy for testing hardware compatibility, performance, etc. First we’ll download the primary Ubuntu Desktop CD image and make it available over NFS. Do the following as root.

apt-get install nfs-kernel-server
cd /var/opt
wget 'http://ubuntu.cs.utah.edu/releases/lucid/ubuntu-10.04-desktop-i386.iso'
mkdir /var/lib/tftpboot/ubuntu-10.04-desktop-i386
echo /var/opt/ubuntu-10.04-desktop-i386.iso /var/lib/tftpboot/ubuntu-10.04-desktop-i386 auto ro,loop 0 0 >> /etc/fstab
mount /var/lib/tftpboot/ubuntu-10.04-desktop-i386
echo "/var/lib/tftpboot/ubuntu-10.04-desktop-i386 *(ro,sync,no_subtree_check)" >> /etc/exports
service nfs-kernel-server restart
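
One caveat with the echo ... >> lines above: if you run them a second time, you get duplicate entries in /etc/fstab and /etc/exports. Here is a minimal sketch of a guard, assuming a POSIX shell. append_once is a hypothetical helper name, not part of the original setup, and the demonstration writes to a scratch file rather than the real /etc/exports.

```shell
#!/bin/sh
# append_once FILE LINE
# Hypothetical helper: add LINE to FILE only if an identical line is not
# already present, so re-running a setup script does not duplicate entries.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstration against a scratch file instead of the real /etc/exports.
demo_file=$(mktemp)
export_line='/var/lib/tftpboot/ubuntu-10.04-desktop-i386 *(ro,sync,no_subtree_check)'
append_once "$demo_file" "$export_line"
append_once "$demo_file" "$export_line"   # second call changes nothing
cat "$demo_file"                          # shows the line only once
rm -f "$demo_file"
```

The same helper would work for the net.ipv4.ip_forward=1 line in /etc/sysctl.conf.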

Now add another entry to your PXE menu for the live session. I also like to move the menu default option to the live session, so my final /var/lib/tftpboot/ubuntu-installer/i386/boot-screens/text.cfg looks like this:

default install-10.04-hands-off
label live
        menu label ^Live Session 10.04
        menu default
        kernel ubuntu-10.04-desktop-i386/casper/vmlinuz
        append initrd=ubuntu-10.04-desktop-i386/casper/initrd.lz boot=casper netboot=nfs nfsroot=192.168.8.1:/var/lib/tftpboot/ubuntu-10.04-desktop-i386 -- quiet
label install
        menu label ^Install
        kernel ubuntu-installer/i386/linux
        append vga=normal initrd=ubuntu-installer/i386/initrd.gz -- quiet 
label install-10.04-hands-off
        menu label ^Install 10.04 Hands Off
        kernel ubuntu-installer/i386/linux
        append vga=normal initrd=ubuntu-installer/i386/initrd.gz locale=en_US.UTF-8 debian-installer/keymap=us auto hostname=bococo preseed/url=http://192.168.8.1/bococo.seed -- quiet
label cli
        menu label ^Command-line install
        kernel ubuntu-installer/i386/linux
        append tasks=standard pkgsel/language-pack-patterns= pkgsel/install-language-support=false vga=normal initrd=ubuntu-installer/i386/initrd.gz -- quiet

And that’s it! We can now netboot into a live session, an interactive install, or a fully automated install.

References

Acknowledgements

Thanks to the authors of the above reference web pages and Nick Flores for collaborating on this.

Windows Server 2008 Setup Annoyances

So I do a lot of work with automated unattended installations of Operating Systems, including Windows. Here are some of my primary complaints about the new Windows setup program in Windows Server 2008.

  1. No good way to validate the unattend.xml file. Now Microsoft does provide tools to help generate these files, and hopefully any file you generate with those (graphical, Windows-only) tools should be at least well-formed and semantically valid. However, there’s no way to do a deeper validation that a given XML file is compatible with a particular target machine.
  2. The unattend.xml encodes the processor architecture all over the place for no reason. I have to do a search and replace of processorArchitecture="x86" with processorArchitecture="amd64" in every component tag. There’s pretty much zero information in most unattend.xml files that’s CPU architecture specific anyway. This is a real nuisance. I should be able to use the exact same file on x86 and amd64 without issues.
  3. Setup doesn’t do the simple but important hardware compatibility validation that would make users’ lives easier. For example, neither WinPE nor Windows setup will complain if the target system has insufficient RAM. WinPE will just behave very oddly and things won’t work. There’s no checking for sufficient disk space ahead of time or that the disk layout is feasible. There’s no checking for suitable network or storage drivers. When you don’t have viable storage drivers, you just reboot out of WinPE into a lovely Blue Screen of Death 0x7b stop error. Hurray! Similarly, no one at MS seems to care if you don’t have a working NIC driver. The OS has the word “Server” in it. You need a network driver or your OS is in a useless void.
  4. The fact that windows setup reboots into an environment with zero networking and zero third party applications allowed is just a mind-boggling recipe for end user frustration. We have to resort to brute time out calculation to even know whether the Windows install worked or not. We can’t provide a good user experience to people looking to do UNATTENDED installs.
  5. Another one in the “we have no idea what unattended means” even though our configuration file is called “unattend.xml” department: STOP PRESENTING GUI DIALOGS. There are many issues for which unattend.xml supposedly allows a showGui="never" setting, but in my experience these settings either have no effect (the GUI displays and halts the install anyway) or they just flat out break the install. Microsoft just doesn’t “get it” here. An automated install isn’t a fully attended graphical install rejiggered to try not to pop up a GUI. It’s an entirely different use case. Get it through your thick skull: NO ONE IS LOOKING AT THE CONSOLE. THESE ARE SERVERS. THE INSTALL IS BEING DRIVEN AUTOMATICALLY BY SOFTWARE. NEVER EVER SHOW A GUI.
  6. Same category of cluelessness. Windows setup doesn’t return a non-zero exit code on failure. Duh. I’m aware of some other mechanisms like setupcomplete.cmd that MS claims to provide for this, but from what I can tell after several attempts, they simply don’t work.
  7. The log files and error messages are just a mess. And I’m not even talking about obscure edge case failures. Simple things like an invalid product key or computer name can create weird mysterious failures and behavior.
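
For what it’s worth, the search and replace from complaint 2 is easy to script. Here is a minimal sketch, assuming a POSIX shell with sed. retarget_arch is a hypothetical helper name, and the sample component line is only an illustration, not taken from a real answer file.

```shell
#!/bin/sh
# retarget_arch FROM TO
# Hypothetical helper: rewrite every processorArchitecture="FROM" attribute
# read from stdin to processorArchitecture="TO" and write the result to stdout.
retarget_arch() {
    sed "s/processorArchitecture=\"$1\"/processorArchitecture=\"$2\"/g"
}

# Illustrative input line; a real unattend.xml has many component tags.
printf '%s\n' '<component name="Microsoft-Windows-Shell-Setup" processorArchitecture="x86">' \
    | retarget_arch x86 amd64
```

In practice you would run something like retarget_arch x86 amd64 < unattend-x86.xml > unattend-amd64.xml, where both file names are placeholders. It is still a workaround for information that shouldn’t be in the file in the first place.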

Remove your SCM system from your job postings

Most job postings will list one or more source code management (SCM) tools in their laundry list of required skills and buzzwords. This seems to both be unnecessary and also to miss the point. As an employer, your concern shouldn’t be whether or not your candidate is intimately familiar with subversion, git, perforce, starteam, mercurial, CVS, sourcesafe, or whatever system your code happens to reside within at this point in time. Why not? Because it just doesn’t matter and it doesn’t help you distinguish good candidates from bad. This in my mind seems akin to asking potential postal employees “Do you have experience driving from the right hand side of a boxy little truck?”. It’s just not something that’s going to be a stumbling block. If you can drive, you’ll get the hang of the postal delivery truck soon enough. It’s the same with SCM. If you have worked successfully on a few sizeable projects in one or two of them, you’ll be fine. I’ve never heard this story: “Oh yeah, we hired this woman Sandra and she wrote this fantastic code for us, but she just could not figure out how to commit it to subversion, so we had to let her go.” It’s just not going to be the issue. Well, what is going to be the issue?

In this context, it’s more important to see if the candidate understands branch and release organization and management as generic principles. When and how should the code be branched? What steps do you take when it’s time to ship a release? When a bug needs to get fixed against an old version of the product, how does that work with SCM? These types of questions are OK sanity checks to spend 2 minutes on in an interview, but they don’t need to be on your job posting.

Now, here’s what I see missing that seems in my mind much more likely to actually cause you problems down the road. Does your candidate understand software packaging, distribution, installation, and upgrade? Do they understand the principles of package management systems like RPM or DEB, and the complexities around dependency management, upgrade, and rollback? Do they understand how to package and ship (or not) third party libraries? These questions may be more or less relevant based on the deployment model you use (web vs. bundled vs. embedded, etc). However, for lots of projects, they are key and lots of companies are clearly in the dark here.

So the summary: don’t bother listing a specific SCM as a requirement in your job posting. Instead, briefly interview your candidates on general SCM methods and principles. And in addition to that, given your deployment model(s) find out how much your candidate knows about packaging and installation.

On Idempotence, intention, and unix commands

Idempotence means that running a command or function several times produces the same result as running it only once. This is a very important design principle that is a blessing when used appropriately and a scourge when not used where warranted.

For analogy, imagine you ask a housemate (or butler if that’s how you roll) to empty the dishwasher. They dutifully go over there, open the dishwasher door, and find it’s already empty. How do they react? Do they come back to you shouting in confusion “You fool! How can I empty the dishwasher if there’s nothing in it! Oh woe is me. What am I to do?”? Or do they just think to themselves “score!” and go on a coffee break, leaving you to go about your business trusting that the dishwasher is now empty?

Another analogy is from the military’s notion of “management by intent” wherein a commander might order his troops to “have camp fully operational by noon” as opposed to dictating specific tactics that must be taken in order to achieve the intended outcome. This way, the troops can rely on their own abilities to achieve the intent and are empowered to respond to changing or unexpected circumstances independently.

Now, when it comes to computer programs, UNIX has a mixed bag of utilities that understand this and some that don’t.

mkdir /tmp/2;echo $?;mkdir /tmp/2;echo $?
0
mkdir: cannot create directory `/tmp/2': File exists
1

rm /tmp/foo;echo $?;rm /tmp/foo;echo $?
0
rm: cannot remove `/tmp/foo': No such file or directory
1

So the bad examples include mkdir, rmdir, rm, ln, and perhaps kill (debatable). Think about how much simpler using a command line and writing shell scripts would be if these were idempotent and instead of panicking in horror when the user does not know the current state of the filesystem, just allowed the user to describe the desired end state. I would love to have idempotent and recursive by default commands like mkdir -p or rm -rf in combination with a transactional filesystem with built in undo capabilities.
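
To make the contrast concrete, here is a minimal sketch, assuming a POSIX shell, of the mkdir and rm sessions above repeated with the -p and -rf flags. ensure_dir and ensure_gone are hypothetical wrapper names, and everything happens inside a scratch directory created by mktemp.

```shell
#!/bin/sh
# Hypothetical wrappers that describe the desired end state rather than an action.
ensure_dir()  { mkdir -p -- "$1"; }   # "this directory should exist"
ensure_gone() { rm -rf -- "$1"; }     # "this path should not exist"

scratch=$(mktemp -d)

ensure_dir "$scratch/2"; echo $?      # creates the directory
ensure_dir "$scratch/2"; echo $?      # already there: still success
ensure_gone "$scratch/2"; echo $?     # removes it
ensure_gone "$scratch/2"; echo $?     # already gone: still success

rm -rf -- "$scratch"
```

All four calls print 0 and nothing is written to stderr, regardless of the starting state.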

Good idempotent examples include touch, tar, zip, cp, chmod.

So the point about design and usability here is it’s good to ask oneself “What is the user’s intent here?”, and try to do everything in your power to work in concert with that intention. A strong and painful negative example from my career has to do with the fact that the Solaris patchadd program is not idempotent and it doesn’t return exit codes according to the user’s intent. So when I run patchadd 123456-01, really my intention is “I want this system to be OK with regard to patch 123456-01”. patchadd will return a non-zero exit code if the patch is already installed or the patch is not applicable to the server or if a newer revision is already installed. As a user of patchadd, I don’t care. It’s all success to me, and I don’t want to be bothered with implementation details within patchadd such as not installing a patch if a newer revision is already installed. I think many shell scripts would be a lot smaller, clearer, and simpler without always having to wrap mkdir in an if [ ! -d /blah/dir ] clause to avoid spurious error output.
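
Until the tools behave this way themselves, the wrapping can at least be written once. Here is a minimal sketch, assuming a POSIX shell. ensure, dir_exists, and make_dir are hypothetical names, and this is a generic pattern, not a description of how patchadd or any other real tool works.

```shell
#!/bin/sh
# ensure CHECK ACTION [ARGS...]
# Hypothetical intent-style wrapper: CHECK and ACTION are command or function
# names. If CHECK already passes, report success without doing anything.
# Otherwise run ACTION, then confirm with CHECK that the intent was met.
ensure() {
    check=$1
    action=$2
    shift 2
    if "$check" "$@"; then
        return 0
    fi
    "$action" "$@" && "$check" "$@"
}

dir_exists() { [ -d "$1" ]; }
make_dir()   { mkdir "$1"; }          # plain mkdir, deliberately without -p

scratch=$(mktemp -d)
ensure dir_exists make_dir "$scratch/blah"; echo $?   # directory gets created
ensure dir_exists make_dir "$scratch/blah"; echo $?   # nothing to do, no error
rm -rf -- "$scratch"
```

Both calls print 0. The exit code now answers “is the system in the state I asked for?” rather than “did the action run?”.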


Bleeding Edge and Rotting Core

I just wanted to post some thoughts on the topic of selecting software components with regard to the maturity thereof. I think overall the programmer community is by default gung-ho about the bleeding edge. We like the shiny new toys with the bells and whistles. Once something’s been around enough to have its weaknesses well understood, we find it very frustrating to have to continue to work with it. I’m not going to offer any specific recommendations, just some things to keep in mind. The general gist though is that it takes some hard-earned pragmatism and real production experience to understand the value of using older releases of components.

First, let’s define some terms. We’re familiar with what is known as the bleeding edge. The new hotness. The stuff straight off the presses instilled with the glimmering light of state of the art knowledge. There’s probably always been a lot of this, but there seems to have been a flurry in the past five years or so of interest in ruby, rails, erlang, clojure, scala, dozens of python app and web frameworks, etc. On the other hand, we have the old guard, which I’d like to call the rotting core. Generally we shy away from this, but there are situations where it is absolutely the correct choice.

So, let’s look at some pros and cons.

Bleeding edge pros:

  • The freshest and (usually) best designs and thinking are made available
  • Almost always more succinct and expressive
  • Often more coherent, clean, and consistent
  • Embodies improvements based on lessons learned from past failings and shortcomings
  • Development tools and processes are sometimes more productive

Bleeding edge cons:

  • Development tools are usually immature and inferior
    • IDE support is likely to lag behind
    • Debugger may lag behind as may remote graphical debugging
    • Performance profilers might not be there
  • Deployment issues may not have been well addressed yet
  • Updates will come more frequently causing churn
  • Software has not had as broad testing in production and is therefore likely to have more “surprises”. Sometimes these can be showstoppers.
  • Community size will be smaller
  • Depth of knowledge in the community will be shallower
  • Standard library may be undergoing more flux

Rotting core pros:

  • Stable, known quantity. It may have warts and bugs, but at least we’re aware of most of them by now
  • Development tools generally have solid support including remote graphical debugging, mature performance profilers, etc
  • Community size will be larger
  • Community depth of knowledge will be much deeper
  • Updates are rare and only for occasional major issues or security patches
  • Standard library will be well known and stable

Rotting core cons:

  • Less exciting to developers. Yesterday’s designs and paradigms.
  • Often tedious compared to the bleeding edge
  • Support issues. Standard answer may always be “update to the latest version”

And now, let’s back this up with some examples and anecdotes. I think when it comes to rotting core technologies, you have both the “oldie but a goodie” category and the “oldie and a baddie” one. Currently my project has a component written against the now ancient Python 1.5.2 runtime, and we have hundreds of thousands of copies of that component installed at customer sites. It is running on something around seventy different OSes. Now, at the time when that component was originally written, this was close to the bleeding edge. We’ve still not entirely upgraded it because it’s an oldie and a goodie. We’ve patched it a bunch and run it under huge loads and huge scales. We know what it can do, and we know what it can’t do. We even had famous python educator Mark Lutz (Programming Python) come in to train us and give us quizzical looks when we explained that half of what he was saying didn’t apply to us since it wasn’t available in python 1.5.2. Over the years, I’ve come to see the merits of this and even though it’s frustrating, the business reality is that every year that stuff continues to run without issue is bettering the return on the initial R&D investment. It ain’t broke, so we’re not in a hurry to fix it.

Of course, on the other side, you’ve got things like Java 1.2, which I also worked with. Python has come a long way since 1.5.2, but really it’s still basically the same deal, and the design was good from the start. Java has probably come even farther, but the design was a mess from the beginning and they’ve since seen the error of their ways and made some great improvements. I would put that one in the “oldie and a baddie” category and do what it takes to upgrade.

I remember chatting with a stranger on a plane after we each noticed that we were both programmers and were both actively programming on the plane. This was a few years ago and Ruby was still pretty much bleeding edge. He looked at me with desperation and asked me if I knew anything about debugging deadlocks, threading issues, and core dumps since his production ruby app was regularly hitting issues and his team was basically at a point where they didn’t have the knowledge or tools to solve them, and it was jeopardizing their whole project. Sadly I couldn’t offer any help, but I could certainly sympathize.

I also have a friend who used to work at a DNS registry run by someone very much of the “rotting core” philosophy. They ran Solaris 8 and ancient versions of lots of core C/unix utilities (bind et al), and to actually run versions that old took significant effort on their part, but it made sense for that project. They are running a piece of the Internet backbone. It’s not bleeding edge stuff. It just needs stability, stability, stability, and those are the tools they needed to meet their business goals.

So next time you join a new project and start to reflexively freak out when they explain their software stack, suppress your urge for a minute and get some information about the choices they have made and the reasoning and circumstances that got them where they are. You might be surprised at the difficult but pragmatic choices that were made and hopefully you can admire and appreciate the character of those who made them.

And finally, think about the value of being able to look across a broad set of available components and correctly determine where components are in a “sweet spot” of their lifecycle, ripe to be chosen and deployed at length. That is a deep wisdom that is a long time coming.

MoinMoin Columns Macro

I just updated the “Columns” macro for the MoinMoin wiki. This allows you to lay out a wiki page in two to ten columns. This makes it easier to get lots of info on one page in certain situations and I’ve used it to great benefit on my personal wiki where I organize my stuff.

Here’s the MacroMarket page where my update has been posted for discussion. The original author may not like it, so it might not become the canonical fork, but that’s how it goes with open source.

Here’s what it looks like with four columns:

MoinMoin wiki page with Columns macro

Optional Syntax Should Be Illegal

Why in the world do some programming languages include optional syntax? To a true type A engineer, this is incomprehensible and unacceptable. For example, in Adobe’s ActionScript, statements may optionally be terminated with a semicolon. Usually this is not required, but in a few situations you need it. Evil. The statement that our number one job as software engineers is to manage complexity really resonates with me, and willy-nilly allowing of optional syntax just destroys consistency, predictability, and simplicity for no reason whatsoever. Optional syntax seems to me a bad language design smell that indicates the language authors need to rethink a bit and find something that always works and should be required.

Part of the impetus for this post is my annoyance when my thinking cycles are wasted on unimportant details in a source code file. I’d rather have a strict format so that whenever I am reading or writing in a language, there will be a strong and deep consistency. I don’t have to spend time deciding whether I’m going to use some optional syntax or which of the several ways to express the same thing I’m going to use. Similarly, when I come upon someone else’s code and it’s a mixture of two optional approaches, I feel compelled to go and make it consistent, which is another time waster.

Environment Variables Considered Harmful

Many projects reference environment variables at either build time, install time, or run time to handle configuration that can’t be made to work across all of the target environments. It is better to use plain text simple configuration files for the reasons that follow. First, let’s quickly review common usage of environment variables.

  • Directory path to supporting tools and libraries (JAVA_HOME, LD_LIBRARY_PATH, CATALINA_HOME, etc)
  • Customization of build time locations (BUILD_DIR, OUTPUT_DIR, DIST_DIR, etc)
  • Customization of compiler options and other build time configurations (STATIC_LINK, etc)
  • Settings that apply OS-wide and to several programs (http_proxy, etc). In theory this would almost make sense. You set your http_proxy environment variable in one place, and any program that makes HTTP requests respects that setting. In practice, these settings are more realistically effective higher up in your desktop environment, and AFAIK in the whole GNU/Linux/UNIX ecosystem, there are only a small handful of cross-program environment variables that are actually used commonly.

So what’s the problem with environment variables?

  • They are ephemeral, nebulous, stored in memory within your shell and process tree
  • How and where they are set is inconsistent across shells (~/.bash_profile, ~/.zshrc, etc)
  • The syntax to specify them is needlessly different across different shells (csh vs. bash vs. cmd.exe, etc)
  • How to fully unset them varies per shell and is often unclear
  • There is widespread confusion on the distinction between shell variables and environment variables, how to set each, and how each interacts with subprocesses
  • They are often tied to a user account due to where they are specified above, and can vary between a login shell and a non-login shell. They can therefore often vary when a program runs via init compared to when it is run from an interactive root login shell. This can be difficult to detect and troubleshoot

All of these reasons combined mean that in general environment variables are losers in our goal of managing complexity and making simple, easy to use software that is cross platform. So what’s the solution? The solution, as it so often is, is simple plain text configuration files. At the end of the day, environment variables end up set in a shell script as KEY=VALUE type pairs, and that’s where they belong: in a configuration file on the filesystem. How does this make things better?

  • One consistent place to set your application’s configuration
  • Same syntax regardless of shell or OS
  • Files on disk are concrete and reliable. You can email it to someone for help with troubleshooting and be confident about its content
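
Here is a minimal sketch of what reading such a file can look like, assuming a POSIX shell and a file of plain KEY=VALUE lines with # comments. config_get and the sample keys are hypothetical, chosen only for illustration. The file is parsed as data rather than sourced, so nothing in it gets executed.

```shell
#!/bin/sh
# config_get FILE KEY
# Hypothetical helper: print the value stored for KEY in a KEY=VALUE file.
# Returns non-zero if the key is not present.
config_get() {
    file=$1
    key=$2
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;      # skip blank lines and comments
            *=*) ;;                   # looks like KEY=VALUE, keep going
            *) continue ;;            # anything else is ignored
        esac
        # Split on the first "=" only, so values may contain "=" themselves.
        if [ "${line%%=*}" = "$key" ]; then
            printf '%s\n' "${line#*=}"
            return 0
        fi
    done < "$file"
    return 1
}

# Demonstration with a scratch config file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# build settings
JAVA_HOME=/opt/java
BUILD_DIR=/tmp/build
EOF
config_get "$conf" BUILD_DIR
rm -f "$conf"
```

The demonstration prints /tmp/build. Because the file format is just text, the same file can be read by a program written in any language, on any OS, without caring which shell the user happens to run.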

So go forth and configure with simple plain text configuration files. And there will be much rejoicing.

Business hours

NOTICE: To all businesses with a single physical location and a web site. You will put your address, phone number, and business hours on your home page. There will not be a “Contact” page. There will not be an “About” page. END OF NOTICE

The wheel of not waiting

So flash videos are everywhere now. Generally as they load they show some spinning wheel type graphic. The problem is as an end user, I have no visual differentiation between a video that is loading slowly and a video sitting there waiting for TCP from an overloaded server that is simply never going to work, and certainly not in a timeframe smaller than my attention budget. Show me whether or not you are getting any data, and I might be willing to wait, but if you have me watching your spinner until your TCP connection times out, you are just frustrating me.