[clug] The Tale of The Very Dead Ubuntu box

Stephen Hocking stephen.hocking at gmail.com
Thu Jun 6 08:54:32 UTC 2019


Hi all,

Gather around for an entertaining story...

We had a box, which was not supported by or known to us, that we were asked
to help with.

It was used by our client, both in-country & overseas.
It was not backed up, except for an occasional DB dump.
The source code for the website it ran wasn’t backed up either, as far as
anyone knew.
It wasn’t documented anywhere.

It had crashed at some stage, and was so badly mangled that it was sitting
at the grub prompt, because the grub config that specified what kernel to
load had itself been trashed. After some research, it was determined that
we should use a rescue CD to see what kernels were actually on the box.
There were hundreds. Some were missing their initrd files. We picked an
intact one and typed the following:

set root=(hd0,1)
linux /boot/vmlinuz-3.13.0-98-generic
initrd /boot/initrd.img-3.13.0-98-generic
boot
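
A rough sketch of how one might pick out the intact kernels from a rescue
CD, rather than eyeballing hundreds of files. It assumes the box's /boot is
reachable under /mnt/boot and that initrd images follow Ubuntu's usual
initrd.img-<version> naming; the function name intact_kernels is made up
for illustration.

```shell
# Hypothetical helper: print the version of every kernel in a boot
# directory that still has a matching initrd image next to it.
intact_kernels() {
    bootdir=$1
    for kernel in "$bootdir"/vmlinuz-*; do
        # Skip the literal pattern if no kernel matched at all.
        [ -e "$kernel" ] || continue
        version=${kernel##*/vmlinuz-}
        # Assumption: initrd images are named initrd.img-<version>.
        if [ -e "$bootdir/initrd.img-$version" ]; then
            echo "$version"
        fi
    done
}

# Example (path is an assumption):
#   intact_kernels /mnt/boot
```

Any version it prints is a candidate for the "linux" and "initrd" lines at
the grub prompt.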

The box booted, then quickly panicked, because it didn’t know what its root
partition was. Altering the “linux” line to:

linux /boot/vmlinuz-3.13.0-98-generic root=/dev/sda1

fixed that error, but another crash occurred because /sbin/init could not
load a shared library (why it wasn’t linked statically is a mystery to me).
This posed something of a problem. By looking at another Ubuntu box (my
laptop) we could determine what package provided that shared library (by
using the apt-file utility), so that we could reinstall it. In order to do
this, we needed to boot off the rescue CD and install packages from it. The
usual method of mounting the box’s root filesystem on /mnt, chrooting to
/mnt, and then using dpkg to install packages didn’t work, because
various shared libraries that the packaging utilities used were missing.
Staying outside the chroot and pointing dpkg or apt-get at an install root
other than / was a bit beyond us at this point (we were getting a little
tired and were on a steep learning curve).
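
For reference, dpkg does have a --root option for installing into a
filesystem mounted somewhere other than /. A minimal sketch, assuming the
broken root is mounted on /mnt; the wrapper name install_into_root and the
package file name in the comment are hypothetical. As far as I know dpkg
still chroots into the target to run the package's maintainer scripts, so
this may well fail on a box whose shell and libc are as mangled as this
one's were.

```shell
# Hypothetical helper: install .deb files into a root filesystem that is
# mounted somewhere other than /, using the rescue CD's own dpkg binary.
install_into_root() {
    root=$1
    shift
    # --root makes dpkg use $root as the install directory and
    # $root/var/lib/dpkg as its package database.
    dpkg --root="$root" --install "$@"
}

# Example (file name is made up):
#   install_into_root /mnt /media/cdrom/somepkg.deb
```

The matching lookup on a healthy box, to find out which package ships a
given library, is "apt-file search libsomething.so".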

Now Debian/Ubuntu packages are created using the “ar” utility, which is
normally used to create library archives for programs to be linked against.
The .deb file is an archive of three components: debian-binary,
control.tar.gz and data.tar.xz. The file data.tar.xz holds all the files
that make up the package. We extract the three components using the “ar x
somepkg.deb” command. In older packages the data tarball is compressed
with gzip instead and is named data.tar.gz. One can do a pseudo install by
changing to the root directory of the installation (which, if we’re in the
rescue CD mode and have mounted the root filesystem under /mnt, is /mnt)
and unpacking the data.tar file there. This, of course, will not update the
package utility databases, and no maintainer scripts are run.
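
The pseudo install described above can be sketched as a small shell
function. This is an illustration under assumptions, not a transcript of
what we typed: the name unpack_deb is made up, and it relies on GNU tar
working out the compression of data.tar.* by itself when extracting.

```shell
# Hypothetical helper: unpack a .deb's payload under a target root without
# using dpkg. The package database is NOT updated.
unpack_deb() {
    deb=$1
    root=$2
    # ar is run from a scratch directory, so make the .deb path absolute.
    case $deb in
        /*) ;;
        *) deb=$PWD/$deb ;;
    esac
    tmp=$(mktemp -d) || return 1
    # ar x drops debian-binary, control.tar.* and data.tar.* into the cwd.
    (cd "$tmp" && ar x "$deb") || { rm -rf "$tmp"; return 1; }
    # The payload may be data.tar.xz, data.tar.gz or plain data.tar.
    for data in "$tmp"/data.tar*; do
        tar -C "$root" -xf "$data" || { rm -rf "$tmp"; return 1; }
    done
    rm -rf "$tmp"
}

# Example (file name is made up):
#   unpack_deb /media/cdrom/somepkg.deb /mnt
```

Because nothing is recorded in the package database, anything installed
this way should be reinstalled properly once dpkg works again.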

We iterated: boot the box from the grub command line, see which shared
libraries /sbin/init was missing, and install the relevant packages from
the rescue CD. This got us past the /sbin/init crash, and the boot process
continued to a point where the box would spontaneously reboot. This,
obviously, was not desirable. We got past this by changing the “linux”
command line to the old standby of

linux /boot/vmlinuz-3.13.0-98-generic root=/dev/sda1 init=/bin/bash

This revealed that there were a few other shared library packages that
needed reinstalling to run bash, so we do the dance of booting off the
rescue CD and extracting files to put the shared libraries in place until
we end up with a bash shell running. We remount the root filesystem
read/write, try running a couple of utilities, and discover that the libc
version we installed off the rescue CD isn’t the right one. It turns out
we’re on an Ubuntu 14.04 system, whereas the rescue CD is Ubuntu 12.04.
Whoops. A 14.04 CD is procured, and we re-install the packages we’d
ghetto-installed, including libc. Fortunately there were only 4 packages
installed via that method. This fixes the libc problem.

Obviously the ghetto install of packages really isn’t viable as a solution,
so after a couple of false starts, we manage to get the network up and
running and attempt to reinstall packages from a repository on the network.
We have to ghetto-install from the mounted CD a few more times to get
things to the point where apt-get doesn’t complain about missing shared
libraries, and then we re-install all the grub packages. This rebuilds the
grub config, so we can reboot successfully without having to type the
manual grub commands above. A number of PAM library modules have to be
re-installed before we can log in at the console, and a bunch of other
libraries have to be re-installed before sshd will start and talk to LDAP.

There’s a bunch of work to do with the app – it appears that a lot of the
shared libraries on the system have been thoroughly mangled, but we can
reboot the box and do the basic OS things.

Enter the debsums utility. This fine thing examines the package database
and reports back if any of the installed files (excluding the config files,
which are expected to be edited locally) are corrupt. If a file isn’t
there, it reports that to stderr. Redirecting stderr through a suitable
pipeline of grep, sed & sort -u gives us a list of packages that have
missing files. A series of “apt-get install --reinstall somepkgs” later, we
can now start mysql & apache2.
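
A sketch of what such a pipeline might look like. It assumes debsums
reports a missing file on stderr in the form "debsums: missing file PATH
(from PKG package)", which is from memory and worth checking against your
version; the function name missing_pkgs is made up.

```shell
# Hypothetical helper: read debsums' stderr on stdin and print the unique
# names of packages that have at least one missing file.
missing_pkgs() {
    # Assumed line format:
    #   debsums: missing file /usr/lib/libfoo.so.1 (from libfoo1 package)
    grep 'missing file' |
        sed -n 's/.*(from \(.*\) package)$/\1/p' |
        sort -u
}

# On the real box (not run here): send stderr down the pipe and throw
# stdout away, then reinstall whatever comes out.
#   debsums 2>&1 >/dev/null | missing_pkgs
#   apt-get install --reinstall $(debsums 2>&1 >/dev/null | missing_pkgs)
```

The order of the redirections matters: "2>&1 >/dev/null" points stderr at
the pipe first and only then discards stdout.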

We didn’t know the root password of the mysql database, so we cracked that
via the usual stackoverflow advice, which then allowed one of my colleagues
to install a dump of the DB that had been taken a few days earlier. The box
is then snapshotted (it’s running under VMware). After making sure all the
package installs are complete, we do an “apt-get update && apt-get
dist-upgrade”, reboot again, verify that the app is working, and hand it
back to a deeply grateful client.


-- 

  "I and the public know
  what all schoolchildren learn
  Those to whom evil is done
  Do evil in return"		W.H. Auden, "September 1, 1939"

