[clug] The Tale of The Very Dead Ubuntu box

George at Clug Clug at goproject.info
Thu Jun 6 13:24:32 UTC 2019



On Thursday, 06-06-2019 at 18:54 Stephen Hocking via linux wrote:
> Hi all,
> 
> Gather around for an entertaining story….
> 
> We had a box, which was not supported by or known to us, that we were asked
> to help with.
> 
> It was used by our client, both in-country & overseas.
> It was not backed up, except for an occasional DB dump.
> The source code for the website it ran wasn’t backed up either, as far as
> anyone knew.
> It wasn’t documented anywhere.
> 
> It had crashed at some stage, and was so badly mangled that it was sitting
> at the grub prompt, because the grub config that specified what kernel to
> load had itself been trashed. After some research, it was determined that
> we should use a rescue CD to see what kernels were actually on the box.
> There were hundreds. Some were missing their initrd files. We picked an
> intact one and typed the following:
> 
> set root=(hd0,1)
> linux /boot/vmlinuz-3.13.0-98-generic
> initrd /boot/initrd.img-3.13.0-98-generic
> boot
> 
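
A minimal sketch of how the "pick an intact kernel" step could be done from the rescue CD rather than by eye. It assumes the damaged root filesystem is mounted on /mnt and that the files follow the usual Ubuntu naming of vmlinuz-<version> and initrd.img-<version>; the function name intact_kernels is made up for illustration.

```shell
# List kernel versions in a boot directory that have BOTH a kernel image
# and a matching initrd. Assumes Ubuntu naming: vmlinuz-<ver> / initrd.img-<ver>.
intact_kernels() {
    boot_dir="${1:-/mnt/boot}"
    for kernel in "$boot_dir"/vmlinuz-*; do
        [ -s "$kernel" ] || continue          # skip missing or zero-length images
        version="${kernel##*/vmlinuz-}"       # strip the directory and "vmlinuz-"
        if [ -s "$boot_dir/initrd.img-$version" ]; then
            printf '%s\n' "$version"
        fi
    done
    return 0
}
```

Running "intact_kernels /mnt/boot" prints one version per line; any of them is a candidate for the linux and initrd lines at the grub prompt. A non-empty file is no proof that its contents are undamaged, so this only narrows the list.
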
> The box booted, then quickly panicked, because it didn’t know what its root
> partition was. Altering the “linux” line to:
> 
> linux /boot/vmlinuz-3.13.0-98-generic root=/dev/sda1
> 
> fixed that error, but another crash occurred because /sbin/init could not
> load a shared library (why it wasn’t linked statically is a mystery to me).
> This posed something of a problem. By looking at another Ubuntu box (my
> laptop) we could determine what package provided that shared library (by
> using the apt-file utility), so that we could reinstall it. In order to do
> this, we needed to boot off the rescue CD and install packages from it. The
> usual method of mounting the box’s root filesystem on /mnt, chrooting
> to /mnt, and then using dpkg to install packages didn’t work, because
> various shared libraries that the packaging utilities used were missing.
> Getting out of the chroot environment and running dpkg or apt-get pointed
> at a non / install environment was a bit beyond us at this point (we were
> getting a little tired and were on a steep learning curve).
> 
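
For reference, dpkg has a --root option that points it at an install tree other than /. A hedged sketch, assuming the broken system is mounted on /mnt; install_into_root and DRY_RUN are hypothetical names for illustration, not part of dpkg.

```shell
# Install .deb files into a filesystem mounted somewhere other than /.
# With DRY_RUN=1 the command is only printed, so it can be checked first.
install_into_root() {
    target_root="$1"
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "dpkg --root=$target_root --install $*"
    else
        dpkg --root="$target_root" --install "$@"
    fi
}
```

dpkg still needs a usable database under the target's /var/lib/dpkg, and as far as I know it runs the maintainer scripts chrooted into the target, so on a tree this damaged it may well fail for the same reason the chroot did.
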
> Now Debian/Ubuntu packages are created using the “ar” utility, which is
> normally used to create library archives for programs to be linked against.
> The .deb file is an archive of three components: debian-binary,
> control.tar.gz and data.tar.xz. The file data.tar.xz is where all the
> files that make up the package are. We extract these components using the
> “ar x somepkg.deb” command. Sometimes, for the older packages, the data.tar
> file has a .gz extension. One can do a pseudo install by changing to the
> root directory of the installation (which, if we’re in the rescue CD mode
> and have mounted the root filesystem under /mnt, is /mnt) and unpacking the
> data.tar file. This, of course, will not update the package utility
> databases.
> 
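
The pseudo install described above can be sketched as a small function. This is an illustration under assumptions, not what the team ran: pseudo_install is a made-up name, the target defaults to /mnt, and the compression flag is picked from the member's extension.

```shell
# Unpack the payload of a .deb into a target root without involving dpkg.
# Nothing is recorded in the package database and no maintainer scripts run.
pseudo_install() {
    deb="$1"
    target_root="${2:-/mnt}"
    [ -r "$deb" ] || { echo "cannot read $deb" >&2; return 1; }
    # The payload member is data.tar.xz, data.tar.gz, ... depending on the package.
    member=$(ar t "$deb" | grep '^data\.tar') || return 1
    case "$member" in
        *.xz)  flag=J ;;
        *.gz)  flag=z ;;
        *.bz2) flag=j ;;
        *)     flag= ;;
    esac
    # "ar p" writes the member to stdout; tar unpacks it under the target root.
    ar p "$deb" "$member" | tar "-x${flag}f" - -C "$target_root"
}
```

For example, "pseudo_install somepkg.deb /mnt" drops the package's files under /mnt. The package still has to be reinstalled properly later so that the dpkg database matches what is on disk.
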
> Iterating through the process of booting the box from the grub command and
> seeing what shared libraries were missing for /sbin/init and installing the
> relevant packages allowed us to get past /sbin/init causing a crash. This
> allowed the boot process to continue to a point where it would
> spontaneously reboot. This, obviously, was not desirable. We got past this
> by changing the “linux” command line to the old standby of
> 
> linux /boot/vmlinuz-3.13.0-98-generic root=/dev/sda1 init=/bin/bash
> 
> This revealed that a few other shared library packages needed
> reinstalling before bash would run, so we do the dance of booting off the
> rescue CD and extracting files to put the shared libraries in place until
> we end up with a bash shell running. We remount the root filesystem
> read/write, try running a couple of utilities, and discover that the
> libc version that we installed off the rescue CD isn’t the right one. It
> turns out we’re on an Ubuntu 14.04 system, whereas the rescue CD is Ubuntu
> 12.04. Whoops. A 14.04 CD is procured, and we re-install the packages we’d
> ghetto-installed, including libc. Fortunately only four packages had been
> installed via that method. This fixed the libc problem.
> Obviously the ghetto install of packages really isn’t viable as a solution,
> so after a couple of false starts, we manage to get the network up and
> running and attempt to reinstall packages from a repository on the network.
> We have to ghetto-install from the mounted CD a few more times to get
> things to the point where apt-get doesn’t complain about missing shared
> libraries, and then we re-install all the grub packages. This rebuilds the
> grub config, so we can reboot successfully without having to type the grub
> commands above by hand. A number of pam library modules have to be
> re-installed before we can log in at the console, and a bunch of other
> libraries have to be re-installed before sshd will start and
> talk to LDAP.
> 
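
Once a shell is running on the box itself, the trial and error of finding missing libraries can be shortened by filtering ldd output. A small sketch, assuming ldd works on the damaged system; missing_libs is a hypothetical helper name. Running ldd from the rescue CD against files under /mnt would resolve against the CD's own libraries, so it is meant to be run on the booted box.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it also works on output saved earlier.
missing_libs() {
    awk '/=> not found/ { print $1 }' | sort -u
}

# Usage on the running box:
#   ldd /sbin/init | missing_libs
#   ldd /usr/sbin/sshd | missing_libs
```

Each name printed can then be looked up with apt-file on a healthy machine to find the package that provides it.
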
> There’s a bunch of work to do with the app – it appears that a lot of the
> shared libraries on the system have been thoroughly mangled, but we can
> reboot the box and do the basic OS things.
> 
> Enter the debsums utility. This fine thing examines the package database
> and reports back if any of the files (excluding the config files, which
> would be expected to be edited locally) are corrupt. If a file isn’t there,
> it reports that to stderr. Redirecting stderr through a suitable pipeline of
> grep, sed & sort -u gives us a list of packages that have missing files. A
> series of “apt-get install --reinstall somepkgs” later, we can now start
> mysql & apache2.
> 
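
The "suitable pipeline" is not spelled out above, so here is one possible shape for it. It assumes debsums reports missing files on stderr in the form "debsums: missing file /path (from PKG package)"; that format is my recollection rather than something stated in the post, and packages_with_missing_files is a made-up name.

```shell
# Turn debsums "missing file" messages (read on stdin) into a unique package list.
# Assumed line shape: debsums: missing file /path (from PKG package)
packages_with_missing_files() {
    sed -n 's/^debsums: missing file .* (from \(.*\) package)$/\1/p' | sort -u
}

# Usage: send stderr into the pipe and discard stdout
#   debsums 2>&1 >/dev/null | packages_with_missing_files
```

The resulting list can then be fed to "apt-get install --reinstall" in batches.
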
> We didn’t know the root password of the mysql database, so cracked that via
> the usual stackoverflow advice, which then allowed one of my colleagues to
> install a dump of the DB that had been taken a few days earlier. The box
> is then snapshotted (it’s running under VMware). After making sure all the
> package installs are complete, we do an “apt-get update && apt-get
> dist-upgrade”, reboot again, verify that the app is working, and hand it
> back to a deeply grateful client.
> 

"reboot again, verify that the app is working, and hand it back to a deeply grateful client." You mean this is a story with a happy ending? I am very impressed; good work to the team.

Just to ask: since the computer was a virtual machine, did anyone take a clone as the first step, and then work on repairing the clone?

Not that a clone turned out to be needed, it seems, so congratulations.

As Bryan commented, "did the client learn from this experience", and have they now implemented backups and system documentation? (That was a rhetorical question. I do hope they have, but ... I ask myself, do I?)

George.

> 
> -- 
> 
>   "I and the public know
>   what all schoolchildren learn
>   Those to whom evil is done
>   Do evil in return"		W.H. Auden, "September 1, 1939"


