[clug] How to make my server robust for booting
Chris Smart
clug at csmart.io
Wed Sep 11 01:01:25 UTC 2019
On Wed, 11 Sep 2019, at 08:49, Tony Lewis via linux wrote:
> All,
>
> I'm rebuilding my home server, and want to make it 'robust' for boot
> purposes. For example, if a disk fails, the system can continue to
> function until I replace it.
>
> I am testing my ideas in VirtualBox at the moment.
>
> The stuff I think I've sorted are:
>
> * encrypted RAID1 for /
> * RAID1 for /boot
>
> What I'm stuck on is how to handle a failure of the drive where GRUB is
> installed. I thought it might be as simple as doing grub-install
> /dev/sd[bcd] (as well as on /dev/sda) and BIOS would just find *a* copy
> of GRUB and be able to continue the boot process.
>
Yep, pretty sure I've done that successfully before. I'm not sure whether I used grub-install or just dd'd the first 446 bytes from one drive to the other; it's been a while.
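For reference, the dd trick amounts to copying only the boot-code area of the MBR (bytes 0-445) and leaving the target disk's partition table (bytes 446-509) and boot signature (bytes 510-511) alone. Here's a minimal Python sketch of the same thing, assuming MBR/msdos partitioning. The function names are made up, and I'd only point it at raw VirtualBox disk images, not live devices. One caveat: with GRUB 2 those 446 bytes are only a small first stage that loads the rest of GRUB from elsewhere on the disk, so running grub-install against each drive is likely the more reliable option.

```python
# Sketch only: assumes an MBR (msdos) partitioned disk or raw disk image.
# Layout of sector 0: bytes 0-445 boot code, 446-509 partition table,
# 510-511 the 0x55AA boot signature.
BOOT_CODE_SIZE = 446


def splice_boot_code(src: bytes, dst: bytes) -> bytes:
    """Return dst with its first 446 bytes replaced by those of src."""
    if len(src) < BOOT_CODE_SIZE or len(dst) < BOOT_CODE_SIZE:
        raise ValueError("both inputs must be at least 446 bytes long")
    return src[:BOOT_CODE_SIZE] + dst[BOOT_CODE_SIZE:]


def copy_boot_code(src_path: str, dst_path: str) -> None:
    """Overwrite only the boot-code area of dst_path (hypothetical helper)."""
    with open(src_path, "rb") as src:
        boot_code = src.read(BOOT_CODE_SIZE)
    if len(boot_code) != BOOT_CODE_SIZE:
        raise ValueError(f"{src_path} is shorter than {BOOT_CODE_SIZE} bytes")
    # "r+b" writes in place from offset 0 without truncating the rest,
    # much like dd with bs=446 count=1 conv=notrunc.
    with open(dst_path, "r+b") as dst:
        dst.write(boot_code)
```

splice_boot_code is there so the logic can be checked on in-memory bytes before anything touches a disk.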
Can you use hexdump to check whether GRUB is at least embedded in the second drive's MBR?
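Something like `hexdump -C -n 512 /dev/sdb` shows the first sector. On a disk carrying GRUB's boot code you'd typically see the string "GRUB" in the ASCII column and 55 aa as the last two bytes. Here's the same check as a small Python sketch. The helper names are hypothetical, and it assumes an MBR-style first sector.

```python
# Sketch only: inspects sector 0 of a disk or raw disk image.
SECTOR_SIZE = 512


def inspect_mbr(sector: bytes) -> dict:
    """Report whether a first sector has a boot signature and a GRUB marker."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    return {
        # BIOS typically expects 0x55 0xAA at offsets 510-511.
        "boot_signature": sector[510:512] == b"\x55\xaa",
        # GRUB's boot code usually contains the ASCII text "GRUB" in its messages.
        "grub_string": b"GRUB" in sector[:446],
    }


def inspect_disk(path: str) -> dict:
    """Read the first sector of path and inspect it (hypothetical helper)."""
    with open(path, "rb") as disk:
        return inspect_mbr(disk.read(SECTOR_SIZE))
```

Running inspect_disk against each member disk (as root) should show whether every drive actually received the boot code.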
What does your GRUB device.map look like? Does it only list hd0?
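A device.map normally has one line per disk, such as `(hd0) /dev/sda`. It usually lives at /boot/grub/device.map or /boot/grub2/device.map, depending on the distro. Below is a hypothetical parser that assumes that format and answers the "only hd0?" question.

```python
import re

# Each non-comment line maps a GRUB drive name to an OS device,
# e.g. "(hd0)\t/dev/sda". This format is an assumption of the sketch.
ENTRY = re.compile(r"^\((?P<drive>[^)]+)\)\s+(?P<device>\S+)")


def parse_device_map(text: str) -> dict:
    """Return {GRUB drive name: OS device} for every entry in device.map text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY.match(line)
        if match:
            mapping[match.group("drive")] = match.group("device")
    return mapping


def only_first_disk(mapping: dict) -> bool:
    """True if hd0 is the only hard disk listed."""
    hard_disks = [drive for drive in mapping if drive.startswith("hd")]
    return hard_disks == ["hd0"]
```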
> It's not working in VirtualBox at least. If I let it boot unaided, it
> cannot find a bootable medium (expected behaviour). If I interrupt that
> with F12 and choose the second hard drive to boot from, it locks up. It
> might be a VirtualBox thing, and so the physical server would be OK. Or
> more likely I don't understand what I'm doing.
>
What if you disconnect the first drive, leaving only the second (which then becomes the first)? Does that work? It could be a VirtualBox bug; can you try it on a KVM host somewhere?
> What's the best way to architect things so that a failed hard drive
> where GRUB is installed, is easily handled?
>
For what it's worth, I just tested it in KVM, and CentOS 7 does the right thing out of the box without anything fancy: I literally just set RAID1 for /boot and encrypted RAID1 for / in the installer, and it Just Works(tm).
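Since the whole point is to keep running on one disk until the dead one is replaced, it helps to notice when an array has gone degraded. Below is a minimal sketch that assumes the usual /proc/mdstat layout. In that layout, a healthy two-disk mirror shows `[2/2] [UU]` and a degraded one shows something like `[2/1] [_U]`. `mdadm --detail /dev/md0` reports the same thing more verbosely. The function name is hypothetical.

```python
import re

# Array header lines look like "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_LINE = re.compile(r"^(?P<name>md\S*)\s*:")
# Status lines contain "[total/active] [UU]", with "_" marking a missing member.
STATUS = re.compile(r"\[(?P<total>\d+)/(?P<active>\d+)\]\s+\[(?P<map>[U_]+)\]")


def degraded_arrays(mdstat: str) -> list:
    """Return names of md arrays that are missing at least one member."""
    degraded = []
    current = None
    for line in mdstat.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group("name")
            continue
        status = STATUS.search(line)
        if current and status:
            if "_" in status.group("map") or status.group("active") != status.group("total"):
                degraded.append(current)
            current = None  # only the first status line per array matters
    return degraded
```

On a real box you would feed it the contents of /proc/mdstat, e.g. from a cron job that mails you when the list isn't empty.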
There's a timeout when looking for the missing device (90 seconds or something); if it takes too long, you could try a kernel arg like 'rd.retry=30'.
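rd.retry is a dracut option (see dracut.cmdline(7)) that controls how long the initramfs keeps retrying before it gives up on missing devices. On CentOS 7, persistent kernel args usually go in GRUB_CMDLINE_LINUX in /etc/default/grub. After editing that file you regenerate the config with grub2-mkconfig (e.g. `grub2-mkconfig -o /boot/grub2/grub.cfg` on a BIOS install). Alternatively, `grubby --update-kernel=ALL --args="rd.retry=30"` does it in one step. Below is a sketch of the edit itself, assuming the standard double-quoted GRUB_CMDLINE_LINUX line. The helper name is hypothetical.

```python
import re

# Matches e.g. GRUB_CMDLINE_LINUX="rhgb quiet" (but not GRUB_CMDLINE_LINUX_DEFAULT).
CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX=")(?P<args>[^"]*)(")', re.MULTILINE)


def set_kernel_arg(grub_defaults: str, key: str, value: str) -> str:
    """Return /etc/default/grub text with key=value set in GRUB_CMDLINE_LINUX."""
    def rewrite(match):
        # Drop any existing key=... so the new value replaces it.
        args = [a for a in match.group("args").split() if not a.startswith(key + "=")]
        args.append(f"{key}={value}")
        return match.group(1) + " ".join(args) + match.group(3)

    new_text, count = CMDLINE.subn(rewrite, grub_defaults, count=1)
    if count == 0:
        raise ValueError("no GRUB_CMDLINE_LINUX line found")
    return new_text
```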
Another solution is probably a hardware RAID card that takes care of it all for you.... ;-)
-c