Sunday, November 13, 2011

Backup Server

Backup is a server meant to work in conjunction with Zeus so that the two perform nightly offsite backups of each other. Backup sits at one site and Zeus sits at another. Backup is the backup aggregation point for its site: it collects all the backup files in one place so they can be pushed to Zeus nightly.

Hardware

Backup consists of the following hardware:
  • Supermicro MBD-X7SPA-H-O motherboard (Atom Based) 
  • CHENBRO ES34069-BK-120 Black SECC Pedestal Server Chassis 120W 
  • 16GB Mushkin USB drive used as the system drive 
  • 2x 2GB G.SKILL F2-5300CL5D-4GBSA RAM
  • 4x 1TB WD SATA2 WD10EVDS green hard drives 

Installation

Ubuntu Server 10.04 64-bit is installed on the Mushkin USB drive. The installation was pretty standard: I didn't change any defaults and selected only the OpenSSH server role.

After installation I ran an update and a dist-upgrade, rebooted, then installed a few packages: htop vim-nox mdadm xfsprogs xfsdump openssh-server

Installing mdadm pulls in postfix by default, and I chose to accept the default postfix settings.

MDADM

mdadm is the Linux software RAID tool that lets you create, manage, and destroy RAID arrays. The man page is very detailed and helpful.

Partition

Each of the four 1TB hard drives needs a single new partition with its type set to Linux RAID. For each disk, start fdisk (shown here for the first disk):

sudo fdisk /dev/sda

  • Choose to create a new primary partition (command n) and accept the default values so the partition takes up the whole disk
  • Change the partition's system ID (command t) to Linux raid autodetect (code fd)
  • Write the table to the disk and exit (command w)

Repeat those steps for /dev/sdb, /dev/sdc and /dev/sdd.
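If typing the same answers four times gets tedious, the keystrokes can be piped into fdisk. This is only a sketch: fdisk_answers and partition_disks are names I made up for illustration, and it assumes blank disks and an fdisk that asks its questions in the same order as the interactive session above. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Hypothetical helper: prints the answers typed in the interactive session.
fdisk_answers() {
    # n = new partition, p = primary, 1 = partition number,
    # two empty lines accept the default start and end of the partition,
    # t = change type, fd = Linux raid autodetect, w = write table and exit
    printf 'n\np\n1\n\n\nt\nfd\nw\n'
}

# Hypothetical helper: dry run unless DRY_RUN=0 is set in the environment.
# A real run destroys whatever is on the disks.
partition_disks() {
    for disk in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: fdisk_answers | sudo fdisk $disk"
        else
            fdisk_answers | sudo fdisk "$disk"
        fi
    done
}

partition_disks /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

Check the result with sudo fdisk -l before creating the array, since a different fdisk version may prompt differently.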

Create the /dev/md0 array

I created the RAID array using this command:

sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

The array started syncing. One thing that gave me pause at first is that mdadm builds a new RAID5 array in a degraded state, with 3 of the 4 drives active and the 4th marked as a spare. This is normal: the initial sync rebuilds onto that spare, and once it finishes all four drives are active members of the array.

cat /proc/mdstat looks like this:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[3] sda1[0] sdb1[1] sdc1[2]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

Create an mdadm.conf

This is the file that tells mdadm how to assemble your array at boot. Without it you will need to run mdadm --assemble after each reboot.

My /etc/mdadm/mdadm.conf looks like this:

DEVICE partitions
HOMEHOST <system>
MAILADDR me@myrealaddress.com
ARRAY /dev/md0 level=raid5 num-devices=4 metadata=00.90 UUID=3577b84b:f6e2fa73:e508a82b:f6f9bec0


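This was created using the following commands in a root shell. Prefixing them with sudo will not work, because the > and >> redirections are performed by your own unprivileged shell rather than by the command sudo runs:

echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
echo "HOMEHOST <system>" >> /etc/mdadm/mdadm.conf
echo "MAILADDR me@myrealaddress.com" >> /etc/mdadm/mdadm.conf
mdadm --detail --scan >> /etc/mdadm/mdadm.conf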
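A root shell is not the only way around the redirection problem. One alternative, sketched below, is to build the text as a normal user and let only the final write run as root through sudo tee. mdadm_conf_text is a hypothetical helper name, and the HOMEHOST <system> line is my assumption of the Ubuntu default. The demonstration uses echo as a stand-in for the scan command so it changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: prints the config text on stdout.
# $1 = address for MAILADDR; remaining arguments = command that prints
# the ARRAY lines (on the real server: sudo mdadm --detail --scan).
mdadm_conf_text() {
    echo "DEVICE partitions"
    echo "HOMEHOST <system>"   # assumed default value
    echo "MAILADDR $1"
    shift
    "$@"
}

# Demonstration with a stand-in for the scan command:
mdadm_conf_text me@myrealaddress.com \
    echo "ARRAY /dev/md0 level=raid5 num-devices=4 UUID=3577b84b:f6e2fa73:e508a82b:f6f9bec0"
```

On the server the real call would be along the lines of mdadm_conf_text me@myrealaddress.com sudo mdadm --detail --scan | sudo tee /etc/mdadm/mdadm.conf, which overwrites the existing file, so keep a copy of the old one first.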

Just for completeness, here is the output of mdadm --examine /dev/sda1. The "metadata format 00.90 unknown" warning on the first line comes from the metadata=00.90 value that mdadm --detail --scan wrote into mdadm.conf; changing it to metadata=0.90 should silence the warning.

mdadm: metadata format 00.90 unknown, ignored.
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 3577b84b:f6e2fa73:e508a82b:f6f9bec0 (local to host ASBackup)
  Creation Time : Wed Sep 8 02:51:51 2010
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Wed Sep 8 11:47:49 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 84d0cdaa - correct
         Events : 38

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
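The two sizes in that output fit together the way RAID5 should: one drive's worth of space goes to parity, so the usable size is (number of drives - 1) times the per-drive size. A quick check of the arithmetic, using a helper name (raid5_blocks) that I made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: usable RAID5 size in 1 KiB blocks.
# $1 = number of devices, $2 = per-device size in 1 KiB blocks
raid5_blocks() {
    echo $(( ($1 - 1) * $2 ))
}

# Used Dev Size from the --examine output, four devices:
raid5_blocks 4 976759936    # prints 2930279808, the reported Array Size
```

So the four 1TB drives give about 3TB of usable space, matching the Array Size line.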

Format the array as XFS

Once the array was created I formatted it as XFS with:

sudo mkfs.xfs /dev/md0

Mount the array

I created a mount point for the array at /mnt/raid:

sudo mkdir /mnt/raid

I then found the UUID of my RAID array using blkid, which for me gave this output:

/dev/sda1: UUID="3577b84b-f6e2-fa73-e508-a82bf6f9bec0" TYPE="linux_raid_member"
/dev/sdc1: UUID="3577b84b-f6e2-fa73-e508-a82bf6f9bec0" TYPE="linux_raid_member"
/dev/sdd1: UUID="3577b84b-f6e2-fa73-e508-a82bf6f9bec0" TYPE="linux_raid_member"
/dev/sdb1: UUID="3577b84b-f6e2-fa73-e508-a82bf6f9bec0" TYPE="linux_raid_member"
/dev/sde1: UUID="e77eb088-01de-41ab-bc0e-79d155f75642" TYPE="ext2"
/dev/sde5: UUID="Nf7hnj-ejy2-2jcm-h38l-urRP-rwng-oGecSz" TYPE="LVM2_member"
/dev/mapper/ASBackup-root: UUID="f862f5de-832c-4fb5-ab88-ae5eb848456a" TYPE="ext4"
/dev/mapper/ASBackup-swap_1: UUID="69b9bdaa-7b39-4175-a083-91287d1d0fbc" TYPE="swap"
/dev/md0: UUID="e366c151-d9ee-467e-967a-ed228d79c51d" TYPE="xfs"

The last line is the important one. I wanted the RAID array mounted automatically on boot, so I needed the filesystem UUID of /dev/md0 for an entry in my fstab.

Add the array to the fstab


An entry in /etc/fstab makes sure the array is mounted each time the system boots. So this went into my fstab:

#/dev/md0 raid device

UUID=e366c151-d9ee-467e-967a-ed228d79c51d /mnt/raid xfs rw,user,auto 0 0

I then typed:

mount /mnt/raid

and the raid array was ready for business.
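The fstab entry can also be generated instead of copied by hand from the blkid output, which avoids typos in the UUID. A minimal sketch, where fstab_line is a hypothetical helper and the mount options are simply the ones I used above:

```shell
#!/bin/sh
# Hypothetical helper: formats one /etc/fstab entry.
# $1 = filesystem UUID, $2 = mount point, $3 = filesystem type
fstab_line() {
    printf 'UUID=%s %s %s rw,user,auto 0 0\n' "$1" "$2" "$3"
}

# On the real server the UUID would come from:
#   sudo blkid -s UUID -o value /dev/md0
fstab_line e366c151-d9ee-467e-967a-ed228d79c51d /mnt/raid xfs
```

The printed line can be reviewed and then appended to /etc/fstab, for example by piping it through sudo tee -a /etc/fstab.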

Testing drive failure

Since this server will be running in a remote location, it is even more important that I can detect problems as they arise, so I can either log in remotely to troubleshoot or order new parts ASAP to keep the server running.

I am especially concerned about the RAID array, and I want to be notified via email if there are problems with it. To simulate a drive failure I issued the following command:

sudo mdadm --manage /dev/md0 --fail /dev/sda1

mdadm: metadata format 00.90 unknown, ignored.
mdadm: set /dev/sda1 faulty in /dev/md0

At this point I received an email almost instantly with the following information (it actually landed in my spam folder, so look for it there):

This is an automatically generated mail message from mdadm
running on Backup

A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sda1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[3] sda1[4](F) sdb1[1] sdc1[2]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]

unused devices: <none>

Cool!
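The email is the main alert, but since mine went to spam it seems worth having a second check. In /proc/mdstat a failed or missing member shows up as an underscore in the status map ([_UUU] instead of [UUUU]), and that is easy to test for. A sketch, with degraded_arrays as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints
# the name of every array whose status map contains an underscore.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember which array we are in
        /\[[U_]+\]/ && name != "" {      # the line ending in the status map
            if ($NF ~ /_/) print name    # underscore means a member is down
            name = ""
        }
    '
}

# Demonstration using the mdstat text from the email above:
sample='md0 : active raid5 sdd1[3] sda1[4](F) sdb1[1] sdc1[2]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]'
printf '%s\n' "$sample" | degraded_arrays
```

On the server it would be run as degraded_arrays < /proc/mdstat, for example from a cron job that reports through some channel other than email.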

To put the drive back in the array, first remove the failed member and then add it again:

sudo mdadm --manage /dev/md0 --remove /dev/sda1

mdadm: metadata format 00.90 unknown, ignored.
mdadm: hot removed /dev/sda1

sudo mdadm --manage /dev/md0 --add /dev/sda1

mdadm: metadata format 00.90 unknown, ignored.
mdadm: re-added /dev/sda1

And the array starts to rebuild:

cat /proc/mdstat

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sda1[4] sdd1[3] sdb1[1] sdc1[2]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
      [>....................] recovery = 3.2% (31718400/976759936) finish=299.9min speed=52512K/sec

unused devices: <none>

WIN!
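With a rebuild estimated at around five hours, it is handy to pull just the progress figure out of /proc/mdstat rather than reading the whole thing each time. A small sketch, with sync_progress as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints
# the percentage from any "recovery = N%" or "resync = N%" line.
sync_progress() {
    awk '
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { sub(/%$/, "", $i); print $i }
        }
    '
}

# Demonstration using the progress line from the rebuild above:
echo '[>....................] recovery = 3.2% (31718400/976759936) finish=299.9min speed=52512K/sec' | sync_progress
```

On the server it would be run as sync_progress < /proc/mdstat. It prints nothing once the rebuild has finished, because the progress line disappears from the file.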