Monday, January 16, 2012

Fault Tolerance In Computer Networks

Summary:

Fault tolerance and disaster recovery are two important capabilities to implement in computer networks. Fault tolerance gives a computer or network redundancy, or the ability to recover from faults, so that it continues providing services while a fault is present. It may be achieved by providing redundant routing, by pooling servers so that they share the work, or by using disk-level backup schemes.

Body:

This article provides an overview of the fault-tolerant schemes available to systems administrators for backing up data and recovering from system failures.

The following methods are used to build fault-tolerant hard-disk systems:

Mirroring
Duplexing
Disk Striping
Redundant Array of Independent Disks (RAID)

Disk Mirroring: Mirroring a drive means designating one hard-disk drive in the computer as a duplicate of another specified drive. The two drives are attached to a single disk controller. Most network operating systems (NOS) provide this fault tolerance feature. Whenever data is written to the drive, the same data is also written to the drive designated as the mirror. If the drive fails, the mirror drive is already online, and because it holds duplicate information, users won't realize that a disk drive in the server has failed. The NOS notifies the administrator that the failure has occurred. On the other hand, if the disk controller fails, neither drive is available.
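
The write-to-both, read-from-either behaviour described above can be sketched in a few lines of Python. This is only a toy model: the Disk and MirroredVolume classes are hypothetical names invented for illustration, the "disks" are in-memory dictionaries, and a real NOS does this work at the driver level.

```python
class Disk:
    """A toy in-memory disk: a dict from block number to bytes (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.failed = False

    def write(self, block_no, data):
        if self.failed:
            raise IOError(f"{self.name} has failed")
        self.blocks[block_no] = data

    def read(self, block_no):
        if self.failed:
            raise IOError(f"{self.name} has failed")
        return self.blocks[block_no]


class MirroredVolume:
    """Writes every block to both disks; reads from whichever still works."""

    def __init__(self, primary, mirror):
        self.disks = [primary, mirror]

    def write(self, block_no, data):
        written = 0
        for disk in self.disks:
            try:
                disk.write(block_no, data)
                written += 1
            except IOError:
                pass  # a real NOS would notify the administrator here
        if written == 0:
            raise IOError("both disks have failed")

    def read(self, block_no):
        for disk in self.disks:
            try:
                return disk.read(block_no)
            except IOError:
                continue  # fall through to the other disk
        raise IOError("both disks have failed")


volume = MirroredVolume(Disk("primary"), Disk("mirror"))
volume.write(0, b"payroll")
volume.disks[0].failed = True   # simulate a failure of the primary drive
recovered = volume.read(0)      # the read is served from the mirror
```

After the simulated failure the read still succeeds, which is the point of the mirror: the caller never sees the failed drive.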

Disk Duplexing: Duplexing also saves data to a mirror drive; the only major difference between duplexing and mirroring is that duplexing uses two separate controllers, one for each drive. Duplexing therefore provides a redundant controller as well as a redundant disk, and it maintains fault tolerance even if a controller fails.
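
The difference between mirroring and duplexing comes down to which controller each disk hangs off. The small sketch below makes that explicit; the function name volume_available and the disk and controller labels are made up for the example.

```python
def volume_available(disk_ok, controller_ok, controller_of):
    """Return True if at least one disk is healthy and reachable.

    disk_ok:       dict of disk name -> True if the disk is healthy
    controller_ok: dict of controller name -> True if it is healthy
    controller_of: dict of disk name -> name of the controller it uses
    """
    return any(
        disk_ok[disk] and controller_ok[controller_of[disk]]
        for disk in disk_ok
    )


disks = {"disk_a": True, "disk_b": True}

# Mirroring: both disks share one controller.
mirroring = {"disk_a": "ctrl_0", "disk_b": "ctrl_0"}
# Duplexing: each disk has its own controller.
duplexing = {"disk_a": "ctrl_0", "disk_b": "ctrl_1"}

# Simulate the failure of controller ctrl_0.
controllers = {"ctrl_0": False, "ctrl_1": True}

mirroring_up = volume_available(disks, controllers, mirroring)
duplexing_up = volume_available(disks, controllers, duplexing)
```

With ctrl_0 down, the mirrored pair is unreachable even though both disks are healthy, while the duplexed pair stays available through ctrl_1.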

Disk Striping: Disk striping breaks the data to be saved into small portions and writes those portions in sequence across all of the disks, in small areas called stripes. Because consecutive portions land on different disks, they can be written and read in parallel. Striping maximizes performance because all of the read/write heads are kept working. By itself, striping adds no redundancy.
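
As a rough illustration of striping, the sketch below deals fixed-size chunks of data out to the disks in round-robin order and then reads them back. The function names and the tiny 2-byte chunk size are assumptions chosen to keep the example readable; real arrays use much larger chunks.

```python
def stripe(data, num_disks, stripe_size):
    """Split data into stripe_size chunks and deal them out round-robin."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), stripe_size):
        chunk = data[offset:offset + stripe_size]
        disks[(offset // stripe_size) % num_disks].append(chunk)
    return disks


def unstripe(disks):
    """Reassemble the original data by reading the chunks back in order."""
    out = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


data = b"ABCDEFGHIJ"
layout = stripe(data, num_disks=3, stripe_size=2)
# layout == [[b"AB", b"GH"], [b"CD", b"IJ"], [b"EF"]]
restored = unstripe(layout)
```

Consecutive chunks sit on different disks, so a large read or write keeps all of the drives busy at once. Note that losing any one of the three lists makes the data unrecoverable.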

RAID: (Redundant Array of Inexpensive Disks, now usually expanded as Redundant Array of Independent Disks)

RAID uses an array of less-expensive hard disks and provides several methods for writing to those disks to ensure redundancy. RAID has several levels; some of the frequently used RAID configurations are discussed below:

RAID 0: RAID 0 is a commonly used level that stripes data across the disks. This method is the fastest because all read/write heads are constantly being used without the burden of parity or duplicate data being written. This RAID level improves performance, but it does not provide fault tolerance: if any one disk fails, the data on the array is lost.

RAID 1: This level is also commonly used. It uses two hard disks, one mirrored to the other. RAID 1 is the most basic level of disk fault tolerance. If the first hard disk fails, the second hard disk automatically takes over. No parity or error-checking information is stored; instead, the drives hold duplicate information. If one drive fails, a new drive must be installed and configured to restore the mirror; if both drives fail, the data must be restored from backup.

RAID 2: At this level individual bits are striped across multiple disks. Multiple redundancy drives in this configuration are dedicated to storing error-correcting code.
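
RAID 2 is conventionally described as using a Hamming code for its error-correcting drives. As a sketch under that assumption, the example below uses the Hamming(7,4) code: imagine four data drives and three ECC drives, with each of the seven bits of a codeword stored on a different drive. The drive counts and function names are illustrative only and are not taken from any particular product.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Fix at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 0 means no error was detected
    if position:
        c[position - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                       # one drive returns a wrong bit
repaired = hamming74_correct(damaged)
```

The three check bits are enough to locate and correct a single wrong bit among the seven, which is the job the dedicated error-correcting drives perform in this level.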

RAID 3: At this level data is striped across multiple hard drives, with one drive dedicated to parity. The data is striped in bytes, not in bits as in RAID 2. The appeal of this configuration is that more data is written and read in one operation, which increases overall disk performance.
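
To show how a dedicated parity drive lets the array survive a failed data drive, here is a minimal sketch that stripes bytes across the data drives and stores their XOR on a separate parity drive. XOR is the usual parity operation but the article does not name it, so treat it as an assumption; the function names and the zero-byte padding are likewise invented for the example.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def raid3_write(data, num_data_disks):
    """Stripe data byte by byte and compute the dedicated parity drive."""
    # Pad with zero bytes so every data disk holds the same number of bytes.
    padding = (-len(data)) % num_data_disks
    padded = data + b"\x00" * padding
    data_disks = [padded[i::num_data_disks] for i in range(num_data_disks)]
    parity = reduce(xor_bytes, data_disks)
    return data_disks, parity


def raid3_rebuild(data_disks, parity, lost_index):
    """Recompute one lost data disk from the survivors and the parity drive."""
    survivors = [d for i, d in enumerate(data_disks) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)


def raid3_read(data_disks, length):
    """Interleave the bytes again and drop the padding."""
    interleaved = bytes(b for column in zip(*data_disks) for b in column)
    return interleaved[:length]


original = b"NETWORKS"
data_disks, parity = raid3_write(original, num_data_disks=3)
rebuilt = raid3_rebuild(data_disks, parity, lost_index=1)
```

Because the parity is the XOR of all the data drives, XOR-ing the parity with the surviving drives yields exactly the contents of the missing one.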

RAID 4: This level is similar to RAID 2 and 3 except that the data is striped in blocks, which facilitates fast reads from one drive. This is not a popular implementation.

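RAID 5: This level is commonly used. The data and the parity are both striped across three or more drives, so no single drive is dedicated to parity. This allows fast reads and writes, although each write also has to update parity. The array keeps working if one disk fails, because the missing data can be rebuilt from the remaining data and parity.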
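
The sketch below illustrates the idea of spreading parity across all the drives and then rebuilding a failed drive. It assumes XOR parity and one particular rotation order for the parity block; actual controllers and software RAID offer several layouts, so the placement here is only one possibility, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a sequence of equal-length byte blocks together."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def raid5_layout(blocks, num_disks):
    """Place data blocks plus one parity block per row across num_disks.

    All blocks must have the same length, and len(blocks) must be a
    multiple of num_disks - 1.
    """
    per_row = num_disks - 1
    disks = [[] for _ in range(num_disks)]
    for row, start in enumerate(range(0, len(blocks), per_row)):
        row_blocks = list(blocks[start:start + per_row])
        parity = xor_blocks(row_blocks)
        # Assumed rotation: parity starts on the last disk and moves left.
        parity_disk = (num_disks - 1 - row) % num_disks
        row_blocks.insert(parity_disk, parity)
        for disk, block in zip(disks, row_blocks):
            disk.append(block)
    return disks


def rebuild_disk(disks, failed):
    """Recreate every block of the failed disk from the surviving disks."""
    survivors = [d for i, d in enumerate(disks) if i != failed]
    return [xor_blocks(row) for row in zip(*survivors)]


blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE", b"FFFF"]
disks = raid5_layout(blocks, num_disks=3)
rebuilt = rebuild_disk(disks, failed=1)
```

In each row the XOR of all blocks, parity included, is zero, so any single missing block equals the XOR of the others. That holds whichever disk is lost, which is why the array tolerates exactly one failure.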

RAID 10: RAID 10 provides high availability by combining features of RAID 0 and RAID 1. RAID 0 increases performance by striping volume data across multiple disk drives. RAID 1 provides disk mirroring, which duplicates data between two disk drives. By combining the two, RAID 10 stripes data across mirrored pairs of drives and so delivers both the performance of striping and the fault tolerance of mirroring.
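
A quick way to reason about RAID 10 is to treat the array as a stripe over mirrored pairs. The sketch below assumes disks 2k and 2k+1 form pair k, which is just a numbering convention picked for the example, and the function names are hypothetical.

```python
def raid10_survives(num_pairs, failed_disks):
    """Return True if no mirrored pair has lost both of its disks.

    Disks 2k and 2k+1 are assumed to form mirrored pair k; data is
    striped across the pairs.
    """
    failed = set(failed_disks)
    return all(
        not ({2 * k, 2 * k + 1} <= failed)
        for k in range(num_pairs)
    )


def raid10_usable_capacity(num_pairs, disk_size_gb):
    """Each pair stores one disk's worth of data; the other is the copy."""
    return num_pairs * disk_size_gb


# Four disks arranged as two mirrored pairs: (0, 1) and (2, 3).
one_from_each_pair = raid10_survives(2, [0, 2])
both_from_one_pair = raid10_survives(2, [0, 1])
capacity_gb = raid10_usable_capacity(2, 500)
```

The array can lose one disk from every pair and keep running, but losing both disks of the same pair takes the whole stripe set down. The price of the mirroring is that only half of the raw capacity is usable.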
