Understanding Fault Tolerance

An Introduction to Fault Tolerance

The following issues initiate downtime:

Because systems continually change, planning for disaster situations is a continuous process that should occur within an organization. A disaster recovery plan is normally formulated to outline the procedures that should be carried out when disasters occur.

A sound disaster recovery plan is one that incorporates the following:

A few strategies which you can employ to prepare for disaster are listed below:

A few strategies which you can employ to ensure fault tolerance in your system are summarized below:

Understanding the Mean Time to Failure and Mean Time to Recover metrics

The metrics utilized to measure fault tolerance are:

The calculation typically used to measure downtime is:
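A minimal sketch of such a calculation, assuming the widely used relation availability = MTTF / (MTTF + MTTR). The function names and the yearly period of 8760 hours are illustrative choices, not values taken from this article.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return the expected fraction of time the system is up.

    Assumes the common relation MTTF / (MTTF + MTTR).
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR must not be negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_downtime_hours(mttf_hours: float, mttr_hours: float,
                            period_hours: float = 8760.0) -> float:
    """Return the expected downtime over a period (default: one year)."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * period_hours


if __name__ == "__main__":
    # Hypothetical device: fails on average every 999 hours, 1 hour to repair.
    print(availability(999, 1))
    print(expected_downtime_hours(999, 1))
```

Under this relation, shortening the time to recover improves availability just as lengthening the time to failure does.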

There are three phases in a device's life cycle, with each phase characterized by a particular behaviour:

A few factors to consider when working with the MTTF and MTTR metrics are listed below:

Safeguarding the Power Supply

The power supply is considered the biggest failure point for a network, simply because computers cannot run without power. A network should be protected from the following power supply issues:

Providing Fault Tolerance through RAID Arrays

The hardware failure that occurs most frequently is a hard disk failure. To protect data from drive failures and to add fault tolerance to your file systems, use RAID technology. Windows Server 2003 provides built-in software RAID, and also supports hardware based RAID solutions.

You can implement fault tolerance as hardware based RAID or software based RAID. Windows Server 2003 provides a software implementation of RAID to maintain data access when a single disk fails. Data redundancy occurs when a computer writes data to more than one disk, which safeguards the data against a single hard disk failure.

Software RAID is implemented solely in software and needs no special hardware, whereas hardware RAID uses special disk controllers and drives. Software RAID is simple to set up and configure, but it has shortfalls. Hardware RAID is generally more fault tolerant, is simpler to recover from failure, and can rebuild itself more rapidly than the software RAID provided by Windows Server 2003. With hardware RAID, the server does not need to be brought down when a drive fails, because you can hot swap the failed drive. With software RAID, if one of the drives in a stripe fails, the server has to be brought down before you can replace the failed drive.

Windows Server 2003 supports three levels of RAID, namely, RAID 0, RAID 1 and RAID 5.
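To compare the three levels, the sketch below estimates usable capacity and the number of disk failures each level survives. It assumes equally sized disks and the usual minimums (two disks for RAID 0 and RAID 1, three for RAID 5); the function name and return format are hypothetical.

```python
def raid_summary(level: int, disk_count: int, disk_size_gb: float) -> dict:
    """Return usable capacity and tolerated disk failures for RAID 0, 1 or 5.

    Assumes all disks are the same size.
    """
    if level == 0:
        if disk_count < 2:
            raise ValueError("RAID 0 needs at least two disks")
        usable, tolerated = disk_count * disk_size_gb, 0   # striping only, no redundancy
    elif level == 1:
        if disk_count != 2:
            raise ValueError("a mirrored volume uses exactly two disks")
        usable, tolerated = disk_size_gb, 1                # one full copy is redundant
    elif level == 5:
        if disk_count < 3:
            raise ValueError("RAID 5 needs at least three disks")
        usable, tolerated = (disk_count - 1) * disk_size_gb, 1  # one disk's worth of parity
    else:
        raise ValueError("only RAID 0, 1 and 5 are covered here")
    return {"usable_gb": usable, "tolerated_disk_failures": tolerated}


if __name__ == "__main__":
    for level, disks in ((0, 2), (1, 2), (5, 4)):
        print(level, raid_summary(level, disks, 100))
```

Note that RAID 0 improves performance only: it tolerates no disk failures at all.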

The RAID levels available to enable fault tolerance are listed below:

The factors that should be considered when you determine which RAID solution suits the fault tolerance requirements of your organization are:

Understanding Clustering Technologies

Microsoft offers the following two clustering technologies that are supported in Windows 2000 and Windows Server 2003.

Microsoft Cluster Server (MSCS), initially launched in Windows NT Server Enterprise Edition, enabled organizations to increase server availability for mission critical resources by grouping multiple physical servers into a cluster. A cluster can be defined as a group of two or more physical servers that are presented as, and operate as, one network server. Servers in the cluster are referred to as nodes, while services and applications are referred to as resources. The nodes provide redundancy to the enterprise network by taking over the operations of a failed server within the cluster. This procedure is known as failover. Failback occurs when a failed server automatically resumes its former operations once it is online again. The cluster can also be configured to provide load balancing features. With the introduction of Windows 2000, this technology became known as Microsoft Cluster Service. Microsoft Cluster Service is best suited for network services that require a high degree of availability. Windows Server 2003 supports server clusters of up to eight nodes.
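The failover and failback behaviour can be illustrated with a small simulation. This is only a sketch of the idea, not how Cluster Service is implemented; the `Cluster` class, its methods, and the node and resource names are all hypothetical.

```python
class Cluster:
    """Toy model of a server cluster with failover and failback."""

    def __init__(self, nodes, preferred_owner):
        self.nodes = list(nodes)                # all nodes, in priority order
        self.online = set(nodes)                # nodes currently running
        self.preferred = dict(preferred_owner)  # resource -> preferred node
        self.owner = dict(preferred_owner)      # resource -> current owner

    def fail(self, node):
        """Take a node offline and move its resources to a survivor (failover)."""
        self.online.discard(node)
        survivors = [n for n in self.nodes if n in self.online]
        for resource, owner in list(self.owner.items()):
            if owner == node:
                self.owner[resource] = survivors[0] if survivors else None

    def recover(self, node):
        """Bring a node back and return its preferred resources (failback)."""
        self.online.add(node)
        for resource, preferred in self.preferred.items():
            if preferred == node:
                self.owner[resource] = node


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2"], {"file-share": "node1"})
    cluster.fail("node1")
    print(cluster.owner)   # the resource is now owned by node2
    cluster.recover("node1")
    print(cluster.owner)   # the resource has failed back to node1
```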

Resource DLLs manage resources in the cluster, and provide the mechanism for Cluster Service to maintain communications with its supported applications. A quorum resource has to exist in order for a node in the cluster to carry out its functions. This common resource holds the synchronized version of the cluster database, which stores management data for the cluster. The quorum resource is situated on a physical disk in the cluster's shared storage. Clustering software, such as the resource DLLs, makes it possible for the cluster to operate. Administrative software, such as Cluster Administrator, is the software utilized to manage the cluster.

A few advantages associated with deploying cluster servers are:

The installation requirements of Cluster Service are listed below:

When determining the applications for the cluster and failover, consider the following:

Cluster implementations offer a choice between five configuration models. The configuration model chosen affects cluster performance, and the degree of availability ensured during a failure. The different configuration models are:

Windows 2000 Network Load Balancing (NLB) is a clustering technology that provides high availability and scalability. NLB is typically used to distribute Web requests across a cluster of Internet server applications. Requests that would have gone to a failed NLB cluster server are rerouted to the remaining servers. With NLB, client requests are load balanced according to the configured load balancing parameters. Servers in the NLB cluster can therefore be configured to share the processing load of client requests. The Wlbs.sys driver of NLB is configured on each server in the cluster, and operates between the network adapter and the TCP/IP protocol. The driver manages and allocates client requests to a server in the cluster. Because NLB is a distributed application, it has no single point of failure. Throughput is maximized because the broadcast subnet is used to deliver client requests to all the cluster servers, and the requests are then filtered on each cluster server.

To ensure high performance, NLB uses a distributed filtering algorithm to match incoming client requests to the NLB servers in the cluster when making load balancing decisions. When an incoming packet is received, all the NLB servers check to determine which NLB server should handle the client request. The NLB servers use a statistical mapping that determines a host priority for the incoming packet, to identify the NLB server that should handle the request. Once that server is identified, the remaining servers in the NLB cluster discard the packet. Each server in the NLB cluster transmits heartbeat messages to identify the state of the cluster. A heartbeat message holds information on the state of the cluster, the cluster configuration, and the associated port rules.
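The idea of distributed filtering can be sketched as follows: every host sees every packet, runs the same deterministic mapping, and only the host that the mapping selects keeps the packet. The SHA-256 hash used here is a stand-in chosen for illustration; it is not the statistical mapping that NLB actually uses, and the function names are hypothetical.

```python
import hashlib


def owning_host(client_ip: str, hosts: list[str]) -> str:
    """Map a client address to exactly one host, identically on every host."""
    ordered = sorted(hosts)  # every host must agree on the ordering
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]


def accepts(host: str, client_ip: str, hosts: list[str]) -> bool:
    """Return True if this host should handle the packet, False to discard it."""
    return owning_host(client_ip, hosts) == host


if __name__ == "__main__":
    cluster = ["web1", "web2", "web3"]
    for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
        handlers = [h for h in cluster if accepts(h, ip, cluster)]
        print(ip, "->", handlers)  # exactly one handler per client address
```

Because each host makes the decision locally, no central dispatcher is needed, which is why the design has no single point of failure.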

A few NLB planning considerations and requirements are listed below:

Understanding the Role of Distributed File System (Dfs) in Fault Tolerance

Distributed file system (Dfs) is a single hierarchical file system that assists in organizing shared folders on multiple computers in the network. Dfs provides a single logical file system structure, and can also provide a fault-tolerant storage system. Its load balancing and fault tolerance features provide high availability of the file system and improved performance. Administrators can also install Dfs as a cluster service to provide improved reliability. With domain based Dfs roots, Active Directory is used for the Dfs topology replication, thereby ensuring fault tolerance and the synchronization of the Dfs root and shared folders. Configuring replication for the Dfs root and the individual shared folders provides improved performance to clients. With added load balancing, clients can randomly select a physical server to connect to using the list of referrals provided by the Dfs server.
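The random selection from a referral list can be sketched like this. It is an illustration of the idea only, assuming a client that tries the referred targets in random order and moves on when one is unreachable; `pick_target`, the `is_reachable` callback, and the server names are hypothetical.

```python
import random


def pick_target(referrals, is_reachable, rng=None):
    """Try the referred targets in random order; return the first reachable one.

    Returns None when no target in the referral list can be reached.
    """
    rng = rng or random.Random()
    candidates = list(referrals)
    rng.shuffle(candidates)       # random order spreads load across targets
    for target in candidates:
        if is_reachable(target):  # skipping failed servers gives fault tolerance
            return target
    return None


if __name__ == "__main__":
    referrals = [r"\\server1\docs", r"\\server2\docs", r"\\server3\docs"]
    failed = {r"\\server2\docs"}
    print(pick_target(referrals, lambda t: t not in failed))
```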

Dfs roots can be either stand-alone roots or domain based roots.

The following servers can host a Dfs root, or be a Dfs server:

The process for deploying domain based Dfs is briefly outlined below:

How to create a striped volume (RAID 0)

  1. Open the Disk Management console.
  2. Right-click the unallocated space on the disk where you want to create the volume, and select New Volume to launch the New Volume Wizard. Click Next.
  3. Select Striped on the Select Volume Type window. Click Next.
  4. On the Select Disks window, select the disk(s) to include in the striped volume, and the amount of space to be used. Click Next.
  5. On the Assign Drive Letter or Path window, assign a drive letter or mount the volume to an empty NTFS folder. Click Next.
  6. On the Format Volume window, select a format (NTFS) for the volume, or select the Do not format this volume option. Click Next.
  7. The Completing the New Volume Wizard window displays the options you have selected.
  8. Click Finish to create the striped volume.
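To see why a striped volume offers speed but no fault tolerance, the sketch below shows how consecutive blocks are spread round-robin across the member disks. It simplifies by treating one logical block as one stripe unit; the function name is hypothetical.

```python
def locate_block(logical_block: int, disk_count: int) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a striped volume."""
    if disk_count < 2:
        raise ValueError("a striped volume needs at least two disks")
    if logical_block < 0:
        raise ValueError("block numbers start at 0")
    # Consecutive blocks rotate across the disks, so large reads and writes
    # are shared between them, but every disk holds part of every large file.
    return logical_block % disk_count, logical_block // disk_count


if __name__ == "__main__":
    for block in range(6):
        print(block, locate_block(block, 3))
```

Since each disk holds a slice of the data and there is no parity or copy, losing any one disk loses the whole volume.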

How to create a mirrored volume (RAID 1)

  1. Open the Disk Management console.
  2. Right-click the volume you want to mirror, and select Add mirror to open the Add Mirror window.
  3. Select the disk you want to use for a mirror.
  4. Click Add Mirror to create the mirror.

How to recover from a mirrored volume failure (RAID 1)

  1. Open the Disk Management console.
  2. Right-click the failed mirrored volume and select Remove Mirror from the shortcut menu.
  3. When the Remove Mirror dialog box is displayed, choose the disk that should be removed, and click Remove Mirror.
  4. Click Yes to verify your action to remove the mirror. The remaining volume turns into a simple volume.
  5. You can now remove the failed drive from the computer, and replace it.
  6. Following this, you should use the Disk Management console to create the mirrored volume again.

How to create a RAID 5 volume

  1. Open the Disk Management console.
  2. Right-click the unallocated space on the disk where you want to create the RAID 5 volume, and select New Volume to launch the New Volume Wizard. Click Next.
  3. Select RAID 5 on the Select Volume Type window. Click Next.
  4. On the Select Disks window, select the disk(s) to include in the volume, and the amount of space to be used. Click Next.
  5. On the Assign Drive Letter or Path window, assign a drive letter or mount the volume to an empty NTFS folder. Click Next.
  6. On the Format Volume window, select a format (NTFS) for the RAID 5 volume, or select the Do not format this volume option. Click Next.
  7. The Completing the New Volume Wizard window displays the options you have selected.
  8. Click Finish to create the RAID 5 volume.

How to recover from a RAID 5 volume failure

  1. Back up your data prior to performing any necessary actions to repair a RAID 5 volume set.
  2. Your first step is to bring all drives in the RAID 5 volume set back online. The status of the volume set has to be displayed as Failed Redundancy.
  3. Where the status of the failed volume is Missing or Offline, verify that the drive has power and that there are no connectivity issues.
  4. Use the Disk Management console to reactivate the disk. Right-click the volume and then choose Reactivate Disk from the menu. The status of the drive should first move to Regenerating and following this, to Healthy.
  5. Right-click the volume and choose the Regenerate Parity option if the status fails to change to Healthy.
  6. Where the status of the failed volume is Online (Errors), right-click the volume that failed and choose Reactivate Disk from the menu. The status of the drive should first move to Regenerating and following this, to Healthy. Choose the Regenerate Parity option if the status fails to change to Healthy.
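Regeneration works because RAID 5 stores parity alongside the data. The sketch below shows the underlying idea with XOR parity over one stripe: the block from a missing disk is rebuilt from the surviving blocks and the parity block. It ignores how parity is rotated across disks, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block of a stripe."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe across three data disks
    parity = xor_blocks(data)            # stored on a fourth disk
    # The disk holding b"BBBB" fails; rebuild it from the rest.
    print(rebuild_missing([data[0], data[2]], parity))
```

Only one missing block per stripe can be recovered this way, which is why a RAID 5 volume survives a single disk failure but not two.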

How to configure a DHCP cluster

  1. Install the cluster hardware.
  2. Proceed to configure Cluster Service.
  3. Install the DHCP service on the node in the cluster.
  4. Specify the global options when implementing many scopes.
  5. Configure a new scope. Assign the IP Address range and options (WINS/DNS server). Next, define and set any further scope options.
  6. Any necessary reservations can now be specified for clients needing a reserved IP address. Remember to exclude IP addresses that should not be leased.
  7. Proceed to activate the new scope.
  8. Any option classes and additional option types can be allocated next.
  9. The lease duration can also be modified as required.
  10. If multiple subnets are going to be supported, superscopes should be configured next.
  11. Next, authorize the DHCP server in Active Directory. Configure the DNS dynamic update policy.
  12. When supporting routed networks, DHCP or BOOTP relay agents might need to be configured. The BOOTP table needs to be configured.
  13. Any multicast scopes (if necessary) can be configured next.
  14. Configure a resource group for the DHCP resources. Utilize the New Group Wizard provided by Cluster Service.
  15. Launch the New Resource Wizard provided by Cluster Service to define the necessary IP Address, Network Name and Dependent Disk resources.
  16. Next, set the locations of the database, backup and audit paths for the DHCP resource on the shared device.
  17. Verify failover for the DHCP cluster, and check whether the DHCP server can be accessed by clients.
  18. You can use System Monitor to monitor DHCP Server performance. The DHCP Server audit log can be utilized for troubleshooting purposes.

How to install Internet Information Services (IIS) on a cluster

  1. On the cluster shared disk, configure a folder for the IIS virtual servers. A folder should be configured for each IIS virtual server.
  2. Next, use Cluster Administrator to configure a resource group for each defined virtual server. Each resource group needs a Dependent Disk resource. The Dependent Disk resource for MS DTC (if configured) should be the same as the IIS virtual server disk.
  3. Ensure that the IIS virtual server resources are on the node that manages the Physical Disk resource of the virtual web.
  4. Configure the IIS virtual server's IP Address resource in the same group as the Physical Disk resource that holds the Web folders. Configure the IP Address resource as being dependent on the IIS virtual server's Physical Disk resource and MS DTC resource (if necessary).
  5. Specify the IIS virtual server's Network Name in the same group as the Physical Disk resource that holds the Web folders.
  6. Configure the Network Name resource as being dependent on the IP Address resource.
  7. To configure the cluster Web site, use the Internet Services Manager snap-in. The cluster Web site can be a new Web site or an existing Web site. The Web site should use the IP address and folder on the shared disk. Make certain that the Web site is bound to the IP address of the IIS virtual server, and not set to All Unassigned. The Web site has to use an anonymous username/password combination that all the nodes are able to use.
  8. Next, continue to configure the identical Web site on the other cluster node.
  9. Configure an IIS server instance with the Web site value mapping to the IIS Web site. Utilize Cluster Administrator for this configuration. For failover, ensure that each node is a possible owner of the IIS server instance, and that an IP address resource dependency is configured. When the Web information is held on the cluster, the IIS server has to be dependent on the Physical Disk resource. A Network Name dependency can be configured. This will ensure failover when the network name is utilized for accessing purposes.
  10. It is recommended to utilize Cluster Administrator to start and stop the cluster Web sites / IIS resources. Cluster Administrator should also be utilized to remove cluster IIS resources.
  11. All IIS resources have to be removed from the node before you uninstall Cluster Service.
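The dependencies described in the steps above determine the order in which the cluster can bring resources online. A minimal sketch, assuming the dependencies named in steps 4, 6 and 9 and using Python's standard `graphlib` module (Python 3.9 or later); the dictionary and function name are illustrative and not part of Cluster Administrator.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
DEPENDENCIES = {
    "Physical Disk": set(),
    "IP Address": {"Physical Disk"},
    "Network Name": {"IP Address"},
    "IIS Server Instance": {"IP Address", "Physical Disk"},
}


def online_order(dependencies):
    """Return an order in which every resource starts after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for resource in online_order(DEPENDENCIES):
        print(resource)
```

Taking resources offline would follow the reverse of this order, and a circular dependency makes `TopologicalSorter` raise `graphlib.CycleError`.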

How to create a domain based Dfs root

  1. Open the Dfs console.
  2. Right-click the Distributed File System icon, and choose New from the shortcut menu. You can also select the New Root option from the Action menu.
  3. When the New Root Wizard launches, click Next on the Welcome To The New Dfs Root Wizard screen.
  4. On the Root Type screen, choose the Domain Root option if the server is a member of an Active Directory domain. Click Next.
  5. Enter the fully qualified DNS name of the server hosting the Dfs root on the Host Domain screen. You can click Browse to search Active Directory for the server. Click Next.
  6. When the Root Name screen appears, enter a name for the new Dfs root. You can also enter a comment in the Comments field. Click Next.
  7. The Root Share screen is displayed when the share does not exist on the server. This is where you enter the full path to the folder that should store the Dfs root. Click Next.
  8. Verify the settings that you have selected.
  9. Click Finish.
  10. The wizard now shares the specified folder, and creates the Dfs root and entries in the registry.

How to publish domain based Dfs roots in Active Directory

  1. Open the Dfs console.
  2. Choose the Dfs root, and select Properties from the Action menu.
  3. When the Properties dialog box of the selected Dfs root appears, click the Publish tab.
  4. Enable the Publish This Root In Active Directory checkbox.
  5. Enter a description for the Dfs root in the Description box.
  6. You can also enter an e-mail address for the administrator of the Dfs root in the Owners box.
  7. Click the Edit button to specify a list of keywords.
  8. Click OK.

How to create Dfs links

  1. Open the Dfs console.
  2. In the left pane, choose the root that you want to create links for.
  3. Select the New Link option from the Action menu.
  4. The New Link dialog box opens.
  5. Enter the name that you want your users to see when they browse Dfs in the Link Name box.
  6. In the Path To Target box, enter the shared folder's UNC or DNS path. You can alternatively click the Browse button.
  7. Use the Comments box to enter any additional information.
  8. In the Amount Of Time Clients Cache This Referral In Seconds box, enter the amount of time for clients to cache the referral before they ascertain whether it is still valid.
  9. Click OK.
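The referral cache time set in step 8 can be pictured as a simple time-to-live cache on the client. The sketch below is an illustration of that idea only, not the Dfs client's actual implementation; the `ReferralCache` class and the share names are hypothetical, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class ReferralCache:
    """Cache link referrals for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # link name -> (expiry time, list of targets)

    def put(self, link, targets):
        """Store the targets returned by a referral for this link."""
        self._entries[link] = (self._clock() + self._ttl, list(targets))

    def get(self, link):
        """Return cached targets, or None if absent or expired."""
        entry = self._entries.get(link)
        if entry is None:
            return None
        expires_at, targets = entry
        if self._clock() >= expires_at:
            del self._entries[link]  # expired: the client must ask again
            return None
        return targets


if __name__ == "__main__":
    cache = ReferralCache(ttl_seconds=300)
    cache.put("Docs", [r"\\server1\docs", r"\\server2\docs"])
    print(cache.get("Docs"))
```

A shorter cache time means clients notice target changes sooner, at the cost of more referral requests to the Dfs server.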

How to create targets for the Dfs root to provide redundancy

When working with domain based Dfs roots, you can configure the Dfs root with targets to provide redundancy. By setting up multiple targets for the Dfs root, you are enhancing fault tolerance for the Dfs tree. Targets can also be configured to automatically replicate with one another. You can ensure that users can continue to access files when a server has a failure by creating additional targets for your Dfs links.

To create targets for the Dfs root

  1. Open the Dfs console.
  2. Navigate to the domain based Dfs root that you want to add targets for.
  3. Select the New Root Target option from the Action menu.
  4. This action initiates the New Root Wizard.
  5. Enter the DNS name of the server that is going to host the new target. You can click Browse to find the server. Click Next.
  6. Enter the path of the folder that you are going to use for the Dfs root target. You can click Browse to find the folder. Click Next.
  7. Verify the settings that you have specified.
  8. Click Finish.
  9. The new Dfs root target is created.

