Windows IT Pro is the authoritative and independent resource for windows nt, windows 2000, windows 2003, windows xp. Features a collection of resources and magazines for windows IT professionals.
  
  
August 2001

The 7 Habits of Highly Available Exchange Servers


Sidebar: Comparing RAID 5 and RAID 0+1

Lessons in self- and server improvement

Consulting about Microsoft Exchange Server availability is like watching the Looney Tunes' Wile E. Coyote: Watch for a while, and you can begin to predict the mistakes that lead to the falls. You also learn that the falls aren't as deadly as the pounding that follows close behind. After years of working with Exchange Server organizations, I've learned to recognize the factors that can lead to falls from high availability and the disaster recovery mistakes that can make these falls catastrophic. Inspired by Stephen R. Covey's bestseller The 7 Habits of Highly Effective People (Simon & Schuster, 1999), I've identified seven factors that help organizations prevent Exchange Server system failures and maintain high availability.

Seek First to Understand Downtime
Administrators must commit to solving the problems that decrease Exchange Server availability. Such problems fall into one of two categories: planned downtime or unplanned downtime. Planned downtime (e.g., applying service packs, upgrading hardware) is by far the easier category to manage. The best approach, when feasible, is to schedule planned downtime for nonbusiness hours.

Highly available Exchange Server organizations conduct risk assessments of unplanned downtime events. An important part of these assessments is the list you generate of possible downtime events. You can sort this list by the events' relative risks, then concentrate on preventing high-probability, high-impact events (e.g., Software Component A causing Software Component B to behave unexpectedly) and give less attention to the low-probability events (e.g., a meteorite striking your data center).
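One way to sort such a list is to score each event as probability multiplied by impact. The sketch below is only an illustration of that idea: the event names echo the examples above, and every probability and impact figure is a made-up placeholder, not measured data.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEvent:
    name: str
    probability: float   # assumed likelihood (0.0-1.0) over the planning period
    impact_hours: float  # assumed hours of downtime if the event occurs

    @property
    def risk(self) -> float:
        # Relative risk as expected downtime: probability x impact.
        return self.probability * self.impact_hours


def rank_by_risk(events):
    """Return events ordered from highest to lowest relative risk."""
    return sorted(events, key=lambda e: e.risk, reverse=True)


# Hypothetical figures for illustration only.
events = [
    DowntimeEvent("Meteorite strikes data center", 0.000001, 720.0),
    DowntimeEvent("Component A destabilizes Component B", 0.30, 8.0),
    DowntimeEvent("Single disk failure in RAID set", 0.10, 0.5),
]

ranked = rank_by_risk(events)
for event in ranked:
    print(f"{event.risk:10.5f}  {event.name}")
```

With these placeholder numbers, the software interaction lands at the top of the list and the meteorite at the bottom, which matches the priorities described above.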

In my experience, software quality problems—bugs—are most often the cause of unplanned downtime. However, your response to outages—the decisions you make and the procedures you follow—determines the duration of the downtime. Unplanned downtime cycles have several stages, from problem identification through recovery. Understanding these stages and preparing yourself for action helps minimize downtime.

The first stage is notification that a problem exists. Automated notification systems—either built-in or added on—can detect hardware problems before they cause outages. OS- and application-level monitors, such as NetIQ's AppManager Suite and BMC Software's PATROL for Microsoft Exchange 2000 Servers, also aid in early problem detection. Undetected problems can lead to cascading failures that obscure the source problem. For example, suppose a mail connector queue fills Server A's hard disk. If this problem goes unnoticed, it might result in a connector on Server B failing to deliver messages to Server A. Thus, Server B appears to be the source of the problem, which diverts attention from the actual source: Server A.
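The Server A scenario comes down to a disk-space check that nobody was running. As a rough sketch (not a substitute for the monitoring products named above), a scheduled script could compare disk usage against a threshold. The 90 percent threshold and the path checked here are assumptions chosen for illustration.

```python
import shutil


def disk_fill_ratio(path):
    """Return the fraction of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def needs_alert(fill_ratio, threshold=0.90):
    """True when usage reaches the threshold (0.90 is an assumed default)."""
    return fill_ratio >= threshold


# "." stands in for the volume that holds the connector queue (hypothetical).
ratio = disk_fill_ratio(".")
if needs_alert(ratio):
    print(f"ALERT: volume is {ratio:.0%} full")
else:
    print(f"OK: volume is {ratio:.0%} full")
```

Running a check like this on Server A would point at the real source of the problem before Server B's delivery failures draw attention away from it.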

The second stage is thorough problem analysis. Analysis helps you develop a troubleshooting course of action. The troubleshooting team must react quickly, but mistakes can be costly. The team members need to first isolate the problem to prevent further harm. Then, they must gather information about the problem, whether from tracking logs, Windows event logs, or the server operator's records of system changes.

Implementing and testing your recovery solution is the third stage. But don't consider the downtime cycle complete until the fourth stage: analysis of the lessons you've learned. Most unplanned downtime events contain lessons that can help you prevent a recurrence of the problem.

Put Hardware First
Hardware is the foundation of availability. Application stability doesn't matter if you don't run your applications on solid hardware. Fault-tolerant hardware often lets you repair hardware faults without taking systems down. Redundant components can keep systems running when the inevitable hardware faults occur. Hot-swappable components can be replaced without downtime.

RAID-protected hard disk subsystems are key to protecting your Exchange servers from the effects of hard disk failure. Best practice is to place Exchange Server log files on a RAID 1 volume and the database on a RAID 5 or, better yet, RAID 0+1 volume. For more information about the pros and cons of these RAID configurations, see the sidebar "Comparing RAID 5 and RAID 0+1."
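Part of the tradeoff between these RAID levels is usable capacity. The sketch below applies the standard capacity arithmetic for each level: a mirrored pair yields one disk's worth of space, RAID 5 gives up one disk to parity, and RAID 0+1 gives up half its disks to mirroring. The disk counts and the 36GB disk size are hypothetical.

```python
def usable_capacity_gb(level, disk_count, disk_size_gb):
    """Usable capacity of an array built from identical disks."""
    if level == "1":
        if disk_count != 2:
            raise ValueError("this sketch treats RAID 1 as a two-disk mirror")
        return disk_size_gb
    if level == "5":
        if disk_count < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disk_count - 1) * disk_size_gb  # one disk's worth of parity
    if level == "0+1":
        if disk_count < 4 or disk_count % 2:
            raise ValueError("RAID 0+1 needs an even number of disks, four or more")
        return (disk_count // 2) * disk_size_gb  # half the disks are mirrors
    raise ValueError(f"unsupported RAID level: {level}")


# Hypothetical layout: mirrored log volume plus a six-disk database volume.
print("Logs, RAID 1:       ", usable_capacity_gb("1", 2, 36), "GB")
print("Database, RAID 5:   ", usable_capacity_gb("5", 6, 36), "GB")
print("Database, RAID 0+1: ", usable_capacity_gb("0+1", 6, 36), "GB")
```

For the same six disks, RAID 5 offers more usable space than RAID 0+1, so choosing RAID 0+1 for the database means buying more disks for the same capacity.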

Storage planning is another important consideration. One organization's Exchange Server administrators told me that migrations to larger storage cabinets and more or larger hard disks were their servers' most significant sources of downtime (corporate policy prevented these administrators from enforcing mailbox limits). The organization was looking into a Storage Area Network (SAN) as a solution. A SAN provides a high-performance pool of hard disks from which you can allocate storage to servers. SANs also simplify storage expansion, reconfiguration, and backup and recovery. However, transitioning to SAN-based storage can be difficult and can increase downtime.

Clustering for a Win-Win Environment
Clustering improves application reliability and helps prevent system failures. But the real beauty of clustering is that it can make even unreliable applications highly available to end users. For example, one day Node A in my 2-node Exchange Server 5.5 cluster began failing over to Node B. When I looked in the event log, I noticed that the failovers were occurring at 2-hour intervals. The person who installed the cluster had mistakenly installed an evaluation edition of Windows NT Server. When the 120-day evaluation period had expired, the OS began performing hard shutdowns every 2 hours. Clustering kept our Exchange Server system available to end users until we resolved the problem.
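A regular interval between failovers is the clue in a story like this one. The sketch below shows one way to spot that pattern, assuming you've already extracted the failover times from the event log by some other means. The timestamps and the 5-minute tolerance are invented for illustration.

```python
from datetime import datetime, timedelta


def failover_gaps(timestamps):
    """Return the time between consecutive failovers, oldest first."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def is_periodic(timestamps, tolerance=timedelta(minutes=5)):
    """True when every gap is within `tolerance` of every other gap."""
    gaps = failover_gaps(timestamps)
    if len(gaps) < 2:
        return False  # too few events to call it a pattern
    return max(gaps) - min(gaps) <= tolerance


# Hypothetical failover times pulled from an event log.
failovers = [
    datetime(2001, 8, 1, 8, 0),
    datetime(2001, 8, 1, 10, 0),
    datetime(2001, 8, 1, 12, 1),
    datetime(2001, 8, 1, 14, 0),
]

print("Gaps:", [str(gap) for gap in failover_gaps(failovers)])
print("Periodic:", is_periodic(failovers))
```

Failovers that recur on a fixed schedule suggest a timer-driven cause, such as the evaluation-edition shutdowns described above, rather than a random hardware or software fault.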

Clustering also helps you manage planned downtime. In a clustered environment, you can fail over Node A's services to Node B, then apply a service pack, hotfix, or upgrade to Node A.

Exchange Server 5.5 permits only 2-node active-passive clustering. Only the active node can perform Exchange Server processing. The passive node can't perform any processing until failover occurs. This limitation has lowered clustering's adoption rate, because 2-node active-passive clustering requires you to spend twice as much money on hardware without increasing processing capacity.

Exchange 2000 active-passive clusters are slightly different from Exchange Server 5.5 clusters: One node runs an Exchange Virtual Server (EVS), and the other has Exchange 2000 installed but doesn't run an EVS until a failover occurs. Exchange 2000 with Service Pack 1 (SP1) permits 2-node active-active clustering on Windows 2000 Advanced Server. However, to ensure failover, you need to carefully distribute active user connections and keep processor utilization within the range that lets failovers occur. You can progress to 4-node clustering (i.e., 3+1 clustering) on Win2K Datacenter. Although you get better returns for your hardware investment when you cluster on Exchange 2000 and Win2K, you must still purchase special storage that lets two or more cluster nodes share a hard disk. Fibre Channel SANs are a must for 3+1 clusters. For more information about clustering, see Greg Todd, "Microsoft Clustering Solutions," November 2000.
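Two pieces of arithmetic sit behind these clustering tradeoffs: how much of the hardware does useful work, and whether one node can carry both workloads after a failover. The sketch below is a simplified model that assumes identical nodes and treats processor load as additive. The 80 percent ceiling is an arbitrary placeholder, not a Microsoft-documented limit.

```python
def active_fraction(active_nodes, passive_nodes):
    """Share of cluster nodes that process Exchange work in normal operation."""
    return active_nodes / (active_nodes + passive_nodes)


def can_absorb_failover(load_a, load_b, ceiling=0.80):
    """True if one node could carry both loads after a failover.

    Loads are fractions of a single node's processor capacity.
    The 0.80 ceiling is an assumed safety margin for illustration.
    """
    return load_a + load_b <= ceiling


print("2-node active-passive:", active_fraction(1, 1))  # half the hardware idles
print("3+1 cluster:", active_fraction(3, 1))            # one node in four idles

# Hypothetical processor loads on a 2-node active-active cluster.
print("35% + 30%:", can_absorb_failover(0.35, 0.30))
print("50% + 45%:", can_absorb_failover(0.50, 0.45))
```

The second check shows why active-active clustering needs careful load distribution: two nodes that each look comfortable on their own can still add up to more than the survivor can handle.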

Back Up with Restores in Mind
A nasty crash can result in a corrupted Information Store (IS) that won't mount. This situation can necessitate a lengthy recovery process. Checking database integrity can take several hours. Eseutil, Exchange Server's primary integrity check and repair utility, could take an hour to check and repair a 15GB database, even with the fastest disk technology.

 Windows IT Pro is a Division of Penton Media Inc.
 Copyright © 2009 Penton Media, Inc., All rights reserved. Terms and Use | Privacy Statement | Reprints and Licensing