When I started figuring out where I wanted WebSphere 6.0 high availability to go at the end of 2002, I decided I wanted high availability to be a lot easier and cheaper to set up than was possible at the time. I also wanted to avoid the need for SANs, shared disk arrays etc. All the applications required external software like Veritas and HACMP as well as switched disks attached with fiber or SCSI to do proper failover, and it was slow. It could take minutes to restart the failed server.
It was very complex to set up: lots of steps, lots of scope for mistakes. It was too hard. This wasn't unique to WebSphere. Everybody had this problem. Everybody still has this problem today except us. WebSphere 6.0 is kind of revolutionary in this respect. For the first time, customers will be able to get best of class availability (on ANY platform) out of the box on commodity hardware.
WebSphere 6.0 is designed to take advantage of shared file systems like IBM SAN FS, Windows CIFS, and NFS v4. A cluster that uses a shared file system is almost child's play to make highly available now and will recover in-doubt transactions for the failing cluster member in around 11 seconds. This is the current limit using a shared file system, although faster times are possible using other configurations.
The first thing is to mount a shared file system at the same mount point on all cluster members. Say /mnt/cluster. Next, create a directory for each cluster member. On each cluster member change the transaction log directory to be /mnt/cluster/serverX, where serverX is the directory for that server. Now, bring up the cluster screen and enable "High Availability" services.
That's it, you're highly available. If server A crashes then WebSphere's HAManager component will detect the failure and tell one of the other servers in the same cluster as server A to recover the server A transaction logs. The other servers can see server A's transaction logs because of the shared file system.
WebSphere uses file locking to ensure only a single server at a time can modify a set of transaction logs. NFS v4 and SAN FS provide fast lock recovery when a shared file system client crashes. AIX 5.3 can have its NFS v4 tuned to do this after only 10 seconds, and so can SAN FS. This means that any file locks held by the dead server are released automatically after this interval. This allows the peer cluster member to safely lock the files and do recovery.
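To make the locking idea concrete, here is a minimal sketch of one process holding an exclusive lock on a log file while a second attempt is refused. It assumes a Unix-like system and uses Python's `fcntl.flock` purely for illustration. The post does not say which locking calls WebSphere itself makes, and the function names and the `tranlog` file name are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock_log(path):
    """Try to take an exclusive advisory lock on a log file without blocking.

    Returns the open file object on success (keep it open to hold the lock),
    or None if some other holder already has the lock.
    """
    f = open(path, "a+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


def demo():
    """Show owner/peer behaviour on a throwaway local file (hypothetical names)."""
    log_path = os.path.join(tempfile.mkdtemp(), "tranlog")
    owner = try_lock_log(log_path)        # "server A" locks its own log
    peer_before = try_lock_log(log_path)  # the peer is refused while A holds it
    owner.close()                         # closing the file releases the lock
    peer_after = try_lock_log(log_path)   # now the peer can lock and recover
    result = (owner is not None, peer_before is None, peer_after is not None)
    if peer_after is not None:
        peer_after.close()
    return result
```

In this sketch the lock is released by closing the file. When a server really crashes, the release comes from the file system once the dead client's locks expire, as described above.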
If the failed server restarts then the HAManager will do an orderly failback a second or two after the server starts.
NFS v4 is currently only available on AIX 5.3 and Solaris 10. I read the other day that Linus is looking at adding NFS v4 to the 2.6 Linux kernels later this year. This means that this new level of availability will be available on x86 Linux hardware.
Failure Detection in WebSphere 6.0
WebSphere 6.0 detects failures in one of two ways out of the box. One way uses TCP sockets to detect dead servers. This depends on KEEP_ALIVE tuning to be effective when hard failures occur. A hard failure is a power failure, motherboard failure or network problem. The other uses active heart beating. By default a heart beat is sent every 10 seconds, and if twenty heart beats are missed then we mark the server as suspect. That works out to 200 seconds, a little over 3 minutes. This was done so that WebSphere worked well on smaller development machines etc. It is, of course, tunable and we've had it down as low as 6 seconds (2 second HBs with 3 indicating failure). The recovery time for in-doubt transaction recovery is the greater of the failure detection time of WebSphere and the lock recovery time of the shared file system. Even if WebSphere can fail over in 4 seconds, if the file system takes 10 seconds to release the locks then recovery will happen when the locks are released, i.e. after 10 seconds. So, the file system lock lease time should be used as the heart beat rate for production systems. Again, this is the current best recovery time using a shared file system. Faster recovery is possible using configurations that don't use a shared file system, and that's another blog entry.
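The arithmetic above is simple enough to write down. This is only a sketch of the two rules stated in the text (detection time is heart beat interval times missed heart beats, and recovery time is the greater of detection time and lock recovery time). The function names are made up for illustration and are not WebSphere settings.

```python
def detection_time(heartbeat_interval_s, missed_heartbeats):
    """Seconds until a silent server is marked as suspect."""
    return heartbeat_interval_s * missed_heartbeats


def recovery_time(detection_s, lock_lease_s):
    """Recovery starts only once the failure is detected AND the locks are released."""
    return max(detection_s, lock_lease_s)


default_detection = detection_time(10, 20)  # out-of-the-box settings: 200 seconds
tuned_detection = detection_time(2, 3)      # tuned down: 6 seconds
tuned_recovery = recovery_time(tuned_detection, 10)  # a 10 second lock lease dominates
```

With a 10 second lock lease, tuning detection below 10 seconds buys nothing for transaction recovery, which is why the lease time is the natural target.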
IP Failover
WebSphere 6.0 doesn't need IP failover for TM recovery when it's configured for hot standby. This is a big improvement over previous versions of WebSphere and everyone else. WebSphere now uses logical names for the TM and these are resolved dynamically at runtime to wherever the TM was placed by the HAManager.
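As a toy illustration of why a logical name removes the need for IP failover, the sketch below keeps a table from logical TM name to the server currently hosting it. Callers always look up the name, so moving the TM only means updating the table. The class, method and server names are all hypothetical, and this is not how the HAManager is actually implemented.

```python
class TmLocator:
    """Toy lookup table from a logical TM name to its current host server."""

    def __init__(self):
        self._placement = {}

    def place(self, logical_name, server):
        """Record where the TM identified by logical_name is running now."""
        self._placement[logical_name] = server

    def resolve(self, logical_name):
        """Return the server currently hosting the named TM."""
        return self._placement[logical_name]


locator = TmLocator()
locator.place("tm/serverA", "serverA")
before = locator.resolve("tm/serverA")
locator.place("tm/serverA", "serverB")  # after serverA fails, a peer takes over
after = locator.resolve("tm/serverA")
```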
2 minute Configuration
I hope customers will start to use this new feature. It's a very simple way to make a cluster highly available at very low cost and anybody can do it. It takes under 2 minutes to set up.
- Export a file system
- Mount the file system on all cluster members
- Make directories for each server
- Change the tran log directory to the shared file system
- Check the HA box on the cluster panel
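The "make directories for each server" step could be scripted along these lines. This is a rough sketch that assumes the shared file system is already mounted. The helper name and the server names are placeholders, and pointing each server's transaction log directory at its new path still has to be done in the WebSphere configuration.

```python
import os
import tempfile


def make_server_log_dirs(shared_root, server_names):
    """Create one transaction log directory per cluster member under shared_root.

    Returns a dict mapping each server name to its directory path.
    """
    paths = {}
    for name in server_names:
        path = os.path.join(shared_root, name)
        os.makedirs(path, exist_ok=True)  # safe to re-run on an existing layout
        paths[name] = path
    return paths


# Demo against a temporary directory standing in for the shared mount point.
demo_root = tempfile.mkdtemp()
demo_paths = make_server_log_dirs(demo_root, ["server1", "server2"])
```

On a real cluster `shared_root` would be the common mount point, for example /mnt/cluster.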
Some of you may also realise that even without a shared file system, this can help with other scenarios. When you use vertical clustering, i.e. multiple JVMs per server, the local file system is in effect a shared file system for those JVMs on the same server, and WebSphere can be configured to do this kind of recovery here also. This won't handle box failure but will handle JVM panics etc.
SAMBA
I tried to get SAMBA to work. It's supposed to do file locking etc and the solution also works on native Windows client and Windows file servers. But, making SAMBA do file locking like this was beyond me. I still think it's possible but I couldn't get any help from the community here. If SAMBA could be configured to work then this technology would work on any Linux with that SAMBA. If someone from the SAMBA community wants to work with me on this then email me. It's a shame that it appears not to work.
The problem with NFS v3
NFS v3 doesn't work here. The problem is locking. When a server fails, the locks only get released when that server restarts. That takes AGES. Minutes. Or an administrator can manually release the locks. Again, not really what I want. The solution really needs lock leasing to work well. Only SAN FS, NFS v4 or CIFS provide this fast lock recovery with no dependency on the failed server.
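To show what lock leasing buys you, here is a small in-memory model of a lock that expires unless its holder keeps renewing it. It is a sketch of the general idea only, not of how NFS v4, SAN FS or CIFS implement it. The class name is made up and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class LeasedLock:
    """Toy model of a lock whose ownership lapses when the lease runs out."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.expires_at = None

    def acquire(self, client, now):
        """Grant the lock if it is free, expired, or already held by client."""
        expired = self.expires_at is not None and now >= self.expires_at
        if self.holder is None or expired or self.holder == client:
            self.holder = client
            self.expires_at = now + self.lease_seconds
            return True
        return False

    def renew(self, client, now):
        """A live holder renews before expiry; a crashed holder never does."""
        if self.holder == client and now < self.expires_at:
            self.expires_at = now + self.lease_seconds
            return True
        return False


lock = LeasedLock(lease_seconds=10)
a_got_lock = lock.acquire("serverA", now=0)
a_renewed = lock.renew("serverA", now=5)        # lease now runs until t=15
b_too_early = lock.acquire("serverB", now=10)   # serverA crashed, lease still valid
b_after_lease = lock.acquire("serverB", now=15) # lease expired, peer takes over
```

With NFS v3 style locking there is no expiry, so the peer would stay locked out until the failed server restarted or an administrator stepped in.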
SAN FS
SAN FS offers an ultra high performance shared file system that doesn't require large file server boxes. It uses a unique architecture that means it's as fast as a local file system but is fully shared. You can have a couple of small 2 way boxes as SAN FS servers (for HA) and run a lot of WebSphere or database boxes on the shared file system. It also offers a shared file system that uses lock leasing on older operating systems. NFS v4 is available on the latest and greatest operating systems like AIX 5.3 and Solaris 10, and later this year on 2.6 Linux kernels. But, if you're on an older operating system then you don't have this support built in. SAN FS is available on these older operating systems and will enable you to take advantage of this new WebSphere support on those platforms.
Summary
That's it. No extra software besides WebSphere needs to be bought and no switched disks per server are needed when using CIFS or NFS v4 (SAN FS requires a SAN and the SAN FS client to be installed). I think this sets a whole new level of ease of use in clustering on the market and puts WebSphere in front of the pack even after WebLogic 9 ships later this year. The HAManager is exploited all over the place in WebSphere 6.0. The new messaging engine uses it for similar failover times. All the critical singletons in the product that used to run in the deployment manager now 'float' and run anywhere in the cell. Customers can, of course, specify policies to tell WebSphere where they would like these services to run and can specify backup servers etc.
So, there it is. WebSphere 6.0 can achieve < 12 second recovery times on commodity hardware when it's used with a shared file system with lock leasing.