I've seen a scenario over the last couple of days where a customer had a cluster and wanted to monitor the state of a backend system. They didn't want each cluster member to monitor the backend, because of the load that would put on it among other reasons, but monitoring from just one member results in a single point of failure (SPOF) for the monitoring.
Partition Facility (XD)
So, I suggested the following. Use XD. The XD application creates a single partition called "MONITOR". The task of the MONITOR partition is the following:
PartitionStart:
- Check the status of the backend and publish it on a non-persistent but reliable JMS topic.
- Start an alarm (async beans API) to expire in say 10 seconds.
AlarmFired:
- Check the backend status and publish it on the topic if the status changed.
- Reset the alarm (async beans API) to fire in another 10 seconds.
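The two callbacks above can be sketched in plain Java. This is only a rough illustration, not the XD or async beans API: a `ScheduledExecutorService` stands in for the alarm, and `MonitorPartition`, `backendCheck`, `topicPublisher` and `checkAndPublishIfChanged` are names I made up for the example. The real thing would publish through JMS and arm the alarm through the async beans alarm manager.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for the MONITOR partition logic; not the XD API. */
final class MonitorPartition {
    private final Supplier<String> backendCheck;   // assumption: probes the backend
    private final Consumer<String> topicPublisher; // assumption: publishes to the JMS topic
    // Stand-in for the async beans alarm: a one-shot timer that we re-arm ourselves.
    private final ScheduledExecutorService alarms =
            Executors.newSingleThreadScheduledExecutor();
    private volatile long periodSeconds;
    private String lastStatus;

    MonitorPartition(Supplier<String> backendCheck, Consumer<String> topicPublisher) {
        this.backendCheck = backendCheck;
        this.topicPublisher = topicPublisher;
    }

    /** PartitionStart: always publish the current status, then start the alarm. */
    synchronized void partitionStart(long periodSeconds) {
        this.periodSeconds = periodSeconds;
        lastStatus = backendCheck.get();
        topicPublisher.accept(lastStatus);
        armAlarm();
    }

    /** AlarmFired: publish only if the status changed, then reset the alarm. */
    void alarmFired() {
        try {
            checkAndPublishIfChanged();
        } finally {
            armAlarm();
        }
    }

    synchronized void checkAndPublishIfChanged() {
        String current = backendCheck.get();
        if (!current.equals(lastStatus)) {
            lastStatus = current;
            topicPublisher.accept(current);
        }
    }

    /** Called when the partition is deactivated on this member. */
    void partitionStop() {
        alarms.shutdownNow();
    }

    private void armAlarm() {
        if (!alarms.isShutdown()) {
            alarms.schedule(this::alarmFired, periodSeconds, TimeUnit.SECONDS);
        }
    }
}
```

Publishing unconditionally on start matters: after a failover the new owner has no idea what the old owner last published, so it tells everybody the current state and only then goes back to reporting changes.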
When the application starts on a cluster member then do the following using a startup bean.
ApplicationStart:
- Make an IIOP call to the partition routable session bean and call a method to determine the backend status.
- If it fails (i.e. the partition is in the process of recovery), simply mark the status as unknown and wait for a JMS message once the partition restarts, which should take only seconds.
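Here is a minimal sketch of that startup logic in plain Java. The names `BackendStatusHolder`, `applicationStart`, `onStatusMessage` and the `"UNKNOWN"` marker are hypothetical, and the `Callable` stands in for the IIOP call to the partition; a real startup bean would do the lookup and remote call instead.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical per-member cache of the last known backend status. */
final class BackendStatusHolder {
    static final String UNKNOWN = "UNKNOWN";

    // Starts out unknown until the partition or a topic message tells us otherwise.
    private final AtomicReference<String> status = new AtomicReference<>(UNKNOWN);

    /**
     * ApplicationStart: ask the MONITOR partition once.
     * partitionCall is an assumption standing in for the remote IIOP call.
     */
    void applicationStart(Callable<String> partitionCall) {
        try {
            status.set(partitionCall.call());
        } catch (Exception e) {
            // Partition is probably recovering: keep what we have and wait
            // for the next status message from the topic.
        }
    }

    /** Called for every status message received from the topic. */
    void onStatusMessage(String newStatus) {
        status.set(newStatus);
    }

    String current() {
        return status.get();
    }
}
```

The failure branch deliberately does nothing rather than writing `UNKNOWN` again, so a topic message that arrived while the remote call was failing is not overwritten.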
An application using partitioning has a single special stateless session bean called the PSSB (partition stateless session bean). Client IIOP calls to this session bean use partition specific routing. The client uses a Java Bean called the PRSB (partition routable ...). The JavaBean has methods with the same signatures as the remote interface of the PSSB except they always return a String. The string tells the client which partition to route the method request to. The check status method always returns the string 'MONITOR' in this case. The routing is static, i.e. always route to where 'MONITOR' is running. The method call will be routed to the cluster member where the MONITOR partition is currently running and returns the current backend status.
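To make the routing idea concrete, here is a toy model in plain Java. It is not the XD API: `StatusRoutingBean`, `PartitionRouter`, `partitionActivated` and the member names are all invented for illustration, and in the real product the routing table is maintained by XD itself rather than by application code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical routing bean: same method name as the session bean's
 * status check, but it returns the name of the partition to route to.
 */
final class StatusRoutingBean {
    String checkStatus() {
        return "MONITOR"; // static routing: always the MONITOR partition
    }
}

/** Hypothetical stand-in for the partition-to-member table that XD keeps. */
final class PartitionRouter {
    private final Map<String, String> activeMember = new ConcurrentHashMap<>();

    /** Records that a partition is now running on the given cluster member. */
    void partitionActivated(String partition, String member) {
        activeMember.put(partition, member);
    }

    /** Returns the cluster member that should receive calls for the partition. */
    String route(String partition) {
        String member = activeMember.get(partition);
        if (member == null) {
            throw new IllegalStateException("partition " + partition + " is not active");
        }
        return member;
    }
}
```

With this model, `router.route(new StatusRoutingBean().checkStatus())` gives the member currently hosting MONITOR, and the answer changes by itself when the partition is activated somewhere else after a failure.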
Add an MDB to the application that listens to the status topic. Make sure the JMS provider is configured to deliver any message published on the topic to ALL subscribers. Each cluster member will then receive a status message from the partition whenever the status changes. If the server hosting the partition fails, XD will reassign the partition to one of the surviving cluster members. That member will then run the PartitionStart logic and we're back to monitoring again.
It's a simple problem that has an elegant solution using WebSphere XD, the partition facility and the async beans APIs (startup beans, alarms). We only monitor the backend from a single cluster member, yet every member gets timely information on backend availability, and the monitoring service itself is highly available thanks to the partition facility.