Redundancy: System hardware and software redundancy
Money.net server configurations follow the classic cluster design in a layered architecture, with each layer functionally independent. The first layer consists of high-bandwidth Cisco routers that route traffic internally to the appropriate load balancer. The second layer is a cluster of web servers running Apache on Linux. The third layer is a cluster of our core technology servers, called SDS, which are designed to scale horizontally to support thousands of simultaneous connections and the delivery of market data. SDS servers are deployed on Linux for reliability and speed. The fourth layer is the innermost, secure database layer, used for user accounts and portfolio maintenance.
Fail over, fault tolerance: HW and SW fault tolerance monitoring and handling
We have configured all our servers with monitoring scripts that track CPU utilization, traffic, service response time, system log status, memory utilization, page faults, and server-down scenarios. These scripts are linked to a central monitoring facility, which automatically sends alerts to technology operations.
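As a rough illustration of this kind of check, the sketch below compares one server's metrics against alert thresholds and returns the messages a central facility might forward. The threshold values, metric names, and the `check_metrics` function are assumptions made for the example; the actual scripts and limits are not described here.

```python
# Hypothetical alert limits; the real thresholds are not stated in this document.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "memory_percent": 85.0,
    "response_ms": 500.0,
    "page_faults_per_s": 1000.0,
}


def check_metrics(host: str, metrics: dict) -> list[str]:
    """Return alert messages for one server's latest metric sample."""
    # A server that cannot be reached is reported as down; nothing else is checked.
    if not metrics.get("reachable", True):
        return [f"{host}: server down"]

    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        # Metrics missing from the sample are skipped rather than treated as failures.
        if value is not None and value > limit:
            alerts.append(f"{host}: {name}={value} exceeds {limit}")
    return alerts
```

In a setup like this, each server would report its sample on a schedule and any non-empty result would be passed on to technology operations.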
Most fail over tasks are automated. For example, if a server goes down, it is automatically taken out of the cluster configuration and traffic is switched to the next available server in the cluster. In most cases we can detect a failure even before the server goes down. If a server is down because of a hardware failure, the remote activity logs are read to determine which components to replace or fix, and the server is then restarted. This is seamless to users, as their requests are redirected to the other servers in the cluster. As soon as the server comes back online, it is automatically placed in the cluster again.
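The behavior described above (remove a failed server, route to the next available one, re-add it on recovery) can be sketched as follows. This is a minimal model under assumed names: the `Cluster` class, its methods, and the round-robin policy are illustrative and are not taken from the actual load balancer configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical server pool used only to illustrate automated fail over."""
    servers: list[str]
    healthy: set[str] = field(default_factory=set)
    _next: int = 0

    def __post_init__(self) -> None:
        # Every server starts in rotation.
        self.healthy = set(self.servers)

    def report(self, server: str, is_up: bool) -> None:
        """Apply a health-check result: drop a failed server, re-add a recovered one."""
        if is_up:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def route(self) -> str:
        """Return the next healthy server, skipping any that are out of rotation."""
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next % len(self.servers)]
            self._next += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers in the cluster")
```

Calling `report("web2", False)` takes `web2` out of rotation so later `route()` calls skip it, and `report("web2", True)` puts it back without any other change to the pool.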
Fail over and fault tolerance: Power
Money.net is hosted in the midtown AT&T IDC, which has dual (2N) uninterruptible power supply (UPS) systems, each with multiple modules synchronized to work in unison or independently. The UPS systems receive power from both commercial power feeders and standby generators and are designed to support a fully loaded center for 15 minutes from their internal battery plant.
In case of a commercial power failure, banks of diesel generators in an N+1 design (at full load) provide power to the center within one minute. The UPS battery system covers this one-minute gap so that equipment is not affected. During an extended commercial power outage, the diesel generators run on fuel securely stored on site. Fuel suppliers are on standby to provide additional service.
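The figures above imply a simple margin calculation, sketched here. The 15-minute battery runtime and one-minute generator start come from the text; the function names and the generator count in the usage note are hypothetical, since the number of generators is not stated.

```python
UPS_RUNTIME_MIN = 15.0      # from the text: battery runtime at full load
GENERATOR_START_MIN = 1.0   # from the text: generators supply power within one minute


def battery_margin_min(ups_runtime: float = UPS_RUNTIME_MIN,
                       generator_start: float = GENERATOR_START_MIN) -> float:
    """Minutes of battery left at full load once the generators have taken over."""
    return ups_runtime - generator_start


def survives_generator_failures(required: int, failed: int) -> bool:
    """N+1 design: `required` units carry full load and one spare is installed."""
    installed = required + 1
    return installed - failed >= required
```

With the stated figures, `battery_margin_min()` gives 14 minutes of headroom at full load. For an assumed bank where four generators carry full load, `survives_generator_failures(4, 1)` is true and `survives_generator_failures(4, 2)` is false, which is what N+1 means: any single generator can fail without loss of capacity.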