Downtime for Service Expansion (20-23rd Dec 2010) - ALL TCHPC Clusters
Further to the notices below, the unanticipated outage continued over Christmas because the College was closed.
The issues appear to have been due to incompatibilities between different hardware stacks in the back-end I/O infrastructure. A Fibre Channel card on one of the I/O servers also failed.
We have been working to restore services.
The Lonsdale and Parsons clusters are now online again after testing.
There may be some further brief downtime on the queues, but we expect all systems to be working correctly now.
The IITAC cluster will be brought back online shortly.
The new e-INIS cluster (called Kelvin) has also been brought online for initial testing.
Please note that following the Christmas break, we are investigating the cause of the I/O problems, and hence all cluster services must remain offline for now.
Logins to the Lonsdale and IITAC head nodes are available, but global home and gscratch are unavailable, so the head nodes are only useful for remote logins.
Due to problems in the I/O fabric, the global home and gscratch filesystems have been taken offline.
We apologise for any inconvenience.
Please note that all TCHPC clusters are offline for the start of this week (20-23rd Dec 2010).
This is to allow us to expand the I/O infrastructure in advance of installation of the new e-INIS cluster (called Kelvin).
Logins will be re-enabled as soon as possible to allow users to log in and view files / submit jobs.