[RESOLVED] Downtime for Hardware Failure - ALL TCHPC Clusters

Update Wed 26 Jun 9am

The replacement controller has been installed and the disk arrays have been rebuilt. The filesystems are back online.

Logins are available on the cluster headnodes again. The queues will be released shortly.

Original post: Fri 21 Jun 9.30am

Due to a hardware failure in the SAN storage system, the clusters (lonsdale, parsons, kelvin) will now be taken offline.

We are taking this step as the storage system is now in a non-redundant state, and we wish to guard against potential data loss.

We expect a replacement unit to be delivered on Monday, and will bring the systems back online as soon as the rebuilds have finished. This could run into Tuesday.

All queues will be unavailable during this period.

The GPFS cluster filesystems (/home, /projects and /gscratch) will also be unavailable during this period.

For queries, please contact: