EWFM Data Center
Library and Technology Services (LTS) manages two data centers on campus and uses co-location services in the region for disaster recovery and business continuity. The main data center consists of 3,500 square feet of raised floor space. Data center power is provided by an independent feed from the university's power grid, with emergency power supplied by both diesel and natural gas generators. In addition, network connectivity and power supplies are redundant for all systems. These data centers contain over a petabyte of disk storage and a combined total of over 500 virtual and physical servers.
Network Operations Center
Our Network Operations Center (NOC) is located next to the main data center; it supports data center operations, provides centralized monitoring and alerting, and serves as our primary interface for incident management. The NOC is staffed 6:30am-11:00pm, Monday through Friday, and staff are on call during non-working hours. Open-source and proprietary monitoring software provide visibility and alerting for all components of our cyberinfrastructure 24x7x365.
Data Center Features
- Highly redundant and resilient design and architecture
- Diesel and natural gas generators in case of city power loss
- More efficient, reliable cooling and power delivery to computing equipment than typical server rooms
- Hot/cold aisle configurations that reduce the energy needed for cooling
- Secure facilities with access control and video surveillance
- Fire protection using an environmentally safe clean agent, with Emergency Power Off (EPO)
HPC
High Performance Computing (HPC) is hosted in our data center. Currently, HPC consists of the following:
- Sol, a shared condominium cluster, consisting of a total of 2,528 cores, 13.4TB RAM, 120 consumer-grade NVIDIA graphics processing units (GPUs) with a combined 1TB of GPU memory, and a 100Gbps InfiniBand fabric.
- Hawk, an NSF Campus Cyberinfrastructure grant-funded cluster (Office of Advanced Cyberinfrastructure award number 2019035), consisting of a total of 1,752 cores, 16.9TB RAM, and 32 NVIDIA T4 GPUs with a combined 512GB of GPU memory.
- Ceph, an expandable 2,026TB (675TB usable) storage resource based on the Ceph software, including 795TB funded by the NSF Campus Cyberinfrastructure grant.
- CephFS, a 94TB raw (45TB usable) scratch filesystem backed by solid-state drives, serving both Sol and Hawk; the relationship between raw and usable capacity is sketched below.
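The gap between raw and usable capacity reflects the overhead of data protection in Ceph. As a rough illustration only, assuming replica-based pools (the actual pool configuration, replication versus erasure coding, is not described here), the reported figures are roughly consistent with 3x replication for the Ceph resource and 2x replication for the CephFS scratch filesystem:

```python
# Hypothetical sketch: relating raw to usable capacity under an assumed
# replica-based protection scheme. The actual Ceph pool configuration
# (replication factor, erasure coding, reserved headroom) is not stated here.

def usable_capacity_tb(raw_tb: float, replicas: int, overhead: float = 0.0) -> float:
    """Estimate usable capacity: raw capacity divided by the replica count,
    minus an optional fractional allowance for metadata/headroom."""
    return raw_tb / replicas * (1.0 - overhead)

if __name__ == "__main__":
    # Ceph object storage: 2,026TB raw vs. 675TB usable ~ 3x replication.
    print(f"Ceph   : {usable_capacity_tb(2026, replicas=3):.0f}TB usable (reported: 675TB)")
    # CephFS scratch: 94TB raw vs. 45TB usable ~ 2x replication plus some headroom.
    print(f"CephFS : {usable_capacity_tb(94, replicas=2):.0f}TB usable (reported: 45TB)")
```

Erasure-coded pools would give a different raw-to-usable ratio, so this is a back-of-the-envelope check rather than a statement of how the pools are actually configured.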