How’s & Why’s Of Datacenter Tiers

There’s Much More To Consider Than Just Raw Computing Power

When it comes to datacenters, being No. 1 is not the best. Datacenters typically are ranked in tiers, with Tier I the most basic and prone to downtime and Tier IV the most robust, redundant, and functional.

There are several infrastructure elements to consider beyond raw computing power and the number of fiber-optic pipes in and out of the datacenter. These site infrastructure features include power, cooling, and emergency backup capacity and functionality, height of the raised flooring, fire suppression, and both logical and physical security.

Customers often come to us for help with datacenter consolidation or new datacenter implementation projects. The discussion quickly comes around to the appropriate “Tier Level” for their IT facilities. What we’re talking about here (to a large extent) is an industry-standard way of describing the availability of the datacenter facility. Availability, in this case, refers to the degree to which the facility can support constant, uninterrupted operation of the data processing systems it houses.

We know that the systems themselves can be architected with high-availability configurations. Autonomous failover of network connections, clustered server environments, and so on are ways that the systems can sustain operation even if, say, a server crashes. What the Tier Levels of a datacenter refer to, though, is the capability of the facility itself to support the systems it serves. Utility power can fail, the temperature in the building can rise far enough to damage equipment, and so on. These are facilities issues, and they are the foundation upon which any amount of data processing fault-tolerance stands.

A bit of history on standards for datacenter facilities
As with most areas of business, we rely on industry standards as referenceable qualifications and guidelines to avoid the dangers and ambiguities of qualitative descriptions. Instead of “very robust,” “fully fault-tolerant,” “top notch,” or “Class-A,” we use industry standards to articulate quantifiable guidelines for exactly how “top notch” a certain thing really is.

The American National Standards Institute (ANSI) and the Telecommunications Industry Association (TIA) are examples of organizations that formulate standards for the industry to follow. The TIA developed a specification entitled TIA-942: Telecommunications Infrastructure Standard for Data Centers. This is perhaps the most widely referenced standard when talking about datacenter facility availability. From the title, one might think this is purely a telecom specification for datacenters. It is that, but it is also much broader, covering cabling, space layout, site selection criteria, and infrastructure tiers/availability. This last topic, covered in Annex G of the TIA-942 standard, is where all the talk about datacenter tiers comes from.

As it turns out, the TIA relied upon an organization called The Uptime Institute (or “the Institute” for short) to develop this part of the standard. The Institute’s charter is to provide research-based information on high-density computing and mission-critical facilities in a vendor-neutral manner. As such, the Institute has become the industry’s trusted source of information in this regard. The Institute continually gathers benchmark data by surveying existing datacenters and datacenter projects. Its research includes models for estimating implementation costs and Total Cost of Ownership (TCO) for datacenter projects.

Perhaps the one piece of research that most strongly defines the Institute’s work is its four-tier system for classifying datacenter capabilities. Thus, when we hear people speak of, say, a “Tier-4 Data Center,” they are referring to the tier classifications from TIA-942 or the Uptime Institute.

The Tier Classifications for Datacenters
The tier classification model provides an objective basis for comparing or describing the functionality, capacity, and cost of a datacenter’s facility architecture. In particular, the tier classification model is focused on the Availability of the facility itself, and is driven by the infrastructure to power and cool the data processing environment.

The power and cooling capabilities of a facility are delivered by its Mechanical, Electrical, and Plumbing (MEP) infrastructure. The Mechanical systems provide cooling to the environment in which the data processing equipment is installed. They comprise air handlers, air conditioners, chillers, plenums to channel airflow, and so on. The Electrical systems provide the power to the data processing equipment. They comprise the utility service to the facility, transfer switches, generators and Uninterruptible Power Supplies (UPS), batteries, Power Distribution Units (PDUs), load banks, breaker panels, bus duct, copper cabling, and so on. The Plumbing systems support the Mechanical and Electrical systems by routing cabling, air, water, fire suppression gases, and so on. A facility has multiple plumbing circuits, which together are analogous to the vascular system of the building.

Very simply put, the tier classifications refer to the degree of resilience the facility has to failures of MEP systems. Resilience to failures is provided by redundancy and topology of the infrastructure design. In the tier classification model, a Tier-1 facility is the least resilient and a Tier-4 is the most resilient. Said another way, a Tier-1 facility has the lowest availability and a Tier-4 has the highest availability. Said yet another way, a Tier-1 facility carries the highest risk to the business and a Tier-4 carries the lowest risk to the business.

Tier 1: Basic Datacenter Infrastructure
A Tier-1 facility has no redundant capacity components. It provides basic power and cooling to the data processing footprint with no excess capacity for backup or failover, and has no redundancy in the MEP distribution paths.

In this type of facility, any unplanned outage or failure of a capacity component or distribution element will impact the data processing equipment and end-users. Whenever maintenance is needed for the MEP infrastructure (utility work, replacement of components, certification testing, preventative maintenance, and so on) the impact is just as if there were an unplanned outage. All systems and users are affected.

As per Institute benchmark data, Tier-1 sites typically experience two separate 12-hour site-wide shutdowns per year for repair work. In addition, Tier-1 sites typically experience 1.2 equipment or distribution component failures on average each year. Statistically, this means 28.8 hours of downtime per year, or 99.67% availability.
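The arithmetic behind these figures can be checked with a short sketch. It assumes a non-leap year of 8,760 hours; the function and variable names are illustrative, and the roughly four hours per component failure is inferred from the quoted totals rather than stated in the benchmark data.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)

def availability_pct(downtime_hours: float) -> float:
    """Percentage of the year the facility is up, given annual downtime in hours."""
    return 100.0 * (1.0 - downtime_hours / HOURS_PER_YEAR)

planned_hours = 2 * 12      # two 12-hour site-wide shutdowns per year
total_hours = 28.8          # annual downtime quoted above
unplanned_hours = total_hours - planned_hours   # about 4.8 hours
hours_per_failure = unplanned_hours / 1.2       # about 4 hours (inferred, not quoted)

print(f"Tier-1 availability: {availability_pct(total_hours):.2f}%")
print(f"Implied hours per failure: {hours_per_failure:.1f}")
```

The same function applies to the downtime figures quoted for the other tiers.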

Tier 2: Datacenter with Redundant Capacity Components
A Tier-2 datacenter has redundant capacity components, but only a single non-redundant distribution path serving the data processing equipment. The benefit of this level is that any redundant capacity component can be removed from service on a planned basis (e.g., for preventative maintenance) without causing the data processing to be shut down. However, an unplanned outage or failure of any capacity component or any disruption to the distribution path may impact the computer equipment.

On average, Tier-2 sites have one unplanned outage per year, and schedule three maintenance activities over a two-year period. The annual impact to operations is 22 hours of downtime per year, or 99.75% availability.

Tier 3: Concurrently Maintainable
A Tier-3 datacenter has redundant capacity components and multiple independent distribution paths serving the data processing footprint. There is sufficient MEP capacity to meet the needs of the data processing systems even when one of these redundant MEP components has been removed from the infrastructure. In a Tier-3 datacenter, maintenance activities and certain unplanned events can occur without interruption to the computing systems.
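The capacity side of this requirement can be illustrated with a minimal sketch. The function name and the chiller sizes are hypothetical, and the check covers only capacity: it says nothing about whether the distribution paths are independent, which the tier definition also requires.

```python
def survives_single_removal(unit_capacities_kw: list[float], load_kw: float) -> bool:
    """True if the load is still covered after the largest single unit is removed."""
    if not unit_capacities_kw:
        return False
    # Worst case for capacity: the biggest unit is the one taken out of service.
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= load_kw

# Hypothetical cooling plant: 200 kW chillers serving a 400 kW load.
print(survives_single_removal([200, 200, 200], 400))  # True
print(survives_single_removal([200, 200], 400))       # False
```

Removing the largest unit is the conservative case; if the load survives that, it survives the removal of any other single unit as well.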

Because Tier-3 facilities are concurrently maintainable, no annual shutdowns for routine maintenance are required. This allows very aggressive preventative maintenance programs to be implemented, further extending the operational life of the MEP components. The Institute has concluded that Tier-3 datacenters have unplanned events totaling only 1.6 hours per year. Tier-3 sites, then, deliver 99.98% availability.

Notice that both the Tier-1 and Tier-2 levels deliver “two-nines” availability, but the step to Tier-3 delivers “three-nines.” This is a big improvement in uptime.
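The “nines” shorthand can be derived directly from annual downtime. The sketch below is one way to count them, again assuming an 8,760-hour year; the function name is illustrative.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)

def nines(downtime_hours: float) -> int:
    """Count the leading nines in availability, given annual downtime in hours."""
    if downtime_hours <= 0:
        raise ValueError("downtime must be positive to count nines")
    unavailability = downtime_hours / HOURS_PER_YEAR
    return math.floor(-math.log10(unavailability))

# Downtime figures quoted above for Tiers 1 through 3.
for tier, hours in [(1, 28.8), (2, 22.0), (3, 1.6)]:
    print(f"Tier {tier}: {nines(hours)} nines")
```

Working from downtime hours rather than the rounded percentages avoids rounding effects near a boundary such as 99.9%.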

Tier 4: Fault Tolerant
Tier-4 facilities have multiple, independent, and physically separate systems that each have redundant capacity components and multiple, independent, diverse, AND active distribution paths supporting the data processing footprint. In a Tier-4 datacenter, any single failure of an MEP component or distribution path has no negative impact to the data processing systems, and the infrastructure automatically responds to the failure to prevent further impact to the facility.

Because of the degree of redundancy and fault-tolerance in Tier-4 infrastructures, facility-related failures that impact the data processing equipment are statistically reduced to 0.8 hours per year. This yields 99.99% availability.
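When the conversation turns to choosing a tier, the downtime figures quoted in this article can be treated as a simple lookup. The sketch below is illustrative only: the names are hypothetical, the figures are the benchmark averages given above rather than guarantees, and a real decision would also weigh cost and business risk.

```python
from typing import Optional

# Typical annual facility downtime per tier, in hours, as quoted in this article.
TIER_DOWNTIME_HOURS = {1: 28.8, 2: 22.0, 3: 1.6, 4: 0.8}

def lowest_tier_within(max_downtime_hours: float) -> Optional[int]:
    """Return the lowest tier whose typical downtime fits the budget, else None."""
    for tier in sorted(TIER_DOWNTIME_HOURS):
        if TIER_DOWNTIME_HOURS[tier] <= max_downtime_hours:
            return tier
    return None

print(lowest_tier_within(24.0))  # 2
print(lowest_tier_within(2.0))   # 3
print(lowest_tier_within(0.5))   # None
```

A result of None means that no tier's typical facility downtime fits the budget, so the remaining gap would have to be addressed elsewhere.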