March 1, 2023
OK, so data centers don’t use Duracell batteries (ours are much, much heavier, more expensive, and specialized). I just couldn’t resist the Matrix reference.
And what data center operator has time for downtime? Nobody, especially with the average cost of downtime hovering at nearly $8,000 per minute, according to a 2013 study from the Ponemon Institute. That same study found that 55% of data center outages were caused by (you guessed it) UPS battery failure.
Data center UPS (uninterruptible power supply) systems are backed by dozens of brick-like batteries, and if even one of them has a bad cell, it can take down the whole system. At that point, even a brief hiccup in power can lead to downtime for the entire data center. Despite all this, only 48% of surveyed operators regularly tested or monitored the health of their UPS batteries.
The two main types of UPS systems used in data centers are line-interactive and online, also called double-conversion. (Offline, or standby, UPSs pass utility power directly to the load and only switch to battery after an interruption, which introduces a delay between the power loss and the handoff to battery power.) Line-interactive UPSs use an autotransformer to regulate incoming voltage without drawing on the battery, and keep the inverter in line with the output. Online UPS systems are the most common in data centers: they are designed for loads of 10 kW and up, and they handle the larger currents involved in converting incoming AC to DC, recharging the batteries, and converting back to AC. Because the inverter is constantly connected to the batteries and always carries the load, there is a buffer between the power coming into the data center and the IT equipment.
Wet cell batteries are flooded lead-acid batteries with a liquid sulfuric acid electrolyte. They are often found in larger facilities because they remain reliable for long periods, up to twenty-five years. They need regular maintenance and a separate battery room as a safety precaution against acid spills. When a wet cell battery fails, it typically fails “closed” (shorted), so current still passes through it. Much like a strand of Christmas lights in which one dead bulb doesn’t darken the rest, the other batteries in the string keep working, albeit at a lower voltage.
Valve-regulated lead-acid (VRLA) batteries, by contrast, are common in many newer UPS systems. Their electrolyte is immobilized as a paste or gel (or absorbed in a glass mat) rather than held as a free liquid; they take longer to recharge and typically last only three to five years. With VRLA batteries, a single battery failure often takes down the entire string, because these batteries tend to fail open and break the series circuit.
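To make the difference between the two failure modes concrete, here is a simplified Python model of cells wired in series. It assumes a nominal 2.0 V per lead-acid cell and treats failures as ideal shorts (the typical flooded-cell failure described above) or ideal opens (the typical VRLA failure); it ignores internal resistance and load, and the names `CellState` and `string_voltage` are hypothetical.

```python
from enum import Enum
from typing import List

NOMINAL_CELL_VOLTS = 2.0  # assumed nominal voltage of one lead-acid cell


class CellState(Enum):
    OK = "ok"
    FAILED_SHORT = "failed short"  # fails "closed": still conducts, contributes ~0 V
    FAILED_OPEN = "failed open"    # breaks the circuit entirely


def string_voltage(states: List[CellState]) -> float:
    """Idealized voltage of cells wired in series; 0.0 if any cell has failed open."""
    if any(s is CellState.FAILED_OPEN for s in states):
        return 0.0  # no current path, so the whole string is lost
    healthy = sum(1 for s in states if s is CellState.OK)
    return healthy * NOMINAL_CELL_VOLTS


string = [CellState.OK] * 240          # 240 cells x 2.0 V = 480 V nominal
print(string_voltage(string))          # 480.0

string[10] = CellState.FAILED_SHORT    # flooded-style failure: string runs at lower voltage
print(string_voltage(string))          # 478.0

string[10] = CellState.FAILED_OPEN     # VRLA-style failure: string drops out
print(string_voltage(string))          # 0.0
```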
It’s clear that monitoring and testing batteries is vital if you want to avoid downtime. Using a combination of sensors and software, operators can keep logs and run analytics to estimate a battery’s actual remaining life. Newer UPS systems come with built-in testing functionality, and the testing schedule should be followed religiously. Third-party systems are also available, but make sure you choose a reputable, established vendor who can offer support, including hands-on testing and maintenance if needed. Monitoring and testing usually work either by measuring the internal resistance of each cell or by applying a test signal or load and measuring the voltage response.
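The internal-resistance approach can be sketched as a DC load-step measurement: record a cell’s resting voltage, apply a known load current, and divide the voltage drop by the current. This is a minimal Python sketch; the function names and the 25% rise-over-baseline threshold are hypothetical, and commercial monitors may use AC impedance or conductance methods instead.

```python
from typing import Dict, List


def internal_resistance_mohm(v_rest: float, v_load: float, i_load: float) -> float:
    """DC internal resistance in milliohms from a load step: R = (V_rest - V_load) / I."""
    if i_load <= 0:
        raise ValueError("load current must be positive")
    return (v_rest - v_load) / i_load * 1000.0


def cells_to_investigate(readings: Dict[str, float],
                         baselines: Dict[str, float],
                         max_increase: float = 0.25) -> List[str]:
    """Cells whose resistance has risen more than max_increase above their own baseline."""
    return [cell for cell, r in readings.items()
            if r > baselines[cell] * (1.0 + max_increase)]


# A cell resting at 2.15 V that sags to 2.10 V under a 50 A load:
print(round(internal_resistance_mohm(2.15, 2.10, 50.0), 3))  # 1.0

baselines = {"cell-1": 1.00, "cell-2": 1.00}
readings = {"cell-1": 1.10, "cell-2": 1.40}
print(cells_to_investigate(readings, baselines))  # ['cell-2']
```

Comparing each cell against its own baseline, rather than a single fixed number, reflects the logging-and-trending approach described above: a steady rise matters more than any one absolute reading.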
Alarms can also be configured to tie into DCIM (data center infrastructure management) software, alerting on voltage, impedance, temperature, recorded discharge events, and more. OEMs often publish recommended alert settings.
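A minimal sketch of threshold-based alerting follows. The field names and every threshold value are placeholders (use your OEM’s recommendations), and the returned list of messages stands in for whatever alerting interface your DCIM software actually exposes; no particular product’s API is assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Thresholds:
    # Placeholder limits: substitute your OEM's recommended values.
    min_cell_voltage: float
    max_cell_voltage: float
    max_impedance_increase: float  # allowed fractional rise over baseline, e.g. 0.25
    max_temperature_c: float
    max_discharges_30d: int        # discharge events recorded in the last 30 days


@dataclass
class CellReading:
    cell_id: str
    voltage: float
    impedance_mohm: float
    baseline_impedance_mohm: float
    temperature_c: float
    discharges_30d: int


def evaluate(reading: CellReading, t: Thresholds) -> List[str]:
    """Return one alert message per threshold the reading violates."""
    alerts: List[str] = []
    if not t.min_cell_voltage <= reading.voltage <= t.max_cell_voltage:
        alerts.append(f"{reading.cell_id}: voltage {reading.voltage:.2f} V out of range")
    limit = reading.baseline_impedance_mohm * (1.0 + t.max_impedance_increase)
    if reading.impedance_mohm > limit:
        alerts.append(f"{reading.cell_id}: impedance {reading.impedance_mohm:.2f} mOhm above {limit:.2f}")
    if reading.temperature_c > t.max_temperature_c:
        alerts.append(f"{reading.cell_id}: temperature {reading.temperature_c:.1f} C too high")
    if reading.discharges_30d > t.max_discharges_30d:
        alerts.append(f"{reading.cell_id}: {reading.discharges_30d} discharges in 30 days")
    return alerts


# Example with placeholder values only.
limits = Thresholds(2.13, 2.35, 0.25, 25.0, 5)
cell = CellReading("string-A/cell-17", 2.25, 1.40, 1.00, 27.5, 1)
for message in evaluate(cell, limits):
    print(message)  # impedance and temperature alerts for this reading
```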
Best practice is to have redundant battery strings for each UPS: if both strings work, this roughly doubles the backup runtime, and if one string fails, the other will still keep systems running for about half as long. If routine testing shows that one cell is failing, remove the entire string (disposing of it properly!) and replace it. Include this possibility in your DCOps budget, because mismatched batteries, or new batteries added to an older string, charge unevenly and will leave the new battery overcharged.
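The runtime arithmetic behind redundant strings can be sketched with a simple linear estimate: usable energy divided by load. The function name, the 95% inverter efficiency, and the example figures are assumptions. Real lead-acid discharge is nonlinear (a string discharged at a lower rate delivers somewhat more of its rated capacity), so treat this as a rough check against the manufacturer’s runtime tables, not a replacement for them.

```python
def estimated_runtime_minutes(string_energy_wh: float,
                              strings_online: int,
                              load_w: float,
                              inverter_efficiency: float = 0.95) -> float:
    """Linear runtime estimate; ignores discharge-rate effects, aging, and temperature."""
    if load_w <= 0:
        raise ValueError("load must be positive")
    usable_wh = string_energy_wh * strings_online * inverter_efficiency
    return usable_wh / load_w * 60.0


# Hypothetical figures: 20 kWh per string, 100 kW IT load.
both = estimated_runtime_minutes(20_000, 2, 100_000)
one = estimated_runtime_minutes(20_000, 1, 100_000)
print(f"both strings: {both:.1f} min, one string: {one:.1f} min")
# both strings: 22.8 min, one string: 11.4 min
```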
Frequent discharges, poor connections, too many recharge cycles, temperatures that are too high or too low, and overcharging can all shorten battery life significantly. If you’ve placed UPS systems in your data hall and are also following modern standards for a warmer data center floor, chances are you are exceeding the ideal temperature for your batteries and should consider a separate battery room kept below 75°F (about 24°C).
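To see why temperature matters, a commonly quoted rule of thumb (an approximation, not a specification) is that lead-acid battery life roughly halves for every 15°F (about 8.3°C) above a 77°F (25°C) reference. The sketch below applies that rule; the function names and default parameters are assumptions, and your battery vendor’s derating data should take precedence.

```python
def f_to_c(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def derated_life_years(rated_life_years: float,
                       temp_c: float,
                       reference_c: float = 25.0,
                       halving_step_c: float = 8.3) -> float:
    """Rule-of-thumb estimate: life halves for every halving_step_c above reference_c."""
    if temp_c <= reference_c:
        return rated_life_years  # no extra life credited for running cooler
    return rated_life_years * 0.5 ** ((temp_c - reference_c) / halving_step_c)


# A VRLA battery rated for 5 years, in a 75°F battery room vs. a 95°F hot aisle:
print(derated_life_years(5.0, f_to_c(75.0)))         # 5.0
print(f"{derated_life_years(5.0, f_to_c(95.0)):.1f}")  # 2.2
```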
The best defense against battery failure is a combination of monitoring and testing, performed both onsite and remotely. A little standard process and a dedicated operations staff can help you avoid an expensive case of downtime.
Posted by Director of Data Centers & Compliance Art Salazar