Document, Document, Document: Change Management & More in the Data Center
In the IT world, if it isn’t logged or documented, it might as well never have happened. Without proper change tracking, even for routine processes, it can be impossible to discover why a system stopped working. Worse, technicians might be stuck halfway through a switch upgrade, unable to retrace their steps when they realize the equipment install won’t work, or an entire organization could be held legally accountable for a compliance failure.
IT documentation, in other words, is an essential if occasionally painstaking piece of data center operations. At Green House Data, we document everything we possibly can. Support requests and internal emergency responses are always tracked in a ticket. Planned changes go through a formal change management process so that we can keep track of each change and learn from it.
Any time an engineer, technician, or data center operations staff member needs to make a change to physical or logical pieces of the data center, they must follow a five-step process. Even something as benign as changing a system clock must go through change management.
Schedule & plan – multiple changes should not take place at the same time, because if something goes wrong, it will be very difficult to trace the problem back to a single cause. For example, the cause could be a vCloud update, or it could be the new switch that was being installed in the data center at the same time.
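The scheduling rule above can be checked mechanically. The following is a minimal sketch in Python, not a description of any actual tooling: the ChangeWindow type, the ticket IDs, and the dates are all hypothetical, and it assumes every planned change has a known start and end time.

```python
from datetime import datetime
from itertools import combinations
from typing import List, NamedTuple, Tuple


class ChangeWindow(NamedTuple):
    """A scheduled change. Field names are illustrative assumptions."""
    ticket: str
    start: datetime
    end: datetime


def find_conflicts(windows: List[ChangeWindow]) -> List[Tuple[str, str]]:
    """Return pairs of ticket IDs whose scheduled windows overlap."""
    conflicts = []
    for a, b in combinations(windows, 2):
        # Two half-open intervals [start, end) overlap when each one
        # starts before the other ends.
        if a.start < b.end and b.start < a.end:
            conflicts.append((a.ticket, b.ticket))
    return conflicts


# Hypothetical schedule: a vCloud update, a switch install, and an unrelated change.
schedule = [
    ChangeWindow("CHG-1", datetime(2024, 1, 10, 22, 0), datetime(2024, 1, 11, 0, 0)),
    ChangeWindow("CHG-2", datetime(2024, 1, 10, 23, 0), datetime(2024, 1, 11, 1, 0)),
    ChangeWindow("CHG-3", datetime(2024, 1, 12, 22, 0), datetime(2024, 1, 12, 23, 0)),
]

print(find_conflicts(schedule))  # [('CHG-1', 'CHG-2')]
```

Treating the windows as half-open means a change that starts exactly when another ends is not reported as a conflict. A stricter policy could add a buffer between changes.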
At this time a SMOP is also created. SMOP is a tongue-in-cheek programming term meaning Simple Matter of Programming (it usually isn’t so simple), used to suggest additional features or code edits. We use it more as a Simple Method of Procedure, so to speak. It spells out the complete plan for the change: exactly what will happen, step by step, and what will happen if something goes wrong, including backup plans or ways to back out of the process and revert to the original state.
Once a SMOP has been created for a given process, it is often reused, or copied and modified. SMOPs can therefore become a roadmap or a record of more-or-less standard procedure.
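To make the idea concrete, here is one way a SMOP could be represented as data. This is only a sketch under stated assumptions: the Step and Smop classes, their field names, and the switch upgrade steps are hypothetical and are not taken from any real SMOP.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    action: str    # what will happen at this step
    rollback: str  # how to undo this step if something goes wrong


@dataclass(frozen=True)
class Smop:
    title: str
    steps: Tuple[Step, ...] = ()

    def backout_plan(self, failed_at: int) -> List[str]:
        """Rollback actions, most recent step first, for steps 0..failed_at."""
        return [s.rollback for s in reversed(self.steps[: failed_at + 1])]


# Hypothetical SMOP for a switch upgrade.
switch_upgrade = Smop(
    title="Upgrade switch (hypothetical example)",
    steps=(
        Step("Back up running config", "No action needed"),
        Step("Install new firmware", "Reinstall previous firmware"),
        Step("Reboot switch", "Restore backed-up config and reboot"),
    ),
)

# Reuse: copy the existing SMOP and modify it for a similar change.
second_upgrade = replace(switch_upgrade, title="Upgrade second switch (hypothetical example)")

# If the firmware install (step index 1) fails, back out in reverse order.
print(switch_upgrade.backout_plan(failed_at=1))
# ['Reinstall previous firmware', 'No action needed']
```

Pairing every step with its own rollback keeps the back-out path explicit, and making the records immutable means a copied SMOP cannot silently alter the one it was copied from.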
Review – once the ticket is created and a time and date are set for the change, the SMOP makes its way to our Change Management Committee. The members and director of the committee rotate throughout our senior staff to avoid burnout and to keep the committee from taking too much time away from other duties. The committee reviews the SMOP at its next meeting, checking whether anything was overlooked and whether the change will impact anything else going on at the time.
During the review, the committee checks the change for:
- risk analysis
- criticality – how important is this change?
- resources required, both equipment/software and personnel
- whether the change is internal or external
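The review checklist above could be captured as a simple record that reports which items are still unanswered. This is a hedged sketch only: the ChangeReview class and its field names are hypothetical, chosen to mirror the four checklist items.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangeReview:
    """One review record. Field names are illustrative assumptions."""
    risk_analysis: Optional[str] = None       # summary of what could go wrong
    criticality: Optional[str] = None         # how important is this change?
    equipment_software: Optional[str] = None  # resources: equipment/software
    personnel: Optional[str] = None           # resources: people
    scope: Optional[str] = None               # "internal" or "external"

    def missing_items(self) -> List[str]:
        """Names of checklist items that are empty or invalid."""
        missing = [name for name, value in vars(self).items() if not value]
        # A scope value other than the two allowed options also counts as missing.
        if self.scope and self.scope not in ("internal", "external"):
            missing.append("scope")
        return missing


review = ChangeReview(
    risk_analysis="Low; rollback tested",
    criticality="high",
    scope="external",
)
print(review.missing_items())  # ['equipment_software', 'personnel']
```

An empty result from missing_items() would mean every checklist item has been filled in. It says nothing about whether the answers are good, which remains the committee's judgment.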
Announce – if the change will affect customers, we have to announce it to them via e-mail. The planned change is sent to the Client Services team along with a description of the event, the timing, who will be impacted, and how it will impact them. Client Services then notifies the relevant customer list.
Implement – the staff member in charge of the operation carries out the change, following the SMOP.
Reflect – the next change management meeting reviews the process to see if anything was discovered that needs to be addressed, or to plan and reschedule the process if it was walked back. Any pain points or difficulties are incorporated into future SMOPs.
Security Events & Other Logs
Change management isn’t the only protocol or event that requires documentation. Data centers must track everything happening in the building to maintain top-tier security and compliance with HIPAA and other standards.
That means keeping careful logs of visitors (often manual); documentation from Building Management Systems, including automatic logs of entrances to and exits from data center floors and other secure areas; equipment failure notices from monitoring tools or Data Center Infrastructure Management (DCIM) platforms; and security logs and alerts from firewalls and other tools.
Any tickets generated from these events and from change management are reviewed during our annual audits. Auditors notice small details. For example, an internal ticket, or a ticket that doesn’t affect customer systems, will not have a customer notification, but if we didn’t list the reason for the missing notification within the ticket, it will be flagged during the audit.
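The notification rule in that example is easy to pre-check before an audit. Below is a minimal sketch, assuming a hypothetical Ticket record with a notification flag and a free-text reason field. It is not modeled on any particular ticketing system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are assumptions."""
    ticket_id: str
    customer_notified: bool
    no_notification_reason: str = ""


def audit_flags(tickets: List[Ticket]) -> List[str]:
    """IDs of tickets with no customer notification and no stated reason."""
    return [
        t.ticket_id
        for t in tickets
        if not t.customer_notified and not t.no_notification_reason.strip()
    ]


tickets = [
    Ticket("T-1", customer_notified=True),
    Ticket("T-2", customer_notified=False,
           no_notification_reason="Internal change; no customer systems affected"),
    Ticket("T-3", customer_notified=False),
]

print(audit_flags(tickets))  # ['T-3']
```

Running a check like this when a ticket is closed, rather than at audit time, would surface the missing reason while the person who made the change still remembers it.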
Documentation is essential evidence. Beyond covering data center providers in case something goes amiss, it also creates learning opportunities and increases operational efficiency. Every time we perform a new process, or often even an old one, we can learn ways to do it faster and with less of an effect on other systems.