In what is being called the largest IT outage in history, a flawed CrowdStrike security update caused Windows devices to “blue screen,” making them nonfunctional. As we all heard and many experienced, this outage affected 911 services, airlines, airports, hospitals, financial systems and more.
Even though CrowdStrike is a security tool used to protect endpoints against cybersecurity threats, this was not a security event. According to the company’s website, it was a “defect found in a single content update for Windows hosts.” With cybersecurity top of mind in today’s world, the nature of this outage drives home the fact that proper IT monitoring and the right processes and procedures are critical to recovering from such an event.
K2 Services does not use CrowdStrike, but some of our clients do. As stewards of their environments, we worked to minimize the effect of this event on clients’ day-to-day operations. As the flawed update started to roll out, K2 monitoring tools began alerting on the affected systems. As the K2 Network Operations Center (NOC) began to correlate alerts, it became clear that something larger was happening across client environments. The K2 NOC team quickly initiated our Major Incident Management process, with our engineers and team leaders immediately engaging and working through the issues caused by this unprecedented event. Thanks to this quick and coordinated reaction, our affected clients were fully recovered by 9 a.m. Central time.
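To illustrate how alerts from separate environments can point to a single, larger event, here is a minimal Python sketch. It is not K2's actual tooling: the `Alert` fields, the `find_major_incidents` function, the 15-minute window and the three-client threshold are all hypothetical choices made for the example. The idea is simply to group alerts by a shared signature and flag any signature seen at several distinct clients within a short time window.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert record; real monitoring tools expose richer data.
    client: str
    host: str
    signature: str  # e.g. "windows_bsod"
    timestamp: datetime


def find_major_incidents(alerts, window=timedelta(minutes=15), min_clients=3):
    """Return {signature: sorted client names} for every signature seen at
    `min_clients` or more distinct clients within one sliding `window`."""
    by_signature = defaultdict(list)
    for alert in alerts:
        by_signature[alert.signature].append(alert)

    incidents = {}
    for signature, group in by_signature.items():
        group.sort(key=lambda a: a.timestamp)
        left = 0
        for right in range(len(group)):
            # Shrink the window until it spans at most `window` of time.
            while group[right].timestamp - group[left].timestamp > window:
                left += 1
            clients = {a.client for a in group[left:right + 1]}
            if len(clients) >= min_clients:
                incidents[signature] = sorted(clients)
                break  # one match is enough to escalate this signature
    return incidents
```

In practice, a match like this would be a prompt for a human to decide whether to declare a major incident, not an automatic trigger. The thresholds would need tuning to the number of environments being monitored.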
Now that the dust has settled, this event is a strong reminder that reviewing security incident management and engaging in frequent recovery tabletop exercises are critical in our technology-driven world. Because K2 proactively monitors and manages our clients’ network environments, we were quickly alerted that the event was widespread across many of them. Because K2 has a structured Incident Management policy in place, our team was proactive, coordinated and as efficient as possible in determining next steps. The team implemented the needed remediation and validated a working state to get customers back to business as usual.
In an after-action report, CrowdStrike outlined a structured process review and improvements intended to reduce the risk of similar events in the future. Steps outlined in the report included phased rollouts, starting with a “canary” deployment to gauge the effects of changes before pushing updates to more systems, as well as additional independent third-party code reviews.
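The phased rollout idea can be sketched in a few lines of Python. This is an illustration of the general canary technique, not CrowdStrike's implementation: the `phased_rollout` function, the `deploy` and `is_healthy` callbacks, the phase fractions and the failure threshold are all assumptions made for the example.

```python
def phased_rollout(hosts, deploy, is_healthy,
                   phases=(0.01, 0.10, 0.50, 1.0), max_failure_rate=0.02):
    """Deploy to a growing share of `hosts`, halting if a phase is unhealthy.

    `deploy(host)` applies the update; `is_healthy(host)` reports whether the
    host is still working afterwards. Both are supplied by the caller.
    """
    updated = []
    start = 0
    for fraction in phases:
        # Each phase covers hosts up to `fraction` of the fleet and always
        # adds at least one new host (the first phase is the canary).
        end = min(len(hosts), max(start + 1, int(len(hosts) * fraction)))
        batch = hosts[start:end]
        if not batch:
            continue
        for host in batch:
            deploy(host)
        updated.extend(batch)
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            # Stop before the update reaches the rest of the fleet.
            return {"status": "halted", "failed_phase": fraction,
                    "updated": updated}
        start = end
    return {"status": "complete", "updated": updated}
```

With 100 hosts and the default phases, a faulty update would be stopped after reaching a single canary host instead of the whole fleet. The trade-off is slower delivery of legitimate updates, which is why phase sizes and the wait time between phases are usually tuned per update type.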
As a result of this event, K2 will conduct an internal review of our Major Incident Management response plan to identify improvements and further reinforce our security and incident response posture. We will also run internal exercises that incorporate lessons learned, to continuously improve our operational response and recovery procedures.
If you’re interested in discussing the security recovery policies and procedures your organization should have in place, or if you’d like to review your existing security posture, K2 Services can assist. Reach out to us today.