11/08/2024 by Willy Rojas and Dustin O'Bier
Survive Digital Disruption: CrowdStrike Global Outage Takeaways
In a world so reliant on digital technology, we often expect (and hope) that our software will be stable. Yet even the most reliable technology platforms can falter. Take, for example, the disruption businesses felt during the recent CrowdStrike global outage.
The recent global outage involving CrowdStrike, one of the world’s leading cybersecurity companies, was a stark reminder that no system is entirely immune to disruption. It’s a harsh reality, but one we need to face head-on. Here’s how the recent outage affected businesses and, most importantly, some essential tips to ensure your operations remain resilient. Read on to safeguard your company’s digital future.
CrowdStrike Global Outage Event
On July 19, 2024, CrowdStrike released an update for its Falcon Sensor software. The update caused a significant global IT outage, crashing millions of Windows computers and displaying what you might otherwise know as the “Blue Screen of Death.”
Around 8.5 million systems worldwide were affected. The outage interrupted businesses of all kinds, including airlines, healthcare, banks, and more. Here are a few examples of the disruption the outage caused.
Airlines
At LaGuardia Airport, the outage brought down the baggage handling system, leading to significant delays and widespread operational disarray. Wait times were extensive, and many passengers missed their flights.
Delta Air Lines was the largest airline affected by the outage. The company had to reset over 40,000 servers and manually cancel 5,000 flights, losing over $500 million in revenue.
Healthcare
Hospitals like the Mayo Clinic, Cleveland Clinic, and Mass General Brigham faced system crashes that affected patient care and administrative functions. Electronic health records went offline, delaying medical procedures and patient admissions. The estimated financial impact on the healthcare sector alone was around $1.94 billion.
Banking
Financial institutions like JPMorgan Chase and Bank of America suffered considerable downtime. Transactions, online banking services, and customer support were affected. The inability to process transactions led to customer dissatisfaction and financial losses. The estimated impact on the banking sector contributed heavily to the global economic damage, which totaled at least $10 billion.
Preventing Digital Disruption in Your Business
The CrowdStrike global outage underscored the importance of being prepared for the unexpected. While such events may be rare, businesses should understand that no service is exempt from disruption. But don’t panic. You can use the practices below to reduce any impact should an incident like the CrowdStrike outage ever happen.
Have a Strong Incident Response Plan
A well-structured Incident Response Plan (IRP) is crucial for navigating outages. For many businesses, the CrowdStrike outage was a wake-up call about the importance of having a detailed IRP. Most organizations now have plans for cyber threats but remain unprepared for a service outage.
Organizations need well-defined and practiced IRPs. An effective IRP ensures faster recovery and coordinated actions during outages.
A proper incident response plan should have the following components:
- The organization’s incident response strategy and how it supports business objectives
- Roles and responsibilities involved in incident response
- Procedures for each phase of the incident response process
- Communication procedures within the incident response team, with the rest of the organization, and external stakeholders
- How to learn from previous incidents to improve the organization’s security posture
Without a solid IRP, chaos can arise when essential tools and services go down. Time is critical in these situations. Companies that don’t respond to incidents quickly often face increased downtime and direct revenue loss.
According to a SANS report, companies without a proper IRP take 54 percent longer to contain incidents that cause downtime. Additionally, a study from the Ponemon Institute found that organizations without an effective IRP team experienced 54 percent more downtime compared to those with one.
Having an IRP in place is crucial, but the second most important aspect is testing it often! This ensures that the plan is effective, team members are familiar with their roles, and potential gaps are identified before an actual incident occurs.
Practice the IRP annually and again after any major change to your process. Planning and organization are the best ways to mitigate significant disruption. It’s always best to be prepared for the worst!
Review Your Software Deployment Practices
Deploying new software can be a very complex process, especially when it depends on other applications or systems. The CrowdStrike global outage stemmed directly from this kind of dependency: the Falcon Sensor update ran on the Windows operating system and crashed the computers it was installed on.
Here are some best practices for establishing an effective deployment process.
Develop a Patch Management Policy
Define a comprehensive policy that details the procedure for managing patches, specifying roles, responsibilities, and schedules.
Inventory Assets
Keep an updated inventory of all hardware and software assets that need patching.
Prioritize Patches
Assess and schedule patches based on the severity of vulnerabilities and the importance of the systems they impact.
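One way to make this prioritization concrete is to score each pending patch. The Python sketch below is a minimal illustration, not a standard method. It assumes a hypothetical 0 to 10 severity score (for example, a CVSS base score) and a hypothetical 1 to 5 criticality rating that you assign to each system. The names Patch, priority, and prioritize, along with the sample patches, are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    name: str
    severity: float   # assumed 0-10 scale, e.g. a CVSS base score
    criticality: int  # assumed 1-5 rating of the affected system


def priority(patch: Patch) -> float:
    """Hypothetical scoring rule: severity weighted by system criticality."""
    return patch.severity * patch.criticality


def prioritize(patches: list[Patch]) -> list[Patch]:
    """Return patches ordered from most to least urgent."""
    return sorted(patches, key=priority, reverse=True)


# Made-up pending patches, for illustration only.
pending = [
    Patch("office-suite-update", severity=4.0, criticality=2),
    Patch("vpn-gateway-fix", severity=9.8, criticality=5),
    Patch("payroll-db-patch", severity=7.5, criticality=4),
]

for p in prioritize(pending):
    print(f"{p.name}: {priority(p):.1f}")
```

The multiplication is only one possible weighting. Your patch management policy may call for a different rule, such as always patching internet-facing systems first.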
Test Patches
Before deploying patches to production environments, test them in a controlled setting. This way, you can catch problems before they reach the systems your business depends on.
Automate Patch Deployment
Automated tools are excellent for streamlining the patch deployment process. This can reduce the risk of human error and ensure timely updates.
Monitor and Audit
Continuously track the patching process and audit patch deployments to ensure compliance and effectiveness.
Have a Rollback Plan
Have a rollback plan in place. This allows you to revert to a previous state should a patch cause problems.
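As a rough illustration of how testing and rollback can fit together, the Python sketch below patches a small test group first and reverts every patched host if a health check fails. Everything in it is hypothetical: apply_patch, revert_patch, and health_check stand in for whatever your deployment tooling provides, and the in-memory versions dictionary stands in for real hosts.

```python
from typing import Callable

# In-memory stand-in for real hosts: host name -> installed version.
versions: dict[str, str] = {"test-01": "1.0", "prod-01": "1.0", "prod-02": "1.0"}


def apply_patch(host: str, new_version: str) -> str:
    """Install new_version on host and return the version it replaced."""
    previous = versions[host]
    versions[host] = new_version
    return previous


def revert_patch(host: str, previous: str) -> None:
    """Restore the version that was installed before the patch."""
    versions[host] = previous


def staged_rollout(
    stages: list[list[str]],
    new_version: str,
    is_healthy: Callable[[str], bool],
) -> bool:
    """Patch one stage at a time; roll back every patched host on failure."""
    patched: list[tuple[str, str]] = []  # (host, previous version)
    for stage in stages:
        for host in stage:
            patched.append((host, apply_patch(host, new_version)))
        if not all(is_healthy(host) for host in stage):
            # Undo in reverse order so each host returns to its prior state.
            for host, previous in reversed(patched):
                revert_patch(host, previous)
            return False
    return True


def health_check(host: str) -> bool:
    """Hypothetical check: pretend version 2.0 breaks the test host."""
    return not (host == "test-01" and versions[host] == "2.0")


ok = staged_rollout([["test-01"], ["prod-01", "prod-02"]], "2.0", health_check)
print(ok, versions)
```

Because the test host fails its check, the rollout stops before touching the production hosts and the test host returns to its earlier version. A real rollback also depends on backups or snapshots taken before patching, which this sketch does not model.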
Keep Up with Vendor Patch Schedules
Stay informed about each vendor’s patch release schedule to plan and prepare for upcoming updates.
Document Everything
Keep detailed records of all patching activities, including what was patched, when, and by whom.
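A patch log does not need to be elaborate. The Python sketch below shows one possible record format, written as one JSON object per line so the log is easy to search later. The field names, the patch ID KB-EXAMPLE-001, the host prod-01, and the operator jdoe are all placeholders, not a prescribed schema.

```python
import json
import sys
from datetime import datetime, timezone
from typing import TextIO


def patch_record(patch: str, host: str, operator: str, outcome: str) -> dict:
    """Build one audit record: what was patched, when, and by whom."""
    return {
        "patch": patch,
        "host": host,
        "operator": operator,
        "outcome": outcome,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def append_record(stream: TextIO, record: dict) -> None:
    """Write the record to the stream as a single JSON line."""
    stream.write(json.dumps(record) + "\n")


# Placeholder values; printed to the console here for demonstration.
record = patch_record("KB-EXAMPLE-001", "prod-01", "jdoe", "success")
append_record(sys.stdout, record)
```

In practice, you would likely pass a file opened in append mode, or send the record to your ticketing or logging system, instead of printing it.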
Assess Your Vendor Relationships
Periodic assessment of third-party vendors is necessary to ensure resilience. The CrowdStrike outage has prompted many organizations to reconsider their vendors. Businesses should assess vendor relationships to confirm they meet the organization’s risk tolerance and operational needs.
Scheduling annual assessments can help keep this task from being forgotten. Each assessment should include a vendor risk assessment and a security questionnaire.
Vendor Risk Assessment
This assessment evaluates the potential risks that the vendor may introduce to your organization. This includes understanding the vendor’s operations, data handling practices, and risk profile.
Security Questionnaire
This should be comprehensive and help you understand the vendor’s security policies, practices, and standards. Topics may include encryption, incident response, access controls, and employee security training.
Consider Diversification
Companies may wish to review their diversification processes when relying on critical software to run their operations. As highlighted by the CrowdStrike outage, over-dependence on a single software solution can expose the business to significant risks. This can include operational disruptions due to software failures, security vulnerabilities, or vendor instability.
Software diversification keeps you from relying on a single system. This can provide contingency options and flexibility in the face of unexpected challenges. By incorporating several complementary software tools or services, companies can enhance resilience, maintain business continuity, and mitigate potential risks.
Building Digital Resilience in the Face of Uncertainty
While incidents like the CrowdStrike outage are rare, their impact can be severe. The unpredictability of such events can be scary, but with these proactive practices, there’s far less to fear. Remember, a resilient business is a prepared business. By taking these steps, you can protect your operations and buckle in for a smoother ride in the digital landscape.
ABOUT THE AUTHORS
Willy Rojas
Engineer II, Infrastructure
Willy has been with Trinity Logistics for eight years. He has held several IT positions here, starting as a Service Desk Intern and moving through Senior Service Desk and IT Systems Administrator II to Infrastructure Engineer II. Willy finds cybersecurity fascinating because it changes often, giving him something new to learn. He also finds satisfaction in knowing that the work he does every day is important, from keeping confidential information secure to keeping business operations running smoothly.
Dustin O’Bier
Manager, Infrastructure
Dustin has been working at Trinity for 21 years. Previous positions he’s held include Help Desk Specialist, System Administrator I, Senior System Administrator, and IT Systems Manager. Dustin enjoys the collaborative aspect of his role. He loves working alongside a team of people to solve complex problems. Dustin’s main focuses as Manager in Infrastructure cover three distinct aspects: the Core Infrastructure of the Regional Service Centers (RSCs), Security, and the company’s Cloud practice. He finds each focus brings a unique set of challenges, making his role dynamic and engaging.