Dear All,
The recent CrowdStrike incident highlights a risk that will only grow over time, as the IT industry moves towards ever deeper digitalisation: cloud services, document management software, IoT monitoring, automation, artificial intelligence engines and more.
At DWORKIN®, we have been preparing for incidents like this one, working proactively with several of our customers to mitigate potential issues and create Business Continuity Plans.
As soon as the news broke, a company-wide alert was sent out and we immediately began an internal discussion to assess the situation. Our approach combined proactive and reactive measures, and we contacted customers directly to understand the impact and scope of the problem. Senior engineers were placed on standby, researching the recommended solutions to get businesses back up and running. The recommended fix, however, required manual intervention on each affected device.
At the same time, affected customers reached out to us directly, and we promptly dispatched engineers to provide support and resolve the problem. Behind the scenes, our engineering team worked to gather more details and build a thorough understanding of the solution.
The issues our customers faced were varied. Here are a couple of examples:
- In one case, all of a customer's critical business servers were offline. Our engineers on site applied the fix, working through Friday until all services were operational again.
- In another case, a customer's end-user computers were affected in offices across seven countries. Our team mobilised, and by Monday all users were back up and running normally.
Two main technical solutions were provided:
1) The faulty CrowdStrike driver file had to be removed so that it would not load during boot. This was achieved in one of three ways:
- a) Booting the device into Safe Mode, logging in as an administrator, deleting the faulty file, rebooting into normal mode and updating CrowdStrike to a fixed version. This required a local administrator account.
- b) Attaching the device's system drive to another running system (for example, booting from a USB drive or mounting the disk through a virtual machine manager), deleting the faulty file and then continuing as in a). This requires access to the drive, which must either be unencrypted or unlocked with its BitLocker recovery key.
- c) If the BitLocker key is unknown or unavailable, the only way to recover is to restore the system from the latest backup. This underlines why every customer needs a routine backup procedure and regularly tested recovery in place.
2) Restarting the device repeatedly, if necessary by physically powering it off and on again. With enough restarts, some devices connected to the network long enough for the CrowdStrike sensor to download the corrected update before the crash occurred. Results varied between devices: some had to be restarted more than 15 times before they booted normally, after which CrowdStrike could be updated.
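For readers who want to see option 1 b) in concrete terms, here is a minimal Python sketch of the file-removal step, run on the helper system to which the affected drive is attached. It assumes the drive is already mounted (and, if encrypted, unlocked with its BitLocker recovery key) and that Python is available on the helper system. The file pattern `C-00000291*.sys` and the folder `Windows\System32\drivers\CrowdStrike` come from CrowdStrike's public remediation guidance rather than from this letter; the function name `remove_faulty_files` and its `dry_run` parameter are hypothetical, for illustration only. By default the sketch only lists what it would delete.

```python
from pathlib import Path


def remove_faulty_files(drivers_dir, pattern="C-00000291*.sys", dry_run=True):
    """List, and optionally delete, files in drivers_dir that match pattern.

    Hypothetical helper for illustration. drivers_dir is assumed to be the
    CrowdStrike folder on the mounted system drive, e.g.
    <mount point>/Windows/System32/drivers/CrowdStrike.
    Returns the sorted list of matching paths.
    """
    folder = Path(drivers_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Directory not found: {folder}")

    # Only regular files matching the pattern; subfolders are left alone.
    matches = sorted(p for p in folder.glob(pattern) if p.is_file())

    for path in matches:
        if dry_run:
            # Safe default: report what would be removed, change nothing.
            print(f"Would delete: {path}")
        else:
            path.unlink()
            print(f"Deleted: {path}")
    return matches
```

For example, on a Linux helper system where the Windows drive is mounted at an assumed path `/mnt/win`, one would first call `remove_faulty_files("/mnt/win/Windows/System32/drivers/CrowdStrike")` to review the matches, then repeat with `dry_run=False`. Defaulting to a dry run is a deliberate choice: deleting the wrong system file can leave a device in a worse state than before.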
DWORKIN was able to react quickly and provide solutions in time. Customer satisfaction is part of the DWORKIN DNA, so providing on-call support and dispatching engineers with the right knowledge was a priority, and customers were kept updated on the actions being taken.
Even in unpredictable situations like this, the DWORKIN team can respond and stand alongside its clients. When we receive feedback like the following, we know we have done our job well:
“Many thanks for your support during last days since CrowdStrike issue! I understood today from local departments leaders that your attitude, effort, and professionalism during the last days is much appreciated, so I am glad to share with you...”
Thank you,
Your DWORKIN team.