The CrowdStrike IT outage caused millions of Windows systems (about 8.5 million devices, by Microsoft's estimate) to crash and enter a boot loop, leading many airports, businesses, hospitals, and banks to freeze their services and operations. CrowdStrike is a cybersecurity company that specialises in using AI and algorithms to proactively detect malicious activity on systems and fend it off ahead of time.
Summary
A content update pushed for the CrowdStrike Falcon sensor software on Windows led to the boot-loop crash.
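The driver was designed to run in kernel mode as a boot-start driver so that it could monitor the entire system and proactively find and suppress cyber-threats. However, this meant that the Falcon driver had unrestricted access to the entire OS at the kernel level and loaded early in the boot process. If anything went wrong with the driver, the entire system risked crashing, and because the driver loads on every boot, a faulty driver crashes the machine again on each restart, which is what produces a boot loop.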
The Falcon sensor's driver obtained Windows Hardware Quality Labs (WHQL) certification, which is granted to a vendor after its driver has passed rigorous tests and validations across different configurations. The certification allows the driver to load and run in Windows' kernel mode, but on the condition that the driver's code and purpose don't change. If they do, the vendor must reapply for certification, and that process isn't instant. To work around this limitation, CrowdStrike started pushing dynamic definition files, called channel files, that the driver processed. This way, the driver's code itself didn't change. One example of a channel file is C-00000291.sys, which was in fact the faulty file that caused the crash.
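The workaround can be illustrated with a short Python sketch, assuming a made-up, pipe-separated definition format. The driver-side functions (`load_definitions` and `scan_event`, both hypothetical names) never change, yet what gets detected changes whenever a new definition file is pushed. This only illustrates the idea; CrowdStrike's real channel-file format is not public and is not shown here.

```python
# Hypothetical sketch: a fixed "driver" whose behaviour is steered entirely
# by definition content. The name|field|pattern format is invented.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str     # label reported when the rule matches
    field: str    # which event attribute to inspect
    pattern: str  # substring that marks the event as suspicious


def load_definitions(text: str) -> list[Rule]:
    """Parse one rule per line in the invented name|field|pattern format."""
    rules = []
    for line in text.strip().splitlines():
        name, field, pattern = line.split("|")
        rules.append(Rule(name, field, pattern))
    return rules


def scan_event(event: dict[str, str], rules: list[Rule]) -> list[str]:
    """Return the names of matching rules; this code never changes."""
    return [r.name for r in rules if r.pattern in event.get(r.field, "")]


# Version 1 of the definitions knows about one technique...
defs_v1 = "suspicious_pipe|pipe_name|evil"
# ...version 2 adds a detection without touching the code above.
defs_v2 = defs_v1 + "\nencoded_shell|command_line|-enc"

event = {"pipe_name": r"\\.\pipe\evil01", "command_line": "powershell -enc AAAA"}
print(scan_event(event, load_definitions(defs_v1)))  # ['suspicious_pipe']
print(scan_event(event, load_definitions(defs_v2)))  # ['suspicious_pipe', 'encoded_shell']
```

Note that `load_definitions` trusts its input completely: a malformed line makes it raise an exception. In a user-mode script that merely stops the script, but the equivalent mistake in kernel code is far more serious.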
The problem started here: the channel files were not covered by the driver's WHQL certification, yet the driver acted on whatever they contained, so they could effectively change its behaviour in the kernel without any re-certification. If any of those files contained an error, the driver hit that error and the entire system risked a blue screen. The driver wasn't built to be resilient to such errors. It should have contained the damage within the calling function instead of crashing the entire system.
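The containment described above can be sketched as validating content before it is used and falling back to the last known-good definitions when an update fails validation. This is a minimal Python sketch under assumptions: the pipe-separated format, `parse_definitions`, `DefinitionStore`, and `EXPECTED_FIELDS` are all hypothetical. In real kernel code an invalid memory access generally cannot be recovered from the way a Python exception can, so the shape checks have to run before any field is read; the exception here only models the rejection path.

```python
# Hypothetical sketch: validate untrusted definition content before use and
# keep the last known-good definitions if a new file is malformed.

EXPECTED_FIELDS = 3  # name|field|pattern in this invented format


class MalformedDefinitions(ValueError):
    """Raised when a definition file fails validation."""


def parse_definitions(text: str) -> list[tuple[str, str, str]]:
    rules = []
    for lineno, line in enumerate(text.strip().splitlines(), start=1):
        parts = line.split("|")
        # Check the record's shape *before* indexing into it.
        if len(parts) != EXPECTED_FIELDS or not all(parts):
            raise MalformedDefinitions(
                f"line {lineno}: expected {EXPECTED_FIELDS} non-empty fields"
            )
        rules.append((parts[0], parts[1], parts[2]))
    if not rules:
        raise MalformedDefinitions("file contains no rules")
    return rules


class DefinitionStore:
    """Holds the active definitions and only swaps in validated updates."""

    def __init__(self, initial_text: str):
        self.active = parse_definitions(initial_text)
        self.rejected = 0

    def apply_update(self, text: str) -> bool:
        try:
            candidate = parse_definitions(text)
        except MalformedDefinitions:
            self.rejected += 1
            return False  # contain the failure here instead of propagating it
        self.active = candidate
        return True


store = DefinitionStore("suspicious_pipe|pipe_name|evil")
print(store.apply_update("\x00" * 16))  # False: a file of null bytes is rejected
print(store.active)                     # [('suspicious_pipe', 'pipe_name', 'evil')]
print(store.apply_update("suspicious_pipe|pipe_name|evil\nencoded_shell|command_line|-enc"))  # True
```

The design choice is to fail closed on the update, not on the system: a bad file costs one missed content update, while the previously validated definitions keep running.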