Blue screens are only the beginning
As much of the world recovers from last Friday's CrowdStrike / Azure black swan event, we at the Ministry of Testing thought we'd offer up a few light-hearted accounts of prized devices (and underpowered work laptops) in their death throes, or just playing dead.
Note: Some of you may never have seen a Windows "blue screen of death," which we'll sometimes call BSoD! Read on for more details.
Some of these system failures are classic Windows blue-screen "Abandon All Hope Ye Who Enter Here" incidents that have dismayed Windows users since before some of you were born. And some, well, have more to do with heedless end users than anything else (one of the authors of this article raises her hand).
By the way, Linux users chuckling smugly to yourselves: you are NOT immune to a black swan event of your own. If you haven't heard of the xz exploit-via-social-engineering saga, now's your chance to read up. Take one overworked, unpaid engineer maintaining critical code, add at least one shadowy committer (a savvy, talented developer who was probably trained in Le Carré-level spycraft), and you get the scheme described below.
From the linked article:
"Their grand scheme was:
- sneakily backdoor the release tarballs, but not the source code
- use sockpuppet accounts to convince the various Linux distributions to pull the latest version and package it
- once those distributions shipped it, they could take over any downstream user/company system/etc"
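A tarball-only backdoor can hide partly because a release tarball isn't necessarily byte-for-byte identical to the project's source repository. Autotools projects, for instance, routinely ship generated files that are never checked in. The Python below is a rough sketch of our own, not something from the linked article. It hashes every file in a release tarball and in a source checkout and lists where they differ. The function names and paths are hypothetical. The code also assumes the common tarball layout, in which every file sits under a single top-level directory.

```python
import hashlib
import tarfile
from pathlib import Path


def _sha256(data: bytes) -> str:
    """Return the SHA-256 hex digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def tarball_hashes(tarball: Path) -> dict:
    """Map each regular file in a release tarball to its SHA-256 digest.

    Assumes every member sits under one top-level directory
    (e.g. "project-1.0/"), which is stripped so the paths line up
    with a source checkout.
    """
    hashes = {}
    with tarfile.open(tarball) as tar:  # default "r" mode auto-detects gzip/bz2/xz
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            parts = member.name.split("/", 1)
            rel = parts[1] if len(parts) == 2 else parts[0]
            hashes[rel] = _sha256(tar.extractfile(member).read())
    return hashes


def tree_hashes(root: Path) -> dict:
    """Map each regular file under root (ignoring .git/) to its SHA-256 digest."""
    hashes = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and ".git" not in rel.parts:
            hashes[rel.as_posix()] = _sha256(path.read_bytes())
    return hashes


def compare(tarball: Path, source_root: Path) -> dict:
    """List files only in the tarball, only in the source tree, or differing."""
    tar = tarball_hashes(tarball)
    src = tree_hashes(source_root)
    return {
        "only_in_tarball": sorted(tar.keys() - src.keys()),
        "only_in_source": sorted(src.keys() - tar.keys()),
        "different": sorted(k for k in tar.keys() & src.keys() if tar[k] != src[k]),
    }
```

For example, compare(Path("project-1.0.tar.gz"), Path("project-checkout")) returns three sorted lists of relative paths. An entry under only_in_tarball isn't proof of tampering, because generated build files legitimately show up there. It is, however, exactly where a tarball-only change would surface.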
If that prospect is too scary to contemplate right now, read on for lighter tales of silicon gone wrong from Ady Stokes and Rabi'a Brown.
Birth of the BSoD!
According to Wikipedia, the first BSoD, or Blue Screen of Death, appeared in Windows NT 3.1, the first version of Windows NT, in 1993. The screen was designed to inform, not to be a ‘crash’ screen. It would come up if there was a DOS (Disk Operating System) error, and it was supposed to display an error message.
However, an inherent bug caused random characters to be shown instead:
Probably the most famous and public BSoD appeared while Bill Gates was presenting Windows 98 to a live television audience. It was an embarrassing but funny moment for him and the company, complete with a “Whoooooa” when the screen appeared. You can watch it on YouTube.
Are red and purple the new blue?
Many people who work in technology are familiar with blue screens of death, but while researching this article I found that there are also red and purple versions. In fact, there are a number of different screens of death. Let’s look at a few.
Red screen of death
Seen in Windows Vista, originally code-named "Longhorn," this screen appeared when a reset failed. The term "red screen of death" has also been used to refer to PlayStation error messages.
Purple screen of death
This purple diagnostic screen appears when VMware's VMkernel (the core of VMware ESXi) hits a critical error, such as a machine check exception, and the system “cannot continue.”
Black screen of death
From Windows 3 onwards, if the operating system failed to boot up, you'd see a black screen. Boot failure could be caused by a number of issues, only some of which actually generated an error message. This often left the user stranded with a black screen and a flashing cursor.
An example of an error message display with instructions for what to do next.
No operating system is perfect
Early Apple Macintosh systems used images of "happy" and "sad" Macs. While the operating system was loading, an icon of a smiling Mac appeared.
But if something went wrong, the sad Mac would be shown. Simpler times indeed.
Amazon Web Services (AWS) has failed multiple times!
Because software depends so heavily on other services these days, if one part goes down, it can take many others with it. AWS is no exception.
In February 2017, a mistyped command entered while debugging Amazon S3's billing system took down S3 in AWS's US-EAST-1 region, along with many of the services that relied on it. Everything from Apple to storage and payments was affected, and it took several hours to resolve the problem.
December 2021 was not the best month for AWS: there were THREE major outages, affecting many services. Causes ranged from network device problems to power loss, and millions of users were reportedly affected. The outages raised questions about the risk of having so many companies rely on so few providers.
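One way to see why a single provider's outage spreads so far is to model services as a dependency graph. The Python below is our own illustration, and the service names are made up. It walks the graph backwards from a failed service to find everything that depends on it, directly or indirectly.

```python
from collections import deque


def affected_by_outage(depends_on: dict, failed: str) -> set:
    """Return every other service that directly or transitively depends on `failed`.

    `depends_on` maps each service name to the set of services it calls.
    """
    # Invert the edges: for each service, which services depend on it?
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first search outward from the failed service.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    affected.discard(failed)  # a dependency cycle could lead back to the start
    return affected


if __name__ == "__main__":
    # Hypothetical services, loosely inspired by a cloud storage outage.
    services = {
        "object-storage": set(),
        "auth": set(),
        "image-cdn": {"object-storage"},
        "photo-app": {"image-cdn", "auth"},
        "payments": {"auth"},
    }
    print(sorted(affected_by_outage(services, "object-storage")))
    # ['image-cdn', 'photo-app']
```

The failed storage service never talks to photo-app directly, yet photo-app still goes down through image-cdn. Real dependency graphs are far larger and mostly invisible to the companies sitting on top of them.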
There’s humour in failure
What could be more human than reacting to last week's outage, reportedly the largest IT outage in history, by making jokes about it? And we made all kinds of jokes, from the simple ‘it’s late, but Y2K (the year 2000 problem) is finally here’ to people saying they thought their first day at Microsoft or CrowdStrike went well. There was so much comic invention that it seems only fair to highlight some.
Small changes have catastrophic effects
X user @itsfoss2 shared a modified video of someone removing a single blue Smartie from an art installation, which then collapses. In this version, the person is a CrowdStrike intern and the collapsing installation is the BSoD.
https://x.com/itsfoss2/status/1814314761254838419
Have a break
Incredibly speedy marketing post from KitKat
You can’t stop some people
X user Leo Skelly said, “Meh. An IT outage ain’t gonna stop me from working” and showed how he could still write binary code.
If it's not a BSoD, then what is it?
There are times when what is happening doesn’t fit into your range of knowledge. Turning on my laptop recently to see gibberish on one of my three monitors fit that category for me.
Being logical, the first thing I did was check that the cables were all inserted as they should be. They were. Next, I checked my display settings, trying different configurations for my three screens. Two were fine; the third definitely was not.
Next came a restart, because who doesn’t love turning devices off and on again! No luck. A screen resolution change was my next attempt, also with no luck. Next up came investigating the graphics driver settings and any other settings I could think of. More frustration.
Finally, I took the monitor out of the configuration completely, removed the cable, and spent the rest of the morning on two screens. After dinner (don’t @ me, dinner is in the middle of the day because I grew up with dinner ladies!) I set up the monitor anew. Bingo, it worked great. For about a week.
I finally resorted to trying new equipment, and the least expensive fix was new cables. Fortunately, that put the problem to rest.
I'm not sure what the moral of the story is. Persistence: keep trying? Or maybe it's just that software and hardware can be, and always will be, weird.
Core meltdown, or why the word "laptop" shouldn't be taken literally
A few years ago, I "inherited" a lovely, more than adequately powered Dell Latitude from a former employer who didn't seem to care if I returned it. I wiped it with Darik's Boot & Nuke and installed Debian with an encrypted hard disk. (More about those later.)
A bit later, I took it with me on my "year abroad" outside the US, where it traveled with me from country to country. I didn't have a proper desk of my own anymore, and rarely used coworking spaces or tables at cafés. Instead, I usually ensconced the laptop on my lap, which on cold days was amply covered with blankets.
Problem was: the vents were on the bottom of the laptop. You can see where this is going. After a few months of this heedless abuse, the Dell gave up the ghost unceremoniously. When it finally occurred to me to look at the bottom and back edge of the laptop, I saw a smaller-scale, less dramatic version of this:
This is what the core of one of the Three Mile Island nuclear reactors looked like after a partial meltdown caused by a prolonged loss of cooling water. Had there been a full-on meltdown, Pennsylvania might have had its own Chernobyl.
I'm lucky I didn't electrocute myself or set myself on fire.
Look upon my works, ye mighty, and despair.
Grubby encryption lockouts: a thick brick wall you can scale
I've run Linux at home for nearly two decades, and have worked with many of the major FOSS distributions (distros) like Ubuntu, Debian, Arch, and Manjaro. Issues come up sometimes, but a good web search or two generally yields a solution. It helps to work with a user-friendly, well-maintained distro like Manjaro, of course, especially if the user community is active and willing to help.
Nearly 10 years ago, I installed Debian for the first time on the late, lamented Dell (the one that melted). I also decided to encrypt my hard disk for the first time, using Linux Unified Key Setup (LUKS). I wasn't yet using 1Password, so I committed that encryption password to muscle memory. (I don't mess around anymore: my latest encryption password is in 1Password.)
Muscle memory is one thing, typing too fast is another. The GRUB bootloader emphatically does NOT echo your password by default (or perhaps at all) as you type it. So when you first encounter the cryptomount error below, you're likely to think, "my hard drive got fried." Not: “I mistyped my password.”
Enter passphrase for hd0.gpt1(uuid):
Attempting to decrypt master key…
error: access denied
error: disk 'cryptouuid/uuid' not found.
Entering rescue mode…
grub rescue >
The first few times I encountered this error, I simply rebooted. And magically, after a reboot (and a correctly entered password), the error disappeared. Hurrah! But it kept recurring. I started to suspect that a mistyped encryption password was to blame, but either I couldn't find any relevant information online or I simply didn't search.
Finally, YEARS later (I don't want to admit how many), I tested my hypothesis on a new device. Yep, an incorrect password was to blame.
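Why does a typo look so much like a broken disk? Roughly speaking, LUKS never stores your passphrase or the disk's master key in the clear. A key derived from the passphrase (LUKS1 uses PBKDF2) unwraps a copy of the master key, and a stored digest confirms whether the result is correct. A wrong passphrase simply produces a key that fails that check. The disk itself is untouched, but all GRUB can report is "access denied." The toy Python model below illustrates the idea under heavy simplifying assumptions:

- XOR stands in for the real key-slot encryption.
- There is no anti-forensic splitting.
- The header layout and function names are invented.

It is not the LUKS on-disk format, and it is not secure.

```python
import hashlib
import hmac
import os
from typing import Optional

# Illustrative only: real LUKS1 headers store their own, usually far higher, count.
ITERATIONS = 10_000


def derive(secret: bytes, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256, the style of key derivation LUKS1 uses."""
    return hashlib.pbkdf2_hmac("sha256", secret, salt, ITERATIONS, dklen=32)


def xor(a: bytes, b: bytes) -> bytes:
    """Toy stand-in for real key-slot encryption. NOT secure."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_header(passphrase: str) -> dict:
    """Build a toy header with one key slot and a digest of the master key."""
    master_key = os.urandom(32)  # the key that would actually encrypt the disk
    slot_salt, digest_salt = os.urandom(16), os.urandom(16)
    return {
        "slot_salt": slot_salt,
        # The master key, wrapped with a key derived from the passphrase.
        "wrapped_key": xor(master_key, derive(passphrase.encode(), slot_salt)),
        "digest_salt": digest_salt,
        # Only a digest of the master key is stored, never the key itself.
        "mk_digest": derive(master_key, digest_salt),
    }


def try_unlock(header: dict, passphrase: str) -> Optional[bytes]:
    """Return the master key, or None ("access denied") for a wrong passphrase."""
    key = derive(passphrase.encode(), header["slot_salt"])
    candidate = xor(header["wrapped_key"], key)
    # A typo yields a garbage candidate whose digest won't match.
    if hmac.compare_digest(derive(candidate, header["digest_salt"]), header["mk_digest"]):
        return candidate
    return None


if __name__ == "__main__":
    header = make_header("correct horse battery staple")
    print(try_unlock(header, "correct horse battery staple") is not None)  # True
    print(try_unlock(header, "correct horse batery staple") is not None)   # False
```

One dropped letter is enough to fail the check, and nothing in the process can tell a typo apart from any other wrong passphrase. That helps explain why the error message is so unhelpful.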
A web search quickly revealed how to address the error: three commands at the GRUB rescue prompt will do it. First, unlock the disk with cryptomount, taking the device name from the first line of the error output above (note the comma between hd0 and gpt1):
grub rescue > cryptomount (hd0,gpt1)
Enter passphrase for hd0.gpt1(uuid):
Slot 0 opened
"Slot 0 opened" means your disk can now be accessed. Then load and start GRUB's normal mode, and your usual list of OS boot options should appear shortly:
grub rescue > insmod normal
grub rescue > normal
A green screen of… the uncanny
Manjaro is by far my favorite Linux distro for reasons I mentioned above: ease of use, ease of finding answers, and a good tradeoff between configurability and working "out-of-the-box." (Arch left me in tears frequently, Ubuntu eats too many resources, and Debian isn't terribly well-maintained.)
This attractive minimalist desktop greets me upon a successful boot.
But the operating system and running software do work themselves into a state sometimes, and I don't yet know why. Sometimes when I boot up, usually after a recent shutdown, I get this alternate-universe desktop image:
This seriously scared me at first. I'm running Manjaro on a six-year-old Intel NUC that was in storage for about two years. It's mortal, just like I am.
And let's face it: that colour is just EERIE. Is someone trying to tell me something?
The first time it happened, I tried a simple logout, for lack of any other ideas. Logging back in, the alien green haze was gone. I still see green every so often, but the trick continues to work.
For now.
A sparkling screen of death
After the Dell tragedy, I was still without a permanent address, and I needed a new laptop for short money that would ship to a location outside the United States. I bought a budget Windows 10 laptop from one of the big companies that make such things.
"This time," I said to myself, "I won't be so careless. I'll keep the laptop's vents free of obstructions." I bought a Roost laptop stand, giving those vents plenty of breathing room, and ran the laptop in the stand 99 percent of the time.
I hadn't bought a Windows device for myself in several years, and much to my surprise, the user experience was good to great. I didn't run into much bloatware, and most programs ran reliably without slowing down the laptop or crashing. And I saw none of the blue screens of death I remembered so vividly from my pre-2006 Windows devices.
Two years passed, I found myself a permanent address at last, and I installed the laptop in its Roost on my desk. Now, I did keep it running most of the time, but I figured the sleep mechanism would take care of any overheating problem. I also left the laptop unplugged and switched on at times to drain the battery a bit.
One day this past spring, I logged in as usual. Right away I noticed a strange pattern of sparkling lights on the desktop image. And then… nothing. Blank screen. I tried plugging it back in. Nothing: deadwood. I tried leaving it plugged in for a while. Nothing.
I said to myself: dead battery. I did a web search on my options, all of which, given the policies of the laptop's manufacturer, involved me buying a brand new laptop. And that purchase was definitely not within my budget at the time.
Oddly enough, a week or so earlier, I had remembered the 2018-era Intel NUC I'd brought from my previous home. And I'd said to myself: how about trying out a new-to-you Linux distro? Within about two hours, Manjaro was chugging away nicely on the NUC, plugged into my living room TV as a display. Thank goodness. I wouldn't need to buy a new laptop after all.
I'm typing this on the Manjaro box right now. However, the NUC also seems to run "hot," even though it's adequately ventilated. So I've taken to shutting it down and leaving it that way unless I absolutely need to type a lot of material quickly. I've opened the tiny box to see if I can replace components, but I found out quickly I'd need doll-sized hands to do so.
So I'm enjoying Manjaro on my "old" NUC for now. Until the next… mysterious … incident, at least.
For more information
- 10 historical software bugs with extreme consequences, SolarWinds Pingdom blog
- Black Swan Events and Their Impact on Investments, Brian J. Bloch, Investopedia
- Backgrounder on the Three Mile Island Accident, United States Nuclear Regulatory Commission
- Blue screen of death, Wikipedia
Learn More with Ministry of Testing
- CrowdStrike Mass Global IT Outage, AJ Wilson