Bypassing safety check for an obviously safe change
This is less concretely technical than my usual blog posts.
For every 100 changes we’re 99% sure won’t cause an outage, one will
It’s actually hard to be 99% sure of anything. I’m not 99% sure today’s Thursday. I say that because more often than one day in a hundred, I’ll think “hmm… feels like Wednesday” when it’s not.
I just closed my eyes and tried to remember what time it is. I don’t think I can guess with 99% accuracy what hour I’m in. (But to be fair, it’s de facto Friday afternoon today, as I’m off tomorrow.)
Anyway… the reason I say this is that it should be kept in mind every time someone says they want to circumvent some process for a change that they are absolutely sure won’t cause an outage. That certainty can actually be put into numbers, and those numbers are “you are not 100% sure of anything”.
By saying you are 99% sure this won’t cause an outage (and are you right about that?) you are saying that for every 100 requests like yours that bypass the normal checks, there will be, on average, one outage. By bypassing the safety barriers, you are taking on an amortized 1% of the cost of an outage for your change.
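To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the stated confidence is well calibrated and that changes fail independently of each other, neither of which is guaranteed in practice. The function names and the outage cost figure are made up for illustration.

```python
def expected_outages(num_changes: int, confidence: float) -> float:
    """Average number of outages across num_changes bypassed changes."""
    return num_changes * (1.0 - confidence)


def prob_at_least_one_outage(num_changes: int, confidence: float) -> float:
    """Chance that at least one of num_changes independent changes causes an outage."""
    return 1.0 - confidence ** num_changes


def amortized_outage_cost(outage_cost: float, confidence: float) -> float:
    """Share of one outage's cost that a single bypassed change takes on."""
    return outage_cost * (1.0 - confidence)


if __name__ == "__main__":
    # Hypothetical figure: assume one outage costs 50,000 in some currency.
    hypothetical_outage_cost = 50_000.0

    print(f"{expected_outages(100, 0.99):.2f} outages expected per 100 changes")
    print(f"{prob_at_least_one_outage(100, 0.99):.2f} chance of at least one outage")
    print(f"{amortized_outage_cost(hypothetical_outage_cost, 0.99):.2f} cost carried per change")
```

Under those assumptions, 100 bypassed changes at 99% confidence average one outage, and the chance of seeing at least one is roughly 63%, not 100%. The amortized cost per change is the same 1% of an outage either way.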
And now I realize where my thinking on this comes from. It’s from Eliezer Yudkowsky on Infinite Certainty.
Or is it? I can’t be 100% sure…