Robust Processes

October 17, 2011

Posted by PythonGuy in Programming.
I often over-engineer my processes. I adhere, roughly, to the robustness principle: “Be liberal in what you receive, conservative in what you send.”

I ran into an issue today that demonstrated why this is a good philosophy.

In the abstract, I had a system made up of several component processes. Each process depended, in a roundabout way, on the output of the previous one. (One accepted internet connections, another accepted log messages in our log format, another analyzed those log messages, and another summarized the analyses.)

We ran into a problem. A less-than-honest user of our system fed some very bad data in through the internet connection. That process didn’t particularly care what the input looked like and passed it on to the rest of the system. It wasn’t until much later that an exotic problem arose and shut the entire system down.

As we dug into the backend process that barfed on the bad data, we decided that that kind of data wasn’t valuable in the first place, so we might as well reject it at the point of entry.
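Here is a minimal sketch of what rejecting obviously bad data at the entry point could look like in Python. The log format (`<ISO-8601 timestamp> <LEVEL> <message>`), the `accept_line` function, and the `MAX_LINE_LENGTH` limit are all hypothetical, assumed only for illustration; they are not the format or code of the system described above.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical log format, assumed for illustration only:
#   "<ISO-8601 timestamp> <LEVEL> <message>"
LOG_LINE = re.compile(r"(\S+) (DEBUG|INFO|WARNING|ERROR) (.+)")
MAX_LINE_LENGTH = 4096  # hypothetical upper bound on a single log line


def accept_line(raw: bytes) -> Optional[str]:
    """Return the decoded line if it looks like a valid log message, else None."""
    # Refuse absurdly long input before doing any other work on it.
    if len(raw) > MAX_LINE_LENGTH:
        return None
    # Refuse anything that is not valid UTF-8.
    try:
        line = raw.decode("utf-8").rstrip("\r\n")
    except UnicodeDecodeError:
        return None
    # fullmatch fails on embedded newlines, since "." does not match "\n".
    match = LOG_LINE.fullmatch(line)
    if match is None:
        return None
    # The first field must parse as a timestamp, not merely be non-blank.
    try:
        datetime.fromisoformat(match.group(1))
    except ValueError:
        return None
    return line
```

The check only answers "is this obviously bad?"; it deliberately does not try to repair input, which keeps the front door simple and conservative.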

If we had only implemented that fix, our system would still be broken. See, we still had bad data in our system that had to be processed. So we needed a two-part fix: (a) stop accepting obviously bad data, and (b) make the system more tolerant of the bad data that gets through anyway.
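For part (b), a downstream stage can treat each record as suspect, skipping and counting the ones it cannot handle instead of letting a single bad record take the whole process down. This is a minimal sketch assuming a hypothetical space-separated log format and a hypothetical `analyze` function; it is not the analyzer from the system described above.

```python
import logging
from collections import Counter
from typing import Iterable, Tuple

logger = logging.getLogger("analyzer")


def analyze(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Count messages per level, skipping (not crashing on) malformed records.

    Upstream is supposed to send strings like "<timestamp> <LEVEL> <message>"
    (a hypothetical format), but this function does not rely on it.
    """
    counts: Counter = Counter()
    skipped = 0
    for line in lines:
        try:
            # ValueError: too few fields; AttributeError: not a string at all.
            _timestamp, level, _message = line.split(" ", 2)
        except (ValueError, AttributeError):
            # Record the problem so it stays visible, then keep going.
            skipped += 1
            logger.warning("skipping malformed record: %r", line)
            continue
        counts[level] += 1
    return counts, skipped


counts, skipped = analyze([
    "2011-10-17T12:00:00 INFO ok",
    "garbage",
    None,
    "2011-10-17T12:00:01 ERROR boom",
])
```

Logging and counting the skipped records, rather than silently dropping them, means the bad data is still noticed even though it no longer stops the pipeline.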

I am often amazed by engineers who work in the physical world. These people build machines that have the same issues with bad input, except their bad input includes “dust in the air” and “metal filings that are created by wear and tear.” They have to build even the most sensitive components of their systems to tolerate this kind of bad input, even though they install filters, covers, and the like to keep out as much of it as possible. If they didn’t, the entire machine would come screeching to a halt the first time something bad got in.

We need to build our software the same way. No matter how internal a component is, it needs to be ready to accept bad data.