"He who defends everything defends nothing"
Sun Tzu, The Art of War.
Given recent and not-so-recent events (the I-35W highway bridge collapse in Minneapolis, Hurricane Katrina, and the 9/11 attacks), the potential for catastrophic failures and disasters should no longer be foreign to the thinking of anyone in a leadership or decision-making position. Many organizations and institutions are actively developing disaster recovery or business continuity plans. It is well understood that, no matter how hard we try, no system can be fully protected from every attack or be entirely free of vulnerable components. The goal should therefore be to build a survivable system. While it is impossible to prepare for all possible disaster scenarios ("He who defends everything defends nothing"), it is critical that leaders and decision makers treat system survivability as the foundation of any disaster planning strategy.
What is a survivable system?
A survivable system is one that continues to operate and meet its mission objectives (i.e., deliver its essential services) in a timely manner, even when the functions of some of its components have been compromised. The determination of what constitutes an essential service is usually based on an organization's policies and the experience of its decision makers. For example, an automobile with sound structural integrity remains drivable after a crash. A business organization with an established business continuity policy will continue to provide essential services even after the integrity of its information systems has been compromised by a denial-of-service attack. A survivable financial system must be able to provide secure, confidential, reliable, and timely services when any of its communication components fail. The primary structural elements of a bridge (e.g., the skeleton) should hold even when other components fail.
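The definition above can be made concrete with a minimal sketch. It assumes each service is marked essential or not by policy and can be delivered by any one of several components; the `Service` class, the `survives` function, and the component names are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    essential: bool        # decided by organizational policy (assumption)
    providers: frozenset   # components, any one of which can deliver the service


def is_available(service, failed_components):
    """A service is available if at least one of its providers has not failed."""
    return bool(service.providers - failed_components)


def survives(services, failed_components):
    """The system survives if every essential service is still available."""
    return all(
        is_available(s, failed_components) for s in services if s.essential
    )


# Hypothetical system: payments are essential, reporting is not.
services = [
    Service("payments", True, frozenset({"primary_site", "backup_site"})),
    Service("reporting", False, frozenset({"analytics_node"})),
]
```

In this sketch, losing `primary_site` and `analytics_node` still leaves the system survivable, because the only essential service has a remaining provider; losing both sites does not.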
Policies and acceptable performance tradeoffs typically drive a system's survivability goals. A recent article, "Recovering from the Unthinkable" (Heather B. Hayes, Washington Technology, June 25, 2007), reports on the success of several organizations, including SI-International Inc., Northrop Grumman, and the National Institute of Standards and Technology (NIST), that have developed and applied robust models for disaster recovery. These successful models include system replication, offsite storage of backup systems, and backup employees.
Self-managing Properties of Surviving Systems
Survivability services are those used to detect, predict, and prevent failures and to support recovery from system failures. The design of physical structures such as bridges, buildings, and roadways demands that self-managing properties be built into the fabric of each structure to ensure survival. To survive, a system needs four self-managing properties: (i) self-configuring – the ability to automatically adapt to changes in the environment; (ii) self-healing – the ability to detect, diagnose, and react to disruptions; (iii) self-optimizing – the ability to automatically optimize resource usage to meet user needs; and (iv) self-protecting – the ability to anticipate, detect, identify, and protect the system from disruptions.
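For a software system, the four properties might be sketched as one method each on a toy class. This is only an illustration under simplifying assumptions: the `ManagedSystem` class, its fields, and the blocking threshold are hypothetical, and real detection, diagnosis, and repair would be far more involved.

```python
class ManagedSystem:
    """Toy system exposing one method per self-managing property."""

    def __init__(self, capacity):
        self.capacity = capacity       # units of work the system can handle
        self.allocated = 0             # units currently assigned to users
        self.healthy = True
        self.blocked_sources = set()

    def self_configure(self, available_capacity):
        # Adapt to a change in the environment (here, a new capacity limit).
        self.capacity = available_capacity
        self.allocated = min(self.allocated, self.capacity)

    def self_heal(self):
        # Detect a disruption and react; the "repair" is just a flag reset.
        if not self.healthy:
            self.healthy = True
            return "recovered"
        return "ok"

    def self_optimize(self, demand):
        # Allocate only what users need, never more than capacity.
        self.allocated = min(demand, self.capacity)
        return self.allocated

    def self_protect(self, source, failed_attempts, threshold=3):
        # Block a source once its failed attempts reach the threshold.
        if failed_attempts >= threshold:
            self.blocked_sources.add(source)
        return source in self.blocked_sources
```

Keeping each property behind its own method makes it possible to test and strengthen them independently, which mirrors the idea that each property is designed into the system rather than added after a failure.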
Every system must provide an optimum service level during normal operations. The same system must also be able to provide essential services when its components fail because of malicious attacks or major disasters. Therefore, decision makers need to integrate the four self-managing properties into the design and operation of a product (e.g., a bridge, a building, or an aircraft), a process, or a business model.