Now that summer is finally here you probably can't even remember the series of grotty colds that did the rounds during the winter months. This is exactly as it should be. But, let's take a moment to admire the minor miracle that happens every time our body deals with one of these illnesses.
Remember, when you catch a cold, you're suffering from a virus you've never encountered before, and yet you're able to recover from it and, better still, come back stronger.
How is this possible? The answer is resilience. Our body is an example of a resilient system.
This term, resilience, in relation to systems, comes from the field of ecology. Resilience is a measure of how readily a system can persist in a changing environment. And, if we think about it, this is what we want from our cyber systems in the face of an adaptive threat.
The ability to cope with unforeseen incidents, both malicious and accidental, is a design challenge we currently face. The common 'castle' approach to securing our systems (building 'bigger walls') is becoming less effective as the threat adapts in a fast-paced cyber environment. Social engineering, for example, completely bypasses our defensive walls.
The problem is that a protective approach to security, like the castle, is static - it either works or it doesn't. If we want our cyber systems to effectively defend against an adaptive threat, we need to design for persistence and the ability to evolve through periods of change.
What we want is for our systems to be more like our body. We want them to cope through the bad times, adapting to be stronger and wiser. We want them to become more resilient.
What makes something resilient?
What can cyber security learn from naturally occurring resilient systems, like those explained by Rafe Sagarin? This is what the StSG's cyber resilience research is exploring. Let's look at the example of how our body fights a cold.
Our skin is our body's first line of defence. By limiting the number of viruses that enter our body, it reduces how often we fall ill.
If a cold virus successfully bypasses this initial barrier and gets into our body, the immune system will detect the damage it causes. This feedback leads to the directing of specialised cells to the area of damage to begin the recovery process. This quick detection and response limits the impact of the infection.
During the recovery process, critical body parts retain their functionality (our heart keeps beating), but other parts of the system may have reduced functionality, resulting in the various cold symptoms we are so familiar with.
As the virus is destroyed, our immune system learns from the incident (in the form of antibodies), adapting so that we shouldn’t fall ill from this strain of cold virus if we encounter it again.
The upshot of all this is: although our body cannot protect us from viral infections all of the time, it has appropriate measures in place to deal with an illness once it strikes.
Resilient IT systems
So, let's transfer these lessons to cyber security. The characteristics that make up a resilient system can be broken down into 4 phases: prepare, absorb, recover and adapt. We've taken this approach from a technological perspective for a long time, and the NIST framework adopts a similar approach. A resilient systems design builds on this, whilst also considering the people and processes involved.
Prepare: Preventative security is still important. Defences that protect against known attacks (e.g. firewalls, policies) will help improve your resilience. However, it's not enough to rely solely on prevention.
You must accept that incidents will occur (we all get colds!). It's how you react to them that will determine how much damage they cause. Incident response plans need to be developed, identifying the key parts of your system that need to remain viable throughout.
To produce an effective incident response plan, you must know how your system is used in practice. A response plan that tells staff to do things in ways which differ from their habitual methods will add unnecessary confusion to an already stressful event. Increased communication with your front line staff will also help bring to light any problems that you're unaware of with the day to day use of your systems.
Absorb: Defence in depth and diversity of technology are two methods that can be used to reduce the risk of an incident escalating into a catastrophic failure of your system. Segregating your system so you can reduce certain features, whilst still retaining critical functions, aids survivability.
Early detection of an incident limits the damage caused. Methods to help this aren't driven by technology alone; they're people driven too. In fact, people can be your strongest defence. Creating a positive security culture helps increase early detection, as it encourages staff to report anything suspicious without fear of blame.
Recover: No two incidents will play out the same, so we cannot expect our incident response plans to be perfect. Communication is therefore crucial during an incident - with your staff, your stakeholders and your customers.
Accepting that an incident is occurring doesn't always come naturally; it's all too easy to bury your head in the sand until it's too late ("I'm going to bed, I feel awful" is a result of your body telling you to take time out and recover). Decision makers are people too, and in the event of a crisis they may not react as expected. People can become fixated on one problem whilst remaining oblivious to another emerging, so ensure that everyone feels confident enough to speak up if the correct decisions are not being made.
Adapt: To be truly resilient, our systems must be able to adapt in the face of a changing environment. This is what differentiates resilience from robustness. The next step in resilient system design would be to create systems that respond to environmental changes autonomously. Ideas such as changing the layout of a system in response to the detection of an attacker, or autonomous detection and patching of vulnerabilities, would all constitute adaptive systems.
Whilst acknowledging that this will be extremely difficult, the 2016 DARPA Cyber Grand Challenge gives an insight into the potential of adaptive systems. Designing systems that are flexible enough to adapt is difficult - disruption and an increased attack surface are some of the concerns that need to be overcome. Yet these continuous, incremental adaptations will increase our resilience.
Importantly, we also need to adapt in times when there is no incident. As Nicola B recently wrote, it is tempting to say, "We haven't had any incidents yet, so we must be doing something right." Yet complacency will ultimately result in failure. Our bodies constantly refresh our antibody reserves to avoid this. We must listen to the feedback we receive and act on it; these adaptations transform our system into a more secure state.
A manifesto for cyber resilience by design
In the face of an increasingly adaptive threat, we must accept that our systems will be compromised. To combat this, we need to switch our view from purely protective security to more flexible systems that concentrate on dealing with incidents to reduce the harm caused.
Our response to incidents has a direct impact on how much damage is caused, and whilst technology is important in this, we can't afford to underestimate the role people play in our systems. People are remarkably resilient; their ability to think on their feet and adapt is a skill we should harness to improve the resilience of our IT systems.
We are researching methods that can be applied to help improve cyber resilience. If you have any feedback, would like to know more, or wish to share any experiences you have had implementing a resilient approach to security, then please get in touch in the comments section below.
Sociotechnical Security Researcher