Blog post

How the NCSC built its own IT system

Created:  13 Feb 2017
Updated:  13 Feb 2017
Author:  Richard C
NCSC IT: Agile infrastructure

“When are you going to share the design of the NCSC IT system?” We’ve heard this question a lot over the last couple of months, from folks in both public and private sector organisations. It's a good question too, because it turns out that building modern IT systems which are both secure and enabling for their users is hard. But, as we shall see, not impossible.

This blog post is the first in a series describing the approach we've taken at the NCSC, and some of the choices we made. Hopefully, these posts will be useful to anyone grappling with the problem of creating modern, flexible and secure IT infrastructure. 

The NCSC context

Let’s start with a little background. The NCSC formed from several different organisations, including CERT-UK, CESG, the Centre for Cyber Assessments and part of CPNI.

None of the existing IT systems designed for working with OFFICIAL information met the needs of the new organisation. Nor did they strike the right balance of security, usability, and functionality required by our new mission. So we had to build something new.

The needs of our users

Our essential user needs were pretty typical: a highly resilient system, suitable for working with all of our OFFICIAL data. We had to support mobile and multi-site working, provide best-of-breed services to our users, and be an exemplar for how to build OFFICIAL IT using modern technology.

And – most importantly – the system had to be a pleasure for people to use. We were serious about that being the most important characteristic. A highly secure solution that no-one uses isn’t secure at all.

Forming the team

To build a service and roll it out to several hundred people, timescales were tight. Previous experience taught us that to successfully deliver an IT system at pace we would need a multi-disciplinary team which was empowered to make decisions.

Our IT and security teams, as well as our suppliers, pulled out all the stops. With their help we put together a crack group of infrastructure, cloud, networking, security, procurement (and more!) experts. Common Technology Services in GDS also kindly contributed one of their technical consultants to spend time with us, sharing their experiences of what had worked (or failed) elsewhere.

We realised early on that one thing was going to be crucial: our project manager needed to be an agile delivery veteran. A traditional waterfall approach was never going to bring this job in on time.

Applying agile techniques to an infrastructure project

Our project manager came from an agile software delivery background. He quickly pointed out that some aspects of an infrastructure project are a difficult fit for agile techniques. Agile delivery of infrastructure projects, we were beginning to discover, is tricky.

Notably, the procurement of commercial services and equipment is a poor fit for agile processes. While it’s possible to iterate the code which defines the configuration of our service, frequently changing our minds about the hardware we use just isn't practical. There are some choices it pays to get right first time.

However, I’m pleased to say that in most other ways, we stayed true to agile delivery.

Sensible about risk

It’s entirely possible to build good, secure tech using an agile approach. What’s different is that you evolve the system over time, taking risks in sensible ways while you build new functionality or security into the system. So, on day one, we were running a relatively high risk in some areas while we were comfortable with the controls we had in place elsewhere.

When our initial roll out started we sometimes took well-informed decisions to accept calculated risks, knowing that we would implement additional controls as deployment numbers increased. Each sprint added new features our users wanted, but also brought better security to the system.

Our friends in the internal security team have been great about working with us in this new way. The system will never be ‘accredited’ in the traditional sense of a point-in-time decision, because it will never be ‘done’. The risks we take change on a sprint-by-sprint basis. We’ll continue to take sensible decisions, weighing security alongside the project's other demands, such as usability and cost.

Design principles for the project

Before starting work on an initial design we needed the team to agree on some basic principles which would guide our decision making. We brainstormed these with the technical teams, based on lessons learned from our collective experiences, and what we’d seen work well in other government departments.

The principles we developed fell into three categories: user experience, technology, and security.

1. User-experience principles:

  • Users will have a choice of devices - This way users can pick the device which best suits the needs of their job and working preferences
  • We will keep device builds as ‘vanilla’ as possible - Recognising that customisation will make them harder to maintain
  • We prefer web apps to thick client apps - Our default app is the browser
  • Do the hard work to make things simple - As far as our users are concerned, things should just work (credit: GDS)
  • User experience should be fantastic - Security should be good enough (One of our risk management mantras)


2. Technology principles:

  • We will follow the government Technology Code of Practice  - Following government good practice means we're avoiding the past mistakes of others 
  • Our architecture will be well integrated but loosely coupled - This allows us to change components and services easily
  • We'll describe our infrastructure in code and we’ll automate deployment - This means it should be trivial to create a reference environment, or rebuild parts of the system when we need to
  • Our cloud-first approach will be to use SaaS where we can, followed by PaaS and then IaaS - More on this in a future blog
  • On-premises infrastructure will be the bare minimum necessary - Printers, build networks etc.
  • We don't need ‘gold level’ support or availability for everything - Only for some aspects, such as our email and communications tools
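The infrastructure-as-code principle above is what makes it "trivial to create a reference environment". As a minimal sketch of the idea (the resource names and sizes here are made up, and a real deployment would use a dedicated tool such as Terraform or CloudFormation rather than hand-rolled Python): the desired state of an environment is declared as data, and a plan step works out what has to change to get there.

```python
# Hypothetical sketch of "infrastructure in code": desired state is
# declared as data; a plan step computes the changes needed to reach it.

DESIRED = {
    "vpn-gateway": {"size": "small", "count": 2},
    "mdm-server": {"size": "medium", "count": 1},
}

def plan(current: dict, desired: dict) -> list[str]:
    """Compute the actions needed to move `current` to `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(f"create {name} x{spec['count']}")
        elif current[name] != spec:
            actions.append(f"update {name}")
    for name in current:
        if name not in desired:
            actions.append(f"destroy {name}")
    return actions

# Rebuilding a reference environment from scratch is just a plan
# against an empty current state:
print(plan({}, DESIRED))
```

Because the same definition drives every environment, rebuilding a broken component means re-running the plan rather than reconstructing configuration by hand.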


3. Security principles:

  • We will follow our own advice on securing enterprise technology - We’d look silly if we didn’t!
  • We will use the native IPsec VPN clients on end user devices - They’ll be configured to use PRIME 
  • We will patch aggressively and automatically - This applies to servers as well
  • We will avoid creating complex trust relationships with other IT systems - This will help us maintain autonomy over our own risk decisions, without affecting users from other departments we collaborate with
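Patching "aggressively and automatically" means treating a lagging host as an exception to fix now, not something to batch up for a monthly cycle. A minimal illustration of that comparison step (hostnames and version strings below are invented for the example, not from our estate):

```python
# Illustrative sketch of aggressive, automatic patching: compare each
# host's installed software against the latest available versions and
# queue everything that lags for the next automatic run.

LATEST = {"os": "10.0.14393.693", "browser": "56.0"}

inventory = {
    "laptop-01": {"os": "10.0.14393.693", "browser": "56.0"},
    "laptop-02": {"os": "10.0.14393.576", "browser": "56.0"},
    "server-01": {"os": "10.0.14393.693", "browser": "55.0"},
}

def needs_patching(installed: dict, latest: dict) -> list[str]:
    """Return the components on a host that are behind the latest version."""
    return [c for c, v in latest.items() if installed.get(c) != v]

# Hosts with anything out of date get patched automatically - servers too.
queue = {host: needs_patching(sw, LATEST)
         for host, sw in inventory.items()
         if needs_patching(sw, LATEST)}
print(queue)
```

In practice this check is driven from management tooling (such as an MDM service for mobile devices) rather than a script, but the principle is the same: the default is patched, and exceptions are visible.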


With these principles in place, we began designing a system and choosing cloud services to meet an initial set of user needs.

We’ll share our high level design in our next blog, but if you have any comments on these initial thoughts, we’d be glad to hear from you, below.

8 comments

Jonathan - 14 Feb 2017
For patching aggressively how does this work for mobile devices where it is controlled by Apple and for Android devices by the various manufacturers? Is it safe to allow data to be on Android devices when the manufacturer patching policies are inconsistent? Google provides new versions of Android and security patches but manufacturers seem to be slow in implementing them and abandon older models too quickly.
Richard C - 14 Feb 2017
You're right to spot that different manufacturers provide support (and importantly, security updates) for varying lengths of time. This was a big factor in our initial choice of mobile platform - we want to be sure that when security bugs are found in the technology we are using, that the manufacturer will swiftly roll out a fix. We're not offering our users an Android option at the moment, but when we do we'll be sure to choose a flagship device we think will have longevity and be patched regularly and quickly. Many manufacturers have also committed to patching security updates from Google within 30 days, but even though some manufacturers don't appear to publish their support lifetimes it's possible to look at historical data as the basis for making a decision. Our iPhones have been updated with patches from Apple several times since the network went live, and we can monitor the updating process from our MDM server. There will be more on end user devices and MDMs in a future blog.
Simon - 14 Feb 2017
Hi, I wondered if you could expand on this: "We will avoid creating complex trust relationships with other IT systems - This will help us maintain autonomy over our own risk decisions, " What scenarios are you avoiding and what do you determine as complex? Thanks Simon
Richard C - 24 Feb 2017
An example of a trust relationship we'd like to avoid where possible is an Active Directory Forest Trust - even with our closest partners. Instead we want to have simple trust relationships when we collaborate with our partners. For example, we might choose to allow users from another organisation to access some NCSC resources in a cloud service by presenting an OAUTH token issued by an organisation we trust. We think this helps us keep our architecture simple and isolates our core infrastructure (which you can read about in the latest blog post in this series) from that of our partners. This way, a compromise of our partner's core infrastructure wouldn't mean a compromise of our core infrastructure (although it could allow access to the resources we've shared) and vice versa.
Phil Dockerill - 14 Feb 2017
Thank you Richard C, this is an interesting blog post. "Our default app is the browser" Do you allow the users to choose the browser or do you use the Default OS browser e.g. Safari on a Mac? I am curious from a security and a patching point of view. Thank you
Richard C - 24 Feb 2017
So far we've rolled out Windows 10 and Apple iOS devices. On both of these platforms we've provided our users the native browser app and Google Chrome. They're free to use whichever they prefer. We'd be happy to consider making other browsers, such as Firefox, available through our application stores if our users ask for them.
Paul Owen - 15 Feb 2017
"We will patch aggressively and automatically - This applies to servers as well" Oooh, I like that. Aside from the once-in-ten-years BSOD risk acceptance that this principle implies, I'm interested how this sits with your devops people. For example, I imagine the following would be pre-requisites: 1. A well behaved and sane risk management function that understands the security opportunity cost of patching this way 2. Automated nightly unit and integration tests of patches ahead of promotion into the next environment (which in turn implies) 3. Well separated, completely representative and authentic dev, test and pre-prod environments 4. Developing against strictly applied loose coupling principles between services and servers (very little custom configuration allowed, lest it gets broken during the automated patch cycle) 5. Attended upgrades (the ones that come with choices) queued and flagged for a human automatically every day and managed by a dedicated devops bod. Did I miss anything? Cheers, Paul
Alex Davies - 16 Feb 2017
You'll find most manufacturers (if they are decent) patch their apps relatively quicker than the OS making the apps somewhat coded around the shortcomings of the unpatched OS with some quite bespoke workarounds (some coding techniques could be questioned), some BYOD packages support remote deployment tooling and versioning patches amongst the policy controls for their devices.
