In the second blog about the NCSC's IT system we focus on our high-level design, including the single sign-on architecture, our initial end user device choices, and how we tackled the captive portal problem.
If you haven't seen the first blog on our IT system yet, we recommend you start there.
Some big cloud decisions
You might recall from the design principles we produced for this project that we wanted to make as much use of SaaS as possible. Well, there were some areas where we felt we could do so comfortably, and others where we couldn't (yet).
The area we felt most comfortable using SaaS was office productivity. This comes down to the fact that there are products available whose security properties we understand in some depth. And, thanks to this understanding, we were able to clearly articulate the risks we believed we were taking to our senior leadership team. They in turn could make an informed judgement. Whilst security was a factor here - it had to be good enough - there were, of course, other factors in our choice (features, cost, user experience etc.).
Cutting to the chase
I know that if I don't tell you the outcome of this decision making process, we'll be bombarded with comments, so I'll cut to the chase - we chose Office 365. This isn't an endorsement, or an assessment from the NCSC that Office 365 is better - from a security perspective - than G Suite (the other major SaaS office productivity platform for which we've published guidance). It's the choice we made in our context, and we'd encourage you to make your own choices too. We're also aware that Office 365 can be configured well, or poorly, so we've worked hard to configure the service to our liking and in line with our own guidance.
There are some other SaaS services we've selected too. But the data we planned to use with most of them was much less sensitive than we planned to handle in Office 365. So, whilst we still carried out some due diligence in understanding some of the security properties, we were less constrained when looking at options. This included tools for project planning, and collaborating on the content we plan to publish on our website (such as this blog).
However, there were a few areas where we didn't yet have the knowledge or confidence we would want in order to rely on SaaS. These areas were focused around the underpinning security infrastructure we needed - notably device management, user identity and our trust infrastructure. In these cases we chose to use IaaS services where we could rely on a strong security boundary - the hypervisor. Here we didn't pick just one cloud provider; we chose to build across two different IaaS offerings.
Our high-level architecture
Conceptually you could think of our architecture as a set of SaaS services along with the core infrastructure we need to ensure that only devices and users we trust can connect to our resources within them.
We get our confidence in the devices by having a strong degree of control over their provision, configuration and maintenance. And we get our confidence in users through having strong user-device authentication. We then issue single sign-on tokens to our users with which they can access the services they need.
Of course, there are some exceptions. A large percentage of our users are technical, and have some niche analysis, research or development environments that they need to reach, so we also provide the necessary plumbing to allow them to connect to those.
Our 'core' infrastructure in more detail
Our core infrastructure is mostly behind the scenes. Our users will rarely need to care about it as it's largely transparent to them. The core contains the minimum necessary to glue everything together, and to provide the management and security functions we need. It does the following:
- provides a directory of our users, devices and other infrastructure
- authenticates users and devices and provides single sign-on to allow users to reach the cloud services we use
- provides filtered and protected internet access for our users
- maintains secure configuration of our devices
- manages deployment of software to devices
- provides the means for specialists to connect to their niche systems
As far as our users are concerned, the basic steps they take to access their cloud-based email or productivity tools are quite simple:
- They authenticate to their device to boot or unlock it.
- They make sure they're connected to a Wi-Fi or 4G network at work, home or elsewhere.
- They open their email application or use the browser to connect to the services they want to use.
There are a few things going on in the background to make the user experience this simple. Firstly, booting or unlocking the user's device allows a hardware-protected certificate to be accessed - this certificate is used to bring up an IPsec VPN through to our core infrastructure. Using hardware-protected certificate storage which can only be accessed by an authenticated user means that we now implicitly know that we are dealing with an NCSC user, on a device that we provisioned.
When the user now browses to the SaaS service they wish to use, we've configured the service to require a valid SAML or OAuth token that could only have been issued by our infrastructure, which is only accessible to devices that have connected over our VPN. The user's browser automatically collects a token from the single sign-on service and the user can now access the SaaS service they were after.
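To make the flow above concrete, here is a minimal sketch of the idea that a token can only be obtained from behind the VPN, and that the service only accepts tokens our infrastructure signed. It is not how our single sign-on service is implemented: the address range, the key and the function names are all hypothetical, and a shared-secret HMAC stands in for the signatures that SAML or OAuth would actually use.

```python
import hashlib
import hmac
import ipaddress
from typing import Optional

# Hypothetical values, for illustration only.
VPN_CLIENT_RANGE = ipaddress.ip_network("10.20.0.0/16")
SIGNING_KEY = b"demo-only-key"


def _sign(username: str) -> str:
    """Sign a username with the demo key (stand-in for a real token signature)."""
    return hmac.new(SIGNING_KEY, username.encode(), hashlib.sha256).hexdigest()


def issue_token(username: str, source_ip: str) -> Optional[str]:
    """Issue a token only to requests arriving from the VPN address range."""
    if ipaddress.ip_address(source_ip) not in VPN_CLIENT_RANGE:
        return None  # not reachable from outside the VPN
    return f"{username}.{_sign(username)}"


def service_accepts(token: Optional[str]) -> bool:
    """The service side: accept only tokens whose signature verifies."""
    if not token or "." not in token:
        return False
    username, signature = token.rsplit(".", 1)
    return hmac.compare_digest(signature, _sign(username))
```

In this sketch a request from inside the hypothetical VPN range gets a token the service will accept, while a request from any other address gets nothing to present.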
End user devices
We always intended to offer users a choice of devices. However, to help get the service up and running quickly, we needed to limit the choice a little. As their main device, users were offered either a laptop or tablet running Windows 10, and those with a need were also offered a smartphone (running Apple iOS). We plan to expand our offerings a little to include other major platforms in due course.
We were careful in our initial choice of devices to ensure the hardware-based protections we wanted were available and easy to configure.
Both our Windows and iOS builds are in line with our own guidance. However, there are a couple of important choices we made:
- Users do not have the ability to modify the settings on their devices which enforce the security controls we think necessary.
- All devices are using the native IKEv2 VPN to connect to VPN gateways within our core infrastructure. All user traffic is forced via the VPN making it subject to our corporate protections and monitoring.
Initially, we considered whether we could use a single mobile device management (MDM) tool for maintaining the configuration of both Windows and Apple iOS devices. However, we concluded that - for now - we need to use separate tools to manage each device platform. This choice ensures we can configure both platforms exactly as we want them. Like many of the decisions documented here, we'll revisit this again in future, particularly as the management tools evolve. We'll cover MDM in more detail in a future blog.
Dealing with Wi-Fi captive portals
Requiring a VPN to be connected all of the time means that users need a way of authenticating to Wi-Fi networks when staying in hotels or visiting companies we're working with. Often these networks have a 'captive portal' that the user needs to access via a browser before being granted access.
Allowing the user to disable their VPN to authenticate to a captive portal would probably mean our users would often not reconnect the VPN before browsing the web, leaving them outside of the protections we can offer via our infrastructure and, therefore, vulnerable to attack. So we needed a different approach.
To tackle this problem on our Windows devices, we've created a metro app - the only app on the device which is allowed to communicate outside of the VPN. Using this app, the user gets a highly sandboxed place through which they can talk to the captive portal. As soon as they're online, the app closes and their VPN comes up automatically. Once we're finished ironing out the last few bugs, we plan to open source the app on our GitHub page so others who want to use it can do so.
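The policy described above can be sketched as a small model. This is only an illustration of the rule "one app may talk outside the VPN, everything else must go through it", not the app itself. The app identifier and the class are hypothetical, and blocking the helper once the VPN is up is an assumption of the sketch.

```python
from dataclasses import dataclass

PORTAL_APP = "captive-portal-helper"  # hypothetical app identifier


@dataclass
class Device:
    vpn_up: bool = False

    def may_send(self, app: str, via_vpn: bool) -> bool:
        """Return True if this app's traffic is permitted on the given path."""
        if via_vpn:
            # Traffic through the tunnel is only possible once the VPN is up.
            return self.vpn_up
        # Outside the tunnel, only the portal helper may talk, and (an
        # assumption of this sketch) only while the VPN is still down.
        return app == PORTAL_APP and not self.vpn_up

    def portal_login_succeeded(self) -> None:
        """Once the device is online, the helper closes and the VPN comes up."""
        self.vpn_up = True
```

Before the portal login, only the helper can reach the network. Afterwards, everything flows through the VPN and nothing goes around it.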
The final important architectural point I'll mention is that not all of our infrastructure is equal. There are some parts of our system which need to be better protected than others. These include certain functions in our core infrastructure, like our Certificate Authorities and Active Directory, as well as the devices and accounts we use to manage those.
We've built a tiered administration model, with components deployed into the appropriate tier, based on the protections they need. We tend to have different administrators accessing infrastructure in the different tiers, and when a single administrator does have access to both tiers, they have separate accounts for each.
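As a rough illustration of the separate-accounts rule, the sketch below binds each admin account to exactly one tier and only lets it administer components in that tier. The tier numbers, the component names and their tier assignments are hypothetical, chosen only to show the shape of the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdminAccount:
    person: str
    tier: int  # each account belongs to exactly one tier


# Hypothetical mapping of components to tiers (0 = most protected).
COMPONENT_TIER = {
    "certificate-authority": 0,
    "active-directory": 0,
    "vpn-gateway": 1,
}


def may_administer(account: AdminAccount, component: str) -> bool:
    """An account may only administer components in its own tier."""
    return COMPONENT_TIER.get(component) == account.tier
```

An administrator who works in both tiers would hold two such accounts, for example `AdminAccount("alice", 0)` and `AdminAccount("alice", 1)`, and neither account can act outside its own tier.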
We also apply these tiered protections to the devices used by administrators. The design we're using here is very similar to Microsoft's Privileged Access Workstation model, which we'll cover in more detail in a future blog.
You'll be glad to hear that there's more to come in this series. We're currently working on several posts that will go into detail on some of the most interesting and challenging areas of this project. As usual, if you have any questions or comments, we'd be glad to hear from you.