When designing any system composed of multiple interconnected services, a key consideration is ensuring that the data sent between these services can be trusted.
While industry best practices already exist defining how to secure connections between entities, there is significant additional complexity when attempting to scale these designs to secure thousands of agents that exist in a potentially hostile network environment, while keeping the operational burden to a minimum.
When designing Qush Reveal we had some key design criteria:
By using a hardened TLS stack for the agent connection, we ensure the transport is encrypted, trusted and safe from man-in-the-middle attacks. For this post I’m going to focus on the other criteria: establishing, renewing and revoking trust, and the unique way we solve this in Reveal.
Reveal Agent enrollment
The agent enrollment scheme is based on tokens, which are granted by the server and allow agents to request a certificate. To initially enroll an agent, we generate a single-use token which is included in an enrollment bundle. When installing the agent we provide it with the enrollment bundle, which contains the token and some additional configuration data that the agent can use to bootstrap its connection to the server.
The agent generates a new certificate signing request (CSR) and sends it to the server with its enrollment token. The server validates the token and ensures it hasn’t been used before. If the token is valid, the server generates a unique identifier for the agent (the Agent UUID) and issues a certificate with this ID. In addition to the certificate, it also issues a new enrollment token which the agent can later use when it needs to renew its certificate.
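The exchange described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not Reveal's implementation: the `EnrollmentServer` class, its method names and the placeholder certificate string are all hypothetical, real certificate signing is stubbed out, and the choice to keep the same Agent UUID on renewal is an assumption.

```python
import secrets
import uuid


class EnrollmentServer:
    """Hypothetical in-memory model of the token-for-certificate exchange."""

    def __init__(self):
        # Outstanding tokens mapped to the agent they belong to
        # (None for a first-time enrollment token).
        self._outstanding = {}

    def issue_token(self, agent_uuid=None):
        """Create a single-use token, optionally tied to an existing agent."""
        token = secrets.token_urlsafe(32)
        self._outstanding[token] = agent_uuid
        return token

    def enroll(self, csr, token):
        """Redeem a token and return a certificate plus a renewal token."""
        if token not in self._outstanding:
            raise PermissionError("invalid or already used enrollment token")
        # pop() removes the token, so it can never be redeemed a second time.
        # Assumption: a renewal keeps the UUID assigned at first enrollment.
        agent_uuid = self._outstanding.pop(token) or str(uuid.uuid4())
        # Placeholder only: a real server would sign the CSR here.
        certificate = f"CERT(subject={agent_uuid}, csr={csr})"
        return {
            "agent_uuid": agent_uuid,
            "certificate": certificate,
            "renewal_token": self.issue_token(agent_uuid),
        }


if __name__ == "__main__":
    server = EnrollmentServer()
    bundle_token = server.issue_token()            # shipped in the enrollment bundle
    first = server.enroll("example-csr", bundle_token)
    renewed = server.enroll("example-csr-2", first["renewal_token"])
    print(first["agent_uuid"] == renewed["agent_uuid"])
```

Removing the token at the moment it is redeemed is what makes it single use in this sketch: a replayed token is indistinguishable from one that was never issued.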
The enrollment token system also provides a simple way to extend the process of enrolling agents to create complex deployment scenarios. Arbitrary properties can be attached to tokens, such as a cluster identifier which can be used to attach policy to agents as they enroll.
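As a rough illustration of attaching properties to tokens, the sketch below carries a cluster identifier on the token and uses it to select a policy at enrollment time. The `EnrollmentToken` type, the `cluster` property name and the policy table contents are all hypothetical; the source does not describe how Reveal stores properties or policies.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class EnrollmentToken:
    """Hypothetical token: an opaque value plus arbitrary properties."""
    value: str
    properties: dict = field(default_factory=dict)


# Hypothetical policy table keyed by cluster identifier.
POLICIES = {
    "web-frontend": {"collect_network_events": True},
    "build-farm": {"collect_network_events": False},
}
DEFAULT_POLICY = {"collect_network_events": True}


def new_token(**properties):
    """Create a token carrying whatever properties the operator supplies."""
    return EnrollmentToken(secrets.token_urlsafe(32), dict(properties))


def policy_for(token):
    """Pick the policy for an enrolling agent from its token's cluster."""
    cluster = token.properties.get("cluster")
    return POLICIES.get(cluster, DEFAULT_POLICY)


if __name__ == "__main__":
    token = new_token(cluster="build-farm")
    print(policy_for(token))
```

Because the property travels with the token, an operator could hand one bundle to each cluster's deployment tooling and have agents pick up the matching policy without any per-agent configuration.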
Enrollment tokens are cryptographically verifiable by the server, so they cannot be forged by a malicious party without the private key held securely within the server.
The enrollment tokens themselves effectively provide an agent with access to the infrastructure: with a token, an attacker could request a certificate and send data to the server, so it is important that these tokens are stored securely. If a token is accidentally disclosed, there are various protection mechanisms that help restrict the scope of the disclosure:
Due to the distributed architecture of Reveal, it is important for each component to be able to authenticate connections from agents in order to authorize them to perform certain actions (such as sending event data to the server). Conversely, the management and creation of certificates is better handled centrally, so that there is a single isolated, secure and audited authority for the whole system. Unlike certificates, tokens cannot be used for authentication; they only grant the right to request a certificate. By decoupling these two responsibilities and using a token system to issue certificates, we have enabled a secure and scalable system for enrolling agents.
This dive into the design and internals of our agent enrollment process has shown the process and thinking that go into ensuring the security and integrity of Reveal and the data it collects, while minimizing the administrative burden as the deployment grows from 10 agents to 10,000.