Technical Analysis of why Phala will not be affected by the Intel SGX chip vulnerabilities

This post was originally published here

Author: Dr. Shunfan (Shelven) Zhou, lead researcher of Phala Network and one of the authors of the Phala whitepaper, has worked in security research for 7 years. He is the lead author of "An Ever-evolving Game: Evaluation of Real-world Attacks and Defenses in Ethereum Ecosystem" (USENIX Security Symposium 2020) and of other papers on program analysis.

Abstract

On Nov. 30th, security expert Andrew Miller pointed out that vulnerabilities in Intel SGX pose serious security risks to projects such as Secret Network, which sparked extensive discussion in the community. Intel SGX, the most widely adopted implementation of TEE, is also used by Phala’s off-chain workers. However, Phala’s system design differs: it reduces the attack surface and mitigates the consequences of a compromise, so our dev team considers the impact of such vulnerabilities on Phala to be controllable.

This article will explain to readers:

  • Why the ÆPIC Leak and MMIO vulnerabilities put Secret Network at risk
  • Why Phala uses Secure Enclave (TEE)
  • How we ensure Phala won’t be compromised by such SGX vulnerabilities
  • Future security mechanisms

Summary of Secret Network Vulnerability

1. How did the vulnerability arise?

  • Hardware with unpatched vulnerabilities (ÆPIC Leak and MMIO Vulnerabilities, announced by Intel on Aug. 9, 2022) was allowed to join the Secret Network as nodes. The Secret team froze registration after the whitehat reported this problem;
  • The same master decryption key in Secret Network is shared across all nodes.

Combined, these two facts mean that the secrecy of the network depends entirely on its least secure node. Once any single node is compromised, the secret key is leaked, and user privacy with it.

2. What do the attackers achieve?

To quote https://sgx.fail: “These vulnerabilities could be used to extract the consensus seed, a master decryption key for the private transactions on the Secret Network. Exposure of the consensus seed would enable the complete retroactive disclosure of all Secret-4 private transactions since the chain began.”

3. Is Phala Network affected by the same vulnerabilities?

No. Phala applies access control to node (called ‘worker’ in Phala) registration and manages keys in a hierarchy, both of which I explain below.

Resources:

Design of Phala Trustless Cloud

Why Phala needs Secure Enclave (TEE)

Phala is a permission-less compute cloud that allows any computer to join as a worker. Our threat model therefore assumes that no worker is trusted by default. A worker may try to:

  • peek at users’ data;
  • provide false execution results, or do no computation at all;
  • provide low-quality services like reducing CPU performance or blocking network access.

Among these, Quality-of-Service (the third problem) is handled through the incentives of our Supply-end Tokenomic. For the first two, we rely on the features of Secure Enclave (a.k.a. Trusted Execution Environment, like Intel SGX) and on our key management mechanism to ensure the trustlessness of the whole system.

Secure Enclave provides important hardware-based security guarantees, including:

  • Confidentiality: all the memory values are encrypted;
  • Execution integrity: no one can corrupt the correctness of execution, even someone who controls the operating system and the physical computer;
  • Remote Attestation: users can remotely verify the hardware and the software running inside the Secure Enclave.

To learn more details about SGX, you can read this article.

These features serve as the trust base that lets us “borrow” computing power from other people. It is worth noting that, as a compute cloud, Phala’s core value is first the correct execution of users’ programs, and then the privacy of user data. This differs from projects that focus solely on confidentiality.

Can Phala use Zero-Knowledge Proof, Multi-Party Computation, or Fully Homomorphic Encryption as its Workers?

The answer is no, yes, and yes, respectively, since these solutions work in different ways.

  • With ZKP, users run the computation themselves and only put a proof on chain to show that they really did it. This is not the cloud computing case, where you delegate your computation to others;
  • MPC divides a job into parts, so no single executor can learn the original input or the final output;
  • FHE lets executors compute directly on ciphertext, so they cannot learn the users’ data.
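To make the MPC point concrete, here is a minimal sketch of additive secret sharing, one common building block of MPC. It is only an illustration: the modulus, the function names and the three-executor setup are assumptions made for this example and are not part of Phala.

```python
import secrets

# Public modulus for the shares; an illustrative choice, not a Phala parameter.
MODULUS = 2**61 - 1


def split_secret(value: int, parties: int) -> list[int]:
    """Split value into additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares; any strict subset looks like random numbers."""
    return sum(shares) % MODULUS


def add_shared(a_shares: list[int], b_shares: list[int]) -> list[int]:
    """Each executor adds the two shares it holds, without seeing the inputs."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


# Two private inputs are split among three executors.
input_a = split_secret(5200, parties=3)
input_b = split_secret(4800, parties=3)

# The executors compute on shares only; the result appears after recombination.
total = reconstruct(add_shared(input_a, input_b))
```

Addition is the easy case. Supporting arbitrary programs this way requires far more machinery and communication, which is the kind of limitation discussed next.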

Unfortunately, current MPC and FHE solutions are limited both in the computations they can carry out and in their performance, so hardware-based solutions remain the most practical choice. We are exploring support for TEE solutions from other manufacturers such as AMD and ARM. With a proper abstraction of the interfaces, Phala can also use MPC- and FHE-based workers once they are ready.

Access Control on Worker Registration

Joining Phala as a worker has two prerequisites:

  • Hardware with Secure Enclave support. Currently we only support Intel SGX, but our investigation of AMD-SEV shows that it is also compatible with the current system;
  • Running unmodified Phala-released programs including Phala node and off-chain pRuntime (short for Phala Runtime).

Phala follows the “Don’t Trust, Verify” principle and applies the Remote Attestation process during worker registration. That is, the pRuntime is required to generate an RA Quote, a report produced directly by the trusted hardware and certified by the hardware manufacturer (in this case, Intel). This report contains important information about the hardware and software:

  • Hardware information:
      • Whether pRuntime is running inside SGX;
      • The known vulnerabilities given the current hardware and firmware version. Based on this, Phala blockchain will reject the hardware with blacklisted vulnerabilities and assign each Worker a Confidence Level.
  • Software information:
      • The hash of the program binary, which helps ensure the pRuntime is unmodified;
      • The initial memory layout of the program, so its initial state is determined.

With all this information, we can verify both the trusted hardware and the program running in it. The RA Quotes and the confidence level also let us measure the security level of each worker and customize our security policy on which hardware is allowed to join the network.
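The registration checks described above can be sketched roughly as follows. This is a simplified model, not Phala’s on-chain logic: the `Quote` fields, the advisory IDs, the expected measurement and the two-level confidence scale are all hypothetical placeholders chosen for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Stand-in for the measurement of an officially released pRuntime binary.
EXPECTED_MEASUREMENT = hashlib.sha256(b"example-pruntime-release").hexdigest()

# Placeholder advisory IDs; a real policy would list actual vendor advisories.
BLACKLISTED_ADVISORIES = frozenset({"ADVISORY-A"})


@dataclass(frozen=True)
class Quote:
    signature_valid: bool   # outcome of checking the manufacturer's certification
    inside_enclave: bool    # hardware info: is the program running inside SGX?
    advisories: frozenset   # hardware info: known issues for this platform
    measurement: str        # software info: hash of the program binary


def evaluate(quote: Quote) -> Optional[int]:
    """Return a confidence level (1 is best) or None to reject registration."""
    if not (quote.signature_valid and quote.inside_enclave):
        return None
    if quote.measurement != EXPECTED_MEASUREMENT:
        return None  # the program differs from the released binary
    if quote.advisories & BLACKLISTED_ADVISORIES:
        return None  # hardware has a blacklisted vulnerability
    # Hypothetical scale: no outstanding advisories earns the top level.
    return 1 if not quote.advisories else 2


clean = Quote(True, True, frozenset(), EXPECTED_MEASUREMENT)
unpatched = Quote(True, True, frozenset({"ADVISORY-A"}), EXPECTED_MEASUREMENT)
tampered = Quote(True, True, frozenset(), hashlib.sha256(b"modified").hexdigest())
```

Under these assumptions, `evaluate(clean)` yields the top level, while `unpatched` and `tampered` are both rejected.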

Further, our Supply-end Tokenomic incentivizes high-quality service from the workers. It is outside the scope of this article, so please refer to it separately if you are interested.

Key Hierarchy Management

The world’s first key hierarchy for blockchain-TEE hybrid systems was proposed in 2019 in the Ekiden paper and serves as the basis for the Oasis project. As a compute cloud, Phala improves this design to make it applicable to a network of ~100k nodes. We also introduce novel mechanisms such as key rotation to further improve the robustness of the cloud.

Before digging into the details of our contract key management, it’s important to know that every entity in our system has its own identity key. Every user has an account, and every worker and gatekeeper (gatekeepers are elected from the workers) has its own sr25519 WorkerKey pair. The WorkerKey pair is generated inside pRuntime (and therefore inside SGX), and the private key never leaves SGX. The identity key is used to:

  • Authenticate an entity’s messages by signing them;
  • Establish an encrypted communication channel between users, workers and gatekeepers with ECDH key agreement. By default, any communication between any entities is encrypted in Phala.
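The key-agreement idea can be illustrated with a toy Diffie-Hellman exchange. Note the substitutions: this sketch uses classic finite-field Diffie-Hellman with a small Mersenne prime instead of ECDH over sr25519 keys, and it is far too weak for real use. All names and parameters here are assumptions made for illustration.

```python
import hashlib
import secrets

# Toy parameters: a 127-bit Mersenne prime and a small base. Real systems use
# elliptic-curve groups; this only demonstrates the key-agreement principle.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Generate a (private, public) pair; the private part never leaves its owner."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(own_private: int, peer_public: int) -> bytes:
    """Combine our private key with the peer's public key into a symmetric key."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


user_private, user_public = keypair()
worker_private, worker_public = keypair()

# Both sides arrive at the same key while exchanging only public keys.
user_side = shared_key(user_private, worker_public)
worker_side = shared_key(worker_private, user_public)
```

The resulting 32-byte value would then key a symmetric cipher for the channel; that step is omitted here.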

MasterKey is the root of trust of the whole network. All the contract-related keys, including ClusterKey and ContractKey, are derived from it. MasterKey is generated and shared by all the gatekeepers (through the encrypted communication channel mentioned above). The security of MasterKey depends entirely on the security of the gatekeepers, which is why they are distinguished from the other workers in the following ways:

  • Gatekeepers are the workers of top confidence level: they are immune to all known SGX vulnerabilities;
  • Unlike normal workers, the endpoints of gatekeepers are not public and you cannot deploy contracts to them. This reduces the gatekeepers’ exposure to remote access;
  • Extra staking is required for the gatekeepers to discourage bad behaviors from their operators.

In Phala, workers are grouped into clusters to provide serverless service. A unique ClusterKey is generated for each cluster from the MasterKey (through key derivation), and this process cannot be reversed to infer the MasterKey from a ClusterKey. The ClusterKey is shared with all the workers in that cluster.

Finally, when a contract is deployed to a cluster, it is deployed to all the workers in that cluster. These workers follow the same deterministic process to derive the ContractKey from the ClusterKey, so they all obtain the same ContractKey. Different contracts have different ContractKeys.
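The hierarchy can be sketched with a generic one-way derivation. This example uses HMAC-SHA256 as the derivation function purely as an assumption; Phala’s actual derivation scheme, labels and key types are not specified in this article, and the cluster and contract identifiers below are made up.

```python
import hashlib
import hmac
import secrets


def derive(parent_key: bytes, label: bytes) -> bytes:
    """Derive a child key from a parent key and a label.

    HMAC-SHA256 is one-way: knowing the child does not reveal the parent.
    """
    return hmac.new(parent_key, label, hashlib.sha256).digest()


# MasterKey: held only by the gatekeepers.
master_key = secrets.token_bytes(32)

# ClusterKey: one per cluster, derived from the MasterKey.
cluster_key = derive(master_key, b"cluster/cluster-1")

# ContractKey: every worker in the cluster derives it independently from the
# shared ClusterKey and, because the process is deterministic, gets the same value.
contract_key_on_worker_1 = derive(cluster_key, b"contract/contract-a")
contract_key_on_worker_2 = derive(cluster_key, b"contract/contract-a")

# A different contract in the same cluster gets a different key.
other_contract_key = derive(cluster_key, b"contract/contract-b")
```

The labels play the role of the cluster and contract identities: the same inputs always give the same key, and different inputs give unrelated keys.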

What if a certain key is leaked?

  • If a WorkerKey is leaked, the attackers can decrypt all the messages sent to that worker, such as the ClusterKey of its cluster (and from it the ContractKeys of that cluster), or even impersonate the worker to provide false results to users. Such misbehavior can be detected by comparing the results from multiple workers, after which the chain can slash the worker and confiscate its entire stake;
  • If a ContractKey is leaked, the attackers can decrypt the states and all the historical inputs of that contract;
  • If a ClusterKey is leaked, the attackers can know the above information of all the contracts in that cluster;
  • If the MasterKey is leaked, then all the historical data is leaked.
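The scope of each kind of leak can be summarized in a small model. The topology below (two clusters, three workers, three contracts) and all names in it are invented for illustration; the function simply encodes the four cases listed above.

```python
# Hypothetical network layout: which contracts run in which cluster,
# and which cluster each worker belongs to.
CLUSTER_CONTRACTS = {
    "cluster-1": ["contract-a", "contract-b"],
    "cluster-2": ["contract-c"],
}
WORKER_CLUSTER = {
    "worker-1": "cluster-1",
    "worker-2": "cluster-1",
    "worker-3": "cluster-2",
}


def exposed_contracts(key_level: str, owner: str = "") -> set[str]:
    """Return the contracts whose data is exposed when the given key leaks."""
    if key_level == "master":
        # Every ClusterKey and ContractKey can be re-derived.
        return {c for contracts in CLUSTER_CONTRACTS.values() for c in contracts}
    if key_level == "cluster":
        return set(CLUSTER_CONTRACTS[owner])
    if key_level == "worker":
        # A worker receives its ClusterKey over its encrypted channel,
        # so a leaked WorkerKey exposes that cluster's contracts.
        return set(CLUSTER_CONTRACTS[WORKER_CLUSTER[owner]])
    if key_level == "contract":
        return {owner}
    raise ValueError(f"unknown key level: {key_level}")


worker_leak = exposed_contracts("worker", "worker-3")
master_leak = exposed_contracts("master")
```

In this model a leaked WorkerKey or ClusterKey stays confined to one cluster, and only a MasterKey leak reaches every contract.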

What can we do if the worst case happens?

  • Phala has implemented Key Rotation in the gatekeepers: with the permission of the Council, gatekeepers can update the MasterKey and, correspondingly, the ClusterKeys and ContractKeys.
  • So if the worst case happens, we will first register new gatekeepers running the latest hardware, then deregister all the old ones (since they are very likely to be vulnerable) and switch to the new MasterKey.

Future Security Mechanisms

1. Use Multi-Party Computation to manage MasterKey

Currently the same MasterKey is shared across all the gatekeepers, so it is leaked if any one of them is compromised. Once MasterKey management is turned into MPC, attackers will have to compromise a majority of the gatekeepers.
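A rough way to see the difference is to compare the two leak probabilities under a simplifying assumption: each of n gatekeepers is compromised independently with probability p. Real compromises are correlated (a single hardware flaw can affect many machines at once), so treat this only as an intuition aid. The function names and sample numbers are illustrative.

```python
from math import comb


def leak_probability_shared(n: int, p: float) -> float:
    """Shared key: the MasterKey leaks if at least one gatekeeper is compromised."""
    return 1 - (1 - p) ** n


def leak_probability_majority(n: int, p: float) -> float:
    """Majority threshold: the key leaks only if more than half are compromised."""
    needed = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


# Example: 5 gatekeepers, each with a 5% chance of being compromised.
shared = leak_probability_shared(5, 0.05)
majority = leak_probability_majority(5, 0.05)
```

With a shared key, adding gatekeepers increases the leak probability. With a majority threshold and a small p, adding gatekeepers decreases it.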

2. Enable RA Quotes refresh

Since Phat Contract is not yet supported on the mainnet, workers currently only need to submit an RA Quote once, during registration. When Phat Contract is released, we will enable regular RA Quote refresh, so that workers that remain unpatched after new vulnerabilities are reported will be slashed.
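A refresh policy along these lines might look like the sketch below. The refresh interval, the `Worker` fields and the rule that a stale quote is treated the same as a vulnerable one are assumptions made for this example; the article does not specify how the actual mechanism will work.

```python
from dataclasses import dataclass

# Hypothetical policy parameter: how many blocks a quote stays valid.
REFRESH_INTERVAL_BLOCKS = 7200


@dataclass(frozen=True)
class Worker:
    name: str
    last_quote_block: int   # block height of the most recent RA Quote
    advisories: frozenset   # advisories reported in that quote


def workers_to_slash(workers, current_block: int, blacklisted: frozenset) -> list[str]:
    """Flag workers whose quote is stale or reports a blacklisted advisory."""
    flagged = []
    for worker in workers:
        stale = current_block - worker.last_quote_block > REFRESH_INTERVAL_BLOCKS
        vulnerable = bool(worker.advisories & blacklisted)
        if stale or vulnerable:
            flagged.append(worker.name)
    return flagged


workers = [
    Worker("patched", last_quote_block=9500, advisories=frozenset()),
    Worker("unpatched", last_quote_block=9500, advisories=frozenset({"ADVISORY-A"})),
    Worker("silent", last_quote_block=1000, advisories=frozenset()),
]

# A newly reported vulnerability is added to the blacklist at block 10000.
flagged = workers_to_slash(workers, current_block=10000, blacklisted=frozenset({"ADVISORY-A"}))
```

Here the worker that refreshed its quote on patched hardware is left alone, while the unpatched worker and the one that stopped refreshing are both flagged.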

Finally, I would like to thank Andrew Miller and the security research team for their contribution to the security field. As Andrew said, his team’s target is to help improve security and reduce the occurrence of security accidents. I sincerely look forward to having more in-depth discussions with security researchers to consolidate the trustless infrastructure for the Web3 world.
