Transforming Sensitive Data Collaboration
With privacy regulations expanding, data transfer restrictions tightening and sophisticated cyber attacks on the rise, organizations face unprecedented challenges in storing, processing and sharing sensitive data. Fortunately, new advancements in trusted hardware and cryptographic computing enable secure and privacy-compliant workflows across cloud services and on-premises deployments. These capabilities unlock a new era of private computing and secure collaboration with end-to-end data protection.
Private Data Computing Scenarios
Cloud-based: Private data is commonly migrated to the cloud with standard encryption methods that protect it in transit and at rest. The data is then decrypted for processing, which leaves it vulnerable during computation. For highly sensitive workloads and data sharing, processing was traditionally done in a data clean room or a Sensitive Compartmented Information Facility (SCIF), which restricts physical access but requires trust in the operator. Alternatively, a Trusted Execution Environment (TEE), such as those offered by Microsoft Azure confidential computing, provides scalable hardware-based security that protects the data during processing.
On-premises: Private data and compute workloads remain on premises, often within owned or co-located hardware and data centers. In these scenarios, the data owner’s privacy policy typically does not allow private data to leave the premises or “privacy zone” of their choice. As a result, organizations are often unable to leverage cloud capabilities or participate in emerging data clean rooms, and a different privacy-preserving technology is needed to enable secure data collaboration.
There are numerous use cases where companies abiding by heterogeneous data privacy policies would like to run collaborative analytics with minimal technical overhead while still complying with their corporate governance and policies.
In this blog, we demonstrate how Inpher’s XOR Platform advances the privacy-preserving data sharing landscape for financial institutions that need to join and compute on disparate data sets. We examine a multifaceted use case that requires data collaboration between three leading financial institutions, which we refer to as Banks A, B, and C. Two of them (A and B) have already migrated their private data to Azure in adherence to the cloud privacy policies described above. Bank C, however, is bound to on-premises data privacy policies. The goal is for the banks to privately join their respective datasets on the unique identifiers found in each dataset, so that they can compute on the joined data securely.
XOR Privacy Enhancing Platform
Privacy enhancing technologies (PETs) are a broad category of tools and techniques used to protect the privacy and confidentiality of sensitive data, while allowing for useful analysis and computation. Secure multi-party computation (MPC) is a proven approach that allows multiple parties to perform computations on their private data without revealing anything about the sensitive data itself other than the final result. Similarly, data clean rooms based on trusted execution environments (TEEs) provide an isolated environment for executing sensitive computations. Examples of other PETs are differential privacy, anonymization, homomorphic encryption, federated machine learning and synthetic data.
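To make the MPC idea above concrete, here is a minimal sketch of additive secret sharing, one common building block of MPC protocols. It is an illustration only and is not a description of how the XOR engine works internally; the modulus and the function names (`share`, `reconstruct`, `secure_sum`) are assumptions made for this example. Three parties compute the sum of their private values while each individual share looks like a random number.

```python
import secrets

# Hypothetical modulus for this sketch; all arithmetic is done modulo this value.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs):
    """Simulate all parties in one process and return the sum of their inputs."""
    n = len(private_inputs)
    # Each party splits its own input and hands one share to every party.
    distributed = [share(value, n) for value in private_inputs]
    # Party j adds up the shares it received (column j of the matrix).
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    # Only the partial sums are combined, so no single input is ever exposed.
    return reconstruct(partial_sums)


total = secure_sum([120, 75, 300])
```

In a real deployment each party runs on its own machine and only shares travel over the network; the single-process simulation here is just to keep the example self-contained.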
Our Inpher XOR Platform is a privacy enhancing platform that enables enterprise organizations to privately collaborate on sensitive data. The platform couples XOR’s highly scalable secure multiparty computation engine with other privacy enhancing technologies (e.g., fully homomorphic encryption (FHE), differential privacy, federated machine learning and trusted execution environments (TEEs)) to enable advanced and comprehensive data analysis and computation when joining multiple data sets across multiple parties and infrastructures, with privacy preserved. The XOR Platform also provides sophisticated features for privacy preserving machine learning workflows through a variety of user-friendly APIs.
Azure confidential computing is a set of security features and services offered by Microsoft Azure that enables the processing of encrypted data in a trusted execution environment (TEE). This technology provides a secure platform for running sensitive workloads and allows customers to process and analyze their data without exposing it to unauthorized parties. Azure confidential computing includes various security measures such as secure enclaves, which are isolated regions of memory and processing power that can be used to run code and store data in a protected environment, so that even the host of a virtual machine does not need to be trusted.
Better Together: Hybrid data joins with Azure Confidential Computing and Inpher XOR
Consider the following use case in the financial sector: three financial institutions, Banks A, B, and C, want to join their datasets privately and compute on them without revealing any sensitive information to each other. Bank A owns a large transaction dataset with fields such as beneficiary account and ordering account, as well as the amount, timestamp and a binary flag for each transaction. The flag indicates whether Bank A has marked the transaction as potentially fraudulent.
Below is a sample from A’s dataset:
beneficiary_account,ordering_account,timestamp,amount,flag
YY_209104638,KK_256297547,2022-04-01 00:02:03,1032.87,0
ZZ_575407640,QQ_946682039,2022-04-01 00:03:15,1388.22,0
CC_301017236,MM_116158610,2022-04-01 00:06:44,510.55,1
The account prefix indicates the bank of the corresponding accounts. In this example, the dataset of Bank A is extremely large and includes a large number of banks – a good reason to first perform a plaintext join (in an isolated environment) rather than a more expensive MPC operation.
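As a small illustration of how the bank prefix can be read from the sample above, the following sketch parses the three sample rows with Python's standard `csv` module. It assumes that the prefix is the text before the first underscore; the helper names `bank_prefix` and `load_transactions` are made up for this example.

```python
import csv
import io
from collections import Counter

SAMPLE = """\
beneficiary_account,ordering_account,timestamp,amount,flag
YY_209104638,KK_256297547,2022-04-01 00:02:03,1032.87,0
ZZ_575407640,QQ_946682039,2022-04-01 00:03:15,1388.22,0
CC_301017236,MM_116158610,2022-04-01 00:06:44,510.55,1
"""


def bank_prefix(account):
    """Return the bank prefix, assumed to be the part before the first underscore."""
    return account.split("_", 1)[0]


def load_transactions(text):
    """Parse the CSV text and annotate each row with typed fields and bank prefixes."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["amount"] = float(row["amount"])
        row["flag"] = int(row["flag"])
        row["beneficiary_bank"] = bank_prefix(row["beneficiary_account"])
        row["ordering_bank"] = bank_prefix(row["ordering_account"])
        rows.append(row)
    return rows


transactions = load_transactions(SAMPLE)
flagged = [t for t in transactions if t["flag"] == 1]
banks = Counter(t["beneficiary_bank"] for t in transactions)
```

Grouping by prefix in this way shows why most of Bank A's rows are irrelevant to any single counterparty, which is what makes a cheap plaintext pre-join attractive.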
To add to the complexity of this example, Banks B and C hold personal information about their account holders that they cannot share with external institutions (such as Bank A) or with each other.
The typical formats of the accounts datasets of B and C are as follows:
account,amount,name,address,city,zip,country
CH_123456789,112345.67,John Smith,123 Main St,New York,10001,USA
CH_234567890,223456.78,Jane Doe,456 Elm St,San Francisco,94109,USA
CH_345678901,334567.89,Bob Johnson,789 Oak St,Chicago,60607,USA
and
account,amount,name,address,city,zip,country
YY_321987654,101234.56,Kevin Lee,789 Pine St,Vancouver,V6B 1S3,Canada
YY_678901234,112345.67,Rebecca Smith,234 Cedar St,Toronto,M5V 1M1,Canada
YY_123456789,223456.78,Adam Johnson,456 Maple St,Sydney,2000,Australia
The banks also need to comply with the following privacy policies:
- Requirement 1: The data privacy policies of Banks A and B allow plaintext joins in a data clean room hosted by another party such as Microsoft Azure.
- Requirement 2: The data privacy policy of Bank C restricts data from leaving the bank’s premises.
In this scenario, providing a privacy-compliant solution requires a hybrid combination of PETs, namely TEE and MPC. A TEE is leveraged to ensure that Requirement 1 is satisfied, and MPC addresses Requirement 2 by guaranteeing that Bank C’s data never leaves its premises.
For the purpose of this blog, we assume that both A and B have already migrated their datasets to Microsoft Azure Cloud Services and have the data available for plaintext computations using Azure confidential computing in a data clean room.
Inpher’s XOR Platform enables Bank C to privately join its data with the results of the computations performed on the data of Banks A and B in the data clean room.
Figure 1 illustrates a deployment of Inpher’s XOR Platform on Azure that addresses the above setting:
- The XOR service is deployed on Azure Cloud for federated compute orchestration, but has no access to the private data sources.
- Two Inpher XOR machines are deployed and running:
  - One on the Azure confidential VM ECasv5 (to enable the private join between Bank A and Bank B in a data clean room).
  - Another one on premises at Bank C.
The first private join is between the transactions in A on the beneficiary account and the account identifiers in the bank data B. This join is performed in a secure enclave using TEE technology, ensuring that the data is kept secure and private. This approach has the advantage that it is performed entirely in plaintext and that the output of the operation is a dataset much smaller than the original large transaction dataset of Bank A. The second private join is between the transactions of A on the ordering account and the account identifiers in the bank data B, following the same process as the first join.
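The plaintext join described above can be sketched in plain Python as an exact-match inner join. This is not the code that runs inside the enclave; it only shows why the output shrinks to the rows whose beneficiary account belongs to Bank B. The transaction rows are invented for this sketch, the account rows reuse a few columns from the sample above, and the `plaintext_join` helper and the `beneficiary_` column prefix are assumptions.

```python
def plaintext_join(transactions, accounts, key, prefix):
    """Inner join of transactions with accounts on an exact key match.

    Columns taken from the accounts table are renamed with `prefix` so they
    do not collide with transaction columns such as `amount`.
    """
    accounts_by_id = {row["account"]: row for row in accounts}
    joined = []
    for tx in transactions:
        match = accounts_by_id.get(tx[key])
        if match is None:
            continue  # the account is not held at this bank: drop the row
        extra = {f"{prefix}_{col}": val for col, val in match.items() if col != "account"}
        joined.append({**tx, **extra})
    return joined


# Hypothetical transaction rows, invented for this sketch.
transactions_a = [
    {"beneficiary_account": "CH_123456789", "ordering_account": "YY_321987654", "amount": 250.0, "flag": 0},
    {"beneficiary_account": "ZZ_575407640", "ordering_account": "QQ_946682039", "amount": 1388.22, "flag": 0},
    {"beneficiary_account": "CH_234567890", "ordering_account": "MM_116158610", "amount": 510.55, "flag": 1},
]
accounts_b = [
    {"account": "CH_123456789", "amount": 112345.67, "name": "John Smith"},
    {"account": "CH_234567890", "amount": 223456.78, "name": "Jane Doe"},
    {"account": "CH_345678901", "amount": 334567.89, "name": "Bob Johnson"},
]

joined = plaintext_join(transactions_a, accounts_b, "beneficiary_account", "beneficiary")
```

Only the two transactions whose beneficiary is a Bank B account survive, so the table handed to the later MPC step is smaller than the original transaction dataset.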
The resulting joined datasets of A and B, which reside in the data clean room, then need to be further joined with the on-premises data of Bank C. This join is performed using secure multiparty computation with XOR. The resulting table (the join of the datasets of A, B and C) is then either revealed to a predefined, collaboration-approved data analyst, or kept secret-shared and later reused as input to other operations such as feature engineering, model building and model inference. The advantage of this approach is that no private data leaves the premises of Bank C and no private data in the join of Banks A and B (kept in the Azure confidential computing virtual machine) is revealed.
We emphasize the simplicity of the deployment and of the xor-py programming API by illustrating the major operations below:
from xor import session
from xor import dataframe as xf
import json

with session.open_session("inpher.tee.xorg.inpher.io", 3000, api_token="*", use_ssl=False) as sess:
    x = sess.list_datasets("two-parties")

# the default client to use for interaction with the XOR Service
client = xf.Client("two-parties", "inpher.tee.xorg.inpher.io", 3000, "*", False)

# A reference to the transaction data owned by A. Remember, this notebook assumes that the
# data has been uploaded to a trusted execution environment represented by MPC party, "player-0".
tee_transactions = xf.use("transactions_a", "player-0", client=client)

# A reference to the account data owned by B, also available in the TEE.
tee_accounts = xf.use("accounts_b", "player-0", client=client)

# A reference to the account data owned by C, this time on a separate MPC party, "player-1".
c_accounts = xf.use("accounts_c", "player-1", client=client)

# Declare the desired join operation in the TEE. This will use a plaintext join. In this specific instance
# an exact match on specific columns is performed. However, if the fields containing the join keys
# could contain similar but not exact matches, a fuzzy join could be used instead.
tee_join = xf.join([tee_accounts, tee_transactions], on=[["account"], ["beneficiary_account"]])

# Declare the desired join operation via MPC. This will use a private set intersection.
mpc_join = xf.join([tee_join, c_accounts], on=[["ordering_account_1"], ["account"]])

# Run and get the result
result = mpc_join.run(reveal=True)
result_df = result.reveal()
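The MPC join above relies on a private set intersection. For readers unfamiliar with the idea, here is a toy sketch of one classic construction, based on commutative blinding in the style of Diffie-Hellman. It is not the protocol implemented by XOR, whose internals are not described in this blog, and the parameters are deliberately simplistic: the modulus is a small Mersenne prime, both parties are simulated in one process, and nothing here should be considered secure. All names (`hash_to_group`, `random_exponent`, `blind`, `toy_psi`) are assumptions for this example.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1. Too small and too structured for
# real use; chosen only so the example runs with the standard library.
P = 2**127 - 1


def hash_to_group(identifier):
    """Map an identifier to a nonzero element modulo P."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent():
    """Pick a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, key):
    """Raise every value to the given secret exponent modulo P."""
    return [pow(v, key, P) for v in values]


def toy_psi(ids_left, ids_right):
    """Return the identifiers of ids_left that also occur in ids_right."""
    key_left, key_right = random_exponent(), random_exponent()
    # Round 1: each party blinds its own hashed identifiers and sends them over.
    left_once = blind([hash_to_group(i) for i in ids_left], key_left)
    right_once = blind([hash_to_group(i) for i in ids_right], key_right)
    # Round 2: each party blinds the values it received with its own key.
    left_twice = blind(left_once, key_right)
    right_twice = set(blind(right_once, key_left))
    # Exponentiation commutes, so equal identifiers end up with equal values,
    # while neither party ever sees the other's identifiers in the clear.
    return [i for i, v in zip(ids_left, left_twice) if v in right_twice]


# Hypothetical identifier lists: ordering accounts from the TEE join and Bank C's accounts.
common = toy_psi(
    ["YY_321987654", "KK_256297547", "YY_678901234"],
    ["YY_678901234", "YY_321987654", "YY_123456789"],
)
```

A production protocol would use a vetted elliptic-curve group, shuffle the blinded values before returning them, and run each party on its own infrastructure. The sketch only conveys why matching rows can be found without exchanging the raw keys.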
Conclusion
In summary, Azure confidential computing provides a valuable security solution for protecting computation and data on remote servers. For data collaboration between multiple parties with heterogeneous data privacy policies across cloud services and on-premises deployments, the Inpher XOR Platform delivers a scalable solution with cryptographic security that complements TEEs, expanding the capabilities for privacy preserving data collaboration. In addition, the highly automated deployment tools and federated compute orchestration of XOR provide seamless integration with mainstream cloud services, including Microsoft Azure. The use case of private dataset joins in the financial sector illustrates the benefits that hybrid solutions combining Inpher XOR and Azure confidential computing bring to the data driven enterprise.