Privacy Budget: A Roadmap to Privacy-Preserving Data Collaboration


In the world of data collaboration, it is generally understood that when two or more parties share training data for their AI models, they achieve more accurate predictions and mitigate bias, because the models learn from deeper and more diverse data. In other words, the transformative potential of data collaboration is realized when multiple parties can securely compute on shared data, with trust that privacy and regulatory compliance are maintained.


What Is a Privacy Budget?

In a previous blog post, we proposed expanding the concept of the privacy budget, which should be allocated in any data-driven project that uses privacy-enhancing technologies (PETs). PETs ensure input privacy, enabling computation on sensitive input data without revealing its contents or allowing them to be deduced. Yet the result of a computation often allows some conclusions to be drawn about the input data, and with repeated queries on the same data, the privacy risk grows. A privacy budget therefore ensures output privacy by restricting the allowed computations on a dataset, keeping the total amount of revealed information within the defined bounds of the “budget.”
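To make this concrete, here is a minimal sketch of a budget-gated query interface, written in the style of differential privacy, where each query consumes part of a total epsilon budget. The class, parameter names, and choice of Laplace noise are illustrative assumptions, not a reference to any particular library.

```python
import numpy as np

class BudgetedDataset:
    """Illustrative sketch: gate queries on a dataset with a finite privacy budget."""

    def __init__(self, data, lower, upper, total_budget):
        self.data = np.clip(np.asarray(data, dtype=float), lower, upper)
        self.lower, self.upper = lower, upper
        self.remaining = total_budget  # total epsilon available for all queries

    def noisy_mean(self, epsilon):
        """Answer a mean query at privacy cost `epsilon`, or refuse if over budget."""
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted: query refused")
        self.remaining -= epsilon
        # Laplace noise calibrated to the mean's sensitivity limits how much
        # any single answer can reveal about one record.
        sensitivity = (self.upper - self.lower) / len(self.data)
        return self.data.mean() + np.random.laplace(scale=sensitivity / epsilon)

salaries = BudgetedDataset([52_000, 61_000, 58_000],
                           lower=0, upper=200_000, total_budget=1.0)
print(salaries.noisy_mean(epsilon=0.5))  # allowed; remaining budget: 0.5
print(salaries.noisy_mean(epsilon=0.5))  # allowed; remaining budget: 0.0
# A third query of the same cost would now be refused.
```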

In this post, we will explore how the privacy budget concept can be applied when multiple parties collaborate on sensitive data.


Privacy Budget and Secure Multiparty Computation 

Secure multiparty computation (MPC) is a PET that provides methods for multiple parties to perform joint computations without moving their sensitive input data or revealing it to each other. However, the final result of these computations is sometimes revealed to one or more parties, which may compromise output privacy. Estimating the amount of information revealed per party and per data source can be challenging, especially when the parties collude (e.g., by sharing data or outputs), when there is correlation among the data sources (i.e., information about one dataset can reveal information about another), or when there is correlation among the requests (e.g., certain queries in combination may compromise the identity of an individual). In this complex setting, the concept of a privacy budget is especially important and useful: it sets a threshold on the amount of information that can be explicitly revealed during the joint computation.

A privacy budget can be implemented in the following steps.

1. Privacy budget allocation. Initially, a privacy budget is defined per data source and per party.

2. Privacy budget monitoring. Before running any computation, the privacy budget of each input data source and party is checked.

3. Privacy budget balancing. After a computation is executed, the consumed privacy budgets are subtracted from the respective privacy allowances. The defined privacy budget per data source and per party may be partially or fully consumed, depending on the specific context of the joint computation, taking into account intermediate results, metadata, and the final output.

Steps 2 and 3 are repeated until the privacy budget is exhausted. In this scenario, the privacy budget acts as a form of access control, allowing data owners to limit the amount of information revealed about their data sources.  
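As a sketch of how these three steps might be wired together, the following Python class keeps a ledger of remaining budget per (data source, party) pair. The class name, the percentage-based accounting, and the method names are illustrative assumptions, not a reference implementation.

```python
import math

class PrivacyBudgetLedger:
    """Illustrative sketch: track remaining privacy budget per (source, party) pair."""

    def __init__(self):
        self.remaining = {}  # (source, party) -> remaining budget, in percent

    def allocate(self, source, party, budget):
        """Step 1: allocate an initial budget (math.inf for a party's own data)."""
        self.remaining[(source, party)] = budget

    def can_run(self, costs):
        """Step 2: check every (source, party) cost against the remaining budget."""
        return all(self.remaining.get(key, 0) >= cost for key, cost in costs.items())

    def charge(self, costs):
        """Step 3: subtract the consumed budget once the computation has run."""
        if not self.can_run(costs):
            raise RuntimeError("insufficient privacy budget: computation refused")
        for key, cost in costs.items():
            self.remaining[key] -= cost
```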

One approach is to visualize the privacy budget with a table. While the privacy budget concept might be most useful in contexts like machine learning and artificial intelligence, we offer a simple example of three coworkers who want to compute their average salary using MPC.
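Before turning to the budget tables, it may help to see how such an average can be computed without anyone revealing their salary. The sketch below uses additive secret sharing, a standard MPC building block; it is a deliberately simplified illustration that omits the communication, fixed-point arithmetic, and protections against misbehaving parties that real MPC frameworks provide.

```python
import random

P = 2**61 - 1  # large prime modulus; individual shares are uniform in [0, P)

def share(secret, n_parties):
    """Split `secret` into n additive shares that sum to it modulo P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

# Each coworker secret-shares their salary; no single share reveals anything.
salaries = {"Allie": 52_000, "Brian": 61_000, "Caroline": 58_000}
all_shares = [share(s, 3) for s in salaries.values()]

# Party i locally adds up the i-th share of every input...
partial_sums = [sum(column) % P for column in zip(*all_shares)]

# ...and only the recombined result (the sum, hence the mean) is revealed.
total = sum(partial_sums) % P
print(total / len(salaries))  # 57000.0, with no individual salary disclosed
```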

In the privacy budget allocation, each combination of data source and participant is allocated a privacy budget. Since Allie, Brian, and Caroline each already know their own salaries, they are allocated an infinite budget for their own input data. Note that this is a special case, because Allie, Brian, and Caroline are simultaneously input parties (providing their salaries as input for the computation) and compute parties (participating in the computation). In general, those roles could be split between different parties, in which case a finite privacy budget could be allocated to each compute party per data source.

Data source       | Allie | Brian | Caroline
Allie’s salary    | ∞     | 100%  | 100%
Brian’s salary    | 100%  | ∞     | 100%
Caroline’s salary | 100%  | 100%  | ∞


In privacy budget monitoring (below), we see the effect that computing the average salary would have on each privacy budget if Allie, Brian, and Caroline all learn the result. The existing privacy budget allows the execution of this computation.

Computation | Data source       | Allie | Brian | Caroline
mean        | Allie’s salary    | -50%  | -50%  | -50%
mean        | Brian’s salary    | -50%  | -50%  | -50%
mean        | Caroline’s salary | -50%  | -50%  | -50%


And here is the updated table in privacy budget balancing, after execution of the computation:

Data source       | Allie | Brian | Caroline
Allie’s salary    | ∞     | 50%   | 50%
Brian’s salary    | 50%   | ∞     | 50%
Caroline’s salary | 50%   | 50%   | ∞
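Continuing the illustrative PrivacyBudgetLedger sketch from earlier, the three tables above correspond to one allocate/check/charge cycle. The 50% cost per revealed mean is an assumption made for this example, not a derived quantity.

```python
ledger = PrivacyBudgetLedger()
people = ["Allie", "Brian", "Caroline"]

# Allocation: infinite budget for one's own salary, 100% otherwise.
for source in people:
    for party in people:
        ledger.allocate(f"{source}'s salary", party,
                        math.inf if source == party else 100)

# Monitoring and balancing: revealing the mean costs 50% per (source, party).
costs = {(f"{s}'s salary", p): 50 for s in people for p in people}
assert ledger.can_run(costs)  # the current budget allows this computation
ledger.charge(costs)          # cross-party budgets drop from 100% to 50%

# Repeating the same computation once more would exhaust the budget,
# and a third attempt would be refused by charge().
```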


Note that the three coworkers could also decide to forward their average salary to another party and not reveal the result to each other, in which case only the privacy budget of that result party would be affected.

This example highlights the impact that collusion and chained computations have on the privacy budget. If Allie were to find out Brian’s salary (for example via collusion, if Brian told her), then she would have enough information to calculate Caroline’s salary. Likewise, if Allie and Brian were to repeat the computation with a different coworker (let’s call him Dennis), they could figure out the salary difference between Caroline and Dennis, as the short derivation below shows.
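The arithmetic behind both deductions is elementary; the figures below are the same illustrative salaries used earlier.

```python
mean_abc = (52_000 + 61_000 + 58_000) / 3  # the mean revealed to all three

# Collusion: knowing her own salary and (via Brian) his, Allie solves for Caroline's.
caroline = 3 * mean_abc - 52_000 - 61_000
print(caroline)  # 58000.0

# Chained computations: a second mean with Dennis in Caroline's place lets
# Allie and Brian cancel out their own salaries and learn the difference
# between Caroline's and Dennis's salaries:
#   3 * mean_abc - 3 * mean_abd == caroline - dennis
```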

Two important fields of active research are privacy budget composition and privacy budget in multiparty settings. 

  • Privacy budget in the context of multiple computations (that is, privacy budget composition) refers to how privacy budgets interact when multiple privacy-preserving operations are performed consecutively. Understanding how privacy budgets combine or accumulate during a sequence of operations (which may or may not be correlated) is an open problem; the simplest well-understood case is sketched after this list.
  • Privacy budget in multiparty settings explores how privacy budgets interact when different parties collaborate and have different access to the revealed data. This depends heavily on the collusion model between the different parties. For example, when using differential privacy in a multiparty setting, methods for an optimal distribution of a given privacy budget have been suggested in Optimal Distribution of Privacy Budget in Differential Privacy.
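For reference, the basic sequential-composition rule from differential privacy simply adds the costs of independent queries; correlated operations and tighter advanced-composition bounds are where the open problems lie. The snippet below is an illustrative rendering of that additive rule.

```python
def sequential_composition(epsilons, total_budget):
    """Basic DP sequential composition: costs of independent queries add up."""
    spent, approved = 0.0, []
    for eps in epsilons:
        if spent + eps > total_budget:
            break  # refuse this query and all later ones
        spent += eps
        approved.append(eps)
    return approved, spent

# With a total budget of 1.0, only the first three queries are answered.
print(sequential_composition([0.3, 0.3, 0.3, 0.3], total_budget=1.0))
# -> ([0.3, 0.3, 0.3], ~0.9)
```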

In conclusion, a critical aspect of ensuring robust privacy preservation is accounting for the privacy budget allocated in any given PET data collaboration project. To further explore this topic, please see our white paper Balancing Privacy: Leveraging Privacy Budgets for Privacy-Enhancing Technologies and view our related Web Talk below.