Posted by Hidayet Aksu, Software Engineer, and Adam Sealfon, Research Scientist, Google
In recent years, Google launched the Privacy Sandbox initiative to find responsible ways for advertisers to measure the effectiveness of their campaigns. The initiative aims to deprecate third-party cookies while addressing any competition concerns with the UK’s Competition and Markets Authority. Cookies are small pieces of data that websites store on a user’s device to personalize the browsing experience and serve relevant content or ads. The Privacy Sandbox aims to provide a privacy-preserving alternative to tracking browsing data across the web.
Differential privacy (DP) is used by many browsers to provide privacy-preserving APIs, such as the Attribution Reporting API (ARA), which doesn’t rely on cookies for ad conversion measurement. ARA encrypts individual user actions and collects them in an aggregated summary report, estimating measurement goals like the number and value of conversions attributed to ad campaigns. Configuring API parameters, like allocating a contribution budget across different conversions, is crucial for maximizing the utility of the summary reports.
In our paper, “Summary Report Optimization in the Privacy Sandbox Attribution Reporting API”, we introduce a formal mathematical framework for modeling summary reports. We then formulate the problem of maximizing the utility of summary reports as an optimization problem to obtain the optimal ARA parameters. We evaluate the method using real and synthetic datasets and demonstrate significantly improved utility compared to non-optimized summary reports.
The ARA summary reports can be modeled using four algorithms: Contribution Vector, Contribution Bounding, Summary Reports, and Reconstruct Values. Contribution Bounding and Summary Reports are performed by the ARA, while Contribution Vector and Reconstruct Values are performed by an AdTech provider. The objective of our work is to assist AdTechs in optimizing summary report algorithms.
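To make the division of labor concrete, here is a minimal Python sketch of the four stages for a single slice. It is an illustration under stated assumptions, not the ARA's actual implementation: the function names, the contribution budget of 65536, the Laplace noise with scale budget/epsilon, and the sample values are all assumptions chosen for the example, and each record is assumed to contribute to only one slice.

```python
import numpy as np

BUDGET = 65536  # assumed total contribution budget per record (illustrative)


def contribution_vector(values, cap, alpha, budget=BUDGET):
    """AdTech side: clip each raw value to `cap`, then scale it to an integer
    contribution that uses at most an `alpha` share of the budget."""
    clipped = np.clip(np.asarray(values, dtype=float), 0.0, cap)
    return np.floor(clipped / cap * alpha * budget).astype(np.int64)


def contribution_bounding(contributions, budget=BUDGET):
    """ARA side: enforce the budget by truncating oversized contributions."""
    return np.minimum(contributions, budget)


def summary_report(contributions, epsilon, rng, budget=BUDGET):
    """ARA side: sum the bounded contributions and add Laplace noise
    (noise scale budget/epsilon is an assumption of this sketch)."""
    noise = rng.laplace(loc=0.0, scale=budget / epsilon)
    return contributions.sum() + noise


def reconstruct_value(noisy_sum, cap, alpha, budget=BUDGET):
    """AdTech side: undo the scaling to estimate the capped total."""
    return noisy_sum * cap / (alpha * budget)


rng = np.random.default_rng(0)
values = [3.0, 12.0, 50.0]            # hypothetical conversion values
cap, alpha, epsilon = 20.0, 0.5, 10.0  # hypothetical parameters

encoded = contribution_bounding(contribution_vector(values, cap, alpha))
estimate = reconstruct_value(summary_report(encoded, epsilon, rng), cap, alpha)
print(encoded, estimate)
```

In this sketch the value 50 is clipped to the cap of 20, so even without noise the reconstructed total is about 35 rather than 65. A lower cap loses more of the large values but shrinks the noise after rescaling, which is the trade-off the capping parameter controls.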
To evaluate the quality of an approximation, we selected the τ-truncated root mean square relative error (RMSRE_τ) as our error metric. To optimize utility measured by RMSRE_τ, we choose a capping parameter, C, and a privacy budget, α, for each slice. The combination of both determines how measurements are encoded and passed to the ARA for processing. We used the SLSQP minimizer from the scipy.optimize library to select privacy budgets and capping parameters.
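The following Python sketch shows one way the metric and the optimization step could fit together. It rests on several assumptions that the text above does not spell out: the metric is read as relative error whose denominator is floored at the truncation threshold, the expected squared error of a slice is modelled as squared capping bias plus the variance of Laplace noise with rescaled scale C/(εα), and the budget shares are constrained to sum to 1. The data, `EPSILON`, `TAU`, and all function names are hypothetical; the paper gives the exact formulation.

```python
import numpy as np
from scipy.optimize import minimize


def rmsre_tau(true, est, tau):
    """tau-truncated RMSRE: relative error with denominators floored at tau."""
    true = np.asarray(true, dtype=float)
    est = np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean(((est - true) / np.maximum(tau, true)) ** 2)))


# Hypothetical historical per-conversion values for two slices.
slices = [
    np.array([1.0, 2.0, 2.0, 3.0, 40.0]),  # e.g. item counts with one outlier
    np.array([10.0, 15.0, 20.0, 25.0]),    # e.g. purchase values
]
EPSILON, TAU = 10.0, 5.0  # assumed privacy parameter and truncation threshold


def expected_error(x):
    """Modelled RMSRE_tau for caps x[:k] and budget shares x[k:]."""
    k = len(slices)
    caps, alphas = x[:k], x[k:]
    terms = []
    for values, cap, alpha in zip(slices, caps, alphas):
        total = values.sum()
        bias = np.maximum(values - cap, 0.0).sum()        # mass lost to capping
        noise_var = 2.0 * (cap / (EPSILON * alpha)) ** 2  # rescaled Laplace variance
        terms.append((bias ** 2 + noise_var) / max(TAU, total) ** 2)
    return float(np.sqrt(np.mean(terms)))


k = len(slices)
# Start from "no capping" and an even split of the privacy budget.
x0 = np.concatenate([[v.max() for v in slices], np.full(k, 1.0 / k)])
bounds = [(1e-3, v.max()) for v in slices] + [(1e-3, 1.0)] * k
budget_constraint = {"type": "eq", "fun": lambda x: x[k:].sum() - 1.0}

result = minimize(expected_error, x0, method="SLSQP",
                  bounds=bounds, constraints=[budget_constraint])
caps_opt, alphas_opt = result.x[:k], result.x[k:]
print(caps_opt, alphas_opt, expected_error(result.x))
```

SLSQP is a convenient fit here because it accepts box bounds and an equality constraint in the same call. The capping bias is only piecewise smooth, so in practice it is worth checking `result.success` and trying more than one starting point.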
To address limitations in accessing real conversion datasets, we developed a method for generating synthetic data that replicates the characteristics of real data. We analyzed real conversion datasets to uncover relevant characteristics and used statistical modeling to create realistic synthetic datasets for experimentation.
We evaluated our algorithms on three real-world datasets and three synthetic datasets, partitioned into training and test sets. Our optimization-based algorithm consistently achieved lower error than baselines on both real-world and synthetic datasets.
In conclusion, we studied the optimization of summary reports in the ARA and presented a rigorous formulation of the problem. Our optimization-based approach significantly improved the utility of summary reports compared to non-optimized approaches.