Zheng Xu, Research Scientist, and Yanxiang Zhang, Software Engineer at Google, discuss language models (LMs), which are trained to predict the next word given input text and power many applications. Gboard uses LMs to improve the typing experience with features such as next word prediction, Smart Compose, smart completion, slide to type, and proofread. Deploying these models on users’ devices offers advantages such as lower latency and stronger privacy than server-side inference.
The blog explores the development of on-device language models for Gboard, focusing on private training methods using federated learning (FL) with formal differential privacy (DP) guarantees. FL lets mobile phones collaboratively learn a model while keeping the training data on each device, and DP ensures the trained model is anonymized with respect to any individual user's data. Gboard's next-word-prediction (NWP) neural network LMs are trained with FL under DP guarantees, where smaller values of (ε, δ) represent stronger privacy guarantees.
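For context, the standard definition behind the (ε, δ) notation can be stated as follows: a randomized mechanism $M$ is (ε, δ)-differentially private if, for any two datasets $D$ and $D'$ differing in one user's data and any set of outcomes $S$,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \Pr[M(D') \in S] + \delta
```

Intuitively, a small ε bounds how much any single user's data can shift the distribution over trained models, and δ bounds the probability of exceeding that bound.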
Privacy principles in Gboard include transparency, data minimization, data anonymization, and auditability. Recent advancements have led to the deployment of DP mechanisms that clip each device's model update and add calibrated noise to the aggregate, preventing the model from memorizing individual users' data. The blog also discusses the use of SecAgg encryption and the DP-FTRL algorithm to strengthen these privacy guarantees.
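The clip-and-noise step can be illustrated with a minimal sketch. The function names and parameters below (`clip_update`, `noisy_sum`, `clip_norm`, `noise_multiplier`) are illustrative, not Gboard's actual implementation; in production the noise is added by the DP-FTRL server algorithm, not per round as shown here.

```python
import math
import random

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def clip_update(update, clip_norm):
    # Scale a client's model update down so its L2 norm is at most clip_norm,
    # bounding any single user's influence on the aggregate.
    norm = l2_norm(update)
    if norm <= clip_norm or norm == 0.0:
        return list(update)
    scale = clip_norm / norm
    return [x * scale for x in update]

def noisy_sum(updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    # Clip each update, sum them coordinate-wise, and add Gaussian noise
    # whose scale is calibrated to the clipping norm.
    rng = random.Random(seed)
    clipped = [clip_update(u, clip_norm) for u in updates]
    total = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * clip_norm
    return [t + rng.gauss(0.0, sigma) for t in total]
```

A larger `noise_multiplier` yields stronger privacy (smaller ε) at some cost in model utility, which is the privacy-utility trade-off the blog describes.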
All NWP neural network LMs in Gboard now have DP guarantees, and all future launches of LMs trained on user data will require DP. By following best practices, Gboard has achieved strong privacy guarantees for models trained directly on user data. The blog highlights the deployment of Portuguese and Spanish LMs with ε ≤ 1 DP guarantees, demonstrating a commitment to privacy while improving the user experience.
Ongoing research focuses on improving privacy-utility-computation trade-offs, extending DP-FTRL to distributed DP, and enhancing auditability and verifiability. The authors acknowledge the contributions of colleagues and emphasize the importance of balancing privacy and utility in ML algorithms.