Ramnath Kumar and Arun Sai Suggala from Google Research have introduced a variant of the classical stochastic gradient descent (SGD) algorithm called Stochastic Re-weighted Gradient Descent (RGD). RGD is a lightweight algorithm that re-weights data points at each optimization step according to their difficulty. Unlike traditional algorithms such as SGD, which give equal importance to all samples, RGD gives more importance to points that the model finds more difficult. Difficulty is measured by a point's current loss, and each point is re-weighted by the exponential of that loss. RGD can be implemented in just two lines of code and can be combined with popular optimizers such as SGD, Adam, and Adagrad.
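To make the idea concrete, here is a minimal sketch of that re-weighting step in PyTorch; the framework choice, toy model, batch shapes, and learning rate are assumptions for illustration, not the authors' reference implementation:

```python
import torch
import torch.nn as nn

# Toy setup (illustrative only): a linear classifier on random data.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss(reduction="none")  # keep per-sample losses

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))

optimizer.zero_grad()
per_sample_loss = criterion(model(x), y)

# The two RGD-style lines: weight each point by exp(loss), detaching the
# weights so gradients flow only through the loss term itself.
weights = torch.exp(per_sample_loss).detach()
loss = (weights * per_sample_loss).mean()

loss.backward()
optimizer.step()
```

In practice one would likely temperature-scale or clip the per-sample losses before exponentiating to avoid numerical overflow; that detail is a practical assumption here, not a claim about the paper's exact recipe.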
The authors demonstrate that the RGD re-weighting algorithm improves the performance of various learning algorithms across different tasks, including supervised learning, meta-learning, domain adaptation, and learning under class imbalance. They show improvements over state-of-the-art methods on benchmarks such as DomainBed, tabular classification, GLUE, and ImageNet-1K.
The RGD algorithm is inspired by distributionally robust optimization (DRO), which assumes a “worst-case” data distribution shift may occur and aims to make the model robust to these perturbations. The authors develop RGD as a technique for solving the DRO objective, specifically focusing on Kullback-Leibler divergence-based DRO.
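For readers who want the underlying objective, the standard KL-constrained DRO formulation and its well-known dual are sketched below in our own notation; the paper's exact formulation and constants may differ:

```latex
% Standard KL-DRO objective (notation ours): p is the empirical data
% distribution, q a perturbed distribution, \ell the per-sample loss,
% and \rho the allowed KL-divergence radius.
\min_{\theta} \;\; \max_{q : \, \mathrm{KL}(q \,\|\, p) \le \rho} \;\;
  \mathbb{E}_{x \sim q}\bigl[\ell(\theta; x)\bigr]

% Lagrangian dual of the inner maximization (temperature \lambda > 0):
\min_{\theta} \; \inf_{\lambda > 0} \;
  \lambda \rho \;+\; \lambda \log \mathbb{E}_{x \sim p}
  \bigl[\exp\!\bigl(\ell(\theta; x) / \lambda\bigr)\bigr]
```

The gradient of the dual's log-sum-exp term weights each sample in proportion to exp(loss / λ), which is the motivation for the exponential re-weighting used in RGD.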
The authors acknowledge that RGD may not provide performance improvements when a large fraction of the training data is corrupted (for example, noisy labels), since corrupted points tend to incur high losses and the exponential weighting would amplify them. However, they suggest that combining RGD with an outlier-removal technique could potentially handle such scenarios, as sketched below.
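The following is a hypothetical sketch of one such combination, dropping the highest-loss points before applying the exponential weights; the function name, drop fraction, and filtering rule are our own assumptions, not a method described by the authors:

```python
import torch

def rgd_loss_with_outlier_filter(per_sample_loss: torch.Tensor,
                                 drop_fraction: float = 0.1) -> torch.Tensor:
    """Hypothetical: drop the highest-loss samples (treated as likely outliers),
    then apply RGD-style exponential weights to the remaining points."""
    n = per_sample_loss.numel()
    n_keep = max(1, int(n * (1.0 - drop_fraction)))
    # Keep the n_keep lowest-loss points; the rest are treated as outliers.
    kept, _ = torch.topk(per_sample_loss, n_keep, largest=False)
    weights = torch.exp(kept).detach()
    return (weights * kept).mean()
```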
In conclusion, RGD is a promising technique for improving the performance of deep neural networks across various domains. It is simple to implement and can be easily integrated into existing training pipelines. The authors express their gratitude to the anonymous reviewers and their collaborators on the research.