On-device content distillation with graph neural networks – Google Research Blog

Posted by Gabriel Barcik and Duc-Hieu Tran, Research Engineers, Google Research

In today’s digital age, smartphones and desktop web browsers are the main tools for accessing news and information. However, website clutter, including complex layouts, navigation elements, and unnecessary links, significantly hampers the reading experience and makes articles hard to navigate. The problem is especially acute for people with accessibility needs. To make reading easier and more accessible, Android and Chrome users can turn to the Reading Mode feature, which processes webpages to offer customizable contrast, adjustable text size, more legible fonts, and text-to-speech utilities.

In addition, Android’s Reading Mode can also distill content from apps. Expanding Reading Mode to cover a wide range of content and improving its performance, while ensuring that it operates locally on the user’s device and does not transmit data externally, presents a unique challenge. To address this challenge and broaden the capabilities of Reading Mode without compromising privacy, we have developed a novel on-device content distillation model.

Unlike previous approaches, which relied on heuristics limited to news articles, our model delivers higher quality and works across a wider range of content types. Article content stays on the user’s device and never leaves the local environment. The model transforms long-form content into a simple, customizable layout for a more enjoyable reading experience, outperforming alternative approaches. In this post, we describe our approach, methodology, and results.

Graph neural networks

Instead of relying on complex heuristics that are difficult to maintain and to scale across article layouts, we treat this task as a fully supervised learning problem. This data-driven approach lets the model generalize better across layouts, without the limitations and fragility of heuristics. Previous methods for optimizing the reading experience relied on HTML parsing, filtering, and modeling of the document object model (DOM), the programming interface that web browsers generate to represent a document’s structure and allow it to be manipulated.

Our new Reading Mode model, on the other hand, uses accessibility trees, which provide a streamlined and more accessible representation of the DOM. Accessibility trees are generated automatically from the DOM tree and are used by assistive technologies to let people with disabilities interact with web content. They are available in the Chrome web browser and, on Android, through AccessibilityNodeInfo objects, which are provided for both WebView and native application content.

To train our model, we manually collected and annotated accessibility trees. The Android dataset used in this project consists of approximately 10,000 labeled examples, while the Chrome dataset contains around 100,000. We developed a novel tool that uses graph neural networks (GNNs) to distill essential content from the accessibility trees with a multi-class supervised learning approach. The datasets include long-form articles sampled from the web and labeled with classes such as headline, paragraph, image, and publication date.
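The post does not describe how a labeled accessibility tree is stored for training. As a minimal sketch, assuming a hypothetical `AXNode` class (far simpler than Chrome’s or Android’s real node types) and a hypothetical `label` field holding the annotated class, a tree can be flattened into the node list, edge list, and label list that a GNN consumes:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AXNode:
    """Hypothetical, simplified accessibility node (not a real Chrome/Android type)."""
    role: str                        # accessibility role, e.g. "heading"
    text: str = ""                   # visible text; empty for containers
    label: Optional[str] = None      # annotated class (hypothetical label names)
    children: List["AXNode"] = field(default_factory=list)


def flatten(root: AXNode) -> Tuple[List[AXNode], List[Tuple[int, int]], List[Optional[str]]]:
    """Number nodes in depth-first order; return nodes, parent->child edges, labels."""
    nodes: List[AXNode] = []
    edges: List[Tuple[int, int]] = []
    stack: List[Tuple[AXNode, int]] = [(root, -1)]  # (node, parent id); -1 = no parent
    while stack:
        node, parent_id = stack.pop()
        node_id = len(nodes)
        nodes.append(node)
        if parent_id >= 0:
            edges.append((parent_id, node_id))
        # Push children in reverse so they are visited left to right.
        for child in reversed(node.children):
            stack.append((child, node_id))
    return nodes, edges, [n.label for n in nodes]


page = AXNode("article", children=[
    AXNode("heading", "Local team wins", label="headline"),
    AXNode("paragraph", "The match ended 2-1.", label="paragraph"),
    AXNode("link", "Subscribe", label="non-essential"),
])
nodes, edges, labels = flatten(page)
# edges  -> [(0, 1), (0, 2), (0, 3)]
# labels -> [None, 'headline', 'paragraph', 'non-essential']
```

The container node (`article`) carries no label here; in this sketch only text-bearing nodes are annotated, which is one plausible choice among several.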

GNNs are well suited to tree-like data structures because they learn the connections and relationships within such trees directly, reducing the need to hand-engineer structural features. The tree structure is fed directly into the model, and GNNs use a message-passing mechanism in which each node communicates with its neighbors. Information is thereby shared and accumulated across the network, allowing the model to discern intricate relationships. For accessibility trees, this means the model can exploit the structure and relationships within the tree to identify non-essential sections that can be omitted, resulting in more accurate content distillation.
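One way to see how message passing spreads information is to track which nodes can influence a given node. The sketch below is a generic property of synchronous message passing, not code from our model: after k rounds, a node’s state depends on every node within k edges of it. The helper names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def undirected_adjacency(num_nodes: int, edges: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Treat parent-child edges as two-way so messages flow toward containers and leaves."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(num_nodes)}
    for parent, child in edges:
        adj[parent].add(child)
        adj[child].add(parent)
    return adj


def receptive_fields(num_nodes: int, edges: List[Tuple[int, int]], steps: int) -> List[Set[int]]:
    """For each node, the nodes whose input features reach it after `steps` rounds."""
    adj = undirected_adjacency(num_nodes, edges)
    reach: List[Set[int]] = [{i} for i in range(num_nodes)]
    for _ in range(steps):
        # A node absorbs everything its neighbors had gathered in the previous round.
        reach = [reach[i].union(*(reach[j] for j in adj[i])) for i in range(num_nodes)]
    return reach


# 0 = article, 1 = section, 2 = link, 3 = paragraph inside the section
edges = [(0, 1), (0, 2), (1, 3)]
print(receptive_fields(4, edges, 1)[3])  # {1, 3}
print(receptive_fields(4, edges, 3)[3])  # {0, 1, 2, 3}
```

After three rounds the paragraph’s state already reflects the link that hangs off the article container, which is the kind of container-level context a node classifier can use.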

Our architecture follows the encode-process-decode paradigm, using a message-passing neural network to classify text nodes. The overall design is illustrated in the accompanying figure. The article’s tree representation serves as the model’s input. We compute lightweight features for each node from its bounding box, its text, and its accessibility role, and encode them into the node’s initial latent representation. The GNN then propagates each node’s latent representation along the edges of the tree, so that nearby nodes, containers, and text elements share contextual information about the page’s structure and content. Each node updates its state based on the messages it receives. After a fixed number of message-passing steps, the contextualized latent representations are decoded into essential or non-essential classes. This lets the model combine the relationships inherent in the tree with the hand-crafted features describing each node, resulting in more accurate classification.
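The encode-process-decode flow can be sketched in a few dozen lines. This is a hypothetical, untrained toy in pure Python, not our production model: the layer sizes, random weights, mean aggregation of neighbor messages, and ReLU update are all illustrative assumptions, and class index 0/1 standing for non-essential/essential is likewise assumed.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyGNN:
    """Hypothetical encode-process-decode node classifier with illustrative choices."""

    def __init__(self, in_dim: int, hidden: int, num_classes: int, steps: int, seed: int = 0):
        rng = random.Random(seed)
        self.steps = steps
        self.w_enc = init(hidden, in_dim, rng)       # encode: features -> latent
        self.w_self = init(hidden, hidden, rng)      # process: node's own state
        self.w_msg = init(hidden, hidden, rng)       # process: aggregated messages
        self.w_dec = init(num_classes, hidden, rng)  # decode: latent -> class scores

    def forward(self, features: List[Vector], edges: List[Tuple[int, int]]) -> List[Vector]:
        n = len(features)
        adj: List[List[int]] = [[] for _ in range(n)]
        for u, v in edges:  # tree edges used in both directions
            adj[u].append(v)
            adj[v].append(u)
        h = [relu(matvec(self.w_enc, x)) for x in features]  # encode
        for _ in range(self.steps):  # process: fixed number of rounds
            new_h = []
            for i in range(n):
                if adj[i]:
                    # Incoming message = mean of the neighbors' current states.
                    msg = [sum(h[j][k] for j in adj[i]) / len(adj[i]) for k in range(len(h[i]))]
                else:
                    msg = [0.0] * len(h[i])
                upd = [a + b for a, b in zip(matvec(self.w_self, h[i]), matvec(self.w_msg, msg))]
                new_h.append(relu(upd))
            h = new_h  # synchronous update: every node used last round's states
        return [softmax(matvec(self.w_dec, hi)) for hi in h]  # decode


model = TinyGNN(in_dim=3, hidden=4, num_classes=2, steps=3)
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.2], [0.0, 1.0, 0.9], [0.0, 0.0, 1.0]]
probs = model.forward(features, [(0, 1), (0, 2), (0, 3)])  # one distribution per node
```

With random weights the probabilities are meaningless; training would fit the weights to the labeled trees. Using a fixed number of rounds, as the post describes, keeps inference cost bounded and predictable on device.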

To ensure broad generalization across languages and faster inference on user devices, we deliberately limit the feature set used by the model. This presented a unique challenge as we needed to create an on-device lightweight model that prioritized privacy. Our final lightweight Android model has 64k parameters, is 334 kB in size, and has a median latency of 800 ms. The Chrome model has 241k parameters, is 928 kB in size, and has a median latency of 378 ms. By employing on-device processing, we guarantee that user data never leaves the device, aligning with our commitment to user privacy.

The features used in the model can be categorized into intermediate node features, leaf-node text features, and element position features. We performed feature engineering and selection to optimize the feature set for model performance and size. The final model was converted to TensorFlow Lite format for deployment as an on-device model on Android or Chrome.
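The post names the three feature groups but not the individual features. The sketch below is a hypothetical feature extractor, assuming features built from the accessibility role, child count, leaf text, and bounding box; the role vocabulary, the specific text statistics, and all function names are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical role vocabulary; real accessibility roles are far more numerous.
ROLES = ["heading", "paragraph", "link", "image", "other"]


def role_one_hot(role: str) -> List[float]:
    """One-hot accessibility role; unknown roles map to "other"."""
    idx = ROLES.index(role) if role in ROLES else ROLES.index("other")
    return [1.0 if i == idx else 0.0 for i in range(len(ROLES))]


def position_features(box: Tuple[float, float, float, float],
                      page_w: float, page_h: float) -> List[float]:
    """Element position: (left, top, width, height) normalized by page size."""
    left, top, width, height = box
    return [left / page_w, top / page_h, width / page_w, height / page_h]


def text_features(text: str) -> List[float]:
    """Leaf-node text: character count, word count, digit ratio, terminal punctuation."""
    n_chars = len(text)
    n_words = len(text.split())  # whitespace split; weak for languages without spaces
    digits = sum(c.isdigit() for c in text)
    return [
        float(n_chars),
        float(n_words),
        digits / n_chars if n_chars else 0.0,
        1.0 if text.rstrip().endswith((".", "!", "?")) else 0.0,
    ]


def node_features(role: str, text: str, box: Tuple[float, float, float, float],
                  num_children: int, page_w: float, page_h: float) -> List[float]:
    """Concatenate role, structural, text, and position features for one node."""
    is_leaf = num_children == 0
    structural = [float(num_children), 1.0 if is_leaf else 0.0]  # intermediate-node features
    # Containers get zeroed text features so every node has the same vector length.
    text_part = text_features(text) if is_leaf else [0.0] * 4
    return role_one_hot(role) + structural + text_part + position_features(box, page_w, page_h)


f = node_features("paragraph", "Hello world.", (0.0, 100.0, 500.0, 50.0), 0, 1000.0, 2000.0)
# f -> [0,1,0,0,0] + [0.0, 1.0] + [12.0, 2.0, 0.0, 1.0] + [0.0, 0.05, 0.5, 0.025]
```

Given the goal of generalizing across languages, a real feature set would likely avoid whitespace-based word counts, since they break down for scripts written without spaces; character counts and geometry are more language-neutral.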

Results

We trained the GNN for approximately 50 epochs using a single GPU. The performance of the Android model on webpages and native application test sets is presented in the accompanying table. The table displays content distillation metrics, including precision, recall, and F1-score for three classes: non-essential content, headline, and main body text. Node metrics evaluate the classification performance at the granularity of the accessibility tree node, while word metrics evaluate classification at an individual word level.
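The difference between node and word metrics can be made concrete. A minimal sketch, assuming word-level metrics can be approximated by weighting each node by its word count (the exact computation is not specified in the post); the label names are hypothetical:

```python
from typing import Sequence, Tuple


def prf(tp: float, fp: float, fn: float) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from (possibly weighted) counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def class_metrics(gold: Sequence[str], pred: Sequence[str], cls: str,
                  weights: Sequence[float]) -> Tuple[float, float, float]:
    """Per-class metrics where node i contributes weights[i] to the counts."""
    tp = fp = fn = 0.0
    for g, p, w in zip(gold, pred, weights):
        if p == cls and g == cls:
            tp += w
        elif p == cls:
            fp += w
        elif g == cls:
            fn += w
    return prf(tp, fp, fn)


gold = ["headline", "paragraph", "paragraph", "non-essential"]
pred = ["headline", "paragraph", "non-essential", "non-essential"]
words = [5, 40, 2, 3]  # words in each node's text
node_level = class_metrics(gold, pred, "paragraph", [1] * len(gold))  # P=1.0, R=0.5
word_level = class_metrics(gold, pred, "paragraph", words)            # P=1.0, R=40/42
```

Missing one short two-word paragraph halves node-level recall but barely moves word-level recall, which is why reporting both views is informative.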

On commonly visited webpage articles, the model achieves an F1-score above 0.9 for main text (paragraphs), and 88% of these articles are processed without missing any paragraphs. Furthermore, in over 95% of cases, readers find the distillation valuable: they perceive the distilled content as pertinent and precise, with errors or omissions being rare.
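A node-level F1-score and the share of articles with no missed paragraphs measure different things. The post does not spell out how the 88% figure is computed; the sketch below assumes it is the fraction of articles in which every gold paragraph node is predicted as a paragraph. Label names and helpers are hypothetical.

```python
from typing import List, Sequence, Tuple

Article = Tuple[Sequence[str], Sequence[str]]  # (gold labels, predicted labels)


def pooled_paragraph_recall(articles: List[Article]) -> float:
    """Recall of the "paragraph" class pooled over all nodes of all articles."""
    hit = total = 0
    for gold, pred in articles:
        for g, p in zip(gold, pred):
            if g == "paragraph":
                total += 1
                if p == "paragraph":
                    hit += 1
    return hit / total if total else 0.0


def complete_article_rate(articles: List[Article]) -> float:
    """Share of articles in which no gold paragraph was missed."""
    if not articles:
        return 0.0
    complete = sum(
        all(p == "paragraph" for g, p in zip(gold, pred) if g == "paragraph")
        for gold, pred in articles
    )
    return complete / len(articles)


perfect = (["paragraph"] * 10, ["paragraph"] * 10)
one_miss = (["paragraph"] * 10, ["paragraph"] * 9 + ["non-essential"])
articles = [perfect, perfect, perfect, one_miss]
# pooled recall: 39/40 = 0.975, but only 3 of 4 articles are complete (0.75)
```

Because a single missed paragraph fails the whole article, the article-level rate is a stricter measure than pooled per-node scores.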

A comparison of Chrome content distillation with other models, such as DOM Distiller and Mozilla Readability, on a set of English-language pages is presented in the table below. The comparison uses BLEU, chrF, and ROUGE, which measure how closely the distilled main body text matches the ground truth.

The results demonstrate the excellent performance of our models compared to other DOM-based approaches.

[Table: comparison of Chrome content distillation with DOM Distiller and Mozilla Readability on English-language pages (BLEU, chrF, ROUGE); not reproduced here.]
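To illustrate what a text-overlap metric such as chrF rewards, here is a simplified, hypothetical chrF-style score. It follows the common formulation of averaging character n-gram precision and recall over n = 1 to 6 with whitespace removed and combining them with an F-beta score (beta = 2, weighting recall more). Library implementations such as sacrebleu may differ in details, so reported numbers should come from a standard implementation rather than this sketch.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    s = "".join(text.split())  # drop whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # simplification: skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


print(chrf("the cat sat", "the cat sat"))  # 1.0
```

Because beta = 2 weights recall more heavily, a distiller that drops parts of the main text is penalized more than one that keeps a little extra.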

In conclusion, we have developed an on-device content distillation model that significantly enhances the reading experience for Android and Chrome users. By applying graph neural networks to accessibility trees, the model distills essential content from a wide range of webpages and apps and outperforms alternative approaches, making reading more pleasant and accessible. Because the model is lightweight enough to run locally on the device, user data is never transmitted externally, protecting user privacy.
