Posted by Ameya Velingker, Research Scientist, Google Research, and Balaji Venkatachalam, Software Engineer, Google

Graphs, in which objects and their relations are represented as nodes (or vertices) and edges (or links) between pairs of nodes, are ubiquitous in computing and machine learning (ML). For example, social networks, road networks, and molecular structure and interactions are all domains in which the underlying datasets have a natural graph structure. ML can be used to learn the properties of nodes, edges, or entire graphs.

Graph neural networks (GNNs) are a common approach to learning on graphs. They operate on graph data by applying an optimizable transformation to node, edge, and global attributes. The most typical class of GNNs operates via a message-passing framework, in which each layer aggregates the representation of a node with those of its immediate neighbors.

Recently, graph transformer models have emerged as a popular alternative to message-passing GNNs. These models adapt the success of Transformer architectures in natural language processing (NLP) to graph-structured data. The attention mechanism in a graph transformer can be modeled by an interaction graph, in which edges represent pairs of nodes that attend to each other. Unlike message-passing architectures, graph transformers have an interaction graph that is separate from the input graph. The typical interaction graph is a complete graph, which models direct interactions between all pairs of nodes. However, the number of pairs grows quadratically with the number of nodes, which creates computational and memory bottlenecks that restrict graph transformers to datasets with small graphs. Making graph transformers scalable has therefore been a significant research direction.

One solution is to use a sparse interaction graph with fewer edges. Many sparse and efficient transformers have been proposed for sequences, but they do not generally extend to graphs in a principled manner.
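To make the contrast between message passing and dense attention concrete, here is a minimal Python sketch. It is an illustration only, not code from the paper: the function names are hypothetical, node features are single floats, and the learned transformation is replaced by a plain mean over a node and its immediate neighbors.

```python
from itertools import combinations


def complete_interaction_graph(num_nodes):
    """All unordered node pairs: the interaction graph of dense attention."""
    return list(combinations(range(num_nodes), 2))


def message_passing_layer(features, edges):
    """One round of mean aggregation over each node and its immediate neighbors.

    Assumption: a real GNN layer would also apply learnable weights;
    this sketch omits them to show only where information flows.
    """
    neighbors = {v: [v] for v in range(len(features))}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return [
        sum(features[w] for w in neighbors[v]) / len(neighbors[v])
        for v in range(len(features))
    ]


# A 4-node path graph 0 - 1 - 2 - 3 with a signal on node 0 only.
path_edges = [(0, 1), (1, 2), (2, 3)]
features = [1.0, 0.0, 0.0, 0.0]

updated = message_passing_layer(features, path_edges)
dense_edges = complete_interaction_graph(4)

print(updated)           # node 3 is still 0.0: the signal has not reached it
print(len(dense_edges))  # 6 pairs for 4 nodes
```

After one message-passing layer the signal on node 0 has reached only its direct neighbor, whereas the complete interaction graph connects every pair at once. The price is n(n-1)/2 pairs, which is 499,500 for a graph of just 1,000 nodes.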
In “Exphormer: Sparse Transformers for Graphs”, presented at ICML 2023, we address the scalability challenge by introducing a sparse attention framework for transformers designed specifically for graph data. The Exphormer framework makes use of expander graphs, which are sparse yet well-connected graphs that have useful properties. Expander graphs have applications in various areas, such as algorithms, pseudorandomness, complexity theory, and error-correcting codes.

Exphormer replaces the dense, fully connected interaction graph of a standard Transformer with the edges of a sparse d-regular expander graph, combined with the edges of the input graph and with virtual nodes. The resulting graph has good connectivity properties and retains the inductive bias of the input dataset graph while remaining sparse.

Each component of the interaction graph serves a specific purpose. Edges from the input graph retain the inductive bias of the input graph structure. Expander edges provide good global connectivity and random walk mixing properties. Virtual nodes serve as global “memory sinks” that can communicate directly with every node. The degree of the expander graph and the number of virtual nodes are hyperparameters that can be tuned to improve quality metrics.

Exphormer is as expressive as the dense transformer and satisfies universal approximation properties. In experimental results, Exphormer achieved state-of-the-art performance on various datasets and allowed graph transformer architectures to scale well beyond the usual graph size limits.
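The three components of the interaction graph can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and the expander edges are approximated by taking the union of d/2 random permutations and dropping self-loops and duplicates, which yields a graph of degree at most d rather than an exactly d-regular one.

```python
import random


def random_expander_edges(num_nodes, degree, seed=0):
    """Approximate a sparse random regular graph (assumed construction).

    Each of degree // 2 random permutations contributes the edges
    (i, perm[i]); self-loops and duplicate edges are dropped.
    """
    rng = random.Random(seed)
    edges = set()
    for _ in range(degree // 2):
        perm = list(range(num_nodes))
        rng.shuffle(perm)
        for u, v in enumerate(perm):
            if u != v:
                edges.add((min(u, v), max(u, v)))
    return edges


def build_interaction_graph(num_nodes, input_edges, degree=4, num_virtual=1, seed=0):
    """Union of input-graph edges, expander edges, and virtual-node edges."""
    # 1. Input-graph edges keep the inductive bias of the data.
    edges = {(min(u, v), max(u, v)) for u, v in input_edges}
    # 2. Expander edges add global connectivity.
    edges |= random_expander_edges(num_nodes, degree, seed)
    # 3. Each virtual node is connected to every original node.
    for k in range(num_virtual):
        virtual = num_nodes + k
        edges |= {(v, virtual) for v in range(num_nodes)}
    return edges


n = 1000
ring = [(i, (i + 1) % n) for i in range(n)]  # toy input graph: a cycle
sparse = build_interaction_graph(n, ring, degree=4, num_virtual=1)
dense_count = n * (n - 1) // 2

print(len(sparse), "sparse edges vs", dense_count, "dense pairs")
```

With these settings the sparse interaction graph has at most |E| + (d/2)·n + n edges, which grows linearly in the number of nodes for a fixed degree and a fixed number of virtual nodes. Here `degree` and `num_virtual` play the role of the two hyperparameters described above.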