The “Attention is All You Need” transformer revolution has had a profound effect on the design of deep learning model architectures. Not long after BERT came RoBERTa, ALBERT, DistilBERT, SpanBERT, DeBERTa, and many more. This has turned the HuggingFace side panel into a veritable museum of transformer architectures, and the pace at which new models appear has only increased.
Model architecture refers to the computational graph underlying the execution of the model. For example, below is a snippet from Netron showing part of T5's computational graph. Each node represents an operation or a variable.
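To make the computational-graph view concrete, here is a minimal sketch (not from the article) that traces a tiny PyTorch block with torch.fx and prints its nodes; Netron would show the same kind of operation and variable nodes for an exported graph such as T5's. The block itself is an illustrative stand-in, not a piece of any real model.

```python
# A minimal sketch of inspecting a model's computational graph with torch.fx.
import torch
import torch.fx
import torch.nn as nn

class TinyBlock(nn.Module):
    """An illustrative residual block, not taken from any real architecture."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return self.norm(torch.relu(self.linear(x)) + x)

traced = torch.fx.symbolic_trace(TinyBlock())

# Each node is an operation (call_module, call_function) or a variable (placeholder, output).
for node in traced.graph.nodes:
    print(f"{node.op:15} {node.name:12} -> {node.target}")
```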
Even though numerous architectures are already available, more modifications and breakthroughs are certain to come. For now, however, it is human researchers who must understand these models, form hypotheses, troubleshoot, and test them, and as models grow larger and more complex, understanding their architectures becomes harder. With AI guidance, humans could discover model architectures that might otherwise take years or decades to find.
Intelligent Model Architecture Design (MAD) is the idea that generative AI can help scientists and AI researchers design better, more effective model architectures faster and more easily. Large language models (LLMs) have already provided immense value and creativity across many tasks; the question is whether the same intelligent assistance and creativity can be applied to designing model architectures. Researchers could prompt such a system with their ideas, while the system learns to associate text-based descriptions with specific model architectures and the techniques and names behind them.
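As a rough illustration of what such prompting could look like, here is a hypothetical example; the description format and the request are assumptions, not an established interface, and the actual LLM call is omitted.

```python
# A hypothetical prompt pairing a text-based architecture description with a design request.
# The description and the ask are illustrative assumptions, not an established MAD interface.
architecture_description = (
    "Encoder-decoder transformer, 12 layers each, d_model=768, 12 attention heads, "
    "relative position biases (T5-style), pre-layer-norm, GELU feed-forward blocks."
)

prompt = (
    "You are assisting with model architecture design (MAD).\n"
    f"Current architecture: {architecture_description}\n"
    "Propose one modification that could reduce inference latency while preserving "
    "accuracy, and name the technique it is based on."
)

# The prompt would then be sent to an LLM of choice (API call omitted here).
print(prompt)
```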
Model architecture matters because it plays a crucial role in the performance of AI models. While much of the current push is toward high-quality datasets, model surgery methods that improve training efficiency, inference speed, and accuracy are still being explored. MAD is important for achieving the best results with minimal training.
Neural Architecture Search (NAS) aligns with the idea of intelligent MAD, aiming to alleviate the burden of manually designing architectures. NAS benchmarks are machine learning datasets where X is an architecture expressed as a graph and Y is the evaluation metric obtained when that architecture is trained and tested. NAS techniques focus on recombining existing components of model architectures to discover new variations.
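A minimal, illustrative benchmark record could be structured as below; the field names and values are assumptions, loosely inspired by cell-based benchmarks such as NAS-Bench-101, not taken from a real dataset.

```python
# An illustrative NAS-benchmark record: X is an architecture expressed as a graph
# (adjacency matrix over nodes plus an operation label per node), Y is the metric
# obtained after training and evaluating that architecture.
from dataclasses import dataclass
from typing import List

@dataclass
class NASBenchmarkRecord:
    adjacency: List[List[int]]   # adjacency[i][j] == 1 means node i feeds node j
    operations: List[str]        # one operation label per node
    test_accuracy: float         # Y: metric after training and testing the architecture

record = NASBenchmarkRecord(
    adjacency=[
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ],
    operations=["input", "conv3x3", "conv1x1", "output"],
    test_accuracy=0.9312,  # example value, not from a real benchmark
)
```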
Efficient NAS (ENAS) addresses the problem of exhaustively training and evaluating every candidate by training a single super network in which candidate models share weights. This parameter sharing between candidates makes NAS far more practical. From a generative AI perspective, there is also an opportunity to pre-train on model architectures and use the resulting foundation model to generate architectures, treating architectures themselves as a language.
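The sketch below illustrates the weight-sharing idea under simple assumptions; it is not the original ENAS implementation. Each layer holds a fixed set of candidate operations that live once inside the super network, and every sampled child architecture reuses those same parameters instead of training from scratch.

```python
# A minimal sketch of ENAS-style parameter sharing (illustrative only).
import random
import torch
import torch.nn as nn

class SharedLayer(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        # Candidate operations whose weights are shared across all sampled child models.
        self.candidates = nn.ModuleList([
            nn.Linear(dim, dim),                            # candidate 0: dense
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),  # candidate 1: dense + ReLU
            nn.Identity(),                                  # candidate 2: skip connection
        ])

    def forward(self, x, choice: int):
        return self.candidates[choice](x)

class SuperNet(nn.Module):
    def __init__(self, depth: int = 4, dim: int = 32):
        super().__init__()
        self.layers = nn.ModuleList(SharedLayer(dim) for _ in range(depth))

    def forward(self, x, architecture):
        # `architecture` is a list of per-layer choices defining one child model.
        for layer, choice in zip(self.layers, architecture):
            x = layer(x, choice)
        return x

supernet = SuperNet()
# Two different child architectures evaluated with the same shared weights.
arch_a = [random.randrange(3) for _ in supernet.layers]
arch_b = [random.randrange(3) for _ in supernet.layers]
x = torch.randn(8, 32)
out_a, out_b = supernet(x, arch_a), supernet(x, arch_b)
```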
Representing architectures as graphs rather than text is beneficial because it lets the model learn from a representation closer to the ground truth. Graph Neural Networks (GNNs) and graph transformers are effective at working with graph structures, which are central to MAD. While code generation has limitations, the combination of graph transformers and LLMs shows promise for generating architectures.
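As a rough sketch of how an architecture graph could be fed to a graph network, the example below embeds operation labels and performs one message-passing step over the adjacency matrix; the operation vocabulary, dimensions, and class names are illustrative assumptions, and a real graph transformer would add attention over nodes and edges.

```python
# A minimal sketch of encoding an architecture graph with GNN-style message passing.
import torch
import torch.nn as nn

OPS = ["input", "conv3x3", "conv1x1", "skip", "output"]  # assumed operation vocabulary

class ArchitectureEncoder(nn.Module):
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.embed = nn.Embedding(len(OPS), hidden)
        self.update = nn.Linear(2 * hidden, hidden)

    def forward(self, op_ids, adjacency):
        h = self.embed(op_ids)             # (num_nodes, hidden) node features
        messages = adjacency @ h           # aggregate features from predecessor nodes
        h = torch.relu(self.update(torch.cat([h, messages], dim=-1)))
        return h.mean(dim=0)               # one vector summarizing the architecture

op_ids = torch.tensor([0, 1, 2, 4])        # input -> conv3x3 -> conv1x1 -> output
# adjacency[i][j] == 1.0 means node j sends its features to node i
adjacency = torch.tensor([
    [0., 0., 0., 0.],   # input receives nothing
    [1., 0., 0., 0.],   # conv3x3 receives from input
    [0., 1., 0., 0.],   # conv1x1 receives from conv3x3
    [0., 0., 1., 0.],   # output receives from conv1x1
])
embedding = ArchitectureEncoder()(op_ids, adjacency)
```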
In conclusion, the use of generative AI in MAD can assist researchers in designing better model architectures. NAS techniques and graph-based approaches offer potential solutions to automate the design process and enhance the performance of AI models.