This Machine Learning Research Introduces Mechanistic Architecture Design (MAD) Pipeline: Encompassing Small-Scale Capability Unit Tests Predictive of Scaling Laws

April 6, 2024
in AI Technology
Reading Time: 4 mins read


Designing deep learning architectures is resource-intensive: the design space is vast, prototyping cycles are long, and training and evaluating models at scale is computationally expensive. Despite progress on automated neural architecture search methods, the combinatorial explosion of possible designs and the lack of reliable prototyping pipelines mean that architectural improvements still emerge from an opaque development process guided by heuristics and individual experience rather than systematic procedure. The high cost and long iteration time of training and testing new designs only sharpen the need for principled, agile design pipelines.

Despite the abundance of possible designs, most models are variants of a standard Transformer recipe that alternates memory-based mixers (self-attention layers) with memoryless ones (shallow FFNs). This specific set of computational primitives, inherited from the original Transformer design, is known to improve quality, and empirical evidence suggests that individual primitives excel at specific sub-tasks of sequence modeling, such as in-context versus factual recall.
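For concreteness, a minimal PyTorch sketch of that standard recipe might look as follows; the module below is illustrative, not code from the paper:

```python
# Minimal sketch of the standard Transformer recipe: a memory-based sequence
# mixer (self-attention) alternating with a memoryless channel mixer (FFN).
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(  # shallow position-wise FFN
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # memory-based mixer
        return x + self.ffn(self.norm2(x))                 # memoryless mixer
```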

Researchers from Together AI, Stanford University, Hessian AI, RIKEN, Arc Institute, CZ Biohub, and Liquid AI investigate architecture optimization, ranging from scaling laws down to synthetic tasks that probe specific model capabilities. They introduce mechanistic architecture design (MAD), a methodology for rapid architecture prototyping and testing. MAD comprises a suite of synthetic tasks, such as compression, memorization, and recall, chosen to act as isolated unit tests for critical architectural capabilities and requiring only minutes of training time. Work on sequence-manipulation abilities such as in-context learning and recall has deepened the understanding of sequence models like Transformers, and that line of work inspired the MAD tasks.
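To convey the flavor of such a unit test, here is a hedged sketch of an in-context (associative) recall task generator. The exact task specifications are in the paper; all names and sizes below are invented for illustration:

```python
# Sketch of a MAD-style synthetic unit test: associative recall. The model
# sees (key, value) pairs followed by a query key and must emit the value
# bound to that key. Vocabulary size and sequence length are illustrative.
import random

def make_recall_example(n_pairs=8, vocab=64, seed=None):
    rng = random.Random(seed)
    keys = rng.sample(range(vocab), n_pairs)          # distinct keys
    values = [rng.randrange(vocab) for _ in keys]
    query = rng.choice(keys)
    target = values[keys.index(query)]
    # Input: k1 v1 k2 v2 ... kn vn q ; label: the value bound to q
    tokens = [t for kv in zip(keys, values) for t in kv] + [query]
    return tokens, target

tokens, target = make_recall_example(seed=0)
print(tokens, "->", target)
```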

Using MAD, the team evaluates architectures built from both established and novel computational primitives, including gated convolutions, gated input-varying linear recurrences, and operators such as mixtures of experts (MoEs), filtering this space to surface promising candidate architectures. This has led to the discovery and validation of several design-optimization strategies, notably striping: building hybrid architectures by sequentially interleaving blocks composed of different computational primitives with a predetermined connection topology (sketched below).
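A minimal sketch of striping, assuming each block type is a self-contained nn.Module; the placeholder mixers below are illustrative stand-ins, not the paper's primitives:

```python
# Striping: interleave blocks of different computational primitives
# in a fixed pattern to form a hybrid architecture.
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Placeholder memory-based mixer."""
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, x):
        return x + self.attn(x, x, x, need_weights=False)[0]

class GatedConvBlock(nn.Module):
    """Placeholder gated short-convolution mixer (causal)."""
    def __init__(self, d):
        super().__init__()
        self.conv = nn.Conv1d(d, 2 * d, kernel_size=3, padding=2)

    def forward(self, x):  # x: (batch, seq, d)
        h = self.conv(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        a, b = h.chunk(2, dim=-1)
        return x + a * torch.sigmoid(b)  # multiplicative gating

def striped(pattern, d=128):
    """Interleave block types in a fixed topology, e.g. "CCACCA"."""
    blocks = {"A": AttentionBlock, "C": GatedConvBlock}
    return nn.Sequential(*(blocks[c](d) for c in pattern))

model = striped("CCACCA")          # conv, conv, attention, repeated
x = torch.randn(2, 64, 128)
print(model(x).shape)              # torch.Size([2, 64, 128])
```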

To probe the link between MAD synthetics and real-world scaling, the researchers train 500 language models with diverse architectures, spanning 70 million to 7 billion parameters, in the broadest scaling-law analysis of emerging architectures to date. Their protocol builds on the compute-optimal scaling laws established for LSTMs and Transformers. Overall, hybrid designs scale better than their non-hybrid counterparts, achieving lower pretraining loss across a range of FLOP budgets at the compute-optimal frontier. Their work also shows that the novel architectures are more resilient in large pretraining runs away from that optimal frontier.
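As an illustration of the kind of fit such a protocol produces, the snippet below fits a saturating power law L(C) = a·C^(-b) + c to compute-loss measurements. Only the functional form reflects common practice; the data points are invented:

```python
# Fit a compute-loss scaling law to hypothetical compute-optimal points.
import numpy as np
from scipy.optimize import curve_fit

flops = np.array([1e18, 1e19, 1e20, 1e21])   # hypothetical FLOP budgets
loss  = np.array([3.10, 2.75, 2.48, 2.29])   # hypothetical pretraining losses

def power_law(C, a, b, c):
    return a * C ** (-b) + c

(a, b, c), _ = curve_fit(power_law, flops, loss,
                         p0=(10.0, 0.05, 2.0), maxfev=10000)
print(f"L(C) = {a:.2f} * C^-{b:.3f} + {c:.2f}")
```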

The size of a model's state, analogous to the kv-cache of a standard Transformer, is an important factor in MAD and the scaling analysis: it determines inference efficiency and memory cost, and it likely has a direct bearing on recall capability. The team introduces a state-optimal scaling methodology to estimate how perplexity scales with the state dimension across model designs, and identifies hybrid designs that strike a good compromise between perplexity, state dimension, and compute requirements.
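A back-of-the-envelope illustration of why state size matters; the layer counts and widths below are arbitrary, not the paper's configurations:

```python
# Per-layer inference state: an attention kv-cache grows with sequence
# length, while a recurrent mixer carries a fixed-size state.
def attention_state(seq_len, d_model, n_layers):
    return 2 * seq_len * d_model * n_layers   # keys + values per layer

def recurrent_state(d_state, d_model, n_layers):
    return d_state * d_model * n_layers       # fixed recurrent state

# At an 8k context, the kv-cache dwarfs a fixed state of dimension 16:
print(attention_state(8192, 1024, 24))   # 402,653,184 numbers
print(recurrent_state(16, 1024, 24))     # 393,216 numbers
```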

By combining MAD with newly developed computational primitives, the team creates state-of-the-art hybrid architectures that achieve 20% lower perplexity at the same compute budget as the best Transformer, convolutional, and recurrent baselines (Transformer++, Hyena, Mamba).

The findings of this research have significant implications for machine learning and artificial intelligence. By demonstrating that a well-chosen set of MAD synthetic tasks can accurately forecast scaling-law performance, the team opens the door to faster, automated architecture design. This is particularly relevant for models of the same architectural class, where MAD accuracy correlates closely with compute-optimal perplexity at scale.
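As a toy illustration of that association, one could check the rank correlation between MAD accuracy and at-scale perplexity across candidate architectures; all numbers below are invented:

```python
# Rank correlation between hypothetical MAD scores and perplexities.
from scipy.stats import spearmanr

mad_acc = [0.61, 0.74, 0.69, 0.83, 0.77]   # per-architecture MAD accuracy
ppl     = [12.4, 10.9, 11.5, 9.8, 10.3]    # compute-optimal perplexity

rho, p = spearmanr(mad_acc, ppl)
print(rho)  # strongly negative: higher MAD accuracy, lower perplexity
```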

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

📢New research on mechanistic architecture design and scaling laws.

– We perform the largest scaling laws analysis (500+ models, up to 7B) of beyond Transformer architectures to date

– For the first time, we show that architecture performance on a set of isolated token… pic.twitter.com/khJAXnvwWA

— Michael Poli (@MichaelPoli6) March 28, 2024

Dhanshree Shenwai is a Computer Science Engineer with solid experience at FinTech companies, covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier in today's evolving world.
