Google DeepMind Researchers Propose a Novel AI Method Called Sparse Fine-grained Contrastive Alignment (SPARC) for Fine-Grained Vision-Language Pretraining

January 24, 2024
in AI Technology


Contrastive pre-training on large, noisy image-text datasets has become a popular way to build general vision representations. These models align global image and text features in a shared embedding space by pulling matched pairs together and pushing mismatched pairs apart, and they excel at tasks like image classification and retrieval. However, they struggle with fine-grained tasks such as localization and reasoning about spatial relationships. Recent efforts add losses between image patches and text tokens to capture finer details, improving performance in fine-grained retrieval, image classification, object detection, and segmentation. Despite these advances, challenges such as computational expense and reliance on pretrained models persist.
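As a point of reference for the global alignment objective described above, here is a minimal sketch of a CLIP/ALIGN-style contrastive loss in PyTorch. The function name, shapes, and temperature value are illustrative assumptions, not details taken from any specific paper.

import torch
import torch.nn.functional as F

def global_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch, dim) pooled embeddings from each encoder."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched image-text pairs sit on the diagonal; all other pairs act as negatives.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)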

Researchers from Google DeepMind have developed SPARse Fine-grained Contrastive Alignment (SPARC), a method for pretraining fine-grained multimodal representations from image-text pairs. SPARC focuses on learning groups of image patches corresponding to individual words in captions. It utilizes a sparse similarity metric to compute language-grouped vision embeddings for each token, allowing detailed information capture in a computationally efficient manner. SPARC combines fine-grained sequence-wise loss with a contrastive loss, enhancing performance in coarse-grained tasks like classification and fine-grained tasks like retrieval, object detection, and segmentation. The method also improves model faithfulness and captioning in foundational vision-language models.

Contrastive image-text pre-training methods such as CLIP and ALIGN popularized learning general visual representations from textual supervision on large-scale data scraped from the internet. FILIP proposes a cross-modal late-interaction mechanism that optimizes the token-wise maximum similarity between image and text tokens, addressing the coarse visual representations produced by global matching. PACL starts from CLIP-pretrained vision and text encoders and trains an adapter with a contrastive objective to improve fine-grained understanding. GLoRIA builds localized visual representations by contrasting attention-weighted patch embeddings with text tokens, but it becomes computationally intensive for large batch sizes.
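To make the late-interaction idea concrete, here is a hedged sketch of a FILIP-style token-wise maximum similarity between patch and token embeddings; the names, shapes, and padding-mask handling are illustrative assumptions, not the released FILIP implementation.

import torch
import torch.nn.functional as F

def token_wise_max_similarity(patch_emb, token_emb, token_mask):
    """patch_emb: (B, P, D), token_emb: (B, T, D), token_mask: (B, T) float, 1 for real tokens."""
    patch_emb = F.normalize(patch_emb, dim=-1)
    token_emb = F.normalize(token_emb, dim=-1)
    sim = torch.einsum("bpd,btd->bpt", patch_emb, token_emb)     # (B, P, T)
    # Text-to-image: each real token is scored by its best-matching patch, then averaged.
    t2i = sim.max(dim=1).values                                  # (B, T)
    t2i = (t2i * token_mask).sum(dim=1) / token_mask.sum(dim=1).clamp(min=1)
    # Image-to-text: mask out padding tokens, score each patch by its best token, average.
    sim_masked = sim.masked_fill(token_mask[:, None, :] == 0, float("-inf"))
    i2t = sim_masked.max(dim=2).values.mean(dim=1)               # (B,)
    return 0.5 * (i2t + t2i)                                     # per-pair similarity, (B,)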

SPARC is a method for pretraining fine-grained multimodal representations from image-text pairs. It uses a sparse similarity metric between image patches and language tokens to learn a grouping of image patches for each token in the caption. The token embeddings and their language-grouped vision embeddings are then contrasted through a fine-grained, sequence-wise loss that depends only on individual samples, so detailed information can be learned at low computational cost. SPARC combines this fine-grained loss with a contrastive loss between global image and text embeddings to encode global and local information simultaneously.
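The sketch below shows one plausible reading of that fine-grained objective: patch-to-token similarities are sparsified, used to pool a language-grouped vision embedding for each token, and contrasted within each sample. The thresholding scheme, normalization, and temperature here are simplified assumptions, not the exact recipe from the SPARC paper.

import torch
import torch.nn.functional as F

def sparc_fine_grained_loss(patch_emb, token_emb, token_mask, temperature=0.07):
    """patch_emb: (B, P, D), token_emb: (B, T, D), token_mask: (B, T) float, 1 for real tokens."""
    patch_emb = F.normalize(patch_emb, dim=-1)
    token_emb = F.normalize(token_emb, dim=-1)
    sim = torch.einsum("btd,bpd->btp", token_emb, patch_emb)        # (B, T, P)

    # Sparsify (assumed scheme): min-max normalise each token's patch similarities,
    # zero out patches below a uniform 1/P threshold, renormalise kept weights to sum to 1.
    lo = sim.amin(dim=-1, keepdim=True)
    hi = sim.amax(dim=-1, keepdim=True)
    sim_norm = (sim - lo) / (hi - lo + 1e-6)
    weights = torch.where(sim_norm >= 1.0 / sim.size(-1), sim_norm, torch.zeros_like(sim_norm))
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-6)

    # Language-grouped vision embedding: weighted sum of patch embeddings per token.
    grouped = F.normalize(torch.einsum("btp,bpd->btd", weights, patch_emb), dim=-1)

    # Sequence-wise loss: each token is contrasted only against the grouped embeddings
    # of its own sample, so this term needs no cross-batch negatives.
    logits = torch.einsum("btd,bsd->bts", token_emb, grouped) / temperature     # (B, T, T)
    targets = torch.arange(logits.size(1), device=logits.device).expand(logits.size(0), -1)
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")  # (B, T)
    return (per_token * token_mask).sum() / token_mask.sum().clamp(min=1)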

The SPARC study evaluates the method on image-level tasks such as classification and region-level tasks such as retrieval, object detection, and segmentation, and it outperforms competing approaches in both categories; it also improves model faithfulness and captioning in foundational vision-language models. For zero-shot segmentation, patch embeddings of an image are compared against text embeddings of the ground-truth classes: each patch is assigned the class with the highest cosine similarity, and Intersection over Union (IoU) is then computed per class between the predicted and ground-truth segmentations. The study also mentions using Flamingo's Perceiver Resampler when training SPARC as part of the experimental setup.
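A minimal sketch of that zero-shot segmentation protocol, assuming per-patch ground-truth labels and precomputed class text embeddings (the names and shapes are illustrative), might look like this:

import torch
import torch.nn.functional as F

def zero_shot_segmentation_iou(patch_emb, class_text_emb, gt_labels):
    """patch_emb: (P, D) patch embeddings of one image,
    class_text_emb: (C, D) text embeddings of the ground-truth class names,
    gt_labels: (P,) integer class index per patch."""
    patch_emb = F.normalize(patch_emb, dim=-1)
    class_text_emb = F.normalize(class_text_emb, dim=-1)
    pred = (patch_emb @ class_text_emb.t()).argmax(dim=-1)       # (P,) best-matching class per patch
    ious = {}
    for c in range(class_text_emb.size(0)):
        inter = ((pred == c) & (gt_labels == c)).sum().item()
        union = ((pred == c) | (gt_labels == c)).sum().item()
        if union > 0:                                            # skip classes absent from both
            ious[c] = inter / union
    return ious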

In conclusion, SPARC is a method for pretraining fine-grained multimodal representations from image-text pairs. It combines fine-grained contrastive alignment between image patches and language tokens with a contrastive loss between global image and text embeddings. SPARC outperforms competing approaches on image-level tasks such as classification and on region-level tasks such as retrieval, object detection, and segmentation, and it improves model faithfulness and captioning in foundational vision-language models. Zero-shot segmentation, in which patch embeddings are matched to text embeddings of ground-truth classes, forms part of the evaluation, and Flamingo's Perceiver Resampler is used in the experimental setup when training SPARC.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter.

Don’t Forget to join our Telegram Channel
