News PouroverAI

Microsoft Researchers Introduce Kosmos-2.5: A Multimodal Literate Model for Machine Reading of Text-Intensive Images

September 25, 2023
in Data Science & ML


Recently, large language models (LLMs) have gained prominence in artificial intelligence, but they have primarily focused on text and struggled with understanding visual content. Multimodal large language models (MLLMs) have emerged to bridge this gap. MLLMs combine visual and textual information in a single Transformer-based model, allowing them to learn and generate content from both modalities, marking a significant advancement in AI capabilities.

KOSMOS-2.5 is a multimodal model designed to handle two closely related transcription tasks within a unified framework. The first task involves generating spatially aware text blocks, assigning spatial coordinates to each text line within a text-rich image. The second task focuses on producing structured text output in markdown format, capturing various styles and structures.
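As a rough illustration of what the first task's output could look like, the sketch below pairs each text line with a bounding box and round-trips it through a flat string. The `TextLine` class, the `<bbox>x0,y0,x1,y1</bbox>` tag layout, and both helper functions are hypothetical and chosen only for illustration; the article does not describe the model's actual output serialization.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextLine:
    """One transcribed line plus its bounding box in pixel coordinates."""
    x0: int
    y0: int
    x1: int
    y1: int
    text: str


# Hypothetical tag layout (an assumption, not the model's documented format).
_LINE_PATTERN = re.compile(r"<bbox>(\d+),(\d+),(\d+),(\d+)</bbox>([^\n]*)")


def serialize_lines(lines):
    """Flatten text lines into one string, one tagged line per row."""
    return "\n".join(
        f"<bbox>{l.x0},{l.y0},{l.x1},{l.y1}</bbox>{l.text}" for l in lines
    )


def parse_lines(output):
    """Recover TextLine records from a string produced by serialize_lines."""
    return [
        TextLine(int(x0), int(y0), int(x1), int(y1), text)
        for x0, y0, x1, y1, text in _LINE_PATTERN.findall(output)
    ]


lines = [
    TextLine(10, 20, 200, 40, "Invoice #123"),
    TextLine(10, 50, 180, 70, "Total: 42.00"),
]
encoded = serialize_lines(lines)
decoded = parse_lines(encoded)
```

The second task would instead yield plain markdown for the same image (headings, lists, tables) with no coordinates attached, which is why the two tasks can share a decoder while differing only in the text representation they emit.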

Both tasks are handled by a single system, using a shared Transformer architecture, task-specific prompts, and flexible text representations. The model’s architecture combines a vision encoder based on ViT (Vision Transformer) with a language decoder based on the Transformer architecture, connected through a resampler module.
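To make the role of a resampler more concrete, here is a minimal pure-Python sketch of one common design: a small, fixed set of query vectors attends over a variable number of image embeddings and returns a fixed number of pooled vectors for the decoder. This assumes a cross-attention pooling resampler, which the article does not specify; the function names, the toy dimensions, and the absence of learned projections are all simplifications for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def resample(image_embeddings, queries):
    """Cross-attention pooling (assumed design, not confirmed by the article).

    Each query attends over all image embeddings, so the output always has
    len(queries) vectors no matter how many embeddings the encoder produced.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    pooled = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in image_embeddings])
        pooled.append([
            sum(w * v[i] for w, v in zip(weights, image_embeddings))
            for i in range(dim)
        ])
    return pooled


# Toy inputs: two fixed queries, and encoder outputs of different lengths.
queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
short_image = [[0.1 * (i + j) for j in range(4)] for i in range(6)]
long_image = [[0.1 * (i + j) for j in range(4)] for i in range(10)]

short_out = resample(short_image, queries)
long_out = resample(long_image, queries)
```

Both calls return two vectors of length four, which is the point of the design: the decoder sees a fixed-size visual prefix regardless of image resolution, and a task-specific prompt can then select which transcription format to generate.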

The model is pretrained on a substantial dataset of text-heavy images, which includes text lines with bounding boxes and plain markdown text. This dual-task training approach enhances KOSMOS-2.5’s overall multimodal literacy capabilities.

The above image shows the model architecture of KOSMOS-2.5. The performance of KOSMOS-2.5 is evaluated across two main tasks: end-to-end document-level text recognition and the generation of markdown-formatted text from images. Experimental results show strong performance on text-intensive image understanding tasks. Additionally, KOSMOS-2.5 exhibits promising capabilities in few-shot and zero-shot learning scenarios, making it a versatile tool for real-world applications that deal with text-rich images.

Despite these promising results, the current model has some limitations, which point to valuable future research directions. For instance, KOSMOS-2.5 currently does not support fine-grained control of document elements’ positions using natural language instructions, despite being pre-trained on inputs and outputs involving the spatial coordinates of text. In the broader research landscape, a significant direction lies in further developing model scaling capabilities.

Check out the Paper and Project. All credit for this research goes to the researchers on this project.


Janhavi Lande is an Engineering Physics graduate from IIT Guwahati, class of 2023. She is an upcoming data scientist and has been working in the world of ML/AI research for the past two years. She is most fascinated by this ever-changing world and its constant demand on humans to keep up with it. In her spare time she enjoys traveling, reading, and writing poems.

Copyright © 2023 PouroverAI News.