Sunday, June 22, 2025
News PouroverAI

Amazon MSK Introduces Managed Data Delivery from Apache Kafka to Your Data Lake

September 27, 2023
in Cloud & Programming



I am thrilled to announce a new feature of Amazon Managed Streaming for Apache Kafka (Amazon MSK) that continuously loads data from an Apache Kafka cluster into Amazon Simple Storage Service (Amazon S3). The feature uses Amazon Kinesis Data Firehose, an extract, transform, and load (ETL) service, to read data from a Kafka topic, transform the records, and write them to an Amazon S3 destination. You configure Kinesis Data Firehose in the console, with no code to write and no infrastructure to manage.

Kafka is commonly used for building real-time data pipelines that efficiently move large amounts of data between systems or applications. Many AWS customers have adopted Kafka to capture streaming data such as click-stream events, transactions, IoT events, and application and machine logs. They use it to perform real-time analytics, run continuous transformations, and distribute this data to data lakes and databases in real time.

However, deploying Kafka clusters comes with its own challenges. The first challenge is setting up, configuring, and maintaining the Kafka cluster itself. To address this, we released Amazon MSK in May 2019. It simplifies setting up, scaling, and managing Apache Kafka in production, so you can focus on your data and applications while we take care of the infrastructure.

The second challenge is writing, deploying, and managing application code that consumes data from Kafka. This typically involves coding connectors using the Kafka Connect framework and then managing a scalable infrastructure to run these connectors. Additionally, you need to handle data transformation, compression, error management, and retry logic to ensure data integrity during the transfer out of Kafka.

Today, we are excited to announce a fully managed solution that enables you to deliver data from Amazon MSK to Amazon S3 using Amazon Kinesis Data Firehose. This solution is serverless, requiring no server infrastructure management or code development. You can configure data transformation and error-handling logic with just a few clicks in the console.

The architecture of this solution has three parts: Amazon MSK serves as the data source, Amazon S3 acts as the data destination, and Amazon Kinesis Data Firehose manages the data transfer logic between them.

With this new capability, you no longer need to develop code to read data from Amazon MSK, transform it, and write the resulting records to Amazon S3. Kinesis Data Firehose handles the reading, transformation, compression, and write operations to Amazon S3. It also manages error handling and retries in case of any issues. Records that cannot be processed are delivered to the S3 bucket of your choice for manual inspection. The system automatically scales out and scales in to handle the volume of data without any provisioning or maintenance operations required on your end.

Kinesis Data Firehose delivery streams support Amazon MSK provisioned and serverless clusters, whether public or private. They also support cross-account connections, so a delivery stream can read from an MSK cluster and write to S3 buckets in different AWS accounts. The delivery stream reads data from your MSK cluster, buffers it until a configurable size or time threshold is reached, and then writes the buffered data to Amazon S3 as a single file. While MSK and Data Firehose must be in the same AWS Region, Data Firehose can deliver data to Amazon S3 buckets in other Regions.
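To make the buffering behavior concrete, here is a minimal in-memory sketch in Python. It is not how Kinesis Data Firehose is implemented; the class name ThresholdBuffer, its parameters, and the flush callback are hypothetical, and the thresholds are illustrative values only. It shows the general idea: records accumulate until either a size limit or an age limit is reached, and the whole batch is then written out as one object.

```python
import time
from typing import Callable, List, Optional


class ThresholdBuffer:
    """Toy buffer (hypothetical) that flushes on a size or a time threshold."""

    def __init__(self, max_bytes: int, max_seconds: float,
                 flush: Callable[[bytes], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.flush = flush      # called with one concatenated batch
        self.clock = clock      # injectable so the logic can be tested
        self._records: List[bytes] = []
        self._size = 0
        self._started: Optional[float] = None

    def add(self, record: bytes) -> None:
        # The age of a batch is measured from its first record.
        if self._started is None:
            self._started = self.clock()
        self._records.append(record)
        self._size += len(record)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically so the time threshold fires without new records."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if not self._records:
            return
        too_big = self._size >= self.max_bytes
        too_old = self.clock() - self._started >= self.max_seconds
        if too_big or too_old:
            # Emit everything buffered so far as a single object.
            self.flush(b"".join(self._records))
            self._records, self._size, self._started = [], 0, None


# Usage: three small records with a 10-byte size threshold.
objects: List[bytes] = []
buf = ThresholdBuffer(max_bytes=10, max_seconds=60, flush=objects.append)
for rec in (b"aaaa\n", b"bbbb\n", b"cc\n"):
    buf.add(rec)
# The first two records reach 10 bytes and are flushed together;
# the third stays buffered until a later flush.
```

A larger size threshold produces fewer, larger files in Amazon S3, while a shorter time threshold reduces the delay before data becomes visible.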

Kinesis Data Firehose delivery streams offer support for data type conversions. Built-in transformations are available to convert JSON data to Apache Parquet and Apache ORC formats. These columnar data formats optimize storage space and enable faster queries on Amazon S3. For non-JSON data, you can use AWS Lambda to transform input formats such as CSV, XML, or structured text into JSON before converting the data to Apache Parquet/ORC. Additionally, you can specify data compression formats such as GZIP, ZIP, and SNAPPY before delivering the data to Amazon S3, or you can deliver the data in its raw form.
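As an illustration of the Lambda step, the following Python handler is a minimal sketch that turns CSV records into JSON lines. It follows the record format Kinesis Data Firehose uses for data transformation functions (each record carries a recordId and base64-encoded data, and the function returns a result of Ok, Dropped, or ProcessingFailed per record). The column names in FIELDS are hypothetical and stand in for whatever your Kafka messages contain.

```python
import base64
import csv
import io
import json

# Hypothetical column names for the incoming CSV records.
FIELDS = ["user_id", "event", "timestamp"]


def lambda_handler(event, context):
    """Convert each CSV record to one JSON line."""
    output = []
    for record in event["records"]:
        try:
            line = base64.b64decode(record["data"]).decode("utf-8")
            values = next(csv.reader(io.StringIO(line)))
            if len(values) != len(FIELDS):
                raise ValueError("unexpected column count")
            payload = json.dumps(dict(zip(FIELDS, values))) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
            })
        except (ValueError, StopIteration, UnicodeDecodeError):
            # Hand the original bytes back and mark the record as failed,
            # so it is treated as a record that could not be processed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every input record must appear in the response with the same recordId, which is why failed records are returned rather than skipped.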

To get started, you need an AWS account with an existing Amazon MSK cluster and applications streaming data to it. You can create and configure the data delivery stream using the AWS Management Console, AWS CLI, AWS SDKs, AWS CloudFormation, or Terraform. In the console, navigate to the Amazon Kinesis Data Firehose page and choose “Create delivery stream”. Select Amazon MSK as the data source and Amazon S3 as the delivery destination, then configure the required parameters. Once the delivery stream is created, data appears in your S3 bucket in the destination format you chose.
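For readers who prefer the AWS SDK over the console, the sketch below shows roughly what the same setup could look like with the AWS SDK for Python (boto3). The helper names build_msk_to_s3_request and create_stream are hypothetical, every ARN and topic name is a placeholder, and the prefixes, buffering values, and compression format are illustrative choices rather than recommendations. The request is built as a plain dictionary so it can be inspected before anything is sent to AWS.

```python
from typing import Any, Dict


def build_msk_to_s3_request(
    stream_name: str,
    msk_cluster_arn: str,
    topic_name: str,
    source_role_arn: str,
    bucket_arn: str,
    delivery_role_arn: str,
    private_connectivity: bool = True,
) -> Dict[str, Any]:
    """Build keyword arguments for Firehose create_delivery_stream."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "MSKAsSource",
        "MSKSourceConfiguration": {
            "MSKClusterARN": msk_cluster_arn,
            "TopicName": topic_name,
            "AuthenticationConfiguration": {
                "RoleARN": source_role_arn,
                "Connectivity": "PRIVATE" if private_connectivity else "PUBLIC",
            },
        },
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": delivery_role_arn,
            "BucketARN": bucket_arn,
            "Prefix": "kafka/",                 # illustrative prefix
            "ErrorOutputPrefix": "kafka-errors/",  # illustrative prefix
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
            "CompressionFormat": "GZIP",
        },
    }


def create_stream(request: Dict[str, Any]) -> str:
    """Send the request; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so building the request needs no SDK

    firehose = boto3.client("firehose")
    response = firehose.create_delivery_stream(**request)
    return response["DeliveryStreamARN"]


# Usage with placeholder values (replace with your own resources).
request = build_msk_to_s3_request(
    stream_name="example-msk-to-s3",
    msk_cluster_arn="<your-msk-cluster-arn>",
    topic_name="<your-topic>",
    source_role_arn="<role-arn-for-reading-from-msk>",
    bucket_arn="<your-bucket-arn>",
    delivery_role_arn="<role-arn-for-writing-to-s3>",
)
# create_stream(request)  # uncomment once the placeholders are filled in
```

The two IAM roles are kept as separate parameters because reading from the cluster and writing to the bucket are distinct permissions, which matters in the cross-account case.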

This new capability is available in all AWS Regions where Amazon MSK and Kinesis Data Firehose are available. You are billed based on the volume of data going out of Amazon MSK, measured in GB per month. The billing system takes into account the exact record size without any rounding. Detailed pricing information can be found on the pricing page.
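The billing rule can be expressed as simple arithmetic. The Python sketch below is only an illustration: the function name estimate_monthly_charge is hypothetical, the price per GB is a parameter you must take from the pricing page, and treating 1 GB as 1024^3 bytes is an assumption. It sums exact record sizes and applies the rate to the total, with no per-record rounding.

```python
from typing import Iterable

# Assumption: 1 GB = 1024**3 bytes. Check the pricing page for the
# definition that actually applies.
BYTES_PER_GB = 1024 ** 3


def estimate_monthly_charge(record_sizes_bytes: Iterable[int],
                            price_per_gb: float) -> float:
    """Estimate a monthly charge from exact record sizes.

    price_per_gb is a hypothetical input, not a published rate.
    """
    total_bytes = sum(record_sizes_bytes)  # exact sizes, no rounding
    return total_bytes / BYTES_PER_GB * price_per_gb
```

Because sizes are summed before the rate is applied, many small records cost the same as a few large ones of the same total volume.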

We are excited to see the reduction in infrastructure and code that you will experience by adopting this new capability. Start configuring your first data stream between Amazon MSK and Amazon S3 today.



Tags: Amazon, Apache, data, Delivery, Introduces, Kafka, Lake, Managed, MSK
Copyright © 2023 PouroverAI News.