Sunday, June 1, 2025
News PouroverAI
Visit PourOver.AI
No Result
View All Result
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing
News PouroverAI
No Result
View All Result

Building Better ML Systems — Chapter 4. Model Deployment and Beyond | by Olga Chernytska | Sep, 2023

September 28, 2023
in AI Technology
Reading Time: 3 mins read
0 0
A A
0
Share on FacebookShare on Twitter



When deploying a model to production, there are two important questions to ask:

1. Should the model return predictions in real time?
2. Could the model be deployed to the cloud?

The first question forces us to choose between real-time vs. batch inference, and the second one — between cloud vs. edge computing.

Real-Time vs. Batch Inference

Real-time inference is a straightforward and intuitive way to work with a model: you give it an input, and it returns you a prediction. This approach is used when prediction is required immediately. For example, a bank might use real-time inference to verify whether a transaction is fraudulent before finalizing it.

Batch inference, on the other hand, is cheaper to run and easier to implement. Inputs that have been previously collected are processed all at once. Batch inference is used for evaluations (when running on static test datasets), ad-hoc campaigns (such as selecting customers for email marketing campaigns), or in situations where immediate predictions aren’t necessary. Batch inference can also be a cost or speed optimization of real-time inference: you precompute predictions in advance and return them when requested.

Cloud vs. Edge Computing

In cloud computing, data is usually transferred over the internet and processed on a centralized server. On the other hand, in edge computing data is processed on the device where it was generated, with each device handling its own data in a decentralized way. Examples of edge devices are phones, laptops, and cars.

Streaming services like Netflix and YouTube are typically running their recommender systems in the cloud. Their apps and websites send user data to data servers to get recommendations. Cloud computing is relatively easy to set up, and you can scale computing resources almost indefinitely (or at least until it’s economically sensible). However, cloud infrastructure heavily depends on a stable Internet connection, and sensitive user data should not be transferred over the Internet.

Edge computing is developed to overcome cloud limitations and is able to work where cloud computing cannot. The self-driving engine is running on the car, so it can still work fast without a stable internet connection. Smartphone authentication systems (like iPhone’s FaceID) run on smartphones because transferring sensitive user data over the internet is not a good idea, and users do need to unlock their phones without an internet connection. However, for edge computing to be viable, the edge device needs to be sufficiently powerful, or alternatively, the model must be lightweight and fast. This gave rise to the model compression methods, such as low-rank approximation, knowledge distillation, pruning, and quantization.

Easy Deployment & Demo

“Production is a spectrum. For some teams, production means generating nice plots from notebook results to show to the business team. For other teams, production means keeping your models up and running for millions of users per day.” Chip Huyen, Why data scientists shouldn’t need to know Kubernetes

To deploy a model, you need a server (instance) where the model will be running, an API to communicate with the model (send inputs, get predictions), and (optionally) a user interface to accept input from users and show them predictions.

Google Colab is Jupyter Notebook on steroids. It is a great tool to create demos that you can share. It does not require any specific installation from users, it offers free servers with GPU to run the code, and you can easily customize it to accept any inputs from users (text files, images, videos).

FastAPI is a framework for building APIs in Python. You may have heard about Flask, FastAPI is similar, but simpler to code, more specialized towards APIs, and faster.

Streamlit is an easy tool to create web applications. It is easy, I really mean it. And applications turn out to be nice and interactive — with images, plots, input windows, buttons, sliders,… Streamlit offers Community Cloud where you can publish apps for free.

Cloud Platforms. Google and Amazon do a great job making the deployment process painless and accessible. They offer paid end-to-end solutions to train and deploy models (storage, compute instance, API, monitoring tool, workflows,…).

Like all software systems in production, ML systems must be monitored. It helps quickly detect and localize bugs and prevent catastrophic system failures.

Technically, monitoring means collecting logs, calculating metrics from them, displaying these metrics on dashboards like Grafana, and setting up alerts for when metrics fall outside expected ranges.

What metrics should be monitored? Since an ML system is a subclass of a software system, start with operational metrics. Examples are CPU/GPU utilization of the machine, its memory, and disk space; number of requests sent to the application and response latency, error rate; network connectivity.

While operational metrics are about machine, network, and application health, ML-related metrics check model accuracy and input consistency.

Accuracy is the most important thing we care about. This means the model might still return predictions, but those predictions could be entirely off-base, and you won’t realize it until the model is evaluated.

Why could model accuracy drop at all? The most widespread reason is that production data has drifted from training/test data. To automatically detect data drift (it is also called…



Source link

Tags: BuildingChapterChernytskaDeploymentmodelOlgaSepSystems
Previous Post

What is TensorFlow and how does it work?

Next Post

How to Create CSS Ribbon Shapes with a Single Element — SitePoint

Related Posts

How insurance companies can use synthetic data to fight bias
AI Technology

How insurance companies can use synthetic data to fight bias

June 10, 2024
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset
AI Technology

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
How Game Theory Can Make AI More Reliable
AI Technology

How Game Theory Can Make AI More Reliable

June 9, 2024
Decoding Decoder-Only Transformers: Insights from Google DeepMind’s Paper
AI Technology

Decoding Decoder-Only Transformers: Insights from Google DeepMind’s Paper

June 9, 2024
Buffer of Thoughts (BoT): A Novel Thought-Augmented Reasoning AI Approach for Enhancing Accuracy, Efficiency, and Robustness of LLMs
AI Technology

Buffer of Thoughts (BoT): A Novel Thought-Augmented Reasoning AI Approach for Enhancing Accuracy, Efficiency, and Robustness of LLMs

June 9, 2024
Deciphering Doubt: Navigating Uncertainty in LLM Responses
AI Technology

Deciphering Doubt: Navigating Uncertainty in LLM Responses

June 9, 2024
Next Post
How to Create CSS Ribbon Shapes with a Single Element — SitePoint

How to Create CSS Ribbon Shapes with a Single Element — SitePoint

Micron’s mixed guidance weighs on shares but analysts see pullback as a buying opportunity By Investing.com

Micron's mixed guidance weighs on shares but analysts see pullback as a buying opportunity By Investing.com

How to Improve Your Instagram Results: Small Changes With Big Impacts

How to Improve Your Instagram Results: Small Changes With Big Impacts

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

  • Trending
  • Comments
  • Latest
Is C.AI Down? Here Is What To Do Now

Is C.AI Down? Here Is What To Do Now

January 10, 2024
Accenture creates a regulatory document authoring solution using AWS generative AI services

Accenture creates a regulatory document authoring solution using AWS generative AI services

February 6, 2024
23 Plagiarism Facts and Statistics to Analyze Latest Trends

23 Plagiarism Facts and Statistics to Analyze Latest Trends

June 4, 2024
Managing PDFs in Node.js with pdf-lib

Managing PDFs in Node.js with pdf-lib

November 16, 2023
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
Turkish Airlines Marketing Strategy: Beyond “Globally Yours”

Turkish Airlines Marketing Strategy: Beyond “Globally Yours”

May 29, 2024
Can You Guess What Percentage Of Their Wealth The Rich Keep In Cash?

Can You Guess What Percentage Of Their Wealth The Rich Keep In Cash?

June 10, 2024
AI Compared: Which Assistant Is the Best?

AI Compared: Which Assistant Is the Best?

June 10, 2024
How insurance companies can use synthetic data to fight bias

How insurance companies can use synthetic data to fight bias

June 10, 2024
5 SLA metrics you should be monitoring

5 SLA metrics you should be monitoring

June 10, 2024
From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

From Low-Level to High-Level Tasks: Scaling Fine-Tuning with the ANDROIDCONTROL Dataset

June 10, 2024
UGRO Capital: Targeting to hit milestone of Rs 20,000 cr loan book in 8-10 quarters: Shachindra Nath

UGRO Capital: Targeting to hit milestone of Rs 20,000 cr loan book in 8-10 quarters: Shachindra Nath

June 10, 2024
Facebook Twitter LinkedIn Pinterest RSS
News PouroverAI

The latest news and updates about the AI Technology and Latest Tech Updates around the world... PouroverAI keeps you in the loop.

CATEGORIES

  • AI Technology
  • Automation
  • Blockchain
  • Business
  • Cloud & Programming
  • Data Science & ML
  • Digital Marketing
  • Front-Tech
  • Uncategorized

SITEMAP

  • Disclaimer
  • Privacy Policy
  • DMCA
  • Cookie Privacy Policy
  • Terms and Conditions
  • Contact us

Copyright © 2023 PouroverAI News.
PouroverAI News

No Result
View All Result
  • Home
  • AI Tech
  • Business
  • Blockchain
  • Data Science & ML
  • Cloud & Programming
  • Automation
  • Front-Tech
  • Marketing

Copyright © 2023 PouroverAI News.
PouroverAI News

Welcome Back!

Login to your account below

Forgotten Password? Sign Up

Create New Account!

Fill the forms bellow to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In