How to Deploy Apache Airflow on Vultr Using Anaconda — SitePoint

May 14, 2024
In this article, we're going to deploy an Airflow application in a Conda environment, secure the application with Nginx, and request an SSL certificate from Let's Encrypt. Airflow is a popular tool for defining, scheduling, and monitoring complex workflows. We can create Directed Acyclic Graphs (DAGs) to automate tasks across our work platforms, and as an open-source project, Airflow has a community that provides support and continuous improvement.

This is a sponsored article by Vultr. Vultr is the world's largest privately-held cloud computing platform. A favorite with developers, Vultr has served over 1.5 million customers across 185 countries with flexible, scalable, global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage solutions. Learn more about Vultr.

Deploying a Server on Vultr
Let's start by deploying a Vultr server with the Anaconda marketplace application.

1. Sign up and log in to the Vultr Customer Portal.
2. Navigate to the Products page.
3. Select Compute from the side menu.
4. Click Deploy Server.
5. Select Cloud Compute as the server type.
6. Choose a Location.
7. Select Anaconda among the marketplace applications.
8. Choose a Plan.
9. Select any further features you need in the "Additional Features" section.
10. Click the Deploy Now button.

Creating a Vultr Managed Database
After deploying a Vultr server, we'll next deploy a Vultr-managed PostgreSQL database. We'll also create two new databases in our database instance that will be used to connect with our Airflow application later in this guide.

1. Open the Vultr Customer Portal.
2. Click the Products menu group and navigate to Databases to create a PostgreSQL managed database.
3. Click Add Managed Databases.
4. Select PostgreSQL with the latest version as the database engine.
5. Select the Server Configuration and Server Location.
6. Write a Label for the service.
7. Click Deploy Now.
8. After the database is deployed, select Users & Databases.
9. Click Add New Database.
10. Enter airflow-pgsql as the name and click Add Database.
11. Repeat steps 9 and 10 to add another database in the same managed database and name it airflow-celery.

Getting Started with Conda and Airflow
Now that we’ve created a Vultr-managed PostgreSQL instance, we’ll use the Vultr server to create a Conda environment and install the required dependencies.

Check for the Conda version:
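Conda ships with the Anaconda marketplace application, so the standard version check applies:

```bash
$ conda --version
```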
Create a Conda environment:
```bash
$ conda create -n airflow python=3.8
```
Activate the environment:
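The standard activation command for the environment we just created is shown below; the shell prompt changes to (airflow) afterwards:

```bash
$ conda activate airflow
```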
Install Redis server:
```bash
(airflow) $ sudo apt install -y redis-server
```
Enable the Redis server:
```bash
(airflow) $ sudo systemctl enable redis-server
```
Check the status:
```bash
(airflow) $ sudo systemctl status redis-server
```
Install the Python package manager:
```bash
(airflow) $ conda install pip
```
Install the required dependencies:
```bash
(airflow) $ pip install psycopg2-binary virtualenv redis
```
Install Airflow in the Conda environment:
```bash
(airflow) $ pip install "apache-airflow[celery]==2.8.1" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.8.txt"
```

Connecting Airflow with Vultr Managed Database
Now that the environment is prepared, let's connect our Airflow application to the two databases we created earlier in our database instance, and make the configuration changes needed to make the application production-ready.

Set environment variable for database connection:
```bash
(airflow) $ export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql://user:password@hostname:port/db_name"
```
Make sure to replace user, password, hostname, and port with the actual values from the connection details section of the airflow-pgsql database, and replace db_name with airflow-pgsql.
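For example, with entirely hypothetical placeholder values (none of these are real credentials or hosts), the final command might look like this:

```bash
(airflow) $ export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql://vultradmin:example-password@example-db.vultrdb.com:16751/airflow-pgsql"
```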

Initialize the metadata database. Airflow needs a metadata database in which to create the tables and schema that store information about our DAGs and workflows:

```bash
(airflow) $ airflow db init
```
Open the Airflow configuration file:
```bash
(airflow) $ sudo nano ~/airflow/airflow.cfg
```
Scroll down and change the executor:

```ini
executor = CeleryExecutor
```

Link the Vultr-managed PostgreSQL database by changing the value of sql_alchemy_conn (values in airflow.cfg are unquoted):

```ini
sql_alchemy_conn = postgresql://user:password@hostname:port/db_name
```

Make sure to replace user, password, hostname, and port with the actual values from the connection details section of the airflow-pgsql database, and replace db_name with airflow-pgsql.

Scroll down and change the worker and trigger log ports:

```ini
worker_log_server_port = 8794
trigger_log_server_port = 8795
```

Change the broker_url:

```ini
broker_url = redis://localhost:6379/0
```
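To verify that Redis is actually listening at this address, we can ping it (redis-cli is installed alongside redis-server on Ubuntu/Debian):

```bash
(airflow) $ redis-cli ping
PONG
```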

Uncomment the result_backend line by removing the leading #, and change its value:

```ini
result_backend = db+postgresql://user:password@hostname:port/db_name
```

Make sure to replace user, password, hostname, and port with the actual values from the connection details section of the airflow-celery database, and replace db_name with airflow-celery.

Save and exit the file.

Create an Airflow user:
```bash
(airflow) $ airflow users create \
    --username admin \
    --firstname Peter \
    --lastname Parker \
    --role Admin \
    --email spiderman@superhero.org
```
Make sure to replace all of the variable values with your actual values. Enter a password when prompted; it will be used to access the dashboard.
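To confirm that the account was created, we can list the existing users with the standard Airflow CLI:

```bash
(airflow) $ airflow users list
```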

Daemonizing the Airflow Application
Now let's daemonize our Airflow application so that it runs in the background and keeps running even after we close the terminal and log out. These steps create persistent systemd services for the Airflow webserver, scheduler, and Celery workers.

View the airflow path:
```bash
(airflow) $ which airflow
```
Copy the path to your clipboard.
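With a Conda environment, the path typically points inside the environment's bin directory, for example (a hypothetical path; yours will differ):

```bash
/home/example_user/anaconda3/envs/airflow/bin/airflow
```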

Create an Airflow webserver service file:
```bash
(airflow) $ sudo nano /etc/systemd/system/airflow-webserver.service
```
Paste the following service configuration into the file. The airflow webserver command provides the web-based user interface that allows us to interact with and manage our workflows. This configuration creates a background service for the Airflow webserver:
```ini
[Unit]
Description="Airflow Webserver"
After=network.target

[Service]
User=example_user
Group=example_user
ExecStart=/home/example_user/.local/bin/airflow webserver

[Install]
WantedBy=multi-user.target
```
Make sure to replace User and Group with your actual non-root sudo user account details, and replace the ExecStart path with the actual Airflow binary path we copied to the clipboard earlier. Save and close the file.
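If systemd doesn't pick up a newly created unit file, reload the systemd manager configuration first (the same applies to the service files created below):

```bash
(airflow) $ sudo systemctl daemon-reload
```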

Enable the airflow-webserver service so that the webserver starts automatically during the system boot process:

```bash
(airflow) $ sudo systemctl enable airflow-webserver
```
Start the service:
```bash
(airflow) $ sudo systemctl start airflow-webserver
```
Make sure that the service is up and running:
```bash
(airflow) $ sudo systemctl status airflow-webserver
```
The output should report the service as active (running).
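We can also confirm that the webserver answers locally; assuming Airflow's default webserver port of 8080:

```bash
(airflow) $ curl -I http://127.0.0.1:8080
```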

Create an Airflow Celery service file:
```bash
(airflow) $ sudo nano /etc/systemd/system/airflow-celery.service
```
Paste the following service configuration into the file. The airflow celery worker command starts a Celery worker. Celery is a distributed task queue that lets us distribute and execute tasks across multiple workers; the workers connect to our Redis server to receive and execute tasks:
```ini
[Unit]
Description="Airflow Celery"
After=network.target

[Service]
User=example_user
Group=example_user
ExecStart=/home/example_user/.local/bin/airflow celery worker

[Install]
WantedBy=multi-user.target
```
Make sure to replace User and Group with your actual non-root sudo user account details, and replace the ExecStart path with the actual Airflow binary path we copied to the clipboard earlier. Save and close the file.

Enable the airflow-celery service:
```bash
(airflow) $ sudo systemctl enable airflow-celery
```
Start the service:
```bash
(airflow) $ sudo systemctl start airflow-celery
```
Make sure that the service is up and running:
```bash
(airflow) $ sudo systemctl status airflow-celery
```
Create an Airflow scheduler service file:
```bash
(airflow) $ sudo nano /etc/systemd/system/airflow-scheduler.service
```
Paste the following service configuration into the file. The airflow scheduler command is responsible for scheduling and triggering the DAGs and the tasks defined in them, and it periodically checks the status of DAGs and tasks:
```ini
[Unit]
Description="Airflow Scheduler"
After=network.target

[Service]
User=example_user
Group=example_user
ExecStart=/home/example_user/.local/bin/airflow scheduler

[Install]
WantedBy=multi-user.target
```
Make sure to replace User and Group with your actual non-root sudo user account details, and replace the ExecStart path with the actual Airflow binary path we copied to the clipboard earlier. Save and close the file.

Enable the airflow-scheduler service:
```bash
(airflow) $ sudo systemctl enable airflow-scheduler
```
Start the service:
```bash
(airflow) $ sudo systemctl start airflow-scheduler
```
Make sure that the service is up and running:
```bash
(airflow) $ sudo systemctl status airflow-scheduler
```
The output should report the service as active (running).

Setting up Nginx as a Reverse Proxy
We’ve created persistent services for the Airflow application, so now we’ll set up Nginx as a reverse proxy to enhance our application’s security and scalability following the steps outlined below.

1. Log in to the Vultr Customer Portal.
2. Navigate to the Products page.
3. From the side menu, expand the Network drop-down and select DNS.
4. Click the Add Domain button in the center.
5. Follow the setup procedure to add your domain name by selecting the IP address of your server.
6. Set the following hostnames as your domain's primary and secondary nameservers with your domain registrar:
   - ns1.vultr.com
   - ns2.vultr.com

Install Nginx:
```bash
(airflow) $ sudo apt install nginx
```
Check that the Nginx server is up and running:

```bash
(airflow) $ sudo systemctl status nginx
```
Create a new Nginx virtual host configuration file in the sites-available directory:
```bash
(airflow) $ sudo nano /etc/nginx/sites-available/airflow.conf
```
Add the configurations to the file. These configurations will direct traffic for our application from the actual domain to the backend…
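As a minimal sketch of such a reverse-proxy configuration, assuming Airflow's default webserver port 8080 and the placeholder domain airflow.example.com (replace it with your own domain), the server block might look like this:

```nginx
server {
    listen 80;
    server_name airflow.example.com;

    location / {
        # Forward requests to the Airflow webserver
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

After saving the file, it would typically be linked into sites-enabled, the configuration tested, and Nginx reloaded:

```bash
(airflow) $ sudo ln -s /etc/nginx/sites-available/airflow.conf /etc/nginx/sites-enabled/
(airflow) $ sudo nginx -t
(airflow) $ sudo systemctl reload nginx
```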


