
Whilst working on various projects for customers, we share insights, lessons learned, and practical hands-on guidance. These blogs are provided “as is”, and you should use good judgement when following them.

Working with the RedshiftToS3Transfer operator and Amazon Managed Workflows for Apache Airflow

Introduction

Inspired by a recent conversation in the Apache Airflow open source Slack community, I decided to channel my inner terrier and tackle this particular issue: getting an Apache Airflow operator (the protagonist of this post) to work.

I found the perfect catalyst in the form of the original launch post of Amazon Managed Workflows for Apache Airflow (MWAA). As is often the way, diving into that post (creating a workflow to take some source files, transform them and then move them into Amazon Redshift) led me down some unexpected paths and, eventually, to this post.

Read more →

Using AWS CDK to deploy your Amazon Managed Workflows for Apache Airflow environment

Update: I am grateful to Michael Grabenstein for spotting some mistakes in the original post and code. I hope these have now been rectified in this post.

What better way to celebrate CDK Day than to return to a previous blog, where I wrote about automating the installation and configuration of Amazon Managed Workflows for Apache Airflow (MWAA), and take a look at doing the same thing, this time using AWS CDK?

Read more →

Automating your ELT Workflows with Managed Workflows for Apache Airflow - Part One

Update: I have changed the post to use standard Apache Airflow variables rather than AWS Secrets Manager.

Part One - Automating Amazon Athena

As part of an upcoming DevDay event, I have been working on how you can use Apache Airflow to help automate your Extract, Load and Transform (ELT) Workflows. Amazon Athena and Amazon EMR are two AWS services that help customers who have existing SQL skills/expertise and are looking at tools such as Presto or Apache Hive when undertaking those transformations.

Read more →

Automating your ELT Workflows with Managed Workflows for Apache Airflow - Part Two

Part Two - Automating Amazon EMR

In Part One, we automated an example ELT workflow on Amazon Athena using Apache Airflow. In this post, Part Two, we will automate the same example ELT workflow, this time using Amazon EMR.

Make sure you recap the setup from Part One. All the code you need to reproduce this yourself can be found in the GitHub repository here.

Automating Amazon EMR

To recap: we are using the MovieLens dataset, which we have loaded into our data lake on Amazon S3, and we have been asked to a) create a new table with a subset of the information we care about, in this instance a particular genre of films, and b) create a new file with the same subset of information available in the data lake.

Read more →

Monitoring and logging with Amazon Managed Workflows for Apache Airflow

Part of a series of posts to support an upcoming online event, Innovate AI/ML, on February 24th from 9:00am GMT. You can sign up here.

In this post I will be covering Part 6, where to find logs to help you understand and troubleshoot your Apache Airflow workflows, and how you can monitor your Apache Airflow environments. Specifically I will cover a couple of things:

Read more →

A simple CI/CD system for your Amazon Managed Workflows for Apache Airflow development workflow

Updated Feb 19th

Part of a series of posts to support an upcoming online event, Innovate AI/ML, on February 24th from 9:00am GMT. You can sign up here.

In this post I will be covering Part 5, how you can set up a very simple CI/CD pipeline to enable faster development of your Apache Airflow DAGs. Specifically I will cover a couple of things:

Read more →

Interacting with Amazon Managed Workflows for Apache Airflow via the command line

Part of a series of posts to support an upcoming online event, Innovate AI/ML, on February 24th from 9:00am GMT. You can sign up here.

In this post I will be covering Part 4, how you can access and interact with Apache Airflow via the command line. Specifically I will cover a couple of things:

Read more →

Accessing your Amazon Managed Workflows for Apache Airflow environments

Part of a series of posts to support an upcoming online event, Innovate AI/ML, on February 24th from 9:00am GMT. You can sign up here.

In this post I will be covering Part 3, how you can access and interact with your Apache Airflow environments. Specifically I will cover a couple of things:

Read more →

Working with permissions in Amazon Managed Workflows for Apache Airflow

Part of a series of posts to support an upcoming online event, Innovate AI/ML, on February 24th from 9:00am GMT. You can sign up here.

In this post I will be covering Part 2, how to ensure that you control access to Apache Airflow following best practices such as default no access/least privilege.

Read more →

Automating the installation and configuration of Amazon Managed Workflows for Apache Airflow

Updated, August 25th: Thanks to Philip T for spotting a typo in the CloudFormation code below. It was correct in the GitHub repo, and I have now fixed it below as well.

Part of a series of posts to support an upcoming online event, Innovate AI/ML, on February 24th from 9:00am GMT. You can sign up here.

In this post I will be covering Part 1, automating the installation and configuration of Managed Workflows for Apache Airflow (MWAA).

Read more →