
Whilst working on various projects for customers, we share insights, lessons learned, and practical hands-on guidance. These blogs are provided “as is”, and you should use good judgement when following them.

Contributing to the Apache Airflow project - Part Two

This is the second and concluding post in this series, covering my experience and journey contributing to the Apache Airflow project. You can catch Part One here.

Contributing to Apache Airflow - Part Deux

In Part One of this series, we took our first steps in contributing to the Apache Airflow project. With a little more knowledge and experience, and our first interactions with the Airflow community behind us, we are ready to start exploring how the code works and see how we might go about fixing the issue.

Read more →

Orchestrating hybrid workflows using Amazon Managed Workflows for Apache Airflow (MWAA)

Using Apache Airflow to orchestrate hybrid workflows

In some recent discussions with customers, a topic that came up was how open source is increasingly being used as a common mechanism to build re-usable solutions that protect investments in engineering and development time and skills, and that work across on-premises and Cloud environments. In 2021 my most viewed blog post talked about how you can build and deploy containerised applications anywhere (Cloud, your data centre, other Clouds) and on anything (Intel and Arm). I wanted to combine the learnings from that post (and the code) and apply them to another topic I have been diving deeper into, Apache Airflow, to see how you can start to build data pipelines that work seamlessly across hybrid architectures.

Read more →

Contributing to the Apache Airflow project - Part One

Contributing to Apache Airflow

Introduction

In this series of posts, I am going to share what I learn as I embark on my first upstream contribution to the Apache Airflow project. The purpose is to show you how typical open source projects like Apache Airflow work, how you engage with the community to orchestrate change, and hopefully to inspire more people to contribute to this open source project. I will post regular updates as a series of posts as the journey unfolds.

Read more →

Running my dev.to blog using Hugo on Netlify

Running my dev.to blog using Hugo on Netlify

I am a big fan of dev.to, and the work that the team do to foster a great community of builders is something that keeps me there. I have always maintained another blog (running on Netlify, which is also super awesome), kind of like a mirror. Up until last year, I was able to publish to dev.to and it would take care of publishing to that mirror. It was super easy - I write in markdown locally using Macdown, so I would create my posts, publish to dev.to and it would automagically sync up to my mirror.

Read more →

Setting up MWAA to use a KMS key

Introduction

In a previous post, I shared how you can use the AWS CDK to provision your Apache Airflow environments using the Managed Workflows for Apache Airflow service (MWAA).

I was contacted this week by Michael Grabenstein, who flagged an issue with the code in that post. The post used code that configured a KMS key for the MWAA environment, but when trying to deploy the app it would fail with the following error:

Read more →

Integrating Amazon Timestream in your Amazon Managed Workflows for Apache Airflow v2.x

Integrating with Amazon Timestream in your Apache Airflow DAGs

Amazon Timestream is a fast, scalable, and serverless time series database service, perfect for use cases that generate huge numbers of events per day, and optimised to be faster and more cost effective than using relational databases.

I have been playing around with Amazon Timestream to prepare for a talk I am doing with some colleagues, and wanted to see how I could integrate it with other AWS services in the context of leveraging some of the key capabilities of Amazon Timestream. For example, you might have a use case where you want to benefit from some of the powerful capabilities of the Timestream query engine to create/export data that you want to store within a data lake. Maybe you need just a subset of the data within a data warehouse such as Amazon Redshift, or perhaps you need to make the data available within Timestream to other systems and applications.
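As a rough sketch of that idea, the snippet below builds a Timestream SQL query that selects a subset of recent data (one measure over a time window), which you could then hand to the Timestream query API from a workflow task. The database, table, and measure names are placeholders of my own, not details from the post, and the boto3 call assumes AWS credentials are available at run time.

```python
def build_subset_query(database, table, measure, hours=24):
    """Build a Timestream SQL query selecting one measure over a recent window."""
    return (
        f'SELECT time, measure_value::double AS {measure} '
        f'FROM "{database}"."{table}" '
        f"WHERE measure_name = '{measure}' "
        f"AND time > ago({hours}h) "
        f"ORDER BY time DESC"
    )

def run_query(query):
    """Run the query against Timestream, paginating through the result pages."""
    import boto3  # needs AWS credentials and Timestream access at run time
    client = boto3.client("timestream-query")
    paginator = client.get_paginator("query")
    rows = []
    for page in paginator.paginate(QueryString=query):
        rows.extend(page["Rows"])
    return rows

# Example: select the last 24 hours of temperature readings
query = build_subset_query("sensors", "readings", "temperature")
```

From there, a workflow task could write the returned rows out to S3 as part of a data lake, or load them into Amazon Redshift.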

Read more →

Reading and writing data across different AWS accounts with Amazon Managed Workflows for Apache Airflow v2.x

Reading and writing data across different AWS accounts in your Apache Airflow DAGs

As regular readers will know, I sometimes lurk in the Apache Airflow slack channel to see what is going on. If you are new to Apache Airflow, or want to get a deeper understanding, then I highly recommend spending some time here. The community is super welcoming and eager to help new participants.

It was during a recent session I came across an interesting problem that one of the builders was having, which was how to access (read/write) data in an S3 bucket which was in a different account to the one hosting Amazon Managed Workflows for Apache Airflow (MWAA).
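One common pattern for this kind of cross-account access (a general approach, not necessarily the one the full post lands on) is for the MWAA execution role to assume a role in the bucket-owning account and use the temporary credentials it gets back. A minimal sketch, with a placeholder role ARN and bucket name of my own:

```python
def client_kwargs_from_creds(creds):
    """Map STS AssumeRole credentials onto boto3 client keyword arguments."""
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

def s3_client_for_account(role_arn, session_name="mwaa-cross-account"):
    """Assume a role in the bucket-owning account and return an S3 client using it."""
    import boto3  # needs AWS credentials at run time
    sts = boto3.client("sts")
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    return boto3.client("s3", **client_kwargs_from_creds(response["Credentials"]))

# Inside a DAG task you could then read and write the other account's bucket:
# s3 = s3_client_for_account("arn:aws:iam::111122223333:role/airflow-s3-access")
# s3.get_object(Bucket="other-account-bucket", Key="data/input.csv")
```

For this to work, the role in the other account also needs a trust policy allowing the MWAA execution role to assume it, plus permissions on the bucket itself.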

Read more →

Working with parameters and variables in Amazon Managed Workflows for Apache Airflow

Maximising the re-use of your DAGs in MWAA

During some recent conversations with customers, one of the topics they were interested in was how to create re-usable, parameterised Apache Airflow workflows (DAGs) that can be executed dynamically through the use of variables and/or parameters (either submitted via the UI or the command line). This makes a lot of sense, as you may find that you repeat similar tasks in your workflows, and this approach allows you to maximise the re-use of that work.
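The core idea can be sketched in plain Python (this is an illustration of the merging behaviour, not the Airflow API itself): a reusable workflow defines sensible defaults, and a trigger-time payload overrides them, much as a JSON conf supplied when triggering a DAG combines with DAG-level parameters. The parameter names below are invented for the example.

```python
# DAG-level defaults the reusable workflow would fall back on
DEFAULT_PARAMS = {
    "source_bucket": "raw-data",
    "target_bucket": "processed-data",
    "file_format": "csv",
}

def resolve_params(defaults, run_conf=None):
    """Merge a trigger-time conf payload over the workflow's defaults."""
    merged = dict(defaults)
    merged.update(run_conf or {})
    return merged

# Triggering with a conf of {"file_format": "parquet"} overrides one default:
resolve_params(DEFAULT_PARAMS, {"file_format": "parquet"})
# {'source_bucket': 'raw-data', 'target_bucket': 'processed-data', 'file_format': 'parquet'}
```

The same DAG can then serve many similar pipelines, with each run supplying only the values that differ.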

Read more →

Creating a multi architecture CI/CD solution with Amazon ECS and ECS Anywhere

Please let me know how I can improve posts such as this one, by completing this very short survey. $25 AWS credits will be provided for the first 20 completed - take the survey

Organisations are moving their workloads to the cloud as quickly as they can. While most applications can be easily migrated to the cloud, some applications need to remain on-premises due to low-latency or data sovereignty requirements.

Regardless of where workloads may reside, organisations want to develop once and be able to deploy workloads to the cloud or on-premises in an agile and consistent fashion, using a common set of APIs to manage and operate them.

Read more →

Working with Amazon EKS and Amazon Managed Workflows for Apache Airflow v2.x

Introduction

The Apache Airflow slack channel is a vibrant community of open source builders that is a great source of feedback, knowledge and answers to problems and use cases you might have when trying to do stuff with Apache Airflow. This week I picked up on someone seeing errors with Amazon EKS, and so I thought what better time to try out the new Apache Airflow 2.x version that was recently launched in Amazon Managed Workflows for Apache Airflow (MWAA).

Read more →