
Whilst working on various projects for customers, we share insights, lessons learned, and practical hands-on guidance. These blogs are provided “as is”, and you should use good judgement when following them.

Getting mwaa-local-runner up on AWS Cloud9

Here is a quick recipe if you are looking to get mwaa-local-runner up and running in your Cloud9 developer setup. This might not be the most optimised approach, so I am very happy to receive suggestions on how to improve it. What I cover here is how to deploy mwaa-local-runner onto a standard Cloud9 IDE deployed in a default VPC.

Updating my AWS Cloud9 environment

The first thing I needed to do was increase the size of my local disk, as Cloud9 provides only 10 GB of storage by default. This is fine for typical use cases, but we are going to be building container images, so we need more space.
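If you prefer to script the disk increase rather than click through the console, the resize itself is a single EC2 ModifyVolume call. This is a minimal sketch, assuming you have already looked up the ID of the EBS volume attached to your Cloud9 instance and pass in a boto3 EC2 client (for example `boto3.client("ec2")`). The function names, volume ID and sizes are hypothetical placeholders, not part of the original recipe.

```python
# Sketch: grow the EBS volume behind a Cloud9 instance.
# The EC2 client is injected (e.g. boto3.client("ec2")) so the logic can be
# tried without AWS credentials; function names here are hypothetical.


def build_modify_volume_request(volume_id: str, current_gib: int, target_gib: int) -> dict:
    """Return keyword arguments for EC2 ModifyVolume, refusing to shrink."""
    # EBS volumes can only be grown, so reject shrink or no-op requests early
    if target_gib <= current_gib:
        raise ValueError(
            f"target size {target_gib} GiB must be larger than current {current_gib} GiB"
        )
    return {"VolumeId": volume_id, "Size": target_gib}


def resize_volume(ec2_client, volume_id: str, current_gib: int, target_gib: int) -> dict:
    """Ask EC2 to resize the volume and return the raw API response."""
    request = build_modify_volume_request(volume_id, current_gib, target_gib)
    # ModifyVolume is asynchronous; the partition and filesystem still need
    # to be grown on the instance afterwards (e.g. growpart, then resize2fs
    # or xfs_growfs depending on the filesystem type)
    return ec2_client.modify_volume(**request)
```

For example, `resize_volume(boto3.client("ec2"), "vol-0123456789abcdef0", 10, 50)` would request a 50 GiB volume. Keeping the request builder separate from the API call makes the size check easy to try out before touching a real volume.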

Read more →

Working with Managed Workflows for Apache Airflow (MWAA) and Amazon Redshift


I was recently looking at some Stack Overflow questions from the AWS Collective and saw that a number of folk had questions about the integration between Amazon Redshift and Managed Workflows for Apache Airflow (MWAA). I thought I would put together a quick post that might help folk address what seemed to be some of the common challenges.

There is some code that accompanies this post, which you can find in the GitHub repository cdk-mwaa-redshift.
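One place where wiring Airflow to Redshift can go wrong is the connection definition itself. This is a minimal sketch, assuming you use the Amazon provider's `redshift` connection type and define the connection as a URI (for example in an `AIRFLOW_CONN_REDSHIFT_DEFAULT` environment variable or a Secrets Manager secret). The function name, host and credentials are hypothetical placeholders, and this is not necessarily the approach taken in cdk-mwaa-redshift.

```python
from urllib.parse import quote


def redshift_conn_uri(login: str, password: str, host: str,
                      port: int = 5439, database: str = "dev") -> str:
    """Build an Airflow connection URI for a Redshift cluster (hypothetical helper)."""
    # Airflow parses the URI, so characters such as '@', ':' or '/' in the
    # credentials must be percent-encoded or the connection is mis-read
    user = quote(login, safe="")
    secret = quote(password, safe="")
    db = quote(database, safe="")
    return f"redshift://{user}:{secret}@{host}:{port}/{db}"
```

Port 5439 and the `dev` database are the Redshift defaults; override them if your cluster differs.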

Read more →

Self managed Apache Airflow with Data on EKS

I have written in the past about how you can get started with Apache Airflow using the AWS managed service, Managed Workflows for Apache Airflow. But what if you want to self-manage Apache Airflow? When I speak with developers, there are sometimes reasons why a managed service might not fit their needs. Some of the common ones include:

  • needing an increased level of access, or greater control over the configuration of Apache Airflow
  • needing the very latest versions or features of Apache Airflow
  • needing to run workflows that use more resources than the managed service provides (for example, significant compute)

Total Cost of Ownership

One thing to consider when assessing managed vs self-managed is the cost of the managed service against the total cost of doing the same thing yourself. It is important to make a true like-for-like comparison; we often see just the compute and storage resources being compared, without all the additional things you need to make the service available.
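To make the like-for-like point concrete, here is a small sketch that contrasts an "infrastructure only" figure with a fuller monthly total. The cost categories and all numbers are hypothetical illustrations, not real prices or a complete TCO model.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    """Hypothetical monthly cost inputs for a self-managed Airflow deployment."""
    compute: float
    storage: float
    # The items below tend to be left out of comparisons
    networking: float = 0.0         # e.g. load balancers, NAT, data transfer
    observability: float = 0.0      # logging, metrics, alerting
    engineering_hours: float = 0.0  # patching, upgrades, on-call
    hourly_rate: float = 0.0

    def infrastructure_only(self) -> float:
        """The figure that is often (misleadingly) compared on its own."""
        return self.compute + self.storage

    def total(self) -> float:
        """Infrastructure plus the supporting costs and people time."""
        people = self.engineering_hours * self.hourly_rate
        return self.infrastructure_only() + self.networking + self.observability + people
```

With made-up inputs of 400 compute, 50 storage, 30 networking, 20 observability and 10 engineering hours at 80 per hour, `infrastructure_only()` gives 450 while `total()` gives 1300. It is the latter that should be set against the managed service price.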

Read more →

VSCode and Apache Airflow


In this short post, I wanted to highlight how you can use a VSCode plugin to work with a locally running instance of Apache Airflow to improve the developer experience. This post was inspired by a tweet from Kaxil Naik, who was asking what features developers are looking for when using VSCode or PyCharm with Apache Airflow.

In this post I will show you how to configure mwaa-local-runner, an open source project that provides an easy way to get a local Apache Airflow environment up and running (with a configuration aligned to the Amazon Managed Workflows for Apache Airflow service, MWAA), together with some VSCode plugins.
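Editor plugins of this kind generally talk to Airflow over its stable REST API, so a quick way to check that your local environment is reachable before configuring VSCode is to call that API yourself. This is a minimal sketch, assuming mwaa-local-runner is listening on `http://localhost:8080`, that the API accepts basic auth, and that the default `admin`/`test` credentials are in use; adjust these to match your setup. The function names are hypothetical.

```python
import base64
import json
import urllib.request


def list_dags_request(base_url: str = "http://localhost:8080",
                      username: str = "admin", password: str = "test") -> urllib.request.Request:
    """Build a GET request for the Airflow 2 stable REST API DAG listing."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/dags",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def list_dag_ids(**kwargs) -> list:
    """Return the dag_id of each DAG on the first page of results."""
    # Only the first page is read; larger environments would need to follow
    # the API's limit/offset pagination
    with urllib.request.urlopen(list_dags_request(**kwargs), timeout=10) as resp:
        payload = json.load(resp)
    return [dag["dag_id"] for dag in payload.get("dags", [])]
```

If `list_dag_ids()` returns your DAGs, the same URL and credentials are what you would give the VSCode plugin.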

Read more →

sbomqs, an open source tool to quality check your SBOMs

When putting together a previous post on how to use open source tools to create a software bill of materials (SBOM), Ritesh Noronha alerted me to another project, sbomqs, which aims to simplify the evaluation of SBOM quality for both producers and consumers. A quality SBOM is one that is accurate, complete, and up to date. It should accurately reflect the components and dependencies used in the software application, including their versions and, optionally, any known vulnerabilities. In addition, it should be easily accessible and understandable by stakeholders such as developers, security teams, and compliance officers. I suspect these are some of the heuristics sbomqs uses.
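To get a feel for what a completeness check involves, here is a deliberately simple sketch that scores a CycloneDX JSON SBOM by the fraction of components carrying a name, a version and a package URL (purl). This is my own illustrative heuristic, not the scoring sbomqs actually performs, and the function name is hypothetical.

```python
import json


def completeness_score(cyclonedx_json: str) -> float:
    """Fraction of components that have a name, version and purl.

    A toy heuristic for illustration only; sbomqs applies its own,
    much richer set of checks.
    """
    components = json.loads(cyclonedx_json).get("components", [])
    if not components:
        return 0.0  # an SBOM with no components tells a consumer nothing
    complete = sum(
        1 for c in components
        if c.get("name") and c.get("version") and c.get("purl")
    )
    return complete / len(components)
```

A component missing its version is a good example of why an SBOM can exist yet still be of limited use for vulnerability matching.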

Read more →

Building a software bill of materials (SBOM) using open source tools

This is the second post exploring how you can use open source tools to help you build a stronger defence against common software supply chain attacks. In this blog post, I look at syft, an open source CLI tool and Go library for generating a Software Bill of Materials (SBOM) from container images and filesystems. We will work through examples that build on the previous post, Getting hands on with Sigstore Cosign on AWS.
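As a small illustration of driving syft from a script, the sketch below shells out to the syft CLI to produce a CycloneDX JSON SBOM for an image and then tallies the components by type. It assumes syft is installed and on your `PATH`; the helper names and the choice of the `cyclonedx-json` output format are my own assumptions rather than the exact steps from the post.

```python
import json
import subprocess
from collections import Counter


def syft_command(image: str, output_format: str = "cyclonedx-json") -> list:
    """Build the syft CLI invocation for an image reference."""
    return ["syft", image, "-o", output_format]


def generate_sbom(image: str) -> str:
    """Run syft and return the SBOM document it writes to stdout."""
    result = subprocess.run(syft_command(image), capture_output=True, text=True, check=True)
    return result.stdout


def package_types(cyclonedx_json: str) -> Counter:
    """Count components by their CycloneDX 'type' (library, operating-system, ...)."""
    components = json.loads(cyclonedx_json).get("components", [])
    return Counter(c.get("type", "unknown") for c in components)
```

For example, `package_types(generate_sbom("alpine:3.17"))` gives a quick summary of what syft found in that image.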

Read more →

Getting hands on with Sigstore Cosign on AWS


I am currently putting together some content on how you can use a number of open source tools to help build a stronger defence against common software supply chain attacks. In this blog post, I look at emerging tools from Sigstore, focusing on Cosign, a tool that supports container image signing, verification, and storage in an Open Container Initiative (OCI) registry. Cosign aims to make signatures frictionless. I will look at other tools in future posts.
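The basic key-based Cosign flow is generate a key pair, sign an image, then verify it with the public key. This is a minimal sketch that assembles those commands for an image in Amazon ECR, assuming Cosign is installed, you are already logged in to the registry, and you use the default `cosign.key`/`cosign.pub` file names. The helper names, account ID, repository and digest are hypothetical.

```python
import subprocess


def ecr_image_ref(account_id: str, region: str, repository: str, digest: str) -> str:
    """Reference an ECR image by digest, so the signature is tied to exact content."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}@{digest}"


def cosign_keygen_cmd() -> list:
    # Writes cosign.key (private, password protected) and cosign.pub
    return ["cosign", "generate-key-pair"]


def cosign_sign_cmd(image_ref: str, key_path: str = "cosign.key") -> list:
    return ["cosign", "sign", "--key", key_path, image_ref]


def cosign_verify_cmd(image_ref: str, pub_path: str = "cosign.pub") -> list:
    return ["cosign", "verify", "--key", pub_path, image_ref]


def run(cmd: list) -> str:
    """Execute a command, raising if it fails, and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Signing by digest rather than by tag is the safer choice, as a tag can later be moved to a different image while the signature would still appear to apply.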

Read more →

Configuring the KubernetesPodOperator on Managed Workflows for Apache Airflow (MWAA) - non OIDC Amazon EKS Clusters


Today I came across an interesting question about using the KubernetesPodOperator with EKS clusters where OIDC has not been configured. The person asking had followed my blog post, and when they came to run the DAG, they got the following error:

[2023-01-26, 13:03:18 UTC] {{kubernetes_pod.py:566}} INFO - Creating pod mwaa-pod-test.0ab20a7075b84175b2a9a3fe32796f53 with labels: {'dag_id': 'kubernetes_pod_example_iam_authenticator', 'task_id': 'pod-task', 'execution_date': '2023-01-26T130310.1069420000-c39a2d8b8', 'try_number': '1'}
[2023-01-26, 13:03:19 UTC] {{kubernetes_pod.py:612}} ERROR - (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '47ae7378-7037-4bee-851b-0ac9515c8228', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Thu, 26 Jan 2023 13:03:19 GMT', 'Content-Length': '129'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

As I still had the environment I built for that blog post, I decided to see if I could reproduce the problem.
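A 401 like this means the EKS API server did not accept the identity the worker presented. One known cause (I can't say it is the one in this question) is the `aws-auth` ConfigMap, which maps IAM roles to Kubernetes users: EKS expects the role ARN there without an IAM path, while MWAA execution roles created from the console typically live under `/service-role/`. This is a minimal sketch that produces a path-free `mapRoles` entry; the function names, username and group are hypothetical, and the group still needs a matching RBAC binding.

```python
def strip_role_path(role_arn: str) -> str:
    """Drop any IAM path, e.g. role/service-role/X -> role/X."""
    prefix, _, resource = role_arn.partition(":role/")
    if not resource:
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    role_name = resource.rsplit("/", 1)[-1]
    return f"{prefix}:role/{role_name}"


def map_roles_entry(role_arn: str, username: str, groups: list) -> str:
    """YAML fragment for the mapRoles key of the kube-system/aws-auth ConfigMap."""
    if not groups:
        raise ValueError("map the role to at least one group bound via RBAC")
    lines = [f"- rolearn: {strip_role_path(role_arn)}", f"  username: {username}", "  groups:"]
    lines += [f"    - {group}" for group in groups]
    return "\n".join(lines)
```

For example, `arn:aws:iam::123456789012:role/service-role/AmazonMWAA-demo-abc123` becomes `arn:aws:iam::123456789012:role/AmazonMWAA-demo-abc123` in the generated entry.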

Read more →

Running the KubernetesPodOperator in different AWS accounts when using Amazon Managed Workflows for Apache Airflow v2.x

Running KubernetesPodOperator in different AWS accounts

Update, August 14th

I wanted to move to a newer version of MWAA, so I have tested the original blog post against EKS 1.24 and MWAA version 2.4.3. I also had a few messages asking whether this would work across different AWS regions. The good news is that it does. I have also put together a repo for this here.

As I already had MWAA 2.4.3 up and running, I used it to check that everything still works with newer versions. I did have to update the requirements.txt from the original post below so that it is compatible with Airflow 2.4.3. If you are using newer versions, you will need to make suitable changes; check the constraints file for your Airflow version to find the right package versions.
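Checking a requirements.txt against a constraints file by eye gets tedious, so here is a small sketch that reports pins which disagree. It assumes both files use simple `name==version` lines; a requirements.txt for MWAA would typically also carry a `--constraint` line such as `https://raw.githubusercontent.com/apache/airflow/constraints-2.4.3/constraints-3.10.txt` (assuming Python 3.10), which the parser skips. The function names are hypothetical.

```python
import re


def _normalise(name: str) -> str:
    # PEP 503 style: case-insensitive, runs of '-', '_' and '.' are equivalent
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict:
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for raw in text.splitlines():
        # drop comments and environment markers
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-") or "==" not in line:
            continue  # blank lines, options such as --constraint, unpinned reqs
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0]  # ignore extras such as pkg[extra]
        pins[_normalise(name.strip())] = version.strip()
    return pins


def conflicting_pins(requirements: str, constraints: str) -> dict:
    """Return {package: (requested, constrained)} wherever the two disagree."""
    wanted, allowed = parse_pins(requirements), parse_pins(constraints)
    return {
        name: (version, allowed[name])
        for name, version in wanted.items()
        if name in allowed and allowed[name] != version
    }
```

Anything this reports is a candidate for the "suitable changes" mentioned above; packages absent from the constraints file are not checked at all.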

Read more →

Experimenting with digital lanyards - introducing the Badger2040

Experimenting with digital lanyards

As someone who attends events on a regular basis, I have spent a fair bit of time over the years looking at interesting ways to engage with attendees. One of the problems I was looking to solve was how to share useful information with attendees without interrupting the conversation (something that typically happens as I try to find those links on my mobile phone).

Read more →