
Find out more about some of the interesting technologies and projects beachgeek consulting can help you with by reading our in-depth technical blog posts.

Automating your ELT Workflows with Managed Workflows for Apache Airflow - Part One

update: I have changed the post to use standard Apache Airflow variables rather than using AWS Secrets Manager.

Part One - Automating Amazon Athena

As part of an upcoming DevDay event, I have been working on how you can use Apache Airflow to help automate your Extract, Load and Transform (ELT) workflows. Amazon Athena and Amazon EMR are two AWS services that help customers who have existing SQL skills and are looking at tools such as Presto or Apache Hive to carry out those transformations.
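As a minimal sketch of the kind of SQL such a workflow might run on Athena, the function below builds a CTAS (CREATE TABLE AS SELECT) statement that keeps a single genre from a MovieLens-style `movies` table. The database, table, column, and S3 names are hypothetical placeholders, not taken from the post.

```python
# A minimal sketch: build an Amazon Athena CTAS statement that writes a
# single-genre subset of a MovieLens-style "movies" table as Parquet.
# Database, table, column and S3 names are hypothetical placeholders.

def build_genre_ctas(database: str, source_table: str, target_table: str,
                     genre: str, output_location: str) -> str:
    """Return an Athena CTAS statement for rows whose genres mention `genre`."""
    # Double any single quotes so the genre is safe inside a SQL string literal.
    safe_genre = genre.replace("'", "''")
    return (
        f"CREATE TABLE {database}.{target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{output_location}')\n"
        f"AS SELECT movieid, title, genres\n"
        f"FROM {database}.{source_table}\n"
        f"WHERE genres LIKE '%{safe_genre}%'"
    )


print(build_genre_ctas("movielens", "movies", "movies_comedy",
                       "Comedy", "s3://example-bucket/curated/comedy/"))
```

In a DAG, a string like this could be handed to an Athena operator as its query. Note that Athena expects the CTAS `external_location` to be an empty S3 prefix.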

Read more →

Automating your ELT Workflows with Managed Workflows for Apache Airflow - Part Two

Part Two - Automating Amazon EMR

In Part One, we automated an example ELT workflow on Amazon Athena using Apache Airflow. In this post, Part Two, we automate the same example ELT workflow using Amazon EMR.
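A minimal sketch of how a transformation might be submitted to EMR: the function below builds one step definition in the shape used by EMR's AddJobFlowSteps API, which is also the shape Airflow's `EmrAddStepsOperator` accepts in its `steps` argument. Running a Hive script through `command-runner.jar` with `hive-script --run-hive-script` is one common pattern. It is an assumption here, not necessarily what the post uses, and the step name and S3 path are hypothetical.

```python
# A minimal sketch: one Amazon EMR step that runs a Hive script stored in S3.
# The dictionary follows the AddJobFlowSteps step shape. The step name and
# S3 path are hypothetical placeholders.

def hive_step(name: str, script_s3_uri: str,
              action_on_failure: str = "CONTINUE") -> dict:
    """Return an EMR step definition that runs `script_s3_uri` with Hive."""
    if not script_s3_uri.startswith("s3://"):
        raise ValueError("the Hive script must be stored in Amazon S3")
    return {
        "Name": name,
        # Other valid values include CANCEL_AND_WAIT and TERMINATE_CLUSTER.
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            # command-runner.jar lets a step run commands such as hive-script.
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script", "--args",
                     "-f", script_s3_uri],
        },
    }


steps = [hive_step("Create comedy subset",
                   "s3://example-bucket/scripts/comedy.hql")]
```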

Make sure you review the setup from Part One. All the code you need to reproduce this yourself is in the GitHub repository here.

Automating Amazon EMR

To recap: we are using the MovieLens dataset, which we have loaded into our data lake on Amazon S3, and we have been asked to a) create a new table with a subset of the information we care about, in this instance a particular genre of films, and b) create a new file with the same subset of information available in the data lake.
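To make the transformation itself concrete, here is a minimal local sketch, assuming the MovieLens `movies.csv` layout (`movieId,title,genres`, with genres separated by `|`). It keeps only the rows for one genre and writes them to a new CSV. In the workflows above, the same filtering is done at scale by Athena or EMR rather than in Python.

```python
# A minimal local sketch of the genre filter, assuming the MovieLens
# movies.csv layout: movieId,title,genres with "|"-separated genres.
import csv
import io


def filter_by_genre(src, dst, genre: str) -> int:
    """Copy rows whose genres list contains `genre`; return how many were kept."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        # Match whole genre names from the split list, not substrings.
        if genre in row["genres"].split("|"):
            writer.writerow(row)
            kept += 1
    return kept


sample = io.StringIO(
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
    "2,Jumanji (1995),Adventure|Children|Fantasy\n"
)
out = io.StringIO()
print(filter_by_genre(sample, out, "Comedy"))  # prints 1
```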

Read more →

Monitoring and logging with Amazon Managed Workflows for Apache Airflow

Part of a series of posts to support an upcoming online event, Innovate AI/ML on February 24th, from 9:00am GMT; you can sign up here

In this post I will be covering Part 6: where to find logs to help you understand and troubleshoot your Apache Airflow workflows, and how you can monitor your Apache Airflow environments. Specifically, I will cover a couple of things:
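A minimal sketch for locating logs, assuming the naming convention MWAA uses for the CloudWatch Logs groups it creates when a log type is enabled: `airflow-<environment name>-<log type>`. The environment name below is hypothetical. Check the names against your own environment in the CloudWatch console before relying on them.

```python
# A minimal sketch, assuming MWAA names its CloudWatch Logs groups
# "airflow-<environment name>-<log type>". The environment name is hypothetical.
LOG_TYPES = ("DAGProcessing", "Scheduler", "Task", "WebServer", "Worker")


def mwaa_log_groups(environment_name, enabled=LOG_TYPES):
    """Return the expected log group names for the enabled log types."""
    unknown = set(enabled) - set(LOG_TYPES)
    if unknown:
        raise ValueError(f"unknown log types: {sorted(unknown)}")
    return [f"airflow-{environment_name}-{log_type}" for log_type in enabled]


# Task logs are usually the first place to look when a DAG run fails.
print(mwaa_log_groups("my-airflow-env", ["Task"]))
```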

Read more →

A simple CI/CD system for your Amazon Managed Workflows for Apache Airflow development workflow

updated Feb 19th

Part of a series of posts to support an upcoming online event, Innovate AI/ML on February 24th, from 9:00am GMT; you can sign up here

In this post I will be covering Part 5: how you can set up a very simple CI/CD pipeline to speed up development of your Apache Airflow DAGs. Specifically, I will cover a couple of things:
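One cheap check a simple pipeline can run before copying DAG files to the environment's S3 bucket is a syntax check. This is a minimal sketch, not the pipeline from the post. It only byte-compiles each `.py` file and does not import Airflow or validate DAG structure. The `dags` directory layout is an assumption.

```python
# A minimal sketch of a pre-deployment syntax check for DAG files.
# It only compiles the Python source; it does not import Airflow or
# validate DAG structure. The directory layout is an assumption.
import pathlib


def find_syntax_errors(dags_dir: str) -> dict:
    """Map each .py file under dags_dir that fails to compile to its error."""
    errors = {}
    for path in sorted(pathlib.Path(dags_dir).rglob("*.py")):
        try:
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except SyntaxError as exc:
            errors[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return errors
```

A CI job could call `find_syntax_errors("dags")` and fail the build (for example with a non-zero exit code) if the result is not empty, so broken files never reach the bucket.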

Read more →

Interacting with Amazon Managed Workflows for Apache Airflow via the command line

Part of a series of posts to support an upcoming online event, Innovate AI/ML on February 24th, from 9:00am GMT; you can sign up here

In this post I will be covering Part 4: how you can interact with and access Apache Airflow via the command line. Specifically, I will cover a couple of things:
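A minimal, standard-library sketch of the MWAA command line flow, under these assumptions: you already have a CLI token and web server hostname (for example from `aws mwaa create-cli-token`, which returns `CliToken` and `WebServerHostname`), and the environment's `/aws_mwaa/cli` endpoint accepts the Airflow CLI command as plain text and returns base64-encoded `stdout` and `stderr` in JSON.

```python
# A minimal sketch of calling the MWAA CLI endpoint with the standard library.
# Assumes a CLI token and hostname from `aws mwaa create-cli-token`.
import base64
import json
import urllib.request


def build_cli_request(hostname: str, cli_token: str, command: str) -> urllib.request.Request:
    """Build a POST request that runs one Airflow CLI command."""
    return urllib.request.Request(
        url=f"https://{hostname}/aws_mwaa/cli",
        data=command.encode("utf-8"),
        headers={"Authorization": f"Bearer {cli_token}",
                 "Content-Type": "text/plain"},
        method="POST",
    )


def decode_cli_response(body: bytes):
    """Return (stdout, stderr) decoded from the endpoint's JSON response."""
    payload = json.loads(body)
    stdout = base64.b64decode(payload.get("stdout", "")).decode("utf-8")
    stderr = base64.b64decode(payload.get("stderr", "")).decode("utf-8")
    return stdout, stderr
```

Sending the request is then `urllib.request.urlopen(req).read()`. The command text depends on your Airflow version, for example `dags list` on Airflow 2.x.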

Read more →

Accessing your Amazon Managed Workflows for Apache Airflow environments

Part of a series of posts to support an upcoming online event, Innovate AI/ML on February 24th, from 9:00am GMT; you can sign up here

In this post I will be covering Part 3: how you can interact with and access your Apache Airflow environments. Specifically, I will cover a couple of things:
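One way to open the Airflow UI programmatically is with a web login token. This is a minimal sketch, assuming you have already called `aws mwaa create-web-login-token` and have its `WebServerHostname` and `WebToken` values. The URL pattern follows the one used in the MWAA documentation's examples, so verify it against the current docs.

```python
# A minimal sketch, assuming WebServerHostname and WebToken come from
# `aws mwaa create-web-login-token`. The URL pattern should be checked
# against the current MWAA documentation.

def web_login_url(hostname: str, web_token: str) -> str:
    """Return a one-time login URL for the Apache Airflow UI."""
    return f"https://{hostname}/aws_mwaa/aws-console-sso?login=true#{web_token}"
```

Web login tokens are short-lived, so a URL like this is meant to be opened straight away rather than stored.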

Read more →

Working with permissions in Amazon Managed Workflows for Apache Airflow

Part of a series of posts to support an upcoming online event, Innovate AI/ML on February 24th, from 9:00am GMT; you can sign up here

In this post I will be covering Part 2: how to control access to Apache Airflow following best practices such as no access by default and least privilege.
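As a minimal illustration of least privilege, the sketch below builds an IAM policy that allows a principal to request a web login token for exactly one MWAA environment, mapped to exactly one Apache Airflow role. It assumes the MWAA resource ARN pattern `arn:aws:airflow:<region>:<account>:role/<environment>/<airflow role>`. The region, account ID, and environment name are hypothetical.

```python
# A minimal least-privilege sketch: allow web UI access to one MWAA
# environment as one Airflow role. Region, account ID and environment
# name are hypothetical placeholders.
import json

AIRFLOW_ROLES = {"Admin", "Op", "User", "Viewer", "Public"}


def web_access_policy(region: str, account_id: str, environment: str,
                      airflow_role: str = "Viewer") -> dict:
    """Return an IAM policy document scoped to a single environment and role."""
    if airflow_role not in AIRFLOW_ROLES:
        raise ValueError(f"unknown Airflow role: {airflow_role}")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "airflow:CreateWebLoginToken",
            "Resource": (f"arn:aws:airflow:{region}:{account_id}:"
                         f"role/{environment}/{airflow_role}"),
        }],
    }


print(json.dumps(web_access_policy("eu-west-1", "111122223333", "my-env"), indent=2))
```

Defaulting to the read-only `Viewer` role means broader roles such as `Admin` have to be granted explicitly.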

Read more →

Automating the installation and configuration of Amazon Managed Workflows for Apache Airflow

updated August 25th: thanks to Philip T for spotting a typo in the CloudFormation code in this post. It was correct in the GitHub repo, and I have now fixed it in the post too.

Part of a series of posts to support an upcoming online event, Innovate AI/ML on February 24th, from 9:00am GMT; you can sign up here

In this post I will be covering Part 1, automating the installation and configuration of Managed Workflows for Apache Airflow (MWAA).
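As a minimal sketch of what automating the installation can look like, the function below builds a CloudFormation template containing one `AWS::MWAA::Environment` resource as a Python dictionary and prints it as JSON. This is not the template from the post. The bucket, role, subnet, and security group values are hypothetical, and the check for exactly two subnets reflects MWAA's requirement for two private subnets.

```python
# A minimal sketch: a CloudFormation template with one AWS::MWAA::Environment
# resource, built as a dict and emitted as JSON. All ARNs and IDs are
# hypothetical placeholders.
import json


def mwaa_template(name, bucket_arn, role_arn, subnet_ids, security_group_ids):
    """Return a CloudFormation template dict for a small MWAA environment."""
    if len(subnet_ids) != 2:
        raise ValueError("MWAA expects exactly two private subnets")
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "MwaaEnvironment": {
                "Type": "AWS::MWAA::Environment",
                "Properties": {
                    "Name": name,
                    "SourceBucketArn": bucket_arn,
                    "DagS3Path": "dags",          # DAG folder inside the bucket
                    "ExecutionRoleArn": role_arn,
                    "EnvironmentClass": "mw1.small",
                    "MaxWorkers": 2,
                    "WebserverAccessMode": "PUBLIC_ONLY",
                    "NetworkConfiguration": {
                        "SubnetIds": list(subnet_ids),
                        "SecurityGroupIds": list(security_group_ids),
                    },
                },
            }
        },
    }


print(json.dumps(mwaa_template(
    "my-airflow-env",
    "arn:aws:s3:::example-airflow-bucket",
    "arn:aws:iam::111122223333:role/example-mwaa-execution-role",
    ["subnet-aaaa1111", "subnet-bbbb2222"],
    ["sg-cccc3333"],
), indent=2))
```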

Read more →

TIL: Testing an Amazon CloudWatch alarm

Today I was setting up an application load balancer that sits in front of a test application I have put together. Setting this up was super easy, and very quickly I had my domain pointing to the alias and serving requests.

As part of the setup, I wanted to monitor the application load balancer so it would tell me when requests to the downstream application were failing (anything other than an HTTP 200), and this was easy to set up in Amazon CloudWatch. I now had monitoring and a nice dashboard that showed the health of the application from the application load balancer's perspective.
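Testing an alarm is easier once you know how it decides to fire. Here is a simplified model, not CloudWatch's implementation, of an alarm configured as "M out of N datapoints greater than a threshold", applied to a hypothetical per-period count of failed requests. It ignores missing-data handling.

```python
# A simplified model of "M out of N" CloudWatch alarm evaluation with a
# greater-than-threshold comparison. Missing-data handling is ignored.

def alarm_state(datapoints, threshold, datapoints_to_alarm, evaluation_periods):
    """Return "ALARM" if enough of the most recent periods breach the threshold."""
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Failed-request counts per period; 2 of the last 3 periods breach, so ALARM.
print(alarm_state([0, 0, 5, 6, 0], threshold=1,
                  datapoints_to_alarm=2, evaluation_periods=3))
```

To exercise an alarm's notification actions without generating real failures, the AWS CLI also provides `aws cloudwatch set-alarm-state`, which temporarily sets an alarm to a given state.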

Read more →

Amazon Aurora - setting up and configuration, four ways

In this post I want to share four different approaches to installing and configuring your Amazon Aurora database clusters.

Everything in this post is covered in detail in the embedded video, but I wanted to share some additional information that I did not include in the video and that was easier to cover in a blog post.

Video: https://www.youtube.com/watch?v=wZfh9PurE9E

Why four ways?

The approach in the video was to look at the journey you might take when learning a new technology, and then at how you move to productise that technology. One of the principal building blocks of modern applications is the move to repeatable, reproducible environments through Infrastructure as Code.
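As a minimal Infrastructure as Code sketch, the function below builds a CloudFormation template for an Aurora cluster with one writer instance, expressed as a Python dictionary. The engine (`aurora-postgresql`), instance class, identifiers, and master username are assumptions for illustration, not the configuration from the video. The password is a `NoEcho` parameter so it is not hard-coded in the template.

```python
# A minimal sketch: CloudFormation for an Aurora cluster plus one writer
# instance. Engine, instance class and identifiers are assumptions.
import json


def aurora_template(cluster_id, instance_class="db.r5.large"):
    """Return a CloudFormation template dict for a one-instance Aurora cluster."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Supplied at deploy time; NoEcho hides it from console output.
            "MasterPassword": {"Type": "String", "NoEcho": True, "MinLength": 8},
        },
        "Resources": {
            "AuroraCluster": {
                "Type": "AWS::RDS::DBCluster",
                "Properties": {
                    "DBClusterIdentifier": cluster_id,
                    "Engine": "aurora-postgresql",
                    "MasterUsername": "dbadmin",
                    "MasterUserPassword": {"Ref": "MasterPassword"},
                },
            },
            "AuroraWriter": {
                "Type": "AWS::RDS::DBInstance",
                "Properties": {
                    # Ref on a DBCluster resolves to the cluster identifier.
                    "DBClusterIdentifier": {"Ref": "AuroraCluster"},
                    "DBInstanceClass": instance_class,
                    "Engine": "aurora-postgresql",
                },
            },
        },
    }


print(json.dumps(aurora_template("demo-aurora"), indent=2))
```

Because the template is plain data, the same definition can be deployed repeatedly to create identical environments, which is the reproducibility the move to Infrastructure as Code is about.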

Read more →