Recently I was involved in moving 180 DAGs from our own EC2-based Airflow installation to an EKS-hosted Airflow installation. Since we were very short on time, we did a lift-and-shift move of the DAGs to the new infrastructure. In the new installation, we switched to AWS SSM as our backend for retrieving secrets.
Binary search is a divide-and-conquer algorithm that splits the search space into two halves at each iteration and discards the half that cannot contain the target.
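As a minimal sketch of the halving described above, assuming the input is a list already sorted in ascending order (the function name `binary_search` and the `-1` "not found" convention are illustrative choices, not taken from the post):

```python
from typing import Sequence


def binary_search(items: Sequence[int], target: int) -> int:
    """Return the index of target in sorted items, or -1 if it is absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1
```

Each pass through the loop shrinks the range between `lo` and `hi` by at least half, which is where the logarithmic running time comes from.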
Snowflake integration objects let us connect Snowflake to external systems. In my last blog, we went through how to create an API integration to invoke an AWS Lambda. In this blog, we will see how to integrate AWS S3 so that we can access the data stored there and query it in Snowflake.
Variables and connections make an Airflow installation plug-and-play, and they determine which systems it talks to for various purposes. In a large system, they grow very fast, and managing their lifecycle (creating, updating) becomes challenging. As a data platform provider, we allow each product team to have its own self-serve Airflow installation. Setting these variables and connections therefore needed to be centralized, so that common ones can be reused and all of them are easy to manage.
Airflow provides one alternative and two built-in backends, in descending order of priority, as below:
To use a third…
Auto Scaling is one of the prime features of the cloud, and AWS Elastic MapReduce (EMR) is no exception. With a mix of instances and purchasing options (On-Demand and Spot), we can achieve faster job turnarounds at a reasonable price.
EMR offers two options for scaling: instance groups (IG) and instance fleets. The former gives you control, while the latter is fully managed by AWS. In this blog, we will set up an instance group.
First, we need to create the IGs. We can have one IG for core instances and many for task instances. …
Recently Supreeth Chandrashekar and I collaborated on a task where we wanted to run multiple inferences on some data stored in Snowflake. We leveraged an external function to trigger a Lambda, which in turn got the inference from AWS SageMaker endpoints (serving different models). In this blog, we will discuss how to invoke a Lambda from Snowflake.
Python has three jump statements that can move program control from one place to another. In this blog, we go through these three jump statements.
Pass: The pass statement acts as a syntactic placeholder. Whenever the Python interpreter encounters pass, it does nothing. This allows the enclosing construct to be a valid one.
Continue: The continue statement skips all the code following it in the innermost loop and moves on to the next iteration of that loop.
Break: The break statement breaks out of the innermost enclosing loop.
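The three descriptions above can be sketched in a few lines. The function names `placeholder` and `odd_numbers_below` and their parameters are made up for illustration only:

```python
def placeholder():
    pass  # does nothing; keeps the otherwise empty function body valid


def odd_numbers_below(limit, stop_at):
    """Collect odd numbers from range(limit), stopping when stop_at is reached."""
    result = []
    for n in range(limit):
        if n == stop_at:
            break      # leave the innermost loop entirely
        if n % 2 == 0:
            continue   # skip the rest of this iteration, go to the next n
        result.append(n)
    return result
```

Calling `odd_numbers_below(10, 7)` returns `[1, 3, 5]`: even values are skipped by `continue`, and the loop ends at 7 because of `break`.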
Let us see the use of three…
Recently, following some updates from our security team, we had to install an agent on all the hosts in our AWS account. We run 1100+ EC2 instances in the account. These servers run a variety of operating systems (Amazon Linux, Fedora, CentOS, RHEL, Ubuntu, Windows, FreeBSD, etc.). They also power various workloads such as EMR (various versions), EKS, ECS, Airflow, Tableau, and Alation. Many of them are vendor-configured servers that have their own AMIs. Creating AMIs for each type with the agent would have taken a long time and a huge effort. …
In this blog, we will go through a medium-tagged question. This question is frequently asked in interviews at Amazon and Facebook.
Given an array of meeting time intervals intervals[i] = [start_i, end_i], return the minimum number of conference rooms required.
Input: intervals = [[9,10],[4,9],[5,17]]
We need to find how many meetings are happening concurrently. Once we have that information, we can find the number of meeting rooms required: it is the largest number of meetings in progress at the same time. To find the concurrent meetings at any given time, we need to know how many meetings have a start and end time that cover that moment.
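One way to count concurrent meetings is to sort the start times and end times separately and sweep through them. This is a sketch of that sweep rather than the approach the post itself goes on to describe. It assumes a meeting ending at time t frees its room for a meeting starting at t, and `min_meeting_rooms` is an illustrative name:

```python
from typing import List


def min_meeting_rooms(intervals: List[List[int]]) -> int:
    """Return the peak number of overlapping meetings."""
    starts = sorted(start for start, _ in intervals)
    ends = sorted(end for _, end in intervals)
    rooms = 0
    next_end = 0  # index of the earliest meeting that has not ended yet
    for start in starts:
        if start >= ends[next_end]:
            next_end += 1  # an earlier meeting finished, so reuse its room
        else:
            rooms += 1     # every open room is busy, so open another
    return rooms
```

For the input above, `[[9,10],[4,9],[5,17]]`, the meetings starting at 4 and 5 overlap, and the one starting at 9 reuses the room freed at 9, so the function returns 2.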
We can do a brute-force…
Today’s question is from the Daily LeetCode Coding Challenge, May Edition. It is a medium-tagged question. Let us look at the problem statement.
Evaluate the value of an arithmetic expression in Reverse Polish Notation.
Valid operators are +, -, *, and /. Each operand may be an integer or another expression.
Note that the division between two integers should truncate toward zero.
Input: tokens = ["2","1","+","3","*"]
Explanation: ((2 + 1) * 3) = 9
From the above example, we understand that whenever we encounter an operator, the two most recent elements become its operands. Also, once we have computed the…
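The observation that an operator consumes the two most recent elements maps naturally onto a stack. The sketch below is one common way to write it, not necessarily the solution the post arrives at. It assumes the input is a valid expression, and `eval_rpn` is an illustrative name:

```python
from typing import List


def eval_rpn(tokens: List[str]) -> int:
    """Evaluate a valid Reverse Polish Notation expression."""
    stack: List[int] = []
    for token in tokens:
        if token in {"+", "-", "*", "/"}:
            right = stack.pop()  # most recent element is the right operand
            left = stack.pop()
            if token == "+":
                stack.append(left + right)
            elif token == "-":
                stack.append(left - right)
            elif token == "*":
                stack.append(left * right)
            else:
                # Truncate toward zero; Python's // alone would floor instead.
                quotient = abs(left) // abs(right)
                same_sign = (left < 0) == (right < 0)
                stack.append(quotient if same_sign else -quotient)
        else:
            stack.append(int(token))
    return stack.pop()
```

With `tokens = ["2","1","+","3","*"]`, the stack holds 2 and 1, then 3 after the `+`, then 3 and 3, and finally 9 after the `*`.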