Photo by Deva Darshan on Unsplash

In this blog post, we will look at some best practices for authoring DAGs. Let's start.

DAG as configuration file

The Airflow scheduler scans and compiles DAG files at each heartbeat. If the DAG files are heavy and contain a lot of top-level code, the scheduler will consume a lot of resources and time processing them at each heartbeat. So it is advised to keep DAGs light, more like configuration files. As a step forward, it is a good choice to have a YAML/JSON-based definition of the workflow and then generate the DAG from it. This has double…
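As a rough sketch of the config-driven approach: the snippet below parses a workflow definition and returns the task dependency map you would then wire into operators. The `dag_id`/`tasks`/`upstream` schema is a made-up example (a real setup would load it from a YAML/JSON file and create Airflow operators from it), shown here with JSON to keep the idea self-contained.

```python
import json

# Hypothetical workflow definition -- in practice this would live in a
# YAML/JSON file checked in next to the DAG file.
config_text = """
{
  "dag_id": "example_etl",
  "schedule": "@daily",
  "tasks": [
    {"id": "extract", "upstream": []},
    {"id": "transform", "upstream": ["extract"]},
    {"id": "load", "upstream": ["transform"]}
  ]
}
"""

def parse_workflow(text):
    """Parse the config and return (dag_id, {task_id: [upstream ids]})."""
    cfg = json.loads(text)
    deps = {t["id"]: t["upstream"] for t in cfg["tasks"]}
    return cfg["dag_id"], deps

dag_id, deps = parse_workflow(config_text)
# In the real DAG file you would now create one operator per task id and
# set dependencies, e.g. tasks[up] >> tasks[task_id] for each upstream.
```

The DAG file itself stays tiny: it only reads the config and loops over it, so the scheduler has almost no top-level code to execute.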

Photo by Rob Sheahan on Unsplash

Layers are logical collections of nodes/neurons. At the highest level, there are three types of layers in every ANN: input, hidden, and output.

Photo by Mark Duffel on Unsplash

Regularization is a principle that penalizes complex models so that they generalize better. It prevents overfitting. In this blog we will visit common regularization techniques.
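To make the "penalize complex models" idea concrete, here is a minimal sketch of the most common technique, an L2 (ridge) penalty added to a plain loss. The function names and the `lam` coefficient are illustrative, not from the post.

```python
def mse(y_true, y_pred):
    """Plain mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def l2_penalized_loss(y_true, y_pred, weights, lam=0.1):
    """MSE plus an L2 (ridge) penalty: larger weights cost more, which
    nudges the optimizer toward simpler models and reduces overfitting."""
    penalty = lam * sum(w ** 2 for w in weights)
    return mse(y_true, y_pred) + penalty
```

With `lam = 0` this reduces to the unregularized loss; increasing `lam` trades training fit for simpler weights.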

Your neural network is only as good as the data you feed it.

Data Augmentation

The performance of deep learning neural networks often improves with the amount of data available. But we don't usually have a huge amount of data. Data augmentation is a technique to artificially create new training data from existing training data. Depending upon when we apply these transformations, we have two types of augmentation:

Offline — perform all the necessary transformations beforehand, growing the dataset on disk before training

Online — apply the transformations on the fly, on each mini-batch, just before feeding it to the model
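A tiny sketch of the offline style, under the assumption that images are 2D lists of pixel values; a horizontal flip is one of the simplest label-preserving transformations for image data (the function names are illustrative).

```python
def horizontal_flip(image):
    """Flip a 2D image (list of rows) left-to-right."""
    return [row[::-1] for row in image]

def augment(dataset):
    """Offline-style augmentation: return the originals plus a flipped
    copy of each image, doubling the dataset before training starts."""
    return dataset + [horizontal_flip(img) for img in dataset]
```

In the online style, the same transformation would instead be applied inside the training loop, so each epoch sees a freshly transformed batch.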


Photo by Nicholas Rean on Unsplash

Optimizers are methods/algorithms used to modify attributes of the network, such as the weights and the learning rate, in order to minimize the loss.

Gradient Based

Batch Gradient Descent — Regression & classification

It computes the gradient of the loss function w.r.t. the parameters for the entire training dataset.

for i in range(epochs):
    param_gradient = evaluate_gradient(loss_function, data, params)
    params = params - learning_rate * param_gradient

Computation heavy
Memory intensive

Stochastic Gradient Descent

In SGD the parameter update happens for each training example and label.

for i in range(epochs):
    for sample in data:
        params_gradient = evaluate_gradient(loss_function, sample, params)
        params = params - learning_rate * params_gradient
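The same toy slope-fitting problem as above, but updated per sample as in the SGD pseudocode; shuffling each epoch is the usual extra step (the dataset and hyperparameters are illustrative).

```python
import random

random.seed(0)
# Tiny dataset sampled from y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w, learning_rate = 0.0, 0.05
for _ in range(200):
    random.shuffle(data)             # sample order matters for SGD
    for x, y in data:
        grad = 2 * (w * x - y) * x   # gradient on a SINGLE example
        w = w - learning_rate * grad
# w converges toward the true slope, 2.0
```

Each update is cheap (one sample), which is why SGD scales to datasets that batch gradient descent cannot hold in memory, at the cost of noisier steps.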


Loss functions quantify how well or how badly the model is performing. In terms of optimization, the loss is a convergence indicator, so choosing the right loss function is critical. In this blog we will see some of the most common loss functions.

from sklearn.metrics import hinge_loss
from sklearn.metrics import log_loss
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_squared_log_error
from scipy.stats import entropy

Regression Loss

Predicting continuous values

Mean Squared Error — Also known as L2 loss. This is used when the target follows a Gaussian distribution. Because the errors are squared, this loss function penalizes large errors heavily. It is one of the most common loss functions. …
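A minimal from-scratch sketch of MSE, alongside MAE for contrast, matching what the sklearn imports above provide; the hand-rolled versions just make the squaring-vs-absolute difference visible.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences -- squaring makes large errors
    dominate the total, hence the heavy penalty on big mistakes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences -- grows linearly with the error,
    so it is more robust to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For the same error of 2 on one of two samples, MSE gives 2.0 while MAE gives 1.0: the squared loss weights the miss twice as hard.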

Photo by Rodion Kutsaev on Unsplash

Optimization algorithms like Stochastic Gradient Descent (SGD) depend on the initial values of the parameters. Initial values, when chosen wisely, help avoid slow convergence and ensure that we don't keep oscillating around the minima. In simple terms, weight initialization prevents activation outputs from exploding or vanishing during the forward pass of the neural network. In this blog we will look at some initialization techniques:
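As a sketch of two standard techniques, here are He and Xavier/Glorot initialization implemented with the standard library; the layer sizes are illustrative, and real frameworks ship these as built-ins.

```python
import math
import random

random.seed(42)

def he_init(fan_in, fan_out):
    """He initialization: weights drawn from N(0, sqrt(2 / fan_in)).
    Commonly paired with ReLU activations."""
    std = math.sqrt(2.0 / fan_in)
    return [[random.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

def xavier_init(fan_in, fan_out):
    """Xavier/Glorot initialization: std = sqrt(2 / (fan_in + fan_out)).
    Commonly paired with tanh/sigmoid activations."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[random.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]
```

Both scale the weight variance to the layer's size so that activations keep a roughly constant magnitude from layer to layer, which is exactly the explode/vanish prevention described above.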

Photo by Franki Chamaki on Unsplash

A data platform is an integrated technology solution that makes data accessible. A good data platform helps create information and puts that information into the hands of the people who can use it.

Data is a precious thing and will last longer than the systems themselves.

A working data platform will consist of six major services/frameworks.

  1. Data Ingestion
  2. Data Storage & Organization framework
  3. Data Computation framework
  4. Data Security Framework
  5. Data Governance & Quality framework
  6. Data consumption services

Data is produced by systems in various places and in various formats. A good data platform should have an ingestion service with plug and…

Photo by Meagan Carsience on Unsplash

Activation functions, also known as transfer functions, decide whether the input to a neuron is relevant or not. These functions are applied at the hidden layers to introduce nonlinearity, and this nonlinearity helps the network learn complex relationships. Activation functions are also used at the output layer.
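For reference, here is a minimal sketch of three of the most common activation functions; each one maps a neuron's raw input to its output, and the squashing/clipping behavior is what introduces the nonlinearity.

```python
import math

def relu(x):
    """ReLU: passes positive inputs through, zeroes out negatives.
    The default choice for hidden layers."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid: squashes any input into (0, 1).
    Common at the output layer for binary classification."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Tanh: squashes any input into (-1, 1), zero-centered."""
    return math.tanh(x)
```

Stacking layers without such functions would collapse the whole network into one linear map, which is why they are applied at every hidden layer.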

In my previous blog we looked at the basics of Airflow. This blog will cover some advanced topics.

Airflow allows missed DAG runs to be scheduled again so that pipelines catch up on schedules that were missed for some reason. It also allows manually rerunning DAGs for back dates and backfilling those runs. Backfill and catch-up are confusing at first glance. In this blog we will understand these concepts. But before we start, we need to refresh our understanding of "start_date" and "execution_date".

Start Date & Execution Date

start_date — the date at which the DAG will start being scheduled

schedule_interval — the interval of…
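The interplay of these two settings can be sketched with plain datetimes. This is a deliberately simplified model of how the scheduler derives the runs that catch-up would create (it ignores time zones, cron expressions, and end dates): every completed interval between start_date and now is owed a run, and each run's execution_date is the start of its interval.

```python
from datetime import datetime, timedelta

def missed_runs(start_date, now, interval):
    """Simplified catch-up: list the execution_dates of all runs whose
    interval has fully elapsed. A run for interval [t, t + interval)
    fires once the interval closes, with execution_date = t."""
    runs = []
    execution_date = start_date
    while execution_date + interval <= now:
        runs.append(execution_date)
        execution_date += interval
    return runs

# A daily DAG with start_date Jan 1, inspected on Jan 5, owes four runs:
runs = missed_runs(datetime(2021, 1, 1), datetime(2021, 1, 5),
                   timedelta(days=1))
# execution_dates 2021-01-01 through 2021-01-04
```

This also shows why execution_date lags wall-clock time by one interval: the Jan 4 run actually fires on Jan 5, once its data interval is complete.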

Photo by tian kuan on Unsplash

Airflow is an open-source workflow management platform. We define those workflows as DAGs, "configuration as code" written in Python.

At a high level, Apache Airflow has the following components talking to each other.

Amit Singh Rathore

Cloud | ML | Big Data
