Photo by Daniele Levis Pelusi on Unsplash

A confusion matrix is a tabular summary of the prediction results of a classification problem. It is a table of the different combinations of predicted and actual values. The image below summarizes those combinations. Here rows represent actual values while columns represent predicted values.
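As a minimal sketch of the layout described here (rows for actual values, columns for predicted values), the snippet below counts label pairs with the standard library only. The function name confusion_matrix and the sample labels are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix whose rows are actual labels and whose columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical binary example: 1 = positive class, 0 = negative class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix(actual, predicted, labels=[1, 0])
# Row 0 (actual positive): [true positives, false negatives]
# Row 1 (actual negative): [false positives, true negatives]
for row in matrix:
    print(row)
```

With the labels ordered as [1, 0], the top-left cell holds the true positives and the bottom-right cell holds the true negatives.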


Cloud adoption is like dating. If you rush, a security incident will be around the corner. To avoid that, we need a good strategy. In this blog we will see some pointers for a secure relationship with the cloud.

Infrastructure as Code

Manual deployments are prone to misconfiguration. We need to develop a culture of minimizing console usage for deployments. All deployments in non-POC environments should come from a provisioning/deployment engine that takes code as input. This can be CloudFormation (CFN) based, DSL based (Terraform or troposphere), or any other custom solution.
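To make "code as input" concrete, here is a minimal sketch that generates a CloudFormation template from plain Python instead of clicking through the console. The function name build_bucket_template and the logical ID AppDataBucket are hypothetical; the resource properties are standard AWS::S3::Bucket settings for default encryption and blocking public access.

```python
import json


def build_bucket_template(logical_id):
    """Build a CloudFormation template (as a dict) for one encrypted, non-public S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template generated from code",
        "Resources": {
            logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Encrypt objects at rest by default.
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                        ]
                    },
                    # Block every form of public access.
                    "PublicAccessBlockConfiguration": {
                        "BlockPublicAcls": True,
                        "BlockPublicPolicy": True,
                        "IgnorePublicAcls": True,
                        "RestrictPublicBuckets": True,
                    },
                },
            }
        },
    }


# "AppDataBucket" is a hypothetical logical ID used only for this example.
template = build_bucket_template("AppDataBucket")
template_json = json.dumps(template, indent=2)
print(template_json)
```

The resulting JSON could be handed to whichever deployment engine the team uses, so the secure defaults travel with the code rather than depending on manual console settings.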


Identify & Set Actionable security perspective goals

We should have a list of controls to check against, and write infrastructure code such that security is built into the system. For example, we can implement intelligent identity-based and network-based guardrails while we write our IaC. Below are some possible and actionable categories of goals. …

Photo by Ayooj Rangaraj on Unsplash

Apache Spark is a unified computing engine/framework for large-scale data processing.


Consume a service from a private network without traversing the internet.

This is what PrivateLink is at its core. It allows an application in a private subnet of a VPC to connect to a service outside the current VPC without leaving the AWS network. VPC peering allows us to do the same, but it establishes a connection whose scope is too broad. Peering also requires non-overlapping CIDRs. PrivateLink allows us to connect only to the service we need. It gives control at both ends, i.e. the service provider end as well as the consumer end. …
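The non-overlapping CIDR restriction on peering is easy to check ahead of time. The sketch below uses Python's standard ipaddress module; the helper name cidrs_overlap and the sample CIDR ranges are illustrative assumptions.

```python
import ipaddress


def cidrs_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR blocks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical VPC ranges used only for this example.
print(cidrs_overlap("10.0.0.0/16", "10.0.128.0/17"))  # second range sits inside the first
print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))    # disjoint ranges
```

Two VPCs whose ranges overlap cannot be peered, which is one of the cases where connecting to a single service over PrivateLink is the more practical option.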

EMR is a managed cluster platform that simplifies running big data frameworks such as Hadoop, Spark and Presto on the AWS cloud.

EMR Components — Cluster & Node


Cluster: A cluster is simply a collection of EC2 instances called nodes. Based on their role, nodes are categorized into three types: Master, Core and Task nodes.

Master Node: Manages & monitors the cluster, coordinates the distribution of data and tasks among other nodes. It also tracks the status of tasks. Every

Core Node: Nodes that hold data and execute tasks.

Task Node: Provides extra compute power. Does not hold any data.
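As a rough sketch, the three node roles map onto instance groups when a cluster is described in code. The list below follows the shape of the InstanceGroups entries accepted by boto3's EMR run_job_flow call; the group names, instance types and counts are placeholder assumptions, and nothing here actually launches a cluster.

```python
# Illustrative instance group definitions, one per node role.
# Names, instance types and counts are assumptions for this example.
instance_groups = [
    {
        "Name": "Master node",
        "InstanceRole": "MASTER",  # manages and monitors the cluster
        "InstanceType": "m5.xlarge",
        "InstanceCount": 1,
    },
    {
        "Name": "Core nodes",
        "InstanceRole": "CORE",  # hold data and execute tasks
        "InstanceType": "m5.xlarge",
        "InstanceCount": 2,
    },
    {
        "Name": "Task nodes",
        "InstanceRole": "TASK",  # extra compute only, no data
        "InstanceType": "m5.xlarge",
        "InstanceCount": 2,
    },
]

# Summarize how many instances serve each role.
nodes_per_role = {g["InstanceRole"]: g["InstanceCount"] for g in instance_groups}
print(nodes_per_role)
```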

EMR Architecture

Amazon EMR service architecture is comprised of four basic layers. …


NLP is a sub-field of AI that enables computers to understand and process human-generated text data. In this blog we will learn the basic tasks of NLP and also some applications of NLP.

Text Data Pre-processing

Once we have the text, the first task is to pre-process the data.

Sentence Segmentation

Break the text into individual sentences.
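A naive way to do this is to split on whitespace that follows sentence-ending punctuation. The sketch below uses only the standard re module; the function name split_sentences is made up, and the approach is deliberately simple: abbreviations such as "Dr." would be split incorrectly, which is why NLP libraries use more careful segmenters.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("NLP is fun. Is it hard? Not always!")
for sentence in sentences:
    print(sentence)
```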


Logging is how an application communicates with its audience. The key in any communication (logging included) is to think about your audience and what it needs. That is why we have different log levels for different audiences (developers, sysadmins, etc.).
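One way to picture "different levels for different audiences" is a single logger with two handlers, each filtering at its own level. This is a minimal sketch using Python's standard logging module; the logger name demo_app, the in-memory streams and the sample messages are assumptions for illustration.

```python
import io
import logging


def build_logger(dev_stream, ops_stream):
    """Send everything to developers, but only errors and above to operations."""
    logger = logging.getLogger("demo_app")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()

    dev_handler = logging.StreamHandler(dev_stream)
    dev_handler.setLevel(logging.DEBUG)   # developers want full detail

    ops_handler = logging.StreamHandler(ops_stream)
    ops_handler.setLevel(logging.ERROR)   # sysadmins only need actionable failures

    formatter = logging.Formatter("%(levelname)s %(message)s")
    for handler in (dev_handler, ops_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


dev_log, ops_log = io.StringIO(), io.StringIO()
logger = build_logger(dev_log, ops_log)
logger.debug("cache miss for key=user:42")
logger.error("payment service unreachable")

print("developer view:\n" + dev_log.getvalue())
print("operations view:\n" + ops_log.getvalue())
```

In a real service the two streams would typically be a debug log file and an alerting channel rather than in-memory buffers.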

Photo by Stephen Phillips — on Unsplash

AWS CloudWatch Logs Insights is an SQL-like interactive solution for querying, analysing and visualising log data from CloudWatch. CloudWatch logs can be VPC flow logs, CloudTrail logs, contact flow logs, RDS logs, service-specific logs, or custom application logs.

Photo by Glen Carrie on Unsplash

In this blog we will try to build a miniature version of AWS Simple Notification Service.

  1. Problem Statement
  2. Requirement Gathering
  3. Building MVP
  4. Building for Resiliency and Availability

Problem Statement

Simple Notification Service allows a user to publish messages to a topic. A user subscribes to one or more topics. Whenever a publisher publishes a message to a topic, each subscriber of that topic receives the message. Publisher and subscriber are unaware of each other; they do not communicate directly.

We are required to design an SNS-like service that clients all over the world can use to read and write messages.
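The publish/subscribe behaviour in the problem statement can be sketched in memory before any distributed design is considered. The class name MiniSNS, its methods and the sample topics are all hypothetical; this is a single-process illustration, not a resilient or globally available service.

```python
from collections import defaultdict


class MiniSNS:
    """In-memory topic-based publish/subscribe (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._messages = defaultdict(list)     # topic -> messages in publish order

    def subscribe(self, topic, callback):
        """Register a callback that receives every future message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Store the message under its topic, then deliver it to each subscriber."""
        self._messages[topic].append(message)
        for callback in self._subscribers[topic]:
            callback(message)

    def history(self, topic):
        """Return the messages published to a topic, oldest first."""
        return list(self._messages[topic])


sns = MiniSNS()
inbox = []
sns.subscribe("orders", inbox.append)   # the subscriber never sees the publisher
sns.publish("orders", "order-1 created")
sns.publish("billing", "invoice-7 issued")
sns.publish("orders", "order-1 shipped")
print(inbox)
```

The subscriber only knows the topic name, and the publisher only knows the topic name, which is the decoupling the problem statement asks for.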

Requirement gathering:

User should be able to publish messages

Message order: message order must be maintained
Grouping by topic: messages must be grouped by topic. …

A stack is a collection of AWS resources that we can manage as a single unit. We can perform operations like CREATE, DELETE and UPDATE on this unit. While we perform these operations, the stack transitions from one state to another, for example from CREATE_IN_PROGRESS to CREATE_COMPLETE. Knowing the transition states helps in debugging issues. This blog tries to depict the state transitions visually.


At a high level, we perform CREATE | UPDATE | DELETE operations on a stack. First we create a stack from a template. Once it is created, we can either delete or update the stack. We can also delete a stack after an update. Internally, during each of these operations, the stack transitions through multiple states. …
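The create, update and delete flow can be modelled as a small state machine. The sketch below covers only a simplified subset of CloudFormation stack statuses: intermediate cleanup states and several failure paths are left out, and the exact set of edges is an assumption made for illustration rather than a complete map of the service's behaviour. The helper name is_valid_path is made up.

```python
# Simplified, non-exhaustive transition map between stack statuses.
TRANSITIONS = {
    "CREATE_IN_PROGRESS": {"CREATE_COMPLETE", "CREATE_FAILED", "ROLLBACK_IN_PROGRESS"},
    "ROLLBACK_IN_PROGRESS": {"ROLLBACK_COMPLETE", "ROLLBACK_FAILED"},
    "ROLLBACK_COMPLETE": {"DELETE_IN_PROGRESS"},
    "CREATE_COMPLETE": {"UPDATE_IN_PROGRESS", "DELETE_IN_PROGRESS"},
    "UPDATE_IN_PROGRESS": {"UPDATE_COMPLETE", "UPDATE_ROLLBACK_IN_PROGRESS"},
    "UPDATE_ROLLBACK_IN_PROGRESS": {"UPDATE_ROLLBACK_COMPLETE", "UPDATE_ROLLBACK_FAILED"},
    "UPDATE_COMPLETE": {"UPDATE_IN_PROGRESS", "DELETE_IN_PROGRESS"},
    "UPDATE_ROLLBACK_COMPLETE": {"UPDATE_IN_PROGRESS", "DELETE_IN_PROGRESS"},
    "DELETE_IN_PROGRESS": {"DELETE_COMPLETE", "DELETE_FAILED"},
}


def is_valid_path(states):
    """Check that each consecutive pair of statuses is an allowed transition."""
    return all(
        nxt in TRANSITIONS.get(current, set())
        for current, nxt in zip(states, states[1:])
    )


# Create, then update, then delete.
happy_path = [
    "CREATE_IN_PROGRESS", "CREATE_COMPLETE",
    "UPDATE_IN_PROGRESS", "UPDATE_COMPLETE",
    "DELETE_IN_PROGRESS", "DELETE_COMPLETE",
]
print(is_valid_path(happy_path))
print(is_valid_path(["CREATE_IN_PROGRESS", "DELETE_COMPLETE"]))  # skips required states
```

Comparing a stack's observed event history against a map like this is one way to spot where an operation left the expected path.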


Amit Singh Rathore

Cloud | ML | Big Data
