
Amit Singh Rathore

2.1K Followers


Published in Dev Genius

·Mar 17

[15 more] Signs of a professional Python programmer

Following up on the previous blog (Click Here for Part I)

Using formatting with f: the f-string is a better and preferred way of formatting strings than .format. It supports variable substitution, expression evaluation, and decimal rounding and precision. Note that the f-string has one catch: the variable needs to…
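A minimal sketch of the three features named in the teaser (variable substitution, expression evaluation, and precision), next to the equivalent .format call. The function name and sample values are hypothetical and chosen only for illustration; they are not taken from the article.

```python
# Hypothetical sample values, used only to illustrate the formatting.
name = "Spark"
jobs = 3
ratio = 2 / 3


def describe(name: str, jobs: int, ratio: float) -> str:
    """Build a summary line with an f-string."""
    # {name}      -> variable substitution
    # {jobs + 1}  -> expression evaluated inside the braces
    # {ratio:.2f} -> rounded to two decimal places
    return f"{name} ran {jobs + 1} jobs, success ratio {ratio:.2f}"


summary = describe(name, jobs, ratio)

# The same line written with str.format, for comparison.
old_style = "{} ran {} jobs, success ratio {:.2f}".format(name, jobs + 1, ratio)

print(summary)  # Spark ran 4 jobs, success ratio 0.67
```

Both forms produce the same string; the f-string keeps each value next to the place where it is used.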

Python

2 min read



Published in Dev Genius

·Mar 15

[Solution] Spark — debugging a slow Application

Follow-up blog to fix slow jobs

This blog is a follow-up to the earlier blog where I list reasons for a slow Spark job. Input / Source: Input Layout: Partitioned data. The right partitioning scheme allows Spark to read only the data a query needs. Also, since partition columns are not stored along with the data, it results in fewer data…

Spark

4 min read



Published in Dev Genius

·Mar 13

Spark — debugging a slow Application

Reasons that make an application slow

Spark has a lot of native optimization tricks (like Catalyst, CBO, AQE, Dynamic Allocation, and Speculation) up its sleeve to make a job run faster. Still, many a time we will see our jobs getting slow & slow…

Spark

4 min read



Published in Dev Genius

·Mar 12

Spark Errors — Uncluttered

Understanding Spark errors

When a Spark application fails, we should identify the errors and exceptions that caused the failure. We can find the exception messages in the Spark driver or executor logs. Useful information is also logged in the Spark UI. At times the error messages could mean different things…

Spark

4 min read



Published in Dev Genius

·Mar 11

Spark — Spill

A side effect

Spark processes data in memory, but not everything fits in memory. When the data in a partition is too large to fit in memory, it gets written to disk. Spark does this to free up RAM for the remaining tasks within the job. It…

Apache Spark

2 min read



Published in Dev Genius

·Mar 10

Shuffle in Spark

Data rearrangement in partitions

Shuffle is the process of redistributing data between partitions for operations where data needs to be grouped or seen as a whole. Shuffle happens whenever there is a wide transformation. In the Spark DAG (Operator Graph), two stages are separated by a shuffle boundary. …
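The idea of redistributing data so that it can be grouped can be sketched in plain Python. This is a toy simulation and not Spark's implementation or API: the function name, the list-of-lists representation of partitions, and the hash-modulo placement rule are all assumptions made for illustration.

```python
from collections import defaultdict


def shuffle_by_key(partitions, num_output_partitions):
    """Toy shuffle: move every (key, value) pair to the output partition
    chosen by hash(key) % num_output_partitions, grouping values per key."""
    output = [defaultdict(list) for _ in range(num_output_partitions)]
    for partition in partitions:
        for key, value in partition:
            # All records with the same key land in the same output partition.
            target = hash(key) % num_output_partitions
            output[target][key].append(value)
    return [dict(p) for p in output]


# Hypothetical input: key 1 is spread across two input partitions.
input_partitions = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (3, "d")],
]

shuffled = shuffle_by_key(input_partitions, num_output_partitions=2)
print(shuffled)  # [{2: ['b']}, {1: ['a', 'c'], 3: ['d']}]
```

Integer keys are used so the placement is deterministic; before the shuffle the values for key 1 sit in different partitions, and afterwards they sit together, which is what a grouping operation needs.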

Spark

4 min read



Published in Dev Genius

·Mar 8

Spark partitioning

Controlling the number of partitions in Spark for parallelism

A partition in Spark is a logical chunk of data mapped to a single node in a cluster. Partitions are the basic units of parallelism. Each partition is processed by a single task slot. In a multicore system, total slots for tasks…

Spark

5 min read



Published in Dev Genius

·Feb 28

Airflow & SIGTERM

The Tom & Jerry of pipeline

SIGTERM is a generic signal (request) used to cause program termination. If the program has a signal handler for SIGTERM that does not actually terminate the application, this kill may have no effect. …
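A minimal sketch of a SIGTERM handler that does not terminate the program, using only Python's standard signal module. It assumes a POSIX system and that the code runs in the main thread; the handler name and the `received` list are hypothetical, and nothing here is Airflow-specific.

```python
import os
import signal

received = []  # signals seen by the handler (illustrative bookkeeping)


def handle_sigterm(signum, frame):
    # Record the request but do not exit, so the "kill" has no effect.
    received.append(signum)


# Install the handler and remember the previous one so it can be restored.
previous = signal.signal(signal.SIGTERM, handle_sigterm)

# Send SIGTERM to this very process, as a plain `kill <pid>` would.
os.kill(os.getpid(), signal.SIGTERM)

# The process is still alive because the handler swallowed the signal.
print("still running, received:", received)

# Restore the earlier behaviour.
signal.signal(signal.SIGTERM, previous)
```

A handler that should shut the program down would do its cleanup and then exit, for example by raising `SystemExit`, instead of only recording the signal.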

Airflow

4 min read



Published in Dev Genius

·Feb 27

Spark Event Listeners

A way to know what is happening with Spark jobs

Spark has many counters that it uses during job execution. …

Apache Spark

5 min read



Published in Dev Genius

·Feb 26

Spark execution environments

Popular choices for running Spark applications

Spark can be deployed in multiple ways. It can be deployed as a standalone application locally, or on a cluster. When choosing the cluster manager, we also have options like YARN, K8s, Mesos & any other custom solutions. …

Spark

6 min read


Amit Singh Rathore


Staff Data Engineer @ Visa — Writes about Cloud | Big Data | ML
