Amit Singh Rathore

3K Followers

Published in

Dev Genius

·2 days ago

Spark Interview Questions IV

Next installment of the series. Read Part I here. Read Part II here. Read Part III here. What does “lazy evaluation” mean to you? When we tell Spark to work on a particular dataset, it listens to our instructions and writes them down so it doesn’t forget, but it doesn’t do anything until we ask for the…
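The "write the instructions down, run nothing until asked" idea can be sketched in plain Python. This is only an analogy under stated assumptions: `LazyDataset`, `map`, `filter`, and `collect` here are hypothetical toy names, not Spark's implementation or API.

```python
from typing import Any, Callable, Iterable, List, Optional


class LazyDataset:
    """Toy lazily evaluated dataset (hypothetical; not a Spark class)."""

    def __init__(self, source: Iterable[Any], plan: Optional[List[Callable]] = None):
        self._source = source
        self._plan = list(plan or [])  # recorded steps, nothing executed yet

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: record the step and return a new dataset.
        step = lambda rows: (fn(r) for r in rows)
        return LazyDataset(self._source, self._plan + [step])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: again, only recorded.
        step = lambda rows: (r for r in rows if pred(r))
        return LazyDataset(self._source, self._plan + [step])

    def collect(self) -> List[Any]:
        # Action: only now is the recorded plan actually run.
        rows = iter(self._source)
        for step in self._plan:
            rows = step(rows)
        return list(rows)


calls = []  # records every element that `double` actually processes


def double(x: int) -> int:
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 2)
calls_before_action = list(calls)  # snapshot taken before any action runs
result = ds.collect()
```

Until `collect()` is called, `double` has not been invoked on any element; the action is what triggers the recorded steps.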

Spark

5 min read

Published in

Dev Genius

·4 days ago

Spark Interview Questions — III

Another set of questions on Spark. Read Part I here. Read Part II here. Read Part IV here. What are the partitioning hints in Spark? There are four partitioning hints available in Spark. COALESCE: this hint is used to reduce the number of partitions. REPARTITION: this hint is used to increase or decrease the number…

Apache Spark

5 min read

Published in

Dev Genius

·4 days ago

Data Engineering — On call 6

Issues faced in BAU. Earlier parts: Data Engineering On call 1, Data Engineering On call 2, Data Engineering On call 3, Data Engineering On call 4, Data Engineering On call 5. ISSUE 1: One of the Spark jobs failed with the following error. After spark application run for a period of time on spark…

Spark

3 min read

Published in

Dev Genius

·Sep 22

Spark Interview Questions — II

Next blog in the Spark Question Interview Series. Read Part I here. Read Part III here. What is the potential common issue with the two code snippets below? ROW_NUMBER() OVER (order by column_x) repartitionByRange(col1, col2) Both snippets are prone to OOM. In the first snippet, we are not giving any partition by clause, so it will bring everything into a…

Apache Spark

4 min read

Published in

Dev Genius

·Sep 20

Data Engineer On call 5

Issues faced in daily BAU. Earlier parts: Data Engineering On call 1, Data Engineering On call 2, Data Engineering On call 3, Data Engineering On call 4. ISSUE 1: Spark fails to initialize with the actual database. df = spark.read.format("jdbc"). option("url", "<host url>"). option("dbtable", "CSDTL.MY_TABLE"). option("user", "postgres"). option("password", "<password>"). option("numPartitions", 50). option("fetchsize", 20). load()

Spark

2 min read

Published in

Dev Genius

·Sep 20

Spark — Scheduler Delay

The what & why of Spark scheduler delay. In one of the apps in production, we saw an unusual delay in the job. When we looked at the Spark UI, we noticed a huge scheduler delay. …

Apache Spark

3 min read

Published in

Dev Genius

·Sep 18

Bitwise Manipulation Refresher

Manipulate data at its lowest level. In this blog we cover some key concepts about bitwise operators that every software engineer should be familiar with. Get ready to get into the world of bits & bytes. In any positional system for representing a number, we use a base, which represents the number…
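As a rough illustration of the two ideas the teaser names (positional bases and bitwise operators), here is a minimal Python sketch. The helper `to_base` and the sample values are assumptions made for illustration and are not taken from the article.

```python
def to_base(n: int, base: int) -> str:
    """Digits of a non-negative integer in a base from 2 to 10, most significant first."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))  # least significant digit first
        n //= base
    return "".join(reversed(digits))


x = 0b1010  # decimal 10
y = 0b0110  # decimal 6

and_result = x & y   # bits set in both operands
or_result = x | y    # bits set in either operand
xor_result = x ^ y   # bits set in exactly one operand
left_shift = x << 1  # shift left by one bit: multiply by 2
right_shift = x >> 1  # shift right by one bit: floor-divide by 2

binary_of_x = to_base(x, 2)
ternary_of_x = to_base(x, 3)
```

The same value reads as `1010` in base 2 and `101` in base 3; the bitwise operators act on the base-2 digits.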

Python

7 min read

Published in

Dev Genius

·Sep 17

Spark Questions — Interview Series

A series on Spark questions we should know before an interview. Read Part II here. Read Part III here. Read Part IV here. How does limit work in Spark? How is it different from take? The limit is implemented as LocalLimit (at partition) and then GlobalLimit. This limit is implemented incrementally. First, it tries limit on one partition and then the next (current*…

Spark

3 min read

Published in

Dev Genius

·Sep 9

Data Engineering On call 4

Another day, new learnings. Earlier parts of the series: Data Engineering On call 1, Data Engineering On call 2, Data Engineering On call 3. Issue 1: After an update of a spark job, the following exception occurred: java.lang.ClassCastException: cannot assign instance of scala.None$ to field org.apache.spark.scheduler.Task.appAttemptId of type scala.Option in instance of…

Data

3 min read

Published in

Dev Genius

·Sep 2

Prometheus + Blackbox — Service health checks

Using Blackbox-exporter to monitor service endpoints. In a big data platform, there are many service endpoints (Hue, Livy, SHS, etc.) that need to be continuously monitored. This monitoring solution needs to have the following features: external probing; configurable parameters & check interval; Prometheus format (as this is our monitoring solution) …

Monitoring

5 min read

Staff Data Engineer @ Visa — Writes about Cloud | Big Data | ML
