Amit Singh Rathore

3.6K Followers

Published in Dev Genius · Nov 20

Spark Operator — Basics

Managing Spark Jobs as a K8s object
Spark Operator allows seamless integration between Apache Spark and Kubernetes. It follows the K8s operator pattern to manage the lifecycle of Spark applications. When using it, a Spark application is declared in YAML files.
Components & Architecture
The Kubernetes Operator for Apache Spark comprises several key…

Apache Spark

8 min read


Published in Dev Genius · Nov 12

Spark Interview Question — XI

Next installment of the interview series.
Part I | Part II | Part III | Part IV | Part V | Part VI | Part VII | Part VIII | Part IX | Part X
What is Arrow, and how does it improve Python UDFs in Spark? Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing. Before Arrow…

Apache Spark

4 min read


Published in Dev Genius · Nov 11

Advance Data Structures for Data Engineering — Part II

Probabilistic data structures used in big data
Read Part I here.
1. HyperLogLog
Counting unique items usually requires memory proportional to the number of items we want to count, because we need to remember the elements we have already seen in order to avoid counting them…

Data Engineering

4 min read
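The HyperLogLog idea teased above (trading exact memory for a small, fixed set of registers) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the post's implementation: it assumes a 64-bit hash taken from the first 8 bytes of SHA-1, `p = 10` (1,024 registers) as an arbitrary choice, the usual bias constant `0.7213 / (1 + 1.079 / m)`, and linear counting as the small-range correction.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative sketch)."""

    def __init__(self, p: int = 10) -> None:
        # The top p bits of each hash select one of m = 2**p registers.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Standard bias-correction constant (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item: str) -> int:
        # Assumption: first 8 bytes of SHA-1 as a 64-bit unsigned hash.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        idx = x >> (64 - self.p)
        # In the remaining 64 - p bits, rank = position of the leftmost 1-bit (1-based).
        width = 64 - self.p
        rest = x & ((1 << width) - 1)
        rank = width - rest.bit_length() + 1
        # Each register only remembers the largest rank seen, so duplicates are free.
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        # Normalized harmonic mean of 2**-register over all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: use linear counting while many registers are empty.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog(p=10)
    for i in range(100_000):
        hll.add(f"user-{i % 20_000}")  # 100,000 adds, only 20,000 distinct values
    print(round(hll.count()))  # close to 20,000
```

Memory stays at 1,024 small integers no matter how many items are added; the price is an approximate answer, with a typical relative error of about 1.04 / sqrt(m), roughly 3% here.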


Nov 6

Data Engineering Best Practices

Compilation of some good practices for DE
Data engineering is a critical aspect of any data-driven organization. Building robust and efficient data pipelines requires a combination of expertise, well-defined processes, and best practices. …

Data Engineering

5 min read


Published in Dev Genius · Nov 2

Spark Interview Questions — X

Next part of the interview series.
Part I | Part II | Part III | Part IV | Part V | Part VI | Part VII | Part VIII | Part IX | Part X
What aggregate strategies does Spark provide, and how does it choose one? (Hash vs Sort aggregate) The Sort Aggregate requires the rows to be sorted by the grouping key so that…

Spark

4 min read
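To make the hash-vs-sort distinction concrete, here is a plain-Python illustration of the two strategies. It is a conceptual sketch only, not how Spark's physical operators (`HashAggregateExec`, `SortAggregateExec`) are implemented, and the `(key, value)` rows are made-up sample data. Hash aggregation keeps a map of running sums keyed by group and needs no ordering; sort aggregation first sorts rows by the grouping key so that equal keys become adjacent, then finalizes each group in one streaming pass.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (grouping key, value)


def hash_aggregate(rows: Iterable[Row]) -> Dict[str, int]:
    """Sum values per key using an in-memory hash map; input order does not matter."""
    sums: Dict[str, int] = {}
    for key, value in rows:
        # Update this key's running aggregate in place.
        sums[key] = sums.get(key, 0) + value
    return sums


def sort_aggregate(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    """Sum values per key by sorting first, then streaming over runs of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    result: List[Tuple[str, int]] = []
    # After sorting, each group is contiguous, so a group can be emitted as soon
    # as the key changes; the pass holds only one group's running state at a time.
    for key, group in groupby(ordered, key=itemgetter(0)):
        result.append((key, sum(value for _, value in group)))
    return result


if __name__ == "__main__":
    data = [("b", 2), ("a", 1), ("b", 3), ("c", 7), ("a", 4)]
    print(hash_aggregate(data))  # {'b': 5, 'a': 5, 'c': 7}
    print(sort_aggregate(data))  # [('a', 5), ('b', 5), ('c', 7)]
```

Both produce the same totals; the trade-off is that hash aggregation skips the O(n log n) sort but needs memory for every distinct key at once, while sort aggregation pays for the sort and then needs only one group's state during the aggregation pass.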


Published in Dev Genius · Oct 24

Spark Threat Modelling

Security considerations for a Spark cluster & its components
The Spark ecosystem is an integral part of most analytics workloads at big companies. Although it is generally deployed in private network zones, it still needs security hardening to avoid data breaches. …

Apache Spark

4 min read


Published in Dev Genius · Oct 24

Spark Structured Streaming

Process streaming data using the Spark DataFrame API
Stream processing can mean different things to different people. Some look at it through the lens of real-time vs. scheduled processing, associating it with latency. Others link it to the continuous, ongoing processing of records. While a majority of people link streaming to…

Apache Spark

7 min read


Published in Dev Genius · Oct 24

Spark Interview Questions — IX

Next blog in the Spark interview series.
Part I | Part II | Part III | Part IV | Part V | Part VI | Part VII | Part VIII | Part IX | Part X
What are the different types of window operations available in Spark Streaming? There are four different types of window operations available in Spark Streaming: 1. Tumbling…

Apache Spark

5 min read
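As a rough illustration of the first window type named above, here is a plain-Python sketch of tumbling-window assignment: fixed-size, non-overlapping windows aligned to time 0, so every event falls into exactly one window. The 10-second size and the event timestamps are made-up values. In PySpark the same grouping is usually expressed with `pyspark.sql.functions.window` over an event-time column, giving only a window duration (no slide) for tumbling windows.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tumbling_window(ts: int, size: int) -> Tuple[int, int]:
    """Return the [start, end) window, aligned to 0, that contains timestamp ts."""
    start = ts - (ts % size)
    return start, start + size


def count_per_window(timestamps: Iterable[int], size: int) -> List[Tuple[Tuple[int, int], int]]:
    """Count events per tumbling window; each event is assigned to exactly one window."""
    counts = Counter(tumbling_window(ts, size) for ts in timestamps)
    return sorted(counts.items())


if __name__ == "__main__":
    # Hypothetical event times, in seconds.
    events = [1, 4, 9, 10, 12, 25, 29]
    for window, n in count_per_window(events, size=10):
        print(window, n)
    # (0, 10) 3
    # (10, 20) 2
    # (20, 30) 2
```

Because windows are half-open, an event at exactly t = 10 starts the second window rather than closing the first.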


Published in Dev Genius · Oct 16

Spark Interview Questions — VIII

Another part of the Spark interview series.
Part I | Part II | Part III | Part IV | Part V | Part VI | Part VII | Part VIII | Part IX | Part X
What is the difference between select and selectExpr in Spark? selectExpr() is a powerful method for column selection and transformation when you need to…

Apache Spark

6 min read


Published in Dev Genius · Oct 8

Spark Interview Questions — VII

The next part of the series.
Part I | Part II | Part III | Part IV | Part V | Part VI | Part VII | Part VIII | Part IX | Part X
What happens when we give join hints on both sides of a join? When hints are specified on both sides of the join, Spark selects the hint…

Spark

4 min read

Staff Data Engineer @ Visa — Writes about Cloud | Big Data | ML
