Sujith Jay Nair Thinking Aloud

Understanding Apache Spark on YARN

Introduction

Apache Spark is a lot to digest; running it on YARN even more so. This article is an introductory reference to understanding Apache Spark on YARN. Since our data platform at Logistimo runs on this infrastructure, it is imperative you (my fellow engineer) have an understanding about it before you can contribute to it. This article assumes basic familiarity with Apache Spark concepts, and will not linger on discussing them.

.. Read More

Shuffle Hash and Sort Merge Joins in Apache Spark

Introduction

This post is the second in my series on Joins in Apache Spark SQL. The first part explored Broadcast Hash Join; this post will focus on Shuffle Hash Join & Sort Merge Join.

.. Read More

How Conversations on StackOverflow Teach You

image-title-here

Note: This post has some concepts on Scala collections. Do not worry if you have little interest in Scala; the point I am trying to convey has significance beyond my choice of language. This is an exhortation to the engineering community at large to share our learnings more.

.. Read More