Sql Articles
Prefer Unions over Or in Spark Joins
11 Oct 2019 •APACHE-SPARK
SQL
JOINS
A common anti-pattern in Spark workloads is the use of an or
operator as part of ajoin
. An example of this goes as follows:
val resultDF = dataframe
.join(anotherDF, $"cID" === $"customerID" || $"cID" === $"contactID",
"left")
This looks straight-forward. The use of an or
within the join makes its semantics easy to understand. However, we should be aware of the pitfalls of such an approach.
The declarative SQL above is resolved within Spark into a physical plan which determines how this particular query gets executed. To view the query plan for the computation, we could do:
resultDF.explain()
/* pass true if you are interested in the logical plan of the query as well */
resultDF.explain(true)
Shuffle Hash and Sort Merge Joins in Apache Spark
28 Jun 2018 •APACHE-SPARK
SQL
JOINS
Introduction
This post is the second in my series on Joins in Apache Spark SQL. The first part explored Broadcast Hash Join; this post will focus on Shuffle Hash Join & Sort Merge Join.
.. Read MoreBroadcast Hash Joins in Apache Spark
17 Feb 2018 •APACHE-SPARK
SQL
JOINS
Introduction
This post is part of my series on Joins in Apache Spark SQL. Joins are amongst the most computationally expensive operations in Spark SQL. As a distributed SQL engine, Spark SQL implements a host of strategies to tackle the common use-cases around joins.
In this post, we will delve deep and acquaint ourselves better with the most performant of the join strategies, Broadcast Hash Join.
.. Read MoreBehaviour of ORDER BY in FROM: MariaDB vs MySQL
01 Dec 2017 •SQL
MARIADB
MYSQL
DATABASES
TL; DR
In MariaDB, query with ORDER BY
in a FROM
subquery produces an unordered result. In effect, ORDER BY
is ignored in FROM
subqueries. MySQL does not ignore ORDER BY
in FROM
subqueries.