Sujith Jay Nair Thinking Aloud

Natural Languages are Interfaceless

In The Design of Everyday Things, Donald Norman talks about the temperature knobs on his refrigerator:

I used to own an ordinary, two-compartment refrigerator - nothing very fancy about it. The problem was that I couldn’t set the temperature properly. There were only two things to do: adjust the temperature of the freezer compartment and adjust the temperature of the fresh food compartment. And there were two controls, one labeled “freezer”, the other “refrigerator”. What’s the problem? Oh, perhaps I’d better warn you. The two controls are not independent. The freezer control also affects the fresh food temperature, and the fresh food control also affects the freezer.
In fact, there is only one thermostat and only one cooling mechanism. One control adjusts the thermostat setting, the other the relative proportion of cold air sent to each of the two compartments of the refrigerator. It’s not hard to imagine why this would be a good design for a cheap fridge: it requires only one cooling mechanism and only one thermostat. Resources are saved by not duplicating components - at the cost of confused customers.

Norman is talking about the lack of a (good) interface here: a layer to translate (and hide) the structure of the underlying mechanism for the users of the mechanism. 1 The need to translate for the user arises in two scenarios:

  1. There is a divide between what the user wants and how the mechanism is structured. I like to call it the what-how divide. 2
  2. Although the mechanism & the user’s want are aligned, the mechanism is too convoluted for the user to use in a direct way. A facilitator is needed.

In both cases, a translation is needed, and the translator is termed an interface.

Languages are Interfaceless

(Inter)Faceless a.k.a No-Face


(Natural) Languages are the quintessential human way of communication. Our advanced languages are arguably the lone differentiators of our species from our cousins in the primate family, and the larger animal kingdom. 3

We have been inventing, honing, assimilating, and discarding languages since the start of our existence as a species. But we do not develop languages with the intent that they be translated; their inventors do not mean for them to be. Every language is developed as if it were the only language in existence and everyone else understood it.


Innovation Loops

The purpose of an engineering organization (at the risk of sounding frivolously reductionist) is to build business value. You can grow an organization’s delivered business value over time by: 1

  • training members: investing in people,
  • improving process: investing in shaping behaviour and communication,
  • staking technical leverage: investing in technology.

A cumulative side-effect of these approaches is to strengthen innovation loops.

Innovation Loops

Innovation loops are informal, intrapreneurial feedback loops in engineering teams that build products & features to address user demand & pain. This is innovation that circumvents the software development cycle involving product & market research teams. In mature teams, innovation loops complement & reinforce the existing, evolutionary product development feedback cycle. I call product development evolutionary, in contrast to the more revolutionary (or reactive) trait of innovation loops.

Regular product development as green arrows; Innovation loops as red squiggles.


Innovation loops are more prevalent in infrastructure teams than in product-focused teams. This could be partly explained by the availability of direct communication channels to users which infrastructure teams possess, and product-focused teams do not.


Filling Missing Data

I recently upgraded Apache Spark from v2.4.3 to v2.4.5 for some workloads, which surfaced a number of run-time errors of the form:

org.apache.spark.sql.AnalysisException: Cannot resolve column name "name" among (id, place);
  at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:223)
  at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:223)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.Dataset.resolve(Dataset.scala:222)
  at org.apache.spark.sql.Dataset.col(Dataset.scala:1274)
  at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$toAttributes$2.apply(DataFrameNaFunctions.scala:475)

A little poking-around showed this error occurred for transformations with a similar general shape. The following is a minimal example to recreate it:

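// Assumes a spark-shell session; in a standalone app, `import spark.implicits._` first so that toDF is available.
val df = Seq(
  ("1", "Berlin"),
  ("2", "Bombay")
).toDF("id", "place")

// "name" is not a column of df.
df.na.fill("empty", Seq("id", "place", "name"))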

This looks wrong, but apparently works fine in v2.4.3 😲. A transformation which attempts to fill in a missing value for a column which does not exist should raise an error: v2.4.5 does that.
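For a workload that relied on the older, lenient behaviour, one possible workaround is to pass `na.fill` only the columns that exist. The sketch below is an illustration under assumptions, not the fix described in the full post: `presentColumns` is a hypothetical helper rather than a Spark API, and a plain `Seq` stands in for `df.columns` so the snippet runs without a Spark session.

```scala
// Hypothetical helper (not part of Spark): keep only the requested column
// names that are actually present, preserving the requested order.
def presentColumns(requested: Seq[String], available: Seq[String]): Seq[String] =
  requested.filter(c => available.contains(c))

val requested = Seq("id", "place", "name")
val available = Seq("id", "place") // stands in for df.columns.toSeq
val fillable = presentColumns(requested, available)

// With the DataFrame from the example above, the call would become:
//   df.na.fill("empty", presentColumns(requested, df.columns.toSeq))
```

The helper matches names exactly, whereas Spark resolves column names case-insensitively by default, so mixed-case column lists may need normalising first. Silently dropping a column can also hide a genuine typo, so failing loudly, as v2.4.5 does, is often the safer default.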


A Conversation with Software Engineering Daily

Nubank Data Engineering with Sujith Nair

I was recently on the Software Engineering Daily podcast to talk about Data Engineering at Nubank.

It turned out to be a great conversation on functional data engineering, the importance of testability & reproducibility in data engineering (and our approach to achieving it at scale at Nubank), thinking of dataset quality in terms of dataset-as-a-service, and my take on the history of data engineering as a rediscovery of the table abstraction. Check it out here.

Why Are Computer Storage Units Called 'Memory'?
