How could we fix programming?

The state of things

“Programming is amazing”, I keep telling people. You can think of anything, build it and poof! it’s real. Unlike physical things where building something many times requires first building an entire factory, software scales up easily. You can find prebuilt components for just about anything, for free, instantly. Build one piece of code, reuse it a million times at no cost. It’s magical.

Except it’s not. Programming is only amazing after a while. Once you’ve built up enough resilience to the constant flow of confusing errors hitting you while you write code. Once you’ve gotten used to the quirks of the specific language and framework you’re using. Once you’ve acquired a second-nature ability to see through the unnatural syntax of whichever language you’re using.

Did I mention there are also a ton of different programming languages out there? Every time a developer gets frustrated with the syntax of a particular language, they think “what if we created a new language to fix this?” Some end up doing it for real, and fortunately natural selection eliminates most of the bad ones (it doesn’t always work, that’s why we now have PHP). Once a new language starts spreading to a bunch of other developers, boom! you now have a separate community of developers who will collaborate on making this particular language as great as possible.

We’ve tremendously benefitted from the innovations that each new language brings to the table. But it’s also a bit of a curse. Someone may have written a very useful open-source library in JavaScript, but a developer working on a Python codebase will be entirely unable to use it. They will have to either write their own version of the library in Python, or rewrite their entire codebase in JavaScript. Multiply this by the number of languages (and frameworks) available to date. If you don’t find this ridiculous, you’ve probably been in the industry for too long.

What makes a language anyway?

Languages fundamentally differ on three aspects:

  1. Syntax
  2. Semantics
  3. Standard library

Yes, I know, I’m oversimplifying. But hear me out anyway.

1. Syntax

You’ll get squiggly red lines in your editor and the compiler/interpreter will fail to parse your code if you don’t respect the syntax.

JavaScript has curly braces, booleans are lower-case true and false, and line comments start with //:

function doSomething() {
  const a = true;
  if (a) {
    ... // Do something.
  }
}

Python has indented blocks, booleans are capitalized True and False, and line comments start with #:

def doSomething():
  a = True
  if a:
    ... # Do something.

Haskell has yet another syntax:

doSomething :: IO ()
doSomething = do
  let a = True
  if a
    then ... -- Do something.
    else return ()

These three examples are equivalent despite being syntactically different.

2. Semantics

All (real) languages share the most basic features: you can assign variables, add up numbers, manipulate strings of characters, call functions, etc.

However, each language follows a specific philosophy and works in a certain way. They can be roughly classified into programming paradigms (imperative, object-oriented, functional), but there are many details differentiating two languages within the same paradigm.

When you “declare a class”, “call a function” or “define the type of an argument”, you define the semantics of the program. Some things are allowed by some languages but not by others (C++ allows you to declare a class that extends multiple classes, Go lets you launch “goroutines”). When you use the “+” sign to add a number to a string, the result depends on the semantics of the language. Some will refuse to compile because the types don’t match, others will gladly comply and automatically convert the number to its string representation in base 10.
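To make the “+” example a bit more concrete, here is a rough sketch of how Python, one of the stricter languages on this point, behaves. Python has no separate compile step, so the refusal shows up as a TypeError at runtime rather than as a compile error. The helper name describe_addition is made up for this illustration.

```python
def describe_addition(left, right):
    """Try `left + right` and report what the language's semantics decided."""
    try:
        return repr(left + right)
    except TypeError as error:
        # Python refuses to guess how a str and an int should be combined.
        return f"TypeError: {error}"


print(describe_addition(1, 2))         # two numbers are added
print(describe_addition("1", "2"))     # two strings are concatenated
print(describe_addition("1", 2))       # mixed types are rejected
print(describe_addition("1", str(2)))  # an explicit conversion is required
```

JavaScript would happily evaluate the third case by converting the number to a string, which is exactly the kind of semantic difference that the syntax alone doesn’t reveal.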

Semantics are to syntax what ideas are to words. You convey semantics by using the syntax of the language.

3. Standard library

Lastly, each language comes with its own baggage, called the “standard library”.

In Python, you automatically have access to functions such as:

- print() to print a message in the console
- len() to know the size of a list
- and various useful modules such as json, threading, etc.
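As a quick illustration, here is a tiny snippet that uses only what ships with Python. The data in it is made up; the point is just the difference between built-in functions (always there) and standard-library modules (shipped with the language, but imported explicitly).

```python
import json

languages = ["JavaScript", "Python", "Haskell"]

# Built-in functions: available without importing anything.
count = len(languages)
print(f"{count} languages")

# Standard-library module: ships with Python, but has to be imported.
encoded = json.dumps({"languages": languages})
print(encoded)
```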

In JavaScript, you’d use console.log() instead of print() and automatically have access to classes such as Object, Array, and so forth.

The standard library is an essential part of a language. It’s what makes it come to life. Without it, you simply cannot do anything.

Ironically, there is no “standard standard library”. Every standard library is radically different from the next: some offer the bare minimum, others provide extensive functionality so that developers rarely need to rely on third-party libraries.

An idea

Now that we’ve gone through what a language is made of, I’d like to plant an idea into your mind: what if we had a way of clearly separating syntax, semantics and standard library? What could we achieve then?

Step 1: Make syntax a human-only concern

Syntax is the first one that I’d like to get out of the way. Compilers and interpreters already have a much more efficient way of representing code called an Abstract Syntax Tree (AST for short). Ultimately what we describe with code is an AST like below:

[Figure: Abstract syntax tree for the Euclidean algorithm (Wikipedia)]

If you look at it, the above syntax tree could be taken from a number of languages. Is it Python? Is it JavaScript? Or maybe C++? It doesn’t matter: it’s the exact same tree.
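If you’d like to see such a tree for yourself, Python’s standard ast module can print one. The snippet below is my own rendition of a subtraction-based Euclidean loop, so it is not necessarily identical to the code behind the figure; treat it as a sketch.

```python
import ast

source = """
while b != 0:
    if a > b:
        a = a - b
    else:
        b = b - a
"""

tree = ast.parse(source)

# Print the tree structure (the indent argument needs Python 3.9+).
print(ast.dump(tree, indent=2))

# List the kinds of nodes in the tree; indentation and colons are gone.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(sorted(set(node_types)))
```

The node names (While, If, Compare, BinOp and so on) are Python’s own, but the shape of the tree is what a parser for another language with similar semantics would produce too.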

Of course, a real-life example would be a lot more complex. There’s a reason why we humans write code in text: it’s much more compact, easier to write and (arguably) easier to read. It’s also the way we’ve been doing it since the inception of programming, and it hasn’t really been questioned much.

Once you start looking at a more realistic case, the AST will start describing semantics that are specific to the language you’re working with (e.g. class definitions). But you could still share the same AST between languages that have similar semantics, to a certain extent. This would already be quite useful as you would be able to convert parts of your code automatically.

So, I’d like us to think of syntax as a human-only layer on top of the AST. Code should maybe not even be stored or represented as text anywhere but in your text editor. And if you wanted to use a different syntax for the particular language you’re using, so be it. That should not affect anyone else.
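Here is a small sketch of that idea, staying within a single language. Two differently formatted spellings of the same function parse to the same tree, and the tree can be rendered back into text on demand. The function double is made up for the example, and ast.unparse needs Python 3.9 or later.

```python
import ast

# Two spellings of the same function: different whitespace, a comment,
# redundant parentheses and an extra line break.
version_a = """
def double(x):
    return x * 2
"""

version_b = """
def double( x ):   # a comment the parser discards
    return (x *
            2)
"""

tree_a = ast.parse(version_a)
tree_b = ast.parse(version_b)

# ast.dump leaves out line and column positions by default,
# so only the structure of the two trees is compared.
same_tree = ast.dump(tree_a) == ast.dump(tree_b)
print(same_tree)

# Render the tree back into one canonical text form.
print(ast.unparse(tree_b))
```

This only shows that text is one possible rendering of the tree. Rendering the same tree in a different language’s syntax is the harder part, and it runs into the semantics and standard library differences described earlier.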

I’m actually surprised that there isn’t already a tool to convert code from any language to any other language, when possible. I’m guessing people tried and gave up because it wasn’t all that useful without also covering standard libraries. I’m also trying, obviously.

Step 2: Abstract standard libraries into APIs

APIs are a neat concept. Every piece of software communicates with other software via APIs. A mobile app talks to a server via an API. A server talks to a database via an API. Everyone talks to each other via an API. It’s the cool thing to do.

Why am I talking about APIs now? Because they’re exactly what we need. APIs are language-agnostic. They are a very simple set of semantics to describe the functionality exposed by a particular code module to the rest of the world (whether it’s a library, an HTTP server or something else).

There are many different ways of declaring an API. It may be a JavaScript module on NPM with its API documented in the README file. It may be declared explicitly in code, such as with a TypeScript module. It may not be declared at all and not clearly documented anywhere.

But what matters is: an API declares the “outside surface” of a code module. You can switch out the insides of the module, rewrite it in another language, the API will not change. This is the beauty of APIs. Programming languages are fucked-up, but APIs are cool.

We’ve talked before about standard libraries and how they differ radically from language to language. If we somehow manage to abstract the complexity of the standard library behind a clean API, we can fix this. Semantically, calling print("Hello") in Python is considered different from calling System.out.println("Hello") in Java. However, they’re actually the same API.
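One way to picture “the same API” is a table that maps a language-neutral operation onto each language’s spelling of it. The sketch below is purely illustrative: the operation name console.write_line, the CALL_TEMPLATES table and the render_call helper are all assumptions of mine, not an existing tool.

```python
# Hypothetical abstract operation names mapped to each language's spelling.
# The operation name and this table are assumptions made for illustration.
CALL_TEMPLATES = {
    "console.write_line": {
        "python": "print({arg})",
        "java": "System.out.println({arg});",
        "javascript": "console.log({arg});",
    },
}


def render_call(operation: str, language: str, arg: str) -> str:
    """Render an abstract API call as source text for one target language."""
    return CALL_TEMPLATES[operation][language].format(arg=arg)


for language in ("python", "java", "javascript"):
    print(render_call("console.write_line", language, '"Hello"'))
```

A real mapping would have to deal with argument types, return values and error behaviour, which is where most of the difficulty lives.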

There are two ways to go about it. We could either ask humans to stop using the standard library and use our “API layer” instead. Or we could let the computer automatically infer what the code you depend on does. I’m a bit pessimistic on the whole “convince humans to change their ways” topic, so I’ll vote for the latter.

We don’t need to document the API of every single function exposed in the standard library of a language. In any given file, you probably only use a handful of them. We could convert code from one language to another almost automatically. We’d just need a bit of assistance from a human, who will know best what to replace a specific standard library call with. Don’t worry, the computer will still need you. For now.
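To get a feel for how small that handful usually is, here is a rough sketch that lists the Python built-ins a piece of code calls directly. The builtin_calls helper and the sample source are made up for this example. It ignores method calls, imported modules and names that shadow a built-in, so treat it as a starting point only.

```python
import ast
import builtins

source = """
names = ["Ada", "Grace"]
print(len(names))
print(sorted(names))
"""


def builtin_calls(code: str) -> set[str]:
    """Return the names of Python built-ins that `code` calls directly."""
    used = set()
    for node in ast.walk(ast.parse(code)):
        # Only plain calls like print(...) are handled; method calls and
        # imported modules are out of scope for this sketch.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if hasattr(builtins, node.func.id):
                used.add(node.func.id)
    return used


print(sorted(builtin_calls(source)))
```

Each name in that list is one standard library call a human (or a mapping table) would need to translate for the target language.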

Step 3: Everything is an API

So, we now have clean code modules defined purely semantically, and interactions with the standard library are abstracted into APIs.

What’s next? Make it an API.

Codebases nowadays are made of multiple files that generally reference each other via “import statements”. It’s very convenient for us humans, but it also means that we need to maintain a structured map of the codebase in our heads. A small change somewhere can have a devastating unintended effect somewhere else, if you haven’t invested enough effort in automated testing. Also, your codebase will grow and compile times will slowly increase.

https://xkcd.com/303/

Again, there’s probably a better way. Modularity is the way.

I’ve expanded on modularity previously, but basically: every self-contained piece of code should be neatly abstracted away behind an API. I call these modules. You shouldn’t need to even care what language a particular module was written in. When you write a module, you don’t import other files. Actually, files are no longer a thing at this point. You just use APIs that are automagically (somehow) available to you.
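Within a single language, the closest thing we have today is programming against an interface. Here is a minimal sketch using Python’s typing.Protocol; the Greeter API and both implementations are invented for the example. It all runs in one Python process, so it doesn’t show the cross-language part, only the “callers know the API and nothing else” part.

```python
from typing import Protocol


class Greeter(Protocol):
    """The API: the only thing callers are allowed to know about."""

    def greet(self, name: str) -> str: ...


class EnglishGreeter:
    """One possible implementation; it could be swapped out or rewritten."""

    def greet(self, name: str) -> str:
        return f"Hello, {name}!"


class FrenchGreeter:
    """Another implementation behind the same API."""

    def greet(self, name: str) -> str:
        return f"Bonjour, {name}!"


def welcome(greeter: Greeter, name: str) -> str:
    # Depends on the API only, never on a concrete implementation.
    return greeter.greet(name).upper()


print(welcome(EnglishGreeter(), "Ada"))
print(welcome(FrenchGreeter(), "Ada"))
```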

Why are modules good? Because:

- they encourage design thinking: you need to design the API first
- they reduce cognitive costs: you just have to “fill in the blanks”
- testing becomes easy: you’re just testing an API, you can mock out any dependencies
- the world will become a better place: no more language barriers, no more monolithic codebases, happier developers, happier customers
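The testing point can be sketched with Python’s built-in unittest.mock. The send_welcome function and the mailer API it relies on are hypothetical; the idea is simply that a module which only knows its dependency’s API can be tested with a stand-in.

```python
from unittest.mock import Mock


def send_welcome(mailer, address: str) -> bool:
    """Module under test: it only knows the mailer's API, not its insides."""
    return mailer.send(to=address, subject="Welcome!")


# Stand-in for the real dependency; no network or mail server involved.
fake_mailer = Mock()
fake_mailer.send.return_value = True

result = send_welcome(fake_mailer, "ada@example.com")

# Check both the result and how the dependency's API was used.
assert result is True
fake_mailer.send.assert_called_once_with(to="ada@example.com", subject="Welcome!")
print("ok")
```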

Step 4: Enjoy

I’m not sure what comes after step 3. I think we’ll all be quite satisfied then.


Thanks for reading this far!

I’m keen to hear people’s thoughts, so shoot me an email at f@zenc.io or write a comment below. If you’re in Sydney, I’m always up for coffee.

Shares/claps/following always appreciated. Have a great day!
