PicoBlog

The Unnecessary Obscurity of Assembly Language

Over Easter I’ll be sharing some shorter posts rather than a longer read on Sunday. This first post looks at the readability of assembly language.

Even in 2023, if we want to make the most of our hardware, then it’s still sometimes necessary to dip into assembly language. The most recent work on getting machine learning applications running on desktop PCs (for example the amazing port of OpenAI’s Whisper to C++ by Georgi Gerganov) have made extensive use of vector instructions and I’m sure debugging these has involved some reading of assembly language output. But reading assembly is really quite hard. Does it need to be so?

The recent article ‘Only Wizards Used Assembly Language’ by

at got me thinking about the role that assembly languages have played over the years.

Wikipedia describes assembly language as:

any low-level programming language with a very strong correspondence between the instructions in the language and the architecture's machine code instructions.

Paul describes how he regarded assembly language as being for ‘Wizards’ in the 1980s. It’s certainly true that most people just used BASIC to program their home computers. I think, though, that there was probably a lot more assembly language programming going on in general than is the case today. Today we have high quality C / C++ / Rust compilers, which would be used where assembly would have been used in the era of 8-bit systems.

To try to test out my hypothesis here is a poll to see how the ‘Chip Letter’ readership splits on this issue. Please vote for an assembly language even if you've used (or read it) a little, at any time in the past.

Loading...

There is no doubt that it’s really hard to build software in assembly. Building complex programs things out of small, simple building blocks is undoubtedly much more challenging than using higher level abstractions.

But even if we don’t often need to write assembly these days, it’s sometimes the case that we need to read it. The popularity of the website Compiler Explorer, which shows the assembly language generated by your program in C / C++ (or lots of other languages), indicates that there is often a real need to understand what your high level program is doing at the assembly language level.

I’ve used Compiler Explorer myself when using Intel’s AVX512 vector instructions.

Given this, assembly language is really hard to read. Here is some sample output from Compiler Explorer. My eyes have usually glazed over by the third ‘movq’.

Let’s just take a simple operation like copying from one register to another. x86, Intel format, has:

mov ebx, eax

The use of a three letter abbreviation (TLA) may have made sense when a few KB on file sizes was a key issue, but maybe not in 2023?

Then of course there is AT&T syntax which reverses the order of the registers and adds a couple of unnecessary ‘%’s.

mov %eax, %ebx

And let’s not get into constructions like:

mov (%esi,%ebx,4), %edx

When we get into more complex instructions then the naming gets even more confusing. 64-bit Arm has the infamous Seven Letter Abbreviation:

FJCVTZS W0, D0

or ‘Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero.’ I’m not criticising the need for this instruction, but does ‘FJCVTZS’ really tell you what this instruction is doing?

Intel’s AVX512 has lots and lots of examples like this (eg ‘VCVTTPD2UDQ’ with an Eleven Letter Abbreviation!) and lots and lots of them look really similar.

I could go on, but I’m sure the message is clear: assembly language is really hard to read for the occasional reader. Does it need to be so?

In the earliest days of computing, before the invention of high level languages, assembly language was the only way that you could program the computer.

The first use of what we would today call assembly language appears in the paper ‘Coding for A.R.C’ by Andrew Booth and Kathleen Britten, written as early as 1947 (although a version of the notation appears in an earlier paper). The paper and the background to it are really interesting in many ways, but for this post I’d like to focus on the form of the assembly language it uses. Here is one example:

So taking the simplest example, the assembly language to copy A to R is just ‘A to R’.

This seems a lot more sensible to me than ‘MOV %R, %A’. Booth and Britten actually wanted their assembly language to be readable!

It really feels to me like assembly language took a wrong turn at some point. For a language that is mostly read rather than written, then it’s really not readable at all. Even the first ever assembly language was more readable than the ones we use today.

Can and should we have made assembly language more readable? We don’t write our high level languages in Three Letter Abbreviations or Seven Letter Abbreviations or even Eleven Letter Abbreviations, so why do we still do so for Assembly Language? Why has there been no attempt to get some consistency between architectures? Do let me know what you think in the comments.

ncG1vNJzZmismJqwqbXPpZytrJWne7TBwayrmpubY7CwuY6pZq2glWLCr7rEnJysq5Gnxm67wayarqqZqcZuu8VmmKyrlaKvrcU%3D

Lynna Burgamy

Update: 2024-12-03