18

To further riff on the topic of educational codebases, here is a collection of projects that either have, to some degree, already been designed with an educational purpose in mind or are otherwise suitable candidates for study.

Starting with a project that is very close to the original definition of "designed to be read and studied": Code Catalog is a collection of instructive, non-trivial, annotated code examples. They originate from open-source projects, solve general problems, can be understood with little knowledge of the surrounding context, and can be read in one sitting. What is interesting about this approach is that the examples are not contrived. They were explicitly researched, and the educational part was worked out after the fact. I think this model is a very good way to grow the number of educational codebases.

Staying at the same order of magnitude (snippets), but collected primarily for being short and useful: 30 seconds of code (see also the repo) provides a bunch of small code samples with accompanying explanatory blog posts.

Peter Norvig has gifted the world a lot of interesting code, and his Pytudes are certainly worth perusing too. Then again, as the name suggests, an etude is usually intended to be practiced (over and over and over), not listened to.

But metaphors carry only so far. When we leave the territory of small, well-defined problems, snippets and algorithms, and turn to whole classes of software systems, we find a few quite outstanding codebases: operating systems, virtual machines, compilers/interpreters, editors, browsers. All of them might be intimidating. All of them could fill semesters' worth of academic coursework and person-centuries of actual implementation time. But let's put all that aside: their fundamental ideas can be understood. A few brave souls have taken on the proof by building implementations that are far from mere toys, even if they are maybe not a choice for commercial products.

One example that I briefly encountered in my own studies is Xv6, a Unix-like educational OS that is heavily inspired by the Lions' Commentary on UNIX 6th Edition (which itself should probably go into a Great books curriculum for software engineering).

Then there is Project Oberon, by the late Niklaus Wirth. Somebody once called Donald Knuth the Patron Saint of Yak Shaves, and while that choice undeniably has merit, Professor Wirth would be my personal candidate for that patronage. To justify my claim, let me quote the design goals of the Oberon system:

Project Oberon is a design for a complete desktop computer system from scratch. Its simplicity and clarity enables a single person to know and implement the whole system, while still providing enough power to make it useful and usable in a production environment.

A more recent project is the Jacobin JVM implementation:

Jacobin is an implementation of the JVM specification for Java 17. It is written entirely in Go with no dependencies.

The goal is to provide a more-than-minimal implementation of the JVM that can run most class files and JARs and deliver the same results as the OpenJDK-based JVMs (that is, the majority of JVM implementations today). A paramount consideration in the design and implementation of Jacobin is the codebase: making it cohesive and containing clear code. The cohesiveness, extensive commenting, and large test suite enable professionals who want to know more about how the JVM works to find the information quickly and in an easily accessible setting. Additional information on the Jacobin wiki provides more background and insight. Because Jacobin is strictly a JVM, its code is tightly focused on Java program execution.

I (like a million or so other people) get paid to write Java, but to me (as to the vast majority of those other people) the JVM is still mostly terra incognita. This project therefore makes a worthwhile subject of study for a very large, mainstream group of software engineers.

Speaking of VMs: when my Java-only colleagues get too condescending about JavaScript, I always snarkily reply that client-side JS delivered what the JVM only promised: write once, run anywhere. Partly for that reason, web browsers are an interesting topic. They combine many areas of computer science (networking, computer graphics, layout algorithms, security, parsing and interpreting multiple programming languages) and are so ubiquitous that they ought to be better understood. Pavel Panchekha and Chris Harrelson have published Web Browser Engineering, an online book that explains how to build a basic but complete web browser in a couple of thousand lines of Python.

Salvatore Sanfilippo, best known as the creator of Redis, has written kilo, a text editor in ~1000 SLOC of C without any dependencies. Paige Ruten has turned that code (with slight adaptations) into a 184-step tutorial/walk-through/booklet called Build Your Own Text Editor.

Lastly, Mary Rose Cook has reimplemented a core subset of git in JavaScript and published it as Gitlet. The project comes with the code, a heavily annotated version of it, a 6000-word essay about it, and then some.

I have some more candidates in my bookmarks, but this sample of codebases suitable for educational purposes at least indicates that the format is more than just a nice idea that doesn't scale beyond small amounts of code.

Saturday, the 6th of July 2024
