2022 picks: software projects to keep an eye on

Jan 19, 2023

These software projects and technologies caught my attention and excitement during 2022, though they may not have necessarily appeared that year. My selection and focus have a clear bias as I have spent most of my life developing open-source, data center, and e-commerce infrastructure software on UNIX-like systems. I only included projects I have tried myself.

I admire software that solves complex problems in a simple, elegant, and lean manner and those that are easily adopted and standardized as the “default”.

Tree-sitter

Description from its website:

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.

Why is this important? This comment summarizes it well:

Incremental parsing of incorrect code is one of those things that is literally impossible in the general case, but tree-sitter has found a lot of good ways to do it that are not just possible for a large fraction of reality, but also performant. It’s hard to understate how impressive a piece of engineering this is.

I see this technology having an impact on IDEs, editors, linters and other tools similar to what the LLVM project did years ago for the compiler and interpreter ecosystem, and what Language Server Protocol did for IDEs during the last years.

For example, the Emacs editor adopted LSP by including eglot by default.

Originally, you could replace the limited regexp-based syntax highlighting in Emacs with the emacs-tree-sitter modes. This is no longer necessary, as from version 29+, Tree-sitter support is part of Emacs by default.

Wireguard

Description from its website:

WireGuard® is an extremely simple yet fast and modern VPN that utilizes state-of-the-art cryptography.

Wireguard’s simplicity:

implemented in ~5000 lines of code, when most VPN solutions range from tens of thousands to hundreds of thousands
works at the interface level, which means you can treat it like any other interface
the state is hidden from the user, so things like roaming just work
It is incorporated in most open-source operating systems. There is a Windows native version and a multi-platform userspace version written in Go.
- NetworkManager, which most people use to manage networks in desktop Linux, has native support for it, and GNOME even displays the toggle for Wireguard connections.
Android/iOS app
Most commercial VPN providers support it, including the only one that is worth your time.

Some higher-level solutions have been built on top of WireGuard. The most impressive is Tailscale, which brings magical usability to access private networks spread across the world.

And last but not least, Fritzboxes, one of the most popular consumer routers in Germany, supports Wireguard natively, since December 2022, which means I now can access my home LAN very easily and from almost any device.

Litestream and liteFS

I believe many of the complicated app architectures today are either too early or just unnecessary.

SQLite is a local database engine that operates on a single file per database. It is the most deployed database in the world, and it is likely running in your pocket inside your phone on many different apps.

Many businesses could start in a single machine using a SQLite database.

Litestream is a project from Ben Johnson that replicates sqlite3 databases to make sqlite globally distributed. The replication part was extracted into LiteFS, while Litestream kept the disaster recovery replication.

With this model, LiteFS uses FUSE (Filesystem in userspace) as a pass-through filesystem to intercept writes to the database to detect transaction boundaries and replicate those in the replica nodes.

Litestream allows replicating databases by continuously copying write-ahead log pages to cloud storage.

An alternative implementation of transaction replication using SQLite built-in VFS is also planned.

The project was since then acquired by Fly.io, which specializes in deploying apps close to the users.

Both projects give SQLite superpowers and allow for resilient and performant applications while keeping the setup and architecture lean and simple.

During the Twitter exodus to Mastodon, I saw people dealing with the complexity and resource requirements of operating Mastodon for a single user. My Fediverse instance is not Mastodon, but gotosocial. Uses 128M ram, a 140M SQLite database, and runs on a 5€ micro VM. The database is replicated to an sftp share with Litestream.

Nix

Nix is a tool for producing reproducible builds and deployments. It takes a different approach to package management using a declarative and functional build description.

When you build something with Nix, it ends in its own directory in the Nix store e.g. /nix/store/hxxrbmr2zh6ph90qi8b4n2m53yvan3fr-curl-7.85.0/ and as long as the inputs do not change, the location, which is content-addressed, will not change either. They will also depend on the exact versions they were built against.

This allows you the installation of multiple versions in parallel, and the current system profile itself is a collection of symbolic links to the right binaries, which means you can roll back very easily.

While Nix can be used on Linux and macOS, there is a full Linux distribution built on this model.

While it can also be used for CI, building container images, etc., I use Nix in two ways:

Declare project dependencies

If I have e.g. a folder with some Ansible roles I use to configure my home gadgets, I can make that project independent from where I am running it by just having a top shell.nix declaring dependencies. Then a simple .envrc file with the line use_nix and direnv setup in my shell.

As soon as I cd into the directory, Ansible is installed and appears in the path. I cd out and it disappears. The nix store is cached, so the second time is very fast (until you nix store gc).

You can use this to have reproducible developer environments.

Nix Flakes is a new format to package Nix-based projects in a more discoverable, composable, consistent and reproducible way.

With Flakes, you could even pin your environment to a specific revision of the package descriptions.
Manage packages, including my own

Some packages I need all the time: Emacs, Chromium, tarsnap, etc. I use Nix for that, and keep my distribution just for the base system.

nix profile install nixpkgs#tarsnap and the package is now always available. I also have packages that are not free to distribute, so I can keep the recipe to build it in git, or just override a few compile options from another package. It is just flexible.

The language is a functional DSL that takes some curve to learn, just like the built-in functions. I am not sure if this will be someday the future of deployments, but for me as been agreat addition to those two use cases..

Stable Diffusion

StableDiffusion is an AI model which allows to:

transform text prompt into images
transform images plus a text prompt into new images
edit images by selecting an area and a prompt

Also impressive are the creations where StableDiffusion is used to change a single video frame, and another model is used to extrapolate the change to the rest of the frames, resulting in full video editing.

The Dreambooth model allows to finetune StableDiffusion for specific subjects. This is what the Lensa app does when generating many avatars from your selfies.

I believe this will have a huge impact on creative industries (design, gaming), and will make their software understand the semantics of the image, just like IDEs have been doing for years offering syntax-aware refactorings.

ChatGPT

I’d like to mention ChatGPT together with Copilot, but I haven’t tried Copilot yet.

These technologies are already proving to be very useful in the context of programming.

Leaving out the controversial topic of training proprietary models on GPL code for another occasion, I am impressed how good ChatGPT is to port code from one dimension to another, eg. rewriting using a different language, library, etc. I think it will become very useful for porting, refactoring and updating software.

For example, I was very pleased with ChatGPT being able to take some Linux commands, and generating me a set of Ansible tasks to replicate the configuration

Phoenix LiveView, hotwire and the return of the server-side HTML

Single-page applications (SPA) are with us for longer than I can remember, but the feeling something is not right in that model continues to live with me.

The architecture duplication on the server and client-side (controllers, views, stores), dividing teams through json messages in two worlds speaking different languages seems broken. The instability of the Javascript eco-system just makes things worse.

I can’t however, picture how to solve the challenges SPAs aim to solve when it comes to highlyy interactive applications.

Phoenix is a web framework for Elixir, a language running on the Erlang VM. His creator has a Rails background, so he took off from where Rails left and brought innovation to the space in the form of Phoenix LiveView, a technique that allows for highly interactive applications without abandoning the server side paradigm.

Other toolkits have appeared which allow to start server side and add interactivity in a structured way without abandoning the server side paradigm. One is HotWire from Basecamp, which includes Turbo and other libraries, and htmx, which works by just annotating HTML.

virtio-fs and krunvm

Something I always disliked about virtualization was the use of images. It added a whole layer of complexity.

virtio-fs is a filesystem that allows sharing the host filesystem with the guest. Unlike virtio-9p (the one used by Windows Subsystem for Linux), it has local semantics.

qemu has support for it, so you can boot a root filesystem.

One tool that takes advantage of virtio-fs is krunvm. It allows to run container images as micro virtual machines. The machines implement a few simple virtio devices enough to run an embedded kernel in libkrun.

krunvm takes virtio-fs to the next level, basically making it invisible, allowing you to mount any host folder into the virtual machine the same way that you do it with container images.

Follow the work Sergio Lopez is doing in this space.

These are my picks. What are yours?