Build Meetup - Jane Street London
Stepping into something different today for a Build Meetup hosted by Tweag, EngFlow and Jane Street at Jane Street’s London offices. I was quite involved with Jbuilder development and early work around Dune 1.0 and some early 2.x work, although it’s not a codebase I get to work on much these days. What was interesting for me, spending a lot of time in GNU make for the compiler, was to get some first-hand “big picture” experience from the talks and also a chance to catch up with various OCaml people who can be remarkably hard to pin down.
This is more a mish-mash of thoughts and memories from the day than anything else - talks were being recorded, so I may try to update some of the details with links to slides, but I don’t have them to hand at the moment.
There were six talks (and in fact a bonus one at the end!).
“Transitioning A Large Codebase to Bazel” (Benedikt Würkner, TTTech Auto). A
theme for me started with this talk and continued through others - the day is
about build systems for vast repositories within very large companies, but the
lessons apply just as readily to disparate smaller systems outside in “public
open source”. The talk identified phases of moving a huge codebase maintained by
hundreds (or even thousands) of developers. Getting past the envy of being able
to work in an environment where one has an entire team working full-time on just “the
build system”, I particularly focussed on the necessary “Convince” phase -
especially that it needs to happen across the board (Management - QA -
Engineers), because my feeling from online discussions about dune pkg
is that somehow we’ve missed that part. My limited experience talking to people
working on these huge codebases has been that there’s often necessarily a huge
focus on speed. It was therefore very interesting to me that the key advice from
the “Execute” phase was not to block on speed, and indeed
the statement that “fast can come later, don’t block on future things which need
to be changed” (because I personally think that’s been massively missed in our
own efforts - I’ve always prioritised correctness over speed… fast but
sometimes not working is for me only fractionally above broken).
“Integrating bazel/pre-commit” (Matt Clarkson, Arm). Quite a few years ago, I added a pre-commit linting githook for OCaml (ocaml/ocaml#1148). I find it quite handy, but my impression is that there aren’t many others who do. Holy moly, there’s a big infrastructure of githooks out there in use in companies! TIL about pre-commit.com. Integration of this with Bazel was relevant, if not replicable - I vociferously fight to keep our lint script in awk not because I’m mad (well…), but because the point is that the githook has no dependencies. This was a very neat demonstration of work to provide a hermetic environment in which diverse hooks - potentially using a different version of Python from the project itself - can be deployed and updated easily for users (in this case, of course, developers). The main focus resonates with work that has been ongoing and which I hope to be able to continue for the compiler - bringing CI as local as possible, ensuring that the PR is not the first time you discover the problem.
Next up was a talk on Dune advances within Jane Street (Andrey Mokhov + ???).
They’ve made some changes to allow nix to be used to get external dependencies
(a (sysdeps ..) stanza). Jane Street of course get to simplify the world a
little (and, given the amount of code, why wouldn’t they!!), but it’s interesting to
muse on how this could be extended out both to multiple platforms and also to
opam metadata in general (and the overlap with some of our own work on
multi-ecosystem package solving). The other feature demonstrated was peer-to-peer
remote builds. The motivation for this was interesting to me - I’ve previously
argued that aspects of Windows support get more easily merged by demonstrating
that what’s required is actually critical for something else (as have others:
cf. the excellent “NTFS really isn’t that bad”).
Remote building always sounds like a nice idea, but hits problems quite quickly
(reproducibility, etc., etc.). Of course, it becomes really critical when that
remote building involves GPUs - i.e. it’s been made more important by the need
to share and schedule hardware, even though the concept of
remote build servers has been talked about for years. Nice demonstration
of “doing the right thing” as well - the p2p aspect is neat, and while it was
clear they haven’t yet actually benchmarked it as being better, I liked the subtext
that it’s been done this (slightly more complicated) way first because the
simpler centralised system looks bottlenecky even without evidence 😊
“Measuring & Improving Build Speeds” (Vaibhav Shah, EngFlow). I’ve been musing on (non-evil) telemetry and more continuous measuring of build performance (for both package managers and build systems). I guess the niceish takeaway here is that this affects large companies too… it’s not just projects with a small number of maintainers who end up only looking at build performance regressions when things get really bad and then forgetting about them for a few months/years until they next get bad!
“What Makes Buck2 Special?” (Neil Mitchell, Meta). I hope the video of this talk
emerges at some point, because it was really great. In particular, this
identified Buck2’s position on a spectrum between a fully static dependency graph
(Bazel, make, etc.) and a fully dynamic one (Excel, etc.): a static dependency
graph with sections of dynamic action graph. For example, in OCaml terms, that
explains that foo.ml, bar.ml and baz.ml make up awesome.cmxa (static
dependencies), but still allows the precise dependencies between those ml files
to be dynamically discovered by ocamldep (there’s a toy sketch of this below).
However, that’s not just the build system - this is similar (probably
unsurprisingly, but I was briefly surprised, as it hadn’t occurred to me before)
for a package manager, where it’s the distinction between the dependency graph
and the action graph. In particular, for Buck2 the intuition is that the static
dependency graph tells you what is strictly needed (and is largely specified in
the build description), but the action graph then determines things like
parallelism - dynamic, but still guided by the static dependency graph. Which is
exactly the package manager model. I’m wondering how to apply that to my own
musings on dynamic/property-based discovery of external dependencies for a
future version of opam.
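
Here’s the toy sketch I mentioned - purely my own illustration, nothing to do with Buck2’s internals, and with the one inter-module dependency invented - of a build where the membership of awesome.cmxa is static but the ordering of the compile actions is only discovered dynamically:

```ocaml
(* Toy model of "static dependency graph, dynamic action graph": the modules
   making up awesome.cmxa are declared up front, but the dependencies
   *between* them are only discovered when we run (something like) ocamldep. *)

type target = Library of string * string list

(* Static: straight out of the build description. *)
let awesome = Library ("awesome.cmxa", [ "foo"; "bar"; "baz" ])

(* Dynamic: a stand-in for an action that runs ocamldep on the sources;
   here we just pretend bar.ml happens to mention Foo. *)
let module_deps = function
  | "bar" -> [ "foo" ]
  | _ -> []

(* Order the modules using the dynamically discovered edges, then "compile"
   each one and archive the library. *)
let build (Library (name, modules)) =
  let rec visit seen m =
    if List.mem m seen then seen
    else m :: List.fold_left visit seen (module_deps m)
  in
  let order = List.rev (List.fold_left visit [] modules) in
  List.iter (fun m -> Printf.printf "ocamlopt -c %s.ml\n" m) order;
  Printf.printf "ocamlopt -a -o %s %s\n" name
    (String.concat " " (List.map (fun m -> m ^ ".cmx") order))

let () = build awesome
```

The outer shape (which .cmx files end up in awesome.cmxa) never changes during the build; only the internal ordering, and hence the available parallelism, comes from the dynamically discovered edges.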
“Extending Buck2” (Andreas Herrmann, Tweag). On the downside - the main subject
of this talk is an internship proposal I floated years ago for Dune which never
got anywhere. On the plus side - it works beautifully in Buck2, so it’s
validated! The idea is to be able to break through the boundaries of libraries
to increase build parallelism - in other words, instead of compiling foo.cmxa,
bar.cmxa and baz.cmxa in order to link main-program, you actually get to
compile exactly the modules which are used in main-program and then link it
(there’s a sketch of this at the end), potentially then creating those cmxa
files in parallel as usable future artefacts. That’s obviously quite an
interesting piece of dynamism - in particular, it means that on a given build
you might choose to use the cmxa files if nothing has changed, or you might
ignore them completely. Crucially, it provides a
more accurate dependency graph - if you change a module in a library which is
not linked in the resulting executable, you can avoid rebuilds. TIL that Haskell
has a build-system-like mode where it can discover dependencies and compile more
files as it goes (I have an intern looking at that in OCaml this summer,
although I’m more interested in seeing how easy it is to retrofit using algebraic
effects). And - interestingly, given why I’d come along for the day - the
question was asked as to why more compiler authors aren’t in the room with
build system authors, because these kinds of optimisations do clearly have to be
done in coordination with the compiler. So I polished my halo a bit!
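
And the sketch promised above - again just my own illustration with invented module names, not how Buck2 implements it - of linking main-program from exactly the modules it reaches, rather than from whole archives:

```ocaml
(* Toy sketch of linking main-program from exactly the modules it uses,
   rather than waiting for whole .cmxa archives. The module names and the
   dependency table are invented for illustration. *)

(* What an ocamldep-style analysis might report. *)
let module_deps = function
  | "main"     -> [ "foo_util"; "baz_io" ]
  | "foo_util" -> []
  | "foo_core" -> [ "foo_util" ]   (* in foo.cmxa, but unused by main *)
  | "baz_io"   -> []
  | _ -> []

(* Transitive closure of the modules reachable from an entry point. *)
let rec needed acc m =
  if List.mem m acc then acc
  else List.fold_left needed (m :: acc) (module_deps m)

let () =
  let modules = List.rev (needed [] "main") in
  (* Only main, foo_util and baz_io are on the critical path; foo_core
     (and the foo.cmxa archive itself) can be built later, or not at all. *)
  Printf.printf "link main-program from: %s\n" (String.concat " " modules)
```

The reachable set is what gives the more accurate dependency graph: foo_core can change without main-program needing to be relinked.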