A second foray into agentic coding
Continuing the previous theme of dabbling with matters agentic. Previously, I’d quite assiduously kept my fingers away from files. This time, I wanted to try something exploratory, switching to the agent for things I was actively stuck on.
I was still (very) curious about the latent remaining bug in Lucas’s excellent work. There were some corners which had been cut in the prototype, and I had a brief foray into this problem, this time with a view to ensuring artefact equivalence between what OCaml’s build system would produce and what our altered driver program was doing.
If you have a pre-built compiler and a clean (of binary artefacts) OCaml source tree, you can actually build the bytecode compiler in just three, ahem, short commands (I’m intentionally glossing over all the generated source files):
$ ocamlc -I utils -I parsing -I typing -I bytecomp -I file_formats -I lambda -I middle_end -I middle_end/closure -I middle_end/flambda -I middle_end/flambda/base_types -I driver -I runtime -g -strict-sequence -principal -absname -w +a-4-9-40-41-42-44-45-48 -warn-error +a -bin-annot -strict-formats -linkall -a -o compilerlibs/ocamlcommon.cma utils/config.mli utils/build_path_prefix_map.mli utils/format_doc.mli utils/misc.mli utils/identifiable.mli utils/numbers.mli utils/arg_helper.mli utils/local_store.mli utils/load_path.mli utils/profile.mli utils/clflags.mli utils/terminfo.mli utils/ccomp.mli utils/warnings.mli utils/consistbl.mli utils/linkdeps.mli utils/strongly_connected_components.mli utils/targetint.mli utils/int_replace_polymorphic_compare.mli utils/domainstate.mli utils/binutils.mli utils/lazy_backtrack.mli utils/diffing.mli utils/diffing_with_keys.mli utils/compression.mli parsing/location.mli parsing/unit_info.mli parsing/asttypes.mli parsing/longident.mli parsing/parsetree.mli parsing/docstrings.mli parsing/syntaxerr.mli parsing/ast_helper.mli parsing/ast_iterator.mli parsing/builtin_attributes.mli parsing/camlinternalMenhirLib.mli parsing/parser.mli parsing/pprintast.mli parsing/parse.mli parsing/printast.mli parsing/ast_mapper.mli parsing/attr_helper.mli parsing/ast_invariants.mli parsing/depend.mli typing/annot.mli typing/value_rec_types.mli typing/ident.mli typing/path.mli typing/type_immediacy.mli typing/outcometree.mli typing/primitive.mli typing/shape.mli typing/types.mli typing/data_types.mli typing/rawprinttyp.mli typing/gprinttyp.mli typing/btype.mli typing/oprint.mli typing/subst.mli typing/predef.mli typing/datarepr.mli file_formats/cmi_format.mli typing/persistent_env.mli typing/env.mli typing/errortrace.mli typing/typedtree.mli typing/signature_group.mli typing/printtyped.mli typing/ctype.mli typing/out_type.mli typing/printtyp.mli typing/errortrace_report.mli typing/includeclass.mli typing/mtype.mli typing/envaux.mli \
typing/includecore.mli typing/tast_iterator.mli typing/tast_mapper.mli typing/stypes.mli typing/shape_reduce.mli file_formats/cmt_format.mli typing/cmt2annot.mli typing/untypeast.mli typing/includemod.mli typing/includemod_errorprinter.mli typing/typetexp.mli typing/printpat.mli typing/patterns.mli typing/parmatch.mli typing/typedecl_properties.mli typing/typedecl_variance.mli typing/typedecl_unboxed.mli typing/typedecl_immediacy.mli typing/typedecl_separability.mli lambda/debuginfo.mli lambda/lambda.mli typing/typeopt.mli typing/typedecl.mli typing/value_rec_check.mli typing/typecore.mli typing/typeclass.mli typing/typemod.mli lambda/printlambda.mli lambda/switch.mli lambda/matching.mli lambda/value_rec_compiler.mli lambda/translobj.mli lambda/translattribute.mli lambda/translprim.mli lambda/translcore.mli lambda/translclass.mli lambda/translmod.mli lambda/tmc.mli lambda/simplif.mli lambda/runtimedef.mli file_formats/cmo_format.mli middle_end/internal_variable_names.mli middle_end/linkage_name.mli middle_end/compilation_unit.mli middle_end/variable.mli middle_end/flambda/base_types/closure_element.mli middle_end/flambda/base_types/var_within_closure.mli middle_end/flambda/base_types/tag.mli middle_end/symbol.mli middle_end/flambda/base_types/set_of_closures_id.mli middle_end/flambda/base_types/set_of_closures_origin.mli middle_end/flambda/parameter.mli middle_end/flambda/base_types/static_exception.mli middle_end/flambda/base_types/mutable_variable.mli middle_end/flambda/base_types/closure_id.mli middle_end/flambda/projection.mli middle_end/flambda/base_types/closure_origin.mli middle_end/clambda_primitives.mli middle_end/flambda/allocated_const.mli middle_end/flambda/flambda.mli middle_end/flambda/freshening.mli middle_end/flambda/base_types/export_id.mli middle_end/flambda/simple_value_approx.mli middle_end/flambda/export_info.mli middle_end/backend_var.mli middle_end/clambda.mli file_formats/cmx_format.mli file_formats/cmxs_format.mli bytecomp/instruct.mli \
bytecomp/meta.mli bytecomp/opcodes.mli bytecomp/bytesections.mli bytecomp/dll.mli bytecomp/symtable.mli driver/pparse.mli driver/compenv.mli driver/main_args.mli driver/compmisc.mli driver/makedepend.mli driver/compile_common.mli utils/config.ml utils/build_path_prefix_map.ml utils/format_doc.ml utils/misc.ml utils/identifiable.ml utils/numbers.ml utils/arg_helper.ml utils/local_store.ml utils/load_path.ml utils/clflags.ml utils/profile.ml utils/terminfo.ml utils/ccomp.ml utils/warnings.ml utils/consistbl.ml utils/linkdeps.ml utils/strongly_connected_components.ml utils/targetint.ml utils/int_replace_polymorphic_compare.ml utils/domainstate.ml utils/binutils.ml utils/lazy_backtrack.ml utils/diffing.ml utils/diffing_with_keys.ml utils/compression.ml parsing/location.ml parsing/unit_info.ml parsing/asttypes.ml parsing/longident.ml parsing/docstrings.ml parsing/syntaxerr.ml parsing/ast_helper.ml parsing/ast_iterator.ml parsing/builtin_attributes.ml parsing/camlinternalMenhirLib.ml parsing/parser.ml parsing/lexer.mli parsing/lexer.ml parsing/pprintast.ml parsing/parse.ml parsing/printast.ml parsing/ast_mapper.ml parsing/attr_helper.ml parsing/ast_invariants.ml parsing/depend.ml typing/ident.ml typing/path.ml typing/primitive.ml typing/type_immediacy.ml typing/shape.ml typing/types.ml typing/data_types.ml typing/rawprinttyp.ml typing/gprinttyp.ml typing/btype.ml typing/oprint.ml typing/subst.ml typing/predef.ml typing/datarepr.ml file_formats/cmi_format.ml typing/persistent_env.ml typing/env.ml typing/errortrace.ml typing/typedtree.ml typing/signature_group.ml typing/printtyped.ml typing/ctype.ml typing/out_type.ml typing/printtyp.ml typing/errortrace_report.ml typing/includeclass.ml typing/mtype.ml typing/envaux.ml typing/includecore.ml typing/tast_iterator.ml typing/tast_mapper.ml typing/stypes.ml typing/shape_reduce.ml file_formats/cmt_format.ml typing/cmt2annot.ml typing/untypeast.ml typing/includemod.ml typing/includemod_errorprinter.ml typing/typetexp.ml \
typing/printpat.ml typing/patterns.ml typing/parmatch.ml typing/typedecl_properties.ml typing/typedecl_variance.ml typing/typedecl_unboxed.ml typing/typedecl_immediacy.ml typing/typedecl_separability.ml typing/typeopt.ml typing/typedecl.ml typing/value_rec_check.ml typing/typecore.ml typing/typeclass.ml typing/typemod.ml lambda/debuginfo.ml lambda/lambda.ml lambda/printlambda.ml lambda/switch.ml lambda/matching.ml lambda/value_rec_compiler.ml lambda/translobj.ml lambda/translattribute.ml lambda/translprim.ml lambda/translcore.ml lambda/translclass.ml lambda/translmod.ml lambda/tmc.ml lambda/simplif.ml lambda/runtimedef.ml bytecomp/meta.ml bytecomp/opcodes.ml bytecomp/bytesections.ml bytecomp/dll.ml bytecomp/symtable.ml driver/pparse.ml driver/compenv.ml driver/main_args.ml driver/compmisc.ml driver/makedepend.ml driver/compile_common.ml
$ ocamlc -I utils -I parsing -I typing -I bytecomp -I file_formats -I lambda -I middle_end -I middle_end/closure -I middle_end/flambda -I middle_end/flambda/base_types -I driver -I runtime -g -strict-sequence -principal -absname -w +a-4-9-40-41-42-44-45-48 -warn-error +a -bin-annot -strict-formats -a -o compilerlibs/ocamlbytecomp.cma bytecomp/bytegen.mli bytecomp/printinstr.mli bytecomp/emitcode.mli bytecomp/bytelink.mli bytecomp/bytelibrarian.mli bytecomp/bytepackager.mli driver/errors.mli driver/compile.mli driver/maindriver.mli bytecomp/instruct.ml bytecomp/bytegen.ml bytecomp/printinstr.ml bytecomp/emitcode.ml bytecomp/bytelink.ml bytecomp/bytelibrarian.ml bytecomp/bytepackager.ml driver/errors.ml driver/compile.ml driver/maindriver.ml
$ ocamlc -I utils -I parsing -I typing -I bytecomp -I file_formats -I lambda -I middle_end -I middle_end/closure -I middle_end/flambda -I middle_end/flambda/base_types -I driver -I runtime -g -compat-32 -o ocamlc -strict-sequence -principal -absname -w +a-4-9-40-41-42-44-45-48 -warn-error +a -bin-annot -strict-formats compilerlibs/ocamlcommon.cma compilerlibs/ocamlbytecomp.cma driver/main.mli driver/main.ml
I wanted to try a different angle on the Load_path, and this time produced a function which predicts the files in the tree. The rules for this were pretty easy for me to define, and I wasn’t sure I could face watching Claude special-case everything. 130 lines of verifiably correct hacked OCaml later, I had my load path function. A little bit more code later, those three commands above were translated into an OCaml script (based on the ocamlcommon and ocamlbytecomp libraries) which should perform exactly the same build. It ran - and it built the compiler.
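The load path function itself isn’t shown, but the core OCaml rule any such prediction has to encode can be sketched. This is a minimal, hypothetical illustration (predict_artefacts is not from the original script): a .mli produces a .cmi, and a .ml produces a .cmo, plus a .cmi only when there is no .mli alongside it. It ignores the .cmt/.cmti files produced by -bin-annot, libraries, generated sources and everything else a real version would need.

```ocaml
(* Hypothetical sketch: predict the compiled artefacts for a list of
   source files. Only .cmi/.cmo are considered; -bin-annot outputs,
   libraries and generated files are ignored. *)
let predict_artefacts sources =
  let has_mli base = List.mem (base ^ ".mli") sources in
  List.concat_map
    (fun src ->
      let base = Filename.remove_extension src in
      match Filename.extension src with
      | ".mli" -> [ base ^ ".cmi" ]
      (* with an interface present, the .ml only yields the .cmo *)
      | ".ml" when has_mli base -> [ base ^ ".cmo" ]
      (* without one, compiling the .ml also produces the .cmi *)
      | ".ml" -> [ base ^ ".cmi"; base ^ ".cmo" ]
      | _ -> [])
    sources

let () =
  predict_artefacts [ "utils/config.mli"; "utils/config.ml"; "driver/main.ml" ]
  |> List.iter print_endline
```

With that sample list it prints utils/config.cmi, utils/config.cmo, driver/main.cmi and driver/main.cmo, one per line.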
ocamlc was, pleasingly, exactly the same. The .cma files, however, were not.
For ocamlcommon.cma, that turned out to be me being sloppy with my commands. ocamlcommon.cma is linked with -linkall, but
ocamlc -a foo.cma -linkall bar.cmo
is not the same as
ocamlc -a foo.cma -linkall bar.ml
because -linkall gets recorded in the .cmo file as well. Easy fix - but the files were still different. A bit more tweaking and I could see that it was actually the .cmo files themselves which were different.
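Knowing that two artefacts differ is only the first step; knowing where they first diverge helps decide what to inspect next. A minimal sketch of that check, with hypothetical helper names (first_difference, read_file, report), assuming OCaml 4.14 or later for the In_channel module:

```ocaml
(* Offset of the first differing byte between two strings, if any.
   A length mismatch counts as a difference at the shorter length. *)
let first_difference a b =
  let la = String.length a and lb = String.length b in
  let n = min la lb in
  let rec go i =
    if i = n then (if la = lb then None else Some n)
    else if a.[i] <> b.[i] then Some i
    else go (i + 1)
  in
  go 0

(* Read a whole file as raw bytes. *)
let read_file path = In_channel.with_open_bin path In_channel.input_all

(* Compare two build artefacts, e.g. the same .cmo from two builds. *)
let report p1 p2 =
  match first_difference (read_file p1) (read_file p2) with
  | None -> Printf.printf "%s and %s are identical\n" p1 p2
  | Some i -> Printf.printf "%s and %s first differ at byte %d\n" p1 p2 i
```

A byte offset alone says nothing about which field differs, which is why tools like ocamlobjinfo are still needed to interpret it.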
A bit more poking and checking with ocamlobjinfo and a few other flags and tricks, and I observed that:
$ ocamlc -g -c utils/config.ml
resulted in slightly different debug information from:
$ ocamlc -g -c utils/config.mli utils/config.ml
(it’s observably to do with the debug information - omit the -g and they’re all identical). Lots to suspect here, but time for…
$ claude
╭───────────────────────────────────────────────────╮
│ ✻ Welcome to Claude Code! │
The problem was easy to state, but a conclusive explanation was not quite so quick to come by. Claude, like most of these models, appears not to have been trained on this old cartoon, and very merrily buzzes along for a few rounds of investigation, followed by a highly dubious explanation that it was probably something to do with marshalling and, mumble mumble, the final binaries are the same so this bug is probably OK.
Hmm. A few rounds of “no, this needs to be equivalent as otherwise it’s not reproducible” (“You’re so right!”) later, we had a lot of test programs and a frequent need for reminders that debugging OCaml’s marshalling format was possibly not going to help, but we weren’t very much closer to an answer.
Stepping back, I re-framed the problem, instead asking Claude to produce a program which would give a textual dump of the debug information in each file, so we could compare them. This was interesting - especially the occasional hallucinated claims of having analysed “all the fields” - but we got there.
What was interesting was that we were struggling to perceive any differences at all. Claude at this point was desperate to delve into the runtime code and start doing hex-dumps of the marshal format to see what was actually different. I appear to be a little older than Claude, and was more wary of this approach. I suggested we look at the polymorphic hash of some of these fields instead. At this point, we started to see some differences - Claude’s inferences were working well, and there was a strong suggestion to add all sorts of accessor functions into the Types module to be able to introspect some of the values in more detail than normally intended (i.e. polymorphic hash was telling us that some abstract values were different, but we wanted to see what the differences really were).
Reader, I told it to use Obj.magic instead 🫣
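Both introspection tricks are easy to reproduce outside the compiler. The sketch below is illustrative only: Abstract is a hypothetical stand-in for a type whose interface hides an internal counter, fingerprint raises Hashtbl.hash_param’s limits so the hash looks further into a value than Hashtbl.hash does, and dump uses the unsafe, debugging-only Obj module to reveal a value’s runtime shape.

```ocaml
(* Polymorphic hash as a cheap fingerprint. Hashtbl.hash is
   hash_param 10 100; larger limits make it look further into a value. *)
let fingerprint v = Hashtbl.hash_param 1000 10000 v

(* Debugging-only view of a value's runtime shape via Obj: immediates,
   strings and floats are printed directly; ordinary blocks as
   tag:(fields) up to the given depth; anything else (closures, lazies,
   custom blocks, ...) as just its tag and size. *)
let rec dump ?(depth = 3) (v : Obj.t) =
  if Obj.is_int v then string_of_int (Obj.obj v : int)
  else
    let tag = Obj.tag v in
    if tag = Obj.string_tag then Printf.sprintf "%S" (Obj.obj v : string)
    else if tag = Obj.double_tag then string_of_float (Obj.obj v : float)
    else if depth = 0 || tag >= Obj.lazy_tag then
      Printf.sprintf "<tag %d, size %d>" tag (Obj.size v)
    else
      let fields =
        List.init (Obj.size v) (fun i -> dump ~depth:(depth - 1) (Obj.field v i))
      in
      Printf.sprintf "%d:(%s)" tag (String.concat ", " fields)

(* Hypothetical abstract type whose interface hides an internal counter. *)
module Abstract : sig
  type t
  val make : string -> t
  val name : t -> string
end = struct
  type t = string * int
  let counter = ref 0
  let make name = incr counter; (name, !counter)
  let name (n, _) = n
end
```

Two values built with Abstract.make "x" agree on everything the interface exposes, yet their fingerprints will in practice differ, and dump (Obj.repr v) shows the hidden counter that explains why.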
However, what happened next was truly fascinating and definitely very efficient. The value being returned for one of the type IDs was simply not believable: it was far too high. Claude also correctly observed that it was in fact a block, not the integer we were expecting. The human brain at this point cuts in, and looks at the type: Types.get_id: t -> int. No, that accessor looks right. Brain slowly whirring; look at the code:
let get_id t = (repr t).id
Oh - it’s not an accessor (in another life, I could possibly have performed Claude’s responses…).
All I had to point out was that Types.get_id was not an accessor: it was normalising the result (to walk Tlink members of the type representation), and Claude was on it, replacing semi-elegant OCaml code with a sea of calls to Obj functions.
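The trap is easier to see in a toy model. This is a much-simplified, hypothetical representation, not the real Types module: unification may overwrite a node’s description with a Tlink to another node, repr follows those links, and so a get_id written in the same style as the one above returns the representative’s id rather than the node’s own.

```ocaml
(* Hypothetical, much-simplified type nodes: NOT the real Types module. *)
type node = { mutable desc : desc; id : int }
and desc = Tvar | Tint | Tlink of node

(* Global id supply for freshly created nodes. *)
let new_id = ref (-1)
let newty desc = incr new_id; { desc; id = !new_id }

(* Follow Tlink chains to the representative node. *)
let rec repr t = match t.desc with Tlink t' -> repr t' | _ -> t

(* Looks like an accessor, but normalises first. *)
let get_id t = (repr t).id

let () =
  let a = newty Tvar in
  let b = newty Tint in
  a.desc <- Tlink b;  (* as if unification had linked a to b *)
  Printf.printf "a.id = %d, get_id a = %d\n" a.id (get_id a)
```

Here a.id is 0 but get_id a is 1: the value reported depends on how the links happen to be arranged, even when the types are equivalent.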
But we had our answer: the type chain was different, if semantically equivalent, and, more importantly, Claude then leapt to the underlying problem.
The internal Types.new_id reference isn’t reset between compilations 💥
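The shape of that bug, and of the fix, fits in a few lines. Everything here is hypothetical: new_id only mimics the spirit of a global id supply like Types.new_id, while compile and compile_all are invented stand-ins. Each "compilation" embeds the ids it allocates, so without a reset its output depends on what the same process compiled before.

```ocaml
(* Hypothetical global id supply, shared by every compilation
   performed in the same process. *)
let new_id = ref (-1)

let fresh_id () =
  incr new_id;
  !new_id

(* A stand-in "compilation" whose output embeds every id it allocates. *)
let compile (name, n) =
  List.init n (fun _ -> Printf.sprintf "%s#%d" name (fresh_id ()))

(* Compile several units in one process, optionally resetting the
   counter before each one (the shape of the fix). *)
let compile_all ~reset units =
  new_id := -1;
  List.rev
    (List.fold_left
       (fun acc u ->
         if reset then new_id := -1;
         compile u :: acc)
       [] units)
```

Compiling ("config", 2) on its own yields config#0 and config#1; after ("misc", 3) in the same process it yields config#3 and config#4, unless reset is true.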
A quick rebuild later, the same debug information was produced regardless of whether utils/config.mli was compiled at the same time as utils/config.ml.
Go Claude. My contribution was keeping the explorations looking at relevant
parts of the system, and not disappearing off on sometimes ridiculous and
unbelievable tangents. Maybe it would have got there on its own, but who knows
the tokens required and the GPUs scorched…
Plug that back into my little script. ocamlcommon.cma still different. At this point, a line from Four Weddings and a Funeral could be heard loud and clear in the human mind. It’s the one which follows “Dear Lord, forgive me for what I am about to, ah, say in this magnificent place of worship…”.
The fix was definitely working. But a quick bit of further experimentation revealed that including other .mli files before utils/config.ml (and there are a lot) was causing the information to change.
So:
$ claude -c
As a human of hopefully normal emotional response to situations, the feeling of being back at square one would normally have meant I’d have needed at least a coffee before being able to face dusting off all the tools and scripts which had been constructed in the previous investigations. But here, of course, the LLM doesn’t care, and went straight into using the previously constructed tools to look at the revised problem. A lot more Obj.magic-like investigations later, looking at the shape of some debugging information, Claude found another bit to reset, this time in Ctype. None of the level information in the type checker is reset between compilations. That’s not a semantic issue, because the type checker uses those numbers relatively, but again they leak into the representation of some of the debugging information.
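Why a leftover level is harmless to the type checker but not to reproducibility can be shown with a toy model. This is a hypothetical simplification, not Ctype’s code: a variable created inside a let-binding lives one level deeper than the binding, and generalisation only asks whether the variable’s level is greater than the current one. Starting from a leftover level changes the raw numbers (the part that can leak into output) but not the decision.

```ocaml
(* Hypothetical model of level-based generalisation: NOT Ctype's code. *)
let current_level = ref 0

(* Run f one level deeper, restoring the level afterwards. *)
let with_deeper_level f =
  incr current_level;
  Fun.protect ~finally:(fun () -> decr current_level) f

(* "Type-check" a let-binding: create a variable inside it, then ask
   whether it may be generalised once back outside. *)
let typecheck_let () =
  let var_level = with_deeper_level (fun () -> !current_level) in
  (var_level, var_level > !current_level)

(* Start from an arbitrary level, as if left over from an earlier
   compilation in the same process. *)
let run ~start =
  current_level := start;
  typecheck_let ()
```

run ~start:0 returns (1, true) and run ~start:100 returns (101, true): the same generalisation decision, but a different absolute level.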
And it was working 🥳
Next up was trying to put those fixes into something resembling a commit series that might one day be an acceptable PR. What I really wanted was a test. Claude was great for this, although it lacks anything approximating taste (and this is me writing…!). However, with no feelings to be hurt, the pointers were easy to issue and the results impressive - especially its construction of a non-trivial ocamltest block. The result is previewable in dra27/ocaml#237 on my GitHub fork, and the test is entirely Claude’s.
Having got to this stage, I extended the compiler with some of Lucas’s patches, and started passing just the .ml files for compilation, allowing the compiler to compile the .mli files on demand, as before. With some idle tinkering, I got to the end of “coreall”, which is the point in OCaml’s build process where ocamlc, the bytecode versions of everything in tools/ and ocamllex have all been compiled, along with the Standard Library. That was all being done from a single compiler process, where the OCaml script driving the compiler consisted mostly of the list of .ml files. Coupled with the predictive load path I’d already put together, at this stage the “plumbing” needed in the scheduler is just:
(* Compile a single .ml or .mli file using the standard driver entry points. *)
let compile_file source_file () =
  Compenv.readenv Format.std_formatter (Before_compile source_file);
  let output_prefix = Compenv.output_prefix source_file in
  if Filename.extension source_file = ".mli" then
    Compile.interface ~source_file ~output_prefix
  else
    let start_from = Clflags.Compiler_pass.Parsing in
    Compile.implementation ~start_from ~source_file ~output_prefix

(* Load_path.Missing comes from the patched Load_path: when a compiled
   interface can't be found, compile the corresponding .mli, then resume
   the suspended compilation. *)
let rec execute task =
  try task ()
  with effect (Load_path.Missing path), k ->
    let file = Filename.chop_extension path ^ ".mli" in
    execute (compile_file file);
    execute (Effect.Deep.continue k)
(As an aside, when this moves to using Domains I’ll possibly switch it to a shallow handler, because the call stack with the deep handlers isn’t as reasonable as I’d hoped, but to be honest I just wanted to see it work!)
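The scheduling pattern can be tried without a patched compiler. In this self-contained sketch (OCaml 5, using Effect.Deep.try_with rather than the newer effect-pattern syntax so that it also works on 5.0 to 5.2), Missing, deps and compile are hypothetical stand-ins for the patched Load_path effect and the real Compile entry points; "compiling" a file merely records its name.

```ocaml
open Effect
open Effect.Deep

(* Hypothetical effect: "this interface hasn't been built yet". *)
type _ Effect.t += Missing : string -> unit Effect.t

(* Hypothetical dependency table: interfaces each file needs first. *)
let deps = function
  | "b.mli" -> [ "a.mli" ]
  | "b.ml" -> [ "b.mli" ]
  | "c.ml" -> [ "a.mli"; "b.mli" ]
  | _ -> []

let built : (string, unit) Hashtbl.t = Hashtbl.create 16
let order = ref []

(* Suspend via Missing whenever a dependency has not been built yet. *)
let compile file =
  List.iter
    (fun d -> if not (Hashtbl.mem built d) then perform (Missing d))
    (deps file);
  Hashtbl.replace built file ();
  order := file :: !order

(* Handle Missing by compiling the dependency, then resuming the
   suspended compilation exactly where it stopped. *)
let rec execute task =
  try_with task ()
    { effc =
        (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Missing file ->
              Some
                (fun (k : (a, unit) continuation) ->
                  execute (fun () -> compile file);
                  continue k ())
          | _ -> None) }
```

Compiling b.ml and then c.ml from an empty state records a.mli, b.mli, b.ml and c.ml in that order: each missing interface is built on demand before the suspended compilation carries on.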
Fascinatingly, all the artefacts (.cma and binaries) being produced were identical except for the Lazy module in the Standard Library!
$ claude -c
Claude was simultaneously amazing and useless at this. Amazing, because I was
prompting some of this while cooking a meal, so being able to bark an
instruction (actually, I hadn’t set it up for voice - I was just quickly
typing) and then leave it to think for a minute or two was strangely efficient,
because investigating this on my own would have taken too much continuous
concentration. It was useless because we didn’t get anywhere near a believable
explanation, despite various efforts at resetting things. Sometimes you just have to say /exit (and eat a meal…).
However, after the aforementioned meal, I dug into it a bit further. The issue here was clearly to do with some state in the compiler - if ocamlcommon.cma or ocamlmiddleend.cma were compiled, then the Lazy module differed. Incidentally, at this point it wasn’t the debug information which varied but the actual module, although it was still semantically the same. Claude had correctly identified that it was to do with the marshalling, and we had identified that there was a difference in string sharing (so not entirely useless, in fairness). I carried on poking and, with a little bit of jerry-rigging, managed to determine the relatively small set of files in flambda and in ocamlcommon whose compilation caused the change in Lazy. I strongly suspected it was to do with the compilation of lazy values.
$ claude -c
Feeding this information to Claude was a much better trick - at this point its reasoning would contradict its own tangents (“I should look at … but wait, the user has given me the list of affected files”). Impressively, we did home in on the much more complex explanation for this third issue, which is to do with lazy values used in globals in the Matching module. In this particular case, if the compiler has compiled a file which matched on a lazy, causing Matching.code_force_lazy_block to be forced in the compiler and thus the CamlinternalLazy identifier to be added to the current persistent environment, then a subsequent module (in this case lazy.ml in the Standard Library) which both pattern matches on a lazy and refers to CamlinternalLazy ends up with two extern’d string representations of CamlinternalLazy instead of one. The reason is that the forced code block in Matching still refers to a string used in a previous persistent environment. It’s not a semantic issue at all, but it manifests itself because that string is not shared with the one produced when the subsequent file looks up the CamlinternalLazy identifier.
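The mechanism can be reproduced in miniature. This is a hypothetical model, not the compiler’s code: env stands in for the persistent environment (reset between compilations), lookup interns a module name as a fresh string, and force_block plays the role of a global lazy like Matching.code_force_lazy_block, which keeps whatever string it saw first. Marshal shares physically identical values, so a pair holding one string twice marshals more compactly than a pair holding two equal copies.

```ocaml
(* Hypothetical "persistent environment": interns module-name strings
   and is replaced wholesale between compilations. *)
let env : (string, string) Hashtbl.t ref = ref (Hashtbl.create 8)

let reset_env () = env := Hashtbl.create 8

(* Return the environment's copy of name, creating a fresh one if needed. *)
let lookup name =
  match Hashtbl.find_opt !env name with
  | Some s -> s
  | None ->
      let s = String.sub name 0 (String.length name) in
      Hashtbl.add !env name s;
      s

(* Global lazy: forced once, then never reset between compilations. *)
let force_block = lazy (lookup "CamlinternalLazy")

(* A unit that both matches on a lazy and names CamlinternalLazy itself. *)
let compile_unit () =
  let from_lazy = Lazy.force force_block in
  let from_env = lookup "CamlinternalLazy" in
  (from_lazy, from_env)
```

Calling compile_unit (), then reset_env (), then compile_unit () again gives two pairs that are equal under (=). Only the first holds a single shared string, so Marshal.to_string produces a longer result for the second.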
It was a battle to update the test to show this behaviour, but in fairness that would have been a battle anyway! However, we got there too.
Three reproducibility issues identified, and a viable PR produced - with tests!