Skip to content

Structured diagnostics for the compiler #13766

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: trunk
Choose a base branch
from

Conversation

Octachron
Copy link
Member

See also ocaml/RFCs#45 and https://icfp24.sigplan.org/details/ocaml-2024-papers/16/Structured-diagnostics-for-the-OCaml-compiler for previous discussions .

A limitation of the current compiler diagnostics (error messages, warnings, time profile information and other compiler developer debugging output) is that it cannot be parsed easily and reliably by developer tools. This creates an impedance mismatch between the compiler and those tools that tend to resort to trying to parse manually compiler error message.

This PR proposes to fix this problem by adding to the compiler a machinery for building structured compiler diagnostics before displaying them in various formats.

This PR is probably best reviewed commit by commit:

  • the first eight commits slowly introduce the new machinery part by part
  • commit 9 to 14 converts the current compiler diagnostics to this new machinery (going through the error messages, the debugging outputs, then the toplevel, and finally the compiler configuration)
  • commit 15 adds a new tools for instrospecting the compiler diagnostics, and the following commit uses this new tools
    to export json-schema and adds the corresponding tests in the testsuite
  • the last commits 4 commits update the manual and the man pages.

@Octachron
Copy link
Member Author

Note that I have tested some quick prototypes for deserializers and a bridge between the proposed compiler diagnostic and the grace library at https://github.com/Octachron/sdiag-prototypes .

@johnyob
Copy link
Contributor

johnyob commented Jan 31, 2025

Note that I have tested some quick prototypes for deserializers and a bridge between the proposed compiler diagnostic and the grace library at https://github.com/Octachron/sdiag-prototypes .

Its lovely to see grace being used here :) I look forward to playing with this

@Octachron Octachron force-pushed the structured_diagnostics_full branch from 8edb2d8 to 8e07bd5 Compare February 3, 2025 12:17
@gasche gasche self-assigned this Feb 5, 2025
@Octachron Octachron force-pushed the structured_diagnostics_full branch from a3ede5d to 355143d Compare February 18, 2025 15:46
@Octachron
Copy link
Member Author

I have pushed a commit that hide the change of API to the toplevel library beyond a compatibility layer.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first round of looking at the code, rather shallow as a review.

  • I did not look at the manual yet, as I read in commit order; this was a mistake and I'll start there on my next round.
  • I did not look at the new modules (diagnostics, diagnostics_history) in depth, only quickly at the interface (assessment: too hard to follow for me, I need to read the manual first). It's a lot of non-critical code and I plan to declare that it is reasonable and go by the tests mostly, instead of trying to review everything line-by-line.
  • I looked at the code that adapts the compiler codebase. This is as I expected, invasive but reasonable.


| List: 'a typ -> 'a list typ (** Combinators *)
| Pair: 'a typ * 'b typ -> ('a * 'b) typ
(** Specialize (2,3,4)-tuples to avoid defining tuples as heterogeneous
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: clearly(?!) this should be {2,3,4}-tuples rather than (2,3,4)-tuples.

record at update [u]. The field is optional if [opt] is [Some true] *)

val new_field_opt: vl update -> string -> 'a typ -> 'a field
(** [new_field_opt] is a short-hand for [new_field ~opt:true] *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect either two functions or an optional argument, do we really want to offer both?

type point =
| Inception
(** constructor only: derived constructor are incepted before being
published.*)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what "inception" mean.

(Also: derived constructors are incepted ...)

(** Last version present in the history *)
val current_version: 'a t -> version

(** An ['id update] is a version registered in the history ['id History.t].*)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

['id Diagnostic_history.t], I suppose?

(** An ['id update] is a version registered in the history ['id History.t].*)
type 'a update
val new_version: 'a t -> version -> 'a update
(** [new_version h v] unconditionnally creates an update but register an error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registers

@@ -425,8 +452,8 @@ let loading_hint_printer ppf cu =
let find_with_ext ext =
try Some (Load_path.find_normalized (cu ^ ext)) with Not_found -> None
in
fprintf ppf
"@.Hint: @[\
fprintf ppf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the extra spacing of fprintf looks unintentional.


let print log =
log_variables log;
Log.flush log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the new -config outputs on my machine and I observe a bug:

  • ocamlc -config works exactly as before
  • ocamlc -log-format {json,sexp} -config work as expected (nice!)
    (nitpick: the sexp format will sometimes fit several fields on the same line, I would rather have a vertical printing with at most one field per line)
  • ocamlc -log-format stdout -config prints only the values and not the keys/variables, so that output is incorrect

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Octachron points out that in fact stdout is not supposed to be the same as not setting anything, currently there is stdout and stdout-with-keys and the latter is not exposed. It would be nice if both options were exposed, and maybe they could be named stdout-light and stdout-full, or maybe stdout-values and just stdout.

(* the GNU Lesser General Public License version 2.1, with the *)
(* special exception on linking described in the file LICENSE. *)
(* *)
(**************************************************************************)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .mli would be an okay place to include some documentation of how to use diaginfo.

Help output

The documentation of -schema-format is weird.

  -schema-format {json|adt}
                <name> print the schema <name> in <name> format

I don't understand what it means. (Does this command expect two parameters, or just one that is either json or adt? I assume the latter.)

How to get a list of schemas?

On a first try using the tool, I failed to understand how to get a list of schemas. I think that there should be an explicit option, for example --list, to list all schemas. When I printed the --help output to write the comment about I noticed the part about *, but this does not seem to work easily on my machine:

$ ./runtime/ocamlrun ./tools/ocamldiaginfo -schema *
Unknown schema name: aclocal.m4

$ ./runtime/ocamlrun ./tools/ocamldiaginfo -schema "*"

The first comes from the fact that my shell interpolates the *, and the second produces empty output.

History output

A trailing newline seems to be missing.

$ ./runtime/ocamlrun ./tools/ocamldiaginfo -history | tail -n 5
      New label backtrace, ?l structured_text
      New label compiler, ?compiler
      New label errors, ?l l structured_text
      New label trace, ?l l structured_text
      Seal[gasche@framawork github-trunk (structured_diagnostics_full)]$ 

"properties" :
{
"metadata" : { "$ref" : "#/$defs/metadata"},
"linear_magic_number" : { "type" : "string"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: it's slightly weird that there is a space before the key ({ "type") but not after the value ("string"}).

Fun.protect (fun () -> f log x)
~finally:(fun () -> Log.flush log)

let dir_load = with_log V2.dir_load
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation here is confusing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants