Skip to content

draft: augment yaml definitions with test schema #1659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: v2.x
Choose a base branch
from

Conversation

nirinchev
Copy link

This is very much a POC, but I figured I'd open up a PR to get early feedback on the direction we're heading in.

For context, the devtools team wants to use the yaml definitions from the PHP driver to build an autocomplete experience for the aggregation pipelines in mongosh/compass. We've built a tool to convert the yaml to typescript definitions and it seems to be working well for the base case, but to polish it, we'd need to enhance the yaml definitions to allow us to generate more precise type definitions.

Here's a list of the high-level changes we're proposing (will be updated over time):

  • Update schema.json to remove duplicate $comment fields (since most json parsers are not happy about it).
  • Update all yaml definitions to add schema information for the test cases.
    • The schema shape matches SimplifiedSchema from the mongodb-schema package. This is a little bit verbose since it supports multiple types per field, even though the docs rarely have heterogeneous schemas.
    • Schema is generated automatically by crawling the docs website at the link pointed to by the test and then using the mongodb-schema package to infer a probabilistic schema from the example collection.
  • Removed a few tests that no longer seem to point to anything in the docs - e.g. firstN:Null and Missing Values.

Copy link
Member

@GromNaN GromNaN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good idea to have the schema of the source collection.

@@ -1,6 +1,6 @@
# $schema: ../schema.json
name: $accumulator
link: 'https://www.mongodb.com/docs/manual/reference/operator/aggregation/accumulator/'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to keep quotes around strings? This style changes make the diff hard to read. Also, I use quotes by default around every string to prevent unexpected parsing rule from Yaml.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, looks like the quotes usage is inconsistent in these files - I'd be inclined to define a prettier rule and reformat them in a separate PR to make this one less noisy if that'd make sense to you. The reason for these getting removed is that the way we're adding the schema information is parsing the yaml, fetching the schema from the docs, then writing it, which results in the file getting reformatted based on the tool conventions (we use https://www.npmjs.com/package/js-yaml here).

@@ -25,7 +25,7 @@ arguments:
-
name: p
type:
- resolvesToArray # of resolvesToNumber
Copy link
Member

@GromNaN GromNaN Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was keeping this comments as a note for later, as we would like to extend type definition for generic lists and hashes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense - same as above, those got removed due to the yaml parser not preserving comments, but will try and find a way to avoid that.

pipeline:
-
$sample:
size: 100
-
$group:
_id: ~
_id: null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted that both ~ and null can be used according to the spec.

Comment on lines +48 to +49
onNull:
bytes: !!binary AAAAAAAAAAAAAAAAAABAMA==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new notation seems less legible to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, will look at tweaking the generator to ensure it preserves the old notation.

@@ -24,99 +24,122 @@ arguments:
A positive integral expression that is either a constant or depends on the _id value for $group.
tests:
-
name: 'Null and Missing Values'
link: 'https://www.mongodb.com/docs/manual/reference/operator/aggregation/firstN/#null-and-missing-values'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed a few tests that no longer seem to point to anything in the docs - e.g. "firstN:Null and Missing Values".

This example is in the "Behavior" part of the docs.

https://www.mongodb.com/docs/manual/reference/operator/aggregation/firstN/#null-and-missing-values

"type": "string",
"enum": [
"array",
"object",
"single",
"search"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this value is not used.

Comment on lines +323 to +345
"enum": [
"Array",
"Binary",
"Boolean",
"Code",
"CodeWScope",
"Date",
"Decimal128",
"Double",
"Int32",
"Int64",
"MaxKey",
"MinKey",
"Null",
"ObjectId",
"BSONRegExp",
"String",
"BSONSymbol",
"Timestamp",
"Undefined",
"Document",
"Number"
],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to converge on the same type names for operator return types and argument types. I'm fine with changing the names in expressions.php to be consistent with other tools.

"Binary",
"Boolean",
"Code",
"CodeWScope",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the CodeWScope type specific to mongosh?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the deprecated code_w_s bson type. Realistically, I don't expect any of the more advanced types to be found in the docs, but I generated the schema from the type definitions and opted not to manually edit it to make it easier to maintain if it ever changes. If we wanted to clean it up, we can definitely remove the deprecated types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GromNaN: For additional context, PHPC conditionally encodes Code or CodeWScope based on whether the MongoDB\BSON\Javascript object was constructed with a non-null $scope parameter (see: php_phongo_javascript_init).

@@ -11,7 +11,7 @@ arguments:
name: specification
type:
- expression
variadic: object
variadic: object # XXX: This should somehow allow explicit typings for { fieldName: 1|0 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type is "expression", because any expression is accepted to compute a value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants