Skip to content

Add builtin replacements for some Json functions #5706

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 18, 2025

Conversation

dolio
Copy link
Contributor

@dolio dolio commented May 17, 2025

  • Replacements are for Json.toText and Json.unconsText, using Aeson and some related libraries
  • Expanded function replacement system to allow for specific indexes into mutual recursion, because one of the Json functions is not index 0
  • Fixed some other infrastructure
    • matchBoolVal and BoolVal pattern were backwards
    • decodeVal for Seq Val was implemented wrong, uncaught due to unsafe coercion and a lack of use

Requires @unison/json version 1.3.2 to operate.

In @pchiusano/misc-benchmarks

Json parsing (per document)
344.841µs ==> 35.046µs

So, roughly 10x faster.

dolio and others added 2 commits May 17, 2025 02:40
- Replacements are for Json.toText and Json.unconsText, using Aeson
  and some related libraries
- Expanded function replacement system to allow for specific indexes
  into mutual recursion, because one of the Json functions is not
  index 0
- Fixed some other infrastructure
  * `matchBoolVal` and `BoolVal` pattern were backwards
  * `decodeVal` for `Seq Val` was implemented wrong, uncaught due to
    unsafe coercion and a lack of use
@dolio
Copy link
Contributor Author

dolio commented May 17, 2025

Fixes #5691

@dolio dolio force-pushed the topic/json-replacement branch 2 times, most recently from 5342499 to f677b41 Compare May 17, 2025 07:22
@dolio dolio force-pushed the topic/json-replacement branch from f677b41 to 489275d Compare May 17, 2025 07:30
Copy link
Member

@pchiusano pchiusano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!!

You can merge once CI passes, can you just verify that the tests all pass in unison/json if you haven't already?

@@ -581,7 +581,7 @@ pattern IntVal i <- (matchIntVal -> Just i)

matchBoolVal :: Val -> Maybe Bool
matchBoolVal = \case
(BoxedVal (Enum r t)) | r == Ty.booleanRef -> Just (t == TT.falseTag)
(BoxedVal (Enum r t)) | r == Ty.booleanRef -> Just (t == TT.trueTag)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was flipped, how come it wasn't noticed before? Was this code just unused until now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably not used in this mode.

@pchiusano
Copy link
Member

An annoying thing we just noticed is different handling of duplicate keys. Aeson sticks the keys of an object into a map, which orders and dedupes them. But the current Unison library doesn't do this. We're not yet sure how we'd like to handle.

The reordering is okay IMO but I'm not a fan of the silent deduping behavior (even though the spec allows it). It's one of those things where if you have an application that needs it, silent deduping is the worst thing since the duplicate is gone before your decoder even sees it. So your application cannot react to the situation in any way.

@dolio
Copy link
Contributor Author

dolio commented May 17, 2025

Okay, I replaced the aeson usage with a port of the @unison/json code, so that it doesn't do the normalizing and such that aeson does.

@pchiusano/misc-benchmarks is even faster now, actually. Around 20μs. It could be because aeson does some extra work, but I think it's more likely because the aeson version had double traversals of everything–one to translate between our version and aeson's version, and one to actually do the operation.

I'm going to add a replacement for tryUnconsText, too, since I can now.

@dolio
Copy link
Contributor Author

dolio commented May 17, 2025

Okay, with @unison/json version 1.3.3 Json.tryUnconsText will also be replaced, which makes the benchmark very slightly faster from less indirection.

@unison/json tests pass, but you should probably look through my port to see if anything looks weird.

@dolio
Copy link
Contributor Author

dolio commented May 17, 2025

One idea is: I don't see many completely randomly generated tests. Maybe it'd be a good idea to write something that randomly generates both good and bad strings, and tests:

tryUnconsText str === toEither (do unconsTextOrThrow str)

I.E. test the replacement against the original implementation, both good and bad cases.

@pchiusano
Copy link
Member

Nice!!

I added some property-based tests to main of @unison/json, including both valid and invalid JSON.

Everything looks good, but a property test based on the tryUnconsText str === toEither (do unconsTextOrThrow str) found some minor inconsistencies between the two implementations. You can see these if you just pull and run the test command.

  • It looks like the Haskell implementation has a few different error messages. It's generating "expected ',' or '}' where the Unison version only generates "expected ','". I like the Haskell treatment if you want to update the Unison version.
  • Slightly different treatment of the non-parsing input n ull. The Haskell implementation reports a position of 0 (it doesn't mark the n as consumed) but the pure Unison implementation does. I like the Haskell version better here, too, if you want to update the Unison to match.
  • There might be others that appear after fixing the above, and you can switch either the Haskell or Unison implementation to match, whichever you think has better behavior. I don't care much about error messages differing but I'd try to make sure the error position and remainder matches. (The "relaxedRuntimeConsistency" test below checks for this so if you get tired of getting error messages to match exactly, the other test can be removed)
    18. core.Json.tests.relaxedRuntimeConsistency       ✗ 
                                                        n  ullM:
                                                          elements not equal
                                                          ((0, "n  ullM"), (1, "  ullM"))
                                                      
    19. core.Json.tests.runtimeReplacementConsistency   ✗ 
                                                        {"WR 5":-4381041150.1595          9,"vXj7CszP":"sR569n7Lq3"}C0eZ9y
                                                      :
                                                          elements not equal
                                                          ( Left
                                                          (ParseError.ParseError
                                                            "expected ',' or '}'"
                                                            27
                                                            "9,\"vXj7CszP\":\"sR569n7Lq3\"}C0eZ9y\n")
                                                      , Left
                                                          (ParseError.ParseError
                                                            "expected ','"
                                                            27
                                                            "9,\"vXj7CszP\":\"sR569n7Lq3\"}C0eZ9y\n")
                                                      )

Be sure to debug.clear-cache in between changes!

@dolio
Copy link
Contributor Author

dolio commented May 18, 2025

Okay, yeah. I actually intentionally made both those changes because I thought the unison one didn't make sense. I also changed the array parser to say it was parsing an array, not an object.

I'll update the unison implementation.

@dolio
Copy link
Contributor Author

dolio commented May 18, 2025

Okay, I added a contribution to @unison/json that updates the implementation with my intentional changes above. I added the new replacement hashes here.

The tests didn't find any other inconsistencies.

@pchiusano
Copy link
Member

Sweet! Thanks @dolio

You can cut a release of @unison/json based on this

@pchiusano pchiusano merged commit f4a9b0b into trunk May 18, 2025
31 checks passed
@pchiusano pchiusano deleted the topic/json-replacement branch May 18, 2025 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants