Skip to content

Regression? deserialize_u64 (etc.) no longer allow visitors with non-numeric branches #1125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
emk opened this issue Dec 20, 2017 · 5 comments

Comments

@emk
Copy link

emk commented Dec 20, 2017

We make heavy use of serde at work, and it's been a reliable workhorse for a long time. Thank you!

When upgrading from serde 1.0.17 to 1.0.24, our open source bigml bindings hit a regression in how the numeric deserialize_$NUMERIC_TYPE handlers work.

Here's some example code (playground):

extern crate serde;
extern crate serde_json;

use serde::{Deserialize, Deserializer};
use serde::de;
use std::fmt;

struct MyNum(u64);

impl<'de> Deserialize<'de> for MyNum {
    fn deserialize<D>(deserializer: D) -> Result<MyNum, D::Error>
        where D: Deserializer<'de>
    {
        // If we're parsing a binary format, require a u64. If we're parsing
        // JSON or YAML, though, use the visitor to handle other
        // representations.
        deserializer.deserialize_u64(MyVisitor).map(MyNum)
    }
}

struct MyVisitor;

impl<'de> de::Visitor<'de> for MyVisitor {
    type Value = u64;

    fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "an integer or a string containing an integer")
    }

    fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
    where
        E: de::Error, 
    {
        Ok(v)
    }
    
    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
    where
        E: de::Error,
    {
        match v.parse() {
            Ok(i) => Ok(i),
            Err(_) => {
                Err(de::Error::custom("could not parse int"))
            }
        }
    }
}

fn main() {
    let _: MyNum = serde_json::from_str("10")
        .expect("error deserializing number");
    let _: MyNum = serde_json::from_str("\"10\"")
        .expect("error deserializing string containing number");
}

What we expected: Before, this code would parse the second input "10" as a number, using the visitor.

What happens: We get the error:

ErrorImpl { code: Message("invalid type: string \"10\", expected an integer or a string containing an integer"), line: 1, column: 4 }

Probable cause: This code looks suspicious (from the debugger):

   │243         fn deserialize_number<V>(&mut self, visitor: V) -> Result<V::Value>    │
   │244         where                                                                  │
   │245             V: de::Visitor<'de>,                                               │
   │246         {                                                                      │
   │247             let peek = match try!(self.parse_whitespace()) {                   │
   │248                 Some(b) => b,                                                  │
   │249                 None => {                                                      │
   │250                     return Err(self.peek_error(ErrorCode::EofWhileParsingValue)│
   │251                 }                                                              │
   │252             };                                                                 │
   │253                                                                                │
   │254             let value = match peek {                                           │
   │255                 b'-' => {                                                      │
   │256                     self.eat_char();                                           │
   │257                     try!(self.parse_integer(false)).visit(visitor)             │
   │258                 }                                                              │
   │259                 b'0'...b'9' => try!(self.parse_integer(true)).visit(visitor),  │
  >│260                 _ => Err(self.peek_invalid_type(&visitor)),                    │
   │261             };                                                                 │

Here, on line 260, it looks like if we don't see [-0-9], the code immediately gives up without giving the visitor a shot at handle the alternative types.

But if we consult the docs for deserialize_u64, they say:

Hint that the Deserialize type is expecting a u64 value.

In the past, this was treated strictly as a hint, but now it seems to enforce a numeric type.

Thank you for looking into this!

emk added a commit to faradayio/bigml-rs that referenced this issue Dec 20, 2017
This code has absolutely zero reason to be calling `deserialize_i32`,
and doing so triggers serde-rs/serde#1125

Let's not do that.
@Marwes
Copy link
Contributor

Marwes commented Dec 20, 2017

This is the same issue as paritytech/jsonrpc#222 which was caused by serde-rs/json#389. In that case it was easily fixed by calling deserialize_any instead since that crate are only for JSON and so it wasn't really any issue. It is harder to fix for Deserialize impls that should work for multiple formats however.

A hacky way to sort of fix your immediate issue may be to detect https://docs.serde.rs/serde/de/trait.Deserializer.html#method.is_human_readable and switch between _any and _i64 with that.

@dtolnay
Copy link
Member

dtolnay commented Dec 20, 2017

Thanks for the detailed report, and I am sorry about the breakage. This changed in serde_json 1.0.8 for what I believe is a justifiable reason (explained in the link). The workaround is to use deserialize_any which gleans type information from the input data rather than the deserializer hint.

If there is a use case we aren't hitting, I would be open to supporting it better. For example we could add a cfg in serde_json that ~doubles the compile time of every Deserialize impl in exchange for always faithfully calling the right Visitor method when your hint is wrong. Is that something you might want in bigml?

@emk
Copy link
Author

emk commented Dec 20, 2017

Thanks for the detailed report, and I am sorry about the breakage. This changed in serde_json 1.0.8 for what I believe is a justifiable reason (explained in the link). The workaround is to use deserialize_any which gleans type information from the input data rather than the deserializer hint.

Ah, thank you!

I apparently badly misunderstood what "hint" meant in all the docs. I figured that it meant, "When you have no type information (such as for a binary format), here's the type to assume by default. But if you do have type information in the format, call the visitor." And indeed, for a surprising number of combinations, it may still actually work like that?

Anyway, I wrote a lot of code against serde using the assumption that "hint" meant a default, non-binding hint, including:

...and most of this seems to be working fine, last I checked, even though it was supposedly broken in 1.0.8.

Now, the good news is that I can probably fix most of this code in a few hours. But just as a data point from one user of the library, this (personally, subjectively) felt like it was either:

  1. A breaking API change in a patch-level release, or
  2. Documentation that I somehow failed to understand, and that the API didn't protect me against. (It seems a bit odd for deserialize_i64 to take a full-fledged Visitor object as an argument when many combinations of overrides make no sense.)

Maybe it would be worth calling this out in the docs? That certainly would have helped me avoid this mistake. I don't think a code change is necessary here—I can certainly track down and fix most of the affected code without too much hassle, and bigml-rs in particular should have been using deserialize_any from the beginning. But this is now two serde users who've regressed on this change, so maybe it's worth a documentation callout?

Anyway, thank you very much for your followup and your clarifications!

emk added a commit to faradayio/bigml-rs that referenced this issue Dec 20, 2017
The discussion at serde-rs/serde#1125 suggests
that any vistor which might take multiple types should always use
`deserialize_any`. This seems like either an API or a documentation
issue to me, and I'm going to update other libraries as well.
@dtolnay
Copy link
Member

dtolnay commented Dec 25, 2017

I agree that "hint" is poorly explained in the documentation. I filed #1128 to follow up. The word was intended to mean something that is allowed to be ignored, but is not allowed to be wrong. This is the same meaning of "hint" as in "What is the capital city of France? Hint: the name begins with Z."

I also filed #1127 to track your idea of enforcing this in the Deserializer API by having different Deserializer methods take different Visitor traits.

Deserialize impls that require type information from the input should be using deserialize_any.

Going forward, I will avoid making similar changes in other formats except in a breaking release.

@emk
Copy link
Author

emk commented Dec 25, 2017

Great, thank you! I've fixed most of my affected code, and you can close this issue if you want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants