Skip to content

Remove thread local, replacing with DeserializeSeed #5259

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 2, 2018

Conversation

Mark-Simulacrum
Copy link
Member

@dtolnay -- is this the right way to implement this? It seems very verbose, but I suppose that's to be expected...

r? @alexcrichton

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is on the right track!

if links.is_some() {
return Err(de::Error::duplicate_field("links"));
}
links = Some(map.next_value()?);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write links = map.next_value()? and then use RegistryPackage { ..., links } below without an unwrap_or(None), but this is fine if you prefer the symmetry with the other fields.

where
D: de::Deserializer<'de>,
{
return deserializer.deserialize_map(Visitor(self.0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When implementing DeserializeSeed I typically reuse the same type for the DeserializeSeed impl and the Visitor impl, so this would be deserializer.deserialize_map(self).

where
D: de::Deserializer<'de>,
{
return deserializer.deserialize_seq(Visitor);
return deserializer.deserialize_seq(Visitor(self.0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here, I would use the same struct as both the DeserializeSeed and the Visitor.

drop(slot.borrow_mut().take());
res
})?;
} = RegistryPackageDefault(self.source_id.clone()).deserialize(&mut json_de)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what SourceId is but this can be RegistryPackageDefault(&self.source_id) if clone is slow.

@dtolnay
Copy link
Member

dtolnay commented Mar 29, 2018

Is there a way for us to evaluate how much of a performance improvement this change would be?

@sfackler
Copy link
Member

I seem to remember that someone had a derive for DeserializeSeed - does that actually exist?

@dtolnay
Copy link
Member

dtolnay commented Mar 29, 2018

The serde_state crate has that derive. @Mark-Simulacrum did you get a chance to evaluate whether it would be worth using? I am curious what you think.

@Mark-Simulacrum
Copy link
Member Author

That crate seems to re-export a different interface over Serde's, and I personally found it rather difficult to use in my short attempt. It seemed far less polished than Serde's normal derives, but that could be because of my own flawed expectations. I'd sort of want something like this:

#[derive(Deserialize)]
pub struct RegistryPackage<'a> {
    name: Cow<'a, str>,
    vers: Version,
    #[serde(seed = "SourceId", seed_borrowed)]
    deps: DependencyList,
    features: BTreeMap<String, Vec<String>>,
    cksum: String,
    yanked: Option<bool>,
    links: Option<String>,
}

#[derive(Deserialize)]
#[serde(seed = "SourceId", seed_borrowed)]
struct DependencyList {
    #[serde(deserialize_with = "de_deps")]
    inner: Vec<Dependency>,
}

fn de_deps<'de, D>(deserializer: D, seed: &SourceId) -> Result<Vec<Dependency>, _>;

As far as I can tell, this is difficult/impossible to do with serde_state, as it effectively wants me to use it's own DeserializeState traits, which is non-ideal IMO. I'm also hesitant to add a dependency on it from Cargo as it feels unstable.

However, it's worth noting that I don't think there's a good or easy way to do what I want automatically. What Serde really lacks today is a way to do what I want, but without duplicating all of the fields' deserializers except for the one I want to call next_value_seeded for...

@dtolnay
Copy link
Member

dtolnay commented Mar 29, 2018

Thanks Mark! ^ @Marwes in case this feedback is helpful to you.

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to me. Would it be possible to get an idea of the performance improvement to determine whether all the new code is worth it?

@Mark-Simulacrum
Copy link
Member Author

I'd love to get a benchmark, but I only did this because @alexcrichton told me that this is a relatively perf-critical part of Cargo and I have a vague suspicion that thread locals might be slower than this. If @alexcrichton proposes a benchmark though I'd be happy to run it.

@Marwes
Copy link

Marwes commented Mar 29, 2018

@dtolnay

Thanks Mark! ^ @Marwes in case this feedback is helpful to you.

Thanks for pinging, I jotted down some explanations about why the crate looks like it does below which should hopefully clarify it a bit.

@Mark-Simulacrum

That crate seems to re-export a different interface over Serde's, and I personally found it rather difficult to use in my short attempt.

Sorry to hear that. I implemented the whole thing for https://github.com/gluon-lang/gluon where it serializes a fairly huge object graph (across multiple crates that need different states). After I got it working I were a bit exhausted and largely left the crate as-is. It has been working without problems since so I never revisited it.

It seemed far less polished than Serde's normal derives, but that could be because of my own flawed expectations. I'd sort of want something like this:

Yeah, it is a bit rough for the above mentioned reasons. The actual derive implementation itself is actually just a fork of serde_derive with the state attributes added on top. I try to keep it in sync with the real serde_derive but it was a while since last time.

Writing the example you wrote about would be something like this with serde_state (haven't tested it)

#[derive(DeserializeState)]
#[serde(de_parameters = "'b")]
#[serde(state = "&'b SourceId")]
pub struct RegistryPackage<'a> {
    name: Cow<'a, str>,
    vers: Version,
    #[serde(state)]
    deps: DependencyList,
    features: BTreeMap<String, Vec<String>>,
    cksum: String,
    yanked: Option<bool>,
    links: Option<String>,
}

#[derive(Deserialize)]
#[serde(de_parameters = "'b")]
#[serde(state = "&'b SourceId")]
struct DependencyList {
    #[serde(deserialize_state_with = "de_deps")]
    inner: Vec<Dependency>,
}

fn de_deps<'de, D>(deserializer: D, seed: &SourceId) -> Result<Vec<Dependency>, _>;

I think that is fairly similar to what you had in mind, just a bit more explicit in places? It just moves the seed type specification to the struct (having it on the field causes problems when we try to have multiple different seeds, by having it on the struct it can only be specified once).

As far as I can tell, this is difficult/impossible to do with serde_state, as it effectively wants me to use it's own DeserializeState traits, which is non-ideal IMO.

So I originally built the crate on top of the regular DeserializeSeed and I did have a working implementation of that at a point. However, what I found was that it did not scale when using it on multiple types. Since DeserializeSeed has Value as an associated type one must define a new Seed type for every type derives seeded (de)serialization.

This seed type can of course be generated but if that is done, then there is no way for the derive to refer to the seeds that it's fields has. This can sort of be solved by introducing another trait.

trait DeserializeStateAssoc {
    type Seed: DeserializeSeed<Value = Self>;
}

impl DeserializeStateAssoc for Value {
    type Seed = ValueSeed;
}

This sort of works (I tried it) but falls apart as there is no way for the seed type to have type parameters that do not exist on the Value type.

impl<T> DeserializeStateAssoc for Value {
    type Seed = ValueSeed<T>;
}

A parameter on the trait could maybe solve that issue, but when I tried it I eventually ran into lifetime issues or coherence issues. (I don't quite remember the exact reason this failed, it kinda seems like it would work looking at it now.)

impl<T> DeserializeStateAssoc<T> for Value {
    type Seed = ValueSeed<T>;
}

So eventually I resigned myself to adding a new set of traits which had the exact properties that were needed.

FWIW, it is possible to use (De)SerializeState implementations in seeded (de)serialization using the [Seed](https://docs.rs/serde_state/0.4.3/serde_state/de/struct.Seed.html] and Seeded types (and these types are used in the generated code to be able to use serde's the _seed methods) .

@bors
Copy link
Contributor

bors commented Mar 29, 2018

☔ The latest upstream changes (presumably #5261) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Copy link
Member

Thanks @Mark-Simulacrum! This is indeed perf-sensitive in the sense that we parse JSON and registries on cargo build when you've got a lock file, so the bigger the lock file the more we parse on all builds. AFAIK thread locals are quite speedy and don't cause a slowdown here, but I have nothing against removing them!

My "benchmark" has typically been to take Servo's checkout and do something like cargo fetch which only generates a lock file and doesn't do any builds. That, run repeatedly, should be very quick (although it isn't always...), but can also be a good before/after test. (Servo has tons of crates.io crates in the lock file)

@Mark-Simulacrum
Copy link
Member Author

Local benchmarks seem to show no performance difference.

@alexcrichton
Copy link
Member

Ok no worries! I'm fine removing the thread local with this regardless, so let's...

@bors: r+

@bors
Copy link
Contributor

bors commented Apr 2, 2018

📌 Commit 0f362cf has been approved by alexcrichton

bors added a commit that referenced this pull request Apr 2, 2018
Remove thread local, replacing with DeserializeSeed

@dtolnay -- is this the right way to implement this? It seems *very* verbose, but I suppose that's to be expected...

r? @alexcrichton
@bors
Copy link
Contributor

bors commented Apr 2, 2018

⌛ Testing commit 0f362cf with merge 85671e7...

@bors
Copy link
Contributor

bors commented Apr 2, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 85671e7 to master...

@bors bors merged commit 0f362cf into rust-lang:master Apr 2, 2018
@ehuss ehuss added this to the 1.27.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants