Implement write_serializable_content on element writer #508

Closed
wants to merge 4 commits into from
4 changes: 2 additions & 2 deletions src/se/content.rs
@@ -668,7 +668,7 @@ pub(super) mod tests {
let ser = ContentSerializer {
writer: String::new(),
level: QuoteLevel::Full,
- indent: Indent::Owned(Indentation::new(b' ', 2)),
+ indent: Indent::Owned(Indentation::new(b' ', 2, 0)),
write_indent: false,
};

@@ -688,7 +688,7 @@ pub(super) mod tests {
let ser = ContentSerializer {
writer: &mut buffer,
level: QuoteLevel::Full,
- indent: Indent::Owned(Indentation::new(b' ', 2)),
+ indent: Indent::Owned(Indentation::new(b' ', 2, 0)),
write_indent: false,
};

4 changes: 2 additions & 2 deletions src/se/element.rs
@@ -1486,7 +1486,7 @@ mod tests {
ser: ContentSerializer {
writer: String::new(),
level: QuoteLevel::Full,
- indent: Indent::Owned(Indentation::new(b' ', 2)),
+ indent: Indent::Owned(Indentation::new(b' ', 2, 0)),
write_indent: false,
},
key: XmlName("root"),
@@ -1509,7 +1509,7 @@ mod tests {
ser: ContentSerializer {
writer: &mut buffer,
level: QuoteLevel::Full,
- indent: Indent::Owned(Indentation::new(b' ', 2)),
+ indent: Indent::Owned(Indentation::new(b' ', 2, 0)),
write_indent: false,
},
key: XmlName("root"),
12 changes: 11 additions & 1 deletion src/se/mod.rs
@@ -366,7 +366,17 @@ impl<'r, W: Write> Serializer<'r, W> {

/// Configure indent for a serializer
pub fn indent(&mut self, indent_char: char, indent_size: usize) -> &mut Self {
- self.ser.indent = Indent::Owned(Indentation::new(indent_char as u8, indent_size));
+ self.indent_with_len(indent_char, indent_size, 0)
}

+ /// Set initial indent level for a serializer
+ pub fn indent_with_len(
+ &mut self,
+ indent_char: char,
+ indent_size: usize,
+ indents_len: usize,
+ ) -> &mut Self {
+ self.ser.indent = Indent::Owned(Indentation::new(indent_char as u8, indent_size, indents_len));
+ self
+ }

95 changes: 92 additions & 3 deletions src/writer.rs
@@ -6,6 +6,9 @@ use crate::encoding::UTF8_BOM;
use crate::errors::{Error, Result};
use crate::events::{attributes::Attribute, BytesCData, BytesStart, BytesText, Event};

+ #[cfg(feature = "serialize")]
+ use {crate::de::DeError, serde::Serialize};
+
/// XML writer. Writes XML [`Event`]s to a [`std::io::Write`] implementor.
///
/// # Examples
@@ -72,7 +75,7 @@ impl<W: Write> Writer<W> {
pub fn new_with_indent(inner: W, indent_char: u8, indent_size: usize) -> Writer<W> {
Writer {
writer: inner,
- indent: Some(Indentation::new(indent_char, indent_size)),
+ indent: Some(Indentation::new(indent_char, indent_size, 0)),
}
}

@@ -330,6 +333,49 @@ impl<'a, W: Write> ElementWriter<'a, W> {
.write_event(Event::End(self.start_tag.to_end()))?;
Ok(self.writer)
}
+
+ /// Serialize an arbitrary value inside the current element
Collaborator

I think it would be valuable to add a test as a doc example here. Pay particular attention to which tag name the type will be serialized with.

+ #[cfg(feature = "serialize")]
+ pub fn write_serializable_content<T: Serialize>(
+ self,
+ content: T,
+ ) -> std::result::Result<&'a mut Writer<W>, DeError> {
+ use crate::se::Serializer;
+
+ self.writer
+ .write_event(Event::Start(self.start_tag.borrow()))?;
+ self.writer.write_indent()?;
+
+ let indent = self.writer.indent.clone();
+ let mut serializer = Serializer::new(ToFmtWrite(self.writer.inner()));
+
+ if let Some(indent) = indent {
+ serializer.indent_with_len(
+ indent.indent_char as char,
+ indent.indent_size,
+ indent.indents_len,
+ );
+ }
+
+ content.serialize(serializer)?;
+
+ self.writer
+ .write_event(Event::End(self.start_tag.to_end()))?;
+ Ok(self.writer)
+ }
+ }
+
+ #[cfg(feature = "serialize")]
+ struct ToFmtWrite<T>(pub T);
+
+ #[cfg(feature = "serialize")]
+ impl<T> std::fmt::Write for ToFmtWrite<T>
+ where
+ T: std::io::Write,
+ {
+ fn write_str(&mut self, s: &str) -> std::fmt::Result {
+ self.0.write_all(s.as_bytes()).map_err(|_| std::fmt::Error)
+ }
}

#[derive(Clone)]
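In this hunk `Serializer::new` is handed `ToFmtWrite(self.writer.inner())`, which suggests the serializer writes to a `std::fmt::Write` while `Writer` wraps a `std::io::Write`. Below is a std-only sketch of the same bridging pattern, with a small driver showing the success path and how an I/O failure surfaces. `write_val` and `AlwaysFails` are hypothetical helpers introduced only for illustration.

```rust
use std::fmt;
use std::io;

/// Bridges an `io::Write` sink to the `fmt::Write` trait, in the same
/// spirit as the PR's private `ToFmtWrite` adapter.
struct ToFmtWrite<T>(T);

impl<T: io::Write> fmt::Write for ToFmtWrite<T> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // `fmt::Error` carries no payload, so the io::Error detail is dropped here.
        self.0.write_all(s.as_bytes()).map_err(|_| fmt::Error)
    }
}

/// Hypothetical stand-in for a serializer that only knows `fmt::Write`.
fn write_val<W: fmt::Write>(mut out: W, value: &str) -> fmt::Result {
    write!(out, "<val>{}</val>", value)
}

/// Hypothetical sink whose every write fails, to show error propagation.
struct AlwaysFails;

impl io::Write for AlwaysFails {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "sink closed"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    let mut buffer: Vec<u8> = Vec::new();
    // `&mut Vec<u8>` implements `io::Write`, so it can be wrapped directly.
    write_val(ToFmtWrite(&mut buffer), "foo").expect("writing to a Vec cannot fail");
    println!("{}", String::from_utf8(buffer).unwrap());

    // The underlying io::Error surfaces only as a bare fmt::Error.
    assert!(write_val(ToFmtWrite(AlwaysFails), "foo").is_err());
}
```

Because every `io::Error` collapses into the unit `fmt::Error`, a caller of `write_serializable_content` would likely see only a generic formatting failure rather than the original I/O cause.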
@@ -342,13 +388,13 @@ pub(crate) struct Indentation {
}

impl Indentation {
- pub fn new(indent_char: u8, indent_size: usize) -> Self {
+ pub fn new(indent_char: u8, indent_size: usize, indents_len: usize) -> Self {
Self {
should_line_break: false,
indent_char,
indent_size,
indents: vec![indent_char; 128],
- indents_len: 0,
+ indents_len,
}
}
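The extra `indents_len` argument lets content serialized inside an already-open element start at the writer's current indentation instead of at column zero (`write_serializable_content` passes `indent.indents_len` through `indent_with_len`). The std-only model below is a sketch of that idea, not quick-xml's `Indentation`, which also keeps a pre-filled `indents` buffer and a `should_line_break` flag. It assumes, as the diff suggests, that `indents_len` counts indent characters rather than nesting levels; `IndentState`, `grow`, `shrink`, `prefix` and `render_child` are hypothetical names.

```rust
/// Hypothetical model of the state behind `Indentation::new(char, size, indents_len)`:
/// `indents_len` counts indent characters already in effect, not nesting levels.
#[derive(Clone, Debug)]
struct IndentState {
    indent_char: char,
    indent_size: usize,
    indents_len: usize,
}

impl IndentState {
    fn new(indent_char: char, indent_size: usize, indents_len: usize) -> Self {
        Self { indent_char, indent_size, indents_len }
    }

    /// Entering a child element adds one step of indentation.
    fn grow(&mut self) {
        self.indents_len += self.indent_size;
    }

    /// Leaving a child element removes one step, never going below zero.
    fn shrink(&mut self) {
        self.indents_len = self.indents_len.saturating_sub(self.indent_size);
    }

    /// Prefix written after each line break.
    fn prefix(&self) -> String {
        std::iter::repeat(self.indent_char).take(self.indents_len).collect()
    }
}

/// Writes a fixed `<bar>` element with one child, each tag on a new line.
fn render_child(indent: &mut IndentState) -> String {
    let mut out = String::new();
    out.push('\n');
    out.push_str(&indent.prefix());
    out.push_str("<bar>");
    indent.grow();
    out.push('\n');
    out.push_str(&indent.prefix());
    out.push_str("<baz>42</baz>");
    indent.shrink();
    out.push('\n');
    out.push_str(&indent.prefix());
    out.push_str("</bar>");
    out
}

fn main() {
    // Starting from 4 characters places the content one level inside `<paired>`.
    print!("<paired>{}\n</paired>\n", render_child(&mut IndentState::new(' ', 4, 4)));
}
```

With an initial length of 0 the same call would put `<bar>` flush against the left margin, which is the misalignment the new parameter avoids.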

@@ -613,4 +659,47 @@ mod indentation {
</outer>"#
);
}
+
+ #[cfg(feature = "serialize")]
+ #[test]
+ fn element_writer_serialize() {
+ #[derive(Serialize)]
+ struct Foo {
+ bar: Bar,
+ val: String,
+ }
+
+ #[derive(Serialize)]
+ struct Bar {
+ baz: usize,
+ bat: usize,
+ }
+
+ let mut buffer = Vec::new();
+ let mut writer = Writer::new_with_indent(&mut buffer, b' ', 4);
+ let content = Foo {
+ bar: Bar { baz: 42, bat: 43 },
+ val: "foo".to_owned(),
+ };
+
+ writer
+ .create_element("paired")
+ .with_attribute(("attr1", "value1"))
+ .with_attribute(("attr2", "value2"))
+ .write_serializable_content(content)
+ .expect("failure");
+
+ assert_eq!(
+ std::str::from_utf8(&buffer).unwrap(),
+ r#"<paired attr1="value1" attr2="value2">
+ <Foo>
+ <bar>
+ <baz>42</baz>
+ <bat>43</bat>
+ </bar>
+ <val>foo</val>
+ </Foo>
+ </paired>"#
+ );
Comment on lines +694 to +703
Collaborator

I would expect

<paired attr1="value1" attr2="value2">
    <bar>
        <baz>42</baz>
        <bat>43</bat>
    </bar>
    <val>foo</val>
</paired>

here. This also means that you need to test that the attribute lists are merged correctly. It would also be valuable to move the entire test into a doctest for write_serializable_content.

Collaborator

dralley, Nov 15, 2022

@Mingun I would expect it to produce the same XML it would if you were to serialize it in a different context, but I don't think that would be the case if it produced the XML you suggest?

If you were to just serialize a Foo, then I would expect <foo>...inner_contents...</foo>

Contributor Author

Right, I asked myself this question and I can see both use cases being valid. Maybe we should leave this as an option for that function? Although I'm not sure how to tell the serializer to not produce the <foo> tags if we don't want them.
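One way to frame this suggestion is a switch that decides whether the type-name wrapper is written at all. The std-only sketch below hand-writes the XML to show both outcomes side by side; `Wrapper`, `write_in_element` and `write_bar_fields` are hypothetical, and nothing here implies that quick-xml exposes such an option.

```rust
/// Hypothetical switch for the behaviour discussed above; not part of
/// quick-xml's API as shown in this PR.
#[derive(Clone, Copy)]
enum Wrapper {
    /// Wrap the fields in an element named after the type, e.g. `<Bar>`.
    TypeName(&'static str),
    /// Write only the fields, so they become children of the outer element.
    Omit,
}

struct Bar {
    baz: usize,
    bat: usize,
}

/// Writes the fields of `Bar` as child elements (numbers need no escaping).
fn write_bar_fields(bar: &Bar, out: &mut String) {
    out.push_str(&format!("<baz>{}</baz><bat>{}</bat>", bar.baz, bar.bat));
}

/// Writes `bar` inside `<tag>...</tag>`, with or without the type-name wrapper.
fn write_in_element(tag: &str, wrapper: Wrapper, bar: &Bar) -> String {
    let mut out = format!("<{}>", tag);
    match wrapper {
        Wrapper::TypeName(name) => {
            out.push_str(&format!("<{}>", name));
            write_bar_fields(bar, &mut out);
            out.push_str(&format!("</{}>", name));
        }
        Wrapper::Omit => write_bar_fields(bar, &mut out),
    }
    out.push_str(&format!("</{}>", tag));
    out
}

fn main() {
    let bar = Bar { baz: 42, bat: 43 };
    println!("{}", write_in_element("paired", Wrapper::TypeName("Bar"), &bar));
    println!("{}", write_in_element("paired", Wrapper::Omit, &bar));
}
```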

Collaborator

dralley, Nov 15, 2022

Assuming it works as it does in your test case and in my description, it just feels extremely niche. The ElementWriter construct only allows you to write one inner value wrapped in a pair of tags, and I can't think of many XML structures where you'd want two levels of nesting like that for only one inner value. The construct is simply too limiting to be broadly useful.

Whereas with the other approach, supporting it only on the writer, you can still accomplish the same thing, and it would be more flexible, if slightly more verbose for that one strange case.

        writer
            .create_element("paired")
            .with_attribute(("attr1", "value1"))
            .with_attribute(("attr2", "value2"))
            .write_inner_content(|writer| {
                writer.write_serializable(&content1).unwrap();
                writer.write_serializable(&content2).unwrap();
                writer.write_serializable(&content3).unwrap();
                // write_inner_content expects the closure to return a Result
                Ok(())
            })?;

If it were to work the way @Mingun describes, then none of this necessarily applies. I'm just not sure I understand why the outer tags wouldn't be included, when on the deserialization side you would expect some struct fields to potentially be sourced from attributes. Shouldn't it work the same way in the opposite direction?
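The closure-based alternative puts several values inside one element. The std-only sketch below models only that call shape: `Element` and `write_value` are hypothetical stand-ins for `create_element(...).write_inner_content(...)` and the proposed `write_serializable`, and the closure returns a `Result` so errors propagate with `?` instead of `unwrap()`.

```rust
use std::fmt::{self, Write};

/// Hypothetical builder with the same call shape as
/// `create_element(..).with_attribute(..).write_inner_content(..)`.
struct Element<'w> {
    out: &'w mut String,
    name: String,
    attrs: Vec<(String, String)>,
}

impl<'w> Element<'w> {
    fn new(out: &'w mut String, name: &str) -> Self {
        Element { out, name: name.to_string(), attrs: Vec::new() }
    }

    fn with_attribute(mut self, (key, value): (&str, &str)) -> Self {
        self.attrs.push((key.to_string(), value.to_string()));
        self
    }

    /// Writes the start tag, lets the closure write any number of children,
    /// then writes the end tag. Attribute values are not escaped in this sketch.
    fn write_inner_content<F>(self, write_children: F) -> fmt::Result
    where
        F: FnOnce(&mut String) -> fmt::Result,
    {
        let Element { out, name, attrs } = self;
        write!(out, "<{}", name)?;
        for (key, value) in &attrs {
            write!(out, " {}=\"{}\"", key, value)?;
        }
        out.push('>');
        write_children(&mut *out)?;
        write!(out, "</{}>", name)
    }
}

/// Stand-in for a per-value serializer call such as the proposed `write_serializable`.
fn write_value(out: &mut String, name: &str, value: u32) -> fmt::Result {
    write!(out, "<{}>{}</{}>", name, value, name)
}

fn main() -> fmt::Result {
    let mut out = String::new();
    Element::new(&mut out, "paired")
        .with_attribute(("attr1", "value1"))
        .write_inner_content(|w| {
            // Errors propagate with `?` instead of being unwrapped.
            write_value(w, "content1", 1)?;
            write_value(w, "content2", 2)?;
            write_value(w, "content3", 3)
        })?;
    println!("{}", out);
    Ok(())
}
```

The design point is that the element owns only its tag and attributes, while the closure decides how many values go inside, which is the flexibility argued for above.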

Collaborator

@dralley, I proceed from the following considerations:

  1. We expect that in <foo>...</foo>, foo could be mapped to a field of a struct. This is a basic expectation.
  2. We also expect to be able to get the inner content of <foo>...</foo>, so that the result is a type whose content is mapped to the ... in our XML. This is the second basic expectation.
  3. OK, so we expect that
    <...>
      <!-- somewhere deep inside some XML... -->
      <foo>...</foo>
    </...>
    
    is mapped to
    struct {
      foo: ???,
    }
    
  4. To understand what ??? is, let's expand our example:
    ...
      <foo>
        <bar>...</bar>
      </foo>
    ...
    
    To access the content of bar we can use the path foo.bar, which is naturally expressed with nested structs:
    struct {
      foo: FooType,
    }
    struct FooType {
      bar: ???,
    }
    
  5. Because we want a mapping that can be applied to XML of any nesting depth, bar should obviously have a type BarType similar to FooType: our rules should be recursive.
  6. Now take a closer look at BarType and how it is represented in XML:
    <bar>...</bar>
    
    struct BarType {
      /*...*/,
    }
    
    We may notice that struct BarType corresponds to the <bar> element. Remember that this XML can be described with the following XSD:
    <xs:schema>
      <xs:complexType name="BarType">
        <xs:sequence>
           <!--...-->
        </xs:sequence>
      </xs:complexType>
    
      <xs:element name="bar" type="BarType"/>
    </xs:schema>
    
    It is quite obvious that Rust's BarType can be mapped to the XSD's BarType.
  7. The example above raises an interesting question: where is bar itself? XML answers that bar is part of the document. For example, XmlBeans implements this concept by generating a class BarDocument, which is different from the class BarType that represents the type of the <bar> element and holds the actual data.
  8. serde is closely tied to the JSON format, and JSON has no dedicated concept of a document: every piece of JSON is JSON. XML is the opposite: not every piece of XML is a document (but every document is a piece of XML). XmlBeans calls such pieces fragments. So JSON, in XML terms, is a fragment.
  9. Returning to Rust, we have to admit that Rust types model fragments. We could introduce special methods or types for working with documents (we can now say that a document is an object that holds the root element name and technical XML details such as namespace definitions, the DTD and so on), but they would live outside the serde API.
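The chain of points above can be condensed into a small model: a type writes only its content, the element name comes from whoever holds the value (a struct field), and only a separate document object supplies the root name. This is a std-only sketch; `Fragment`, `write_element` and `Document` are illustrative names, not quick-xml or serde API.

```rust
/// A Rust type models an XML *fragment*: it writes its content but does
/// not know the name of the element that will contain it.
trait Fragment {
    fn write_content(&self, out: &mut String);
}

struct BarType {
    baz: u32,
}

struct FooType {
    bar: BarType,
}

impl Fragment for BarType {
    fn write_content(&self, out: &mut String) {
        out.push_str(&format!("<baz>{}</baz>", self.baz));
    }
}

impl Fragment for FooType {
    fn write_content(&self, out: &mut String) {
        // The element name comes from the field `bar`, not from `BarType`.
        write_element("bar", &self.bar, out);
    }
}

/// The container supplies the element name (here, a struct field name).
fn write_element(name: &str, value: &dyn Fragment, out: &mut String) {
    out.push_str(&format!("<{}>", name));
    value.write_content(out);
    out.push_str(&format!("</{}>", name));
}

/// A document: a root element name plus a fragment. Namespace
/// declarations, a DTD and similar details would also live here.
struct Document<'a> {
    root: &'a str,
    content: &'a dyn Fragment,
}

impl Document<'_> {
    fn to_xml(&self) -> String {
        let mut out = String::new();
        write_element(self.root, self.content, &mut out);
        out
    }
}

fn main() {
    let foo = FooType { bar: BarType { baz: 42 } };
    let mut fragment = String::new();
    foo.write_content(&mut fragment);
    println!("fragment: {}", fragment);
    println!("document: {}", Document { root: "foo", content: &foo }.to_xml());
}
```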

I think it is now clear why I believe that serializing a Rust type should not generate an element name: that name is not part of an XML fragment (a type). There is, however, one significant semi-exception to this rule: an enum at the top level is convenient to represent as a document. But note that even in that case the (root) element name is mapped to the variant name, not to the enum's type name.
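The enum semi-exception can be pictured as follows. serde does pass the variant name to a serializer (the `variant` argument of methods such as `serialize_unit_variant` and `serialize_newtype_variant`), but this sketch hand-writes the XML rather than implementing `serde::Serializer`. `Message` and `to_document` are hypothetical, and rendering a unit variant as an empty element is an assumption of the sketch.

```rust
/// Hypothetical top-level enum used as a document: the root element is named
/// after the variant, never after the enum type (`Message`).
enum Message {
    Ping,
    Data(String),
}

impl Message {
    fn variant_name(&self) -> &'static str {
        match self {
            Message::Ping => "Ping",
            Message::Data(_) => "Data",
        }
    }

    /// Text content is inserted without escaping in this sketch.
    fn to_document(&self) -> String {
        let root = self.variant_name();
        match self {
            // A unit variant has no content, so it becomes an empty element.
            Message::Ping => format!("<{}/>", root),
            Message::Data(text) => format!("<{}>{}</{}>", root, text, root),
        }
    }
}

fn main() {
    println!("{}", Message::Ping.to_document());
    println!("{}", Message::Data("payload".to_string()).to_document());
}
```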

I plan to write documentation about that in #369.


I must say, though, that there are several subtle points that may not be obvious, for example the serialization of sequences and enums. It is best to write tests and work out what result we expect in each case. Will the chosen mapping be consistent?


> If you were to just serialize a Foo, then I would expect <foo>...inner_contents...</foo>

This depends on whether Foo represents the root element type or not. In the latter case, you most likely expect the element to be named after the struct field of type Foo, as I showed at the beginning of this comment.

The Deserializer and Serializer work as I described. For convenience, the Foo type can represent both the data type and the document type at the same time: the root tag can be derived from the type name, and this is the only place in the serializer where the type name is used. The deserializer never uses the type name.

Collaborator

dralley, Jun 6, 2023

@Mingun So let's say Foo had some fields mapped to attributes. How does that work? If the Foo element is erased and only its fields are serialized, any fields mapped to attributes will have nowhere to be serialized to.

I can appreciate that there are limitations with the serde model which may limit our ability to achieve certain outcomes, but I would rather not do something than to do it in a way that is incorrect or illogical.

Collaborator

Mingun, Jun 7, 2023

#[derive(Serialize)]
struct Foo {
    #[serde(rename = "@attribute")]
    attribute: &'static str,

    element: Bar,
    list: Vec<&'static str>,

    #[serde(rename = "$text")]
    text: &'static str,
}
let content = Foo {
    attribute: "attribute",
    element: Bar { baz: 42, bat: 43 },
    list: vec!["first element", "second element"],
    text: "text",
};

writer
    .create_element("paired")
    .with_attribute(("attr1", "value1"))
    .with_attribute(("attr2", "value2"))
    .write_serializable_content(content)
    .expect("failure");

I would expect (indentation depends on settings):

<paired attr1="value1" attr2="value2" attribute="attribute">
  <element>
    <baz>42</baz>
    <bat>43</bat>
  </element>
  <list>first element</list>
  <list>second element</list>
  text
</paired>

+ }
}
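The expected output in the last comment answers the attribute question: the content's `@`-prefixed fields are appended to the outer element's start tag, and everything else becomes its children. Below is a std-only sketch of that merge, as one possible shape for the attribute-merging test requested earlier. `Parts`, `Foo::parts` and `write_merged` are hypothetical; the `element: Bar` field and indentation are left out to keep it short, and the result is not claimed to match what quick-xml produces.

```rust
/// What a serialized value contributes to its containing element, split the
/// way the comment above splits `Foo`'s fields.
struct Parts {
    /// Fields renamed with an `@` prefix become attributes.
    attributes: Vec<(&'static str, String)>,
    /// Remaining fields become child elements or text.
    children: String,
}

/// Reduced version of the `Foo` above (the `element: Bar` field is omitted).
struct Foo {
    attribute: &'static str,
    list: Vec<&'static str>,
    text: &'static str,
}

impl Foo {
    fn parts(&self) -> Parts {
        let mut children = String::new();
        for item in &self.list {
            // A sequence field repeats its element name once per item.
            children.push_str(&format!("<list>{}</list>", item));
        }
        // The `$text` field is written as plain text content.
        children.push_str(self.text);
        Parts {
            attributes: vec![("attribute", self.attribute.to_string())],
            children,
        }
    }
}

/// Writes one start tag holding the element's own attributes followed by the
/// content's attributes. Duplicate names are not detected and values are not
/// escaped in this sketch.
fn write_merged(tag: &str, own: &[(&str, &str)], parts: &Parts) -> String {
    let mut out = format!("<{}", tag);
    for (key, value) in own {
        out.push_str(&format!(" {}=\"{}\"", key, value));
    }
    for (key, value) in &parts.attributes {
        out.push_str(&format!(" {}=\"{}\"", key, value));
    }
    out.push('>');
    out.push_str(&parts.children);
    out.push_str(&format!("</{}>", tag));
    out
}

fn main() {
    let foo = Foo {
        attribute: "attribute",
        list: vec!["first element", "second element"],
        text: "text",
    };
    let own = [("attr1", "value1"), ("attr2", "value2")];
    println!("{}", write_merged("paired", &own, &foo.parts()));
}
```

A real implementation would also have to decide what happens when the outer element and the content declare the same attribute name, which is exactly the case such a test should pin down.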