bodyTree vs contentTree #84

Open
adgad opened this issue May 14, 2025 · 3 comments

@adgad
Collaborator

adgad commented May 14, 2025

Content Tree

In Content Tree, the output is represented as a single root node, which (for now) has only a body property. The intention was to allow for expansion as we model the rendering data for other parts of the article, such as toppers (#73) and possibly other things in future.

interface Root extends Node {
	type: "root"
	body: Body
}
interface Body extends Parent {
	type: "body"
	version: number
	children: BodyBlock[]
}

The build currently outputs three different JSON schemas:

  • content-tree.schema.json - this expects the Root node with all external properties (e.g. ContentTree.full.Recommended)
  • transit-tree.schema.json - this expects the Root node without external properties (e.g. ContentTree.transit.Recommended)
  • body-tree.schema.json - this expects the Body node without external properties (e.g. ContentTree.transit.Recommended)

C&M representation

current state

Note

This section assumes a future topper is part of content-tree, to illustrate the point.

Currently in C&M we have a bodyTree field that validates against the transit-tree schema. This means the data published would look like this:

{
    "id": "https://api.ft.com/content/1234",
    "bodyTree": {
        "type": "root",
        "body": { "type": "body", "children": [...] },
        "topper": { "type": "topper", "headline": "blah", "asset": {} }
    }
}

This is somewhat counter-intuitive to someone reading the API: the field is called bodyTree, but it holds the root node. It also means that if/when we add a topper to the root, it would appear inside bodyTree, which wouldn't make sense either.

option 1

  • Content Tree continues to have the root node with body and topper properties
  • bodyTree field validates against the body-tree schema
  • topperTree field validates against a future topper-tree schema

{
    "id": "https://api.ft.com/content/1234",
    "bodyTree": { "type": "body", "children": [...] },
    "topperTree": { "type": "topper", "headline": "blah", "asset": {} }
}

This was our original intention when planning to bring content-tree into C&M. The idea was that bodyTree would be analogous to bodyXML, which would make for a more straightforward migration. topperTree doesn't necessarily have an equivalent in the existing API, so the same consideration does not apply. Keeping them as separate fields in C&M may also make things simpler for use cases where a consumer does not need the entire content (e.g. RSS feeds only need the body and wouldn't really need a topper).

The downside is that the content-tree root becomes somewhat irrelevant, as nothing would use it in reality.

It also means FT.com consumers need to validate multiple separate fields.
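As a rough sketch of what validating the fields separately could look like for a consumer: the type names, the guard functions and the decision to treat topperTree as optional are all assumptions for illustration, and the structural checks only stand in for real validation against body-tree.schema.json and a future topper-tree schema.

```typescript
// Hypothetical, simplified shapes for illustration; not the real content-tree types.
interface BodyTree {
	type: "body"
	children: unknown[]
}
interface TopperTree {
	type: "topper"
	headline: string
}

function isRecord(value: unknown): value is Record<string, unknown> {
	return typeof value === "object" && value !== null
}

// Structural stand-in for validating against body-tree.schema.json.
function isBodyTree(value: unknown): value is BodyTree {
	return isRecord(value) && value["type"] === "body" && Array.isArray(value["children"])
}

// Structural stand-in for validating against a future topper-tree schema.
function isTopperTree(value: unknown): value is TopperTree {
	return isRecord(value) && value["type"] === "topper" && typeof value["headline"] === "string"
}

// Under option 1 each field is validated on its own, so there is one check per field.
function validateOption1(response: unknown): string[] {
	if (!isRecord(response)) return ["response is not an object"]
	const errors: string[] = []
	if (!isBodyTree(response["bodyTree"])) {
		errors.push("bodyTree is not a valid body")
	}
	// Treating topperTree as optional is an assumption made for this sketch.
	if (response["topperTree"] !== undefined && !isTopperTree(response["topperTree"])) {
		errors.push("topperTree is not a valid topper")
	}
	return errors
}
```

A body-only consumer such as an RSS feed could call isBodyTree on its own and ignore topperTree entirely.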

option 2

  • Content Tree continues to have the root node with body and topper properties
  • contentTree field validates against the transit-tree schema

{
    "id": "https://api.ft.com/content/1234",
    "contentTree": {
        "type": "root",
        "body": { "type": "body", "children": [...] },
        "topper": { "type": "topper", "headline": "blah", "asset": {} }
    }
}

Users that require just the body would need to access contentTree.body.

This option does mean there is a clearer relationship between the content-tree spec and the C&M field, and that we don't need to generate separate schemas just for the body and topper.

It is perhaps slightly more involved for consumers expecting just one or the other, but maybe not in a bad way?

It might also be nicer if we later add several other properties to the root. An off-the-cuff example might be something like colourScheme or pageLayout, which could affect both bodies and toppers. Having a single field means we don't need to add each of them individually to the C&M schema.
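To make the shape of option 2 concrete, here is a minimal sketch. The interface names, the getBody helper and the colourScheme value are assumptions for illustration only; colourScheme is just the off-the-cuff example above, not a real property of the spec.

```typescript
// Hypothetical, simplified shapes for illustration; not the real content-tree types.
interface Body {
	type: "body"
	children: unknown[]
}
interface Topper {
	type: "topper"
	headline: string
}
interface Root {
	type: "root"
	body: Body
	topper?: Topper
	// Illustrative root-level property; not part of the spec.
	colourScheme?: string
}

// The C&M envelope only knows about the single contentTree field,
// so new root properties do not change this type.
interface Option2Response {
	id: string
	contentTree: Root
}

// Consumers that only need the body reach through the root.
function getBody(response: Option2Response): Body {
	return response.contentTree.body
}

const example: Option2Response = {
	id: "https://api.ft.com/content/1234",
	contentTree: {
		type: "root",
		body: { type: "body", children: [] },
		topper: { type: "topper", headline: "blah" },
		colourScheme: "dark",
	},
}

const body = getBody(example)
```

Adding a property like colourScheme only touches Root here; Option2Response stays the same, which is the point made above about not extending the C&M schema field by field.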

@apaleslimghost
Member

what if the entire response is the root node? would that work? i think it makes most sense to me conceptually, and it seems like it would be easiest in terms of validation

{
    "type": "root",
    "id": "https://api.ft.com/content/1234",
    "body": { "type": "body", "children": [...] },
    "topper": { "type": "topper", "headline": "blah", "asset": {} }
}

@epavlova
Contributor

After some discussion in the team around the two options, we’d prefer to go with option 1 — having one field for bodyTree and a separate field for topperTree. Some thoughts we have:

  • There are a lot of places in the C&M platform that work specifically with the body part of the content, and with option 2 they would all need some tweaking. Those tweaks amount to transforming published field values, which is something we are hoping to avoid as much as possible in the future. Each one is small, but working with the body only is widespread in our codebase.
  • With option 2 the Go representation of the tree would need to be extended to understand toppers.
  • With option 2 migrating historical content will be more complex for us.
  • We’ll need separate validation schemas anyway — for example, fields like the summary in ContentPackage will only use the bodyTree schema.
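The last point, reusing one body-level check across content types, could be sketched like this. The function name, the structural check (a stand-in for validating against body-tree.schema.json) and the exact shape of the ContentPackage summary are assumptions for illustration, not the real C&M model.

```typescript
// Structural stand-in for validating a value against body-tree.schema.json.
function looksLikeBodyTree(value: unknown): boolean {
	if (typeof value !== "object" || value === null) return false
	const node = value as Record<string, unknown>
	return node["type"] === "body" && Array.isArray(node["children"])
}

// Hypothetical article: the body lives in a top-level bodyTree field.
const article = {
	id: "https://api.ft.com/content/1234",
	bodyTree: { type: "body", children: [] },
}

// Hypothetical content package: the summary carries its own body tree
// and has no topper, so a body-only schema is all it needs.
const contentPackage = {
	id: "https://api.ft.com/content/5678",
	summary: { bodyTree: { type: "body", children: [] } },
}

// The same check applies in both places.
const articleOk = looksLikeBodyTree(article.bodyTree)
const summaryOk = looksLikeBodyTree(contentPackage.summary.bodyTree)
```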

@adgad
Collaborator Author

adgad commented May 16, 2025

I'd be okay with that, and agree it will make the migration easier. Also, I hadn't considered other content types.

@apaleslimghost's suggestion is interesting, but I think it would only work if the new /content-tree API was just for content tree? Currently it's, I guess, an eventual replacement for /internalcontent, which means we'd have to add all the other fields (annotations, byline, whatever) into the schema. Maybe not a terrible thing?! But it would be quite a change, and I dunno how it scales to other content types.

@adgad adgad closed this as completed May 16, 2025
@adgad adgad reopened this May 16, 2025