bodyTree vs contentTree #84

adgad · 2025-05-14T16:08:56Z

Content Tree

In Content Tree, we have represented the output as a single root node, which (for now) has a body property. The intention was to allow for further expansion as we model the rendering data for other parts of the article - such as toppers (#73), maybe other future things.

interface Root extends Node {
	type: "root"
	body: Body
}

interface Body extends Parent {
	type: "body"
	version: number
	children: BodyBlock[]
}

The build currently outputs three different JSON schemas:

content-tree.schema.json - this expects the Root node with all external properties (e.g. ContentTree.full.Recommended)
transit-tree.schema.json - this expects the Root node without external properties (e.g. ContentTree.transit.Recommended)
body-tree.schema.json - this expects the Body node without external properties (e.g. ContentTree.transit.Recommended)

C&M representation

current state

Note

This section assumes a future topper is part of content-tree, to illustrate the point.

Currently in C&M we have a bodyTree field that validates against the transit-tree schema. This means the data published would look like this:

{
    "id": "https://api.ft.com/content/1234",
    "bodyTree": {
        "type": "root",
        "body": { "type": "body", "children": [...] },
        "topper": { "type": "topper", "headline": "blah", "asset": {} }
    }
}

This is somewhat counter-intuitive to someone reading the API: the field is called bodyTree, but it holds a root node that wraps the body. It also means that if/when we add a topper to the root, it would have to appear inside bodyTree, which wouldn't make sense either.

option 1

Content Tree continues to have the root node with body and topper properties
bodyTree field validates against the body-tree schema
topperTree field validates against a future topper-tree schema

{
      "id": "https://api.ft.com/content/1234",
      "bodyTree": { "type": "body", "children": [...] },
      "topperTree": { "type": "topper", "headline": "blah", "asset": {} }
}

This was our original intention when planning to bring content-tree into C&M. The idea was that bodyTree would be analogous to bodyXML, which would lead to a more straightforward migration. topperTree doesn't necessarily have an equivalent in the existing API, so it does not carry the same consideration. Keeping them as separate fields in C&M may also make things simpler for use cases where a consumer does not need the entire content (e.g. RSS feeds only need the body and wouldn't really need a topper).

The downside is that the content-tree root becomes somewhat irrelevant, as nothing would use it in reality.

It also means FT.com consumers need to validate multiple separate fields.
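As a rough illustration of option 1, here is a minimal TypeScript sketch. The types are simplified stand-ins for the content-tree definitions, and `Topper`, `ContentOption1` and `splitRoot` are hypothetical names made up for this example (no topper schema exists yet). It shows how the root stops mattering once the body and topper are published as separate fields.

```typescript
// Simplified stand-ins for the content-tree types (assumptions, not the real spec).
interface Body {
  type: "body";
  version: number;
  children: unknown[];
}

// Hypothetical: a topper node is not yet modelled in content-tree.
interface Topper {
  type: "topper";
  headline: string;
}

interface Root {
  type: "root";
  body: Body;
  topper?: Topper;
}

// Hypothetical shape of a C&M response under option 1.
interface ContentOption1 {
  id: string;
  bodyTree: Body;
  topperTree?: Topper;
}

// Split a root node into the separate fields option 1 would publish.
// The root wrapper itself is discarded.
function splitRoot(id: string, root: Root): ContentOption1 {
  const content: ContentOption1 = { id, bodyTree: root.body };
  if (root.topper !== undefined) {
    content.topperTree = root.topper;
  }
  return content;
}

const option1Content = splitRoot("https://api.ft.com/content/1234", {
  type: "root",
  body: { type: "body", version: 1, children: [] },
  topper: { type: "topper", headline: "blah" },
});
```

A consumer that needs both parts would then validate `bodyTree` and `topperTree` separately, which is the multiple-field validation cost mentioned above.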

option 2

Content Tree continues to have the root node with body and topper properties
contentTree field validates against the transit-tree schema

{
    "id": "https://api.ft.com/content/1234",
    "contentTree": {
        "type": "root",
        "body": { "type": "body", "children": [...] },
        "topper": { "type": "topper", "headline": "blah", "asset": {} }
    }
}

Users that require just the body would need to access contentTree.body.

This option does mean there is a clearer relationship between the content-tree spec and the C&M field. It also means we don't need to generate separate schemas just for the body and topper.

It is perhaps slightly more involved for consumers expecting just one or the other, but maybe not in a bad way?

It might also be nicer if we end up adding several other properties to the root. An off-the-cuff example might be something like colourScheme or pageLayout, which could be properties affecting both bodies and toppers. Having a single field means we don't need to add them individually to the C&M schema.
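For comparison, a minimal TypeScript sketch of option 2, again using simplified stand-in types. `ContentOption2`, `getBody` and the `"dark"` value are hypothetical and only there for illustration; `colourScheme` is the off-the-cuff property from the paragraph above, not something in the spec.

```typescript
// Simplified stand-ins for the content-tree types (assumptions, not the real spec).
interface Body {
  type: "body";
  version: number;
  children: unknown[];
}

// Hypothetical: a topper node is not yet modelled in content-tree.
interface Topper {
  type: "topper";
  headline: string;
}

interface Root {
  type: "root";
  body: Body;
  topper?: Topper;
  // Hypothetical root-level property; it would arrive via contentTree
  // without any change to the C&M schema.
  colourScheme?: string;
}

// Hypothetical shape of a C&M response under option 2.
interface ContentOption2 {
  id: string;
  contentTree: Root;
}

// A body-only consumer (for example an RSS feed) reads through the root.
function getBody(content: ContentOption2): Body {
  return content.contentTree.body;
}

const option2Content: ContentOption2 = {
  id: "https://api.ft.com/content/1234",
  contentTree: {
    type: "root",
    body: { type: "body", version: 1, children: [] },
    topper: { type: "topper", headline: "blah" },
    colourScheme: "dark",
  },
};

const bodyOnly = getBody(option2Content);
```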


apaleslimghost · 2025-05-14T16:18:41Z

what if the entire response is the root node? would that work? i think it makes most sense to me conceptually, and it seems like it would be easiest in terms of validation

{
    "type": "root",
    "id": "https://api.ft.com/content/1234",
    "body": { "type": "body", "children": [...] },
    "topper": { "type": "topper", "headline": "blah", "asset": {} }
}
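To make the validation point concrete, here is a minimal TypeScript sketch of checking a whole response as a root node. `RootResponse` and `isRootResponse` are hypothetical names, and the hand-written check only stands in for real validation, which would more likely run against a JSON schema.

```typescript
// Hypothetical response shape if the entire API response were the root node.
interface RootResponse {
  type: "root";
  id: string;
  body: { type: "body"; children: unknown[] };
  topper?: { type: "topper"; headline: string };
}

// Minimal runtime check (illustrative only): one pass over one object,
// rather than validating several separate fields.
function isRootResponse(value: unknown): value is RootResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v["type"] !== "root" || typeof v["id"] !== "string") return false;
  const body = v["body"] as Record<string, unknown> | null | undefined;
  return (
    typeof body === "object" &&
    body !== null &&
    body["type"] === "body" &&
    Array.isArray(body["children"])
  );
}
```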

epavlova · 2025-05-15T15:03:03Z

After some discussion in the team around the two options, we’d prefer to go with option 1 — having one field for bodyTree and a separate field for topperTree. Some thoughts we have:

There are a lot of places in the C&M platform which work specifically with the body part of the content, and they would need some tweaking. Those tweaks amount to transforming published field values, which we are hoping to avoid as much as possible in the future. Even though they are small, working with the body only is widespread in our codebase.
With option 2 the Go representation of the tree would need to be extended to understand toppers.
With option 2 migrating historical content will be more complex for us.
We’ll need separate validation schemas anyway — for example, fields like the summary in ContentPackage will only use the bodyTree schema.

adgad · 2025-05-16T09:12:52Z

I'd be okay with that, agree it will make the migration easier. Also I hadn't considered other content types

@apaleslimghost's suggestion is interesting, but I think would only work if the new /content-tree api was just for content tree? Currently it's I guess an eventual replacement for /internalcontent - which means we'd have to add all the other fields (annotations, byline, whatever) into the schema. Maybe not a terrible thing?! But would be quite a change, and I dunno how it scales to other content types

adgad closed this as completed May 16, 2025

adgad reopened this May 16, 2025
