Limit to_device EDU size to 65536 #18416

Open

MatMaul wants to merge 11 commits into develop

Conversation

Contributor

@MatMaul commented May 9, 2025

If a set of messages exceeds this limit, the messages are split across several EDUs.

Should fix #17035.

There is currently no official limit for EDUs in the spec, but the consensus seems to be that it would be useful to have one, to avoid this bug by bounding the transaction size.

As a side effect it also limits the size of a single to-device message to a bit less than 65536 bytes.

This should probably be added to the spec similarly to the message size limit.
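For illustration, a minimal sketch of the splitting idea (simplified: it ignores the message_id and opentracing context fields and the per-device nesting, and the helper name is hypothetical, not the actual code in this PR):

    from typing import Any, Dict, List

    from canonicaljson import encode_canonical_json

    MAX_EDU_SIZE = 65536


    def build_edu_contents(
        sender_user_id: str,
        message_type: str,
        messages: Dict[str, Any],
    ) -> List[Dict[str, Any]]:
        """Split a recipient -> message map into several EDU `content` dicts so
        that each stays under MAX_EDU_SIZE once canonical-JSON encoded."""

        def new_content() -> Dict[str, Any]:
            return {"sender": sender_user_id, "type": message_type, "messages": {}}

        edu_contents: List[Dict[str, Any]] = []
        current = new_content()
        # Size of the EDU content without any messages in it.
        current_size = len(encode_canonical_json(current))

        for recipient, message in messages.items():
            # Extra bytes this entry adds: drop the braces of the single-entry
            # object (-2) and allow for the separator joining it to its neighbours (+1).
            entry_size = len(encode_canonical_json({recipient: message})) - 2 + 1
            if current["messages"] and current_size + entry_size > MAX_EDU_SIZE:
                edu_contents.append(current)
                current = new_content()
                current_size = len(encode_canonical_json(current))
            current["messages"][recipient] = message
            current_size += entry_size

        edu_contents.append(current)
        return edu_contents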

Pull Request Checklist

@MatMaul marked this pull request as ready for review May 9, 2025 14:38
@MatMaul requested a review from a team as a code owner May 9, 2025 14:38
@@ -28,6 +28,7 @@

# the max size of a (canonical-json-encoded) event
MAX_PDU_SIZE = 65536
MAX_EDU_SIZE = 65536
Contributor

Suggested change
MAX_EDU_SIZE = 65536
# This isn't spec'ed but is our own reasonable default to play nice with Synapse's
# `max_request_size`/`max_request_body_size`. We chose the same as `MAX_PDU_SIZE` as our
# `max_request_body_size` math is currently limited by 200 `MAX_PDU_SIZE` things. The
# spec for a `/federation/v1/send` request sets the limit at 100 EDUs and 50 PDUs,
# which is below that 200 `MAX_PDU_SIZE` limit (`max_request_body_size`).
#
# Allowing oversized EDUs results in failed `/federation/v1/send` transactions (because
# the request overall can overrun the `max_request_body_size`) which are retried over
# and over and prevent other outbound federation traffic from happening.
MAX_EDU_SIZE = 65536

Comment on lines +304 to +307
edu_contents = get_device_message_edu_contents(
    sender_user_id, message_type, messages, context
)
remote_edu_contents[destination] = edu_contents
Contributor

Instead of changing the structure of remote_edu_contents (from a map of destination to a single EDU's content, to a map of destination to a list of EDU contents), could we just call add_messages_to_device_inbox(...) multiple times?

Contributor Author

The multi version should have some performance gains, since it runs in a single transaction and pre-allocates all the stream IDs.

I am fine if we decide to keep it simple and sacrifice some perf for that, but I am not sure it's worth it here since it's not overly complicated.

"type": message_type,
"message_id": random_string(16),
}
# This is the size of the full EDU without any messages and without the opentracing context
Contributor

Why is the BASE_EDU_SIZE calculated without BASE_EDU_CONTENT["org.matrix.opentracing_context"]?

Contributor Author

To my understanding it's some data that helps report to the correct opentracing context throughout the code, and it's stripped out of the payload before the transaction is sent.

Contributor

@MatMaul Thanks for linking the context! We should explain this here in the comment

if current_edu_size + message_entry_size > MAX_EDU_SIZE:
    edu_contents.append(current_edu_content)
    logger.debug(
        "Splitting %d device messages from %s into EDU msgid %s, %d EDUs queued",
Contributor

Suggested change
"Splitting %d device messages from %s into EDU msgid %s, %d EDUs queued",
"Splitting %d to-device messages from %s into EDU (message_id=%s), (total EDUs so far: %d)",


edu_contents = []

current_edu_content: JsonDict = deepcopy(BASE_EDU_CONTENT)
Contributor

Instead of this cloning, perhaps it's easier to understand if we just have a little helper (maybe performs better as well 🤷):

    def create_new_to_device_edu_content() -> JsonDict:
        """Create a new `m.direct_to_device` EDU `content` object with a unique message ID."""
        content = {
            "messages": {},
            "sender": sender_user_id,
            "type": message_type,
            "message_id": random_string(16),
            "org.matrix.opentracing_context": json_encoder.encode(context)
        }
        return content

) -> List[JsonDict]:
    """
    This function takes a dictionary of messages and splits them into several EDUs if needed.

Contributor

Could use a docstring for the args and return.

And some context on why we care to split, similar to how we explain it for MAX_EDU_SIZE above.

Comment on lines +489 to +495
logger.debug(
    "Queuing last %d device messages from %s into EDU msgid %s, %d EDUs queued",
    len(current_edu_content["messages"]),
    sender_user_id,
    current_edu_content["message_id"],
    len(edu_contents),
)
Contributor

Suggested change
logger.debug(
    "Queuing last %d device messages from %s into EDU msgid %s, %d EDUs queued",
    len(current_edu_content["messages"]),
    sender_user_id,
    current_edu_content["message_id"],
    len(edu_contents),
)
logger.debug(
    "Splitting remaining %d device messages from %s into EDU (message_id=%s), (total EDUs so far: %d)",
    len(current_edu_content["messages"]),
    sender_user_id,
    current_edu_content["message_id"],
    len(edu_contents),
)


mock_send_transaction.reset_mock()

# 2 messages, each just big enough to fit in an EDU
Contributor

Suggested change
# 2 messages, each just big enough to fit in an EDU
# 2 messages, each just big enough to fit into their own EDU


self.assertEqual(mock_send_transaction.call_count, 2)

# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes
Contributor

For my own understanding, this happens at

) = await self.queue._get_to_device_message_edus(edu_limit - 10)

It would be good to label this magic value as a constant which we could also cross-reference here.
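For example (the constant names below are hypothetical, purely to illustrate the suggestion; 100 is the per-transaction EDU cap mentioned in the test comment above):

    # The spec caps a /federation/v1/send transaction at 100 EDUs; Synapse keeps a
    # few of those slots back for EDUs other than to-device messages.
    MAX_EDUS_PER_TRANSACTION = 100
    RESERVED_EDUS_FOR_OTHER_PURPOSES = 10

    # ...and the call site quoted above would then read:
    #     ) = await self.queue._get_to_device_message_edus(
    #         edu_limit - RESERVED_EDUS_FOR_OTHER_PURPOSES
    #     )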


for recipient, message in messages.items():
    # We remove 2 for the curly braces and add 1 for the colon
    message_entry_size = len(encode_canonical_json({recipient: message})) - 2 + 1
Member

Drive-by thought: instead of trying to work out the lengths and calculate the number of messages we can add, it might be easier to just generate the EDU and then check the size of it. If it's too big you halve the number of messages and try again.

The common case will be that we don't need to split up the EDU, at the expense of duplicating some work. It feels a bit hacky, but I think it might be a little less brittle?
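A rough sketch of that generate-then-halve approach (illustrative only; the function name is hypothetical, and in practice each resulting EDU would also need its own message_id):

    from typing import Any, Dict, List

    from canonicaljson import encode_canonical_json

    MAX_EDU_SIZE = 65536


    def split_by_halving(
        base_content: Dict[str, Any], messages: Dict[str, Any]
    ) -> List[Dict[str, Any]]:
        """Optimistically build one EDU content; if it encodes too large,
        halve the batch of messages and recurse on each half."""
        content = dict(base_content, messages=messages)
        if len(encode_canonical_json(content)) <= MAX_EDU_SIZE or len(messages) <= 1:
            # Common case: everything fits in a single EDU (a single oversized
            # message would need its own error handling).
            return [content]
        items = list(messages.items())
        mid = len(items) // 2
        return split_by_halving(base_content, dict(items[:mid])) + split_by_halving(
            base_content, dict(items[mid:])
        )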

Contributor Author

I am not sure it will be that much simpler for comprehension TBH.

If we think we can eat the perf cost, the simplest option is probably to call encode_canonical_json on the whole EDU after each added message, and if it's larger than the max, remove that message and start a new EDU.

My calculation tricks are there to avoid doing a full serialization for each added message.
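A sketch of that simpler-but-slower variant (illustrative only, not code from the branch; the function name is hypothetical):

    from typing import Any, Dict, List

    from canonicaljson import encode_canonical_json

    MAX_EDU_SIZE = 65536


    def split_by_reserialising(
        base_content: Dict[str, Any], messages: Dict[str, Any]
    ) -> List[Dict[str, Any]]:
        """Add messages one at a time, re-encoding the whole EDU content after each
        addition; when it grows past MAX_EDU_SIZE, move the last message to a new EDU."""
        edu_contents: List[Dict[str, Any]] = []
        current: Dict[str, Any] = dict(base_content, messages={})
        for recipient, message in messages.items():
            current["messages"][recipient] = message
            too_big = len(encode_canonical_json(current)) > MAX_EDU_SIZE
            if too_big and len(current["messages"]) > 1:
                # Pull the offending message back out, finish this EDU, start a new one.
                del current["messages"][recipient]
                edu_contents.append(current)
                current = dict(base_content, messages={recipient: message})
        edu_contents.append(current)
        return edu_contents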

Contributor Author

And add a special case, tried first, where we try to put everything in one EDU?

I don't know TBH, the idea of splitting in 2 is nice too, but I feel like it is going to be quite annoying to implement and hence not simpler.

Development

Successfully merging this pull request may close these issues.

A long queue of to-device messages can prevent outgoing federation working
3 participants