System workqueue: Prevent blocking API calls #87522

Closed
9 changes: 9 additions & 0 deletions doc/kernel/services/threads/workqueue.rst
@@ -103,6 +103,15 @@ operations that are potentially blocking (e.g. taking a semaphore) must be
used with care, since the workqueue cannot process subsequent work items in
its queue until the handler function finishes executing.

.. warning::
Member

Why is this warning about the system workqueue being added here when we already have a section dedicated to the system workqueue that touches on the subject?

Collaborator Author

I did not see that warning. It lacks the warning about deadlocking, which is the crucial one :) I can move it to that section if we decide to continue with this PR :)

Collaborator

@teburd, Apr 2, 2025

Isn't the deadlocking issue true of any work queue, though? There's nothing particularly special about the system work queue other than that it tends to get used by default a lot.

I also think it's worth pointing out a simple scenario where this occurs.

E.g. one work item takes a semaphore that a subsequent work item gives. The work queue is now deadlocked.

I'd also note that blocking calls aren't inherently the issue here; blocking is a possible symptom, not the cause of the deadlock.

A call to i2c_transfer() in a work queue item, for example, is a blocking call and may cause the work queue thread to pend. Just because it blocks doesn't inherently mean there will be a deadlock!
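The scenario in this comment can be modelled on a host machine with a short C sketch. This is a hypothetical model, not Zephyr code: `enum op`, `run_queue()` and `deadlocks()` are names invented for illustration, and the "semaphore" is just a counter. It shows that the deadlock comes from the ordering dependency between items on the same queue, not from blocking as such.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each work item either takes or gives one semaphore. */
enum op { OP_TAKE, OP_GIVE };

/*
 * Run the items in order on a single "work queue thread".
 * Returns how many items completed. A result smaller than n means the
 * thread pended forever in a take, so no later item (including the give
 * that would have released it) ever runs.
 */
size_t run_queue(const enum op *items, size_t n, unsigned int sem_count)
{
	assert(items != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (items[i] == OP_TAKE) {
			if (sem_count == 0) {
				return i; /* handler blocks; queue is stuck */
			}
			sem_count--;
		} else {
			sem_count++;
		}
	}
	return n;
}

/* True if the queue can never finish the given sequence. */
bool deadlocks(const enum op *items, size_t n, unsigned int sem_count)
{
	return run_queue(items, n, sem_count) < n;
}
```

With the semaphore initially unavailable, "take then give" never completes, while "give then take" does. If the give came from outside the queue (as with an I2C completion), the take would block but eventually proceed, which this model does not attempt to represent.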

Collaborator Author

> Isn't the deadlocking issue true of any work queue, though? There's nothing particularly special about the system work queue other than that it tends to get used by default a lot.
>
> I also think it's worth pointing out a simple scenario where this occurs.
>
> E.g. one work item takes a semaphore that a subsequent work item gives. The work queue is now deadlocked.
>
> I'd also note that blocking calls aren't inherently the issue here; blocking is a possible symptom, not the cause of the deadlock.
>
> A call to i2c_transfer() in a work queue item, for example, is a blocking call and may cause the work queue thread to pend. Just because it blocks doesn't inherently mean there will be a deadlock!

The "which is available to any application or kernel code" part is what makes it especially true for the system workqueue: the owner of a dedicated queue knows all the work passed to that queue, so they can prevent deadlocks and manage latencies :)


The system workqueue can not safely be used to perform operations which are
potentially blocking, as there is no guarantee that work items submitted to
it do not depend on subsequent work items in the queue to unblock them.

:kconfig:option:`CONFIG_SYSTEM_WORKQUEUE_NO_BLOCK` enforces that no work
items submitted to the system workqueue perform any blocking operations.

The single argument that is passed to a handler function can be ignored if it
is not required. If the handler function requires additional information about
the work it is to perform, the work item can be embedded in a larger data
13 changes: 13 additions & 0 deletions include/zephyr/kernel.h
@@ -1181,6 +1181,19 @@ void k_thread_time_slice_set(struct k_thread *th, int32_t slice_ticks,
*/
bool k_is_in_isr(void);

/**
* @brief Determine if code is running from system work item
*
* This routine allows the caller to customize its actions, depending on
* whether it is running from a system workqueue item.
*
* @funcprops \isr_ok
*
* @return false if not invoked from a system workqueue item.
* @return true if invoked from a system workqueue item.
*/
bool k_is_in_sys_work(void);

/**
* @brief Determine if code is running in a preemptible thread.
*
8 changes: 8 additions & 0 deletions kernel/Kconfig
@@ -600,6 +600,14 @@ config SYSTEM_WORKQUEUE_NO_YIELD
cooperative and a sequence of work items is expected to complete
without yielding.

config SYSTEM_WORKQUEUE_NO_BLOCK
bool "Select whether system work queue enforces non-blocking work items"
help
Collaborator

Maybe default y if ASSERT, or something similar? This is a cheap check with clear value, so it probably wants to be on any time CONFIG_ASSERT=y.

By default, the system work queue does not enforce work items
passed to it to not perform blocking operations. Selecting this
enforces that blocking operations are not performed by invoking
a kernel oops if such operations are attempted.

endmenu

menu "Barrier Operations"
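The reviewer's suggestion in this hunk could be expressed as the Kconfig fragment below. It is a sketch of the suggestion only: the `default` line is not part of the PR as submitted, and the `help` text is omitted here because it would be unchanged.

```kconfig
config SYSTEM_WORKQUEUE_NO_BLOCK
	bool "Select whether system work queue enforces non-blocking work items"
	# Suggested in review, not in the submitted diff:
	default y if ASSERT
```

With such a default, builds that already enable CONFIG_ASSERT would get the check without opting in separately, and other builds would be unaffected.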
4 changes: 4 additions & 0 deletions kernel/sched.c
@@ -523,6 +523,10 @@ static inline void z_vrfy_k_thread_resume(k_tid_t thread)

static void unready_thread(struct k_thread *thread)
{
if (IS_ENABLED(CONFIG_SYSTEM_WORKQUEUE_NO_BLOCK) && k_is_in_sys_work()) {
k_oops();
Collaborator

This looks wrong to me. "Ready" and "running" aren't the same thing. A thread can be ready but lower priority than _current. Basically: my guess is that this code will oops if you try to k_thread_suspend() a runnable thread out of a work queue item, which would be expected to be legal and work.

You need to add a test for thread == _current at least, but it would probably be better to move this test to reschedule() instead.

Also: probably want a panic here and not an oops. An oops in userspace will kill only the current thread, but a misuse of the system workqueue (which obviously is a kernel thread anyway) is a global failure.

And finally: neither oops nor panic give any feedback to the poor user whose code blew up. Probably wants a printk() here (or to be expressed as an __ASSERT() when available).

}

if (z_is_thread_queued(thread)) {
dequeue_thread(thread);
}
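The review comment in this hunk asks for the guard to fire only when the thread being made unready is the current thread, and for the failure to be reported. A minimal host-side sketch of that condition follows. Everything in it is hypothetical (`struct model_thread`, `no_block_violation()`); in the kernel the trap itself would be a panic or an `__ASSERT()` as the reviewer suggests, not a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct k_thread. */
struct model_thread {
	const char *name;
};

/*
 * Returns true when the guard should trap: enforcement is enabled, the
 * caller is running a system workqueue item, and the thread being made
 * unready is the caller itself. Suspending some other thread from a work
 * item does not block the queue and is therefore not a violation.
 */
bool no_block_violation(bool enforce, bool in_sys_work,
			const struct model_thread *target,
			const struct model_thread *current)
{
	assert(target != NULL && current != NULL);

	if (!enforce || !in_sys_work) {
		return false;
	}
	if (target != current) {
		return false;
	}
	/* Give the user some feedback before trapping. */
	fprintf(stderr, "%s: blocking call from system workqueue item\n",
		current->name);
	return true;
}
```

The `target != current` test is the part missing from the submitted diff; without it, the model would also flag a work item that suspends a different, merely ready thread.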
6 changes: 6 additions & 0 deletions kernel/work.c
@@ -1175,3 +1175,9 @@ bool k_work_flush_delayable(struct k_work_delayable *dwork,
}

#endif /* CONFIG_SYS_CLOCK_EXISTS */

bool k_is_in_sys_work(void)
{
return k_current_get() == k_work_queue_thread_get(&k_sys_work_q) &&
flag_test(&k_sys_work_q.flags, K_WORK_QUEUE_BUSY_BIT);
}
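The two-part condition in `k_is_in_sys_work()` (caller is the queue's thread, and the queue's busy flag is set) can be illustrated with a small host-side model. The types and names below (`struct model_work_q`, `model_is_in_work()`) are assumptions made for this sketch and do not mirror the kernel's actual structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct k_thread and struct k_work_q. */
struct model_thread {
	int id;
};

struct model_work_q {
	struct model_thread thread; /* thread that services the queue */
	bool busy;                  /* models the queue's busy flag */
};

/*
 * Mirrors the shape of the check in the diff: true only if the caller is
 * the queue's own thread and the queue is marked busy.
 */
bool model_is_in_work(const struct model_work_q *q,
		      const struct model_thread *current)
{
	assert(q != NULL && current != NULL);
	return current == &q->thread && q->busy;
}
```

Compared with the thread-identity checks that the Bluetooth and input call sites used before this change, the extra flag test means the model reports false while the queue thread is running but the busy flag is clear.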
2 changes: 1 addition & 1 deletion subsys/bluetooth/host/att.c
@@ -732,7 +732,7 @@ static struct net_buf *bt_att_chan_create_pdu(struct bt_att_chan *chan, uint8_t
default: {
k_tid_t current_thread = k_current_get();

- if (current_thread == k_work_queue_thread_get(&k_sys_work_q)) {
+ if (k_is_in_sys_work()) {
/* No blocking in the sysqueue. */
timeout = K_NO_WAIT;
} else if (current_thread == att_handle_rsp_thread) {
2 changes: 1 addition & 1 deletion subsys/bluetooth/host/classic/hfp_ag.c
@@ -215,7 +215,7 @@ static struct bt_ag_tx *bt_ag_tx_alloc(void)
* so if we're in the same workqueue but there are no immediate
* contexts available, there's no chance we'll get one by waiting.
*/
- if (k_current_get() == &k_sys_work_q.thread) {
+ if (k_is_in_sys_work()) {
return k_fifo_get(&ag_tx_free, K_NO_WAIT);
}

3 changes: 1 addition & 2 deletions subsys/bluetooth/host/conn.c
@@ -1606,8 +1606,7 @@ struct net_buf *bt_conn_create_pdu_timeout(struct net_buf_pool *pool,
*/
__ASSERT_NO_MSG(!k_is_in_isr());

- if (!K_TIMEOUT_EQ(timeout, K_NO_WAIT) &&
- k_current_get() == k_work_queue_thread_get(&k_sys_work_q)) {
+ if (!K_TIMEOUT_EQ(timeout, K_NO_WAIT) && k_is_in_sys_work()) {
LOG_WRN("Timeout discarded. No blocking in syswq.");
timeout = K_NO_WAIT;
}
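Several call sites in this PR share the same pattern: when called from a system workqueue item, any caller-supplied timeout is replaced with "no wait". A host-side sketch of that rule, using plain integers instead of `k_timeout_t`, might look as follows. `MODEL_NO_WAIT`, `MODEL_FOREVER` and `effective_timeout_ms()` are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NO_WAIT 0    /* stands in for K_NO_WAIT */
#define MODEL_FOREVER (-1) /* stands in for K_FOREVER */

/*
 * Returns the timeout the callee should actually use. Inside a system
 * workqueue item every waiting timeout, including "forever", is reduced
 * to "no wait"; elsewhere the request is passed through unchanged.
 */
int64_t effective_timeout_ms(int64_t requested_ms, bool in_sys_work)
{
	assert(requested_ms >= MODEL_FOREVER);

	if (requested_ms != MODEL_NO_WAIT && in_sys_work) {
		return MODEL_NO_WAIT;
	}
	return requested_ms;
}
```

The caller must then be prepared for the operation to fail immediately, which is why the real call sites log that the timeout was discarded.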
2 changes: 1 addition & 1 deletion subsys/bluetooth/host/hci_core.c
@@ -410,7 +410,7 @@ int bt_hci_cmd_send_sync(uint16_t opcode, struct net_buf *buf,
/* Since the commands are now processed in the syswq, we cannot suspend
* and wait. We have to send the command from the current context.
*/
- if (k_current_get() == &k_sys_work_q.thread) {
+ if (k_is_in_sys_work()) {
/* drain the command queue until we get to send the command of interest. */
struct net_buf *cmd = NULL;

3 changes: 1 addition & 2 deletions subsys/bluetooth/host/l2cap.c
@@ -674,8 +674,7 @@ struct net_buf *bt_l2cap_create_pdu_timeout(struct net_buf_pool *pool,
size_t reserve,
k_timeout_t timeout)
{
- if (!K_TIMEOUT_EQ(timeout, K_NO_WAIT) &&
- k_current_get() == k_work_queue_thread_get(&k_sys_work_q)) {
+ if (!K_TIMEOUT_EQ(timeout, K_NO_WAIT) && k_is_in_sys_work()) {
timeout = K_NO_WAIT;
}

3 changes: 1 addition & 2 deletions subsys/input/input.c
@@ -52,8 +52,7 @@ int input_report(const struct device *dev,
#ifdef CONFIG_INPUT_MODE_THREAD
int ret;

- if (!K_TIMEOUT_EQ(timeout, K_NO_WAIT) &&
- k_current_get() == k_work_queue_thread_get(&k_sys_work_q)) {
+ if (!K_TIMEOUT_EQ(timeout, K_NO_WAIT) && k_is_in_sys_work()) {
LOG_DBG("Timeout discarded. No blocking in syswq.");
timeout = K_NO_WAIT;
}
1 change: 1 addition & 0 deletions tests/bluetooth/host/conn/mocks/kernel.c
@@ -31,5 +31,6 @@ DEFINE_FAKE_VALUE_FUNC(void *, k_heap_alloc, struct k_heap *, size_t, k_timeout_
DEFINE_FAKE_VOID_FUNC(k_heap_free, struct k_heap *, void *);
DEFINE_FAKE_VOID_FUNC(k_sched_lock);
DEFINE_FAKE_VOID_FUNC(k_sched_unlock);
DEFINE_FAKE_VALUE_FUNC(bool, k_is_in_sys_work);

struct k_work_q k_sys_work_q;
2 changes: 2 additions & 0 deletions tests/bluetooth/host/conn/mocks/kernel.h
@@ -32,6 +32,7 @@
FAKE(k_heap_free) \
FAKE(k_sched_lock) \
FAKE(k_sched_unlock) \
FAKE(k_is_in_sys_work) \

DECLARE_FAKE_VALUE_FUNC(bool, k_is_in_isr);
DECLARE_FAKE_VALUE_FUNC(int, k_poll_signal_raise, struct k_poll_signal *, int);
@@ -56,3 +57,4 @@ DECLARE_FAKE_VALUE_FUNC(void *, k_heap_alloc, struct k_heap *, size_t, k_timeout
DECLARE_FAKE_VOID_FUNC(k_heap_free, struct k_heap *, void *);
DECLARE_FAKE_VOID_FUNC(k_sched_lock);
DECLARE_FAKE_VOID_FUNC(k_sched_unlock);
DECLARE_FAKE_VALUE_FUNC(bool, k_is_in_sys_work);
11 changes: 9 additions & 2 deletions tests/kernel/workq/work_queue/src/main.c
@@ -32,6 +32,13 @@ LOG_MODULE_REGISTER(test);
#define WORK_ITEM_WAIT_ALIGNED \
k_ticks_to_ms_floor64(k_ms_to_ticks_ceil32(WORK_ITEM_WAIT) + _TICK_ALIGN)

/*
* System work queue is not allowed to unready threads. k_busy_wait() is
* used to simulate work. It is highly imprecise, so use a much lower wait
* to account for this.
*/
#define WORK_ITEM_BUSY_WAIT ((WORK_ITEM_WAIT * USEC_PER_MSEC) / 4)

/*
* Wait 50ms between work submissions, to ensure co-op and preempt
* threads submit alternately.
@@ -97,7 +104,7 @@ static void work_handler(struct k_work *work)
CONTAINER_OF(dwork, struct delayed_test_item, work);

LOG_DBG(" - Running test item %d", ti->key);
- k_msleep(WORK_ITEM_WAIT);
+ k_busy_wait(WORK_ITEM_BUSY_WAIT);

results[num_results++] = ti->key;
}
@@ -211,7 +218,7 @@ static void resubmit_work_handler(struct k_work *work)
struct delayed_test_item *ti =
CONTAINER_OF(dwork, struct delayed_test_item, work);

- k_msleep(WORK_ITEM_WAIT);
+ k_busy_wait(WORK_ITEM_BUSY_WAIT);

results[num_results++] = ti->key;
