but why?
When working on my bedside table lamp project (lampy2), I was faced with the dreaded moment where the good old while (1) loop was showing its limits:
- I needed to generate frames regularly
- and then send them to the LED driver
- while also checking for touch input and other things (including sending debug messages over UART when developing)
As a result, my FPS was not constant and the animations were just not smooth enough. Naturally, I looked at RTOSes and other flavours of schedulers for embedded systems, and to be honest they all seemed way too powerful (and therefore complex). The STM32 FreeRTOS documentation, the package in STM32CubeIDE, etc. all looked daunting. All I needed was to define a few tasks, say how often they should run, and sleep in between! Since the execution time of my tasks was mostly constant, and their sequencing as well (apart from touch signals, nothing is really unexpected), I decided to build something much smaller and much simpler, something I could actually understand so that I can evolve it as I need.
what it does
Lelu scheduler is a cooperative (non-preemptive) task scheduler. Yep, I looked it up, that's what it's called. Basically it does what I said above, or in other words: tasks run to completion - they're not interrupted mid-execution. This makes reasoning about shared state much simpler than with preemptive schedulers, at the cost of requiring well-behaved tasks that don't hog the CPU, and of assuming no external events (like interrupts) suddenly demand work mid-task.
Key features:
- Configurable task periods - each task has its own execution interval
- Priority by registration order - first registered = highest priority
- Runtime task enable/disable
- Overrun detection - warns when tasks take too long
- Execution time profiling - track time spent in each task
- Debug output via UART - optional diagnostic messages
- Works with STM32 HAL (F4, G0, and other families; it should not be too hard to extend further tbh)
how it works
The scheduler is built around a simple tick-based model. Here’s the big picture:
```
HAL_IncTick() called every 1ms by SysTick interrupt
            │
            ▼
┌─────────────────────────┐
│ lelu_scheduler_systick  │  ← counts milliseconds
└───────────┬─────────────┘
            │
            │  every TICK_PERIOD_MS (default 25ms)
            ▼
     tick_pending = true
            │
            │
════════════╪════════════════ main loop ════════════════
            │
            ▼
┌─────────────────────────┐
│  lelu_scheduler_run()   │  ← check all tasks
└───────────┬─────────────┘
            │
    ┌───────┴───────┐
    ▼               ▼
Task ready?     Task ready?    ... (for each registered task)
    │               │
    ▼               │
┌─────────┐         │
│ Run it! │         │
│ (timed) │         │
└────┬────┘         │
     │              │
     ▼              ▼
update stats    skip, not due yet
     │
     ▼
┌─────────────────────────┐
│ __WFI() - sleep until   │  ← CPU sleeps, saves power
│ next interrupt          │
└─────────────────────────┘
```
the tick
The scheduler hooks into STM32’s HAL_IncTick() which is called every 1ms by the SysTick interrupt. Inside, we count milliseconds and set a flag every LELU_TICK_PERIOD_MS (default 25ms). This is the scheduler’s ticking clock.
Why not check tasks every 1ms? I mean, you can, but for me it's overkill. If you need that level of responsiveness/accuracy, you probably want to use something else anyway :)
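For illustration, here's roughly what that hook can boil down to: a counter and a flag. This is a sketch of the idea with my own variable names, not the actual lelu_scheduler source:

```c
#include <stdbool.h>
#include <stdint.h>

static volatile uint32_t ms_elapsed   = 0;      // names are mine, not the library's
static volatile bool     tick_pending = false;

void lelu_scheduler_systick(void)   // called from HAL_IncTick(), i.e. every 1ms
{
    if (++ms_elapsed >= LELU_TICK_PERIOD_MS) {
        ms_elapsed   = 0;
        tick_pending = true;        // the main loop picks this up and runs tasks
    }
}
```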
task scheduling
Each task has a period (e.g. 100ms) and a counter tracking the time since its last execution. When lelu_scheduler_run() is called, it loops through all tasks in registration order (that's the priority!) and checks: has enough time passed since this task last ran?
Task: "LED_blink" (period=500ms)
time ──────────────────────────────────────────────────────▶
│ │ │ │ │ │
0ms 100ms 200ms 300ms 400ms 500ms
│ │
└── task runs ─────────────────────────────────── └── task runs again
If yes, the task function is called; if not, the scheduler skips to the next task.
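In sketch form, the check loop could look like this. Again, this is my reconstruction with assumed struct fields, not the actual source:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {              // hypothetical task record, field names are my guess
    void   (*handler)(void);  // the task function
    uint32_t period_ms;       // desired execution interval
    uint32_t elapsed_ms;      // time accumulated since the last run
    bool     enabled;         // runtime enable/disable flag
} task_slot_t;

static task_slot_t tasks[LELU_MAX_TASKS];   // LELU_MAX_TASKS: see configuration below
static uint8_t     num_tasks;

void run_sketch(void)
{
    for (uint8_t i = 0; i < num_tasks; i++) {      // registration order = priority
        tasks[i].elapsed_ms += LELU_TICK_PERIOD_MS;
        if (tasks[i].enabled && tasks[i].elapsed_ms >= tasks[i].period_ms) {
            tasks[i].elapsed_ms = 0;
            tasks[i].handler();                    // runs to completion, no preemption
        }
    }
}
```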
profiling and overrun detection
Super simple: the scheduler records the time just before and just after each task runs.
If a task takes longer than one tick period, that's an "overrun". The scheduler logs OR- to UART so you know something's hogging the CPU. This makes it easy to spot, just by watching the terminal, which tasks take too long or have variable execution times.
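In code, the idea is just a pair of HAL_GetTick() reads around the handler call. This is a sketch of mine sitting inside the run loop; boot_done and uart_log() are stand-ins, not real library names:

```c
uint32_t t0 = HAL_GetTick();               // ms since boot, courtesy of SysTick
tasks[i].handler();                        // run the task to completion
uint32_t dt = HAL_GetTick() - t0;          // execution time in ms

tasks[i].total_ms += dt;                   // accumulated stats for print_stats()
if (boot_done && dt > LELU_TICK_PERIOD_MS) {
    uart_log("OR-");                       // overrun: task ate more than one tick
}
```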
the main loop
The main loop is dead simple:
```c
while (1) {
    lelu_scheduler_run();                        // run any tasks that are due
    while (!lelu_scheduler_tick_pending()) {     // nothing to do?
        __WFI();                                 // sleep until next interrupt
    }
    lelu_scheduler_clear_tick();                 // acknowledge the tick
}
```
The __WFI() (Wait For Interrupt) instruction puts the CPU to sleep until the next interrupt fires. This means the MCU isn't burning cycles in a busy loop - it wakes up, does its work, and goes back to sleep. I think there are better ways to do that, especially on low-power STM32 lines.
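One option I have in mind is going through the HAL power driver instead of a bare __WFI(): same effect today, but it leaves the door open to deeper sleep modes. A sketch, not something the scheduler currently does:

```c
// Equivalent to __WFI(), but via the HAL PWR API:
HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);

// On low-power lines, Stop mode would save much more, but SysTick stops
// too, so the scheduler tick would need an LPTIM/RTC wakeup instead:
// HAL_PWR_EnterSTOPMode(PWR_LOWPOWERREGULATOR_ON, PWR_STOPENTRY_WFI);
```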
quick start
1. Include the header
#include "lelu_scheduler.h"
2. Hook into HAL_IncTick
Add the scheduler systick call to your HAL_IncTick() function:
```c
void HAL_IncTick(void)
{
    uwTick += (uint32_t)uwTickFreq;
    lelu_scheduler_systick();   // <-- this is the magic line
}
```
3. Initialize and run
#include "lelu_scheduler.h"
// task functions
void task_blink_led(void) { HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5); }
void task_read_sensor(void) { /* read your sensor */ }
int main(void)
{
HAL_Init();
SystemClock_Config();
MX_GPIO_Init();
MX_USART2_UART_Init();
// initialize scheduler (pass UART for debug output, or NULL)
lelu_scheduler_init(&huart2);
// register tasks with name, handler, period in ms
lelu_scheduler_add_task("LED_blink", task_blink_led, 500, NULL);
lelu_scheduler_add_task("Sensor", task_read_sensor, 100, NULL);
// signal that boot is complete (enables overrun detection)
lelu_scheduler_set_boot_done();
// main loop
while (1)
{
lelu_scheduler_run();
while (!lelu_scheduler_tick_pending()) { __WFI(); }
lelu_scheduler_clear_tick();
}
}
the blinky example
The classic “hello world” of embedded systems, but with two LEDs blinking at different rates:
```c
/* Task Functions */
void task_blink_led1(void) { HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5); }
void task_blink_led2(void) { HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_6); }

int main(void)
{
    // ... HAL init ...
    lelu_scheduler_init(&huart2);

    lelu_scheduler_add_task("LED1_fast", task_blink_led1, 250, NULL);   // 2 Hz
    lelu_scheduler_add_task("LED2_slow", task_blink_led2, 1000, NULL);  // 0.5 Hz

    lelu_scheduler_set_boot_done();

    while (1)
    {
        lelu_scheduler_run();
        while (!lelu_scheduler_tick_pending()) { __WFI(); }
        lelu_scheduler_clear_tick();
    }
}
```
Connect a serial terminal at 115200 baud and you’ll see:
```
[LELU] Scheduler initialized (max 8 tasks, 25ms tick)
[LELU] Added task 'LED1_fast' (id=0, period=250ms)
[LELU] Added task 'LED2_slow' (id=1, period=1000ms)
[LELU] Boot done, scheduler active with 2 tasks
```
configuration
Configuration happens via preprocessor defines. Set these before including lelu_scheduler.h:
```c
#define LELU_MAX_TASKS      4    // only need 4 tasks, save some RAM
#define LELU_TICK_PERIOD_MS 10   // 10ms tick for finer resolution
#include "lelu_scheduler.h"
```
| Define | Default | Description |
|---|---|---|
| `LELU_MAX_TASKS` | 8 | Maximum number of tasks (~32 bytes each) |
| `LELU_TASK_NAME_LEN` | 20 | Max characters for task names |
| `LELU_TICK_PERIOD_MS` | 25 | Base scheduler tick period in milliseconds |
Choosing LELU_TICK_PERIOD_MS: ideally use the GCD of all your task periods. Tasks at 100ms and 500ms? Use 100ms (or 50ms, 25ms). Tasks at 30ms and 50ms? Use 10ms. Lower values = more responsive but more CPU overhead.
execution profiling
Call lelu_scheduler_print_stats() and you get a breakdown of how each task is performing. See this output from my current lampy3 project:
```
[LELU] Task Statistics (total_ticks=2113727)
----------------------------------------
button:        total=1ms       period=20ms    RUNNING
compute:       total=256818ms  period=20ms    RUNNING
i2c_send:      total=1011470ms period=20ms    RUNNING
temp_flag_chk: total=1916ms    period=1100ms  RUNNING
temp_read:     total=872ms     period=9700ms  RUNNING
temp_report:   total=1518ms    period=15300ms RUNNING
fault_detect:  total=4056ms    period=44700ms RUNNING
stats_report:  total=5763ms    period=30100ms RUNNING
----------------------------------------
```
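A handy pattern is to make the stats printer itself a low-frequency task; that's presumably what the stats_report entry above does. Something like:

```c
void task_stats_report(void)
{
    lelu_scheduler_print_stats();   // dump the table above over UART
}

// during setup, alongside the other tasks (30s period picked arbitrarily):
lelu_scheduler_add_task("stats_report", task_stats_report, 30000, NULL);
```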
cooperative scheduling: keep tasks short
This is the main gotcha. Since tasks aren’t preempted, a long-running task blocks everything else:
```c
// BAD - blocks the entire scheduler for 1 second!
void bad_task(void) {
    HAL_Delay(1000);
    do_something();
}

// GOOD - use a state machine, let other tasks run
void good_task(void) {
    static uint8_t state = 0;
    switch (state) {
        case 0: start_operation();    state = 1; break;
        case 1: if (operation_done()) state = 2; break;
        case 2: finish_operation();   state = 0; break;
    }
}
```
API reference
Initialization
| Function | Description |
|---|---|
| `lelu_scheduler_init(uart)` | Initialize scheduler. Pass UART handle for debug, or NULL |
| `lelu_scheduler_set_boot_done()` | Enable overrun detection. Call after setup is complete |
Task Management
| Function | Description |
|---|---|
| `lelu_scheduler_add_task(name, handler, period, &id)` | Register a new task |
| `lelu_scheduler_start_task(id)` | Enable a task |
| `lelu_scheduler_stop_task(id)` | Disable a task |
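For example, keeping the id handed back at registration lets you pause and resume a task at runtime (I'm assuming here that the id out-parameter is a small integer handle):

```c
uint8_t sensor_id;
lelu_scheduler_add_task("Sensor", task_read_sensor, 100, &sensor_id);

// later, e.g. while the lamp is off:
lelu_scheduler_stop_task(sensor_id);    // the run loop now skips this task

// and when it is needed again:
lelu_scheduler_start_task(sensor_id);   // resumes on its normal 100ms period
```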
Execution
| Function | Description |
|---|---|
| `lelu_scheduler_systick()` | Call from HAL_IncTick() every 1ms |
| `lelu_scheduler_run()` | Execute ready tasks (call from main loop) |
| `lelu_scheduler_tick_pending()` | Check if a tick period has elapsed |
| `lelu_scheduler_clear_tick()` | Clear the tick flag after processing |
Statistics
| Function | Description |
|---|---|
| `lelu_scheduler_print_stats()` | Print all task statistics via UART |
| `lelu_scheduler_get_stats(id, &stats)` | Get stats for a specific task |
| `lelu_scheduler_get_total_ticks()` | Get total ms elapsed since init |
memory footprint
| Component | Size |
|---|---|
| Per task | ~32 bytes |
| Global state | ~16 bytes |
| Debug buffer | 128 bytes |
| Total (8 tasks) | ~400 bytes |
Quite light for what you get. I don't have a good way to measure the CPU overhead, but I assume it's very low.
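The ~32 bytes per task checks out on the back of an envelope. These are hypothetical fields (matching the sketch in the task scheduling section above), not the actual struct:

```c
#include <stdint.h>

typedef struct {
    char     name[20];        // LELU_TASK_NAME_LEN default -> 20 bytes
    void   (*handler)(void);  // function pointer, Cortex-M ->  4 bytes
    uint32_t period_ms;       //                            ->  4 bytes
    uint32_t elapsed_ms;      //                            ->  4 bytes
} task_guess_t;               // sizeof == 32 on a 32-bit target; flags and
                              // stats counters would add a little on top
```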
installation
Option 1: Git submodule (recommended)
```sh
cd your_project/Core
git submodule add https://github.com/atelierlelu/lelu_scheduler.git
```
Then add lelu_scheduler/include to your include paths and lelu_scheduler/src/lelu_scheduler.c to your sources.
Option 2: Just copy the files
```sh
cp lelu_scheduler/include/lelu_scheduler.h your_project/Core/Inc/
cp lelu_scheduler/src/lelu_scheduler.c your_project/Core/Src/
```
license
MIT.