OTA is not a feature.
It is the most dangerous thing your firmware can do.
Step 9 – OTA Updates with MCUboot (Controlled Self-Modification)
After Step 8, the device is:
- reliable
- observable
- power-aware
- outage-tolerant
Step 9 adds the most dangerous capability a device can have:
The ability to modify its own firmware.
This step is about doing that safely, predictably, and explainably.
1. Purpose of Step 9
OTA is not a feature. It is risk management.
Step 9 answers:
“How do we update firmware without bricking devices in the field?”
MCUboot exists to answer exactly this question.
2. Why MCUboot Is a Subsystem, Not a Library
MCUboot is:
- a separate executable
- with its own state machine
- with strict ownership of flash
This matches everything you have learned so far.
Key idea:
The application never directly controls boot decisions.
3. Roles and Responsibilities
MCUboot (Bootloader)
Owns:
- flash layout
- image validation
- rollback decisions
- swap logic
Guarantees:
- only an image that passes validation is ever booted
- failed updates roll back
Application (Your firmware)
Owns:
- downloading new firmware
- storing it in the update slot
- requesting an upgrade
- reporting status
Never decides how boot happens.
4. Mental Model (Two Images)
Typical MCUboot layout:
+-------------------+
| MCUboot           |
+-------------------+
| Image A (running) |  <-- currently active
+-------------------+
| Image B (update)  |  <-- candidate
+-------------------+
Only MCUboot decides which image is active.
5. OTA Is a Separate Failure Domain
OTA introduces new failures:
- download interruption
- power loss mid-update
- corrupted image
- incompatible firmware
Therefore OTA must:
- integrate with recovery logic
- not interfere with normal operation
6. Where OTA Fits in the Existing Architecture
OTA does not change the core state machine.
It is treated as a background task:
- Uses network subsystem
- Uses buffering concepts
- Uses WAIT for pacing
No new top-level states are required.
7. OTA Context
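#include <stdbool.h>
#include <stddef.h>

struct ota_ctx {
    bool   update_available;      /* a newer image has been announced */
    bool   download_in_progress;  /* a download has started and not finished */
    size_t bytes_written;         /* bytes already stored in the update slot */
};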
OTA state is owned by the application, not MCUboot.
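A minimal sketch of how the application might keep this context up to date while a download runs. The helper names (ota_begin, ota_on_chunk, ota_abort) are hypothetical and are not part of MCUboot; the struct is repeated only so the block compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>

struct ota_ctx {
    bool   update_available;
    bool   download_in_progress;
    size_t bytes_written;
};

/* Hypothetical helper: start a download only if an update was announced
 * and no download is already running. */
static bool ota_begin(struct ota_ctx *ctx)
{
    if (!ctx->update_available || ctx->download_in_progress) {
        return false;
    }
    ctx->download_in_progress = true;
    ctx->bytes_written = 0;
    return true;
}

/* Hypothetical helper: record a chunk that was stored successfully. */
static void ota_on_chunk(struct ota_ctx *ctx, size_t len)
{
    if (ctx->download_in_progress) {
        ctx->bytes_written += len;
    }
}

/* Hypothetical helper: drop this attempt. update_available stays set,
 * so a later pass can try again. */
static void ota_abort(struct ota_ctx *ctx)
{
    ctx->download_in_progress = false;
    ctx->bytes_written = 0;
}
```

Nothing here touches boot state: the context only describes what the application has done so far, which is why it can be reset at any time without risk.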
8. OTA Flow (High Level)
- Check for update (periodically)
- Download firmware image in chunks
- Write image to MCUboot secondary slot
- Mark image as pending
- Reboot
- MCUboot validates and swaps
- Application confirms success
Each step can fail safely: until the reboot, the running image is untouched, and after it MCUboot can still roll back.
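The application-side part of this flow can be sketched as a small step sequence. The enum and function names are illustrative assumptions, and the rule "any failure returns to the check step" is a simplification chosen for this sketch, not something MCUboot prescribes.

```c
#include <stdbool.h>

/* Application-side steps of the flow above (names are illustrative). */
enum ota_step {
    OTA_STEP_CHECK,        /* ask whether an update exists */
    OTA_STEP_DOWNLOAD,     /* fetch chunks, write them to the secondary slot */
    OTA_STEP_MARK_PENDING, /* ask MCUboot to test the new image */
    OTA_STEP_REBOOT,       /* hand control to MCUboot */
    OTA_STEP_CONFIRM,      /* first boot of the new image: confirm it */
    OTA_STEP_DONE
};

/* Advance one step on success. On failure, fall back to OTA_STEP_CHECK so
 * the whole attempt is retried later; the running image is never touched. */
static enum ota_step ota_next_step(enum ota_step step, bool ok)
{
    if (!ok) {
        return OTA_STEP_CHECK;
    }
    if (step == OTA_STEP_DONE) {
        return OTA_STEP_DONE;
    }
    return (enum ota_step)(step + 1);
}
```

Validation and swapping do not appear as steps because they happen inside MCUboot between OTA_STEP_REBOOT and OTA_STEP_CONFIRM.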
9. Integration with State Machine
OTA activity happens during:
- IDLE
- WAIT
Never during:
- SENSOR_INIT
- SEND
- critical paths
This avoids disrupting core functionality.
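One way to enforce this rule is a single gate that every OTA work item passes through. The sketch below assumes an application state enum containing only the four states named in this section; the real enum from the earlier steps has its own names and members, so treat STATE_* and ota_may_run as hypothetical.

```c
#include <stdbool.h>

/* Only the states named in this section; the real enum is assumed to
 * contain more. */
enum app_state {
    STATE_SENSOR_INIT,
    STATE_IDLE,
    STATE_SEND,
    STATE_WAIT
};

/* Returns true only in states where background OTA work is allowed. */
static bool ota_may_run(enum app_state s)
{
    switch (s) {
    case STATE_IDLE:
    case STATE_WAIT:
        return true;
    default:
        /* Deny by default, so a state added later is safe until someone
         * explicitly allows OTA in it. */
        return false;
    }
}
```

An allow-list is used instead of a deny-list so that forgetting to update the gate fails in the safe direction.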
10. Failure Handling During OTA
Key rule:
OTA failure must never affect current firmware execution.
Examples:
- Download fails → retry later
- Power loss during download → old image boots unchanged; power loss during a swap → MCUboot resumes the swap on the next boot
- Validation fails → rollback
All failures are recoverable.
11. Requesting an Upgrade (Application Side)
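Conceptually (reboot() stands in for the platform's reset call):
if (boot_request_upgrade(BOOT_UPGRADE_TEST) == 0) {  /* 0 means the request was recorded */
    reboot();
}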
After reboot:
- MCUboot validates the new image and boots it on trial
- MCUboot rolls back on the following reboot if the application has not confirmed the image
12. Confirming an Update
On successful boot:
boot_write_img_confirmed();
This tells MCUboot:
“This image is good. Keep it.”
Without confirmation, MCUboot reverts to the previous image on the next reboot.
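The test-then-confirm behaviour from sections 11 and 12 can be illustrated with a toy model that runs on a host machine. This is not MCUboot's implementation: every toy_* name is invented for this sketch, and the real bootloader tracks this state in flash rather than in a struct.

```c
#include <stdbool.h>

enum toy_image { TOY_IMG_OLD, TOY_IMG_NEW };

struct toy_boot {
    enum toy_image primary;    /* image that runs after boot */
    enum toy_image secondary;  /* image sitting in the update slot */
    bool pending_test;         /* application requested a test upgrade */
    bool on_trial;             /* primary image is running unconfirmed */
    bool confirmed;            /* application confirmed the trial image */
};

static void toy_swap(struct toy_boot *b)
{
    enum toy_image tmp = b->primary;
    b->primary = b->secondary;
    b->secondary = tmp;
}

/* Stand-in for boot_request_upgrade(BOOT_UPGRADE_TEST). */
static void toy_request_test(struct toy_boot *b)
{
    b->pending_test = true;
}

/* Stand-in for boot_write_img_confirmed(). */
static void toy_confirm(struct toy_boot *b)
{
    b->confirmed = true;
}

/* Models one reboot and returns the image that runs afterwards. */
static enum toy_image toy_reboot(struct toy_boot *b)
{
    if (b->pending_test) {
        toy_swap(b);             /* try the candidate image */
        b->pending_test = false;
        b->on_trial = true;
        b->confirmed = false;
    } else if (b->on_trial) {
        if (!b->confirmed) {
            toy_swap(b);         /* never confirmed: roll back */
        }
        b->on_trial = false;
    }
    return b->primary;
}
```

Starting from the old image with a test requested, one reboot runs the new image. A second reboot returns to the old image unless toy_confirm() was called in between, which is the property the application relies on.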
13. Interaction with Retry & Power Policies
OTA obeys existing rules:
- backoff on failures
- sleep during WAIT
- log transitions
OTA never introduces special-case timing.
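A sketch of what "no special-case timing" can look like in code: OTA asks the shared retry policy for its delay instead of keeping its own timer logic. The policy shown here (exponential backoff with a cap) and its constants are assumptions made for illustration; the actual policy is whatever the earlier steps defined, and all names below are hypothetical.

```c
#include <stdint.h>

/* Assumed constants for the shared retry policy. */
#define RETRY_BASE_S 60u
#define RETRY_MAX_S  3600u

/* Shared policy: double the delay per consecutive failure, up to a cap. */
static uint32_t retry_delay_s(uint32_t failures)
{
    uint32_t delay = RETRY_BASE_S;

    for (uint32_t i = 0; i < failures && delay < RETRY_MAX_S; i++) {
        delay *= 2;
    }
    return (delay > RETRY_MAX_S) ? RETRY_MAX_S : delay;
}

/* OTA does not compute its own timing; it only asks the shared policy
 * when the next attempt is allowed. */
static uint32_t ota_next_attempt_s(uint32_t now_s, uint32_t ota_failures)
{
    return now_s + retry_delay_s(ota_failures);
}
```

Because the delay comes from one function, changing the backoff policy changes OTA pacing and normal retry pacing together.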
14. Logging OTA Events
Log only milestones:
- Update available
- Download started
- Download completed
- Upgrade requested
- Image confirmed
Never log every chunk.
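The milestone-only rule can be sketched as follows. printf stands in for the logging mechanism from the earlier steps, and the enum, the counter, and the function names are hypothetical. The point of the sketch is that the chunk handler contains no logging call at all.

```c
#include <stddef.h>
#include <stdio.h>

enum ota_event {
    OTA_EVT_UPDATE_AVAILABLE,
    OTA_EVT_DOWNLOAD_STARTED,
    OTA_EVT_DOWNLOAD_COMPLETED,
    OTA_EVT_UPGRADE_REQUESTED,
    OTA_EVT_IMAGE_CONFIRMED,
    OTA_EVT_COUNT
};

static const char *const ota_event_names[OTA_EVT_COUNT] = {
    "update available",
    "download started",
    "download completed",
    "upgrade requested",
    "image confirmed"
};

/* Counts emitted lines so the behaviour can be checked in a test. */
static unsigned int ota_log_lines;

static void ota_log(enum ota_event evt)
{
    printf("OTA: %s\n", ota_event_names[evt]);
    ota_log_lines++;
}

/* Chunk handler: updates the byte count and deliberately logs nothing. */
static size_t ota_store_chunk(size_t bytes_written, size_t len)
{
    return bytes_written + len;
}

/* Simulated download: two log lines regardless of the chunk count. */
static size_t ota_simulate_download(size_t chunks, size_t chunk_len)
{
    size_t written = 0;

    ota_log(OTA_EVT_DOWNLOAD_STARTED);
    for (size_t i = 0; i < chunks; i++) {
        written = ota_store_chunk(written, chunk_len);
    }
    ota_log(OTA_EVT_DOWNLOAD_COMPLETED);
    return written;
}
```

A download of 100 chunks produces the same two log lines as a download of one chunk, so log volume does not grow with image size.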
15. What Step 9 Deliberately Avoids
- Delta updates
- A/B/C images
- Compression
- Encryption details
Those are advanced topics added later if needed.
16. Success Criteria for Step 9
Step 9 is complete when:
- Device can update itself safely
- Power loss during OTA is harmless
- Rollback works automatically
- OTA logic is explainable
17. Architectural Status After Step 9
After Step 9, the device:
- can evolve after deployment
- protects itself from bad updates
- follows the widely used bootloader-managed, two-slot update pattern
This is the baseline for modern IoT products.
18. Final Note
MCUboot works well here because:
- you already think in subsystems
- you already centralize policy
- you already respect ownership
This is not an accident.