Step 6 answers a deceptively simple question: When things fail repeatedly, how should a device behave over time?
Part A – English Version
Why Step 6 Exists
After Step 5, our firmware:
- can fail safely
- can explain failures
- can recover structurally
But it still behaves like an impatient human.
Without an explicit retry policy, systems tend to:
- retry immediately
- retry too often
- retry forever
This hurts:
- power consumption
- network infrastructure
- backend services
- device credibility
Step 6 turns retries into policy, not accidents.
The Core Insight
Retries are not a technical detail. They are a behavioral decision.
How often you retry says something about:
- how valuable the data is
- how patient the device is
- how respectful it is to the network
This must be explicit.
The Anti-Pattern: Inline Retries
A common mistake:
    for (int i = 0; i < 5; i++) {
        if (send() == 0) {
            return 0;
        }
        k_sleep(K_SECONDS(1));
    }
Problems:
- retry count is hidden
- delay is arbitrary
- power impact is unclear
- logs are noisy
Most importantly: policy is buried in code.
Centralizing Retry Policy
Retries belong to the application, not subsystems.
We extend the context:
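    struct app_ctx {
        enum app_state state;            /* state currently executing */
        enum app_state recovery_state;   /* state to resume after WAIT */
        uint32_t retry_count;            /* retries attempted so far */
        int last_error;                  /* most recent error code */
        k_timeout_t retry_delay;         /* delay chosen for the next WAIT */
    };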
Retry behavior is now visible state.
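One way to keep subsystems out of the retry business is to give them a single call for reporting failure and another for reporting success. The sketch below is a host-side illustration in plain C, not the series' actual code: app_report_failure, app_report_success, the APP_STATE_NET_INIT and APP_STATE_SEND states, and the retry_delay_ms field (standing in for k_timeout_t so the example builds without Zephyr) are all assumptions. Resetting retry_count on success is also an assumption; the article does not say where the counter is cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the application states. */
enum app_state {
    APP_STATE_NET_INIT,
    APP_STATE_SEND,
    APP_STATE_ERROR,
    APP_STATE_WAIT,
};

/* Host-side mirror of the context; retry_delay_ms stands in for k_timeout_t. */
struct app_ctx {
    enum app_state state;
    enum app_state recovery_state;
    uint32_t retry_count;
    int last_error;
    uint32_t retry_delay_ms;
};

/* A subsystem records what failed and where to resume. It never loops or sleeps. */
static void app_report_failure(struct app_ctx *ctx, int err,
                               enum app_state recover_to)
{
    assert(ctx != NULL);
    ctx->last_error = err;
    ctx->recovery_state = recover_to;
    ctx->state = APP_STATE_ERROR;
}

/* Assumption: a success clears the failure streak, so the next
 * failure starts again at the shortest delay. */
static void app_report_success(struct app_ctx *ctx, enum app_state next)
{
    assert(ctx != NULL);
    ctx->retry_count = 0;
    ctx->last_error = 0;
    ctx->state = next;
}
```

With this shape, a failed send becomes app_report_failure(ctx, err, APP_STATE_NET_INIT) followed by a return, and the only code that touches retry_count is the application state machine.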
Designing a Backoff Strategy
We start simple and deterministic:
    static k_timeout_t calc_backoff(uint32_t retry)
    {
        if (retry < 3) {
            return K_SECONDS(10);
        } else if (retry < 6) {
            return K_MINUTES(1);
        } else {
            return K_MINUTES(5);
        }
    }
Why this works:
- predictable
- debuggable
- power-friendly
Randomization can be added later.
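To make the schedule concrete, here is a minimal host-side sketch of the same tiers in plain C. It uses milliseconds instead of k_timeout_t so it compiles without Zephyr. The helpers calc_backoff_ms, total_wait_ms, and check_backoff_schedule are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for calc_backoff(): same tiers, plain milliseconds. */
static uint32_t calc_backoff_ms(uint32_t retry)
{
    if (retry < 3) {
        return 10u * 1000u;          /* retries 0..2: 10 s */
    } else if (retry < 6) {
        return 60u * 1000u;          /* retries 3..5: 1 min */
    }
    return 5u * 60u * 1000u;         /* retry 6 and later: 5 min */
}

/* Total quiet time accumulated over the first `failures` consecutive failures. */
static uint64_t total_wait_ms(uint32_t failures)
{
    uint64_t total = 0;

    for (uint32_t retry = 0; retry < failures; retry++) {
        total += calc_backoff_ms(retry);
    }
    return total;
}

/* Self-check of the tier boundaries; useful as a host-side unit test. */
static void check_backoff_schedule(void)
{
    assert(calc_backoff_ms(2) == 10000u);    /* last retry in the 10 s tier */
    assert(calc_backoff_ms(3) == 60000u);    /* first retry in the 1 min tier */
    assert(calc_backoff_ms(6) == 300000u);   /* first retry in the 5 min tier */
    assert(total_wait_ms(6) == 210000u);     /* 3 * 10 s + 3 * 60 s */
}
```

The arithmetic is the point: six consecutive failures cost 3.5 minutes of waiting in total, and every failure after that adds another five minutes.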
Applying Backoff in ERROR
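    case APP_STATE_ERROR:
        /* Policy decision: the delay depends only on the retry count. */
        ctx->retry_delay = calc_backoff(ctx->retry_count);
        /* k_timeout_t holds ticks, so convert before logging milliseconds. */
        LOG_ERR("Error %d, retry %u, wait %llu ms, recover to %s",
                ctx->last_error,
                ctx->retry_count,
                (unsigned long long)k_ticks_to_ms_floor64(ctx->retry_delay.ticks),
                state_str(ctx->recovery_state));
        ctx->retry_count++;
        ctx->state = APP_STATE_WAIT;
        break;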
The ERROR state now:
- decides delay
- logs policy
- remains centralized
WAIT Becomes Meaningful
WAIT is no longer a dumb sleep.
    case APP_STATE_WAIT:
        k_sleep(ctx->retry_delay);
        ctx->state = ctx->recovery_state;
        break;
All waiting behavior flows through one state.
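The ERROR, WAIT, and recovery cycle can be exercised on a host machine without any hardware. The following is a rough simulation under stated assumptions, not the firmware itself: the APP_STATE_NET_INIT and APP_STATE_DONE states, the slept_ms and failures_left fields, and the app_step and simulate functions are hypothetical, and sleeping is replaced by adding to a counter. The error value -110 is simply the one from the article's sample log line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum app_state {
    APP_STATE_NET_INIT,
    APP_STATE_ERROR,
    APP_STATE_WAIT,
    APP_STATE_DONE,
};

struct app_ctx {
    enum app_state state;
    enum app_state recovery_state;
    uint32_t retry_count;
    int last_error;
    uint32_t retry_delay_ms;   /* stand-in for k_timeout_t */
    uint64_t slept_ms;         /* simulation only: total time spent in WAIT */
    uint32_t failures_left;    /* simulation only: remaining forced failures */
};

/* Same tiers as calc_backoff(), expressed in milliseconds. */
static uint32_t calc_backoff_ms(uint32_t retry)
{
    if (retry < 3) {
        return 10000u;
    } else if (retry < 6) {
        return 60000u;
    }
    return 300000u;
}

/* Run exactly one state transition. */
static void app_step(struct app_ctx *ctx)
{
    assert(ctx != NULL);

    switch (ctx->state) {
    case APP_STATE_NET_INIT:
        if (ctx->failures_left > 0) {
            ctx->failures_left--;
            ctx->last_error = -110;                 /* value from the sample log */
            ctx->recovery_state = APP_STATE_NET_INIT;
            ctx->state = APP_STATE_ERROR;
        } else {
            ctx->retry_count = 0;                   /* assumption: success resets */
            ctx->state = APP_STATE_DONE;
        }
        break;
    case APP_STATE_ERROR:
        ctx->retry_delay_ms = calc_backoff_ms(ctx->retry_count);
        ctx->retry_count++;
        ctx->state = APP_STATE_WAIT;
        break;
    case APP_STATE_WAIT:
        ctx->slept_ms += ctx->retry_delay_ms;       /* firmware would k_sleep() here */
        ctx->state = ctx->recovery_state;
        break;
    case APP_STATE_DONE:
        break;
    }
}

/* Drive the machine until it succeeds; returns the total simulated sleep. */
static uint64_t simulate(uint32_t failures)
{
    struct app_ctx ctx = { .state = APP_STATE_NET_INIT, .failures_left = failures };

    while (ctx.state != APP_STATE_DONE) {
        app_step(&ctx);
    }
    return ctx.slept_ms;
}
```

Because WAIT is the only state that accumulates sleep time, the total for any failure count can be checked against the backoff table by hand: four failures give 10 + 10 + 10 + 60 seconds.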
Why This Is Predictable
Given logs like:
Error -110, retry 4, wait 60000 ms, recover to NET_INIT
An engineer can immediately answer:
- how long the device will be quiet
- what it will try next
- why it behaved that way
This is professionalism.
Avoiding Retry Storms
Without backoff:
- thousands of devices retry together
- servers get overloaded
- outages cascade
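With explicit policy:
- each device's retries spread out over time
- load on a struggling backend falls the longer the failure lasts
- systems degrade gracefully
Devices that fail at the same instant still retry in step with each other; breaking that up is what randomization is for.
This matters at scale.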
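A back-of-the-envelope comparison shows the size of the difference. The sketch below counts how many retries one device fires in a given window when every attempt fails, once with the tiered backoff and once with a fixed delay. It is a simplification: it ignores the time each attempt itself takes, and retries_with_backoff and retries_with_fixed_delay are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Same tiers as calc_backoff(), expressed in milliseconds. */
static uint32_t calc_backoff_ms(uint32_t retry)
{
    if (retry < 3) {
        return 10000u;
    } else if (retry < 6) {
        return 60000u;
    }
    return 300000u;
}

/* Retries fired within window_ms under tiered backoff, if every attempt fails. */
static uint32_t retries_with_backoff(uint64_t window_ms)
{
    uint64_t t = 0;    /* simulated time of the next retry */
    uint32_t n = 0;    /* retries fired so far */

    for (;;) {
        t += calc_backoff_ms(n);
        if (t > window_ms) {
            return n;
        }
        n++;
    }
}

/* Same question for an uncapped loop that always waits delay_ms. */
static uint32_t retries_with_fixed_delay(uint64_t window_ms, uint32_t delay_ms)
{
    assert(delay_ms > 0);
    return (uint32_t)(window_ms / delay_ms);
}
```

Over a one-hour outage, a device on the tiered schedule retries 17 times, while an uncapped one-second loop retries 3600 times. Multiplied across a fleet of 10,000 devices, that is 170,000 requests against 36 million.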
What Step 6 Does NOT Do
Step 6 does not:
- add randomness
- distinguish error types
- persist retry state
Those are product-level decisions.
We start with clarity.
A Reviewer’s Perspective
A reviewer can now see:
- retry behavior clearly
- delays explicitly
- no hidden loops
And can reason about fleet behavior.
Final Thought (English)
A polite device retries thoughtfully. An impatient one becomes noise.
Step 6 teaches patience.
Part B – Vietnamese Version
Why Step 6 Is Needed
After Step 5, the system already:
- handles errors
- logs clearly
But without a retry policy:
- the system becomes impatient
- it drains the battery
- it hammers the backend
Step 6 turns retries into deliberate behavior.
The Core Insight
Retrying is a behavioral decision, not a technical detail.
Anti-Pattern: Inline Retries
Retries buried in subsystem code are:
- hard to see
- hard to review
- hard to change
Centralizing Retry Policy
Retries belong to the application.
The context is extended explicitly.
Simple but Effective Backoff
Deterministic backoff is:
- easy to debug
- easy to explain
ERROR Decides Policy
ERROR:
- computes the delay
- logs it
- moves to WAIT
No other place does this.
WAIT Is Meaningful
WAIT is the only place allowed to sleep.
Avoiding Retry Storms
Controlled retries help:
- the system stay stable
- the backend survive
What Step 6 Does NOT Do
- no randomness
- no error classification
Clarity comes first.
Final Thought (Vietnamese)
A polite device retries thoughtfully. An impatient device becomes a burden.
Step 6 teaches the device patience.