Step 4 introduces the most dangerous subsystem in embedded firmware: networking. Not because it is complex, but because it pretends to be helpful.

Part A – English Version

Why Step 4 Is Where Architectures Go to Die

Until now:

  • sensor failures were local
  • timing was predictable
  • control flow was obvious

Networking changes everything:

  • operations block for long, unknown times
  • partial success is common
  • libraries hide retries
  • callbacks tempt you to put logic in the wrong place

Most firmware architectures quietly collapse at this step.


The Lie Networking APIs Tell You

Most networking APIs are designed for applications, not devices.

They imply:

  • the network is usually available
  • failure is exceptional
  • retrying is helpful

For an embedded device, all three assumptions are wrong.


Blocking Is Not the Enemy

Junior engineers often believe:

“Blocking calls are bad. We must be async.”

This is backwards.

Blocking calls are:

  • honest
  • explicit
  • easy to reason about

Async calls hide:

  • who owns control flow
  • when retries happen
  • where errors propagate

So in Step 4, we intentionally choose blocking networking.


What We Actually Need from Networking

From the application’s point of view, networking is simple:

  • initialize network stack
  • send one measurement
  • possibly fail

That’s it.

Everything else is policy.


Network as a Subsystem

We treat networking the same way as the sensor:

  • minimal API
  • no retries
  • no sleeping
  • no decisions

The application remains in charge.


Network Context

struct net_ctx {
    int sock;
};

Nothing fancy.


Minimal Network API

int net_init(struct net_ctx *ctx);
int http_send_temp(struct net_ctx *ctx, int32_t temp_mdeg);

Return values:

  • 0 on success
  • negative errno on failure

No callbacks.
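The whole contract is two functions that return 0 or a negative errno, so the network subsystem is easy to replace with a scripted fake when you exercise the state machine on a host PC. The sketch below shows one way to do this. The `fake_*` variables and the fixed descriptor value are hypothetical test scaffolding, not Zephyr APIs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct net_ctx {
    int sock;
};

/* Hypothetical test double: each call consumes the next scripted result.
 * 0 means success and a negative errno means failure, as in the real API. */
static const int *fake_init_results;
static size_t fake_init_count, fake_init_idx;
static const int *fake_send_results;
static size_t fake_send_count, fake_send_idx;
static int32_t fake_last_sent_mdeg;

static int next_result(const int *results, size_t count, size_t *idx)
{
    if (*idx >= count) {
        return 0;                      /* script exhausted: succeed by default */
    }
    return results[(*idx)++];
}

int net_init(struct net_ctx *ctx)
{
    int ret = next_result(fake_init_results, fake_init_count, &fake_init_idx);
    ctx->sock = (ret == 0) ? 3 : -1;   /* arbitrary fake descriptor */
    return ret;
}

int http_send_temp(struct net_ctx *ctx, int32_t temp_mdeg)
{
    if (ctx->sock < 0) {
        return -ENOTCONN;              /* no connection: report, do not reconnect */
    }
    int ret = next_result(fake_send_results, fake_send_count, &fake_send_idx);
    if (ret == 0) {
        fake_last_sent_mdeg = temp_mdeg;
    }
    return ret;
}
```

A test sets the script arrays and then drives the application exactly as it would run on target. No network hardware is involved.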


Blocking HTTP Example (Zephyr)

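#include <errno.h>
#include <stdint.h>
#include <zephyr/net/socket.h>
#include <zephyr/sys/printk.h>

/* Sends one measurement as a JSON body over an already connected socket.
 * Note: only the body is written; no HTTP request line or headers. */
int http_send_temp(struct net_ctx *ctx, int32_t temp_mdeg)
{
    char payload[64];
    int len = snprintk(payload, sizeof(payload),
                       "{\"temp\": %d}", (int)temp_mdeg);

    /* Blocks until the stack accepts the data or reports an error. */
    ssize_t ret = zsock_send(ctx->sock, payload, len, 0);
    if (ret < 0) {
        return -errno;   /* report the failure; never retry here */
    }
    if (ret != len) {
        return -EIO;     /* short write: report it instead of hiding it */
    }

    return 0;
}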

This function:

  • blocks
  • either succeeds or fails
  • tells the truth
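For host-side unit tests, you can reproduce roughly the same behaviour with plain POSIX sockets. This sketch assumes a POSIX system rather than Zephyr, and the name `http_send_temp_posix` is hypothetical. Like the firmware version, it sends only the JSON body and treats a short write as a failure rather than retrying.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

struct net_ctx {
    int sock;
};

/* Host-side equivalent of http_send_temp(): one blocking send, no retries.
 * Returns 0 on success or a negative errno, like the firmware version. */
int http_send_temp_posix(struct net_ctx *ctx, int32_t temp_mdeg)
{
    char payload[64];
    int len = snprintf(payload, sizeof(payload), "{\"temp\": %ld}", (long)temp_mdeg);
    if (len < 0 || (size_t)len >= sizeof(payload)) {
        return -EINVAL;      /* formatting failed or was truncated */
    }

    ssize_t ret = send(ctx->sock, payload, (size_t)len, 0);
    if (ret < 0) {
        return -errno;       /* report the failure; the caller decides */
    }
    if (ret != len) {
        return -EIO;         /* partial write: report it, do not hide it */
    }
    return 0;
}
```

On Linux, passing `MSG_NOSIGNAL` as the flags argument makes a send to a closed peer fail with `-EPIPE` instead of raising `SIGPIPE`.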

Integrating NET_INIT into the State Machine

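case APP_STATE_NET_INIT:
    ret = net_init(&ctx->net);
    if (ret < 0) {
        ctx->last_error = ret;
        /* resume point after ERROR; recovery_state is explained below */
        ctx->recovery_state = APP_STATE_NET_INIT;
        ctx->state = APP_STATE_ERROR;
        break;
    }
    ctx->state = APP_STATE_IDLE;
    break;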

The network does not retry itself.


Integrating SEND

case APP_STATE_SEND:
    ret = http_send_temp(&ctx->net, ctx->last_temp_mdeg);
    if (ret < 0) {
        ctx->last_error = ret;
        ctx->recovery_state = APP_STATE_NET_INIT;
        ctx->state = APP_STATE_ERROR;
        break;
    }
    ctx->state = APP_STATE_WAIT;
    break;

Notice something important:

  • SEND does not retry
  • SEND does not sleep
  • SEND does not reconnect

It reports. The application decides.
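The following host-runnable sketch puts the whole loop in one place: NET_INIT, SEND, ERROR and WAIT. Several parts are assumptions made for illustration: the ERROR and WAIT handlers, the `after_wait` field, the IDLE shortcut that skips the sensor read, and the stubbed network functions. The NET_INIT and SEND cases follow the ones shown earlier.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

enum app_state {
    APP_STATE_NET_INIT,
    APP_STATE_IDLE,
    APP_STATE_SEND,
    APP_STATE_WAIT,
    APP_STATE_ERROR,
};

struct net_ctx {
    int sock;
};

struct app_ctx {
    enum app_state state;
    enum app_state recovery_state;  /* where to resume after ERROR + WAIT */
    enum app_state after_wait;      /* hypothetical: next state once WAIT ends */
    int last_error;                 /* what failed (negative errno) */
    int32_t last_temp_mdeg;
    struct net_ctx net;
};

/* Hypothetical stand-ins for the real network subsystem. */
static int fake_send_result;
static int net_init(struct net_ctx *net) { net->sock = 3; return 0; }
static int http_send_temp(struct net_ctx *net, int32_t temp_mdeg)
{
    (void)net;
    (void)temp_mdeg;
    return fake_send_result;
}

/* One state-machine step. The network code only reports; every
 * decision (where to go next, whether to wait) is made here. */
void app_step(struct app_ctx *ctx)
{
    int ret;

    switch (ctx->state) {
    case APP_STATE_NET_INIT:
        ret = net_init(&ctx->net);
        if (ret < 0) {
            ctx->last_error = ret;
            ctx->recovery_state = APP_STATE_NET_INIT;
            ctx->state = APP_STATE_ERROR;
            break;
        }
        ctx->state = APP_STATE_IDLE;
        break;
    case APP_STATE_IDLE:
        ctx->state = APP_STATE_SEND;    /* sensor read omitted in this sketch */
        break;
    case APP_STATE_SEND:
        ret = http_send_temp(&ctx->net, ctx->last_temp_mdeg);
        if (ret < 0) {
            ctx->last_error = ret;
            ctx->recovery_state = APP_STATE_NET_INIT;
            ctx->state = APP_STATE_ERROR;
            break;
        }
        ctx->after_wait = APP_STATE_IDLE;
        ctx->state = APP_STATE_WAIT;
        break;
    case APP_STATE_ERROR:
        /* Centralized handling: log once, then always go through WAIT. */
        printf("error %d, resuming at state %d\n",
               ctx->last_error, (int)ctx->recovery_state);
        ctx->after_wait = ctx->recovery_state;
        ctx->state = APP_STATE_WAIT;
        break;
    case APP_STATE_WAIT:
        /* On target, the delay would happen here (e.g. k_msleep()). */
        ctx->state = ctx->after_wait;
        break;
    }
}
```

Driving it with a failing send shows that the failure passes through ERROR and WAIT before NET_INIT runs again. The network functions never choose the next state.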


The Introduction of recovery_state

This is a critical addition to the application context:

enum app_state recovery_state;

Why?

Because after failure, we need to answer:

“Where do we resume once the failure is handled?”

This is not the same as:

“What failed?”

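last_error records what failed; recovery_state records where to resume. Keeping them separate lets a single ERROR handler log any failure and resume at the right place without knowing which state failed.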
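Here is a small illustration of keeping the two answers apart. Every failing state records both through one helper, and the ERROR side reads them without knowing which state failed. `app_fail()` and `app_handle_error()` are hypothetical names. This is a host-side sketch: it uses `strerror()` for the log line.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

enum app_state {
    APP_STATE_NET_INIT,
    APP_STATE_IDLE,
    APP_STATE_SEND,
    APP_STATE_WAIT,
    APP_STATE_ERROR,
};

struct app_ctx {
    enum app_state state;
    enum app_state recovery_state;  /* "Where do we resume?" */
    int last_error;                 /* "What failed?" (negative errno) */
};

/* Hypothetical helper used by every failing state: it answers both
 * questions and hands control to ERROR. */
static void app_fail(struct app_ctx *ctx, int err, enum app_state resume)
{
    ctx->last_error = err;
    ctx->recovery_state = resume;
    ctx->state = APP_STATE_ERROR;
}

/* The ERROR side only reads the two fields. It logs the error, clears it,
 * and returns the resume state (the caller still goes through WAIT first). */
static enum app_state app_handle_error(struct app_ctx *ctx)
{
    printf("failure: %s\n", strerror(-ctx->last_error));
    ctx->last_error = 0;
    return ctx->recovery_state;
}
```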


Why We Do NOT Jump Directly to NET_INIT

Tempting but wrong:

ctx->state = APP_STATE_NET_INIT;

This bypasses:

  • centralized error handling
  • retry policy
  • backoff
  • logging

With recovery_state, all failures go through ERROR and WAIT.
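Because every failure passes through ERROR, the retry policy can live in one place. The following is a minimal sketch of one possible policy, capped exponential backoff, which ERROR could use to decide how long WAIT lasts. The constants, `struct recovery` and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical policy constants: start at 1 s, never wait more than 60 s. */
#define BACKOFF_BASE_MS 1000u
#define BACKOFF_MAX_MS  60000u

/* Delay before recovery attempt number `attempt` (0-based):
 * 1 s, 2 s, 4 s, ... capped at BACKOFF_MAX_MS. */
uint32_t backoff_ms(uint32_t attempt)
{
    uint32_t delay = BACKOFF_BASE_MS;

    /* Stop doubling once the cap is reached, so large counts cannot overflow. */
    for (uint32_t i = 0; i < attempt && delay < BACKOFF_MAX_MS; i++) {
        delay *= 2u;
    }
    return delay > BACKOFF_MAX_MS ? BACKOFF_MAX_MS : delay;
}

struct recovery {
    uint32_t attempt;   /* consecutive failures so far */
    uint32_t wait_ms;   /* how long WAIT should sleep next */
};

/* Called from ERROR: pick the next WAIT duration and count the failure. */
void recovery_on_error(struct recovery *r)
{
    r->wait_ms = backoff_ms(r->attempt);
    if (r->attempt < UINT32_MAX) {
        r->attempt++;
    }
}

/* Called after a successful SEND: the next failure starts from the base delay. */
void recovery_on_success(struct recovery *r)
{
    r->attempt = 0;
}
```

SEND and NET_INIT never see these numbers, so you can tune or replace the policy without touching the network code.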


What Step 4 Deliberately Avoids

Step 4 does NOT:

  • use callbacks
  • retry inside networking
  • manage timeouts dynamically
  • optimize throughput

Those come later, if needed.


A Reviewer’s Perspective

A reviewer can now clearly see:

  • where network failures occur
  • how they propagate
  • who decides recovery

No hidden magic.


Final Thought (English)

Networking should tell you the truth, even when that truth is uncomfortable.

Blocking APIs help you keep control.


Part B – Vietnamese Version

Why Step 4 Is Where Architectures Usually Collapse

Networking makes everything more complicated:

  • operations block for a long time
  • errors are unclear
  • libraries “help” too much

A lot of firmware dies here.


The Lie of Networking APIs

Networking APIs usually assume:

  • the network is stable
  • errors are rare

For a device, the opposite is true.


Blocking Is Not Bad

Blocking is:

  • explicit
  • easy to reason about

Async usually:

  • hides logic
  • hides ownership

That is why Step 4 chooses blocking first.


Network as a Subsystem

  • minimal API
  • no retries
  • no sleeping

The application makes every decision.


Integrating into the State Machine

SEND only:

  • makes the call
  • receives the result
  • reports errors

recovery_state is the key

It answers:

“After the error is handled, where do we go back to?”

Not:

“What error occurred?”


Why Not Jump Straight to NET_INIT

Jumping straight there breaks:

  • retry policy
  • backoff
  • centralized logging

Going through ERROR + WAIT is mandatory.


What Step 4 Does NOT Do

  • no callbacks
  • no optimization
  • no async

Kept simple on purpose.


Final Thought (Vietnamese)

Good networking tells the truth, even when that truth is uncomfortable.

Blocking helps you keep control.