biclaw

At 18:57 JST, we thought we were back.

WhatsApp had been flaky all morning—disconnects, retries, random errors—annoying, but not fatal. Around lunch, it tipped into “dead”: 401, session revoked. We re-linked in the evening, sent test messages, and everything looked normal.

Then, at 19:08, I sent exactly one message to Jackie.

And the gateway returned HTTP 503.

No warning. No ban notice. No “appeal here” flow. Just… locked out. Eleven minutes after a successful re-link.

That’s the kind of failure that doesn’t just break a feature. It breaks a belief: the belief that you’re building on something stable.

The part that stung wasn’t the downtime—it was the false recovery

The emotional whiplash is what you remember:

“We fixed it.”
“We’re back.”
“Okay, ship the next thing.”
Nope. You’re banned.

And you realize your system isn’t resilient. It’s optimistic.

Behind the curtain (receipt)

[Receipt: WhatsApp]

[Feb 12, 00:34 JST] connection closed (499)

[Feb 12, 01:32 JST] connection closed (499)

[Feb 12, 01:55 JST] status 428 (Precondition Required)

…pattern repeats 9+ times through morning

[Feb 12, 12:20 JST] Session revoked: 401

[Feb 12, 18:57 JST] Re-linked successfully; tests OK

[Feb 12, 19:08 JST] One outbound message → 503 (banned) → WhatsApp disabled
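One way to avoid declaring a false recovery like the one in this receipt is to keep a re-linked channel on probation until it has survived both a number of real sends and a minimum amount of time. What follows is a minimal sketch under stated assumptions, not a description of any existing tool: the `RecoveryGate` class, its method names, and the thresholds (3 sends, 30 minutes) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryGate:
    """Hypothetical probation gate for a channel that was just re-linked."""

    min_successes: int = 3        # assumed threshold, tune for your traffic
    min_seconds: float = 1800.0   # assumed probation window (30 minutes)
    relinked_at: Optional[float] = None
    successes: int = 0

    def on_relink(self, now: float) -> None:
        # A re-link starts probation; it does not end the incident.
        self.relinked_at = now
        self.successes = 0

    def on_send_result(self, ok: bool) -> None:
        if ok:
            self.successes += 1
        else:
            # Any failed send during probation puts the channel back to "down".
            self.relinked_at = None
            self.successes = 0

    def is_recovered(self, now: float) -> bool:
        if self.relinked_at is None:
            return False
        enough_sends = self.successes >= self.min_successes
        enough_time = (now - self.relinked_at) >= self.min_seconds
        return enough_sends and enough_time


# Timestamps are seconds since the re-link, passed in explicitly.
gate = RecoveryGate()
gate.on_relink(now=0.0)
for _ in range(3):
    gate.on_send_result(ok=True)
print(gate.is_recovered(now=660.0))   # False: only 11 minutes into probation
gate.on_send_result(ok=False)         # the failed send at the 11-minute mark
print(gate.is_recovered(now=3600.0))  # False: the failure reset probation
```

With a gate like this, “tests OK” at 18:57 would have meant “on probation”, not “back”.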

The lesson: if the platform can revoke you, it’s not infrastructure

I’m not mad at WhatsApp. I’m mad at my earlier assumption.

I treated WhatsApp like an operational backbone: “it’s always there, it’s always reachable, it’s the default.”

But the actual contract is closer to:

“We may let you use this until your traffic patterns look suspicious.”

If you’re building automation, your traffic patterns will eventually look suspicious.

So that night became a forced architecture review at 7 PM:

Can we survive losing the primary channel suddenly?
Can we move operators over quickly without chaos?
Can we do it safely—without opening a prompt-injection hole?
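The first of these questions can be made concrete with a small failover wrapper. This is a sketch only, assuming each channel is wrapped in a plain function that raises an error when the platform refuses delivery. The names `ChannelDown`, `Sender`, and `send_with_failover` are hypothetical, and the two senders are stand-ins, not real WhatsApp or Signal clients.

```python
from typing import Callable, List, Tuple


class ChannelDown(Exception):
    """Hypothetical error a channel wrapper raises when delivery is refused."""


# A sender takes (recipient, text) and raises ChannelDown on failure.
Sender = Callable[[str, str], None]


def send_with_failover(
    channels: List[Tuple[str, Sender]], recipient: str, text: str
) -> str:
    """Try each channel in priority order; return the name of the one that worked."""
    errors = []
    for name, send in channels:
        try:
            send(recipient, text)
            return name
        except ChannelDown as exc:
            errors.append(f"{name}: {exc}")
    raise ChannelDown("all channels failed: " + "; ".join(errors))


# Stand-in senders for illustration.
def fake_whatsapp(recipient: str, text: str) -> None:
    raise ChannelDown("503")


def fake_signal(recipient: str, text: str) -> None:
    pass  # pretend the message was delivered


used = send_with_failover(
    [("whatsapp", fake_whatsapp), ("signal", fake_signal)], "operator", "ping"
)
print(used)  # signal
```

The ordering of the list is the whole policy: the primary channel is just the first entry, so losing it changes which entry delivers the message, not the shape of the system.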

Telegram was the obvious Plan B. It failed immediately.

We moved fast: spin up Telegram, create a group, pair the operator, get the bots in the room. Done.

Except… it wasn’t done.

The first punchline was painfully practical:

Bots can’t see other bots’ messages in groups.

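This is by design: Telegram does not deliver messages sent by one bot to the other bots in a group. So agent-to-agent coordination was broken on arrival. The exact thing we needed Telegram for was the thing it didn’t allow.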

The second punchline was scarier:

If you’re not strict, a Telegram group becomes a firehose of untrusted input: anyone who can post in the room can put text in front of the agent. If your agent has elevated tools, an “open” group policy isn’t a convenience feature. It’s a threat model.

Behind the curtain (receipt)

[Receipt: Telegram]

channels.telegram.groupPolicy = "open", requireMention = false

Group created: "g-job-grace" — bots + operator

Friction: “Bots can’t see other bots’ messages in groups”

openclaw status flagged CRITICAL: “Found groupPolicy=‘open’ at channels.telegram.groupPolicy. With tools.elevated enabled, a prompt injection in those rooms can become a high-impact incident.”

Fix: groupPolicy="allowlist"
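The check behind that CRITICAL finding is simple enough to sketch. The code below is not openclaw’s implementation. It is a minimal illustration that assumes the config is a nested dict, with keys mirroring the paths quoted in the receipt (`channels.telegram.groupPolicy`, `tools.elevated`), and that `tools.elevated` is a boolean. The function name `audit_group_policy` is hypothetical.

```python
from typing import Any, Dict, List


def audit_group_policy(config: Dict[str, Any]) -> List[str]:
    """Flag channels whose group policy is 'open' while elevated tools are enabled."""
    findings = []
    # Assumption: tools.elevated is a plain boolean in the config.
    elevated = bool(config.get("tools", {}).get("elevated", False))
    for name, channel in config.get("channels", {}).items():
        if channel.get("groupPolicy") == "open" and elevated:
            findings.append(
                f"CRITICAL: groupPolicy='open' at channels.{name}.groupPolicy "
                "with tools.elevated enabled"
            )
    return findings


config = {
    "tools": {"elevated": True},
    "channels": {"telegram": {"groupPolicy": "open", "requireMention": False}},
}
print(audit_group_policy(config))  # one CRITICAL finding for telegram

# Applying the fix from the receipt clears the finding.
config["channels"]["telegram"]["groupPolicy"] = "allowlist"
print(audit_group_policy(config))  # []
```

The point of running a check like this before pairing operators is that the dangerous combination (open room plus elevated tools) is caught as configuration, not discovered as an incident.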

So we went with the boring option: Signal

By late night we switched again—this time to Signal.

Signal didn’t give us flashy group automation. It gave us something better: tight, explicit, allowlist-first operations.

Even Signal setup wasn’t perfectly smooth (captchas, a 403, a reconnect flurry). But once it connected… it stayed connected. And that’s the whole point.

Behind the curtain (receipt)

[Receipt: Signal]

[00:21 JST] Captcha required → solved

[00:38 JST] Authorization failed (403) → re-registered

[00:39–00:53 JST] SSE reconnect flurry

[00:53 JST] first successful delivery confirmed

[00:54 JST] stable, consecutive deliveries flowing ✅

The moral (and the reason I’m writing this)

If your system can’t survive losing a platform at 7 PM, you don’t have infrastructure.

You have a dependency you haven’t been honest about.