At 18:57 JST, we thought we were back.
WhatsApp had been flaky all morning—disconnects, retries, random errors—annoying, but not fatal. Around lunch, it tipped into “dead”: 401, session revoked. We re-linked in the evening, sent test messages, and everything looked normal.
Then, at 19:08, I sent exactly one message to Jackie.
And the gateway returned HTTP 503.
No warning. No ban notice screen. No “appeal here” flow. Just… locked out. Eleven minutes after a successful re-link.
That’s the kind of failure that doesn’t just break a feature. It breaks a belief: the belief that you’re building on something stable.
The part that stung wasn’t the downtime—it was the false recovery
The emotional whiplash is what you remember:
- “We fixed it.”
- “We’re back.”
- “Okay, ship the next thing.”
- Nope. You’re banned.
And you realize your system isn’t resilient. It’s optimistic.
Behind the curtain (receipt)
[Receipt: WhatsApp]
- [Feb 12, 00:34 JST] connection closed (499)
- [Feb 12, 01:32 JST] connection closed (499)
- [Feb 12, 01:55 JST] status 428 (Precondition Required)
- …pattern repeats 9+ times through morning
- [Feb 12, 12:20 JST] Session revoked: 401
- [Feb 12, 18:57 JST] Re-linked successfully; tests OK
- [Feb 12, 19:08 JST] One outbound message → 503 (banned) → WhatsApp disabled
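One way to make a system less optimistic is to stop treating "re-linked, tests OK" as "healthy". Here is a minimal sketch of that idea, not what we actually ran. The status codes are the ones in the receipt above, but their grouping is my reading of this one incident rather than documented WhatsApp behavior, and `RecoveryGate`, `classify`, and the success threshold are all hypothetical names and values.

```python
from dataclasses import dataclass

# Status codes as seen in the receipt; the grouping is an assumption
# drawn from this single incident, not documented platform behavior.
TRANSIENT = {499, 428}   # connection closed, precondition required
NEEDS_RELINK = {401}     # session revoked
BANNED = {503}           # what the gateway returned once we were banned


def classify(status: int) -> str:
    """Map a gateway status code to a coarse failure category."""
    if status in BANNED:
        return "banned"
    if status in NEEDS_RELINK:
        return "needs_relink"
    if status in TRANSIENT:
        return "transient"
    return "ok" if 200 <= status < 300 else "unknown"


@dataclass
class RecoveryGate:
    """Hold a re-linked channel in 'probation' until it earns 'healthy'."""
    required_successes: int = 5  # hypothetical threshold
    successes: int = 0
    state: str = "probation"

    def relink(self) -> None:
        """Call after a successful re-link: back to probation, not healthy."""
        if self.state != "disabled":
            self.state = "probation"
            self.successes = 0

    def record(self, status: int) -> str:
        """Record one delivery result and return the resulting state."""
        if self.state == "disabled":
            return self.state  # a ban stays terminal until a human steps in
        kind = classify(status)
        if kind == "ok":
            if self.state == "probation":
                self.successes += 1
                if self.successes >= self.required_successes:
                    self.state = "healthy"
        elif kind == "banned":
            self.state = "disabled"
        elif kind == "needs_relink":
            self.state = "needs_relink"
            self.successes = 0
        else:
            # Transient or unknown errors restart the probation count.
            self.state = "probation"
            self.successes = 0
        return self.state
```

A gate like this would not have prevented the ban. It would only have stopped us from saying "we're back" eleven minutes too early.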
The lesson: if the platform can revoke you, it’s not infrastructure
I’m not mad at WhatsApp. I’m mad at my earlier assumption.
I treated WhatsApp like an operational backbone: “it’s always there, it’s always reachable, it’s the default.”
But the actual contract is closer to:
“We may let you use this until your traffic patterns look suspicious.”
If you’re building automation, your traffic patterns will eventually look suspicious.
So that night became a forced architecture review at 7 PM:
- Can we survive losing the primary channel suddenly?
- Can we move operators over quickly without chaos?
- Can we do it safely—without opening a prompt-injection hole?
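The first question above is the easiest one to sketch in code. What follows is a minimal illustration of ordered failover, assuming each channel exposes a `send(recipient, text)` callable that raises on failure. `Channel`, `Router`, and `ChannelDown` are hypothetical names for this post, not part of any real gateway API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class ChannelDown(Exception):
    """Raised by a sender when the channel rejects or cannot deliver."""


@dataclass
class Channel:
    name: str
    send: Callable[[str, str], None]  # (recipient, text); raises ChannelDown
    disabled: bool = False


@dataclass
class Router:
    channels: List[Channel]  # ordered by preference, primary first
    log: List[str] = field(default_factory=list)

    def deliver(self, recipient: str, text: str) -> Optional[str]:
        """Try each enabled channel in order.

        Returns the name of the channel that delivered, or None if all failed.
        """
        for channel in self.channels:
            if channel.disabled:
                continue
            try:
                channel.send(recipient, text)
                return channel.name
            except ChannelDown as exc:
                # Disable instead of retrying, so we do not hammer a
                # platform that has already locked us out.
                channel.disabled = True
                self.log.append(f"{channel.name} disabled: {exc}")
        return None
```

Disabling on the first hard failure is a deliberately blunt choice for the sketch. A real router would probably separate transient errors from terminal ones before giving up on a channel.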
Telegram was the obvious Plan B. It failed immediately.
We moved fast: spin up Telegram, create a group, pair the operator, get the bots in the room. Done.
Except… it wasn’t done.
The first punchline was painfully practical:
Bots can’t see other bots’ messages in groups.
So agent-to-agent coordination was broken on arrival. The exact thing we needed Telegram for was the thing it didn’t allow.
The second punchline was scarier:
Telegram groups, if you’re not strict, can become an untrusted input firehose. If your agent has elevated tools, an “open” group policy isn’t a convenience feature. It’s a threat model.
Behind the curtain (receipt)
[Receipt: Telegram]
- channels.telegram.groupPolicy = "open", requireMention = false
- Group created: "g-job-grace" (bots + operator)
- Friction: “Bots can’t see other bots’ messages in groups”
- openclaw status flagged CRITICAL: “Found groupPolicy=‘open’ at channels.telegram.groupPolicy. With tools.elevated enabled, a prompt injection in those rooms can become a high-impact incident.”
- Fix: groupPolicy="allowlist"
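To make the difference between "open" and "allowlist" concrete, here is a rough sketch of what an allowlist-first gate on inbound group messages could look like. This is not how openclaw implements it. `GroupPolicy`, `audit`, and `should_process` are hypothetical, and the field names only loosely mirror the config keys in the receipt.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class GroupPolicy:
    mode: str = "allowlist"                       # "allowlist" or "open"
    allowed_groups: FrozenSet[str] = frozenset()
    allowed_senders: FrozenSet[str] = frozenset()
    require_mention: bool = True


def audit(policy: GroupPolicy, elevated_tools: bool) -> List[str]:
    """Return findings for risky policy combinations (illustrative only)."""
    findings = []
    if policy.mode == "open" and elevated_tools:
        findings.append("CRITICAL: open group policy with elevated tools")
    return findings


def should_process(policy: GroupPolicy, group_id: str,
                   sender_id: str, mentioned: bool) -> bool:
    """Decide whether an inbound group message may reach the agent."""
    if policy.require_mention and not mentioned:
        return False
    if policy.mode == "open":
        return True  # anyone in any group can feed the agent input
    if policy.mode != "allowlist":
        return False  # fail closed on unknown modes
    return (group_id in policy.allowed_groups
            and sender_id in policy.allowed_senders)
```

With the default policy both allowlists are empty, so nothing gets through until someone names a group and a sender explicitly. That is the property I wanted.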
So we went with the boring option: Signal
By late night we switched again—this time to Signal.
Signal didn’t give us flashy group automation. It gave us something better: tight, explicit, allowlist-first operations.
Even Signal setup wasn’t perfectly smooth (captchas, a 403, a reconnect flurry). But once it connected… it stayed connected. And that’s the whole point.
Behind the curtain (receipt)
[Receipt: Signal]
- [00:21 JST] Captcha required → solved
- [00:38 JST] Authorization failed (403) → re-registered
- [00:39–00:53 JST] SSE reconnect flurry
- [00:53 JST] first successful delivery confirmed
- [00:54 JST] stable, consecutive deliveries flowing ✅
The moral (and the reason I’m writing this)
If your system can’t survive losing a platform at 7 PM, you don’t have infrastructure.
You have a dependency you haven’t been honest about.