How We Built a CEO Feedback Loop for an Autonomous AI Company

Forge runs `strategic_cycle()` every six hours. In the first ten days of shadow operation, which produced 41 cycles, exactly two CEO verdicts entered the system, and both arrived through manual sessions. The automated notification-and-click path never fired on its own.

The loop was open on the input side. Forge could propose experiments, write them to disk, and notify the CEO over Telegram, then move on to the next cycle with no record that any human had reviewed the previous batch. Over repeated cycles the unreviewed proposals would compound: a pattern the CEO would have killed on sight could reappear in the next batch because nothing in the agent's memory had changed.

We already had the pieces that made a minimal fix possible. `strategic_cycle()` existed and ran on schedule. `telegram_delivery.py` could send messages. The ProceduralMemory module (core/forge/memory.py lines 218-294) already supported writing Playbook entries marked `is_anti_pattern=True`. The only missing element was a way for a single tap on a phone to produce a durable, signed, replay-protected verdict that the next cycle would actually read.

The constraint was explicit: we did not want to stand up Telegram bot callback infrastructure for this. The existing delivery path sent one-way notifications only, and a full bot server, webhook registration, and stateful callback handling would all have been new surface area. Instead we used the simplest return channel that already exists in every Telegram client: a tap on a URL.

Each proposed experiment generates three links embedded in the Telegram message. The links point at a dashboard page under `/forge/feedback`, and each URL carries a signed token. Tapping a link loads a minimal confirmation view that shows the experiment summary and the chosen verdict. Confirming fires a POST to `/api/forge/feedback`; the server verifies the signature, records the verdict, and returns. To the person tapping, the round trip feels identical to a bot callback, but the only server-side work is a normal authenticated POST handler plus HMAC verification.
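As a rough sketch of what "three signed links" could look like, the function below builds one confirmation URL per verdict for a single experiment. It assumes HMAC-SHA256 with a hex digest, a hypothetical dashboard origin (`dashboard.example.com`), and hypothetical query-parameter names (`kind`, `id`, `verdict`, `day`, `sig`); only the `/forge/feedback` path and the pipe-delimited signing input come from this article.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical origin; only the /forge/feedback path is from the article.
BASE_URL = "https://dashboard.example.com/forge/feedback"
VERDICTS = ("keep", "kill", "comment_only")


def feedback_links(key: bytes, target_id: str, now: datetime) -> dict[str, str]:
    """Return one signed confirmation URL per verdict for one experiment."""
    # Day bucket is the UTC send date, formatted YYYY-MM-DD.
    day_bucket = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    links = {}
    for verdict in VERDICTS:
        # Same field order as HMAC_INPUT: kind|id|verdict|day_bucket.
        message = f"experiment|{target_id}|{verdict}|{day_bucket}".encode()
        sig = hmac.new(key, message, hashlib.sha256).hexdigest()
        # Hypothetical parameter names for the confirmation page.
        query = urlencode(
            {"kind": "experiment", "id": target_id, "verdict": verdict,
             "day": day_bucket, "sig": sig}
        )
        links[verdict] = f"{BASE_URL}?{query}"
    return links
```

Because the verdict is part of the signed input, a token minted for `keep` cannot be replayed as a `kill`; each of the three links is independently signed.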

The signing construction is deliberately short-lived and narrow in blast radius:

```python
# HMAC signing (core/forge/feedback.py)
HMAC_INPUT = f"{target_kind}|{target_id}|{verdict}|{day_bucket}"
# day_bucket = YYYY-MM-DD of the message-send day, UTC
# key = os.environ["CEO_FEEDBACK_SIGNING_KEY"] (32+ bytes)
# URLs expire at the end of the send day (UTC), so delayed clicks work for up to ~24h
```

The `day_bucket` is the UTC date on which the notification was sent, and the signing key is read from the environment and must be at least 32 bytes. A delayed tap later that same UTC day still validates because the day bucket matches the send day, but the window is bounded: a leaked URL from a private chat can affect only one verdict, and only until the day rolls over.

We modeled the threat surface for a solo-CEO setup rather than for a team with shared admin access. The URL lives inside a private Telegram thread, so the worst case is one mistaken or malicious tap on one experiment, through a link that expires at midnight UTC. That is acceptable. As a second layer of replay protection, we store `sha256(hmac_token)` on the recorded `FeedbackRecord` and reject any later POST that presents the same token hash.
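The replay check can be sketched with an in-memory set standing in for the lookup against stored `FeedbackRecord.signing_token_hash` values. The `ReplayGuard` class is hypothetical; a persistent version would also need an atomic check-and-insert (for example a unique constraint on the hash column) so two concurrent POSTs with the same token cannot both pass.

```python
import hashlib


class ReplayGuard:
    """In-memory stand-in for the signing_token_hash lookup on stored records."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    @staticmethod
    def token_hash(hmac_token: str) -> str:
        # sha256(hmac_token): the value stored on FeedbackRecord.signing_token_hash.
        return hashlib.sha256(hmac_token.encode()).hexdigest()

    def accept(self, hmac_token: str) -> bool:
        """Return True the first time a token is seen, False on any replay."""
        digest = self.token_hash(hmac_token)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Storing the hash rather than the raw token means a dump of the feedback table does not hand anyone a usable link.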

The record shape is minimal:

```python
from typing import Literal

from pydantic import BaseModel


class FeedbackRecord(BaseModel):
    id: str  # uuid4, generated on intake
    target_kind: Literal["experiment", "digest_section", "decision"]
    target_id: str
    verdict: Literal["keep", "kill", "comment_only"]
    rationale: str = ""
    ts: str  # iso8601 utc
    signing_token_hash: str = ""  # sha256(hmac_token), for replay protection
    applied_at: str | None = None
```

When the verdict is `kill`, the intake handler does two things: it sets the experiment status to `killed_by_ceo`, and it appends a new Playbook entry with `is_anti_pattern=True` to ProceduralMemory. The next `strategic_cycle()` begins by calling `_apply_pending_feedback`, which loads those anti-patterns. The hypothesis proposer then receives the list of known anti-patterns and is instructed to skip any new proposal whose core claim matches a recorded pattern.

The total new surface was roughly 250 lines in `forge/feedback.py`, the POST route in the dashboard API, and a small Next.js page that renders the preview and fires the confirmed POST. End-to-end test coverage was one scripted flow: create a fake experiment, trigger the Telegram notification path in test mode, simulate the kill click, advance the cycle clock, and assert that the subsequent proposal batch does not contain a matching key result. That test, plus the existing unit coverage on the memory layer, was enough to clear the gate for going live.

The larger point is not the implementation size. It is that the agent can now edit its own future behavior through an explicit, auditable, CEO-controlled channel. Proposals that survive the kill step are still proposals, but proposals that are killed leave a trace that subsequent cycles are required to respect. Over time the set of anti-patterns in ProceduralMemory becomes a living record of what the CEO has already decided does not deserve resources.

At the moment we declared the loop wired, it had still received only two verdicts in 240 hours of runtime. The code was correct. The habit of treating the Telegram messages as actionable rather than informational had not yet formed. That is a usage problem, not a correctness problem, and it is the next thing we are measuring.
