Open source · Pure TypeScript · Agent first

Agent-first verification for end-to-end API workflows

Your agent says it's tested.
Make that claim defensible.

Don't trust another testing pitch. Ask your agent to compare Glubean with the API workflows, tests, CI, clients, auth, and failure diagnosis already in your repo.

A green run is useful. Proof is what your team can review, trace, and keep.

POST /auth/login → POST /cart → POST /cart/promo → POST /checkout

real endpoints · shared state · structured failure evidence · CI promotion

Repo comparison prompt

Use Glubean skill.

Compare Glubean with our API workflow testing setup.
Look at tests that hit real endpoints, multi-step flows, auth/session, and state.
Tell me where it adds flow evidence, repair, or CI promotion; where our current unit, integration, or API-client tests are already enough; and the smallest end-to-end flow worth trying.
Do not recommend migration unless the benefit is clear.
Run this first — it teaches your agent what Glubean is:
npx skills add glubean/skill
Pick a starting prompt

No local agent? Ask the hosted Glubean GPT — skill and docs already loaded.

Open Glubean GPT

Green Is Not Proof

A passing log is not enough proof.

Agents can already write tests and make them pass. The missing layer is what teams can inspect: what was covered, what evidence came back, what changed during repair, and what should become permanent verification.

A green run may not prove coverage

Agents can generate and run tests now. The hard question is whether they covered auth boundaries, negative paths, state flows, schemas, and business rules.
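One way to make the coverage question reviewable is to write the plan down as data and check it for gaps. The sketch below is only an illustration: the category names come from the sentence above, while `PlannedCheck`, `missingCategories`, and the sample plan are hypothetical and not part of Glubean.

```typescript
// Hypothetical coverage categories, taken from the list in the text above.
type Category =
  | "auth-boundary"
  | "negative-path"
  | "state-flow"
  | "schema"
  | "business-rule";

const ALL_CATEGORIES: Category[] = [
  "auth-boundary",
  "negative-path",
  "state-flow",
  "schema",
  "business-rule",
];

// A planned check: a readable name plus the category it claims to cover.
interface PlannedCheck {
  name: string;
  category: Category;
}

// Returns every category that no planned check covers.
function missingCategories(plan: PlannedCheck[]): Category[] {
  const covered = new Set(plan.map((check) => check.category));
  return ALL_CATEGORIES.filter((category) => !covered.has(category));
}

// Illustrative plan for a checkout flow (assumed, not from a real repo).
const checkoutPlan: PlannedCheck[] = [
  { name: "login returns a token", category: "auth-boundary" },
  { name: "promo applies to an existing cart", category: "state-flow" },
  { name: "checkout responds with 201", category: "business-rule" },
];

console.log(missingCategories(checkoutPlan)); // [ 'negative-path', 'schema' ]
```

A green run of this plan would still leave negative paths and schemas unproven, which is the gap a reviewer needs to see.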

Terminal logs make agents guess

Pass/fail text is useful for humans, but agents need failed steps, request/response context, traces, and actual-vs-expected values to repair from facts.

Self-repair can weaken proof

If the agent broadens matchers, deletes negative cases, or turns schema checks into status-only tests, the next green result becomes less defensible.
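A repair can be audited by comparing the assertions before and after it. The following is a minimal sketch under stated assumptions: `AssertionSpec`, the three matcher levels, and `findWeakening` are hypothetical names invented for illustration, not a Glubean API.

```typescript
// Hypothetical matcher levels, ordered from loosest to strictest.
type Matcher = "any" | "range" | "exact";

const STRICTNESS: Record<Matcher, number> = { any: 0, range: 1, exact: 2 };

// Hypothetical descriptor for one assertion in a test.
interface AssertionSpec {
  id: string;
  matcher: Matcher;
}

// Lists assertions that a repair removed or made less strict.
function findWeakening(before: AssertionSpec[], after: AssertionSpec[]): string[] {
  const afterById = new Map<string, AssertionSpec>(
    after.map((spec): [string, AssertionSpec] => [spec.id, spec]),
  );
  const findings: string[] = [];
  for (const original of before) {
    const repaired = afterById.get(original.id);
    if (!repaired) {
      findings.push(`removed: ${original.id}`);
    } else if (STRICTNESS[repaired.matcher] < STRICTNESS[original.matcher]) {
      findings.push(`broadened: ${original.id} (${original.matcher} -> ${repaired.matcher})`);
    }
  }
  return findings;
}

// Illustrative data: the repair kept one check, loosened it, and dropped two.
const beforeRepair: AssertionSpec[] = [
  { id: "checkout.status", matcher: "exact" },
  { id: "checkout.body-schema", matcher: "exact" },
  { id: "promo.invalid-code-rejected", matcher: "exact" },
];
const afterRepair: AssertionSpec[] = [{ id: "checkout.status", matcher: "range" }];

console.log(findWeakening(beforeRepair, afterRepair));
```

An empty result would suggest the repair changed the code under test or the setup, not the strength of the claim.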

Evidence, Not Logs

Logs make agents guess. Failure objects make repair defensible.

A terminal log tells the agent that something failed. A failure object tells it where, why, and what changed. Glubean keeps the request, response, assertion, trace, and actual-vs-expected values as structured evidence the agent can query, group, and repair against.

endpoint · failed step · request / response · expected / actual · assertion / trace

Raw CI transcript

The agent knows something failed, then has to infer the cause from noisy framework output.

FAIL auth/me.test.ts › returns profile
  AssertionError: expected 200, received 401
    at auth/me.test.ts:14:23
    at processTicksAndRejections (node:internal/process/task_queues:95:5)

  Body: {"error":"Unauthorized"}
  Headers: {"content-type":"application/json"}
  ...11 more lines

Raw transcript → the agent guesses
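To see why a transcript forces guessing, consider what an agent has to do to recover the numbers from it. This sketch scrapes the log with a regular expression; `guessFromTranscript` is a hypothetical helper, and the pattern only works for this exact wording.

```typescript
// The first lines of the transcript shown above, as a single string.
const transcript = [
  "FAIL auth/me.test.ts › returns profile",
  "  AssertionError: expected 200, received 401",
  "    at auth/me.test.ts:14:23",
].join("\n");

// Tries to recover expected and actual values from free-form log text.
// Returns null whenever the wording does not match the pattern.
function guessFromTranscript(log: string): { expected: number; actual: number } | null {
  const match = /expected (\d+), received (\d+)/.exec(log);
  if (!match) return null;
  return { expected: Number(match[1]), actual: Number(match[2]) };
}

console.log(guessFromTranscript(transcript)); // { expected: 200, actual: 401 }

// A different framework phrasing defeats the same pattern.
console.log(guessFromTranscript("AssertionError: 401 !== 200")); // null
```

Even when the pattern matches, the transcript does not say which step or request produced the 401.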

Failure event JSON

The same failure becomes readable by people and structured enough for an agent to repair without weakening the test.

{
  "status": "failed",
  "runId": "clr_8f32",
  "testId": "auth.get-me",
  "step": "GET /users/me",
  "reason": { "kind": "assertion", "expected": 200, "actual": 401 },
  "events": [
    { "type": "http.request", "method": "GET", "url": "/users/me" },
    { "type": "http.response", "status": 401, "bodyShape": { "error": "string" } },
    { "type": "assert.failed", "path": "status", "expected": 200, "actual": 401 }
  ]
}

Failure object → the agent knows where, why, and what changed
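Because the failure is structured, it can be typed and queried directly. The types below are inferred from the sample JSON above and are an assumption; the real event schema may carry more fields or different names. `describeFailure` is a hypothetical helper.

```typescript
// Event shapes inferred from the sample above (assumed, not an official schema).
type RunEvent =
  | { type: "http.request"; method: string; url: string }
  | { type: "http.response"; status: number; bodyShape: Record<string, string> }
  | { type: "assert.failed"; path: string; expected: unknown; actual: unknown };

interface FailureEvent {
  status: "failed";
  runId: string;
  testId: string;
  step: string;
  reason: { kind: string; expected: unknown; actual: unknown };
  events: RunEvent[];
}

// The sample failure from the page, as a typed value.
const failure: FailureEvent = {
  status: "failed",
  runId: "clr_8f32",
  testId: "auth.get-me",
  step: "GET /users/me",
  reason: { kind: "assertion", expected: 200, actual: 401 },
  events: [
    { type: "http.request", method: "GET", url: "/users/me" },
    { type: "http.response", status: 401, bodyShape: { error: "string" } },
    { type: "assert.failed", path: "status", expected: 200, actual: 401 },
  ],
};

// Builds a one-line summary from fields, with no log parsing involved.
function describeFailure(f: FailureEvent): string {
  const response = f.events.find((event) => event.type === "http.response");
  const status = response && response.type === "http.response" ? response.status : "unknown";
  return (
    `${f.testId} failed at "${f.step}": expected ${String(f.reason.expected)}, ` +
    `got ${String(f.reason.actual)} (response status ${status})`
  );
}

console.log(describeFailure(failure));
// auth.get-me failed at "GET /users/me": expected 200, got 401 (response status 401)
```

The same fields could be used to group failures by step or by assertion path across runs.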

Same failure. Only one gives your agent a repair path without quietly weakening the test.

The Defensibility Loop

Plan. Run. Diagnose.
Repair. Promote.

Plan defines what "tested" should cover, and Run executes real, multi-step API flows against live endpoints. Diagnose and Repair, the middle of the loop, work from structured evidence so that a fix does not weaken the claim. Promote means only evidence-backed tests become long-term verification the team owns.
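The loop can be read as a small state machine with a gate in front of Promote. The sketch below is one possible reading, not Glubean's implementation: `RunResult`, `canPromote`, and `nextStage` are hypothetical, and sending a green but weakened run back to Diagnose is an assumption.

```typescript
type Stage = "plan" | "run" | "diagnose" | "repair" | "promote";

// Hypothetical summary of the most recent run.
interface RunResult {
  passed: boolean;
  evidenceCaptured: boolean;
  assertionsWeakened: boolean;
}

// A run is promotable only if it is green, has evidence, and was not weakened.
function canPromote(result: RunResult): boolean {
  return result.passed && result.evidenceCaptured && !result.assertionsWeakened;
}

// Returns the stage that follows `current`, given the last run (if any).
function nextStage(current: Stage, lastRun: RunResult | null): Stage {
  switch (current) {
    case "plan":
      return "run";
    case "run":
      return lastRun !== null && canPromote(lastRun) ? "promote" : "diagnose";
    case "diagnose":
      return "repair";
    case "repair":
      return "run";
    case "promote":
      return "promote";
  }
}

const greenButWeakened: RunResult = { passed: true, evidenceCaptured: true, assertionsWeakened: true };
const defensible: RunResult = { passed: true, evidenceCaptured: true, assertionsWeakened: false };

console.log(nextStage("run", greenButWeakened)); // diagnose
console.log(nextStage("run", defensible)); // promote
```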

Agent-written test to defensible verification

One loop connects the plan, evidence, repair, and permanent test.

Same artifact

Plan defines what "tested" must cover before the agent fills in code.

explore/checkout.test.ts
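// Plan-stage flow: four named steps that run in order and share state.
test("checkout-flow")
  // Step 1: log in and read the JSON response.
  .step("login", async ({ http }) =>
    http.post("/auth/login").json()
  )
  // Step 2: create a cart with the token from the shared state.
  .step("create-cart", async ({ http }, { token }) =>
    http.post("/cart", { json: { token } }).json()
  )
  // Step 3: apply a promo code to that cart.
  .step("apply-promo", async ({ http }, { cartId }) =>
    http.post("/cart/promo", {
      json: { cartId, code: "AMBER10" }
    })
  )
  // Step 4: check out, then assert on the status and the webhook flag.
  .step("checkout", async ({ http, expect }) => {
    const order = await http.post("/checkout");
    expect(order).toHaveStatus(201);
    expect(order.body.webhookDelivered).toBe(true);
  });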
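The flow above appears to hand each step's result to the next step as its second argument. As a rough illustration of that shared-state idea, here is a tiny stand-alone runner with stubbed steps. It makes no network calls, it is synchronous for brevity, and `Step` and `runFlow` are hypothetical names, not the Glubean runtime.

```typescript
// A step takes the state carried from the previous step and returns new state.
type StepFn = (previous: Record<string, unknown>) => Record<string, unknown>;

interface Step {
  name: string;
  run: StepFn;
}

// Runs steps in order; each return value becomes the next step's input.
function runFlow(steps: Step[]): { trail: string[]; final: Record<string, unknown> } {
  let carried: Record<string, unknown> = {};
  const trail: string[] = [];
  for (const step of steps) {
    carried = step.run(carried);
    trail.push(`${step.name} -> ${JSON.stringify(carried)}`);
  }
  return { trail, final: carried };
}

// Stubbed steps standing in for real HTTP calls (values are made up).
const flow: Step[] = [
  { name: "login", run: () => ({ token: "stub-token" }) },
  { name: "create-cart", run: ({ token }) => ({ cartId: `cart-for-${String(token)}` }) },
  { name: "apply-promo", run: ({ cartId }) => ({ cartId, code: "AMBER10" }) },
];

const result = runFlow(flow);
console.log(result.trail.join("\n"));
```

The trail records what each step produced, which is the kind of per-step context a later diagnosis would need.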

Reviewable test shape

Auth, state flows, negative cases, schemas, and business rules are named before code fills the gaps.

Not a single endpoint check

The first artifact explains what "tested" should mean, then turns it into runnable TypeScript.

The important shift

Generation becomes governable because the test can be inspected and changed.

Not just generated. Planned, diagnosed, repaired, and promoted into team-owned proof.

Starting Slices

Pick the first API workflow worth testing.

The repo comparison prompt at the top of the page compares Glubean with your whole repo. Here, start narrow: pick the one multi-step API flow where defensible evidence would matter most, and let your agent prove a single slice first.

Agent-first quick start

Install the Skill, then ask your agent.

npx skills add glubean/skill

First, choose what you want your agent to answer.

Start broad, compare familiar tools, inspect your repo, or find one migration slice.

Compare Glubean with familiar API testing tools

Use this when you are deciding between Postman, Vitest/Supertest, and Glubean.

Use Glubean skill.

Compare Glubean with Postman, Vitest/Supertest, and common API test setups.
Focus on:
1. agent-written tests and how easy they are to audit;
2. failure diagnosis from logs versus structured evidence;
3. CI promotion and long-term ownership;
4. migration cost and when the existing stack is enough.
Give me the tradeoffs, not a sales pitch.

Then use the same Skill for the slice worth trying

Existing API

Existing API? Turn behavior into owned workflow tests.

Point your agent at source, OpenAPI, or live endpoints. Glubean helps produce TypeScript workflows with auth, assertions, and structured failure evidence.

Advanced mode

No API yet? Make the boundary executable first.

Start from requirements. The agent writes executable contracts before implementation, and reviewers read the generated surface before code lands.

Failed run

CI failed? Diagnose from evidence, not logs.

Give the agent the failed step, request, response, assertion, and trace instead of a raw CI transcript.

Migration

Already have API assets? Prove one slice before migration.

Already have Postman, OpenAPI, or legacy tests? Start with one representative slice, not a blind conversion.

Shared outcome

Same install · Same runtime · Same path to CI and Cloud

Trust Note

Secrets stay local. Evidence can go to Cloud.

.env and .env.secrets stay separated
22 built-in redaction rules run before upload
glubean redact previews what Cloud receives
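Redaction before upload generally means walking the evidence and masking sensitive fields. The sketch below shows the idea only: the key pattern is a made-up rule, not one of the 22 built-in rules, and `redact` is a hypothetical function unrelated to the `glubean redact` command.

```typescript
// Hypothetical rule: mask values whose key name looks sensitive.
const SENSITIVE_KEYS = /^(authorization|cookie|password|token|secret|api[-_]?key)$/i;

// Returns a deep copy of `value` with sensitive fields replaced by a marker.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}

// Illustrative evidence object (field names are assumptions).
const evidence = {
  request: {
    headers: { Authorization: "Bearer abc123", "content-type": "application/json" },
  },
  response: { status: 401, body: { error: "Unauthorized" } },
};

console.log(JSON.stringify(redact(evidence), null, 2));
```

The input object is left untouched, so the unredacted copy can stay local while only the masked copy is uploaded.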

What Glubean Is Not

Opinionated enough to say no.

Glubean is not another AI wrapper. It is a verification system for teams that need generated tests to be planned, inspected, repaired, and owned. Keep unit tests where they work; reach for Glubean when API behavior crosses endpoints, auth, state, and CI.

Not another test generator
Not self-healing theater
Not a log summarizer
Not blind migration
Not a unit test replacement
Not a green check your team cannot defend