ToolLab — honest comparisons between tools we actually use

ToolLab is the comparison half of Leefan Reports. Three pages today: Claude Code vs Cursor for pSEO ops; Firecrawl vs Browserless for structure analysis; Gumroad vs BOOTH vs Stripe for productized assets. Each one is a comparison we ran for ourselves, on a real piece of work, with the criteria written down before the result came in. None of the three pages is a vendor-supplied take.

This hub exists to set expectations about how we compare tools before you read any individual page. The methodology is more important than any one verdict.

1. What ToolLab is, and what it deliberately isn’t

ToolLab is operator-grade orientation. Small comparisons done on a single operator’s machine, against a fixed input, with criteria stated up-front and a counter-case section that names the situation in which the losing tool is the right choice.

ToolLab is not a procurement benchmark. If you are buying a tool for a 200-engineer team and need a controlled cross-vendor benchmark on production-grade load, read the page anyway to set up your own criteria, then run your own benchmark on your own workload.

ToolLab is also not sponsored. No vendor has paid for placement; no link on a ToolLab page is an affiliate link; no vendor reviewed or approved any text.

2. The site-wide benchmark posture

Each comparison page declares, at the top, which posture it takes:

Single-operator measurement. A real run on the operator’s machine, with raw inputs and dated run logs committed to the launch evidence pack.
Rubric-only synthesis. No numeric table. Weighted rubric with qualitative ratings.
Hybrid. Indicative numeric table captioned as a single-operator, free-tier, n=1 run, with raw logs available by request.

Page	Posture	Preview status
T01 — Claude Code vs Cursor for pSEO ops	Single-operator measurement	numeric rerun pending — rerun pending; no placeholder numbers shown.
S01 — Firecrawl vs Browserless for structure analysis	Hybrid	numeric rerun pending — free-tier rerun pending; no placeholder numbers shown.
M01 — Gumroad vs BOOTH vs Stripe for productized assets	Rubric-only	rubric-only — synthetic net-of-fee numbers stripped; vendor-fee table replaced with `[fee table pending current public-fee check]` until current public fee pages are re-checked.

3. The standard benchmark-posture disclosure block

Benchmark posture: Each comparison states whether it is a single-operator measurement, hybrid, or rubric-only synthesis, last verified 2026-05-21. Raw inputs and run logs can be requested at contact@leefan.co.jp. The numbers are orientation, not a procurement-grade benchmark. If you are buying this software for your team, rerun the input yourself on your own workload before deciding.

In this preview, {date} renders as {T01_RERUN_DATE} / {S01_RERUN_DATE} / {M01_LAST_VERIFIED} literally; none of those values is bound.

4. The three pages, in the order a new reader should take them

#	Page	What it gives you (qualitatively)	Verdict shape
1	T01 — Claude Code vs Cursor for pSEO ops numeric rerun pending	Card-×-iteration comparison framing on a fixed input. Output-shape distinction (wrote-file-to-disk vs wrote-file-into-chat). Explicit counter-case for where Cursor is the right tool.	Claude Code wins on file-driven, repeatable, auditable operations; Cursor wins on live-edit-in-IDE, conversational debug.
2	S01 — Firecrawl vs Browserless for structure analysis numeric rerun pending	10-URL run on free-tier of each (rerun pending). Load-bearing artifact: the “structure-only, NOT body” framing.	Firecrawl wins on structure extraction with zero JS execution; Browserless wins on full-page rendering for JS-heavy sites.
3	M01 — Gumroad vs BOOTH vs Stripe for productized assets rubric-only	Rubric-only comparison; vendor-fee table held at `[fee table pending current public-fee check]` in preview; JP-specific framing (BOOTH audience overlap, payout cadence, 印紙税 applicability).	Depends — Gumroad on cross-border simplicity; BOOTH on JP-domestic audience overlap for niche assets; Stripe on margin once SKU volume is high and you have your own checkout.

If you only have time for one, read T01 once it is rerun — it is the page with the clearest “we ran this, here is what happened” shape.

5. The criteria we use across ToolLab

Family	What it captures	When it dominates
Reproducibility	Can a stranger rerun the comparison and check our claim?	Always at least 20%; load-bearing for honesty.
Operator-fit	Does the tool fit a small-team / single-operator workflow?	Heavy when the audience is a 1–5-person practice.
Trust contract	Does the tool tell you when it has done something dangerous (overwritten a file, fired a network request, billed you)?	Heavy for any tool that can spend money or modify state on your behalf.
Time-to-first-real-output	How long from install to first useful output?	Heavy for new-tool evaluation.
JP applicability	For JP-domestic operators: payout cadence, tax form support, language coverage, audience overlap.	Heavy whenever the artifact is a JP-domestic operation.
Vendor exit cost	If we stop paying, what do we lose?	Heavy for any tool we propose to add to the `Leefan Reports` stack.

6. What ToolLab does not compare (and why)

Cloud-side coding agents at large scale. We do not run them at the scale that would produce a useful comparison.
Closed-vendor “AI content platforms” that gate evaluation behind sales-team demos.
Headless CMSes against each other. Out of scope.
General-purpose ASP / affiliate networks. Compliance and eligibility differ enough that a fair public comparison is not a useful artifact.
Anything we have not paid for ourselves on the current pricing tier.

7. Where these pages plug into the rest of the site

From ToolLab → OpsLab: T01’s “card primitive this benchmark assumes” link resolves to the OpsLab hub, with A1 as the canonical landing.
From ToolLab → DiagKit: S01 and M01 point at the DiagKit family for “what do you measure once you have picked the tool”.
From ToolLab → /about: the benchmark posture in §2 is restated as a tighter sentence in /about.

We do not link from ToolLab to vendor partner programs, affiliate networks, or trial sign-up CTAs.

8. What’s not here yet

T02 / T07 — Trust-contract comparison of agentic coding tools. Planned.
M02 — Free vs paid affiliate networks (JP-domestic). Planned.
A rerun of T01 and S01. Required before public launch (per the benchmark honesty decision).
A non-English version. Out of scope at v1.