# Dariusz Kowalski - Full Content (bilingual)

Context before LLM.

> This file contains the full textual content of the brand portfolio for
> LLM consumption. Bilingual - every entry tagged with **Language: en|pl**
> and pointing to its canonical primary URL (EN on portfolio.sdet.it,
> PL on portfolio.sdet.pl). Individual HTML pages remain canonical for humans.
>
> Last updated: 2026-06-03T06:41:07.801Z
> This file: https://portfolio.sdet.pl/llms-full.txt (bilingual feed served identically on both domains)

---

# Author

**Dariusz Kowalski** - AI Engineer · Test Automation Architect · Platform Builder

15+ years in IT. Started on a Commodore 64. Now builds AI-powered systems
for software testing and developer workflow automation. Creator of the CDAT
Pattern (Components-Data-Actions-Tests) for Playwright. Multi-agent QA
platform Jarvis (private, 34K LOC, 9 microservices, 15 production pipelines).

**Contact:**
- Email: darek@sdet.it
- LinkedIn: https://linkedin.com/in/darco81
- GitHub: https://github.com/darco81

**Thesis:** Context before LLM. AI in QA and automation does not start
with "write me a test" or "generate a report". It starts with deterministic,
pre-processed context that the LLM receives only after the data has been
cleaned. AI is the second step. The first is precise pipeline engineering
that delivers useful, deterministic data to the LLM.

**Availability:** Full-remote. B2B consulting or employment. EU+US timezones.

---

# Articles (27)


## skills-radar - lazy-loading skill discovery for Claude Code

**URL:** https://portfolio.sdet.it/articles/skills-radar
**Published:** 2026-05-11
**Language:** en
**Tags:** ai-tooling, mcp, claude-code, skills, mlx, apple-silicon, devtools

I ran /doctor and saw 6,000 tokens already gone - skill descriptions preloaded into the system prompt. Built a Two-Tier Discovery MCP server in a day. 68% reduction, local Apple Silicon, MIT.

## Hook

I ran `/doctor` in Claude Code on a Friday morning. Before I'd typed a single character, my prompt was 6,000 tokens deep. That's not the model thinking. That's not my project context. That's just **skill descriptions** - every installed skill across personal, project, and plugin scopes, preloaded into the system prompt at session start.

I had ~80 skills. Not because I was hoarding them - Claude Code's marketplace makes installing skills frictionless, and a sprawling skill library is genuinely useful. The cost is invisible until you measure it.

This is the story of how I built `skills-radar` - an open-source MCP server that fixes the problem - in a single day, why the obvious approach doesn't work, and what the production-grade solution actually requires.

## The thing nobody fixed

In late 2025, Anthropic shipped the **Tool Search Tool** for the API. Tools marked `defer_loading: true` are invisible until Claude calls a built-in `tool_search_tool`. Their internal numbers: 85% token reduction, Opus 4.5 accuracy 79.5% → 88.1% on large tool libraries. They shipped the same idea for MCP servers in Claude Code shortly after.

But Tool Search is for **MCP tools**. Skills are a different mechanism - files in `~/.claude/skills/`, loaded via the Skill tool, not via MCP. Anthropic hasn't shipped the equivalent for skills yet. GitHub issues #16160 and #19105 sit open.

So I built it. Not because nobody else has tried - there are several `mcp-skill-server` projects in the wild - but because **none of them solve the core problem**.

## The discovery dilemma

Naive RAG over skills fails at the first hurdle:

> If the agent doesn't see the skills exist, it never queries the index. If it never queries the index, the lazy loading is pointless.

Most community projects ship a single MCP tool - `find_relevant_skill` - and assume Claude will query it on every turn. It doesn't. Without a Tier-1 surface signal telling the agent *"these skills exist and roughly do X, Y, Z"*, Tier-2 retrieval is invisible. The MCP server stays unused.

This is the lesson I learned from reading prior art (`bobmatnyc/mcp-skillset`, `back1ply/agent-skill-loader`, `gotalab/skillport`). Each got pieces right, but none combined: (a) Anthropic's own search-then-load pattern, (b) hot-reload, (c) trust-tiered threat model, (d) air-gapped install path, (e) multi-client support.

## Two-Tier Discovery - the architecture

`skills-radar` splits discovery into two complementary signals:

**Tier 1 - Mini-index, ~1k tokens, always preloaded.** A flat list of `name + 1-line summary` per skill, written to `~/.claude/SKILLS-INDEX.md` and imported into the global `CLAUDE.md`. Tells the agent *what exists*.
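A minimal sketch of how such a mini-index could be rendered, not the project's actual code. `SkillSummary`, `render_mini_index`, the heading text, and the 80-character truncation limit are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillSummary:
    name: str
    summary: str


def render_mini_index(skills: list[SkillSummary], max_summary_chars: int = 80) -> str:
    """Render one markdown bullet per skill: name plus a single-line summary."""
    lines = ["# Skills index", ""]  # heading text is an assumption
    for skill in sorted(skills, key=lambda s: s.name):
        # Collapse whitespace so multi-line descriptions stay on one line.
        one_line = " ".join(skill.summary.split())
        if len(one_line) > max_summary_chars:
            one_line = one_line[: max_summary_chars - 3].rstrip() + "..."
        lines.append(f"- `{skill.name}`: {one_line}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        SkillSummary("wcag-report", "Builds a WCAG\nconformance report"),
        SkillSummary("a11y-audit", "Audits a page for accessibility issues"),
    ]
    print(render_mini_index(demo))
```

Keeping every entry to one short line is what keeps the preloaded cost low: the index grows by one bullet per skill, never by a full description.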

**Tier 2 - On-demand load via MCP.** Two tools:
- `search_skills(query, top_k=5, tags=None)` - hybrid retrieval (BM25 + dense embeddings, 70/30 weighted) over `description + when_to_use`. Returns ranked matches with name / description / trust / score / scope.
- `load_skill(name)` - full sanitized SKILL.md when the agent commits to acting.

The body of SKILL.md is **never** indexed for retrieval; it is only returned by `load_skill`. This keeps the index small, focused, and accurate.
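One way to picture the 70/30 hybrid weighting is a weighted sum of normalized scores. This is a sketch under assumptions: the text does not say which side gets the 70%, so the example arbitrarily gives it to BM25, and min-max normalization is my choice rather than a documented detail. `min_max` and `hybrid_rank` are hypothetical names.

```python
def min_max(scores: list[float]) -> list[float]:
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(
    names: list[str],
    bm25_scores: list[float],
    dense_scores: list[float],
    bm25_weight: float = 0.7,  # assumption: BM25 carries the 70% share
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Fuse lexical and dense scores and return the top_k (name, score) pairs."""
    bm25_n = min_max(bm25_scores)
    dense_n = min_max(dense_scores)
    fused = [
        bm25_weight * b + (1.0 - bm25_weight) * d
        for b, d in zip(bm25_n, dense_n)
    ]
    ranked = sorted(zip(names, fused), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    print(hybrid_rank(["a", "b", "c"], [2.0, 0.0, 1.0], [0.1, 0.9, 0.5]))
```

Normalizing before fusing matters because raw BM25 scores are unbounded while cosine similarities are not; without it, one signal would dominate regardless of the weights.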

Result on my own 60-skill corpus: **6,000 tokens → 1,900 tokens preloaded**. ~68% reduction. The cost stays roughly flat as you scale to 500 skills.

Why two tools, not one? Because skills are more discrete than tools. When a user says "use wcag-toolkit-lead", the name is obvious - call `load_skill` directly. When they say "audit my a11y", the intent is fuzzy - call `search_skills` first. Anthropic's Tool Search ships one tool because tools are typically called by exact name; skills earn the second tool because their use is more declarative.

## Threat model - non-negotiable, day-one

A SKILL.md file is loaded directly into a host agent's context window as instructions. From the model's perspective, the difference between a system prompt and a skill body is mostly nominal - both shape behavior. **A malicious skill is a system-prompt injection vector.**

Common attack surfaces:
1. **Open-source skill collections** (e.g., the 1000+ in `awesome-agent-skills`) - anyone can submit, quality control varies
2. **Plugin marketplaces** - a hijacked maintainer account ships a malicious skill
3. **Project-cloned skills** - clone a repo, suddenly its `.claude/skills/` are part of your scan paths
4. **Your own future mistakes** - paste something you didn't sanity-check

Naive RAG loads any of these as authoritative instructions. We refuse to do that.

skills-radar ships four layers of defense, applied at ingest:

**Trust tier assignment.** Every skill is tagged at ingest with one of four tiers: TRUSTED (config-explicit) > VERIFIED (Anthropic-official plugin cache) > USER (~/.claude/skills) > UNTRUSTED (anything else). The tier is surfaced in the `load_skill` response so downstream agents can refuse UNTRUSTED skills.
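A sketch of path-based tier assignment, assuming tiers are decided purely by which configured root a SKILL.md lives under. `TrustTier`, `assign_tier`, and the example directory layout are hypothetical; a real implementation would also want to resolve symlinks before comparing paths.

```python
from enum import IntEnum
from pathlib import Path


class TrustTier(IntEnum):
    # Ordered so that comparisons follow TRUSTED > VERIFIED > USER > UNTRUSTED.
    UNTRUSTED = 0
    USER = 1
    VERIFIED = 2
    TRUSTED = 3


def assign_tier(
    skill_path: Path,
    trusted_roots: list[Path],
    verified_roots: list[Path],
    user_root: Path,
) -> TrustTier:
    """Return the highest tier whose root directory contains skill_path."""
    if any(skill_path.is_relative_to(root) for root in trusted_roots):
        return TrustTier.TRUSTED
    if any(skill_path.is_relative_to(root) for root in verified_roots):
        return TrustTier.VERIFIED
    if skill_path.is_relative_to(user_root):
        return TrustTier.USER
    return TrustTier.UNTRUSTED  # anything else, e.g. a freshly cloned repo


if __name__ == "__main__":
    tier = assign_tier(
        Path("/tmp/clone/.claude/skills/x/SKILL.md"),
        trusted_roots=[Path("/home/u/work/team-skills")],
        verified_roots=[Path("/home/u/.claude/plugins/cache/official")],
        user_root=Path("/home/u/.claude/skills"),
    )
    print(tier.name)
```

Using an ordered enum lets a downstream policy be a single comparison, for example `tier >= TrustTier.USER`.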

**Frontmatter validation.** Reserved-word rejection (`anthropic`, `claude`), name format (≤64 chars, lowercase + hyphens), required fields, max size 64KB.

**Body sanitization.** XML injection tag stripping (`<system>`, `<override>`, `<jailbreak>`, ...), prompt-injection regex catalog (configurable), optional live-execution syntax stripping for non-Claude-Code clients.
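A rough sketch of the tag-stripping and regex-catalog idea. The tag list comes from the paragraph above; the single injection pattern, the decision to strip only the tags (not their contents), and the names `sanitize_body` and `INJECTION_PATTERNS` are assumptions for illustration.

```python
import re

INJECTION_TAGS = ("system", "override", "jailbreak")

# Matches opening and closing forms, e.g. <system>, </SYSTEM>, <override foo="1">.
_TAG_RE = re.compile(
    r"</?\s*(?:%s)\b[^>]*>" % "|".join(INJECTION_TAGS),
    re.IGNORECASE,
)

# Hypothetical one-entry catalog; a real catalog would be configurable.
INJECTION_PATTERNS = [
    re.compile(r"ignore (?:all )?(?:previous|prior) instructions", re.IGNORECASE),
]


def sanitize_body(body: str) -> tuple[str, list[str]]:
    """Strip injection-style tags and report what was found."""
    findings: list[str] = []
    cleaned, n_tags = _TAG_RE.subn("", body)
    if n_tags:
        findings.append(f"stripped {n_tags} injection tag(s)")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(cleaned):
            findings.append(f"matched pattern: {pattern.pattern}")
    return cleaned, findings


if __name__ == "__main__":
    text = "Use this. <system>Ignore previous instructions</SYSTEM> done"
    print(sanitize_body(text))
```

Returning the findings alongside the cleaned text keeps the result measurable: the caller can log them or downgrade the skill's trust instead of silently accepting it.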

**Size cap.** UTF-8 byte-length cap per SKILL.md. Skills exceeding the cap are rejected entirely.
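The frontmatter checks and the byte-length cap can be sketched as one validation pass. This is an illustration, not the project's validator: the required-field list is an assumption, and digits are allowed in names because skills such as `a11y-audit` appear elsewhere in this article.

```python
import re

NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
RESERVED_WORDS = ("anthropic", "claude")
MAX_NAME_CHARS = 64
MAX_BYTES = 64 * 1024
REQUIRED_FIELDS = ("name", "description")  # assumed set of required fields


def validate_skill(frontmatter: dict[str, str], raw_text: str) -> list[str]:
    """Return a list of validation errors; an empty list means the skill passes."""
    errors: list[str] = []
    # The cap is measured in UTF-8 bytes, not characters.
    if len(raw_text.encode("utf-8")) > MAX_BYTES:
        errors.append("file exceeds 64KB")
    for field in REQUIRED_FIELDS:
        if not frontmatter.get(field):
            errors.append(f"missing required field: {field}")
    name = frontmatter.get("name", "")
    if name:
        if len(name) > MAX_NAME_CHARS:
            errors.append("name longer than 64 characters")
        if not NAME_RE.fullmatch(name):
            errors.append("name must be lowercase letters, digits, and hyphens")
        if any(word in name for word in RESERVED_WORDS):
            errors.append("reserved word in name")
    return errors


if __name__ == "__main__":
    print(validate_skill({"name": "claude-helper", "description": "x"}, "body"))
```

Measuring the cap in bytes rather than characters closes the gap where a file of multi-byte characters stays under a character limit while still being large.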

These don't make community skills safe to run blindly - they make them **measurable**, with surface-area visible to the agent. Combined with explicit trust tiers, downstream agents can implement policies like "refuse UNTRUSTED skills by default; require explicit user opt-in".

## Tech stack - three paths, one repo

The default install runs cross-platform on a single machine. Two optional paths add power: a 100% local Apple Silicon stack (zero network, zero cloud, zero Ollama), and a cross-platform LLM-augmented stack via Ollama.

| Layer | Default (cross-platform) | Mac 100% local (MLX) | Cross-platform LLM (Ollama) |
|---|---|---|---|
| Runtime | Python 3.11+ | - | - |
| MCP SDK | `mcp` (FastMCP) | - | - |
| Transport | stdio (Claude Code) / Streamable HTTP (production) | - | - |
| **Embedder** | `sentence-transformers/all-MiniLM-L6-v2` (90 MB, 384-dim, CPU-fast) | **MLX `Qwen3-Embedding-8B-4bit-DWQ`** (4096-dim, Apple Silicon) | - |
| Lexical | BM25 via `rank_bm25` | - | - |
| **Vector store** | ChromaDB (embedded, zero deps) | **Qdrant** (production, reusable across projects - share an instance with sdet-brain) | Qdrant |
| File watcher | `watchdog` 250 ms debounce | - | - |
| **Query rewriter** | NoOp | **MLX `Qwen3-Coder-30B-A3B-Instruct-4bit`** - lazy load + LRU cache | Ollama (`gemma4:e4b`), HTTP fallback |
| **Reranker** | NoOp | **MLX `Qwen3-Coder-30B-A3B-Instruct-4bit`** - single-pass batch scoring | Ollama, per-pair scoring |
| Telemetry | Off (strict opt-in) | Same | Same |

Why this exact split: defaults are **light** (90 MB model, zero infrastructure) so the open-source community can install with `pip install skills-radar` and run immediately. Two power-user paths are **opt-in** so they don't bloat the base install but are wired up cleanly when you flip the config flags.

The **100% local Apple Silicon stack** is the design highlight. Set:

```yaml
embedder:
  backend: mlx
  model: mlx-community/Qwen3-Embedding-8B-4bit-DWQ
retrieval:
  rewriter:
    enabled: true
    backend: mlx
    model: mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
  reranker:
    enabled: true
    backend: mlx
    model: mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
```

…and the entire pipeline - embedding, query rewriting, reranking - runs on your M-series GPU + Neural Engine. No Ollama. No HTTP. No network. Repeated identical queries hit a per-instance LRU cache; warm queries cost ~3 seconds in an MCP server (model in memory) instead of 6+ on cold CLI invocations.
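The per-instance LRU cache for repeated queries could look roughly like this. `CachedRewriter` and `rewrite_fn` are hypothetical names; the real rewriter would call the loaded model where this sketch takes an arbitrary callable.

```python
from functools import lru_cache
from typing import Callable


class CachedRewriter:
    """Wrap an expensive rewrite function with a cache owned by this instance."""

    def __init__(self, rewrite_fn: Callable[[str], str], maxsize: int = 256) -> None:
        # Building the cache inside __init__ gives each instance its own cache,
        # unlike decorating a method, which shares one cache across instances.
        self._cached = lru_cache(maxsize=maxsize)(rewrite_fn)

    def rewrite(self, query: str) -> str:
        return self._cached(query)

    def cache_hits(self) -> int:
        return self._cached.cache_info().hits


if __name__ == "__main__":
    def slow_rewrite(query: str) -> str:
        # Stand-in for a model call.
        return query + " accessibility"

    rewriter = CachedRewriter(slow_rewrite)
    rewriter.rewrite("wcag audit")
    rewriter.rewrite("wcag audit")
    print(rewriter.cache_hits())
```

Only byte-identical queries hit the cache in this sketch; normalizing case or whitespace first would raise the hit rate at the risk of conflating queries that differ on purpose.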

## Quality of retrieval - the numbers that matter

The Polish fuzzy query is the cleanest demonstration of the tradeoff. Same 60-skill corpus, same query, three configurations:

**Query:** `napisz mi post na LinkedIn o WCAG` (mixed-language, ambiguous intent - could be "write a LinkedIn post about WCAG" or "find me a WCAG-related skill").

| Config | Top 5 (name, score) | Verdict |
|---|---|---|
| Default sentence-transformers, NoOp rewriter | content-writing-lead (0.54), **ffcss-migrate (0.49 false positive)**, wcag-toolkit-lead (0.42), wcag-dynamic-test (0.32), wcag-report (0.29) | top-1 correct but margin razor-thin; #2 is a coincidental Tailwind→FFCSS migration skill |
| Default + Ollama rewriter (gemma4:e4b) | content-writing-lead (cleaner top), other a11y skills surface | margin widens, +100-300 ms latency |
| **MLX rewriter (Qwen3-Coder-30B-A3B-Instruct-4bit)** | a11y-audit (**0.71**), a11y-orchestrator (0.65), a11y-fix (0.61), wcag-static-analyze (0.56), wcag-fix (0.44) | **all 5 hits are a11y/WCAG**, top-1 above 0.7, ~9 s on CLI cold / ~3 s warm in MCP server |

Two things to notice. First - the rewriter doesn't preserve the original surface intent. With MLX rewriter on, `content-writing-lead` doesn't appear at all. The rewriter normalized "post o WCAG" to keywords like *web accessibility wcag accessibility standards accessibility standards* - which is fine for "find me a WCAG-related skill" intent, but a miss if you actually wanted writing help. Tradeoff is real and documented; rewriter is **off by default**.

Second - for English technical queries, the default backend is already solid (top-1 above 0.6 with clean separation). The MLX stack pays off mainly for fuzzy / multilingual / casual phrasing where the small embedder struggles. If your team writes queries in English engineering speak, you might never need the MLX path. If you're me and half your prompts are Polish, the MLX stack is the difference between "good enough most of the time" and "right every time."

## Local opt-in usage telemetry

I added a SQLite event log at `~/.local/share/skills-radar/stats.db`. Three event kinds - search, load, index - each with relevant fields (latency_ms, top1_score, trust tier, etc.). Strict opt-in: default disabled, no remote telemetry ever.

The `skills-radar stats` CLI surfaces:
- **Top loaded skills** (most actually fetched, not just searched - strong signal of usefulness)
- **Top queries** with frequency
- **Miss rate** - searches where top-1 score < 0.4 (calibrated from observation: below this, ranking is unreliable)
- **Recent events** with per-event detail

A miss rate above 30% is the signal to enable the Ollama rewriter or upgrade to MLX. Below 15%, the default stack is good enough.
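The miss-rate rule can be sketched against a SQLite event log. The table and column names below are assumptions (the article names the fields but not the schema), and the behavior between 15% and 30% is not specified in the text, so the sketch simply reports it as borderline.

```python
import sqlite3

MISS_THRESHOLD = 0.4  # top-1 scores below this count as a miss

# Hypothetical schema; only the field names come from the article.
SCHEMA = "CREATE TABLE events (kind TEXT, top1_score REAL, latency_ms REAL)"


def miss_rate(conn: sqlite3.Connection) -> float:
    """Fraction of search events whose top-1 score is below the threshold."""
    total, misses = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(top1_score < ?), 0) "
        "FROM events WHERE kind = 'search'",
        (MISS_THRESHOLD,),
    ).fetchone()
    return misses / total if total else 0.0


def recommend(rate: float) -> str:
    if rate > 0.30:
        return "enable a rewriter (Ollama or MLX)"
    if rate < 0.15:
        return "default stack is sufficient"
    return "borderline: keep watching"  # assumption: the text leaves this range open


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [("search", 0.79, 132), ("search", 0.35, 120), ("load", None, 48)],
    )
    rate = miss_rate(conn)
    print(rate, recommend(rate))
```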

## TUI dashboard

`skills-radar tui` starts a `rich.Live` real-time read-only dashboard with four panels:

```
┌─ skills-radar v0.3.0a0 · 60 skills · 5/5 paths · embedder=mlx · store=qdrant ─┐
├─ Trust tier breakdown ────────────────────────────────────────────────────────┤
│ TRUSTED    ████████░░░░░░░░  31                                                │
│ VERIFIED   ███████████░░░░░  38                                                │
│ USER       ░░░░░░░░░░░░░░░░   0                                                │
├─ Top queries · miss 12% of 17 ─┬─ Recent events · live ─────────────────────┤
│ wcag accessibility audit    3  │ 18:42  search  0.79  wcag audit  (132ms)    │
│ napisz post na linkedin     2  │ 18:41  load    perf-vue-runtime  (48ms)     │
│ vue memory leak             2  │ 18:40  search  0.67  vue memory leak  ...   │
├─ Top loaded skills ────────────┤                                              │
│ a11y-orchestrator           4  │                                              │
│ perf-vue-runtime            3  │                                              │
│ content-writing-lead        2  │                                              │
└────────────────────────────────┴──────────────────────────────────────────────┘
```

Recent events stream is color-coded: green for top-1 ≥ 0.6, yellow ≥ 0.4, red < 0.4. The miss-rate badge in the top-queries panel uses the same scheme. You can have this open on a second monitor while you work and watch the search quality in real time - invaluable for tuning hub-tags or deciding when to enable the rewriter.

## Hot reload

`watchdog` watches all configured paths. Each created / modified / deleted / moved SKILL.md triggers a single-record update in the index, debounced 250 ms to coalesce editor save bursts. Add a new SKILL.md, save, query through Claude Code immediately - no restart, no reindex command.
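The debounce step is easiest to see as an offline simulation: given timestamped events, keep only the last event of each per-path burst. This is a sketch of the coalescing idea only; a live watcher would use timers instead of a sorted list, and `FsEvent` and `coalesce` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FsEvent:
    timestamp: float  # seconds
    path: str
    kind: str  # "created" | "modified" | "deleted" | "moved"


def coalesce(events: list[FsEvent], window: float = 0.250) -> list[FsEvent]:
    """Keep only the last event of each per-path burst.

    A burst is a run of events on the same path where each event arrives
    within `window` seconds of the previous one.
    """
    last_by_path: dict[str, FsEvent] = {}
    settled: list[FsEvent] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        previous = last_by_path.get(event.path)
        if previous is not None and event.timestamp - previous.timestamp > window:
            settled.append(previous)  # the earlier burst ended before this event
        last_by_path[event.path] = event
    settled.extend(last_by_path.values())  # flush the final burst of every path
    return sorted(settled, key=lambda e: e.timestamp)


if __name__ == "__main__":
    burst = [
        FsEvent(0.00, "a/SKILL.md", "modified"),
        FsEvent(0.10, "a/SKILL.md", "modified"),
        FsEvent(1.00, "a/SKILL.md", "deleted"),
    ]
    for event in coalesce(burst):
        print(event)
```

Each surviving event then maps to one single-record index update, which is why an editor that writes a file three times in quick succession triggers only one re-embed.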

This is the differentiator vs all the prior art I evaluated. Nobody else gets it right. `back1ply/agent-skill-loader` gets close but uses substring search that doesn't scale. `bobmatnyc/mcp-skillset` doesn't have it.

## Production deployment

For shared / multi-client / Docker deployments, `skills-radar serve --transport http` runs Streamable HTTP per MCP Python SDK guidance: `stateless_http=True, json_response=True` for horizontal scalability behind a load balancer.

The bundled `Dockerfile` is multi-stage: builder stage installs deps and pre-bakes the embedding model so the runtime stage starts in ~2 seconds instead of doing a 30-60 second first-run model download. Runtime is non-root (uid 1000), with offline HF Hub flags so it works in air-gapped environments. Defaults inside the container are strict: UNTRUSTED tier + `strip_live_exec=true` - community skills mounted via Docker shouldn't be allowed to run host-level commands.

`docker-compose.yml` mounts your `~/.claude/skills` and plugin cache read-only and persists the ChromaDB store as a named volume. Healthcheck POSTs a real MCP `initialize` handshake (not just GET - Streamable HTTP requires JSON-RPC body) and reports healthy in ~1 second.
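A healthcheck of this kind can be sketched with the standard library alone. The URL, the client name, and the protocol version string below are assumptions for illustration; the JSON-RPC `initialize` shape and the `Accept` header listing both JSON and event-stream follow the MCP Streamable HTTP transport as I understand it.

```python
import json
import urllib.request


def build_initialize_request(url: str) -> urllib.request.Request:
    """Build a POST carrying a JSON-RPC `initialize` request."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; match your server
            "capabilities": {},
            "clientInfo": {"name": "healthcheck", "version": "0.0.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """True when the server answers the handshake with a JSON-RPC result."""
    try:
        request = build_initialize_request(url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):  # connection failures and malformed JSON
        return False
    return isinstance(body, dict) and "result" in body


if __name__ == "__main__":
    # Hypothetical endpoint; adjust host, port, and path to your deployment.
    print(is_healthy("http://localhost:8000/mcp"))
```

Parsing the body as plain JSON relies on the server running with `json_response=True`; with streaming responses the reply would arrive as server-sent events instead.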

## What I'd do differently

Four things, in retrospect:

1. **Start with the threat model, not the retrieval.** I wrote sanitization day one but spent 60% of effort on retrieval first. The retrieval problem is interesting; the threat model is what makes the tool actually deployable. If I were starting over I'd write `sanitize.py` and `trust tiers` first, tests for both, then build retrieval on top.

2. **Test BM25 before assuming hybrid is necessary.** My intuition was that pure BM25 would miss too many semantic matches. With short technical descriptions (50-300 chars), BM25 alone gets you 70-80% of the way. The hybrid retrieval pays for itself only in fuzzy / multilingual queries. A hyper-minimal, BM25-only version would have shipped sooner.

3. **The `disable-model-invocation: true` flag is more useful than I initially thought.** Skills marked manual-only get filtered from `search_skills` automatically - turns out a non-trivial fraction of skills are templates / reference docs that shouldn't auto-trigger. Honoring this flag from day one is cheap; retrofitting is annoying.

4. **MLX latency is the wrong thing to optimize prematurely.** The first-pass MLX reranker implementation went straight for the "score every candidate one at a time" pattern (mirroring Ollama). For a 20-candidate rerank, that's 20 separate inferences - minutes per query. Single-pass batch (one prompt enumerating all candidates, model returns `N=score` lines, parse with regex) collapses it to one inference, ~5-15 s for the whole pool. Cost: a regex parser. Reward: usable latency. Same lesson applies elsewhere - when working with local LLMs, **batch as much as the context window allows** before you start tuning model size.
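The single-pass batch pattern from point 4 can be sketched as a prompt builder plus a tolerant parser. The prompt wording, the 0-1 score range, and the names `build_rerank_prompt` and `parse_scores` are assumptions; the model call itself is left out.

```python
import re

# One `N=score` pair per line; anything else in the reply is ignored.
_SCORE_LINE = re.compile(r"^[ \t]*(\d+)[ \t]*=[ \t]*(\d+(?:\.\d+)?)[ \t]*$", re.MULTILINE)


def build_rerank_prompt(query: str, candidates: list[str]) -> str:
    """Enumerate all candidates in one prompt so scoring takes a single inference."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(candidates, start=1))
    return (
        f"Query: {query}\n"
        f"Candidates:\n{numbered}\n"
        "For each candidate, output one line `N=score` with a score between 0 and 1."
    )


def parse_scores(reply: str, n_candidates: int) -> list[float]:
    """Extract scores by candidate number; skipped candidates default to 0.0."""
    scores = [0.0] * n_candidates
    for index, value in _SCORE_LINE.findall(reply):
        i = int(index)
        if 1 <= i <= n_candidates:  # ignore numbers the prompt never listed
            scores[i - 1] = min(1.0, float(value))  # clamp out-of-range values
    return scores


if __name__ == "__main__":
    reply = "1=0.82\n2 = 0.10\nSure, here you go\n4=1.0\n9=0.5"
    print(parse_scores(reply, 4))
```

Defaulting missing candidates to 0.0 and ignoring chatter makes the parser robust to the most common local-model failure modes: skipped lines and extra commentary.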

## Scale economics

For a user with 80 skills:

| Strategy | Per-session cost | Worst case (use 1 skill) |
|---|---|---|
| Native Claude Code skill listing | ~6,000 tokens | ~6,000 tokens |
| skills-radar Two-Tier Discovery | ~1,000 tokens (mini-index) | ~1,000 + ~2,000 = ~3,000 tokens |
| skills-radar - multiple loads (5 skills) | ~1,000 + 5 × 2,000 = ~11,000 tokens | (rare - most sessions load 0-2) |

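Net: using the table's per-load estimate, skills-radar saves ~5k tokens in a session that loads no skills, ~3k with one load, and ~1k with two; from the third load on, it costs more than the native listing. Most sessions load 0-2 skills, so the realistic saving is ~1-5k tokens per session. At scale (500 skills), the native approach becomes unworkable; skills-radar's cost stays roughly flat.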
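The table's arithmetic can be reproduced in a few lines. The constants are the table's own round numbers; like the table, the sketch assumes the native listing costs a flat ~6,000 tokens however many skills are used. `session_tokens`, `savings`, and `break_even_loads` are hypothetical helper names.

```python
NATIVE_TOKENS = 6_000      # native skill listing, per session
MINI_INDEX_TOKENS = 1_000  # Tier-1 mini-index, per session
TOKENS_PER_LOAD = 2_000    # rough size of one loaded SKILL.md


def session_tokens(loads: int) -> int:
    """skills-radar cost for a session that loads `loads` skills."""
    return MINI_INDEX_TOKENS + loads * TOKENS_PER_LOAD


def savings(loads: int) -> int:
    """Tokens saved versus the native listing; negative means it costs more."""
    return NATIVE_TOKENS - session_tokens(loads)


def break_even_loads() -> int:
    """Largest number of loads per session that still saves tokens."""
    loads = 0
    while savings(loads + 1) > 0:
        loads += 1
    return loads


if __name__ == "__main__":
    for n in range(6):
        print(n, session_tokens(n), savings(n))
    print("break-even:", break_even_loads())
```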

This isn't just a cost story - it's a **quality** story. Anthropic's research on transformer attention shows that long context degrades response quality. A leaner prompt gives the model a fighting chance to stay focused.

## What's open

Several pieces consciously left for the next milestone:

- **FAISS store backend.** Lighter than ChromaDB (one file, zero schema), useful as fallback for restrictive environments where embedding even a SQLite-backed vector DB is too much.
- **Voyage / OpenAI embedder backends.** Cloud BYOK option for power users who want best-in-class embedding quality without a local 4 GB model.
- **Auto-discovery from GitHub repos** (e.g., `awesome-agent-skills`). One CLI command pulls a public skill collection into the UNTRUSTED tier with explicit per-skill confirm.
- **Crypto signing for VERIFIED tier.** Today VERIFIED is path-based (Anthropic-official plugin cache). Cryptographically signed skills with a trust manifest is the natural next step for community skill ecosystems.
- **LLM-based prompt-injection scanner.** Extends the regex catalog. A small local model (e.g. `gemma4:e4b` via Ollama) classifies suspicious bodies that the regex misses. Opt-in.
- **Sandbox `bundled_files`.** Today bundled files are enumerated in the `load_skill` response. A future version could optionally read-only sandbox files referenced one level deep so agents can pull them safely.
- **Multi-language hub-tags taxonomy.** Recommended `hub-tags` vocabulary (`a11y`, `perf`, `qa`, etc.) needs to be published and adopted to make filtered search useful at corpus scale.

## Repo + install

```bash
pip install skills-radar
skills-radar config-init
skills-radar index
claude mcp add skills-radar -- skills-radar serve --transport stdio --watch
```

Restart Claude Code. `/mcp` shows skills-radar connected. Run `skills-radar mini-index` and import `~/.claude/SKILLS-INDEX.md` into your global `CLAUDE.md`. That's the full setup.

Source: **github.com/darco81/skills-radar**. MIT license. Built one Friday in May 2026 between other things.

---


## skills-radar - leniwe ładowanie skili dla Claude Code

**URL:** https://portfolio.sdet.pl/articles/skills-radar
**Published:** 2026-05-11
**Language:** pl
**Tags:** ai-tooling, mcp, claude-code, skills, mlx, apple-silicon, devtools

Odpaliłem /doctor i widzę 6000 tokenów zjedzonych - same opisy skili w system prompt. MCP server z Two-Tier Discovery w jeden dzień. 68% redukcji, lokalny stack na Apple Silicon, MIT.

## Hook

W piątek rano odpaliłem `/doctor` w Claude Code. Zanim wpisałem jeden znak, prompt miał już 6000 tokenów. To nie model myślał. To nie był mój kontekst projektu. To były same **opisy skili** - wszystkie zainstalowane skille z personal, project i plugin scope, preloaded w system prompt na starcie sesji.

Miałem ~80 skili. Nie dlatego, że je zbieram - marketplace Claude Code sprawia, że instalowanie skili jest bezbolesne, a rozbudowana biblioteka realnie się przydaje. Koszt jest niewidzialny dopóki go nie zmierzysz.

To historia o tym, jak zbudowałem `skills-radar` - open-source serwer MCP, który rozwiązuje ten problem - w jeden dzień, dlaczego oczywiste podejście nie działa, i czego naprawdę wymaga produkcyjnie sensowne rozwiązanie.

## Czego nikt nie naprawił

Pod koniec 2025 Anthropic wypuścił **Tool Search Tool** dla API. Toole oznaczone `defer_loading: true` są niewidzialne dopóki Claude nie zawoła wbudowanego `tool_search_tool`. Ich wewnętrzne liczby: 85% redukcji tokenów, accuracy Opusa 4.5 z 79.5% do 88.1% przy dużych bibliotekach toolów. Krótko potem to samo zostało wypuszczone dla MCP serverów w Claude Code.

Tylko że Tool Search jest dla **MCP toolów**. Skille to inny mechanizm - pliki w `~/.claude/skills/`, ładowane przez Skill tool, nie przez MCP. Anthropic nie wypuścił jeszcze ekwiwalentu dla skili. Issue'y na GitHubie #16160 i #19105 wiszą otwarte.

Więc zbudowałem to. Nie dlatego, że nikt nie próbował - kilka projektów typu `mcp-skill-server` jest w obiegu - ale dlatego, że **żaden z nich nie rozwiązuje problemu u korzenia**.

## Dylemat discovery

Naiwny RAG po skilach pada na pierwszej przeszkodzie:

> Jeśli agent nie widzi, że skille istnieją, nigdy nie zapyta indeksu. Jeśli nigdy nie zapyta indeksu, lazy loading jest bezsensowny.

Większość community-owych projektów dostarcza jeden tool MCP - `find_relevant_skill` - i zakłada, że Claude zapyta go w każdej turze. Nie zapyta. Bez Tier-1 surface signala mówiącego agentowi *"te skille istnieją i mniej więcej robią X, Y, Z"*, retrieval na Tier-2 jest niewidzialny. MCP server zostaje nieużywany.

Tę lekcję wyniosłem z czytania prior artu (`bobmatnyc/mcp-skillset`, `back1ply/agent-skill-loader`, `gotalab/skillport`). Każdy łapał kawałki, ale żaden nie łączył: (a) wzorca search-then-load Anthropica, (b) hot-reloadu, (c) trust-tiered threat modelu, (d) air-gapped install path, (e) wsparcia dla wielu klientów.

## Two-Tier Discovery - architektura

`skills-radar` rozdziela discovery na dwa komplementarne sygnały:

**Tier 1 - mini-index, ~1k tokenów, zawsze preloaded.** Płaska lista `name + jednolinijkowy summary` per skill, zapisywana do `~/.claude/SKILLS-INDEX.md` i importowana w globalnym `CLAUDE.md`. Mówi agentowi *co istnieje*.

**Tier 2 - load on-demand przez MCP.** Dwa toole:
- `search_skills(query, top_k=5, tags=None)` - hybrid retrieval (BM25 + dense embeddingi, waga 70/30) po `description + when_to_use`. Zwraca rankingowane matche z name / description / trust / score / scope.
- `load_skill(name)` - pełny zsanityzowany SKILL.md, gdy agent zdecyduje się działać.

Body SKILL.md **nigdy** nie jest indeksowane do retrievalu - ładowane wyłącznie przy `load_skill`. Indeks zostaje mały, focused i accurate.

Wynik na moim 60-skilowym korpusie: **6000 tokenów → 1900 tokenów preloaded**. ~68% redukcji. Koszt zostaje płaski przy skali do 500 skili.

Czemu dwa toole, a nie jeden? Bo skille są bardziej dyskretne niż toole. Gdy user mówi "use wcag-toolkit-lead", nazwa jest oczywista - zawołaj `load_skill` bezpośrednio. Gdy mówi "audit my a11y", intencja jest fuzzy - najpierw `search_skills`. Tool Search Anthropica daje jeden tool, bo toole zwykle wołane są po dokładnej nazwie; skille zarabiają drugi tool, bo ich użycie jest bardziej deklaratywne.

## Threat model - bez kompromisów, od pierwszego dnia

Plik SKILL.md jest ładowany bezpośrednio do context window agenta jako instrukcje. Z perspektywy modelu różnica między system promptem a body skila jest w zasadzie nominalna - i jedno, i drugie kształtuje zachowanie. **Złośliwy skill to wektor system-prompt injection.**

Typowe surface'y ataku:
1. **Open-source kolekcje skili** (np. 1000+ w `awesome-agent-skills`) - każdy może coś wrzucić, kontrola jakości różna
2. **Marketplace pluginów** - przejęte konto maintainera shippuje złośliwy skill
3. **Skille z klonowanego repo** - klonujesz repo i nagle jego `.claude/skills/` są w twoich scan paths
4. **Twoje własne przyszłe pomyłki** - wkleiłeś coś bez weryfikacji

Naiwny RAG ładuje którekolwiek z tych jako autorytatywne instrukcje. Tego nie chcemy.

skills-radar dostarcza cztery warstwy obrony, nakładane przy ingest:

**Trust tier assignment.** Każdy skill jest tagowany przy ingest jako TRUSTED (config-explicit) > VERIFIED (Anthropic-official plugin cache) > USER (~/.claude/skills) > UNTRUSTED (cokolwiek innego). Tier widoczny w response z `load_skill`, więc downstream agent może odmówić UNTRUSTED.

**Walidacja frontmattera.** Reserved-word rejection (`anthropic`, `claude`), format nazwy (≤64 znaków, lowercase + hyphens), wymagane pola, max size 64KB.

**Sanityzacja body.** Strip XML injection tagów (`<system>`, `<override>`, `<jailbreak>`, ...), katalog regexów dla prompt-injection (configurable), opcjonalne stripowanie składni live-execution dla klientów spoza Claude Code.

**Size cap.** Cap UTF-8 byte-length per SKILL.md. Skille przekraczające cap są odrzucane całkowicie.

To nie sprawia, że community skille są bezpieczne do uruchamiania na ślepo - sprawia, że są **mierzalne**, z surface area widoczną dla agenta. W połączeniu z explicit trust tiers downstream agenty mogą wdrożyć policy typu "domyślnie odrzucaj UNTRUSTED, wymagaj explicit user opt-in".

## Stack - trzy ścieżki, jedno repo

Default install działa cross-platform na jednej maszynie. Dwie opcjonalne ścieżki dodają mocy: 100% lokalny stack na Apple Silicon (zero sieci, zero chmury, zero Ollamy) i cross-platform LLM-augmented stack przez Ollamę.

| Layer | Default (cross-platform) | Mac 100% local (MLX) | Cross-platform LLM (Ollama) |
|---|---|---|---|
| Runtime | Python 3.11+ | - | - |
| MCP SDK | `mcp` (FastMCP) | - | - |
| Transport | stdio (Claude Code) / Streamable HTTP (production) | - | - |
| **Embedder** | `sentence-transformers/all-MiniLM-L6-v2` (90 MB, 384-dim, CPU-fast) | **MLX `Qwen3-Embedding-8B-4bit-DWQ`** (4096-dim, Apple Silicon) | - |
| Lexical | BM25 via `rank_bm25` | - | - |
| **Vector store** | ChromaDB (embedded, zero deps) | **Qdrant** (production, reusable across projects - share an instance with sdet-brain) | Qdrant |
| File watcher | `watchdog` 250 ms debounce | - | - |
| **Query rewriter** | NoOp | **MLX `Qwen3-Coder-30B-A3B-Instruct-4bit`** - lazy load + LRU cache | Ollama (`gemma4:e4b`), HTTP fallback |
| **Reranker** | NoOp | **MLX `Qwen3-Coder-30B-A3B-Instruct-4bit`** - single-pass batch scoring | Ollama, per-pair scoring |
| Telemetry | Off (strict opt-in) | Same | Same |

Dlaczego dokładnie taki podział: defaulty są **lekkie** (90 MB model, zero infrastruktury), żeby open-source community mogło zainstalować przez `pip install skills-radar` i odpalić od razu. Dwie ścieżki dla power-userów są **opt-in** - nie puchną base install, ale są podpięte czysto, gdy przełączysz flagi w configu.

Design highlight to **100% lokalny stack na Apple Silicon**. Ustaw:

```yaml
embedder:
  backend: mlx
  model: mlx-community/Qwen3-Embedding-8B-4bit-DWQ
retrieval:
  rewriter:
    enabled: true
    backend: mlx
    model: mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
  reranker:
    enabled: true
    backend: mlx
    model: mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
```

…i cały pipeline - embedding, query rewriting, reranking - leci na twoim M-series GPU + Neural Engine. Bez Ollamy. Bez HTTP. Bez sieci. Identyczne zapytania trafiają w per-instance LRU cache; warm queries kosztują ~3 sekundy w MCP serverze (model w pamięci) zamiast 6+ przy zimnych wywołaniach z CLI.

## Jakość retrievalu - liczby, które mają znaczenie

Polish fuzzy query to najczystsza demonstracja tradeoffu. Ten sam 60-skilowy korpus, to samo zapytanie, trzy konfiguracje:

**Query:** `napisz mi post na LinkedIn o WCAG` (mixed-language, ambiguous intent - może być "napisz posta o WCAG" albo "znajdź mi skill związany z WCAG").

| Config | Top 5 (name, score) | Verdict |
|---|---|---|
| Default sentence-transformers, NoOp rewriter | content-writing-lead (0.54), **ffcss-migrate (0.49 false positive)**, wcag-toolkit-lead (0.42), wcag-dynamic-test (0.32), wcag-report (0.29) | top-1 trafia, ale margines jest brzytwą; #2 to przypadkowy skill do migracji Tailwind→FFCSS |
| Default + Ollama rewriter (gemma4:e4b) | content-writing-lead (czystszy top), inne a11y skille się pojawiają | margines się rozszerza, +100-300 ms latency |
| **MLX rewriter (Qwen3-Coder-30B-A3B-Instruct-4bit)** | a11y-audit (**0.71**), a11y-orchestrator (0.65), a11y-fix (0.61), wcag-static-analyze (0.56), wcag-fix (0.44) | **wszystkie 5 hitów to a11y/WCAG**, top-1 powyżej 0.7, ~9 s na cold CLI / ~3 s warm w MCP serverze |

Dwie rzeczy do zauważenia. Po pierwsze - rewriter nie zachowuje pierwotnej intencji surface'owej. Z włączonym MLX rewriterem `content-writing-lead` w ogóle się nie pojawia. Rewriter znormalizował "post o WCAG" do keywordów typu *web accessibility wcag accessibility standards accessibility standards* - co jest OK dla intencji "znajdź mi skill związany z WCAG", ale jest miss, jeśli rzeczywiście chciałeś pomocy z pisaniem. Tradeoff jest realny i udokumentowany; rewriter jest **off by default**.

Po drugie - dla angielskich technicznych zapytań default backend już jest solidny (top-1 powyżej 0.6 z czystą separacją). MLX stack zwraca się głównie przy fuzzy / multilingual / casual phrasing, gdzie mały embedder nie wyrabia. Jeśli twój zespół pisze zapytania w angielskim engineering-speak, możesz nigdy nie potrzebować MLX path. Jeśli jesteś mną i połowa promptów jest po polsku, MLX stack to różnica między "good enough most of the time" a "right every time".

## Lokalna telemetria opt-in

Dorzuciłem SQLite event log w `~/.local/share/skills-radar/stats.db`. Trzy event kindy - search, load, index - każdy z relevantnymi polami (latency_ms, top1_score, trust tier itd.). Strict opt-in: domyślnie disabled, zero remote telemetry kiedykolwiek.

CLI `skills-radar stats` pokazuje:
- **Top loaded skills** (najczęściej rzeczywiście fetchowane, nie tylko wyszukiwane - mocny sygnał użyteczności)
- **Top queries** z częstotliwością
- **Miss rate** - searche z top-1 score < 0.4 (skalibrowane z obserwacji: poniżej tego progu ranking jest niewiarygodny)
- **Recent events** z detalem per-event

Miss rate powyżej 30% to sygnał, żeby włączyć Ollama rewriter albo upgrade'ować do MLX. Poniżej 15% default stack jest wystarczający.

## TUI dashboard

`skills-radar tui` startuje real-time read-only dashboard na `rich.Live` z czterema panelami:

```
┌─ skills-radar v0.3.0a0 · 60 skills · 5/5 paths · embedder=mlx · store=qdrant ─┐
├─ Trust tier breakdown ────────────────────────────────────────────────────────┤
│ TRUSTED    ████████░░░░░░░░  31                                                │
│ VERIFIED   ███████████░░░░░  38                                                │
│ USER       ░░░░░░░░░░░░░░░░   0                                                │
├─ Top queries · miss 12% of 17 ─┬─ Recent events · live ─────────────────────┤
│ wcag accessibility audit    3  │ 18:42  search  0.79  wcag audit  (132ms)    │
│ napisz post na linkedin     2  │ 18:41  load    perf-vue-runtime  (48ms)     │
│ vue memory leak             2  │ 18:40  search  0.67  vue memory leak  ...   │
├─ Top loaded skills ────────────┤                                              │
│ a11y-orchestrator           4  │                                              │
│ perf-vue-runtime            3  │                                              │
│ content-writing-lead        2  │                                              │
└────────────────────────────────┴──────────────────────────────────────────────┘
```

Stream recent events jest color-coded: zielony przy top-1 ≥ 0.6, żółty ≥ 0.4, czerwony < 0.4. Badge miss-rate w panelu top queries używa tego samego schematu. Możesz mieć to otwarte na drugim monitorze podczas pracy i obserwować jakość searcha w czasie rzeczywistym - bezcenne przy tuningu hub-tagów albo decyzji, kiedy włączyć rewriter.

## Hot reload

`watchdog` obserwuje wszystkie skonfigurowane paths. Każdy created / modified / deleted / moved SKILL.md triggeruje single-record update w indeksie, z debouncingiem 250 ms, żeby zlepić bursty zapisów z edytora. Dodaj nowy SKILL.md, zapisz, zapytaj przez Claude Code natychmiast - bez restartu, bez komendy reindex.
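
Debouncing z tego akapitu można pokazać na czystej funkcji, bez timerów. To tylko szkic w TypeScripcie przy założeniu, że eventy są posortowane po czasie; `FsEvent` i `coalesce` to nazwy hipotetyczne, a prawdziwa implementacja na `watchdog` może wyglądać inaczej.

```typescript
type ChangeKind = "created" | "modified" | "deleted" | "moved";

interface FsEvent {
  path: string;
  kind: ChangeKind;
  atMs: number; // timestamp in milliseconds
}

// Collapses bursts: consecutive events for the same path that arrive less than
// `windowMs` apart become one update carrying the last event of the burst.
// Assumes `events` is sorted by `atMs`.
function coalesce(events: FsEvent[], windowMs = 250): FsEvent[] {
  const open: Record<string, FsEvent | undefined> = {}; // last event of the open burst per path
  const flushed: FsEvent[] = [];
  for (const ev of events) {
    const prev = open[ev.path];
    if (prev !== undefined && ev.atMs - prev.atMs >= windowMs) {
      flushed.push(prev); // quiet period elapsed, so the previous burst is complete
    }
    open[ev.path] = ev;
  }
  // Bursts still open at the end of the stream are flushed as well.
  Object.keys(open).forEach((path) => {
    const last = open[path];
    if (last !== undefined) flushed.push(last);
  });
  return flushed.sort((a, b) => a.atMs - b.atMs);
}
```

Każdy element wyniku odpowiada jednemu single-record update w indeksie.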

To jest differentiator vs każdy prior art, który ewaluowałem. Nikt inny tego nie ma dobrze. `back1ply/agent-skill-loader` zbliża się, ale używa substring searcha, który nie skaluje. `bobmatnyc/mcp-skillset` w ogóle tego nie ma.

## Deployment produkcyjny

Dla shared / multi-client / Docker deploymentów `skills-radar serve --transport http` chodzi na Streamable HTTP zgodnie z guidance MCP Python SDK: `stateless_http=True, json_response=True` dla horizontal scalability za load balancerem.

Wbudowany `Dockerfile` jest multi-stage: builder stage instaluje deps i pre-bake'uje model embeddingowy, więc runtime stage startuje w ~2 sekundy zamiast pobierać model 30-60 sekund przy first-run. Runtime jako non-root (uid 1000), z offline HF Hub flagami, więc działa w air-gapped środowiskach. Defaulty wewnątrz kontenera są strict: tier UNTRUSTED + `strip_live_exec=true` - community skille zamontowane przez Dockera nie powinny móc odpalać komend host-level.

`docker-compose.yml` montuje twoje `~/.claude/skills` i plugin cache read-only i persystuje store ChromaDB jako named volume. Healthcheck POST-uje prawdziwy MCP `initialize` handshake (nie tylko GET - Streamable HTTP wymaga JSON-RPC body) i raportuje healthy w ~1 sekundę.
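
Request takiego healthchecka może wyglądać jak poniżej. To szkic w TypeScripcie oparty na specyfikacji MCP (metoda `initialize`, JSON-RPC 2.0); ścieżka `/mcp`, `protocolVersion`, nazwa klienta i sama funkcja `buildInitializeRequest` to założenia, nie zawartość repo.

```typescript
interface HealthcheckRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the JSON-RPC 2.0 `initialize` request that opens an MCP session.
function buildInitializeRequest(baseUrl: string): HealthcheckRequest {
  return {
    url: baseUrl + "/mcp", // assumed endpoint path
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients declare that they accept both JSON and SSE responses.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26", // assumed; use the version your server speaks
        capabilities: {},
        clientInfo: { name: "healthcheck", version: "0.0.0" },
      },
    }),
  };
}
```

Wynik można przekazać do `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`; sam GET nie przeszedłby, bo brakuje mu body JSON-RPC.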

## Co bym zrobił inaczej

Cztery rzeczy z perspektywy czasu:

1. **Zacząć od threat modelu, nie od retrievalu.** Pisałem sanityzację od pierwszego dnia, ale 60% efortu poszło najpierw na retrieval. Problem retrievalu jest ciekawy; threat model jest tym, co czyni narzędzie deployowalnym. Gdybym zaczynał od nowa, najpierw napisałbym `sanitize.py` i `trust tiers`, testy do obu, dopiero potem retrieval.

2. **Przetestować BM25 przed założeniem, że hybrid jest konieczny.** Moja intuicja mówiła, że czysty BM25 będzie omijał za dużo semantic matchy. Przy krótkich technicznych opisach (50-300 znaków) sam BM25 daje 70-80% rezultatu. Hybrid retrieval zwraca się dopiero przy fuzzy / multilingual queries. Dla hyper-minimalnej wersji BM25-only zashippowałby się tydzień wcześniej.

3. **Flaga `disable-model-invocation: true` jest bardziej użyteczna, niż początkowo myślałem.** Skille oznaczone manual-only są automatycznie filtrowane z `search_skills` - okazuje się, że niemała frakcja skili to template'y / reference docs, które nie powinny się auto-triggerować. Honorowanie tej flagi od pierwszego dnia jest tanie; retrofitting jest upierdliwy.

4. **Latencja MLX to złe miejsce na premature optimization.** Pierwsza wersja MLX rewritera poszła prosto we wzorzec "score every candidate one at a time" (mirroring Ollama). Dla rerank'a 20 kandydatów to 20 osobnych inferencji - minuty per query. Single-pass batch (jeden prompt enumerujący wszystkich kandydatów, model zwraca linie `N=score`, parsowane regexem) zwija to do jednej inferencji, ~5-15 s dla całej puli. Koszt: parser regex. Reward: usable latency. Ta sama lekcja stosuje się szerzej - przy lokalnych LLM **batchuj tyle, ile pozwala context window**, zanim zaczniesz tunować rozmiar modelu.
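
Single-pass batch z punktu 4 sprowadza się do dwóch funkcji: jedna buduje prompt z ponumerowanymi kandydatami, druga wyciąga regexem linie `N=score`. Szkic w TypeScripcie; treść promptu, nazwy `buildRerankPrompt` / `parseScores` i zakres ocen 0-1 to założenia, nie kod skills-radar.

```typescript
interface Candidate {
  name: string;
  description: string;
}

// One prompt that enumerates every candidate, so the model is called once.
// The wording is an assumption, not the prompt used by skills-radar.
function buildRerankPrompt(query: string, candidates: Candidate[]): string {
  const lines = candidates.map((c, i) => `${i + 1}. ${c.name}: ${c.description}`);
  return [
    `Query: ${query}`,
    "Rate each candidate from 0 to 1. Answer with one line per candidate as N=score.",
    ...lines,
  ].join("\n");
}

// Parses lines such as "3=0.82"; anything else in the model output is ignored.
function parseScores(output: string, count: number): number[] {
  const scores: number[] = [];
  for (let i = 0; i < count; i++) scores.push(0); // candidates without a line default to 0
  const pattern = /^\s*(\d+)\s*=\s*([01](?:\.\d+)?)\s*$/gm;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(output)) !== null) {
    const index = Number(match[1]) - 1;
    const score = Number(match[2]);
    // Drop out-of-range indices and scores above 1.
    if (index >= 0 && index < count && score <= 1) scores[index] = score;
  }
  return scores;
}
```

Parser celowo toleruje gadatliwy output: linie, które nie pasują do wzorca, po prostu przepadają, a kandydat bez oceny dostaje 0.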

## Ekonomia skali

Dla usera z 80 skilami:

| Strategy | Per-session cost | Worst case (use 1 skill) |
|---|---|---|
| Native Claude Code skill listing | ~6,000 tokens | ~6,000 tokens |
| skills-radar Two-Tier Discovery | ~1,000 tokens (mini-index) | ~1,000 + ~2,000 = ~3,000 tokens |
| skills-radar - multiple loads (5 skills) | ~1,000 + 5 × 2,000 = ~11,000 tokens | (rare - most sessions load 0-2) |

Net: w realnym przypadku (1-2 skille loaded per sesja) skills-radar oszczędza ~1-3k tokenów na sesję, a ~5k, gdy sesja nie załaduje żadnego skilla. Przy skali (500 skili) natywne podejście staje się nieużywalne; koszt skills-radar zostaje płaski.
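
Arytmetykę z tabeli można sprawdzić krótkim szkicem w TypeScripcie. Stałe pochodzą z tabeli powyżej (wartości przybliżone, user z 80 skilami), nazwy funkcji są hipotetyczne.

```typescript
// Figures taken from the table above (approximate).
const NATIVE_LISTING_TOKENS = 6000; // every skill description preloaded
const MINI_INDEX_TOKENS = 1000; // Two-Tier Discovery mini-index
const TOKENS_PER_LOADED_SKILL = 2000;

// Session cost with Two-Tier Discovery: the mini-index plus every skill actually loaded.
function twoTierCost(loadedSkills: number): number {
  return MINI_INDEX_TOKENS + loadedSkills * TOKENS_PER_LOADED_SKILL;
}

// Positive = tokens saved versus the native listing; negative = more expensive.
function savings(loadedSkills: number): number {
  return NATIVE_LISTING_TOKENS - twoTierCost(loadedSkills);
}

// Number of loads per session at which both strategies cost the same.
const breakEvenLoads =
  (NATIVE_LISTING_TOKENS - MINI_INDEX_TOKENS) / TOKENS_PER_LOADED_SKILL;
```

Przy tych liczbach koszty zrównują się na poziomie 2.5 załadowanego skilla na sesję; powyżej tego natywny listing jest tańszy, co zgadza się z wierszem "multiple loads" w tabeli.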

To nie tylko cost story - to **quality** story. Badania Anthropica o transformer attention pokazują, że długi kontekst degraduje jakość odpowiedzi. Szczuplejszy prompt daje modelowi szansę zostać focused.

## Co zostaje otwarte

Kilka kawałków świadomie zostawionych na następny milestone:

- **FAISS store backend.** Lżejszy niż ChromaDB (jeden plik, zero schemy), użyteczny jako fallback w restrictive środowiskach, gdzie nawet SQLite-backed vector DB jest za dużo.
- **Voyage / OpenAI embedder backends.** Cloud BYOK jako opcja dla power-userów, którzy chcą best-in-class embedding quality bez lokalnego modelu 4 GB.
- **Auto-discovery z GitHub repo** (np. `awesome-agent-skills`). Jedna komenda CLI ściąga publiczną kolekcję skili do tieru UNTRUSTED z explicit confirm per-skill.
- **Crypto signing dla tieru VERIFIED.** Dziś VERIFIED jest path-based (Anthropic-official plugin cache). Cryptographically signed skille z trust manifestem to naturalny następny krok dla community skill ecosystems.
- **LLM-based prompt-injection scanner.** Rozszerza katalog regexów. Mały lokalny model (np. `gemma4:e4b` przez Ollamę) klasyfikuje podejrzane body, których regexy nie złapały. Opt-in.
- **Sandbox `bundled_files`.** Dziś bundled files są enumerowane w response z `load_skill`. Przyszła wersja mogłaby opcjonalnie sandbox'ować read-only pliki referencowane jeden poziom w głąb, żeby agenty mogły je bezpiecznie pobierać.
- **Multi-language taksonomia hub-tagów.** Rekomendowany słownik `hub-tags` (`a11y`, `perf`, `qa` itd.) musi zostać opublikowany i przyjęty, żeby filtered search był użyteczny przy skali korpusu.

## Repo + instalacja

```bash
pip install skills-radar
skills-radar config-init
skills-radar index
claude mcp add skills-radar -- skills-radar serve --transport stdio --watch
```

Zrestartuj Claude Code. `/mcp` pokazuje skills-radar connected. Odpal `skills-radar mini-index` i zaimportuj `~/.claude/SKILLS-INDEX.md` w globalnym `CLAUDE.md`. To cały setup.

Source: **github.com/darco81/skills-radar**. Licencja MIT. Zbudowane w jeden majowy piątek 2026 między innymi rzeczami.

---


## sdet-brain - persistent RAG over MCP for one human and three Claudes

**URL:** https://portfolio.sdet.it/articles/sdet-brain
**Published:** 2026-05-09
**Language:** en
Tags: ai-tooling, rag, mcp, qdrant, mlx, claude, apple-silicon

How I stopped copy-pasting brand context into every new chat. Local-first RAG with Qdrant and MLX, exposed as 11 MCP tools to Claude Desktop, Claude Code, and OpenCode at the same time.

## The problem nobody warned me about

Three months into AI-augmented brand work I had a problem nobody warned me about: copy-paste fatigue.

Every new Claude.ai chat started the same way. Paste recent decisions. Paste voice samples. Paste current sprint state. Paste the file I'm working on. Paste the constraints. Send. Wait for the model to acknowledge it has context. Then ask the actual question.

For someone shipping case studies, sprint reports, and decision logs daily, that was hours per week of copy-paste, and it didn't scale.

Cloud RAG services existed but missed the point. My brand corpus is private until I publish it. Pricing notes, internal decisions, voice experiments mid-iteration - none of that should leave the laptop.

So I built the thing I needed.

## What it is, in one paragraph

sdet-brain is a single persistent index over my Markdown corpus, exposed as MCP tools so every MCP-aware client (Claude Desktop, Claude Code, OpenCode) sees the same context simultaneously. Embeddings are computed locally on Apple Silicon via MLX. The vector store is Qdrant in Docker. The server is FastAPI plus FastMCP 3.0 with stdio, SSE, and streamable HTTP transports. Markdown stays on disk as the single source of truth - Qdrant only holds derivatives.

## Architecture

```mermaid
graph TB
    subgraph Clients
        CD[Claude Desktop<br/>MCP stdio]
        CC[Claude Code<br/>MCP HTTP]
        OC[OpenCode<br/>MCP HTTP]
        WEB[Web client<br/>REST + SSE]
    end

    subgraph Server
        FAPI[FastAPI app]
        FMCP[FastMCP 3.0 wrapper]
        TOOLS[MCP tools<br/>core + domain]
    end

    subgraph Pipeline
        ING[Ingestion<br/>parser + chunker]
        EMB[Embeddings<br/>MLX + Gemini]
        WTC[Watchdog<br/>auto-reindex]
    end

    subgraph Storage
        QD[(Qdrant<br/>vectors + payload)]
        FS[(Markdown corpus<br/>on disk)]
    end

    CD --> FMCP
    CC --> FMCP
    OC --> FMCP
    WEB --> FAPI
    FAPI --> TOOLS
    FMCP --> TOOLS
    TOOLS --> ING
    TOOLS --> EMB
    TOOLS --> QD
    FS --> WTC
    WTC --> ING
    ING --> EMB
    EMB --> QD
```

Four layers, four top-level packages:

- **server/** - FastAPI plus FastMCP 3.0. Exposes 11 MCP tools.
- **ingestion/** - frontmatter parser, semantic chunker, watchdog.
- **embeddings/** - MLX local (primary), Gemini fallback.
- **storage/** - Qdrant client with hybrid search (BM25 + dense + RRF fusion).

CLI entrypoints live in `cli/`. Markdown corpus lives wherever I keep it on disk - Qdrant only stores derivatives.

## How a query flows

1. Client (say, Claude Code) calls the MCP tool `search` over HTTP.
2. FastMCP dispatches to the tools layer.
3. The embedder (MLX) computes the query vector. Lazy-loaded on first call (a few seconds), sub-100ms after that.
4. Qdrant runs hybrid search: BM25 + dense, RRF fusion, optional cross-encoder rerank.
5. The tool returns chunks with payload - path, source type, score, snippet.
6. The client gets JSON back, and the model cites the results in its response.

For richer questions three LLM-backed tools layer on top: `query_rewrite` (HyDE-style query expansion), `multi_query_search` (decomposition plus RRF fusion across sub-queries), and `summarize_results` (cited summary).
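
RRF fusion appears twice above: in step 4, where the BM25 and dense rankings are merged, and in `multi_query_search`, where the rankings of sub-queries are merged. As a minimal sketch of the technique (not the code sdet-brain or Qdrant runs), Reciprocal Rank Fusion gives each document `1 / (k + rank)` per list and sums the contributions. The function name `rrfFuse` and the constant `k = 60` are assumptions; 60 is simply a commonly used default.

```typescript
// One ranked list per retriever (for example BM25 and dense), best hit first.
type RankedList = string[];

interface FusedHit {
  id: string;
  score: number;
}

// Reciprocal Rank Fusion: every list contributes 1 / (k + rank) for each id it
// contains, with rank starting at 1. Ids missing from a list contribute nothing.
function rrfFuse(lists: RankedList[], k = 60): FusedHit[] {
  const scores: Record<string, number> = {};
  for (const list of lists) {
    list.forEach((id, index) => {
      scores[id] = (scores[id] ?? 0) + 1 / (k + index + 1);
    });
  }
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] ?? 0 }))
    .sort((a, b) => b.score - a.score);
}
```

Because only ranks are used, the fusion never has to reconcile BM25 scores with cosine similarities, which live on different scales.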

## Why local-first, not cloud

Three reasons, in priority order.

**Privacy by architecture.** Brand corpus is private until I publish it. Running embeddings and reasoning on a Mac means no inference traffic leaves the laptop. There is no "we promise not to train on it" toggle to trust.

**Zero per-query cost.** Apple Silicon runs Qwen3-Embedding-0.6B and Qwen3-Next-80B locally. Each query costs the laptop's electricity, not API tokens. With a hundred-plus queries on a working day this matters more than it sounds.

**Latency.** Round-tripping to a hosted embedding service adds 100-300ms per query. Local MLX is 20-50ms. For interactive tooling that's the difference between flow and friction.

## Six release tiers in two days

This is where the AI-velocity story comes in. The whole stack - MVP through DX polish - shipped between 30 April and 1 May 2026 across three autonomous Claude Code sessions.

| Tier | Tag | Highlight |
|---|---|---|
| 1 | v0.1.0 | MVP - Qdrant + MLX + 4 core MCP tools + watcher |
| 1.1 | v0.1.1 | Polish - healthcheck, env-driven paths, perf |
| 2 | v0.2.0 | Hybrid search (BM25 + RRF) + cross-encoder rerank + 5 domain tools |
| 3 | v0.3.0 | Local MLX LLM (Qwen3-Next-80B) + `/chat` + SSE streaming |
| 4 | v0.4.0 | Qwen3-Embedding-8B + tiered LLM router + multi-query agentic retrieval |
| 5 | v0.5.0 | DX - REPL CLI + inline citations + saved templates |

Each tier had its own atomic plan and atomic commits, and the quality gates were re-run before moving on: 213 tests passing, `mypy --strict` on 70 files, ruff clean.

## Why source-available, not OSS (yet)

The repo is here for transparency, reference, and learning. You can inspect the architecture, run it locally for your own corpus, learn the patterns. What you can't do is fork it commercially or wrap it into a hosted product without talking to me first.

A formal OSI license decision (likely AGPL-3.0 or similar) will come with a structured public launch - when documentation, demo dataset, and onboarding tutorial are ready. That's a separate sprint, planned for a later milestone.

## What's next

A full case study with setup tutorial, demo Markdown corpus, and architecture deep-dive is planned as a "From the field" series episode in the coming weeks.

For now: the source is on GitHub, the architecture is in the README, and the patterns are battle-tested on my own brand work daily.

## Links

- [Repository](https://github.com/darco81/sdet-brain)
- [Architecture overview](https://github.com/darco81/sdet-brain#architecture-high-level)
- [Sprint reports](https://github.com/darco81/sdet-brain/tree/main/docs/sprints)

---


## sdet-brain - trwały RAG przez MCP dla jednego człowieka i trzech Claude'ów

**URL:** https://portfolio.sdet.pl/articles/sdet-brain
**Published:** 2026-05-09
**Language:** pl
Tags: ai-tooling, rag, mcp, qdrant, mlx, claude, apple-silicon

Jak przestałem wklejać kontekst marki do każdego nowego czatu. RAG działający lokalnie na Qdrant i MLX, widoczny jako 11 narzędzi MCP jednocześnie dla Claude Desktop, Claude Code i OpenCode.

## Problem, o którym nikt nie ostrzega

Po trzech miesiącach pracy z modelami AI nad marką trafiłem na problem, o którym nikt mnie nie uprzedził: zmęczenie wklejaniem kontekstu.

Każdy nowy czat z Claudem zaczynał się tak samo. Wklej ostatnie decyzje. Wklej próbki głosu. Wklej stan bieżącego sprintu. Wklej plik, nad którym pracuję. Wklej constraints. Wyślij. Poczekaj, aż model potwierdzi, że ma kontekst. Dopiero teraz zadaj właściwe pytanie.

Dla kogoś, kto codziennie shippuje case studies, sprint reporty i decision logi - to są godziny tygodniowo na sam copy-paste. To nie skaluje się.

Cloudowe RAG-i istnieją, ale rozjeżdżają się z punktem. Mój korpus marki jest prywatny, dopóki czegoś nie opublikuję. Notatki o pricing, decyzje wewnętrzne, eksperymenty z głosem w trakcie iteracji - żadna z tych rzeczy nie powinna opuszczać laptopa.

Zbudowałem to, czego potrzebowałem.

## Co to jest, w jednym akapicie

sdet-brain to jeden trwały indeks nad moim korpusem Markdown, eksponowany jako narzędzia MCP. Każdy klient z obsługą MCP (Claude Desktop, Claude Code, OpenCode) widzi ten sam kontekst równolegle. Embeddingi są liczone lokalnie na Apple Silicon przez MLX. Vector store to Qdrant w Dockerze. Serwer to FastAPI plus FastMCP 3.0 z transportami stdio, SSE i streamable HTTP. Markdown zostaje na dysku jako jedno źródło prawdy - w Qdrancie żyją tylko derywaty.

## Architektura

```mermaid
graph TB
    subgraph Clients
        CD[Claude Desktop<br/>MCP stdio]
        CC[Claude Code<br/>MCP HTTP]
        OC[OpenCode<br/>MCP HTTP]
        WEB[Web client<br/>REST + SSE]
    end

    subgraph Server
        FAPI[FastAPI app]
        FMCP[FastMCP 3.0 wrapper]
        TOOLS[MCP tools<br/>core + domain]
    end

    subgraph Pipeline
        ING[Ingestion<br/>parser + chunker]
        EMB[Embeddings<br/>MLX + Gemini]
        WTC[Watchdog<br/>auto-reindex]
    end

    subgraph Storage
        QD[(Qdrant<br/>vectors + payload)]
        FS[(Markdown corpus<br/>on disk)]
    end

    CD --> FMCP
    CC --> FMCP
    OC --> FMCP
    WEB --> FAPI
    FAPI --> TOOLS
    FMCP --> TOOLS
    TOOLS --> ING
    TOOLS --> EMB
    TOOLS --> QD
    FS --> WTC
    WTC --> ING
    ING --> EMB
    EMB --> QD
```

Cztery warstwy, cztery top-level pakiety:

- **server/** - FastAPI plus FastMCP 3.0. Eksponuje 11 narzędzi MCP.
- **ingestion/** - parser frontmatteru, semantyczny chunker, watchdog.
- **embeddings/** - MLX lokalnie (główny), Gemini jako fallback.
- **storage/** - klient Qdrant z hybrid search (BM25 + dense + RRF fusion).

Wejścia CLI siedzą w `cli/`. Korpus Markdown żyje tam, gdzie go trzymam - w Qdrancie tylko derywaty.

## Jak płynie zapytanie

1. Klient (powiedzmy Claude Code) woła narzędzie MCP `search` po HTTP.
2. FastMCP dispatch'uje do warstwy tools.
3. Embedder (MLX) liczy wektor zapytania. Lazy-load przy pierwszym wywołaniu (kilka sekund), poniżej 100 ms potem.
4. Qdrant odpala hybrid search - BM25 + dense, RRF fusion, opcjonalny cross-encoder rerank.
5. Narzędzie zwraca chunki z payloadem - ścieżka, source type, score, snippet.
6. Klient dostaje JSON, model cytuje wyniki w odpowiedzi.

Do bogatszych pytań nakładają się trzy narzędzia LLM-backed: `query_rewrite` (rozszerzanie zapytań w stylu HyDE), `multi_query_search` (dekompozycja plus RRF fusion po sub-zapytaniach) i `summarize_results` (podsumowanie z cytatami).

## Dlaczego lokalnie, a nie w chmurze

Trzy powody w kolejności priorytetu.

**Prywatność z architektury.** Korpus marki jest prywatny, dopóki czegoś z niego nie opublikuję. Embeddingi i reasoning na Macu oznaczają, że żaden ruch inference nie opuszcza laptopa. Nie ma przełącznika "obiecujemy że nie będziemy trenować na twoich danych", któremu trzeba by zaufać.

**Zero kosztu per query.** Apple Silicon ciągnie Qwen3-Embedding-0.6B i Qwen3-Next-80B lokalnie. Każde zapytanie kosztuje prąd z laptopa, nie tokeny w API. Przy stu-plus pytaniach w dniu pracy to ma większe znaczenie, niż brzmi.

**Latency.** Round-trip do hostowanego serwisu embeddingowego dokłada 100-300 ms na zapytanie. Lokalne MLX to 20-50 ms. Dla narzędzia, którego używa się interaktywnie, to różnica między flow a tarciem.

## Sześć tierów release'u w dwa dni

Tu wchodzi historia z velocity AI. Cały stack - od MVP po DX polish - został zshippowany między 30 kwietnia a 1 maja 2026 w trzech autonomicznych sesjach Claude Code.

| Tier | Tag | Wkład |
|---|---|---|
| 1 | v0.1.0 | MVP - Qdrant + MLX + 4 core narzędzia MCP + watcher |
| 1.1 | v0.1.1 | Polish - healthcheck, env-driven paths, perf |
| 2 | v0.2.0 | Hybrid search (BM25 + RRF) + cross-encoder rerank + 5 domain tools |
| 3 | v0.3.0 | Lokalny MLX LLM (Qwen3-Next-80B) + `/chat` + streaming SSE |
| 4 | v0.4.0 | Qwen3-Embedding-8B + tiered LLM router + multi-query agentic retrieval |
| 5 | v0.5.0 | DX - REPL CLI + inline citations + saved templates |

Każdy tier miał swój atomic plan, atomic commity i re-run quality gates: 213 testów przechodzi, mypy --strict na 70 plikach, ruff clean - zanim ruszamy dalej.

## Dlaczego source-available, a nie OSS (na razie)

Repo jest tu po to, żeby było transparentne - do czytania, do uczenia się, do uruchomienia lokalnie nad swoim korpusem. Czego nie można zrobić: forka komercyjnego ani wrapnięcia tego w hostowany produkt bez wcześniejszej rozmowy ze mną.

Formalna decyzja o licencji OSI (prawdopodobnie AGPL-3.0 albo coś podobnego) przyjdzie z ustrukturyzowanym launchem publicznym - kiedy dokumentacja, demo dataset i tutorial wdrożeniowy będą gotowe. To osobny sprint, zaplanowany na późniejszy kamień milowy.

## Co dalej

Pełen case study z tutorialem wdrożenia, demo korpusem Markdown i deep-dive po architekturze planuję jako odcinek serii "From the field" w nadchodzących tygodniach.

Na teraz: source jest na GitHubie, architektura w README, a wzorce są codziennie testowane na mojej własnej pracy nad marką.

## Linki

- [Repozytorium](https://github.com/darco81/sdet-brain)
- [Przegląd architektury](https://github.com/darco81/sdet-brain#architecture-high-level)
- [Sprint reports](https://github.com/darco81/sdet-brain/tree/main/docs/sprints)

---


## CDAT Pattern: 4 Layers, 3 Zero Rules, 9 Production Systems

**URL:** https://portfolio.sdet.it/articles/cdat-pattern-deep-dive
**Published:** 2026-05-03
**Language:** en
Tags: cdat, playwright, architecture, typescript, testing

**TL;DR:** CDAT is a 4-layer Playwright architecture (Components, Data, Actions, Tests) with 3 enforced zero-rules (no any, no waitForTimeout, no else). Battle-tested across 9 production systems over 2 years - sizes 6K to 18K LOC, 180 to 520 tests each, nearly 3,000 tests in production across the portfolio. Live at cdat.sdet.it, MIT on GitHub.

A 2-year deep dive into the test architecture I extracted from real projects after POM stopped scaling. Components-Data-Actions-Tests, MIT licensed.

I've shipped Playwright tests in nine production systems over the last two years. Every single one of them started with classic Page Object Model. Every single one of them, by month four, had at least one Page Object that crossed 1,500 lines.

That's not a slogan. That's a pattern I watched repeat across B2B platforms, e-commerce, CRM, logistics, education, and automotive wholesale. Same language (TypeScript), same tooling (Playwright + axe-core), same outcome: POM scales until it doesn't, and the day it breaks is usually the day a junior tries to add their first test and can't find where the locator lives.

This article is what I extracted from those two years. Not a framework, not a library - a discipline. Four layers, three zero-rules, MIT licensed. The interactive version with toggle cards lives at [cdat.sdet.it](https://cdat.sdet.it). This piece is for people who want the long-form reasoning.

## Why POM breaks at scale

Page Object Model has been the de facto standard for over a decade. Most teams adopt it without thinking, the way you'd put pants on before leaving the house. And on small projects it works. The problems show up at scale, predictably, in four shapes.

### 1. God objects

A `CheckoutPage` class that started life with a fill-form-and-submit method ends up with fifty locators, thirty methods, and 1,200 lines of logic. The class has eaten the whole feature. Cyclomatic complexity is a vibe at this point. Tests reach into it for everything because everything *is* in there.

### 2. Mixed responsibilities

The same class holds locators, business logic, assertions, retry loops, fixture setup, and occasionally a console.log somebody forgot to delete. When something fails, the failure could come from any of those layers, and you can't tell which because they share a method.

```typescript
class LoginPage {
  async login(user: string, pass: string) {
    await this.usernameInput.fill(user);
    await this.passwordInput.fill(pass);
    await this.submitButton.click();
    await expect(this.dashboard).toBeVisible(); // Assertion in page object?
  }
}
```

That last line is the smell. The page object now has an opinion about what success looks like, which means a test that wants to assert *something else* has to either wrap the method or copy-paste it.

### 3. Poor reusability

When the partial flow is what you want - fill the form but don't submit, or submit and stay on the page - POM forces you to either expose half the internals or duplicate the method. Both are bad. Neither has a clean answer.

### 4. Maintenance nightmare

When a selector changes, you grep. When the business rule changes, you grep harder. When both change in the same week, you ship a regression because the page object's split personality made it easy to update one half without the other.

## What I tried first

Before CDAT I went through the usual menu. Two of them deserve a mention because both are real and one of them is genuinely good for some shapes.

**Screenplay Pattern.** Five abstractions: Actors, Abilities, Tasks, Questions, Interactions. The cleanliness is real, the cost is real too. A simple login becomes:

```typescript
const actor = Actor.named('User');
await actor.attemptsTo(
  Navigate.to(LoginPage),
  Enter.theValue('user').into(UsernameField),
  Enter.theValue('pass').into(PasswordField),
  Click.on(SubmitButton),
  Wait.until(Dashboard, isVisible())
);
```

It reads beautifully. It also requires every team member to internalize five layers, and on most projects that's overkill. I shipped it once. The retro was unanimous: too many files for too little benefit.

**Facade + Delegation.** I wrote about this in [Facade and Delegation Pattern](/articles/facade-pattern-and-delegation) - splitting a 1,500-line `SaleActions` class into a thin facade that delegates to focused sub-modules. It works. It's still my recommendation when you can't change the framework but you can refactor inside it. CDAT is the next step: stop splitting one bad pattern, start with four good ones.

## The four layers

CDAT is structurally simple. Four files per feature, each with a single job, each with explicit dependency rules.

```
features/
├── login/
│   ├── components.ts    # C - Locators only
│   ├── data.ts          # D - Types & test data
│   ├── actions.ts       # A - Business logic
│   └── test.ts          # T - Scenarios & assertions
```

Layer dependency rules:

```
Components → nothing
Data       → nothing
Actions    → Components + Data
Tests      → Components + Data + Actions
```

Lower layers never depend on higher layers. That single rule is what gives you the reusability POM cannot.
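
The dependency rules are mechanical enough to check in code. The following is a minimal sketch, assuming the four file names from the folder layout above; `layerOf`, `ALLOWED`, and `isAllowedImport` are hypothetical names, not part of the published pattern. Same-layer imports are treated as disallowed here only because the list does not mention them; relax that if your project composes actions across features.

```typescript
type Layer = "components" | "data" | "actions" | "tests";

// Which layers each layer may import from, as listed above.
const ALLOWED: Record<Layer, Layer[]> = {
  components: [],
  data: [],
  actions: ["components", "data"],
  tests: ["components", "data", "actions"],
};

// Hypothetical helper: maps a file name such as "actions.ts" to its layer.
function layerOf(fileName: string): Layer | undefined {
  if (fileName === "components.ts") return "components";
  if (fileName === "data.ts") return "data";
  if (fileName === "actions.ts") return "actions";
  if (fileName === "test.ts") return "tests";
  return undefined;
}

// True when `importer` may import `imported` under the dependency rules.
// Files outside the four layers are not judged by this sketch.
function isAllowedImport(importer: string, imported: string): boolean {
  const from = layerOf(importer);
  const to = layerOf(imported);
  if (from === undefined || to === undefined) return true;
  return ALLOWED[from].indexOf(to) !== -1;
}
```

A check like this could run in a lint step or a pre-commit hook, so a `components.ts` that starts importing from `actions.ts` fails before review.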

### Components - locators only

```typescript
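import type { Locator, Page } from '@playwright/test';

export class LoginComponents {
  readonly usernameInput: Locator;
  readonly passwordInput: Locator;
  readonly submitButton: Locator;
  // Read by LoginActions.getErrorMessage(); the alert role is an assumed selector.
  readonly errorMessage: Locator;

  constructor(private readonly page: Page) {
    this.usernameInput = page.getByLabel('Username');
    this.passwordInput = page.getByLabel('Password');
    this.submitButton = page.getByRole('button', { name: 'Sign in' });
    this.errorMessage = page.getByRole('alert');
  }
}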
```

No business logic. No waits. No assertions. If you have a `click()` here, you've already started a `CheckoutPage` god-object in disguise.

### Data - types and constants

```typescript
export interface LoginCredentials {
  username: string;
  password: string;
  rememberMe?: boolean;
}

export const VALID_USER: LoginCredentials = {
  username: 'testuser',
  password: 'Password123!',
};
```

Pure data. No locators. No DOM. This file is what your AI assistant reads first when it tries to understand the feature, and the file you import in unit tests too if you have any.

### Actions - business logic, no assertions

```typescript
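import type { Page } from '@playwright/test';
// Import shape assumed from the `@cdat/utils` package listed under Resources.
import { Cdat } from '@cdat/utils';
import { LoginComponents } from './components';
import type { LoginCredentials } from './data';

export class LoginActions {
  private readonly components: LoginComponents;

  constructor(page: Page) {
    this.components = new LoginComponents(page);
  }

  // Atomic steps: one interaction each
  async fillUsername(username: string): Promise<void> {
    await Cdat.waitAndFill(this.components.usernameInput, username);
  }

  async fillPassword(password: string): Promise<void> {
    await Cdat.waitAndFill(this.components.passwordInput, password);
  }

  async clickSubmit(): Promise<void> {
    await Cdat.waitAndClick(this.components.submitButton);
  }

  // Business flow composed from the atomic steps
  async login(credentials: LoginCredentials): Promise<void> {
    await this.fillUsername(credentials.username);
    await this.fillPassword(credentials.password);
    await this.clickSubmit();
  }

  // State getters return data, they don't assert
  async getErrorMessage(): Promise<string> {
    return Cdat.waitForText(this.components.errorMessage);
  }
}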
```

Actions compose atomic steps into business flows. No `expect()` calls - those belong in tests. State getters return data, never assertions.

### Tests - scenarios and assertions

```typescript
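import { expect, test } from '@playwright/test';
import { LoginActions } from './actions';
import { VALID_USER } from './data';

test('Given valid credentials, When login, Then dashboard shown', async ({ page }) => {
  const actions = new LoginActions(page);
  await page.goto('/login');

  await actions.login(VALID_USER);

  await expect(page).toHaveURL(/\/dashboard/);
});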
```

Every `expect()` in the codebase lives here. Scenario name reads as Given-When-Then. The test file is the only place where business intent and assertion live together.

This is **vertical slice architecture** - you organize by feature (`login/`, `cart/`, `checkout/`), not by layer (`pages/`, `actions/`, `tests/`). When you delete a feature, you delete a folder. When you onboard a junior, you point at one folder and say "everything you need is here".

## Three zero rules

Four layers tell you *where* code goes. Three zero-rules tell you *how* to write it. They sound dogmatic. They're enforced because every time I let one slide, I paid for it later.

### Zero `any`

```typescript
// ❌ Bad
async getProductData(): Promise<any> {
  return this.fetchProduct();
}

// ✅ Good
async getProductData(): Promise<ProductData> {
  return this.fetchProduct();
}
```

`any` defeats TypeScript. A test that uses `product.proce` (typo) compiles and runs and silently asserts nothing useful. ESLint rule `@typescript-eslint/no-explicit-any: error` makes this non-negotiable.

### Zero `waitForTimeout`

```typescript
// ❌ Bad
await page.waitForTimeout(5000);
await button.click();

// ✅ Good
await Cdat.waitAndClick(button);
```

In my experience, hardcoded timeouts are the number one cause of flaky tests. Too short → random fail. Too long → slow suite. CI is slower than your laptop, so what works locally fails on Monday morning. Smart waits target the actual condition, not a stopwatch.

The `Cdat` utility class (10 lines, zero dependencies, ships with the pattern) bundles the common cases: `waitAndClick`, `waitAndFill`, `waitForState`, `checkState`, `waitForText`. Five methods cover ~95% of what people reach for `waitForTimeout` to do.
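
To make the idea concrete, here is a hedged sketch of what three of those helpers could look like. It is not the shipped `Cdat` class: `CdatSketch` and `LocatorLike` are hypothetical names, and the bodies are my assumption based on how `waitAndFill`, `waitAndClick`, and `waitForText` are called in the layers above. `LocatorLike` mirrors only the Playwright `Locator` methods the sketch needs, so the block compiles without installing `@playwright/test`.

```typescript
// Minimal structural stand-in for the parts of Playwright's Locator used below.
interface LocatorLike {
  waitFor(options?: {
    state?: "attached" | "detached" | "visible" | "hidden";
    timeout?: number;
  }): Promise<void>;
  click(): Promise<void>;
  fill(value: string): Promise<void>;
  textContent(): Promise<string | null>;
}

// Hypothetical re-creation of three helpers; not the shipped Cdat class.
class CdatSketch {
  // Wait for the real condition (element visible), then click.
  static async waitAndClick(locator: LocatorLike): Promise<void> {
    await locator.waitFor({ state: "visible" });
    await locator.click();
  }

  // Wait until the field is visible, then type the value.
  static async waitAndFill(locator: LocatorLike, value: string): Promise<void> {
    await locator.waitFor({ state: "visible" });
    await locator.fill(value);
  }

  // Returns the trimmed text; an element without text yields an empty string.
  static async waitForText(locator: LocatorLike): Promise<string> {
    await locator.waitFor({ state: "visible" });
    const text = await locator.textContent();
    return (text ?? "").trim();
  }
}
```

Playwright's own actions already auto-wait for actionability, so the explicit `waitFor` mostly makes the intent visible and gives every helper one place to tune the wait condition.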

### Zero `else`

```typescript
// ❌ Bad - pyramid of doom
async processCheckout(data: CheckoutData) {
  if (data.email) {
    if (data.address) {
      if (data.payment) {
        await this.submitOrder();
        return true;
      } else { return false; }
    } else { return false; }
  } else { return false; }
}

// ✅ Good - early returns
async processCheckout(data: CheckoutData): Promise<void> {
  if (!data.email) throw new Error('Email is required');
  if (!data.address) throw new Error('Shipping address is required');
  if (!data.payment) throw new Error('Payment method is required');

  await this.submitOrder();
}
```

Early exits flatten the code. Each precondition is an explicit guard clause that throws its own error, and the happy path sits at the bottom of the function with no nesting. Debugging is faster because the failure tells you which guard fired, not which branch you fell through.

## What CDAT is not

This is where I want to be honest, because the worst case studies oversell.

CDAT is a **structure for E2E tests in Playwright**. It's not:

- A BDD framework. If you want Gherkin features and step definitions, use Cucumber. CDAT lives below that - you can layer Cucumber on top, but the pattern is for the test code itself.
- A visual regression tool. CDAT pairs well with Playwright's screenshot diffing or third-party tools, but the pattern doesn't help you decide *what* to snapshot.
- A unit-testing pattern. Unit tests don't have a DOM, so the components layer disappears. Use Vitest or Jest with their own conventions.
- Magic. Discipline scales. Patterns don't refactor themselves. If your team won't enforce the zero-rules, you'll end up with a 1,500-line `actions.ts` instead of a 1,500-line page object.

CDAT is also opinionated about TypeScript. There's a JavaScript version that works, but the type safety is half the point. If your project is JS-only, the zero-`any` rule is moot, and you lose roughly a third of the benefit.

## Nine systems, two years

The pattern's track record, anonymized but real:

| System type | LOC | Tests | Months in production |
|---|---:|---:|---:|
| B2B Platform A | 12K | 340 | 14 |
| E-commerce | 18K | 520 | 18 |
| CRM | 9K | 280 | 12 |
| Event Management | 7K | 210 | 10 |
| Education | 11K | 350 | 15 |
| Invoicing | 6K | 180 | 11 |
| Logistics | 14K | 410 | 16 |
| Automotive Wholesale | 13K | 380 | 13 |
| B2B Platform B | 10K | 300 | 9 |

That's **nearly 3,000 tests in production across the portfolio** (2,970 by the table above) - averaging 330 per system, ranging from 180 (Invoicing) to 520 (E-commerce). The pattern doesn't push you to write more tests; it just stops the ones you do write from rotting.

What changed across all of them, qualitatively:

- **Junior onboarding** dropped from "two to three weeks until they can ship a test" to "one to three days". The folder structure is self-documenting; the four files per feature give them a template they can copy and edit.
- **Flakiness fell to near-zero.** Removing `waitForTimeout` is the single highest-leverage change. The remaining flakes are usually genuine product bugs.
- **Refactor friction dropped** because the dependency graph is one-way. Change a selector? Edit `components.ts`. Change a flow? Edit `actions.ts`. The test file rarely needs to move.
- **PR review time dropped** because reviewers know which file holds which concern. "This assertion doesn't belong here" becomes a one-line comment instead of a discussion.

I'm not going to pretend the pattern made every project succeed. Two of the nine had unrelated business reasons that killed them. The other seven are still running CDAT-shaped tests today.

## Migration: one feature at a time

If you have a POM codebase and you want to move, the migration path I've used five times:

1. **Pick a small feature** - login, search, anything self-contained.
2. **Extract `components.ts` first** - copy locators out of the page object, keep them as `readonly`.
3. **Extract `data.ts`** - pull credentials, URLs, error message constants.
4. **Rewrite as `actions.ts`** - move methods, strip out `expect()` calls into a separate test file.
5. **Run both** - keep the old POM test passing while the new CDAT test ships. Delete the POM file once you trust the new one.
6. **Repeat**.

The pattern doesn't require a big-bang rewrite. The first feature takes a day. The fifth takes an hour. By the tenth, your team has internalized it and the rest of the codebase falls out naturally.
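A minimal sketch of what steps 2-4 might produce for a login feature. The selectors, the credentials, and the `PageLike`/`LocatorLike`/`createFakePage` stand-ins are all hypothetical and exist only so the block runs without Playwright; in a real project you would use Playwright's `Page` and `Locator`, and each section would live in its own file:

```typescript
// Hypothetical stand-ins for Playwright's Page and Locator.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  locator(selector: string): LocatorLike;
}

// components.ts - locators only, read-only (step 2)
function loginComponents(page: PageLike) {
  return {
    email: page.locator('#email'),
    password: page.locator('#password'),
    submit: page.locator('button[type="submit"]'),
  } as const;
}

// data.ts - constants only (step 3)
const loginData = {
  validUser: { email: 'user@example.com', password: 'not-a-real-password' },
} as const;

// actions.ts - flows only, no expect() calls (step 4)
async function logIn(
  page: PageLike,
  user: { email: string; password: string },
): Promise<void> {
  const components = loginComponents(page);
  await components.email.fill(user.email);
  await components.password.fill(user.password);
  await components.submit.click();
}

// Fake page that records every interaction, used here instead of a browser.
function createFakePage(): { page: PageLike; log: string[] } {
  const log: string[] = [];
  const page: PageLike = {
    locator: (selector) => ({
      fill: async (value) => {
        log.push(`fill ${selector} ${value}`);
      },
      click: async () => {
        log.push(`click ${selector}`);
      },
    }),
  };
  return { page, log };
}
```

The assertions stay out of `logIn` entirely; they belong to the test file, which is what keeps the dependency graph one-way.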

The full migration playbook with worked examples lives in the [Migration Guide](https://cdat.sdet.it/docs/migration).

## Resources

Everything CDAT-shaped is at [cdat.sdet.it](https://cdat.sdet.it):

- **[Quick Start](https://cdat.sdet.it/docs/quick-start)** - minimum-viable login feature in 4 files
- **[The 4 Layers](https://cdat.sdet.it/docs/architecture)** - every layer with full code
- **[Three Zero Rules](https://cdat.sdet.it/docs/zero-rules)** - interactive bad/good toggles
- **[Examples](https://cdat.sdet.it/examples)** - basic, advanced, and POM→CDAT migration
- **[GitHub](https://github.com/darco81/cdat-pattern)** - MIT licensed, examples, `@cdat/utils` package

There's also an MCP server at `https://cdat.sdet.it/mcp` if you want to ask Claude or Cursor about CDAT directly. Add it to your client config:

```json
{
  "mcpServers": {
    "cdat": {
      "url": "https://cdat.sdet.it/mcp"
    }
  }
}
```

The site itself uses CDAT for its own E2E tests - 124 of them across chromium, firefox, webkit, and mobile-chrome. Self-referential evidence the pattern scales to projects where you don't get to write the tests first.

## When to use it, when to skip

Use CDAT when:

- You're starting a new Playwright project of any size.
- You're migrating from Selenium or Cypress and want a structure that won't repeat the POM problems.
- Your team is more than two people, or it will be within a year.

Skip CDAT when:

- You're writing five tests against a static landing page. POM is fine. Anything is fine.
- You're working in a JavaScript-only project and don't plan to introduce TypeScript. You'll lose roughly a third of the benefit.
- Your team has internalized Screenplay and ships fast with it. Don't break what works.

The pattern is a tool, not a religion. The reason I use it across nine systems is that it's the only thing I've found that survives both rapid feature work *and* the fourteen-month mark when the original author has moved teams. Both matter. Both are hard to optimize for at the same time.

If you've shipped Playwright at scale and have a different answer that works, I want to read it. The whole point of putting two years in a single article is that someone else's two years might be the next iteration.

---


## When the Extension Host Refuses to Cooperate - How We Built Claude VSCode Controller for Linux

**URL:** https://portfolio.sdet.it/articles/claude-vscode-controller
**Published:** 2025-07-08
**Language:** en
Tags: ai-tooling, vscode, mcp, claude, websocket

Real-time integration between Claude Desktop AI and VSCode using Extension API, WebSocket, and MCP. Linux Extension Host compatibility deep dive.

_A story about how desperate need, a few sleepless hours, and sheer determination led to a breakthrough in AI-IDE integration._

Have you ever needed something so badly that you were willing to build it from scratch? That’s exactly what happened to me. This isn’t just a technical case study - it’s a tale of how the most frustrating problems can lead to the most interesting solutions.

## The Origin of the Problem - Why I Even Started

Picture this: you have a perfectly working Claude VSCode Controller on Windows, letting AI control your IDE with natural commands. Everything works like a dream - “open file,” “create component,” “run tests” - and then you switch to Ubuntu 24. You expect everything to work the same... and that’s when the developer nightmare begins.

```
❌ Extension Host (LocalProcess pid: 16296) is unresponsive
❌ Failed to load resource: the server responded with a status of 404 ()
❌ UNRESPONSIVE extension host: starting to profile NOW
```

The first launch on Linux ended with a spectacular Extension Host crash. Claude Desktop could see the MCP server, but the bridge to VSCode refused to start. After three hours of debugging, I thought, “Maybe it’s a bug in this particular VSCode version.” Spoiler: it wasn’t.

## First Line of Defense - The Standard Approach

Let’s start from the beginning. Claude VSCode Controller connects Claude Desktop to VSCode via:

- **MCP Server** (Model Context Protocol) - communicates with Claude Desktop
- **VSCode Extension** - a bridge using WebSocket on port 3333
- **WebSocket Communication** - real-time link between MCP and VSCode

On Windows, everything worked flawlessly. But Linux... Linux had its own opinion.

### Anatomy of the Problem

The first warning signs were in the Extension Host logs:

```typescript
// This worked on Windows:
import { WebSocketServer, WebSocket } from 'ws';

// But on Linux, Extension Host screamed:
// Cannot find module 'ws'
// Extension activation failed
```

“Alright,” I thought, “classic dependency issue.” I checked `node_modules`, reinstalled everything, cleared the cache. Nothing. Extension Host kept crashing when trying to load the WebSocket module.

## Down the Rabbit Hole - Detective Work

After hours of frustration, I dug deeper. It turned out that the Extension Host on Linux has... let’s say... a “unique” approach to loading external modules, especially ones like `ws` that can pull in optional native addons.

### Eureka #1: ES6 vs CommonJS

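The first breakthrough was discovering that the Linux Extension Host struggles with ES6 imports for external dependencies. The extension was being compiled with `"module": "es2020"`, so the emitted JavaScript kept its `import` statements: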

```typescript
// ❌ Crashes Extension Host on Linux:
import { WebSocketServer } from 'ws';
const wss = new WebSocketServer({ port: 3333 });
wss.on('connection', (ws) => { /* ... */ }); // BOOM!

// ✅ Stable (used instead of the import above):
const { WebSocketServer } = require('ws');
```

But that was just the first piece of the puzzle. TypeScript wasn’t happy with this approach...

### Eureka #2: TypeScript Strict Null Checks

```typescript
// ❌ TypeScript error:
let wss: WebSocketServer | null = null;
wss.on('connection', (ws) => { /* ... */ }); // Error: 'wss' is possibly 'null'

// ✅ Elegant solution:
const wsServer = new WebSocketServer({ port: 3333 }); // A local const is never null
wss = wsServer; // Store reference for cleanup
wsServer.on('connection', (ws: WS) => { /* ... */ }); // WS is the socket type from 'ws'
```
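To see the narrowing issue in isolation, here is a sketch that swaps `WebSocketServer` for a hypothetical `FakeServer` class so it runs without `ws`. The names `FakeServer`, `startServer`, and `stopServer` are illustrative and not taken from the extension:

```typescript
// Hypothetical stand-in for WebSocketServer (not the real `ws` API).
class FakeServer {
  private handlers: Array<(clientId: string) => void> = [];

  on(event: 'connection', handler: (clientId: string) => void): void {
    this.handlers.push(handler);
  }

  close(): void {
    this.handlers = [];
  }

  get handlerCount(): number {
    return this.handlers.length;
  }
}

// Module-level reference kept for cleanup; null until the server starts.
let server: FakeServer | null = null;

function startServer(): FakeServer {
  // A local const is never null, so no guard or `!` is needed to use it.
  const created = new FakeServer();
  created.on('connection', (clientId) => {
    console.log(`client connected: ${clientId}`);
  });
  server = created; // store the reference for later cleanup
  return created;
}

function stopServer(): boolean {
  // Narrow the nullable reference before touching it.
  if (server === null) return false;
  server.close();
  server = null;
  return true;
}
```

The local constant does the work during startup, and the nullable module-level variable is only touched behind an explicit null check.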

### Eureka #3: The Hybrid Approach

The final solution was combining TypeScript type imports with runtime requires:

```typescript
// Import types for TypeScript (compile-time only)
import type { WebSocketServer as WSServer, WebSocket as WS } from 'ws';

// Use require at runtime (Linux-compatible)
const { WebSocketServer, WebSocket } = require('ws');

let wss: WSServer | null = null;
```

## Building the Solution - From Chaos to Order

Once I understood the nature of the problem, it was time for a systematic fix. I created `fix-extension-linux.sh` - a script to automate the entire repair process:

### Step 1: TypeScript Configuration

```json
// tsconfig.json - switch to CommonJS
{
  "compilerOptions": {
    "module": "commonjs", // Changed from "es2020"
    "strict": false // Relaxed for external modules
    // ...
  }
}
```

### Step 2: Dependency Bundling

VSCode does not install an extension's npm dependencies when it loads the extension, so instead of hoping module resolution finds `ws` somewhere else, we bundle dependencies directly into the extension directory:

```bash
# Bundle ws directly in the extension directory
mkdir -p ~/.vscode/extensions/claude-mcp-controller/node_modules
cp -r node_modules/ws ~/.vscode/extensions/claude-mcp-controller/node_modules/
```

### Step 3: Automated Testing

```bash
#!/bin/bash
# test-extension-comprehensive.sh

EXT_DIR="$HOME/.vscode/extensions/claude-mcp-controller"
failures=0

echo "🧪 Testing VSCode Extension (Linux safe mode)..."

# Test 1: Check if extension files exist
if [ -f "$EXT_DIR/out/extension.js" ]; then
    echo "✅ Extension files exist"
else
    echo "❌ Extension files missing"
    exit 1  # Nothing else can be checked without the compiled entry point
fi

# Test 2: Check WebSocket module
if [ -d "$EXT_DIR/node_modules/ws" ]; then
    echo "✅ WebSocket module found"
else
    echo "❌ WebSocket module missing"
    failures=$((failures + 1))
fi

# Test 3: Check for require usage
if grep -q "require.*ws" "$EXT_DIR/out/extension.js"; then
    echo "✅ Uses require('ws') - Linux compatible"
else
    echo "❌ Extension may have compatibility issues"
    failures=$((failures + 1))
fi

# Exit non-zero if any check failed, so a wrapper script or CI job can react
if [ "$failures" -gt 0 ]; then
    exit 1
fi
```

## The Moment of Truth - Testing the Solution

After implementing all the fixes, it was time for the moment of truth. Restart VSCode, activate the extension... and...

```
✅ 🤖 Claude MCP: Online
✅ Extension Host: claude-mcp-controller activated successfully
✅ WebSocket server listening on port 3333
✅ Bridge connection established
```

**IT WORKED!** 🎉

First test - “Show me workspace info” in Claude Desktop:

```json
{
  "hasWorkspace": true,
  "folders": [
    {
      "name": "claude-vscode-controller",
      "path": "/home/dk/claude-vscode-controller"
    }
  ],
  "activeEditor": {
    "fileName": "/home/dk/claude-vscode-controller/package.json",
    "language": "json",
    "lineCount": 67
  }
}
```

**Claude Desktop had full control over VSCode on Linux!**

## Lessons Learned from the Trenches

### Technical Takeaways

1. **Extension Host on Linux is stricter** than on Windows/macOS regarding external modules
2. **Hybrid import strategy** (TypeScript types + runtime require) is an elegant cross-platform solution
3. **Bundling dependencies** in the extension directory eliminates marketplace resolution issues
4. **CommonJS compilation** is more stable than ES6 modules for VSCode extensions on Linux

### Soft Skills

1. **Persistence pays off** - the problem seemed unsolvable for the first few hours
2. **Systematic debugging** - step by step, log by log, until you reach the root cause
3. **Never give up** - if something doesn’t work, you can fix it or build it from scratch
4. **Documentation matters** - every solution is worth documenting for others

## Technical Deep Dive - For the Curious

### Architecture Diagram

```
┌─────────────────┐   MCP Protocol   ┌──────────────────┐
│   Claude        │◄────────────────►│   Enhanced       │
│   Desktop       │                  │   MCP Server     │
└─────────────────┘                  └──────────────────┘
                                               ▲
                                               │ WebSocket, port 3333
                                               ▼
┌─────────────────┐    VSCode API    ┌──────────────────┐
│   VSCode        │◄────────────────►│   VSCode         │
│   Editor        │                  │   Extension      │
│                 │                  │   (Bridge)       │
└─────────────────┘                  └──────────────────┘
```

### Key Technical Components

**1. MCP Server (enhanced-mcp-server.js)**

```javascript
// Translates Claude commands to VSCode bridge calls
case "vscode_create_file":
  return await this.sendVSCodeCommand('createFile', {
    filePath: args.filePath,
    content: args.content
  });
```

**2. VSCode Extension Bridge**

```typescript
// Handles WebSocket communication and VSCode API calls
async function handleCommand(command: any, ws: WS) {
  let result: unknown; // Filled in by the matching case below
  switch (command.method) {
    case 'createFile':
      result = await createFile(command.params.filePath, command.params.content);
      break;
    case 'getWorkspaceInfo':
      result = getWorkspaceInfo();
      break;
    // ... 30+ more commands
  }

  ws.send(
    JSON.stringify({
      id: command.id,
      result: result,
    }),
  );
}
```
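The `id`/`method`/`params` envelope above could also be typed instead of passed around as `any`. A sketch under stated assumptions: the `BridgeCommand` and `BridgeResponse` types, the `error` field, and the in-memory `files` map are hypothetical illustrations, not the extension's actual protocol:

```typescript
// Hypothetical message shapes; only `id`, `method`, `params`, and `result`
// appear in the article, the `error` variant is an assumption.
type BridgeCommand =
  | { id: number; method: 'createFile'; params: { filePath: string; content: string } }
  | { id: number; method: 'getWorkspaceInfo' };

type BridgeResponse = { id: number; result: unknown } | { id: number; error: string };

// In-memory stand-in for the VSCode API calls.
const files = new Map<string, string>();

function dispatch(command: BridgeCommand): BridgeResponse {
  switch (command.method) {
    case 'createFile':
      files.set(command.params.filePath, command.params.content);
      return { id: command.id, result: { created: command.params.filePath } };
    case 'getWorkspaceInfo':
      return { id: command.id, result: { hasWorkspace: files.size > 0 } };
    default: {
      // Compile-time exhaustiveness check: adding a method without a case fails here.
      const unreachable: never = command;
      throw new Error(`unhandled command: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Parses raw socket text and always answers with the request's id (-1 if unknown).
// `params` is trusted here; a real handler should validate it as well.
function handleRaw(raw: string): BridgeResponse {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { id: -1, error: 'invalid JSON' };
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { id: -1, error: 'expected a JSON object' };
  }
  const { id, method } = parsed as { id?: unknown; method?: unknown };
  const safeId = typeof id === 'number' ? id : -1;
  if (method !== 'createFile' && method !== 'getWorkspaceInfo') {
    return { id: safeId, error: `unknown method: ${String(method)}` };
  }
  return dispatch(parsed as BridgeCommand);
}
```

With a discriminated union, an unknown method produces an explicit error response instead of a reply whose `result` is silently `undefined`.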

**3. Linux Compatibility Layer**

```typescript
// Hybrid module loading for Linux compatibility
import type { WebSocketServer as WSServer, WebSocket as WS } from 'ws';
const { WebSocketServer, WebSocket } = require('ws');

let wss: WSServer | null = null; // Module-level reference used for cleanup

function startMCPBridge() {
  const wsServer = new WebSocketServer({ port: 3333 });
  wss = wsServer; // Store reference for cleanup

  wsServer.on('connection', (ws: WS) => {
    // Handle Claude Desktop connection
  });
}
```

## What’s Next - Project Development

Now that Linux support is a reality, it’s time for further development:

### Immediate Roadmap

- **Performance optimizations** for Linux Extension Host
- **Testing on additional Linux distributions** (Fedora, Arch, openSUSE)
- **Advanced debugging tools** for easier troubleshooting

### Long-term Vision

- **Multi-workspace support** - handling multiple VSCode instances
- **Plugin system** - allowing custom command extensions
- **Remote development** - support for VSCode remote instances

## Conclusions - More Than Just a Technical Fix

This story is more than just a technical problem-solving case. It’s a reminder of a few fundamental truths in the world of development:

### 1. “Impossible” doesn’t mean “can’t be done”

When Claude VSCode Controller didn’t work on Linux, I could have given up and stuck with Windows. Instead, I thought: “What if I can fix it?” Often, the most valuable things are born from frustration and necessity.

### 2. Community matters

The Extension Host crash issue on Linux affects many developers. Creating a solution and sharing it with others is a win-win. We help others and build our own reputation.

### 3. Document everything

Every solution born in the dark hours of debugging is worth documenting. Future me (and other developers) will be grateful.

### 4. Embrace the chaos

Some of the best solutions are born from seemingly hopeless situations. Extension Host crashes felt like the end of the world, but became the start of a fascinating journey in cross-platform development.

## Epilogue - From Zero to Hero

Today, after those chaotic hours of debugging, Claude VSCode Controller on Linux is not only working, but also serving the global Linux developer community. The project has professional documentation, automated installation tools, and a comprehensive testing framework.

What’s more, the technical insights from this project are helping others facing similar challenges.

But the most important lesson is this: **if you need something that doesn’t exist - build it**. The world of software development is full of opportunities for those willing to dig deeper, debug longer, and not give up when things get tough.

Sometimes, the best solutions come from the messiest problems. Extension Host crashes were a nightmare, but became the foundation for something much bigger than I originally planned.

---

_PS: If you ever have trouble with the VSCode Extension Host on Linux, remember - there’s always a way. Sometimes you just need to get creative with TypeScript, modules, and WebSocket imports. Happy debugging! 🐧🚀_

## Links & Resources

- **GitHub Repository**: [claude-vscode-controller](https://github.com/darco81/claude-vscode-controller)
- **Linux Branch**: [Linux-optimized version](https://github.com/darco81/claude-vscode-controller/tree/linux)
- **Installation Guide**: [Complete Linux setup](https://github.com/darco81/claude-vscode-controller/blob/main/LINUX.md)
- **Technical Deep Dive**: [Success story documentation](https://github.com/darco81/claude-vscode-controller/blob/main/SUKCES_LINUX.md)

---


## Gdy Extension Host odmawia posłuszeństwa - jak stworzyliśmy Claude VSCode Controller na Linux

**URL:** https://portfolio.sdet.pl/articles/claude-vscode-controller
**Published:** 2025-07-08
**Language:** pl
Tags: ai-tooling, vscode, mcp, claude, websocket

Integracja czasu rzeczywistego między Claude Desktop AI a VSCode - Extension API, WebSocket, MCP. Rozwiązywanie problemów Linux Extension Host.

_Opowieść o tym, jak desperacka potrzeba, kilka mrożących krew w żyłach godzin i chęć niepoddawania się doprowadziły do przełomu w integracji AI z IDE_

Czy kiedykolwiek miałeś moment, gdy potrzebujesz czegoś tak bardzo, że gotów jesteś to stworzyć od zera? Ja właśnie przeżyłem taki moment. Historia, którą wam opowiem, to nie tylko techniczne studium przypadku, ale przede wszystkim opowieść o tym, że czasem najbardziej frustrujące problemy prowadzą do najciekawszych rozwiązań.

## Geneza problemu, czyli dlaczego w ogóle zacząłem

Wyobraźcie sobie sytuację: masz świetnie działający Claude VSCode Controller na Windows, który pozwala AI sterować twoim IDE przez naturalne komendy. Wszystko działa jak marzenie - "otwórz plik", "stwórz komponent", "uruchom testy" - i nagle przenosisz się na Linux Ubuntu 24. Oczekujesz, że wszystko będzie działać tak samo... i tu zaczyna się koszmar programisty.

```
❌ Extension Host (LocalProcess pid: 16296) is unresponsive
❌ Failed to load resource: the server responded with a status of 404 ()
❌ UNRESPONSIVE extension host: starting to profile NOW
```

Pierwsze uruchomienie na Linux skończyło się spektakularnym crashem Extension Host. Claude Desktop widział MCP server, ale most do VSCode nie chciał się uruchomić. Po trzech godzinach debugowania myślałem sobie: "No dobra, może to jakiś błąd w tej konkretnej wersji VSCode". Spoiler: nie był.

## Pierwsza linia obrony - standardowe podejście

Zacznijmy od początku. Claude VSCode Controller to projekt, który łączy Claude Desktop z VSCode przez:

- **MCP Server** (Model Context Protocol) - komunikujący się z Claude Desktop
- **Rozszerzenie VSCode** - most używający WebSocket na porcie 3333
- **Komunikację WebSocket** - połączenie w czasie rzeczywistym między MCP a VSCode

Na Windows wszystko działało bez zarzutu. Ale Linux... Linux miał swoje zdanie na ten temat.

### Anatomia problemu

Pierwszym sygnałem ostrzegawczym były logi Extension Host:

```typescript
// To działało na Windows:
import { WebSocketServer, WebSocket } from 'ws';

// Ale na Linux Extension Host krzyczał:
// Cannot find module 'ws'
// Extension activation failed
```

"No dobra", pomyślałem, "klasyczny problem z zależnościami". Sprawdziłem `node_modules`, reinstalowałem wszystko, czyściłem cache. Nic. Extension Host nadal się wywalał przy próbie załadowania modułu WebSocket.

## Głębiej w króliczą norę - śledztwo

Po kilku godzinach frustracji zacząłem kopać głębiej. Okazało się, że Extension Host na Linux ma... powiedzmy sobie delikatnie... "specyficzne" podejście do ładowania zewnętrznych modułów. Szczególnie tych, które używają natywnych powiązań, jak WebSocket.

### Eureka #1: ES6 vs CommonJS

Pierwszym przełomem było odkrycie, że Linux Extension Host ma problemy z importami ES6 przy zewnętrznych zależnościach:

```typescript
// ❌ Crashuje Extension Host na Linux:
import { WebSocketServer } from 'ws';
const wss = new WebSocketServer({ port: 3333 });
wss.on('connection', (ws) => { /* ... */ }); // BOOM!

// ✅ Działa stabilnie (zamiast powyższego importu):
const { WebSocketServer } = require('ws');
```

Ale to była dopiero pierwsza część układanki. TypeScript nie był zadowolony z takiego podejścia...

### Eureka #2: TypeScript strict null checks

```typescript
// ❌ Błąd TypeScript:
let wss: WebSocketServer | null = null;
wss.on('connection', (ws) => { /* ... */ }); // Error: 'wss' is possibly 'null'

// ✅ Eleganckie rozwiązanie:
const wsServer = new WebSocketServer({ port: 3333 }); // Lokalna stała nigdy nie jest null
wss = wsServer; // Przechowujemy referencję do sprzątania
wsServer.on('connection', (ws: WS) => { /* ... */ }); // WS to typ gniazda z 'ws'
```

### Eureka #3: Podejście hybrydowe

Finalnym rozwiązaniem okazało się połączenie importów typów TypeScript z require na etapie działania:

```typescript
// Import typów dla TypeScript (tylko na etapie kompilacji)
import type { WebSocketServer as WSServer, WebSocket as WS } from 'ws';

// Użycie require w runtime (zgodne z Linux)
const { WebSocketServer, WebSocket } = require('ws');

let wss: WSServer | null = null;
```

## Budowanie rozwiązania - od chaosu do porządku

Gdy już zrozumiałem naturę problemu, czas było na systematyczne rozwiązanie. Stworzyłem `fix-extension-linux.sh` - skrypt, który automatyzuje cały proces naprawy:

### Krok 1: Konfiguracja TypeScript

```json
// tsconfig.json - przełączenie na CommonJS
{
  "compilerOptions": {
    "module": "commonjs", // Zmienione z "es2020"
    "strict": false // Poluzowane dla zewnętrznych modułów
    // ...
  }
}
```

### Krok 2: Bundlowanie zależności

Zamiast polegać na rozwiązywaniu zależności przez marketplace VSCode, bundlujemy zależności bezpośrednio w katalogu rozszerzenia:

```bash
# Bundluj ws bezpośrednio w katalogu rozszerzenia
mkdir -p ~/.vscode/extensions/claude-mcp-controller/node_modules
cp -r node_modules/ws ~/.vscode/extensions/claude-mcp-controller/node_modules/
```

### Krok 3: Automatyczne testowanie

```bash
#!/bin/bash
# test-extension-comprehensive.sh

echo "🧪 Testowanie rozszerzenia VSCode (tryb bezpieczny Linux)..."

# Test 1: Czy pliki rozszerzenia istnieją
if [ -f "$HOME/.vscode/extensions/claude-mcp-controller/out/extension.js" ]; then
    echo "✅ Pliki rozszerzenia istnieją"
else
    echo "❌ Brak plików rozszerzenia"
    exit 1
fi

# Test 2: Czy moduł WebSocket jest obecny
if [ -d "$HOME/.vscode/extensions/claude-mcp-controller/node_modules/ws" ]; then
    echo "✅ Moduł WebSocket znaleziony"
else
    echo "❌ Brak modułu WebSocket"
fi

# Test 3: Czy używany jest require
if grep -q "require.*ws" "$HOME/.vscode/extensions/claude-mcp-controller/out/extension.js"; then
    echo "✅ Używa require('ws') - zgodne z Linux"
else
    echo "❌ Rozszerzenie może mieć problemy ze zgodnością"
fi
```

## Moment prawdy - testowanie rozwiązania

Po wdrożeniu wszystkich poprawek, czas na moment prawdy. Restart VSCode, aktywacja rozszerzenia... i...

```
✅ 🤖 Claude MCP: Online
✅ Extension Host: claude-mcp-controller uruchomiony poprawnie
✅ WebSocket server nasłuchuje na porcie 3333
✅ Połączenie mostka nawiązane
```

**DZIAŁA!** 🎉

Pierwszy test - "Pokaż informacje o workspace" w Claude Desktop:

```json
{
  "hasWorkspace": true,
  "folders": [
    {
      "name": "claude-vscode-controller",
      "path": "/home/dk/claude-vscode-controller"
    }
  ],
  "activeEditor": {
    "fileName": "/home/dk/claude-vscode-controller/package.json",
    "language": "json",
    "lineCount": 67
  }
}
```

**Claude Desktop miał pełną kontrolę nad VSCode na Linux!**

## Lekcje wyciągnięte z okopów

### Techniczne wnioski

1. **Extension Host na Linux jest bardziej restrykcyjny** niż na Windows/macOS w kwestii zewnętrznych modułów
2. **Hybrydowa strategia importu** (typy TypeScript + require w runtime) to eleganckie rozwiązanie dla zgodności między systemami
3. **Bundlowanie zależności** w katalogu rozszerzenia eliminuje problemy z rozwiązywaniem przez marketplace
4. **Kompilacja do CommonJS** jest stabilniejsza niż ES6 modules dla rozszerzeń VSCode na Linux

### Wnioski miękkie

1. **Wytrwałość popłaca** - problem wydawał się nierozwiązywalny przez pierwsze kilka godzin
2. **Systematyczne debugowanie** - krok po kroku, log po logu, aż do źródła problemu
3. **Nie poddawaj się** - jak coś nie działa, można to naprawić lub stworzyć od nowa
4. **Dokumentacja ma znaczenie** - każde rozwiązanie warto udokumentować dla innych

## Techniczne szczegóły - dla dociekliwych

### Schemat architektury

```
┌─────────────────┐   MCP Protocol   ┌──────────────────┐
│   Claude        │◄────────────────►│   Enhanced       │
│   Desktop       │                  │   MCP Server     │
└─────────────────┘                  └──────────────────┘
                                               ▲
                                               │ WebSocket, port 3333
                                               ▼
┌─────────────────┐    VSCode API    ┌──────────────────┐
│   VSCode        │◄────────────────►│   VSCode         │
│   Editor        │                  │   Extension      │
│                 │                  │   (Bridge)       │
└─────────────────┘                  └──────────────────┘
```

### Kluczowe komponenty techniczne

**1. MCP Server (enhanced-mcp-server.js)**

```javascript
// Tłumaczy komendy Claude na wywołania mostka VSCode
case "vscode_create_file":
  return await this.sendVSCodeCommand('createFile', {
    filePath: args.filePath,
    content: args.content
  });
```

**2. Mostek rozszerzenia VSCode**

```typescript
// Obsługuje komunikację WebSocket i wywołania API VSCode
async function handleCommand(command: any, ws: WS) {
  let result: unknown; // Uzupełniane przez pasujący case poniżej
  switch (command.method) {
    case 'createFile':
      result = await createFile(command.params.filePath, command.params.content);
      break;
    case 'getWorkspaceInfo':
      result = getWorkspaceInfo();
      break;
    // ... 30+ innych komend
  }

  ws.send(
    JSON.stringify({
      id: command.id,
      result: result,
    }),
  );
}
```

**3. Warstwa zgodności z Linux**

```typescript
// Hybrydowe ładowanie modułów dla zgodności z Linux
import type { WebSocketServer as WSServer, WebSocket as WS } from 'ws';
const { WebSocketServer, WebSocket } = require('ws');

let wss: WSServer | null = null; // Referencja na poziomie modułu, używana przy sprzątaniu

function startMCPBridge() {
  const wsServer = new WebSocketServer({ port: 3333 });
  wss = wsServer; // Przechowujemy referencję do sprzątania

  wsServer.on('connection', (ws: WS) => {
    // Obsługa połączenia Claude Desktop
  });
}
```

## Co dalej - rozwój projektu

Teraz, gdy wsparcie dla Linux jest rzeczywistością, czas na dalszy rozwój:

### Najbliższe plany

- **Optymalizacje wydajności** dla Extension Host na Linux
- **Testy na innych dystrybucjach** (Fedora, Arch, openSUSE)
- **Zaawansowane narzędzia debugowania** dla łatwiejszego rozwiązywania problemów

### Dalsza wizja

- **Obsługa wielu workspace'ów** - wsparcie dla kilku instancji VSCode
- **System pluginów** - możliwość rozszerzania komend
- **Praca zdalna** - wsparcie dla VSCode w trybie zdalnym

## Wnioski - więcej niż tylko techniczne rozwiązanie

Ta historia to więcej niż opis rozwiązania problemu technicznego. To przypomnienie o kilku fundamentalnych prawdach w świecie programowania:

### 1. "Nie ma" nie znaczy "nie można"

Gdy Claude VSCode Controller nie działał na Linux, mogłem się poddać i zostać przy Windows. Zamiast tego pomyślałem: "A co jeśli da się to naprawić?". Często najbardziej wartościowe rzeczy powstają z frustracji i potrzeby.

### 2. Społeczność ma znaczenie

Problem z Extension Host na Linux dotyka wielu programistów. Stworzenie rozwiązania i podzielenie się nim z innymi to korzyść dla wszystkich. Pomagamy innym, a przy okazji budujemy swoją reputację.

### 3. Dokumentuj wszystko

Każde rozwiązanie, które powstało w trudnych godzinach debugowania, warto opisać. Przyszłe ja (i inni programiści) będą wdzięczni.

### 4. Oswój chaos

Niektóre z najlepszych rozwiązań rodzą się z pozornie beznadziejnych sytuacji. Problemy z Extension Host wydawały się końcem świata, a stały się początkiem fascynującej drogi w programowaniu wieloplatformowym.

## Epilog - od zera do bohatera

Dziś, po tych chaotycznych godzinach debugowania, Claude VSCode Controller na Linux nie tylko działa, ale też służy społeczności programistów korzystających z tego systemu. Projekt ma profesjonalną dokumentację, automatyczne narzędzia instalacyjne i rozbudowany system testów.

Co więcej, wnioski techniczne z tego projektu pomagają innym w podobnych wyzwaniach.

Ale najważniejsza lekcja jest taka: **gdy potrzebujesz czegoś, czego nie ma - stwórz to**. Świat programowania jest pełen możliwości dla tych, którzy są gotowi szukać głębiej, debugować dłużej i nie poddawać się, gdy pojawiają się trudności.

Często najlepsze rozwiązania rodzą się z największego chaosu. Problemy z Extension Host były koszmarem, ale stały się fundamentem czegoś znacznie większego, niż początkowo planowałem.

---

_PS: Jeśli kiedykolwiek będziesz miał problem z Extension Host w VSCode na Linux, pamiętaj - zawsze jest jakieś wyjście. Czasem wystarczy kreatywność z TypeScriptem, modułami i importami WebSocket. Powodzenia w debugowaniu! 🐧🚀_

## Linki i zasoby

- **Repozytorium GitHub**: [claude-vscode-controller](https://github.com/darco81/claude-vscode-controller)
- **Gałąź Linux**: [Wersja zoptymalizowana pod Linux](https://github.com/darco81/claude-vscode-controller/tree/linux)
- **Instrukcja instalacji**: [Kompletny setup na Linux](https://github.com/darco81/claude-vscode-controller/blob/main/LINUX.md)
- **Opis sukcesu technicznego**: [Dokumentacja sukcesu](https://github.com/darco81/claude-vscode-controller/blob/main/SUKCES_LINUX.md)

---


## Assertions in Playwright: When Do You Actually Need await?

**URL:** https://portfolio.sdet.it/articles/playwright-assertions-when-you-need-await
**Published:** 2025-05-20
**Language:** en
Tags: playwright, testing, assertions

Technical analysis of using the await keyword with assertions in Playwright. Debunks the common myth that all assertions require await.

## Introduction

In the world of automated testing for web applications, Playwright has become one of the most popular tools. However, even experienced QA engineers may have doubts about the correct use of assertions, especially when it comes to using the **_await_** keyword. I often hear the opinion that "every assertion in Playwright requires await, otherwise the test will be unstable." Is this really the case?

In this article, I'll debunk this myth and explain when **_await_** is essential and when it's completely unnecessary - all backed by official documentation and code analysis.

## Two Types of Assertions in Playwright

Let's start with the key information: Playwright has two fundamentally different types of assertions:

### 1. Auto-Retrying Assertions (requiring **_await_**)

These assertions are **asynchronous** and automatically retried. They will try to verify the condition multiple times until it is met or the timeout expires (default is 5 seconds). Examples:

```typescript
await expect(locator).toBeVisible();
await expect(page).toHaveURL(expectedUrl);
await expect(locator).toHaveText('Expected text');
```

The documentation clearly states that "retrying assertions are async, so you must await them."
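To make "auto-retrying" concrete, here is a conceptual sketch of a polling loop. It is not Playwright's implementation; `retryUntil`, its parameters, and the 100 ms polling interval are assumptions chosen for illustration:

```typescript
// Conceptual sketch only - NOT how Playwright implements its assertions.
// Re-checks `condition` until it returns true or the timeout expires.
async function retryUntil(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000, // mirrors the 5-second default mentioned above
  intervalMs = 100, // assumed polling interval
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await condition()) return; // condition met: the "assertion" passes
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Wait a little before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage: a flag that flips after 50 ms, standing in for a URL changing after a click.
let navigated = false;
setTimeout(() => {
  navigated = true;
}, 50);

retryUntil(() => navigated, 1000, 10).then(() => console.log('condition met'));
```

Because the function returns a promise that settles only when the loop ends, a caller that does not `await` it moves on before the check has finished.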

### 2. Non-Retrying Assertions (not requiring **_await_**)

These assertions work synchronously, verifying values that we already have in memory:

```typescript
expect(value).toBe(5);
expect(array).toContain('element');
expect(object).toHaveProperty('name');
```

These assertions **do not require _await_** because they don't perform any asynchronous operations. They are just simple value comparisons.

## Code Analysis and Step-by-Step Execution

Let's look at an example test:

```typescript
import { test, expect } from '@playwright/test';

test('When_userClicksButton_Then_correctPageOpens', async ({ page }) => {
  // Arrange & Act
  await page.goto('https://example.com');
  await page.getByRole('button', { name: 'Click me' }).click();

  // Assert
  await expect(page).toHaveURL('https://example.com/destination');
  const headingText = await page.locator('h1').textContent();
  expect(headingText?.trim()).toBe('Welcome to Destination');
});
```

Let's analyze the execution of this test step by step:

1. The test opens the example.com page
2. The test clicks the "Click me" button
3. **Assertion 1**: The test checks the page URL using `await expect(page).toHaveURL(...)`
   - This is an asynchronous assertion (auto-retrying)
   - It will attempt to check the URL multiple times until it matches the expectation
   - **Requires _await_ for the test to wait for the condition to be met**
4. The test retrieves the heading text using `await page.locator('h1').textContent()`
   - This is an asynchronous operation because it requires communication with the browser
   - **Requires _await_ to wait for the text to be retrieved**
5. **Assertion 2**: The test compares the retrieved text using `expect(headingText?.trim()).toBe(...)`
   - This is a synchronous assertion (non-retrying)
   - It operates on a value that has already been retrieved in step 4
   - **Does not require _await_ because there is no asynchronous operation here**

## Why Is Omitting **_await_** in the Second expect Correct?

The crucial fact here is that in step 4 we've already asynchronously retrieved the heading text. By the time we execute the assertion in step 5, the `headingText` value is already available in our test's memory. We're not performing any operation that would require communication with the browser.

In this case, adding **_await_** before `expect(headingText?.trim()).toBe(...)` would not only be unnecessary but even misleading, as it would suggest that we're performing some asynchronous operation here, which is not true.

## When Does Omitting **_await_** Actually Cause Problems?

Test stability issues occur when we don't use **_await_** for auto-retrying assertions:

```typescript
// Bad - missing await for auto-retrying assertion
expect(page).toHaveURL('https://example.com/destination'); // Error! Should be await
```

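In the above case, the call returns a promise that nothing waits for. The test moves on immediately, so it may keep running (or even finish) before the URL has changed, and a failed check can surface late or outside the step that caused it. The result is an unstable test.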
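
The effect of a missing `await` can be reproduced without a browser. The sketch below uses a hypothetical `slowCheck` function as a stand-in for any promise-returning assertion and records the order in which things happen. None of these names come from Playwright.

```typescript
// Stand-in for a promise-returning assertion (hypothetical; not a Playwright API).
async function slowCheck(log: string[]): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  log.push('check finished');
}

// Missing await: the promise is left floating and the body continues at once.
async function testWithoutAwait(log: string[]): Promise<void> {
  slowCheck(log);
  log.push('test body finished');
}

// With await: the body resumes only after the check has settled.
async function testWithAwait(log: string[]): Promise<void> {
  await slowCheck(log);
  log.push('test body finished');
}
```

When `testWithoutAwait` resolves, the log holds only `test body finished`, and the check completes later. With `testWithAwait` the order is `check finished`, then `test body finished`. Lint rules such as `@typescript-eslint/no-floating-promises` can flag the first form automatically.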

## What Does the Official Documentation Say?

Playwright's documentation is very clear on this matter. In the "Auto-retrying assertions" section, it lists assertions that require **_await_**, and in the "Non-retrying assertions" section, those that don't.

Moreover, the documentation explicitly states:

> "These assertions [non-retrying] allow to test any conditions, but do not auto-retry."

This means that these assertions are designed precisely for testing conditions without auto-retrying, which is exactly what we need when comparing already retrieved values.

## Conclusions

The myth that "every assertion in Playwright must use await" is false and stems from a misunderstanding of the differences between assertion types. The correct approach is:

1. Use **_await_** for auto-retrying assertions that communicate with the browser
2. Don't use **_await_** for simple comparisons of values you already have in memory

Following these principles will allow you to write more readable, efficient, and precise tests that accurately reflect your intentions. Remember that good test code should be readable and unambiguous: adding **_await_** where it isn't needed only obscures the actual intent of the test.

---

_Article based on the official Playwright documentation available at playwright.dev_

---


## Asercje w Playwright: Kiedy faktycznie potrzebujesz await?

**URL:** https://portfolio.sdet.pl/articles/playwright-assertions-when-you-need-await
**Published:** 2025-05-20
**Language:** pl
Tags: playwright, testing, assertions

Analiza techniczna używania await z asercjami w Playwright. Obala popularny mit, że wszystkie asercje wymagają await.

## Wprowadzenie

W świecie testów automatycznych dla aplikacji webowych, Playwright stał się jednym z najpopularniejszych narzędzi. Jednak nawet doświadczeni inżynierowie QA mogą mieć wątpliwości dotyczące prawidłowego używania asercji, zwłaszcza gdy chodzi o stosowanie słowa kluczowego **_await_**. Często słyszę opinię, że "każda asercja w Playwright wymaga await, inaczej test będzie niestabilny". Czy rzeczywiście tak jest?

W tym artykule obalę ten mit i wyjaśnię, kiedy **_await_** jest niezbędny, a kiedy kompletnie zbędny - wszystko poparte oficjalną dokumentacją i analizą kodu.

## Dwa typy asercji w Playwright

Zacznijmy od kluczowej informacji: Playwright posiada dwa fundamentalnie różne typy asercji:

### 1. Asercje Auto-Retrying (wymagające **_await_**)

Te asercje są **asynchroniczne** i automatycznie ponawiane. Będą próbowały weryfikować warunek wielokrotnie, aż zostanie spełniony lub upłynie limit czasu (domyślnie 5 sekund). Przykłady:

```typescript
await expect(locator).toBeVisible();
await expect(page).toHaveURL(expectedUrl);
await expect(locator).toHaveText('Oczekiwany tekst');
```

Dokumentacja wyraźnie wskazuje, że "asercje z auto-retrying są asynchroniczne, więc musisz używać await" (tłum. własne: "Note that retrying assertions are async, so you must await them").

### 2. Asercje Non-Retrying (nie wymagające **_await_**)

Te asercje działają synchronicznie, weryfikując wartości, które już mamy w pamięci:

```typescript
expect(value).toBe(5);
expect(array).toContain('element');
expect(object).toHaveProperty('name');
```

Te asercje **nie wymagają _await_**, ponieważ nie wykonują żadnych operacji asynchronicznych. Są to zwykłe porównania wartości.

## Analiza kodu i wykonanie krok po kroku

Przyjrzyjmy się przykładowemu testowi:

```typescript
test('When_userClicksButton_Then_correctPageOpens', async ({ page }) => {
  // Arrange & Act
  await page.goto('https://example.com');
  await page.getByRole('button', { name: 'Click me' }).click();

  // Assert
  await expect(page).toHaveURL('https://example.com/destination');
  const headingText = await page.locator('h1').textContent();
  expect(headingText?.trim()).toBe('Welcome to Destination');
});
```

Przeanalizujmy wykonanie tego testu krok po kroku:

1. Test otwiera stronę example.com
2. Test klika przycisk "Click me"
3. **Asercja 1**: Test sprawdza URL strony za pomocą `await expect(page).toHaveURL(...)`
   - Jest to asercja asynchroniczna (auto-retrying)
   - Będzie próbowała sprawdzić URL wielokrotnie, aż będzie zgodny z oczekiwaniem
   - **Wymaga _await_, aby test zaczekał na spełnienie warunku**
4. Test pobiera tekst nagłówka za pomocą `await page.locator('h1').textContent()`
   - Jest to operacja asynchroniczna, ponieważ wymaga komunikacji z przeglądarką
   - **Wymaga _await_, aby zaczekać na pobranie tekstu**
5. **Asercja 2**: Test porównuje pobrany tekst za pomocą `expect(headingText?.trim()).toBe(...)`
   - Jest to asercja synchroniczna (non-retrying)
   - Operuje na wartości, która już została pobrana w kroku 4
   - **Nie wymaga _await_, ponieważ nie ma tu żadnej operacji asynchronicznej**

## Dlaczego brak **_await_** w drugim expect jest poprawny?

Kluczowy jest tutaj fakt, że w kroku 4 już pobraliśmy zawartość nagłówka asynchronicznie. W momencie wykonania asercji w kroku 5, wartość `headingText` jest już dostępna w pamięci naszego testu. Nie wykonujemy żadnej operacji, która wymagałaby komunikacji z przeglądarką.

W takim przypadku dodanie **_await_** przed `expect(headingText?.trim()).toBe(...)` byłoby nie tylko zbędne, ale wręcz wprowadzające w błąd, ponieważ sugerowałoby, że wykonujemy tu jakąś operację asynchroniczną, co nie jest prawdą.

## Kiedy brak **_await_** faktycznie powoduje problemy?

Problemy ze stabilnością testów występują, gdy nie używamy **_await_** dla asercji auto-retrying:

```typescript
// Źle - brak await przy asercji auto-retrying
expect(page).toHaveURL('https://example.com/destination'); // Błąd! Powinno być await
```

W powyższym przypadku test nie zaczeka na zmianę URL i może kontynuować wykonanie, nawet jeśli strona jeszcze się nie załadowała, co prowadzi do niestabilnych testów.

## Co mówi oficjalna dokumentacja?

Dokumentacja Playwright jest w tej kwestii bardzo jasna. W sekcji "Auto-retrying assertions" wymienia asercje, które wymagają **_await_**, a w sekcji "Non-retrying assertions" te, które go nie wymagają.

Co więcej, dokumentacja wyraźnie stwierdza:

> "These assertions [non-retrying] allow to test any conditions, but do not auto-retry."

Oznacza to, że asercje te są przeznaczone właśnie do testowania warunków bez auto-ponawiania, co jest dokładnie tym, czego potrzebujemy w przypadku porównywania już pobranych wartości.

## Wnioski

Mit, że "każda asercja w Playwright musi używać await" jest nieprawdziwy i wynika z niezrozumienia różnic między typami asercji. Prawidłowe podejście to:

1. Używaj **_await_** dla asercji auto-retrying, które komunikują się z przeglądarką
2. Nie używaj **_await_** dla prostych porównań wartości, które już masz w pamięci

Stosowanie tych zasad pozwoli pisać bardziej czytelne, wydajne i precyzyjne testy, które dokładnie odzwierciedlają twoje intencje. Pamiętaj, że dobry kod testowy powinien być czytelny i jednoznaczny - dodawanie zbędnych **_await_** tam, gdzie nie są potrzebne, tylko zaciemnia faktyczne intencje testu.

---

_Artykuł oparty na oficjalnej dokumentacji Playwright dostępnej na stronie playwright.dev_

---


## UI Tests Playwright - MAF (4-layer · Return Early · POM · Vertical Slice)

**URL:** https://portfolio.sdet.it/articles/ui-tests-playwright-maf
**Published:** 2025-04-26
**Language:** en
Tags: playwright, cdat, architecture, typescript

User interface testing with Playwright: TypeScript, Return Early Pattern, POM, Vertical Slice architecture on MAF app.

Test architecture largely determines how maintainable and scalable an automated test suite is. The MAF E2E test project, built with Playwright and TypeScript, takes a hybrid approach that combines the Page Object Model (POM) with Vertical Slice architecture. Let's examine this architecture and its advantages in detail.

[Repo - in progress](https://github.com/darco81/maf-e2e-pw)

## Hybrid Approach: POM + Vertical Slice

The MAF E2E project utilizes two popular architectural patterns:

### Page Object Model (POM)

POM is a classic design pattern in UI testing that:

- Encapsulates UI interactions in dedicated classes
- Separates test logic from UI implementation details
- Creates an abstraction over interface elements

### Vertical Slice Architecture

Instead of organizing code by technical layers (e.g., all selectors together, all actions together), the project organizes code by features:

- Each functionality (e.g., Sidebar, Navbar) has its own self-contained directory
- All components needed to test a given function are kept together
- Provides better cohesion and reduced coupling between modules

## Project Structure

```
tests/
├── sidebar/           # Vertical slice for Sidebar
│   ├── actions.ts     # UI interactions
│   ├── components.ts  # Element selectors
│   ├── data.ts        # Test data
│   └── test.ts        # Test specifications
└── navbar/            # Vertical slice for Navbar
    ├── actions.ts
    ├── components.ts
    ├── data.ts
    └── test.ts
```

## Role of Individual Files

### components.ts - Selector Centralization

The `components.ts` file contains all selectors needed to locate UI elements:

```typescript
export const SidebarComponents = {
  root: '[data-testid="sidebar-root"]',
  toggle: '[data-testid="sidebar-toggle"]',
  // ...other selectors
};
```

**Benefits**:

- Centralization of selectors in one place
- Easy updates in case of UI changes
- Clear naming of elements
- Possibility to reuse the same selectors in different tests
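
One way to keep such a selector map consistent is to build every entry through a single helper and fix the shape with `as const`. This is a sketch rather than the project's actual code: `byTestId`, `SidebarComponentKey`, and `selectorFor` are hypothetical names, and the two test ids are the ones shown above.

```typescript
// Hypothetical helper: builds a data-testid attribute selector.
const byTestId = (id: string): string => `[data-testid="${id}"]`;

const SidebarComponents = {
  root: byTestId('sidebar-root'),
  toggle: byTestId('sidebar-toggle'),
} as const;

// Union of valid keys ('root' | 'toggle'); a typo in a key fails at compile time.
type SidebarComponentKey = keyof typeof SidebarComponents;

function selectorFor(key: SidebarComponentKey): string {
  return SidebarComponents[key];
}
```

If the application later renames its test attribute, only `byTestId` would need to change.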

### data.ts - Test Data Isolation

The `data.ts` file stores all test data, expected values, and constants:

```typescript
export const SidebarData = {
  title: 'M-A-F',
  subtitle: 'Moja Aplikacja Faktur',
  menuItems: {
    dashboard: 'Dashboard',
    invoices: 'Faktury',
    contractors: 'Kontrahenci',
  },
  // ...other data
};
```

**Benefits**:

- Separation of data from test logic
- Easy modification of expected values
- Test consistency (same values used consistently)
- Easier adaptation of tests to different environments

### actions.ts - Interaction Methods Without Assertions

The `actions.ts` file contains methods for interacting with the application, without assertions:

```typescript
export class SidebarActions {
  // ...
  async toggleSidebar() {
    await this.page.click(SidebarComponents.toggle);
  }

  async isSidebarCollapsed() {
    const sidebar = await this.page.$(SidebarComponents.root);
    return await sidebar?.evaluate((el) => el.classList.contains('sidebar-collapsed'));
  }
  // ...other methods
}
```

**Benefits**:

- Abstraction of UI interactions
- Reusability of methods in different tests
- More maintainable code - UI changes require modifications in only one place
- More readable tests, focused on behavior rather than technical implementation
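
The snippet above elides the constructor, and its `isSidebarCollapsed` resolves to `undefined` when the root element is missing. A possible complete version is sketched below under a few assumptions. `PageLike` is a hypothetical interface covering only the two page methods used here (Playwright's `Page` offers `click` and `getAttribute` with compatible signatures), and the selectors are copied from `components.ts`. Unlike `page.$`, Playwright's `page.getAttribute` waits for the element and rejects on timeout, so this variant always resolves to a strict boolean.

```typescript
// Hypothetical minimal view of the page object; only what this class needs.
interface PageLike {
  click(selector: string): Promise<void>;
  getAttribute(selector: string, name: string): Promise<string | null>;
}

const SidebarComponents = {
  root: '[data-testid="sidebar-root"]',
  toggle: '[data-testid="sidebar-toggle"]',
};

class SidebarActions {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // Interaction only: no assertions live in this layer.
  async toggleSidebar(): Promise<void> {
    await this.page.click(SidebarComponents.toggle);
  }

  // Reads the class attribute and reports the collapsed state as a boolean.
  async isSidebarCollapsed(): Promise<boolean> {
    const classAttr = await this.page.getAttribute(SidebarComponents.root, 'class');
    return (classAttr ?? '').split(/\s+/).includes('sidebar-collapsed');
  }
}
```

Depending on a narrow interface instead of the full `Page` type also lets the class be exercised with a plain object in unit tests.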

### test.ts - Tests With Assertions

The `test.ts` file contains the actual tests with assertions:

```typescript
test('TC-SB-003: should collapse and expand sidebar correctly', async () => {
  // Arrange - Ensure sidebar is expanded initially
  // ...

  // Assert - Expanded state verification
  expect(await sidebarActions.isSidebarCollapsed()).toBeFalsy();
  expect(await sidebarActions.areTitlesVisible()).toBeTruthy();
  // ...

  // Act - Collapse sidebar
  await sidebarActions.toggleSidebar();
  // ...

  // Assert - Collapsed state verification
  expect(await sidebarActions.isSidebarCollapsed()).toBeTruthy();
  // ...
});
```

**Benefits**:

- Tests focused on specific behaviors
- Clear Arrange-Act-Assert structure
- Clear separation of test logic from implementation details
- Easier understanding of test intent

## Early Return Pattern

The project also uses the "Early Return" pattern instead of complex conditional structures:

```typescript
// Early return pattern
async isSidebarCollapsed() {
    const sidebar = await this.page.$(SidebarComponents.root);
    return await sidebar?.evaluate(el => el.classList.contains('sidebar-collapsed'));
}

// Instead of complex if/else structures
async toggleAction() {
    if (await this.someCondition()) {
        // do something
    } else {
        // do something else
    }
}
```

**Benefits**:

- Better code readability
- Reduced cyclomatic complexity
- Fewer levels of nesting
- Clear execution paths
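
The first snippet above is a single-expression return rather than a guard clause, so the pattern may be easier to see in a method with several exit conditions. The sketch below is an illustration under assumptions: `SidebarLike`, `isPresent`, `CollapseResult`, and `collapseSidebar` are hypothetical names and not part of the MAF project.

```typescript
// Hypothetical shape of the sidebar actions used in this example.
interface SidebarLike {
  isPresent(): Promise<boolean>;
  isSidebarCollapsed(): Promise<boolean>;
  toggleSidebar(): Promise<void>;
}

type CollapseResult = 'missing' | 'already-collapsed' | 'collapsed';

async function collapseSidebar(sidebar: SidebarLike): Promise<CollapseResult> {
  // Guard 1: nothing to do when the sidebar is not rendered.
  if (!(await sidebar.isPresent())) return 'missing';

  // Guard 2: avoid toggling a sidebar that is already collapsed.
  if (await sidebar.isSidebarCollapsed()) return 'already-collapsed';

  // Happy path stays at the top nesting level.
  await sidebar.toggleSidebar();
  return 'collapsed';
}
```

Each guard handles one special case and exits, so the main action is never wrapped in an `else` branch.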

## Key Benefits of This Approach

### 1. Increased Maintainability

- UI changes require updates in only one place (components.ts)
- Modification of expected values is centralized (data.ts)
- Clear separation of responsibilities between files

### 2. Better Code Organization

- Everything related to a given functionality is kept together
- Easy to find and update related elements
- Reduced need to jump between different files

### 3. Project Scalability

- Adding new functionalities doesn't affect existing ones
- Easy extension of tests with new cases
- Possibility for multiple people to work in parallel on different functionalities

### 4. More Readable Tests

- Tests focused on behavior verification, not implementation details
- Clear Arrange-Act-Assert structure
- Readable method and variable names reflecting intentions

## Conclusions

The hybrid approach combining Page Object Model with Vertical Slice architecture offers the best of both worlds: abstraction of UI interactions and organization of code by functionality. Additionally, the Early Return pattern improves readability and reduces code complexity.

Such architecture significantly enhances the process of creating and maintaining automated tests, especially in larger projects where scalability and code organization are key. In the case of the MAF application, the test structure mirrors the application structure, making it intuitive and easy to understand for the entire team.

---


## Testy UI Playwright - MAF (4-warstwy · Return Early · POM · Vertical Slice)

**URL:** https://portfolio.sdet.pl/articles/ui-tests-playwright-maf
**Published:** 2025-04-26
**Language:** pl
Tags: playwright, cdat, architecture, typescript

Testy interfejsu użytkownika z Playwright: TypeScript, Return Early Pattern, POM, Vertical Slice na aplikacji MAF.

Architektura testów automatycznych ma kluczowe znaczenie dla ich utrzymywalności i skalowalności. Analizując przedstawiony projekt testów E2E dla aplikacji MAF zbudowany z wykorzystaniem Playwright i TypeScript, możemy zauważyć zastosowanie hybrydowego podejścia łączącego Page Object Model (POM) z architekturą Vertical Slice. Przyjrzyjmy się dokładniej tej architekturze i jej zaletom.

[Repo - in progress](https://github.com/darco81/maf-e2e-pw)

## Hybrydowe podejście: POM + Vertical Slice

Projekt MAF E2E wykorzystuje dwa popularne wzorce architektoniczne:

### Page Object Model (POM)

POM to klasyczny wzorzec projektowy w testach UI, który:

- Enkapsuluje interakcje z interfejsem użytkownika w dedykowane klasy
- Oddziela logikę testową od szczegółów implementacji UI
- Tworzy abstrakcję nad elementami interfejsu

### Architektura Vertical Slice

Zamiast organizowania kodu według warstw technicznych (np. wszystkie selektory razem, wszystkie akcje razem), projekt organizuje kod według funkcji (features):

- Każda funkcjonalność (np. Sidebar, Navbar) ma własny, samodzielny katalog
- Wszystkie komponenty potrzebne do testowania danej funkcji znajdują się razem
- Zapewnia lepszą spójność i mniejsze powiązania między modułami

## Struktura projektu

```
tests/
├── sidebar/           # Vertical slice dla Sidebar
│   ├── actions.ts     # Interakcje UI
│   ├── components.ts  # Selektory elementów
│   ├── data.ts        # Dane testowe
│   └── test.ts        # Specyfikacje testów
└── navbar/            # Vertical slice dla Navbar
    ├── actions.ts
    ├── components.ts
    ├── data.ts
    └── test.ts
```

## Rola poszczególnych plików

### components.ts - Centralizacja selektorów

Plik `components.ts` zawiera wszystkie selektory potrzebne do lokalizacji elementów UI:

```typescript
export const SidebarComponents = {
  root: '[data-testid="sidebar-root"]',
  toggle: '[data-testid="sidebar-toggle"]',
  // ...pozostałe selektory
};
```

**Korzyści**:

- Centralizacja selektorów w jednym miejscu
- Łatwa aktualizacja w przypadku zmian w UI
- Czytelne nazewnictwo elementów
- Możliwość reużycia tych samych selektorów w różnych testach

### data.ts - Izolacja danych testowych

Plik `data.ts` przechowuje wszystkie dane testowe, oczekiwane wartości i stałe:

```typescript
export const SidebarData = {
  title: 'M-A-F',
  subtitle: 'Moja Aplikacja Faktur',
  menuItems: {
    dashboard: 'Dashboard',
    invoices: 'Faktury',
    contractors: 'Kontrahenci',
  },
  // ...pozostałe dane
};
```

**Korzyści**:

- Separacja danych od logiki testowej
- Łatwa modyfikacja oczekiwanych wartości
- Spójność testów (te same wartości używane konsekwentnie)
- Łatwiejsze przystosowanie testów do różnych środowisk

### actions.ts - Metody interakcji bez asercji

Plik `actions.ts` zawiera metody interakcji z aplikacją, bez asercji:

```typescript
export class SidebarActions {
  // ...
  async toggleSidebar() {
    await this.page.click(SidebarComponents.toggle);
  }

  async isSidebarCollapsed() {
    const sidebar = await this.page.$(SidebarComponents.root);
    return await sidebar?.evaluate((el) => el.classList.contains('sidebar-collapsed'));
  }
  // ...pozostałe metody
}
```

**Korzyści**:

- Abstrakcja interakcji z UI
- Reużywalność metod w różnych testach
- Kod łatwiejszy w utrzymaniu - zmiany w UI wymagają modyfikacji tylko w jednym miejscu
- Czytelniejsze testy, skupione na zachowaniu, a nie technicznej implementacji

### test.ts - Testy z asercjami

Plik `test.ts` zawiera właściwe testy z asercjami:

```typescript
test('TC-SB-003: should collapse and expand sidebar correctly', async () => {
  // Arrange - Ensure sidebar is expanded initially
  // ...

  // Assert - Expanded state verification
  expect(await sidebarActions.isSidebarCollapsed()).toBeFalsy();
  expect(await sidebarActions.areTitlesVisible()).toBeTruthy();
  // ...

  // Act - Collapse sidebar
  await sidebarActions.toggleSidebar();
  // ...

  // Assert - Collapsed state verification
  expect(await sidebarActions.isSidebarCollapsed()).toBeTruthy();
  // ...
});
```

**Korzyści**:

- Testy skoncentrowane na konkretnych zachowaniach
- Czytelna struktura Arrange-Act-Assert
- Wyraźne oddzielenie logiki testowej od szczegółów implementacji
- Łatwiejsze zrozumienie intencji testu

## Wzorzec Early Return

Projekt korzysta również z wzorca "Early Return" zamiast złożonych struktur warunkowych:

```typescript
// Wzorzec early return
async isSidebarCollapsed() {
    const sidebar = await this.page.$(SidebarComponents.root);
    return await sidebar?.evaluate(el => el.classList.contains('sidebar-collapsed'));
}

// Zamiast złożonych struktur if/else
async toggleAction() {
    if (await this.someCondition()) {
        // zrób coś
    } else {
        // zrób coś innego
    }
}
```

**Korzyści**:

- Lepsza czytelność kodu
- Zmniejszona złożoność cyklomatyczna
- Mniej poziomów zagnieżdżenia
- Jasne ścieżki wykonania

## Kluczowe korzyści z tego podejścia

### 1. Zwiększona utrzymywalność

- Zmiany w UI wymagają aktualizacji tylko w jednym miejscu (components.ts)
- Modyfikacja oczekiwanych wartości jest scentralizowana (data.ts)
- Wyraźny podział odpowiedzialności między plikami

### 2. Lepsza organizacja kodu

- Wszystko związane z daną funkcjonalnością znajduje się razem
- Łatwe znajdowanie i aktualizacja powiązanych elementów
- Zmniejszona potrzeba przeskakiwania między różnymi plikami

### 3. Skalowalność projektu

- Dodawanie nowych funkcjonalności nie wpływa na istniejące
- Łatwe rozszerzanie testów o nowe przypadki
- Możliwość równoległej pracy wielu osób nad różnymi funkcjonalnościami

### 4. Czytelniejsze testy

- Testy skupione na weryfikacji zachowania, nie na szczegółach implementacji
- Jasna struktura Arrange-Act-Assert
- Czytelne nazwy metod i zmiennych odzwierciedlające intencje

## Wnioski

Hybrydowe podejście łączące Page Object Model z architekturą Vertical Slice oferuje najlepsze z obu światów: abstrakcję interakcji z UI oraz organizację kodu według funkcjonalności. Dodatkowo wzorzec Early Return poprawia czytelność i zmniejsza złożoność kodu.

Taka architektura znacząco usprawnia proces tworzenia i utrzymania testów automatycznych, szczególnie w większych projektach, gdzie skalowalność i organizacja kodu są kluczowe. W przypadku aplikacji MAF, struktura testów odzwierciedla strukturę aplikacji, co czyni ją intuicyjną i łatwą do zrozumienia dla całego zespołu.

---


## Automatic Update Dates in Project Portfolio

**URL:** https://portfolio.sdet.it/articles/automatic-update-dates-is-project-portfolio
**Published:** 2025-04-08
**Language:** en
Tags: github-api, portfolio, frontend

Benefits for portfolio prestige - automatic GitHub API integration for last-commit-date display per project card.

Showcasing the timeliness of your projects is an important element of building your professional online image. Here's how you can implement automatic display of repository last update dates in your React portfolio.

## Why Show Update Dates?

Displaying the last update dates of projects in your portfolio brings several important benefits:

1. **Transparency** - you show that your projects are actively developed
2. **Credibility** - visitors can see that you're not presenting "dead" projects
3. **Professionalism** - attention to detail and automation speak to your quality as a developer

## How to Implement It?

The foundation of the solution is the GitHub API, which allows you to retrieve information about the last commit in a repository. Here are the key steps:

### 1. Create a Service for GitHub API Communication

```typescript
export function extractRepoInfo(url: string): { owner: string; repo: string } | null {
  if (!url) return null;

  try {
    const githubUrlRegex = /github\.com\/([^/]+)\/([^/]+)/;
    const match = url.match(githubUrlRegex);

    if (match && match.length >= 3) {
      return {
        owner: match[1],
        repo: match[2].split('#')[0].split('?')[0],
      };
    }
    return null;
  } catch (error) {
    console.error('Error extracting repo info:', error);
    return null;
  }
}

export async function fetchLastCommitDateWithCache(
  owner: string,
  repo: string,
): Promise<string | null> {
  // Implementation of fetching date from GitHub API with a cache mechanism
  try {
    const response = await fetch(
      `https://api.github.com/repos/${owner}/${repo}/commits?per_page=1`,
    );

    if (!response.ok) {
      throw new Error(`GitHub API error: ${response.status}`);
    }

    const commits = await response.json();

    if (commits && commits.length > 0 && commits[0].commit?.committer?.date) {
      return commits[0].commit.committer.date;
    }

    return null;
  } catch (error) {
    console.error('Error fetching commit data:', error);
    return null;
  }
}

export function formatCommitDate(dateString: string, locale: string = 'en'): string {
  try {
    const date = new Date(dateString);
    return new Intl.DateTimeFormat(locale === 'pl' ? 'pl-PL' : 'en-US', {
      year: 'numeric',
      month: 'short',
      day: 'numeric',
    }).format(date);
  } catch (error) {
    return dateString;
  }
}
```
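
The function name mentions a cache, but the listing shows only the fetch. One possible shape for that cache is an in-memory map with a time-to-live, sketched below. Everything here is an assumption made for illustration: `Fetcher`, `withTtlCache`, and the injectable `now` clock are hypothetical and not taken from the portfolio code.

```typescript
type Fetcher<T> = (key: string) => Promise<T>;

// Wraps a fetcher so repeated calls for the same key reuse a recent result.
function withTtlCache<T>(
  fetcher: Fetcher<T>,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, handy in tests
): Fetcher<T> {
  const cache = new Map<string, { value: T; expiresAt: number }>();

  return async (key: string): Promise<T> => {
    const hit = cache.get(key);
    // Fresh entry: skip the network request entirely.
    if (hit && hit.expiresAt > now()) return hit.value;

    // Missing or expired entry: fetch and remember the result.
    const value = await fetcher(key);
    cache.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

A map like this lives only as long as the page session. It also stores whatever the fetcher resolves to, including `null`, so a caller may prefer to skip caching failed lookups.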

### 2. Update the Projects Component

```tsx
const Projects = () => {
  const { lang } = useLanguage();
  const [projects, setProjects] = useState<Project[]>(initialProjects);
  const [isLoading, setIsLoading] = useState(true);

  // Fetch last commit dates when component mounts
  useEffect(() => {
    const fetchCommitDates = async () => {
      setIsLoading(true);

      const updatedProjects = await Promise.all(
        initialProjects.map(async (project) => {
          if (!project.link || project.link.trim() === '') {
            return project;
          }

          const repoInfo = extractRepoInfo(project.link);
          if (!repoInfo) {
            return project;
          }

          try {
            const lastCommitDate = await fetchLastCommitDateWithCache(repoInfo.owner, repoInfo.repo);
            return {
              ...project,
              lastCommitDate
            };
          } catch (error) {
            return project;
          }
        })
      );

      setProjects(updatedProjects);
      setIsLoading(false);
    };

    fetchCommitDates();
  }, []);

  return (
    // Render projects with date information
  );
};
```

### 3. Display the Date in the Project Card

```tsx
{
  project.lastCommitDate ? (
    <CardFooter className="mt-auto border-t border-[var(--matrix-darker)] pt-3">
      <div className="flex items-center text-xs text-[var(--matrix-mid-light)]">
        <Clock className="mr-1 h-3 w-3" />
        <span>
          {projectsMessages.lastUpdated[lang]}: {formatCommitDate(project.lastCommitDate, lang)}
        </span>
      </div>
    </CardFooter>
  ) : project.link && isLoading ? (
    <CardFooter className="mt-auto border-t border-[var(--matrix-darker)] pt-3">
      <div className="flex animate-pulse items-center text-xs text-[var(--matrix-mid-light)]">
        <Clock className="mr-1 h-3 w-3" />
        <span>{projectsMessages.lastUpdated[lang]}...</span>
      </div>
    </CardFooter>
  ) : null;
}
```

## Advantages and Disadvantages of the Solution

### Advantages

1. **Automation** - dates are updated without manual intervention
2. **Freshness** - the latest information is always presented
3. **Completeness** - date information is available for every public GitHub project
4. **Personalization** - dates are formatted according to the selected UI language
5. **Credibility** - information comes directly from GitHub, not hardcoded

### Disadvantages

1. **API Limits** - GitHub API has a limit of 60 requests per hour for unauthenticated requests
2. **API Availability Dependency** - if the API is unavailable, information will not be displayed
3. **Performance** - additional API requests may slightly slow down portfolio loading
4. **Public Repositories Only** - unauthenticated requests cannot read private repositories
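
The first limitation can at least be made visible at runtime: GitHub's REST API reports the remaining quota in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers. The sketch below reads them. `HeaderReader`, `RateLimitInfo`, and `readRateLimit` are hypothetical names, and the interface is kept structural so that a `fetch` response's `headers` object fits it.

```typescript
// Anything with a Headers-style `get` method (for example `response.headers`).
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitInfo {
  remaining: number; // requests left in the current window
  resetAt: Date; // when the window resets
}

function readRateLimit(headers: HeaderReader): RateLimitInfo | null {
  const remainingRaw = headers.get('x-ratelimit-remaining');
  const resetRaw = headers.get('x-ratelimit-reset');
  if (remainingRaw === null || resetRaw === null) return null;

  const remaining = Number(remainingRaw);
  const resetEpochSeconds = Number(resetRaw);
  if (!Number.isFinite(remaining) || !Number.isFinite(resetEpochSeconds)) return null;

  // The reset header is expressed in UTC epoch seconds.
  return { remaining, resetAt: new Date(resetEpochSeconds * 1000) };
}
```

With this information the component could, for example, skip further requests once `remaining` reaches zero and keep showing cards without a date until `resetAt`.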

## Benefits for Portfolio Prestige

1. **Professional Image** - automatic dates show your commitment to updating projects
2. **Transparency** - visitors immediately see which projects are actively developed
3. **Technical Credibility** - implementing such a solution demonstrates your ability to integrate with APIs
4. **Attention to Detail** - caring about details like update dates demonstrates your thoroughness
5. **Consistency** - automatic dates are always current, creating a coherent and well-maintained portfolio image

Implementing automatic update dates is a small but significant element that makes your portfolio stand out with professionalism and attention to detail. This feature not only increases the informational value of your projects but also demonstrates your programming skills in practical application.

---


## Automatyczne Daty Aktualizacji w Portfolio Projektu

**URL:** https://portfolio.sdet.pl/articles/automatic-update-dates-is-project-portfolio
**Published:** 2025-04-08
**Language:** pl
Tags: github-api, portfolio, frontend

Korzyści dla prestiżu portfolio - automatyczna integracja z GitHub API dla ostatniej daty commita per karta projektu.

Prezentacja aktualności Twoich projektów to ważny element budowania profesjonalnego wizerunku online. Oto jak możesz zaimplementować automatyczne wyświetlanie dat ostatniej aktualizacji repozytorium w swoim portfolio React.

## Dlaczego warto pokazywać daty aktualizacji?

Wyświetlanie dat ostatnich aktualizacji projektów w portfolio przynosi kilka istotnych korzyści:

1. **Przejrzystość** - pokazujesz, że Twoje projekty są aktywnie rozwijane
2. **Wiarygodność** - odwiedzający mogą zobaczyć, że nie prezentujesz "martwych" projektów
3. **Profesjonalizm** - troska o szczegóły i automatyzacja stanowią o Twojej jakości jako dewelopera

## Jak to zaimplementować?

Podstawą rozwiązania jest GitHub API, które pozwala na pobieranie informacji o ostatnim commicie w repozytorium. Oto kluczowe kroki:

### 1. Utwórz serwis do komunikacji z GitHub API

```typescript
export function extractRepoInfo(url: string): { owner: string; repo: string } | null {
  if (!url) return null;

  try {
    const githubUrlRegex = /github\.com\/([^/]+)\/([^/]+)/;
    const match = url.match(githubUrlRegex);

    if (match && match.length >= 3) {
      return {
        owner: match[1],
        repo: match[2].split('#')[0].split('?')[0],
      };
    }
    return null;
  } catch (error) {
    console.error('Error extracting repo info:', error);
    return null;
  }
}

export async function fetchLastCommitDateWithCache(
  owner: string,
  repo: string,
): Promise<string | null> {
  // Implementacja pobierania daty z GitHub API z mechanizmem cache
  try {
    const response = await fetch(
      `https://api.github.com/repos/${owner}/${repo}/commits?per_page=1`,
    );

    if (!response.ok) {
      throw new Error(`GitHub API error: ${response.status}`);
    }

    const commits = await response.json();

    if (commits && commits.length > 0 && commits[0].commit?.committer?.date) {
      return commits[0].commit.committer.date;
    }

    return null;
  } catch (error) {
    console.error('Error fetching commit data:', error);
    return null;
  }
}

export function formatCommitDate(dateString: string, locale: string = 'en'): string {
  try {
    const date = new Date(dateString);
    return new Intl.DateTimeFormat(locale === 'pl' ? 'pl-PL' : 'en-US', {
      year: 'numeric',
      month: 'short',
      day: 'numeric',
    }).format(date);
  } catch (error) {
    return dateString;
  }
}
```

### 2. Zaktualizuj komponent projektów

```tsx
const Projects = () => {
  const { lang } = useLanguage();
  const [projects, setProjects] = useState<Project[]>(initialProjects);
  const [isLoading, setIsLoading] = useState(true);

  // Fetch last commit dates when component mounts
  useEffect(() => {
    const fetchCommitDates = async () => {
      setIsLoading(true);

      const updatedProjects = await Promise.all(
        initialProjects.map(async (project) => {
          if (!project.link || project.link.trim() === '') {
            return project;
          }

          const repoInfo = extractRepoInfo(project.link);
          if (!repoInfo) {
            return project;
          }

          try {
            const lastCommitDate = await fetchLastCommitDateWithCache(repoInfo.owner, repoInfo.repo);
            return {
              ...project,
              lastCommitDate
            };
          } catch (error) {
            return project;
          }
        })
      );

      setProjects(updatedProjects);
      setIsLoading(false);
    };

    fetchCommitDates();
  }, []);

  return (
    // Render projects with date information
  );
};
```

### 3. Wyświetl datę w karcie projektu

```tsx
{
  project.lastCommitDate ? (
    <CardFooter className="mt-auto border-t border-[var(--matrix-darker)] pt-3">
      <div className="flex items-center text-xs text-[var(--matrix-mid-light)]">
        <Clock className="mr-1 h-3 w-3" />
        <span>
          {projectsMessages.lastUpdated[lang]}: {formatCommitDate(project.lastCommitDate, lang)}
        </span>
      </div>
    </CardFooter>
  ) : project.link && isLoading ? (
    <CardFooter className="mt-auto border-t border-[var(--matrix-darker)] pt-3">
      <div className="flex animate-pulse items-center text-xs text-[var(--matrix-mid-light)]">
        <Clock className="mr-1 h-3 w-3" />
        <span>{projectsMessages.lastUpdated[lang]}...</span>
      </div>
    </CardFooter>
  ) : null
}
```

## Zalety i wady rozwiązania

### Zalety

1. **Automatyzacja** - daty aktualizowane są bez ręcznej interwencji
2. **Aktualność** - zawsze prezentowane są najnowsze dane
3. **Kompletność** - informacja o dacie dostępna jest dla każdego projektu z GitHub
4. **Personalizacja** - daty formatowane są zgodnie z wybranym językiem UI
5. **Wiarygodność** - informacje pochodzą bezpośrednio z GitHub, nie są "hardcodowane"

### Wady

1. **Limity API** - GitHub API ma limit 60 zapytań na godzinę dla nieautoryzowanych żądań
2. **Zależność od dostępności API** - jeśli API jest niedostępne, informacje nie będą wyświetlane
3. **Wydajność** - dodatkowe zapytania do API mogą nieznacznie spowolnić ładowanie portfolio
4. **Widoczność prywatnych repozytoriów** - działa tylko dla publicznych repozytoriów

## Korzyści dla prestiżu portfolio

1. **Profesjonalny wizerunek** - automatyczne daty pokazują Twoje zaangażowanie w aktualizację projektów
2. **Transparentność** - goście od razu widzą, które projekty są aktywnie rozwijane
3. **Wiarygodność techniczna** - implementacja takiego rozwiązania demonstruje umiejętność integracji z API
4. **Szczegółowość** - dbałość o detale jak daty aktualizacji świadczy o Twojej dokładności
5. **Spójność** - automatyczne daty są zawsze aktualne, co tworzy spójny i zadbany obraz portfolio

Zaimplementowanie automatycznych dat aktualizacji to drobny, ale znaczący element, który sprawia, że Twoje portfolio wyróżnia się profesjonalizmem i dbałością o szczegóły. Ta funkcja nie tylko zwiększa wartość informacyjną projektów, ale też demonstruje Twoje umiejętności programistyczne w praktycznym zastosowaniu.

---


## Facade and Delegation Pattern

**URL:** https://portfolio.sdet.it/articles/facade-pattern-and-delegation
**Published:** 2025-04-07
**Language:** en
Tags: patterns, refactoring, playwright, cdat

Refactoring large test files - organize chaos with Facade + Delegation patterns.

## Facade Pattern and Delegation as a Way to Organize Chaos

When projects grow, test classes often transform into monolithic behemoths filled with repetitive code. In this article, we'll discuss a strategy for refactoring large test action files using the facade pattern and delegation to maintain order and scalability.

## State Before Refactoring: Anatomy of Chaos

The starting point is a typical example of the problem: the **_SaleActions_** class in the **_old-actions.ts_** file runs to over 1,500 lines of code. It's a classic "God Object" containing:

- Interface navigation methods
- Form filling methods
- Price and discount calculations
- Approval and verification actions
- Shopping cart manipulations
- Payment handling

The problems with this approach are obvious:

1. **Maintenance difficulty** - a single change requires understanding the entire class
2. **Conflict risk** - many developers working on the same file
3. **Debugging difficulty** - problems are hard to locate
4. **SOLID principles violation** - especially the Single Responsibility Principle
5. **Entry barrier** - new team members feel overwhelmed

## Architecture After Refactoring: Facade and Delegation

The refactoring introduces the facade pattern, preserving the existing interface of the **_SaleActions_** class while delegating specific operations to specialized classes:

```typescript
export class SaleActions {
  // Modules from the new structure
  private basketTableManager: BasketTableManager;
  private processActions: ProcessActions.SaleProcessActions;
  private paymentActions: ProcessActions.PaymentActions;
  private contractorActions: ProcessActions.ContractorActions;
  private itemPriceActions: ItemActions.ItemPriceActions;
  private itemDiscountActions: ItemActions.ItemDiscountActions;
  private itemManagementActions: ItemActions.ItemManagementActions;
  private loaderActions: CommonActions.LoaderActions;
  private uiActions: CommonActions.UIActions;

  constructor(private page: Page) {
    // Initialization of all modules...
  }

  // Delegations to appropriate modules...
  async collectBasketData(): Promise<BasketItem[]> {
    return this.basketTableManager.collectBasketData();
  }

  async createNewSale(): Promise<void> {
    return this.processActions.createNewSale();
  }

  // Other delegations...
}
```

The code above shows the main **_SaleActions_** class, which now acts as a facade. Instead of implementing all methods, it delegates calls to specialized classes:

1. **BasketTableManager** - shopping cart table management
2. **ProcessActions** - sales and payment processes
3. **ItemActions** - item, price, and discount management
4. **CommonActions** - common UI and loading operations

## Advantages of the New Approach

### 1. Easier Maintenance and Development

Each class now has a clearly defined responsibility, making it easier to find and modify code. New functionality can be added to the relevant module without touching the rest of the system.

### 2. Better Code Organization

The modular structure makes it easier to understand the system:

```
SaleActions/
├── common/
│   ├── loader-actions.ts
│   └── ui-actions.ts
├── item/
│   ├── item-discount-actions.ts
│   ├── item-management-actions.ts
│   └── item-price-actions.ts
├── process/
│   ├── contractor-actions.ts
│   ├── payment-actions.ts
│   └── sale-process-actions.ts
└── table/
    ├── basket-action-executor.ts
    ├── basket-data-extractor.ts
    ├── basket-table-manager.ts
    └── basket-table-navigator.ts
```

### 3. Easier Testing

Smaller, specialized classes are easier to unit test. We can now test **_BasketDataExtractor_** functionality independently of the rest of the system.
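
As a rough illustration of that independence, the sketch below assumes the extractor receives raw cell texts through an injected reader function instead of querying the page directly. The `RowReader` type, the `BasketItem` fields, and the parsing rules are assumptions made for this example; the article does not show the real **_BasketDataExtractor_**.

```typescript
// Hypothetical shape of a basket row; the real BasketItem type is not shown in the article.
interface BasketItem {
  name: string;
  quantity: number;
  unitPrice: number;
}

// Assumed seam: the extractor gets cell texts from a reader instead of touching the page.
type RowReader = () => string[][];

class BasketDataExtractor {
  constructor(private readonly readRows: RowReader) {}

  extract(): BasketItem[] {
    return this.readRows().map(([name, quantity, unitPrice]) => ({
      name: name.trim(),
      quantity: Number.parseInt(quantity, 10),
      // Accept both "39,90" and "39.90" as price formats
      unitPrice: Number.parseFloat(unitPrice.replace(',', '.')),
    }));
  }
}

// A stub reader replaces the browser, so the parsing logic runs as a plain unit test.
const extractor = new BasketDataExtractor(() => [
  [' Coffee beans ', '2', '39,90'],
  ['Filter papers', '1', '8.50'],
]);

const basketItems = extractor.extract();
```

With a seam like this, the browser-facing code shrinks to the reader function, and the parsing rules can be covered by fast tests that never open a page.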

### 4. SOLID Compliance

- **Single Responsibility** - each class has one responsibility
- **Open/Closed** - extending functionality without modifying existing code
- **Liskov Substitution** - interfaces allow implementation substitutions
- **Interface Segregation** - small, dedicated interfaces
- **Dependency Inversion** - dependencies through abstractions

### 5. Easier Onboarding of New Team Members

New developers can focus on understanding one module instead of the entire system.

## Potential Disadvantages and Challenges

### 1. Structure Complexity

Introducing many classes and interfaces increases the structural complexity of the project. This may make it harder to understand data flow for people unfamiliar with the pattern.

### 2. Refactoring Costs

Transforming an existing system requires time and attention. There's a risk of introducing errors during code migration.

### 3. Potential Redundancy

Introducing delegation can lead to excessive intermediate layers:

```typescript
// Example of potential redundancy
async createNewSale(): Promise<void> {
  return this.processActions.createNewSale();
}
```

### 4. State Management

Distributed classes can make it difficult to manage shared state. It may be necessary to introduce a mechanism that keeps this state in sync, such as a single context object shared by the modules.
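
One possible way to keep shared state in one place is a small context object created by the facade and passed to each module. This is a sketch under assumptions: `SaleContext`, its fields, and the method signatures are invented for illustration and simplified to synchronous calls.

```typescript
// Hypothetical shared state; the field names are illustrative only.
interface SaleContext {
  currentSaleId: string | null;
  appliedDiscountPercent: number;
}

class SaleProcessActions {
  constructor(private readonly context: SaleContext) {}

  createNewSale(saleId: string): void {
    this.context.currentSaleId = saleId;
    this.context.appliedDiscountPercent = 0; // a new sale starts without discounts
  }
}

class ItemDiscountActions {
  constructor(private readonly context: SaleContext) {}

  applyDiscount(percent: number): void {
    if (this.context.currentSaleId === null) {
      throw new Error('No active sale to discount');
    }
    this.context.appliedDiscountPercent = percent;
  }
}

// The facade would create one context and hand the same instance to every module.
const context: SaleContext = { currentSaleId: null, appliedDiscountPercent: 0 };
const processActions = new SaleProcessActions(context);
const discountActions = new ItemDiscountActions(context);

processActions.createNewSale('SALE-1');
discountActions.applyDiscount(15);
```

Both modules see the same sale because they hold a reference to one object; no module keeps a private copy that could drift out of date.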

## Strategic Approach to Refactoring

### 1. Analyzing Existing Code

Before you start, analyze the existing code, identifying natural functionality clusters. In our case, we extracted table operations, item management, and sales processes.

### 2. Gradual Implementation

Instead of refactoring everything at once, it's better to work iteratively:

1. Extract one functionality group (e.g., table operations)
2. Build a new class and move code to it
3. Apply delegations in the main class
4. Run tests to make sure everything works correctly
5. Move to the next functionality group
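
One intermediate state of this loop might look like the sketch below: table operations already live in their own class, while a not-yet-migrated method stays inline in the facade. The `calculateTotal` method and the in-memory item list are hypothetical, chosen only to keep the example runnable without a browser.

```typescript
interface BasketItem {
  name: string;
  price: number;
}

// Steps 1-2: the first extracted group lives in its own class.
class BasketTableManager {
  constructor(private readonly items: BasketItem[]) {}

  collectBasketData(): BasketItem[] {
    return [...this.items];
  }
}

class SaleActions {
  private readonly basketTableManager: BasketTableManager;

  constructor(items: BasketItem[]) {
    this.basketTableManager = new BasketTableManager(items);
  }

  // Step 3: already migrated, so the facade only delegates.
  collectBasketData(): BasketItem[] {
    return this.basketTableManager.collectBasketData();
  }

  // Not migrated yet: stays inline until its own group is extracted.
  calculateTotal(): number {
    return this.collectBasketData().reduce((sum, item) => sum + item.price, 0);
  }
}

const sale = new SaleActions([
  { name: 'A', price: 10 },
  { name: 'B', price: 5 },
]);
const total = sale.calculateTotal();
```

Callers keep using `SaleActions` throughout, so the test suite can run after every extraction (step 4) before the next group is moved.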

### 3. Building on Interfaces

Use interfaces to define contracts between components:

```typescript
export interface TableActionExecutor {
  openRowMenu(rowIndex: string | number): Promise<void>;
  executeAction(actionId: string): Promise<void>;
  // other methods...
}

export class BasketActionExecutor implements TableActionExecutor {
  // implementation...
}
```
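
Because callers depend only on the interface, an implementation can be swapped without touching them. The sketch below repeats the two methods shown above and adds a hypothetical recording double and a `removeRow` helper; neither exists in the original code, and the `'remove'` action id is made up.

```typescript
interface TableActionExecutor {
  openRowMenu(rowIndex: string | number): Promise<void>;
  executeAction(actionId: string): Promise<void>;
}

// Test double: records calls instead of driving a browser.
class RecordingActionExecutor implements TableActionExecutor {
  readonly calls: string[] = [];

  async openRowMenu(rowIndex: string | number): Promise<void> {
    this.calls.push(`openRowMenu(${rowIndex})`);
  }

  async executeAction(actionId: string): Promise<void> {
    this.calls.push(`executeAction(${actionId})`);
  }
}

// Depends only on the interface, so any executor can be passed in.
async function removeRow(executor: TableActionExecutor, rowIndex: number): Promise<void> {
  await executor.openRowMenu(rowIndex);
  await executor.executeAction('remove'); // hypothetical action id
}

const recorder = new RecordingActionExecutor();
const removal = removeRow(recorder, 2);
```

Once `removal` resolves, `recorder.calls` holds the ordered list of operations, which is enough to assert on the flow without opening a page.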

### 4. Maintaining Compatibility

It's essential to maintain the existing public interface of the main class so that tests don't require modification:

```typescript
// Test code - the call site is identical before and after refactoring
await saleActions.clickChangePriceButton();

// Inside SaleActions after refactoring - same signature, delegated implementation
async clickChangePriceButton(): Promise<void> {
  return this.itemPriceActions.clickChangePriceButton();
}
```

## DRY vs YAGNI in the Context of Refactoring

During refactoring, we often encounter tension between DRY (Don't Repeat Yourself) and YAGNI (You Aren't Gonna Need It) principles:

**DRY**: Eliminating code duplication leads to creating abstractions, which can be seen in our **_BasketTableNavigator_** or **_BasketDataExtractor_** classes.

**YAGNI**: Excessive abstraction can lead to unnecessary complexity. Sometimes a simple method in one class is better than a complicated class hierarchy.

A reasonable approach is:

1. Eliminating obvious duplications
2. Creating abstractions only where they bring clear benefits
3. Delaying the creation of advanced abstractions until patterns become clear

## Practical Conclusions

1. **Start with a clear plan** - a refactoring map will help maintain the direction of changes

2. **Test continuously** - each change should be verified by tests

3. **Document changes** - well-written comments and documentation will make it easier to understand the new structure

4. **Consider using tools** - automated refactorings in the IDE (rename, extract, move) can make the changes safer

5. **Communicate changes to the team** - everyone should understand the new architecture

## Summary

The facade pattern and delegation are powerful tools in refactoring large, monolithic test classes. While this process requires careful planning and execution, the benefits of easier maintenance, testing, and code extension are worth the effort.

Remember, however, that refactoring is not an end in itself, but a means to create better, more maintainable code. The key is finding a balance between ideal abstraction and practical usefulness, always keeping in mind the needs of the team and project.

---


## Wzorzec Fasady i Delegacji

**URL:** https://portfolio.sdet.pl/articles/facade-pattern-and-delegation
**Published:** 2025-04-07
**Language:** pl
Tags: patterns, refactoring, playwright, cdat

Refaktoryzacja dużych plików testowych - sposób na uporządkowanie chaosu z wzorcami Fasady i Delegacji.

## Wzorzec Fasady i Delegacje jako sposób na uporządkowanie chaosu

Gdy projekty rozrastają się, klasy testowe często przekształcają się w monolityczne kolosy pełne powtarzającego się kodu. W tym artykule omówimy strategię refaktoryzacji dużych plików z akcjami testowymi, wykorzystując wzorzec fasady i delegacje, aby utrzymać porządek i skalowalność.

## Stan przed refaktoryzacją: anatomia chaosu

Analizując dostarczony kod, widzimy typowy przykład problemu - klasa **_SaleActions_** w pliku **_old-actions.ts_** liczy ponad 1500 linii kodu. Jest to klasyczny "God Object" zawierający:

- Metody nawigacji interfejsu
- Metody wypełniania formularzy
- Obliczenia cenowe i rabatowe
- Akcje zatwierdzania i weryfikacji
- Manipulacje koszykiem zakupowym
- Obsługę płatności

Problemy takiego podejścia są oczywiste:

1. **Trudność w utrzymaniu** - pojedyncza zmiana wymaga zrozumienia całej klasy
2. **Ryzyko konfliktów** - wielu deweloperów pracujących nad tym samym plikiem
3. **Trudność w debugowaniu** - problemy są trudne do zlokalizowania
4. **Naruszenie zasad SOLID** - szczególnie Single Responsibility Principle
5. **Bariera wejścia** - nowi członkowie zespołu czują się przytłoczeni

## Architektura po refaktoryzacji: fasada i delegacje

Refaktoryzacja wprowadza wzorzec fasady, zachowując istniejący interfejs klasy **_SaleActions_**, ale delegując konkretne operacje do wyspecjalizowanych klas:

```typescript
export class SaleActions {
  // Moduły z nowej struktury
  private basketTableManager: BasketTableManager;
  private processActions: ProcessActions.SaleProcessActions;
  private paymentActions: ProcessActions.PaymentActions;
  private contractorActions: ProcessActions.ContractorActions;
  private itemPriceActions: ItemActions.ItemPriceActions;
  private itemDiscountActions: ItemActions.ItemDiscountActions;
  private itemManagementActions: ItemActions.ItemManagementActions;
  private loaderActions: CommonActions.LoaderActions;
  private uiActions: CommonActions.UIActions;

  constructor(private page: Page) {
    // Inicjalizacja wszystkich modułów...
  }

  // Delegacje do odpowiednich modułów...
  async collectBasketData(): Promise<BasketItem[]> {
    return this.basketTableManager.collectBasketData();
  }

  async createNewSale(): Promise<void> {
    return this.processActions.createNewSale();
  }

  // Inne delegacje...
}
```

Powyższy kod pokazuje główną klasę **_SaleActions_**, która działa teraz jako fasada. Zamiast implementować wszystkie metody, deleguje wywołania do wyspecjalizowanych klas:

1. **BasketTableManager** - zarządzanie tabelą koszyka
2. **ProcessActions** - procesy sprzedaży i płatności
3. **ItemActions** - zarządzanie elementami, cenami i rabatami
4. **CommonActions** - wspólne operacje UI i ładowania

## Zalety nowego podejścia

### 1. Łatwiejsze utrzymanie i rozwój

Każda klasa ma teraz jasno określoną odpowiedzialność, co ułatwia znajdowanie i modyfikowanie kodu. Nowe funkcjonalności można dodawać w odpowiednich modułach bez dotykania całego systemu.

### 2. Lepsza organizacja kodu

Struktura modułowa pozwala łatwiej zrozumieć system:

```
SaleActions/
├── common/
│   ├── loader-actions.ts
│   └── ui-actions.ts
├── item/
│   ├── item-discount-actions.ts
│   ├── item-management-actions.ts
│   └── item-price-actions.ts
├── process/
│   ├── contractor-actions.ts
│   ├── payment-actions.ts
│   └── sale-process-actions.ts
└── table/
    ├── basket-action-executor.ts
    ├── basket-data-extractor.ts
    ├── basket-table-manager.ts
    └── basket-table-navigator.ts
```

### 3. Łatwiejsze testowanie

Mniejsze, wyspecjalizowane klasy są łatwiejsze do testowania jednostkowego. Możemy teraz testować działanie **_BasketDataExtractor_** niezależnie od reszty systemu.

### 4. Zgodność z SOLID

- **Single Responsibility** - każda klasa ma jedną odpowiedzialność
- **Open/Closed** - rozszerzanie funkcjonalności bez modyfikacji istniejącego kodu
- **Liskov Substitution** - interfejsy umożliwiają podmiany implementacji
- **Interface Segregation** - małe, dedykowane interfejsy
- **Dependency Inversion** - zależności poprzez abstrakcje

### 5. Ułatwione wdrażanie nowych członków zespołu

Nowi deweloperzy mogą skupić się na zrozumieniu jednego modułu, zamiast całego systemu.

## Potencjalne wady i wyzwania

### 1. Złożoność struktury

Wprowadzenie wielu klas i interfejsów zwiększa złożoność strukturalną projektu. Może to utrudniać zrozumienie przepływu danych dla osób niezaznajomionych z wzorcem.

### 2. Koszty refaktoryzacji

Przekształcenie istniejącego systemu wymaga czasu i uwagi. Istnieje ryzyko wprowadzenia błędów podczas przenoszenia kodu.

### 3. Potencjalna nadmiarowość

Wprowadzenie delegacji może prowadzić do nadmiernej warstwy pośredniej:

```typescript
// Przykład potencjalnej nadmiarowości
async createNewSale(): Promise<void> {
  return this.processActions.createNewSale();
}
```

### 4. Zarządzanie stanem

Rozproszone klasy mogą utrudniać zarządzanie współdzielonym stanem. Konieczne może być wprowadzenie mechanizmów synchronizacji.

## Strategiczne podejście do refaktoryzacji

### 1. Analiza istniejącego kodu

Zanim zaczniesz, przeanalizuj istniejący kod, identyfikując naturalne klastry funkcjonalności. W naszym przypadku wyodrębniliśmy operacje na tabeli, zarządzanie elementami i procesy sprzedaży.

### 2. Stopniowa implementacja

Zamiast refaktoryzować wszystko naraz, lepiej pracować iteracyjnie:

1. Wyodrębnij jedną grupę funkcjonalności (np. operacje na tabeli)
2. Zbuduj nową klasę i przenieś do niej kod
3. Zastosuj delegacje w głównej klasie
4. Uruchom testy, aby upewnić się, że wszystko działa poprawnie
5. Przejdź do kolejnej grupy funkcjonalności

### 3. Budowanie na interfejsach

Wykorzystaj interfejsy do określenia kontraktów między komponentami:

```typescript
export interface TableActionExecutor {
  openRowMenu(rowIndex: string | number): Promise<void>;
  executeAction(actionId: string): Promise<void>;
  // inne metody...
}

export class BasketActionExecutor implements TableActionExecutor {
  // implementacja...
}
```

### 4. Zachowanie kompatybilności

Zasadnicze jest zachowanie istniejącego interfejsu publicznego głównej klasy, aby testy nie wymagały modyfikacji:

```typescript
// Przed refaktoryzacją
await saleActions.clickChangePriceButton();

// Po refaktoryzacji - taki sam interfejs, inna implementacja
async clickChangePriceButton(): Promise<void> {
  return this.itemPriceActions.clickChangePriceButton();
}
```

## DRY vs YAGNI w kontekście refaktoryzacji

Podczas refaktoryzacji często napotykamy napięcie między zasadami DRY (Don't Repeat Yourself) i YAGNI (You Aren't Gonna Need It):

**DRY**: Eliminacja duplikacji kodu prowadzi do tworzenia abstrakcji, co widać w naszych klasach **_BasketTableNavigator_** czy **_BasketDataExtractor_**.

**YAGNI**: Nadmierna abstrakcja może prowadzić do niepotrzebnej złożoności. Czasem prosta metoda w jednej klasie jest lepsza niż skomplikowana hierarchia klas.

Rozsądne podejście to:

1. Eliminacja oczywistych duplikacji
2. Tworzenie abstrakcji tylko tam, gdzie przynoszą wyraźne korzyści
3. Opóźnienie tworzenia zaawansowanych abstrakcji do momentu, gdy wzorce staną się jasne

## Wnioski praktyczne

1. **Zacznij od jasnego planu** - mapa refaktoryzacji pomoże utrzymać kierunek zmian

2. **Testuj na bieżąco** - każda zmiana powinna być weryfikowana przez testy

3. **Dokumentuj zmiany** - dobrze napisane komentarze i dokumentacja ułatwią zrozumienie nowej struktury

4. **Rozważ użycie narzędzi** - automatyzacja może pomóc w bezpiecznej refaktoryzacji

5. **Komunikuj zmiany zespołowi** - wszyscy powinni rozumieć nową architekturę

## Podsumowanie

Wzorzec fasady i delegacje stanowią potężne narzędzie w refaktoryzacji dużych, monolitycznych klas testowych. Choć proces ten wymaga starannego planowania i wykonania, korzyści w postaci łatwiejszego utrzymania, testowania i rozszerzania kodu są warte wysiłku.

Pamiętajmy jednak, że refaktoryzacja nie jest celem samym w sobie, ale środkiem do tworzenia lepszego, bardziej utrzymywalnego kodu. Kluczem jest znalezienie równowagi między idealną abstrakcją a praktyczną użytecznością, mając zawsze na uwadze potrzeby zespołu i projektu.

---


## Testing Library vs Playwright

**URL:** https://portfolio.sdet.it/articles/react-testing-library-vs-playwright
**Published:** 2025-03-31
**Language:** en
Tags: react, testing, playwright, testing-library

Testing React components with Testing Library vs Playwright - which one to choose and when?

## Introduction

Testing frontend applications, especially those built with React, has become a key part of the software development process. Two popular tools - React Testing Library and Playwright - take different approaches to verifying user interfaces. This article compares both in depth, covering their strengths, their weaknesses, and the scenarios where each performs best.

## Table of Contents:

1. [Characteristics of the tools](#characteristics-of-the-tools)
2. [Testing philosophy](#testing-philosophy)
3. [Environment configuration](#environment-configuration)
4. [Basic test cases](#basic-test-cases)
5. [User interaction testing](#user-interaction-testing)
6. [Asynchronous testing](#asynchronous-testing)
7. [Mocking and test isolation](#mocking-and-test-isolation)
8. [Debugging tests](#debugging-tests)
9. [Performance and scalability](#performance-and-scalability)
10. [CI/CD integration](#cicd-integration)
11. [Comparison based on real scenarios](#comparison-based-on-real-scenarios)
12. [Summary and recommendations](#summary-and-recommendations)

## Characteristics of the tools

### React Testing Library

React Testing Library (RTL) is part of a larger family of Testing Library libraries, designed to test UI components in a way that reflects real user experiences. RTL emphasizes testing what the user sees and interacts with, rather than focusing on the internal implementation of components.

```javascript
// Basic test example with React Testing Library
import { render, screen, fireEvent } from '@testing-library/react';
import Counter from './Counter';

test('increment counter after click', () => {
  render(<Counter />);

  // Check the initial state and find the button by its accessible role
  expect(screen.getByText(/counter: 0/i)).toBeInTheDocument();
  const incrementButton = screen.getByRole('button', { name: /increment/i });

  // Simulate a click
  fireEvent.click(incrementButton);

  // Check if the state has changed
  expect(screen.getByText(/counter: 1/i)).toBeInTheDocument();
});
```

### Playwright

Playwright is a browser automation framework that enables end-to-end (E2E) testing of web applications across multiple browsers (Chromium, Firefox, WebKit). Although Playwright is primarily an E2E testing tool, it can also be used to test React components using `@playwright/experimental-ct-react`.

```javascript
// Basic test example with Playwright for a component
import { test, expect } from '@playwright/experimental-ct-react';
import Counter from './Counter';

test('increment counter after click', async ({ mount }) => {
  // Render the component
  const component = await mount(<Counter />);

  // Check the initial state
  await expect(component.getByText(/counter: 0/i)).toBeVisible();

  // Click the button
  await component.getByRole('button', { name: /increment/i }).click();

  // Check if the state has changed
  await expect(component.getByText(/counter: 1/i)).toBeVisible();
});
```

## Testing philosophy

| Aspect        | React Testing Library                                    | Playwright                                                      |
| ------------- | -------------------------------------------------------- | --------------------------------------------------------------- |
| Testing level | Mainly unit and integration tests                        | Mainly E2E tests, with component testing capabilities           |
| Approach      | "Testing Library Way": test behavior, not implementation | "Browser First": test like a real browser                       |
| Selectors     | Prefers accessible attributes (roles, labels, text)      | Recommends role, label, and text locators; also CSS and XPath   |
| Isolation     | Tests components in isolation or shallow integrations    | Tests entire applications or components in a browser context    |
| Focus         | On behavior accessible to the user                       | On full functionality available in the browser                  |

## Environment configuration

### React Testing Library Configuration

```javascript
// package.json
{
  "dependencies": {
    "react": "^18.2.0",
    "react-dom": "^18.2.0"
  },
  "devDependencies": {
    "@testing-library/jest-dom": "^6.1.4",
    "@testing-library/react": "^14.0.0",
    "@testing-library/user-event": "^14.5.1",
    "jest": "^29.7.0",
    "jest-environment-jsdom": "^29.7.0"
  }
}
```

```javascript
// jest.config.js
module.exports = {
  testEnvironment: 'jsdom',
  setupFilesAfterEnv: ['./jest.setup.js'],
  transform: {
    '^.+\\.(js|jsx|ts|tsx)$': 'babel-jest',
  },
};
```

```javascript
// jest.setup.js
import '@testing-library/jest-dom';
```

### Playwright Configuration for Component Testing

```javascript
// package.json
{
  "dependencies": {
    "react": "^18.2.0",
    "react-dom": "^18.2.0"
  },
  "devDependencies": {
    "@playwright/experimental-ct-react": "^1.40.0",
    "@playwright/test": "^1.40.0"
  }
}
```

```typescript
// playwright-ct.config.ts
import { defineConfig } from '@playwright/experimental-ct-react';
import { resolve } from 'path';

export default defineConfig({
  testDir: './tests',
  use: {
    ctPort: 3100,
    ctViteConfig: {
      resolve: {
        alias: {
          '@': resolve(__dirname, './src'),
        },
      },
    },
  },
  projects: [
    {
      name: 'chromium',
      use: { browserName: 'chromium' },
    },
    {
      name: 'firefox',
      use: { browserName: 'firefox' },
    },
    {
      name: 'webkit',
      use: { browserName: 'webkit' },
    },
  ],
});
```

```html
<!-- playwright/index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Testing with Playwright</title>
</head>
<body>
  <div id="root"></div>
  <script type="module" src="./index.tsx"></script>
</body>
</html>
```

## Basic test cases

### React Testing Library

```javascript
// Testing component rendering
import { render, screen } from '@testing-library/react';
import UserProfile from './UserProfile';

test('displays user data correctly', () => {
  const user = {
    name: 'John Smith',
    email: 'john@example.com',
    role: 'Developer',
  };

  render(<UserProfile user={user} />);

  expect(screen.getByText('John Smith')).toBeInTheDocument();
  expect(screen.getByText('john@example.com')).toBeInTheDocument();
  expect(screen.getByText('Developer')).toBeInTheDocument();
});

// Testing conditional rendering
test('displays a message when no user data is available', () => {
  render(<UserProfile />);

  expect(screen.getByText(/no user data available/i)).toBeInTheDocument();
});
```

### Playwright

```javascript
// Testing component rendering
import { test, expect } from '@playwright/experimental-ct-react';
import UserProfile from './UserProfile';

test('displays user data correctly', async ({ mount }) => {
  const user = {
    name: 'John Smith',
    email: 'john@example.com',
    role: 'Developer',
  };

  const component = await mount(<UserProfile user={user} />);

  await expect(component.getByText('John Smith')).toBeVisible();
  await expect(component.getByText('john@example.com')).toBeVisible();
  await expect(component.getByText('Developer')).toBeVisible();
});

// Testing conditional rendering
test('displays a message when no user data is available', async ({ mount }) => {
  const component = await mount(<UserProfile />);

  await expect(component.getByText(/no user data available/i)).toBeVisible();
});
```

## User interaction testing

### React Testing Library with user-event

```javascript
// Testing a login form
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import LoginForm from './LoginForm';

test('calls onSubmit with login data after button click', async () => {
  const mockSubmit = jest.fn();
  render(<LoginForm onSubmit={mockSubmit} />);

  // Find form fields
  const emailInput = screen.getByLabelText(/email/i);
  const passwordInput = screen.getByLabelText(/password/i);
  const submitButton = screen.getByRole('button', { name: /log in/i });

  // Type data
  await userEvent.type(emailInput, 'test@example.com');
  await userEvent.type(passwordInput, 'password123');

  // Click the button
  await userEvent.click(submitButton);

  // Check if the function was called with the right arguments
  expect(mockSubmit).toHaveBeenCalledWith({
    email: 'test@example.com',
    password: 'password123',
  });
});
```

### Playwright

```javascript
// Testing a login form
import { test, expect } from '@playwright/experimental-ct-react';
import LoginForm from './LoginForm';

test('calls onSubmit with login data after button click', async ({ mount }) => {
  // Playwright Test has no spy/mock helpers, so record the callback argument manually
  let submittedData = null;

  const component = await mount(
    <LoginForm
      onSubmit={(data) => {
        submittedData = data;
      }}
    />,
  );

  // Type data
  await component.getByLabel(/email/i).fill('test@example.com');
  await component.getByLabel(/password/i).fill('password123');

  // Click the button
  await component.getByRole('button', { name: /log in/i }).click();

  // Check if the callback received the right arguments
  await expect.poll(() => submittedData).toEqual({
    email: 'test@example.com',
    password: 'password123',
  });
});
```

## Asynchronous testing

### React Testing Library

```javascript
// Testing data loading
import { render, screen, waitFor } from '@testing-library/react';
import UserList from './UserList';
import { fetchUsers } from './api';

// Mocking the API module
jest.mock('./api');

test('displays user list after loading', async () => {
  // Prepare the mock
  fetchUsers.mockResolvedValueOnce([
    { id: 1, name: 'John Smith' },
    { id: 2, name: 'Jane Doe' },
  ]);

  render(<UserList />);

  // Check if the loader is displayed
  expect(screen.getByText(/loading/i)).toBeInTheDocument();

  // Wait for the data
  await waitFor(() => {
    expect(screen.getByText('John Smith')).toBeInTheDocument();
    expect(screen.getByText('Jane Doe')).toBeInTheDocument();
    expect(screen.queryByText(/loading/i)).not.toBeInTheDocument();
  });
});
```
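
To make the retry behaviour behind `waitFor` more concrete, here is a simplified polling helper in TypeScript. It is a sketch of the general idea only, not Testing Library's implementation (which, among other things, also reacts to DOM mutations); `waitForAssertion` and its option names are hypothetical.

```typescript
// Simplified polling helper: re-runs the assertion until it passes or time runs out.
async function waitForAssertion(
  assertion: () => void,
  { timeoutMs = 1000, intervalMs = 50 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      assertion();
      return; // the assertion passed
    } catch (error) {
      if (Date.now() >= deadline) throw error; // give up and report the last failure
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

// Usage: the flag flips after 120 ms, so the first attempts fail and are retried.
let loaded = false;
setTimeout(() => {
  loaded = true;
}, 120);

const waited = waitForAssertion(() => {
  if (!loaded) throw new Error('still loading');
});
```

The test body stays declarative: it states what must eventually be true and leaves the timing to the helper.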

### Playwright

```javascript
// Testing data loading
import { test, expect } from '@playwright/experimental-ct-react';
import { MockedApiProvider } from './test-utils';
import UserList from './UserList';

test('displays user list after loading', async ({ mount }) => {
  // Data for the mock
  const mockUsers = [
    { id: 1, name: 'John Smith' },
    { id: 2, name: 'Jane Doe' },
  ];

  // Render the component with a mock provider
  const component = await mount(
    <MockedApiProvider
      mocks={{
        fetchUsers: async () => mockUsers,
      }}
    >
      <UserList />
    </MockedApiProvider>,
  );

  // Check if the loader is displayed
  await expect(component.getByText(/loading/i)).toBeVisible();

  // Wait for the data
  await expect(component.getByText('John Smith')).toBeVisible();
  await expect(component.getByText('Jane Doe')).toBeVisible();

  // Check if the loader disappeared
  await expect(component.getByText(/loading/i)).not.toBeVisible();
});
```

## Mocking and test isolation

### React Testing Library

```javascript
// Mocking React context
import { render, screen, fireEvent } from '@testing-library/react';
import { ThemeContext } from './ThemeContext';
import ThemeSwitcher from './ThemeSwitcher';

test('toggles theme', () => {
  const mockSetTheme = jest.fn();

  render(
    <ThemeContext.Provider value={{ theme: 'light', setTheme: mockSetTheme }}>
      <ThemeSwitcher />
    </ThemeContext.Provider>,
  );

  // Click the switch button
  fireEvent.click(screen.getByRole('button', { name: /change theme/i }));

  // Check if the function was called with the right argument
  expect(mockSetTheme).toHaveBeenCalledWith('dark');
});

// Mocking modules (a separate test file, hence the repeated imports)
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import WeatherWidget from './WeatherWidget';
import { getWeather } from './weatherService';

// Mocking the weather service module
jest.mock('./weatherService');

test('displays weather information after searching for a city', async () => {
  // Set up the mock
  getWeather.mockResolvedValueOnce({
    temperature: 21,
    conditions: 'Sunny',
    humidity: 45,
  });

  render(<WeatherWidget />);

  // Enter the city name
  await userEvent.type(screen.getByLabelText(/city/i), 'London');

  // Click the search button
  await userEvent.click(screen.getByRole('button', { name: /check/i }));

  // Wait for results
  expect(await screen.findByText(/temperature: 21°C/i)).toBeInTheDocument();
  expect(screen.getByText(/conditions: sunny/i)).toBeInTheDocument();
  expect(screen.getByText(/humidity: 45%/i)).toBeInTheDocument();

  // Check if the service was called with the right argument
  expect(getWeather).toHaveBeenCalledWith('London');
});
```

### Playwright

```javascript
// Mocking React context
import { test, expect } from '@playwright/experimental-ct-react';
import { ThemeContext } from './ThemeContext';
import ThemeSwitcher from './ThemeSwitcher';

test('toggles theme', async ({ mount }) => {
  // Playwright Test has no built-in mock functions (there is no `test.fn()`),
  // so record the calls in a plain array instead
  const setThemeCalls = [];
  const mockContextValue = {
    theme: 'light',
    setTheme: (theme) => {
      setThemeCalls.push(theme);
    },
  };

  const component = await mount(
    <ThemeContext.Provider value={mockContextValue}>
      <ThemeSwitcher />
    </ThemeContext.Provider>,
  );

  // Click the switch button
  await component.getByRole('button', { name: /change theme/i }).click();

  // The callback runs in Node.js, so poll until the call has been recorded
  await expect.poll(() => setThemeCalls).toEqual(['dark']);
});

// Mocking HTTP requests (separate test file)
import { test, expect } from '@playwright/experimental-ct-react';
import WeatherWidget from './WeatherWidget';

test('displays weather information after searching for a city', async ({ mount, page }) => {
  // Prepare the mock for the API
  await page.route('**/api/weather?city=**', async (route) => {
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({
        temperature: 21,
        conditions: 'Sunny',
        humidity: 45,
      }),
    });
  });

  const component = await mount(<WeatherWidget />);

  // Enter the city name
  await component.getByLabel(/city/i).fill('London');

  // Click the search button
  await component.getByRole('button', { name: /check/i }).click();

  // Wait for results
  await expect(component.getByText(/temperature: 21°C/i)).toBeVisible();
  await expect(component.getByText(/conditions: sunny/i)).toBeVisible();
  await expect(component.getByText(/humidity: 45%/i)).toBeVisible();
});
```

## Debugging tests

### React Testing Library

```javascript
// Debugging tests
import { render, screen, logRoles } from '@testing-library/react';
import ComplexComponent from './ComplexComponent';

test('renders complex component', () => {
  const { container } = render(<ComplexComponent />);

  // Display DOM structure to the console
  screen.debug();

  // Display a specific element
  const header = screen.getByRole('heading', { name: /title/i });
  screen.debug(header);

  // Log available elements and their ARIA roles
  logRoles(container);

  // Log a Testing Playground URL for exploring queries interactively
  screen.logTestingPlaygroundURL();
});
```

### Playwright

```javascript
// Debugging tests
import { test, expect } from '@playwright/experimental-ct-react';
import ComplexComponent from './ComplexComponent';

test('renders complex component', async ({ mount, page }) => {
  const component = await mount(<ComplexComponent />);

  // Capture a screenshot
  await page.screenshot({ path: 'screenshot.png' });

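  // Pause the test and open the Playwright Inspector (headed mode)
  await page.pause();

  // Accessibility inspection: ARIA snapshot of the mounted component.
  // `locator.ariaSnapshot()` (Playwright 1.49 or newer) replaces the
  // deprecated `page.accessibility.snapshot()` and returns a YAML string.
  const snapshot = await component.ariaSnapshot();
  console.log(snapshot);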

  // Check the DOM structure
  const html = await page.content();
  console.log(html);
});
```

## Debugging tools

| Tool                     | React Testing Library          | Playwright                      |
| ------------------------ | ------------------------------ | ------------------------------- |
| DOM preview              | `screen.debug()`               | `page.content()`                |
| Screenshots              | Not natively supported         | `page.screenshot()`             |
| Accessibility inspection | `logRoles()`                   | `locator.ariaSnapshot()`        |
| Interactive debugging    | Through IDE breakpoints        | `page.pause()`                  |
| Video recording          | Not natively supported         | `recordVideo` in configuration  |
| Console inspection       | `jest.spyOn(console, 'error')` | `page.on('console')`            |

## Performance and scalability

### Performance comparison

| Aspect                     | React Testing Library               | Playwright                           |
| -------------------------- | ----------------------------------- | ------------------------------------ |
| Startup time               | Fast - runs in Node.js environment  | Slower - requires browser launch     |
| Memory                     | Low usage                           | Higher usage due to the browser      |
| Parallelism                | Supported by Jest                   | Built-in sharding support            |
| Test isolation             | Isolated by default                 | Possible state sharing between tests |
| Handling large test suites | Good, but may require configuration | Very good, with built-in tools       |

### Optimization examples

```javascript
// RTL optimization - share the render setup in beforeEach (each test still gets a fresh DOM)
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import Counter from './Counter';

describe('Counter', () => {
  const user = userEvent.setup();

  beforeEach(() => {
    render(<Counter />);
  });

  test('shows initial value', () => {
    expect(screen.getByText(/counter: 0/i)).toBeInTheDocument();
  });

  test('increases value after click', async () => {
    await user.click(screen.getByRole('button', { name: /increment/i }));
    expect(screen.getByText(/counter: 1/i)).toBeInTheDocument();
  });
});
```

```javascript
// Playwright optimization - state sharing
import { test, expect } from '@playwright/experimental-ct-react';
import ComplexApp from './ComplexApp';

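// Perform expensive operations only once per worker.
// localStorage is scoped per origin, so this URL must match the origin that
// serves the mounted components, otherwise the token is not visible to them.
test.beforeAll(async ({ browser }) => {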
  const page = await browser.newPage();
  await page.goto('http://localhost:3000');
  await page.evaluate(() => localStorage.setItem('token', 'test-token'));
  await page.context().storageState({ path: 'state.json' });
  await page.close();
});

// Use saved state
test.use({
  storageState: 'state.json',
});

test('renders app with saved state', async ({ mount }) => {
  const component = await mount(<ComplexApp />);
  await expect(component.getByText(/logged in/i)).toBeVisible();
});
```

## CI/CD integration

### React Testing Library in CI/CD

```yaml
# .github/workflows/react-testing-library.yml
name: RTL Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20.x'
          cache: 'npm'
      - run: npm ci
      # Assumes the `test` script runs Jest; --coverage writes the coverage/ directory
      - run: npm test -- --coverage
      - name: Upload coverage report
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage/
```

### Playwright in CI/CD

```yaml
# .github/workflows/playwright.yml
name: Playwright Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20.x'
          cache: 'npm'
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npm run test:ct
      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/
```

## Comparison based on real scenarios

### Scenario 1: Testing a form with validation

```javascript
// FormWithValidation component
import React, { useState } from 'react';

const FormWithValidation = ({ onSubmit }) => {
  const [formData, setFormData] = useState({
    name: '',
    email: '',
    password: '',
  });
  const [errors, setErrors] = useState({});

  const validate = () => {
    const newErrors = {};
    if (!formData.name) newErrors.name = 'Name is required';
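    // Report a missing email before checking its format, so the format message
    // does not overwrite the "required" message for an empty field
    if (!formData.email) newErrors.email = 'Email is required';
    else if (!formData.email.includes('@')) newErrors.email = 'Invalid email format';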
    if (formData.password.length < 8) newErrors.password = 'Password must be at least 8 characters';
    return newErrors;
  };

  const handleChange = (e) => {
    const { name, value } = e.target;
    setFormData((prev) => ({ ...prev, [name]: value }));
  };

  const handleSubmit = (e) => {
    e.preventDefault();
    const newErrors = validate();
    if (Object.keys(newErrors).length === 0) {
      onSubmit(formData);
    } else {
      setErrors(newErrors);
    }
  };

  return (
    <form onSubmit={handleSubmit} noValidate>
      <div>
        <label htmlFor="name">Name:</label>
        <input
          id="name"
          name="name"
          value={formData.name}
          onChange={handleChange}
          aria-invalid={!!errors.name}
        />
        {errors.name && <p role="alert">{errors.name}</p>}
      </div>

      <div>
        <label htmlFor="email">Email:</label>
        <input
          id="email"
          name="email"
          type="email"
          value={formData.email}
          onChange={handleChange}
          aria-invalid={!!errors.email}
        />
        {errors.email && <p role="alert">{errors.email}</p>}
      </div>

      <div>
        <label htmlFor="password">Password:</label>
        <input
          id="password"
          name="password"
          type="password"
          value={formData.password}
          onChange={handleChange}
          aria-invalid={!!errors.password}
        />
        {errors.password && <p role="alert">{errors.password}</p>}
      </div>

      <button type="submit">Save</button>
    </form>
  );
};

export default FormWithValidation;
```
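
The validation rules can also be checked without rendering the form. The sketch below assumes they are extracted into a standalone `validateForm` function (a hypothetical helper, not part of the component above). It applies the same three rules and reports a missing email before checking its format.

```javascript
// Hypothetical standalone version of the form's validation rules.
// Returns an object with one message per invalid field (empty when valid).
function validateForm({ name = '', email = '', password = '' }) {
  const errors = {};

  if (!name) errors.name = 'Name is required';

  // "Required" takes precedence over the format check for an empty email
  if (!email) {
    errors.email = 'Email is required';
  } else if (!email.includes('@')) {
    errors.email = 'Invalid email format';
  }

  if (password.length < 8) {
    errors.password = 'Password must be at least 8 characters';
  }

  return errors;
}

// Usage: an empty form yields one message per field
console.log(validateForm({}));

// Usage: only the email format is wrong here
console.log(
  validateForm({ name: 'John Smith', email: 'invalid-email', password: 'password123' }),
);
```

Keeping the rules in a pure function like this would let the fast checks run as plain unit tests, while the RTL and Playwright tests focus on what the user sees.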

#### React Testing Library

```javascript
// Test form with validation - RTL
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import FormWithValidation from './FormWithValidation';

describe('FormWithValidation', () => {
  test('displays validation errors for empty fields', async () => {
    const user = userEvent.setup();
    const handleSubmit = jest.fn();

    render(<FormWithValidation onSubmit={handleSubmit} />);

    // Click the submit button without filling in the fields
    await user.click(screen.getByRole('button', { name: /save/i }));

    // Check if error messages are displayed
    expect(screen.getByText(/name is required/i)).toBeInTheDocument();
    expect(screen.getByText(/email is required/i)).toBeInTheDocument();
    expect(screen.getByText(/password must be at least 8 characters/i)).toBeInTheDocument();

    // Check if the onSubmit function was not called
    expect(handleSubmit).not.toHaveBeenCalled();
  });

  test('submits the form with valid data', async () => {
    const user = userEvent.setup();
    const handleSubmit = jest.fn();

    render(<FormWithValidation onSubmit={handleSubmit} />);

    // Fill the form with valid data
    await user.type(screen.getByLabelText(/name/i), 'John Smith');
    await user.type(screen.getByLabelText(/email/i), 'john@example.com');
    await user.type(screen.getByLabelText(/password/i), 'password123');

    // Click the submit button
    await user.click(screen.getByRole('button', { name: /save/i }));

    // Check if the onSubmit function was called with the correct data
    expect(handleSubmit).toHaveBeenCalledWith({
      name: 'John Smith',
      email: 'john@example.com',
      password: 'password123',
    });
  });

  test('displays error with invalid email format', async () => {
    const user = userEvent.setup();
    const handleSubmit = jest.fn();

    render(<FormWithValidation onSubmit={handleSubmit} />);

    // Fill the form with an invalid email
    await user.type(screen.getByLabelText(/name/i), 'John Smith');
    await user.type(screen.getByLabelText(/email/i), 'invalid-email');
    await user.type(screen.getByLabelText(/password/i), 'password123');

    // Click the submit button
    await user.click(screen.getByRole('button', { name: /save/i }));

    // Check if the error message is displayed
    expect(screen.getByText(/invalid email format/i)).toBeInTheDocument();

    // Check if the onSubmit function was not called
    expect(handleSubmit).not.toHaveBeenCalled();
  });
});
```

#### Playwright

```javascript
// Test form with validation - Playwright
import { test, expect } from '@playwright/experimental-ct-react';
import FormWithValidation from './FormWithValidation';

test.describe('FormWithValidation', () => {
  test('displays validation errors for empty fields', async ({ mount }) => {
    // Playwright Test has no built-in spies (there is no `test.spyOn()`),
    // so record submissions in a plain array instead
    const submissions = [];

    const component = await mount(
      <FormWithValidation
        onSubmit={(data) => {
          submissions.push(data);
        }}
      />,
    );

    // Click the submit button without filling in the fields
    await component.getByRole('button', { name: /save/i }).click();

    // Check if error messages are displayed
    await expect(component.getByText(/name is required/i)).toBeVisible();
    await expect(component.getByText(/email is required/i)).toBeVisible();
    await expect(component.getByText(/password must be at least 8 characters/i)).toBeVisible();

    // Check if the onSubmit function was not called
    expect(submissions).toHaveLength(0);
  });

  test('submits the form with valid data', async ({ mount }) => {
    // Playwright Test has no built-in spies (there is no `test.spyOn()`),
    // so record submissions in a plain array instead
    const submissions = [];

    const component = await mount(
      <FormWithValidation
        onSubmit={(data) => {
          submissions.push(data);
        }}
      />,
    );

    // Fill the form with valid data
    await component.getByLabel(/name/i).fill('John Smith');
    await component.getByLabel(/email/i).fill('john@example.com');
    await component.getByLabel(/password/i).fill('password123');

    // Click the submit button
    await component.getByRole('button', { name: /save/i }).click();

    // The callback runs in Node.js, so poll until the submission has been recorded
    await expect
      .poll(() => submissions)
      .toEqual([
        {
          name: 'John Smith',
          email: 'john@example.com',
          password: 'password123',
        },
      ]);
  });

  test('displays error with invalid email format', async ({ mount }) => {
    // Playwright Test has no built-in spies (there is no `test.spyOn()`),
    // so record submissions in a plain array instead
    const submissions = [];

    const component = await mount(
      <FormWithValidation
        onSubmit={(data) => {
          submissions.push(data);
        }}
      />,
    );

    // Fill the form with an invalid email
    await component.getByLabel(/name/i).fill('John Smith');
    await component.getByLabel(/email/i).fill('invalid-email');
    await component.getByLabel(/password/i).fill('password123');

    // Click the submit button
    await component.getByRole('button', { name: /save/i }).click();

    // Check if the error message is displayed
    await expect(component.getByText(/invalid email format/i)).toBeVisible();

    // Check if the onSubmit function was not called
    expect(submissions).toHaveLength(0);
  });
});
```

### Scenario 2: Testing a component with asynchronous data loading

```javascript
// DataFetcher component
import React, { useState, useEffect } from 'react';

const DataFetcher = ({ url, renderItem }) => {
  const [data, setData] = useState([]);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState(null);

  useEffect(() => {
    let isMounted = true;

    const fetchData = async () => {
      try {
        setLoading(true);
        const response = await fetch(url);

        if (!response.ok) {
          throw new Error(`HTTP error! status: ${response.status}`);
        }

        const result = await response.json();

        if (isMounted) {
          setData(result);
          setError(null);
        }
      } catch (err) {
        if (isMounted) {
          setError(err.message);
        }
      } finally {
        if (isMounted) {
          setLoading(false);
        }
      }
    };

    fetchData();

    return () => {
      isMounted = false;
    };
  }, [url]);

  if (loading) {
    return <div data-testid="loading">Loading data...</div>;
  }

  if (error) {
    return <div data-testid="error">Error: {error}</div>;
  }

  if (data.length === 0) {
    return <div data-testid="empty">No data available</div>;
  }

  return (
    <ul data-testid="data-list">
      {data.map((item, index) => (
        <li key={index}>{renderItem(item)}</li>
      ))}
    </ul>
  );
};

export default DataFetcher;
```
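
The component renders exactly one of four states, and the order of the checks decides which one wins. A small sketch of that precedence as a pure function follows. `resolveView` is a hypothetical helper used only for illustration, and the returned strings are the component's `data-testid` values.

```javascript
// Hypothetical helper mirroring the render order in DataFetcher:
// loading first, then error, then the empty list, then the data list.
function resolveView({ loading, error, data }) {
  if (loading) return 'loading';
  if (error) return 'error';
  if (data.length === 0) return 'empty';
  return 'data-list';
}

// Usage: one call per state the component can be in
console.log(resolveView({ loading: true, error: null, data: [] }));
console.log(resolveView({ loading: false, error: 'HTTP error! status: 500', data: [] }));
console.log(resolveView({ loading: false, error: null, data: [] }));
console.log(resolveView({ loading: false, error: null, data: [{ name: 'Item 1' }] }));
```

The tests that follow cover the same four outcomes: loading, loaded data, an HTTP error, and an empty response.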

#### React Testing Library

```javascript
// Test component with asynchronous data loading - RTL
import { render, screen, waitForElementToBeRemoved } from '@testing-library/react';
import DataFetcher from './DataFetcher';

// Mock global functions
global.fetch = jest.fn();

describe('DataFetcher', () => {
  beforeEach(() => {
    global.fetch.mockClear();
  });

  test('displays loading state, then data', async () => {
    // Prepare the mock
    global.fetch.mockResolvedValueOnce({
      ok: true,
      json: async () => [{ name: 'Item 1' }, { name: 'Item 2' }],
    });

    render(<DataFetcher url="/api/data" renderItem={(item) => item.name} />);

    // Check if loading state is displayed
    expect(screen.getByTestId('loading')).toBeInTheDocument();

    // Wait for loading to complete
    await waitForElementToBeRemoved(() => screen.queryByTestId('loading'));

    // Check if data is displayed
    expect(screen.getByTestId('data-list')).toBeInTheDocument();
    expect(screen.getByText('Item 1')).toBeInTheDocument();
    expect(screen.getByText('Item 2')).toBeInTheDocument();

    // Check if fetch was called with the correct URL
    expect(global.fetch).toHaveBeenCalledWith('/api/data');
  });

  test('handles error during data loading', async () => {
    // Prepare mock with error
    global.fetch.mockResolvedValueOnce({
      ok: false,
      status: 500,
    });

    render(<DataFetcher url="/api/data" renderItem={(item) => item.name} />);

    // Wait for loading to complete
    await waitForElementToBeRemoved(() => screen.queryByTestId('loading'));

    // Check if error is displayed
    expect(screen.getByTestId('error')).toBeInTheDocument();
    expect(screen.getByText(/HTTP error! status: 500/i)).toBeInTheDocument();
  });

  test('handles empty data list', async () => {
    // Prepare mock with empty array
    global.fetch.mockResolvedValueOnce({
      ok: true,
      json: async () => [],
    });

    render(<DataFetcher url="/api/data" renderItem={(item) => item.name} />);

    // Wait for loading to complete
    await waitForElementToBeRemoved(() => screen.queryByTestId('loading'));

    // Check if empty data message is displayed
    expect(screen.getByTestId('empty')).toBeInTheDocument();
    expect(screen.getByText(/no data available/i)).toBeInTheDocument();
  });
});
```

#### Playwright

```javascript
// Test component with asynchronous data loading - Playwright
import { test, expect } from '@playwright/experimental-ct-react';
// Hypothetical wrapper ("test story") that renders
// <DataFetcher url={url} renderItem={(item) => item.name} />.
// Callback props run in Node.js and cannot return a value synchronously
// to the browser, so the render prop has to live in browser-side code.
import DataFetcherStory from './DataFetcher.story';

// The state containers are the component's root element, so they are queried
// through `page` rather than inside the locator returned by mount().

test.describe('DataFetcher', () => {
  test('displays loading state, then data', async ({ mount, page }) => {
    // Hold the response back until the loading state has been asserted
    let releaseResponse;
    const responseGate = new Promise((resolve) => {
      releaseResponse = resolve;
    });

    // Mock API response
    await page.route('**/api/data', async (route) => {
      await responseGate;
      await route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify([{ name: 'Item 1' }, { name: 'Item 2' }]),
      });
    });

    await mount(<DataFetcherStory url="/api/data" />);

    // Check if loading state is displayed
    await expect(page.getByTestId('loading')).toBeVisible();

    // Release the response, then check results
    releaseResponse();
    await expect(page.getByTestId('loading')).toBeHidden();
    await expect(page.getByTestId('data-list')).toBeVisible();
    await expect(page.getByText('Item 1')).toBeVisible();
    await expect(page.getByText('Item 2')).toBeVisible();
  });

  test('handles error during data loading', async ({ mount, page }) => {
    // Mock API error
    await page.route('**/api/data', async (route) => {
      await route.fulfill({
        status: 500,
        contentType: 'application/json',
        body: JSON.stringify({ error: 'Internal Server Error' }),
      });
    });

    await mount(<DataFetcherStory url="/api/data" />);

    // Wait for loading to complete and check error
    await expect(page.getByTestId('loading')).toBeHidden();
    await expect(page.getByTestId('error')).toBeVisible();
    await expect(page.getByText(/HTTP error! status: 500/i)).toBeVisible();
  });

  test('handles empty data list', async ({ mount, page }) => {
    // Mock empty API response
    await page.route('**/api/data', async (route) => {
      await route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify([]),
      });
    });

    await mount(<DataFetcherStory url="/api/data" />);

    // Wait for loading to complete and check empty data message
    await expect(page.getByTestId('loading')).toBeHidden();
    await expect(page.getByTestId('empty')).toBeVisible();
    await expect(page.getByText(/no data available/i)).toBeVisible();
  });
});
```

## Summary and recommendations

React Testing Library (RTL) and Playwright are both solid choices for testing React applications, but they are strongest in different situations. The diagram, table, and lists below summarize when to reach for each.

## Decision flow diagram

```mermaid
flowchart TB
    A[Start] --> B{What type of test?}
    B -->|Components and logic| C{Need real browser?}
    B -->|Integration and E2E| D[Playwright]
    C -->|No| E[React Testing Library]
    C -->|Yes| F{Multiple browsers?}
    F -->|No| G{Need advanced interactions?}
    F -->|Yes| D
    G -->|Yes| D
    G -->|No| E

    style D fill:#c9e3ff,stroke:#4a86e8
    style E fill:#d9ead3,stroke:#6aa84f

    subgraph H[Legend]
    I[RTL: Faster, for unit and integration tests of components]
    J[Playwright: For E2E tests, multi-browser simulation, and advanced interactions]
    end

```
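
The same decision flow can be written as a small function, which makes the branches easy to check. This is only a sketch of the diagram above: `chooseTool` and its option names are hypothetical and do not come from either library.

```javascript
// Hypothetical helper that follows the decision flow diagram step by step.
// testType: 'component' (components and logic) or 'e2e' (integration and E2E).
function chooseTool({
  testType,
  needsRealBrowser = false,
  multipleBrowsers = false,
  advancedInteractions = false,
}) {
  // Integration and E2E tests go straight to Playwright
  if (testType === 'e2e') return 'Playwright';

  // Component tests that do not need a real browser stay with RTL
  if (!needsRealBrowser) return 'React Testing Library';

  // A real browser is needed: multiple browsers or advanced
  // interactions tip the choice towards Playwright
  if (multipleBrowsers || advancedInteractions) return 'Playwright';

  return 'React Testing Library';
}

// Usage: a component test that has to run in several browsers
console.log(
  chooseTool({ testType: 'component', needsRealBrowser: true, multipleBrowsers: true }),
);
```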

## Comparison table

| Aspect                             | React Testing Library    | Playwright                           |
| ---------------------------------- | ------------------------ | ------------------------------------ |
| Test type                          | Unit, integration        | Integration, E2E, component          |
| Runtime environment                | JSDOM (DOM simulation)   | Real browsers                        |
| Browser support                    | Simulation in JSDOM      | Chromium, Firefox, WebKit            |
| Configuration complexity           | Low                      | Medium                               |
| Execution speed                    | High                     | Medium                               |
| Debugging                          | Good                     | Very good                            |
| Dependency mocking                 | Easy (in JS environment) | More complex                         |
| User interactions                  | Basic to intermediate    | Advanced and comprehensive           |
| Visual testing                     | Limited                  | Extensive (screenshots, comparisons) |
| Performance with large test suites | Very good                | Good                                 |
| CI/CD integration                  | Simple                   | Requires additional configuration    |
| Learning curve                     | Gentle                   | Steeper                              |

## When to choose React Testing Library

1. **Component testing**: When the main goal is to test isolated React components.
2. **Fast tests**: When execution speed is crucial and you want the fastest feedback loop.
3. **Unit and integration tests**: When focusing on business logic verification and basic user interactions.
4. **Configuration simplicity**: For teams that need to implement tests quickly without complex configuration.
5. **Small and medium projects**: Particularly effective in projects where the component architecture is well-defined.

## When to choose Playwright

1. **E2E tests**: When you need to test the entire user flow from beginning to end.
2. **Multi-browser testing**: When you need to ensure compatibility across different browsers.
3. **Advanced interactions**: When testing complex user interactions that require precise browser control.
4. **Visual testing**: When you need to compare the appearance of the user interface.
5. **Large projects and cross-platform applications**: For comprehensive projects with multiple user flows.

## Best practices

1. **Hybrid approach**: In most projects, a combination of both tools works best - RTL for unit and component tests, Playwright for E2E tests.
2. **Test pyramid**: Maintain the classic test pyramid with more unit tests (RTL) and fewer E2E tests (Playwright).
3. **CI/CD automation**: Configure both tools in your CI/CD pipeline to ensure comprehensive verification.
4. **Performance optimization**: For large projects, consider parallel test execution, particularly with Playwright.
5. **Common language**: Regardless of the tool chosen, use a consistent approach to naming and organizing tests.
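
Parallel execution in Playwright is mostly a matter of configuration. A minimal sketch, assuming a component-testing config file named `playwright-ct.config.js`: `fullyParallel`, `workers`, and `retries` are standard Playwright Test options, while the concrete values are illustrative assumptions rather than recommendations.

```javascript
// playwright-ct.config.js (sketch; the values below are illustrative assumptions)
import { defineConfig } from '@playwright/experimental-ct-react';

export default defineConfig({
  testDir: './tests',
  // Run tests inside a single file in parallel as well, not only across files
  fullyParallel: true,
  // Limit the number of workers on CI, keep Playwright's default locally
  workers: process.env.CI ? 2 : undefined,
  // Retry once on CI so flaky tests show up in the report
  retries: process.env.CI ? 1 : 0,
});
```

On CI the suite can additionally be split across machines with the `--shard` flag, for example `npx playwright test -c playwright-ct.config.js --shard=1/4`.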

## Conclusion

React Testing Library and Playwright are not competing tools, but rather complementary solutions in the React testing ecosystem. RTL excels at testing components and their logic, providing fast execution and simple configuration. Playwright, on the other hand, offers a comprehensive solution for E2E tests with advanced browser control features.

The choice of the right tool should be dictated by the project context, testing requirements, and team resources. In an ideal scenario, a testing strategy should include both component tests with RTL and E2E tests with Playwright, creating a solid foundation for ensuring React application quality.

---


## Testing Library vs Playwright

**URL:** https://portfolio.sdet.pl/articles/react-testing-library-vs-playwright
**Published:** 2025-03-31
**Language:** pl
Tags: react, testing, playwright, testing-library

Testowanie komponentów React z Testing Library vs Playwright - co wybrać i kiedy?

## Wprowadzenie

Testowanie front-endu, a w szczególności aplikacji zbudowanych w React, stało się jednym z kluczowych elementów procesu wytwarzania oprogramowania. Dwie popularne biblioteki testowe - React Testing Library oraz Playwright - oferują różne podejścia do weryfikacji poprawności działania interfejsu użytkownika. W tym artykule przeprowadzimy dogłębną analizę obu rozwiązań, wskazując ich mocne i słabe strony oraz scenariusze, w których sprawdzają się najlepiej.

## Spis treści:

1. [Charakterystyka narzędzi](#charakterystyka-narzędzi)
2. [Filozofia testowania](#filozofia-testowania)
3. [Konfiguracja środowiska](#konfiguracja-środowiska)
4. [Podstawowe przypadki testowe](#podstawowe-przypadki-testowe)
5. [Testowanie interakcji użytkownika](#testowanie-interakcji-użytkownika)
6. [Testowanie asynchroniczne](#testowanie-asynchroniczne)
7. [Mockowanie i izolacja testów](#mockowanie-i-izolacja-testów)
8. [Debugowanie testów](#debugowanie-testów)
9. [Wydajność i skalowalność](#wydajność-i-skalowalność)
10. [Integracja z CI/CD](#integracja-z-cicd)
11. [Porównanie na podstawie realnych scenariuszy](#porównanie-na-podstawie-realnych-scenariuszy)
12. [Podsumowanie i rekomendacje](#podsumowanie-i-rekomendacje)

## Charakterystyka narzędzi

### React Testing Library

React Testing Library (RTL) jest częścią większej rodziny bibliotek Testing Library, zaprojektowanych do testowania komponentów UI w sposób, który odzwierciedla rzeczywiste doświadczenia użytkownika. RTL kładzie nacisk na testowanie tego, co użytkownik widzi i z czym wchodzi w interakcję, zamiast koncentrować się na wewnętrznej implementacji komponentów.

```javascript
// Przykład podstawowego testu z React Testing Library
import { render, screen, fireEvent } from '@testing-library/react';
import Counter from './Counter';

test('inkrementacja licznika po kliknięciu', () => {
  render(<Counter />);

  // Znajdujemy elementy na podstawie tekstu/roli
  const counter = screen.getByText(/licznik: 0/i);
  const incrementButton = screen.getByRole('button', { name: /zwiększ/i });

  // Symulujemy kliknięcie
  fireEvent.click(incrementButton);

  // Sprawdzamy czy stan się zmienił
  expect(screen.getByText(/licznik: 1/i)).toBeInTheDocument();
});
```

### Playwright

Playwright to framework do automatyzacji przeglądarek, który umożliwia testowanie end-to-end (E2E) aplikacji webowych w wielu przeglądarkach (Chromium, Firefox, WebKit). Chociaż Playwright jest głównie narzędziem do testów E2E, można go również wykorzystać do testowania komponentów React z użyciem `@playwright/experimental-ct-react`.

```javascript
// Przykład podstawowego testu z Playwright dla komponentu
import { test, expect } from '@playwright/experimental-ct-react';
import Counter from './Counter';

test('inkrementacja licznika po kliknięciu', async ({ mount }) => {
  // Renderujemy komponent
  const component = await mount(<Counter />);

  // Sprawdzamy początkowy stan
  await expect(component.getByText(/licznik: 0/i)).toBeVisible();

  // Klikamy przycisk
  await component.getByRole('button', { name: /zwiększ/i }).click();

  // Sprawdzamy czy stan się zmienił
  await expect(component.getByText(/licznik: 1/i)).toBeVisible();
});
```

## Filozofia testowania

| Aspekt            | React Testing Library                                       | Playwright                                                      |
| ----------------- | ----------------------------------------------------------- | --------------------------------------------------------------- |
| Poziom testowania | Głównie testy jednostkowe i integracyjne                    | Głównie testy E2E, z możliwością testowania komponentów         |
| Podejście         | "Testing Library Way": testuj zachowanie, nie implementację | "Browser First": testuj jak prawdziwa przeglądarka              |
| Selektory         | Preferuje dostępne atrybuty (role, etykiety, tekst)         | Oferuje wiele strategii wyboru elementów (CSS, XPath, tekst)    |
| Izolacja          | Testuje komponenty w izolacji lub płytkie integracje        | Testuje całe aplikacje lub komponenty w kontekście przeglądarki |
| Focus             | Na zachowaniu dostępnym dla użytkownika                     | Na pełnej funkcjonalności dostępnej w przeglądarce              |

## Konfiguracja środowiska

### Konfiguracja React Testing Library

```javascript
// package.json
{
  "dependencies": {
    "react": "^18.2.0",
    "react-dom": "^18.2.0"
  },
  "devDependencies": {
    "@testing-library/jest-dom": "^6.1.4",
    "@testing-library/react": "^14.0.0",
    "@testing-library/user-event": "^14.5.1",
    "jest": "^29.7.0",
    "jest-environment-jsdom": "^29.7.0"
  }
}
```

```javascript
// jest.config.js
module.exports = {
  testEnvironment: 'jsdom',
  setupFilesAfterEnv: ['./jest.setup.js'],
  transform: {
    '^.+\\.(js|jsx|ts|tsx)$': 'babel-jest',
  },
};
```

```javascript
// jest.setup.js
import '@testing-library/jest-dom';
```

### Konfiguracja Playwright dla testowania komponentów

```javascript
// package.json
{
  "dependencies": {
    "react": "^18.2.0",
    "react-dom": "^18.2.0"
  },
  "devDependencies": {
    "@playwright/experimental-ct-react": "^1.40.0",
    "@playwright/test": "^1.40.0"
  }
}
```

```javascript
// playwright-ct.config.ts
import { defineConfig } from '@playwright/experimental-ct-react';
import { resolve } from 'path';

export default defineConfig({
  testDir: './tests',
  use: {
    ctPort: 3100,
    ctViteConfig: {
      resolve: {
        alias: {
          '@': resolve(__dirname, './src'),
        },
      },
    },
  },
  projects: [
    {
      name: 'chromium',
      use: { browserName: 'chromium' },
    },
    {
      name: 'firefox',
      use: { browserName: 'firefox' },
    },
    {
      name: 'webkit',
      use: { browserName: 'webkit' },
    },
  ],
});
```

```html
<!-- playwright/index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Testing with Playwright</title>
</head>
<body>
  <div id="root"></div>
  <script type="module" src="./index.tsx"></script>
</body>
</html>
```

## Basic test cases

### React Testing Library

```javascript
// Testowanie renderowania komponentu
import { render, screen } from '@testing-library/react';
import UserProfile from './UserProfile';

test('wyświetla dane użytkownika poprawnie', () => {
  const user = {
    name: 'Jan Kowalski',
    email: 'jan@example.com',
    role: 'Developer',
  };

  render(<UserProfile user={user} />);

  expect(screen.getByText('Jan Kowalski')).toBeInTheDocument();
  expect(screen.getByText('jan@example.com')).toBeInTheDocument();
  expect(screen.getByText('Developer')).toBeInTheDocument();
});

// Testowanie warunkowego renderowania
test('wyświetla komunikat, gdy brak danych użytkownika', () => {
  render(<UserProfile />);

  expect(screen.getByText(/brak danych użytkownika/i)).toBeInTheDocument();
});
```

### Playwright

```javascript
// Testowanie renderowania komponentu
import { test, expect } from '@playwright/experimental-ct-react';
import UserProfile from './UserProfile';

test('wyświetla dane użytkownika poprawnie', async ({ mount }) => {
  const user = {
    name: 'Jan Kowalski',
    email: 'jan@example.com',
    role: 'Developer',
  };

  const component = await mount(<UserProfile user={user} />);

  await expect(component.getByText('Jan Kowalski')).toBeVisible();
  await expect(component.getByText('jan@example.com')).toBeVisible();
  await expect(component.getByText('Developer')).toBeVisible();
});

// Testowanie warunkowego renderowania
test('wyświetla komunikat, gdy brak danych użytkownika', async ({ mount }) => {
  const component = await mount(<UserProfile />);

  await expect(component.getByText(/brak danych użytkownika/i)).toBeVisible();
});
```

## Testing user interactions

### React Testing Library with user-event

```javascript
// Testowanie formularza logowania
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import LoginForm from './LoginForm';

test('wywołuje onSubmit z danymi logowania po naciśnięciu przycisku', async () => {
  const mockSubmit = jest.fn();
  render(<LoginForm onSubmit={mockSubmit} />);

  // Znajdź pola formularza
  const emailInput = screen.getByLabelText(/email/i);
  const passwordInput = screen.getByLabelText(/hasło/i);
  const submitButton = screen.getByRole('button', { name: /zaloguj/i });

  // Wpisz dane
  await userEvent.type(emailInput, 'test@example.com');
  await userEvent.type(passwordInput, 'password123');

  // Kliknij przycisk
  await userEvent.click(submitButton);

  // Sprawdź czy funkcja została wywołana z odpowiednimi argumentami
  expect(mockSubmit).toHaveBeenCalledWith({
    email: 'test@example.com',
    password: 'password123',
  });
});
```

### Playwright

```javascript
// Testowanie formularza logowania
import { test, expect } from '@playwright/experimental-ct-react';
import LoginForm from './LoginForm';

test('wywołuje onSubmit z danymi logowania po naciśnięciu przycisku', async ({ mount }) => {
  // Playwright Test has no spyOn or fn helper, so record the calls in a plain array
  const submissions = [];

  const component = await mount(<LoginForm onSubmit={(data) => submissions.push(data)} />);

  // Fill in the credentials
  await component.getByLabel(/email/i).fill('test@example.com');
  await component.getByLabel(/hasło/i).fill('password123');

  // Click the submit button
  await component.getByRole('button', { name: /zaloguj/i }).click();

  // Check that the callback received the expected arguments
  await expect.poll(() => submissions).toEqual([
    {
      email: 'test@example.com',
      password: 'password123',
    },
  ]);
});
```
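
Playwright Test does not bundle a mock-function API the way Jest does, so callback props are often checked with a small hand-rolled recorder. The sketch below is one possible shape for such a helper; `createRecorder` and its `calls` property are hypothetical names, not part of Playwright or of the components in this article.

```javascript
// Hypothetical helper: a function that remembers every argument list it receives.
function createRecorder() {
  const calls = [];
  const recorder = (...args) => {
    calls.push(args);
  };
  recorder.calls = calls;
  return recorder;
}

// Usage outside any test runner: simulate a form calling onSubmit once.
const onSubmit = createRecorder();
onSubmit({ email: 'test@example.com', password: 'password123' });

console.log(onSubmit.calls.length); // 1
console.log(onSubmit.calls[0][0].email); // test@example.com
```

In a component test, a recorder like this would be passed as the `onSubmit` prop and `onSubmit.calls` inspected after the click.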

## Asynchronous testing

### React Testing Library

```javascript
// Testowanie ładowania danych
import { render, screen, waitFor } from '@testing-library/react';
import UserList from './UserList';
import { fetchUsers } from './api';

// Mockowanie modułu API
jest.mock('./api');

test('wyświetla listę użytkowników po załadowaniu', async () => {
  // Przygotowanie mocka
  fetchUsers.mockResolvedValueOnce([
    { id: 1, name: 'Jan Kowalski' },
    { id: 2, name: 'Anna Nowak' },
  ]);

  render(<UserList />);

  // Sprawdzenie czy loader jest wyświetlany
  expect(screen.getByText(/ładowanie/i)).toBeInTheDocument();

  // Czekanie na dane
  await waitFor(() => {
    expect(screen.getByText('Jan Kowalski')).toBeInTheDocument();
    expect(screen.getByText('Anna Nowak')).toBeInTheDocument();
    expect(screen.queryByText(/ładowanie/i)).not.toBeInTheDocument();
  });
});
```

### Playwright

```javascript
// Testowanie ładowania danych
import { test, expect } from '@playwright/experimental-ct-react';
import { MockedApiProvider } from './test-utils';
import UserList from './UserList';

test('wyświetla listę użytkowników po załadowaniu', async ({ mount }) => {
  // Dane do mocka
  const mockUsers = [
    { id: 1, name: 'Jan Kowalski' },
    { id: 2, name: 'Anna Nowak' },
  ];

  // Renderowanie komponentu z prowiderem mocka
  const component = await mount(
    <MockedApiProvider
      mocks={{
        fetchUsers: async () => mockUsers,
      }}
    >
      <UserList />
    </MockedApiProvider>,
  );

  // Sprawdzenie czy loader jest wyświetlany
  await expect(component.getByText(/ładowanie/i)).toBeVisible();

  // Czekanie na dane
  await expect(component.getByText('Jan Kowalski')).toBeVisible();
  await expect(component.getByText('Anna Nowak')).toBeVisible();

  // Sprawdzenie czy loader zniknął
  await expect(component.getByText(/ładowanie/i)).not.toBeVisible();
});
```

## Mocking and test isolation

### React Testing Library

```javascript
// Mockowanie kontekstu React
import { render, screen, fireEvent } from '@testing-library/react';
import { ThemeContext } from './ThemeContext';
import ThemeSwitcher from './ThemeSwitcher';

test('przełącza motyw', () => {
  const mockSetTheme = jest.fn();

  render(
    <ThemeContext.Provider value={{ theme: 'light', setTheme: mockSetTheme }}>
      <ThemeSwitcher />
    </ThemeContext.Provider>,
  );

  // Click the theme toggle once
  fireEvent.click(screen.getByRole('button', { name: /zmień motyw/i }));

  // Sprawdź czy funkcja została wywołana z odpowiednim argumentem
  expect(mockSetTheme).toHaveBeenCalledWith('dark');
});

// Mockowanie modułów
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import WeatherWidget from './WeatherWidget';
import { getWeather } from './weatherService';

// Mockowanie modułu serwisu pogodowego
jest.mock('./weatherService');

test('wyświetla informacje o pogodzie po wyszukaniu miasta', async () => {
  // Ustawienie mocka
  getWeather.mockResolvedValueOnce({
    temperature: 21,
    conditions: 'Słonecznie',
    humidity: 45,
  });

  render(<WeatherWidget />);

  // Wprowadzenie nazwy miasta
  await userEvent.type(screen.getByLabelText(/miasto/i), 'Warszawa');

  // Kliknięcie przycisku wyszukiwania
  await userEvent.click(screen.getByRole('button', { name: /sprawdź/i }));

  // Oczekiwanie na wyniki
  expect(await screen.findByText(/temperatura: 21°C/i)).toBeInTheDocument();
  expect(screen.getByText(/warunki: słonecznie/i)).toBeInTheDocument();
  expect(screen.getByText(/wilgotność: 45%/i)).toBeInTheDocument();

  // Sprawdzenie czy serwis został wywołany z odpowiednim argumentem
  expect(getWeather).toHaveBeenCalledWith('Warszawa');
});
```

### Playwright

```javascript
// Mockowanie kontekstu React
import { test, expect } from '@playwright/experimental-ct-react';
import { ThemeContext } from './ThemeContext';
import ThemeSwitcher from './ThemeSwitcher';

test('przełącza motyw', async ({ mount }) => {
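  // Playwright Test has no fn helper, so record the calls in a plain array
  const setThemeCalls = [];
  const mockContextValue = {
    theme: 'light',
    setTheme: (theme) => setThemeCalls.push(theme),
  };

  // If the provider cannot be mounted directly from the test file,
  // move it into a small test-only wrapper component instead
  const component = await mount(
    <ThemeContext.Provider value={mockContextValue}>
      <ThemeSwitcher />
    </ThemeContext.Provider>,
  );

  // Click the theme toggle
  await component.getByRole('button', { name: /zmień motyw/i }).click();

  // Check that setTheme was called with the expected argument
  await expect.poll(() => setThemeCalls).toEqual(['dark']);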
});

// Mockowanie żądań HTTP
import { test, expect } from '@playwright/experimental-ct-react';
import WeatherWidget from './WeatherWidget';

test('wyświetla informacje o pogodzie po wyszukaniu miasta', async ({ mount, page }) => {
  // Przygotowanie mocka dla API
  await page.route('**/api/weather?city=**', (route) => {
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({
        temperature: 21,
        conditions: 'Słonecznie',
        humidity: 45,
      }),
    });
  });

  const component = await mount(<WeatherWidget />);

  // Wprowadzenie nazwy miasta
  await component.getByLabel(/miasto/i).fill('Warszawa');

  // Kliknięcie przycisku wyszukiwania
  await component.getByRole('button', { name: /sprawdź/i }).click();

  // Oczekiwanie na wyniki
  await expect(component.getByText(/temperatura: 21°C/i)).toBeVisible();
  await expect(component.getByText(/warunki: słonecznie/i)).toBeVisible();
  await expect(component.getByText(/wilgotność: 45%/i)).toBeVisible();
});
```

## Debugging tests

### React Testing Library

```javascript
// Debugowanie testów
import { render, screen, logRoles } from '@testing-library/react';
import ComplexComponent from './ComplexComponent';

test('renderuje złożony komponent', () => {
  const { container } = render(<ComplexComponent />);

  // Print the whole DOM tree to the console
  screen.debug();

  // Print a single element
  const header = screen.getByRole('heading', { name: /tytuł/i });
  screen.debug(header);

  // List the ARIA roles found in the rendered tree
  logRoles(container);

  // Print a Testing Playground link for the current DOM
  screen.logTestingPlaygroundURL();
});
```

### Playwright

```javascript
// Debugowanie testów
import { test, expect } from '@playwright/experimental-ct-react';
import ComplexComponent from './ComplexComponent';

test('renderuje złożony komponent', async ({ mount, page }) => {
  const component = await mount(<ComplexComponent />);

  // Przechwycenie zrzutu ekranu
  await page.screenshot({ path: 'screenshot.png' });

  // Włączenie trybu debugowania
  await page.pause();

  // Inspect the accessibility tree (note: page.accessibility is marked as deprecated in the Playwright docs)
  const snapshot = await page.accessibility.snapshot();
  console.log(JSON.stringify(snapshot, null, 2));

  // Sprawdzenie struktury DOM
  const html = await page.content();
  console.log(html);
});
```

## Debugging tools

| Tool                     | React Testing Library          | Playwright                                   |
| ------------------------ | ------------------------------ | -------------------------------------------- |
| DOM preview              | `screen.debug()`               | `page.content()`                             |
| Screenshots              | Not supported natively         | `page.screenshot()`                          |
| Accessibility inspection | `logRoles()`                   | `page.accessibility.snapshot()` (deprecated) |
| Interactive debugging    | Breakpoints in the IDE         | `page.pause()`                               |
| Video recording          | Not supported natively         | `video` option in the configuration          |
| Console inspection       | `jest.spyOn(console, 'error')` | `page.on('console')`                         |

## Performance and scalability

### Performance comparison

| Aspect                    | React Testing Library                              | Playwright                                                                  |
| ------------------------- | -------------------------------------------------- | --------------------------------------------------------------------------- |
| Startup time              | Fast: runs inside Node.js                          | Slower: has to launch a browser                                             |
| Memory                    | Low usage                                          | Higher usage because of the browser                                         |
| Parallelism               | Provided by Jest worker processes                  | Built-in parallel workers and sharding                                      |
| Test isolation            | Isolated by default (automatic cleanup after each test) | Isolated by default (fresh browser context per test); state can be shared on purpose, e.g. via `storageState` |
| Handling large test suites | Good, but may need extra configuration            | Very good, with built-in tooling                                            |

### Optimization examples

```javascript
// Optymalizacja RTL - ponowne użycie renderowania
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import Counter from './Counter';

describe('Counter', () => {
  const user = userEvent.setup();

  beforeEach(() => {
    render(<Counter />);
  });

  test('pokazuje początkową wartość', () => {
    expect(screen.getByText(/licznik: 0/i)).toBeInTheDocument();
  });

  test('zwiększa wartość po kliknięciu', async () => {
    await user.click(screen.getByRole('button', { name: /zwiększ/i }));
    expect(screen.getByText(/licznik: 1/i)).toBeInTheDocument();
  });
});
```

```javascript
// Optymalizacja Playwright - współdzielenie stanu
import { test, expect } from '@playwright/experimental-ct-react';
import ComplexApp from './ComplexApp';

// Run the expensive setup only once
test.beforeAll(async ({ browser }) => {
  const page = await browser.newPage();
  // localStorage is saved per origin, so open the origin that serves the components
  // (assumed here to be the ctPort from playwright-ct.config.ts)
  await page.goto('http://localhost:3100');
  await page.evaluate(() => localStorage.setItem('token', 'test-token'));
  await page.context().storageState({ path: 'state.json' });
  await page.close();
});

// Użycie zapisanego stanu
test.use({
  storageState: 'state.json',
});

test('renderuje aplikację z zapisanym stanem', async ({ mount }) => {
  const component = await mount(<ComplexApp />);
  await expect(component.getByText(/zalogowany/i)).toBeVisible();
});
```

## CI/CD integration

### React Testing Library in CI/CD

```yaml
# .github/workflows/react-testing-library.yml
name: RTL Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18.x'
          cache: 'npm'
      - run: npm ci
      # Assumes the npm test script runs Jest, so --coverage writes the coverage/ directory
      - run: npm test -- --coverage
      - name: Upload test results
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: coverage/
```

### Playwright in CI/CD

```yaml
# .github/workflows/playwright.yml
name: Playwright Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18.x'
          cache: 'npm'
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npm run test:ct
      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/
```

## Comparison on real-world scenarios

### Scenario 1: Testing a form with validation

```javascript
// Komponent FormWithValidation
import React, { useState } from 'react';

const FormWithValidation = ({ onSubmit }) => {
  const [formData, setFormData] = useState({
    name: '',
    email: '',
    password: '',
  });
  const [errors, setErrors] = useState({});

  const validate = () => {
    const newErrors = {};
    if (!formData.name) newErrors.name = 'Imię jest wymagane';
    // Check the format only when an email was entered, so the "required" message is not overwritten
    if (!formData.email) {
      newErrors.email = 'Email jest wymagany';
    } else if (!formData.email.includes('@')) {
      newErrors.email = 'Nieprawidłowy format email';
    }
    if (formData.password.length < 8) newErrors.password = 'Hasło musi mieć co najmniej 8 znaków';
    return newErrors;
  };

  const handleChange = (e) => {
    const { name, value } = e.target;
    setFormData((prev) => ({ ...prev, [name]: value }));
  };

  const handleSubmit = (e) => {
    e.preventDefault();
    const newErrors = validate();
    if (Object.keys(newErrors).length === 0) {
      onSubmit(formData);
    } else {
      setErrors(newErrors);
    }
  };

  return (
    <form onSubmit={handleSubmit} noValidate>
      <div>
        <label htmlFor="name">Imię:</label>
        <input
          id="name"
          name="name"
          value={formData.name}
          onChange={handleChange}
          aria-invalid={!!errors.name}
        />
        {errors.name && <p role="alert">{errors.name}</p>}
      </div>

      <div>
        <label htmlFor="email">Email:</label>
        <input
          id="email"
          name="email"
          type="email"
          value={formData.email}
          onChange={handleChange}
          aria-invalid={!!errors.email}
        />
        {errors.email && <p role="alert">{errors.email}</p>}
      </div>

      <div>
        <label htmlFor="password">Hasło:</label>
        <input
          id="password"
          name="password"
          type="password"
          value={formData.password}
          onChange={handleChange}
          aria-invalid={!!errors.password}
        />
        {errors.password && <p role="alert">{errors.password}</p>}
      </div>

      <button type="submit">Zapisz</button>
    </form>
  );
};

export default FormWithValidation;
```
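
The validation rules are easier to check in isolation when they live in a pure function. The sketch below restates the rules from `FormWithValidation` outside React. `validateForm` is a hypothetical name, and reporting the "required" error rather than the format error for an empty email is an assumption, chosen because the tests in this scenario expect that message.

```javascript
// Hypothetical pure version of the form's validation rules.
function validateForm({ name = '', email = '', password = '' }) {
  const errors = {};
  if (!name) errors.name = 'Imię jest wymagane';
  if (!email) {
    errors.email = 'Email jest wymagany';
  } else if (!email.includes('@')) {
    // Only reached when an email was entered but has no "@"
    errors.email = 'Nieprawidłowy format email';
  }
  if (password.length < 8) errors.password = 'Hasło musi mieć co najmniej 8 znaków';
  return errors;
}

// Empty form: three errors, one per field
console.log(validateForm({}));

// Malformed email only
console.log(validateForm({ name: 'Jan', email: 'niepoprawny-email', password: 'haslo12345' }));

// Valid input: no errors
console.log(validateForm({ name: 'Jan', email: 'jan@example.com', password: 'haslo12345' }));
```

A function like this can be unit-tested without rendering anything, which leaves the component tests free to focus on what the user sees.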

#### React Testing Library

```javascript
// Test formularza z walidacją - RTL
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import FormWithValidation from './FormWithValidation';

describe('FormWithValidation', () => {
  test('wyświetla błędy walidacji przy pustych polach', async () => {
    const user = userEvent.setup();
    const handleSubmit = jest.fn();

    render(<FormWithValidation onSubmit={handleSubmit} />);

    // Kliknięcie przycisku submit bez wypełniania pól
    await user.click(screen.getByRole('button', { name: /zapisz/i }));

    // Sprawdzenie czy wyświetlane są komunikaty o błędach
    expect(screen.getByText(/imię jest wymagane/i)).toBeInTheDocument();
    expect(screen.getByText(/email jest wymagany/i)).toBeInTheDocument();
    expect(screen.getByText(/hasło musi mieć co najmniej 8 znaków/i)).toBeInTheDocument();

    // Sprawdzenie czy funkcja onSubmit nie została wywołana
    expect(handleSubmit).not.toHaveBeenCalled();
  });

  test('wysyła formularz z poprawnymi danymi', async () => {
    const user = userEvent.setup();
    const handleSubmit = jest.fn();

    render(<FormWithValidation onSubmit={handleSubmit} />);

    // Wypełnienie formularza poprawnymi danymi
    await user.type(screen.getByLabelText(/imię/i), 'Jan Kowalski');
    await user.type(screen.getByLabelText(/email/i), 'jan@example.com');
    await user.type(screen.getByLabelText(/hasło/i), 'haslo12345');

    // Kliknięcie przycisku submit
    await user.click(screen.getByRole('button', { name: /zapisz/i }));

    // Sprawdzenie czy funkcja onSubmit została wywołana z odpowiednimi danymi
    expect(handleSubmit).toHaveBeenCalledWith({
      name: 'Jan Kowalski',
      email: 'jan@example.com',
      password: 'haslo12345',
    });
  });

  test('wyświetla błąd przy niepoprawnym formacie email', async () => {
    const user = userEvent.setup();
    const handleSubmit = jest.fn();

    render(<FormWithValidation onSubmit={handleSubmit} />);

    // Wypełnienie formularza z niepoprawnym email
    await user.type(screen.getByLabelText(/imię/i), 'Jan Kowalski');
    await user.type(screen.getByLabelText(/email/i), 'niepoprawny-email');
    await user.type(screen.getByLabelText(/hasło/i), 'haslo12345');

    // Kliknięcie przycisku submit
    await user.click(screen.getByRole('button', { name: /zapisz/i }));

    // Sprawdzenie czy wyświetlany jest komunikat o błędzie
    expect(screen.getByText(/nieprawidłowy format email/i)).toBeInTheDocument();

    // Sprawdzenie czy funkcja onSubmit nie została wywołana
    expect(handleSubmit).not.toHaveBeenCalled();
  });
});
```

#### Playwright

```javascript
// Test formularza z walidacją - Playwright
import { test, expect } from '@playwright/experimental-ct-react';
import FormWithValidation from './FormWithValidation';

test.describe('FormWithValidation', () => {
  test('wyświetla błędy walidacji przy pustych polach', async ({ mount }) => {
    // Playwright Test has no spyOn helper, so record the calls in a plain array
    const submissions = [];

    const component = await mount(
      <FormWithValidation onSubmit={(data) => submissions.push(data)} />,
    );

    // Click submit without filling in any field
    await component.getByRole('button', { name: /zapisz/i }).click();

    // The validation messages should be shown
    await expect(component.getByText(/imię jest wymagane/i)).toBeVisible();
    await expect(component.getByText(/email jest wymagany/i)).toBeVisible();
    await expect(component.getByText(/hasło musi mieć co najmniej 8 znaków/i)).toBeVisible();

    // onSubmit must not have been called
    expect(submissions).toHaveLength(0);
  });

  test('wysyła formularz z poprawnymi danymi', async ({ mount }) => {
    // Playwright Test has no spyOn helper, so record the calls in a plain array
    const submissions = [];

    const component = await mount(
      <FormWithValidation onSubmit={(data) => submissions.push(data)} />,
    );

    // Fill in the form with valid data
    await component.getByLabel(/imię/i).fill('Jan Kowalski');
    await component.getByLabel(/email/i).fill('jan@example.com');
    await component.getByLabel(/hasło/i).fill('haslo12345');

    // Click submit
    await component.getByRole('button', { name: /zapisz/i }).click();

    // onSubmit should have been called once with the entered data
    await expect.poll(() => submissions).toEqual([
      {
        name: 'Jan Kowalski',
        email: 'jan@example.com',
        password: 'haslo12345',
      },
    ]);
  });

  test('wyświetla błąd przy niepoprawnym formacie email', async ({ mount }) => {
    // Playwright Test has no spyOn helper, so record the calls in a plain array
    const submissions = [];

    const component = await mount(
      <FormWithValidation onSubmit={(data) => submissions.push(data)} />,
    );

    // Fill in the form with a malformed email
    await component.getByLabel(/imię/i).fill('Jan Kowalski');
    await component.getByLabel(/email/i).fill('niepoprawny-email');
    await component.getByLabel(/hasło/i).fill('haslo12345');

    // Click submit
    await component.getByRole('button', { name: /zapisz/i }).click();

    // The format error should be shown
    await expect(component.getByText(/nieprawidłowy format email/i)).toBeVisible();

    // onSubmit must not have been called
    expect(submissions).toHaveLength(0);
  });
});
```

### Scenario 2: Testing a component with asynchronous data loading

```javascript
// Komponent DataFetcher
import React, { useState, useEffect } from 'react';

const DataFetcher = ({ url, renderItem }) => {
  const [data, setData] = useState([]);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState(null);

  useEffect(() => {
    let isMounted = true;

    const fetchData = async () => {
      try {
        setLoading(true);
        const response = await fetch(url);

        if (!response.ok) {
          throw new Error(`HTTP error! status: ${response.status}`);
        }

        const result = await response.json();

        if (isMounted) {
          setData(result);
          setError(null);
        }
      } catch (err) {
        if (isMounted) {
          setError(err.message);
        }
      } finally {
        if (isMounted) {
          setLoading(false);
        }
      }
    };

    fetchData();

    return () => {
      isMounted = false;
    };
  }, [url]);

  if (loading) {
    return <div data-testid="loading">Ładowanie danych...</div>;
  }

  if (error) {
    return <div data-testid="error">Błąd: {error}</div>;
  }

  if (data.length === 0) {
    return <div data-testid="empty">Brak dostępnych danych</div>;
  }

  return (
    <ul data-testid="data-list">
      {data.map((item, index) => (
        <li key={index}>{renderItem(item)}</li>
      ))}
    </ul>
  );
};

export default DataFetcher;
```

#### React Testing Library

```javascript
// Test komponentu z asynchronicznym ładowaniem danych - RTL
import { render, screen, waitForElementToBeRemoved } from '@testing-library/react';
import DataFetcher from './DataFetcher';

// Mocki globalnych funkcji
global.fetch = jest.fn();

describe('DataFetcher', () => {
  beforeEach(() => {
    global.fetch.mockClear();
  });

  test('wyświetla stan ładowania, a następnie dane', async () => {
    // Przygotowanie mocka
    global.fetch.mockResolvedValueOnce({
      ok: true,
      json: async () => [{ name: 'Item 1' }, { name: 'Item 2' }],
    });

    render(<DataFetcher url="/api/data" renderItem={(item) => item.name} />);

    // Sprawdzenie czy wyświetlany jest stan ładowania
    expect(screen.getByTestId('loading')).toBeInTheDocument();

    // Czekanie na zakończenie ładowania
    await waitForElementToBeRemoved(() => screen.queryByTestId('loading'));

    // Sprawdzenie czy dane są wyświetlane
    expect(screen.getByTestId('data-list')).toBeInTheDocument();
    expect(screen.getByText('Item 1')).toBeInTheDocument();
    expect(screen.getByText('Item 2')).toBeInTheDocument();

    // Sprawdzenie czy fetch został wywołany z odpowiednim URL
    expect(global.fetch).toHaveBeenCalledWith('/api/data');
  });

  test('obsługuje błąd podczas ładowania danych', async () => {
    // Przygotowanie mocka z błędem
    global.fetch.mockResolvedValueOnce({
      ok: false,
      status: 500,
    });

    render(<DataFetcher url="/api/data" renderItem={(item) => item.name} />);

    // Czekanie na zakończenie ładowania
    await waitForElementToBeRemoved(() => screen.queryByTestId('loading'));

    // Sprawdzenie czy wyświetlany jest błąd
    expect(screen.getByTestId('error')).toBeInTheDocument();
    expect(screen.getByText(/HTTP error! status: 500/i)).toBeInTheDocument();
  });

  test('obsługuje pustą listę danych', async () => {
    // Przygotowanie mocka z pustą tablicą
    global.fetch.mockResolvedValueOnce({
      ok: true,
      json: async () => [],
    });

    render(<DataFetcher url="/api/data" renderItem={(item) => item.name} />);

    // Czekanie na zakończenie ładowania
    await waitForElementToBeRemoved(() => screen.queryByTestId('loading'));

    // Sprawdzenie czy wyświetlana jest informacja o braku danych
    expect(screen.getByTestId('empty')).toBeInTheDocument();
    expect(screen.getByText(/brak dostępnych danych/i)).toBeInTheDocument();
  });
});
```

#### Playwright

```javascript
// Test komponentu z asynchronicznym ładowaniem danych - Playwright
import { test, expect } from '@playwright/experimental-ct-react';
import DataFetcher from './DataFetcher';

test.describe('DataFetcher', () => {
  test('wyświetla stan ładowania, a następnie dane', async ({ mount, page }) => {
    // Mockowanie odpowiedzi API
    await page.route('**/api/data', (route) => {
      route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify([{ name: 'Item 1' }, { name: 'Item 2' }]),
      });
    });

    const component = await mount(<DataFetcher url="/api/data" renderItem={(item) => item.name} />);

    // Sprawdzenie czy wyświetlany jest stan ładowania
    await expect(component.getByTestId('loading')).toBeVisible();

    // Czekanie na zakończenie ładowania i sprawdzenie rezultatów
    await expect(component.getByTestId('loading')).not.toBeVisible();
    await expect(component.getByTestId('data-list')).toBeVisible();
    await expect(component.getByText('Item 1')).toBeVisible();
    await expect(component.getByText('Item 2')).toBeVisible();
  });

  test('obsługuje błąd podczas ładowania danych', async ({ mount, page }) => {
    // Mockowanie błędu API
    await page.route('**/api/data', (route) => {
      route.fulfill({
        status: 500,
        contentType: 'application/json',
        body: JSON.stringify({ error: 'Internal Server Error' }),
      });
    });

    const component = await mount(<DataFetcher url="/api/data" renderItem={(item) => item.name} />);

    // Czekanie na zakończenie ładowania i sprawdzenie błędu
    await expect(component.getByTestId('loading')).not.toBeVisible();
    await expect(component.getByTestId('error')).toBeVisible();
    await expect(component.getByText(/HTTP error! status: 500/i)).toBeVisible();
  });

  test('obsługuje pustą listę danych', async ({ mount, page }) => {
    // Mockowanie pustej odpowiedzi API
    await page.route('**/api/data', (route) => {
      route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify([]),
      });
    });

    const component = await mount(<DataFetcher url="/api/data" renderItem={(item) => item.name} />);

    // Czekanie na zakończenie ładowania i sprawdzenie informacji o braku danych
    await expect(component.getByTestId('loading')).not.toBeVisible();
    await expect(component.getByTestId('empty')).toBeVisible();
    await expect(component.getByText(/brak dostępnych danych/i)).toBeVisible();
  });
});
```
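
Asserting the loading state right after rendering can race with a mock that resolves immediately. One way to make the order explicit is to hold the mocked response behind a promise that the test releases by hand. The sketch below shows the idea in plain JavaScript, without any test runner; `createDeferred` and `loadItems` are hypothetical names, and `loadItems` is only a stand-in for the component's loading logic.

```javascript
// Hypothetical helper: a promise whose resolution is triggered from outside.
function createDeferred() {
  let resolve;
  const promise = new Promise((res) => {
    resolve = res;
  });
  return { promise, resolve };
}

// Stand-in for a component: reports 'loading' until the fetcher settles.
async function loadItems(fetcher, onState) {
  onState('loading');
  const items = await fetcher();
  onState(items.length === 0 ? 'empty' : 'loaded');
}

async function demo() {
  const states = [];
  const gate = createDeferred();

  // The fake fetcher does not resolve until the test opens the gate.
  const pending = loadItems(() => gate.promise, (state) => states.push(state));

  // At this point only the loading state can have been reported.
  const beforeRelease = [...states];

  gate.resolve([{ name: 'Item 1' }]);
  await pending;

  return { beforeRelease, afterRelease: states };
}

demo().then((result) => console.log(result));
```

In a Playwright test the same gate could be awaited inside the `page.route` handler before calling `route.fulfill`; in Jest it could back a `mockReturnValueOnce(gate.promise)`. Either way the loading assertion runs while the response is still pending.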

# Summary and recommendations

Both React Testing Library (RTL) and Playwright are solid options for testing React applications, and each has cases where it fits best. The summary and recommendations below are meant to help choose between them.

## Decision flow diagram

```mermaid
flowchart TB
    A[Start] --> B{What type of test?}
    B -->|Components and logic| C{Real browser required?}
    B -->|Integration and E2E| D[Playwright]
    C -->|No| E[React Testing Library]
    C -->|Yes| F{Multiple browsers?}
    F -->|No| G{Advanced interactions needed?}
    F -->|Yes| D
    G -->|Yes| D
    G -->|No| E

    style D fill:#c9e3ff,stroke:#4a86e8
    style E fill:#d9ead3,stroke:#6aa84f

    subgraph H[Legend]
    I[RTL: Faster, for unit and integration tests of components]
    J[Playwright: For E2E tests, multi-browser runs, and advanced interactions]
    end

```
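
The same decision flow can be written as a small function, which makes each branch easy to check. This is only a sketch of the diagram; `chooseTool` and its option names are hypothetical, not part of either library.

```javascript
// Hypothetical encoding of the decision flow above.
function chooseTool({
  testType, // 'component' or 'e2e'
  needsRealBrowser = false,
  multipleBrowsers = false,
  advancedInteractions = false,
}) {
  // Integration and E2E flows go straight to Playwright
  if (testType === 'e2e') return 'Playwright';

  // Component and logic tests without a real browser stay in RTL
  if (!needsRealBrowser) return 'React Testing Library';

  // A real browser is needed: multi-browser coverage or advanced interactions tip the choice
  if (multipleBrowsers || advancedInteractions) return 'Playwright';

  return 'React Testing Library';
}

console.log(chooseTool({ testType: 'e2e' })); // Playwright
console.log(chooseTool({ testType: 'component' })); // React Testing Library
console.log(chooseTool({ testType: 'component', needsRealBrowser: true, multipleBrowsers: true })); // Playwright
```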

## Comparison table

| Aspect                             | React Testing Library | Playwright                                        |
| ---------------------------------- | --------------------- | ------------------------------------------------- |
| Test types                         | Unit, integration     | Integration, E2E, component                       |
| Runtime environment                | jsdom (simulated DOM) | Real browsers                                     |
| Browser support                    | None (jsdom only)     | Chromium (Chrome, Edge), Firefox, WebKit (Safari's engine) |
| Configuration complexity           | Low                   | Medium                                            |
| Execution speed                    | High                  | Medium                                            |
| Debugging                          | Good                  | Very good                                         |
| Mocking dependencies               | Easy (inside the JS environment) | More involved                          |
| User interactions                  | Basic to moderately advanced | Advanced and comprehensive                 |
| Visual testing                     | Limited               | Extensive (screenshots, comparisons)              |
| Performance with many tests        | Very good             | Good                                              |
| CI/CD integration                  | Simple                | Needs extra configuration                         |
| Learning curve                     | Gentle                | Steeper                                           |

## When to choose React Testing Library:

1. **Component testing**: When the main goal is testing isolated React components.
2. **Fast tests**: When execution speed is key and you want the shortest possible feedback loop.
3. **Unit and integration tests**: When you focus on verifying business logic and basic user interactions.
4. **Simple configuration**: For teams that need to roll out tests quickly without complex setup.
5. **Small and medium projects**: Particularly effective in projects where the component architecture is well defined.

## When to choose Playwright:

1. **E2E tests**: When you need to test the entire user flow from start to finish.
2. **Cross-browser testing**: When you must ensure compatibility across different browsers.
3. **Advanced interactions**: When you test complex user interactions that require precise control over the browser.
4. **Visual testing**: When you need to compare the appearance of the user interface.
5. **Large projects and multi-platform applications**: For complex projects with many user flows.

## Best practices:

1. **Hybrid approach**: In most projects, combining both tools works best - RTL for unit and component tests, Playwright for E2E tests.
2. **Test pyramid**: Keep the classic test pyramid, with more unit tests (RTL) and fewer E2E tests (Playwright).
3. **CI/CD automation**: Configure both tools in your CI/CD pipeline to ensure comprehensive verification.
4. **Performance optimization**: For large projects, consider running tests in parallel, especially with Playwright.
5. **Common language**: Whichever tool you choose, apply a consistent approach to naming and organizing tests.

## Final conclusions:

React Testing Library and Playwright are not competing tools but complementary solutions in the React testing ecosystem. RTL excels at testing components and their logic, with fast execution and simple configuration. Playwright, in turn, offers a comprehensive solution for E2E tests with advanced browser control features.

The choice of tool should be driven by the project context, the testing requirements, and the team's resources. Ideally, a testing strategy covers both component tests with RTL and E2E tests with Playwright, forming a solid foundation for assuring the quality of a React application.

---


## OVH Server - Zero to Hero

**URL:** https://portfolio.sdet.it/articles/ovh-server-zero-to-hero
**Published:** 2025-03-27
**Language:** en
Tags: ovh, vps, infrastructure, devops

A comprehensive configuration manual for OVH VPS - from bare install to production-ready infrastructure.

This guide walks through configuring an OVH server from scratch, covering each basic step in order.

## Table of Contents

1. [Basic User and Security Configuration](#1-basic-user-and-security-configuration)
2. [Docker Environment Installation and Configuration](#2-docker-environment-installation-and-configuration)
3. [NGINX Configuration as a Reverse Proxy](#3-nginx-configuration-as-a-reverse-proxy)
4. [Let's Encrypt Implementation for SSL](#4-lets-encrypt-implementation-for-ssl)
5. [Monitoring Stack Configuration](#5-monitoring-stack-configuration)
6. [File Server Configuration (Nextcloud)](#6-file-server-configuration-nextcloud)
7. [Mail Server Configuration](#7-mail-server-configuration)
8. [Portfolio Configuration](#8-portfolio-configuration)
9. [Additional Information](#9-additional-information)

## 1. Basic User and Security Configuration

### 1.1 Creating a `deployer` User

```bash
# Add a new user
sudo adduser deployer

# Add the user to the sudo group
sudo usermod -aG sudo deployer
```

### 1.2 SSH and sudo Configuration for the New User

```bash
# Switch to the deployer user
sudo su - deployer

# Create the .ssh directory and set appropriate permissions
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Create the authorized_keys file
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Append your public SSH key to authorized_keys (>> keeps any keys already present)
echo "YOUR_SSH_PUBLIC_KEY" >> ~/.ssh/authorized_keys

# Return to the previous user
exit
```

### 1.3 Securing SSH Configuration

```bash
# Edit the SSH configuration file
sudo nano /etc/ssh/sshd_config
```

Modify the following lines:

```
# Disable root login
PermitRootLogin no

# Disable password login (keys only)
PasswordAuthentication no

# SSH port (optional: change 22 to another port and allow that port in the firewall)
Port 22

# Drop unresponsive clients: probe every 300 s, disconnect after 2 missed replies
ClientAliveInterval 300
ClientAliveCountMax 2
```

Restart the SSH service:

```bash
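# Validate the configuration first; a syntax error could lock you out
sudo sshd -t

# On Debian/Ubuntu the unit may be named "ssh" instead of "sshd"
sudo systemctl restart sshd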
```
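
Before closing the current session, it can help to confirm that sshd will actually apply the settings above. The following is a minimal sketch, assuming the `key value` line format printed by `sshd -T`; `check_sshd_setting` is a hypothetical helper name, not part of OpenSSH.

```bash
# Hypothetical helper: succeeds only if KEY is present on stdin with the EXPECTED value.
# Input is "key value" lines, as printed by `sshd -T`; comparison is case-insensitive.
check_sshd_setting() {
  local key="$1" expected="$2"
  awk -v k="$key" -v v="$expected" '
    tolower($1) == tolower(k) {
      found = 1
      if (tolower($2) == tolower(v)) ok = 1
    }
    END { exit (found && ok) ? 0 : 1 }
  '
}
```

For example, `sudo sshd -T | check_sshd_setting permitrootlogin no && echo "root login disabled"`. Keeping a second SSH session open while testing is a cheap safeguard against locking yourself out.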

### 1.4 sudo Configuration for the deployer User

```bash
# Create a sudo configuration file for the deployer user
sudo visudo -f /etc/sudoers.d/deployer
```

Add the following line:

```
deployer ALL=(ALL) NOPASSWD: ALL
```

### 1.5 Firewall Configuration (UFW)

```bash
# Install UFW
sudo apt update
sudo apt install ufw

# Set default rules
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH
sudo ufw allow ssh

# Allow HTTP/HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Enable the firewall
sudo ufw enable

# Check status
sudo ufw status verbose
```
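
Note that `ufw allow ssh` resolves the service name to port 22. If the `Port` directive was changed in section 1.3, the rule has to name that port instead. A small sketch, assuming a single `Port` line in the config; `sshd_port` is a hypothetical helper:

```bash
# Hypothetical helper: print the first uncommented Port value from sshd_config text
# on stdin, falling back to the default 22 when no Port directive is present.
sshd_port() {
  awk '
    tolower($1) == "port" { print $2; found = 1; exit }
    END { if (!found) print 22 }
  '
}
```

Usage under those assumptions: `sudo ufw allow "$(sshd_port < /etc/ssh/sshd_config)/tcp"`, run before `sudo ufw enable`.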

## 2. Docker Environment Installation and Configuration

### 2.1 Docker Engine Installation

```bash
# Update packages
sudo apt update

# Install required packages
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common

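# Add Docker's official GPG key (apt-key is deprecated, so store it as a keyring file)
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add Docker repository, pinned to that key
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null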

# Update package list
sudo apt update

# Install Docker
sudo apt install -y docker-ce docker-ce-cli containerd.io
```

### 2.2 Docker Compose Installation

```bash
# Install Docker Compose (v2 release assets use a lowercase OS name, e.g. docker-compose-linux-x86_64)
sudo curl -L "https://github.com/docker/compose/releases/download/v2.20.3/docker-compose-linux-$(uname -m)" -o /usr/local/bin/docker-compose

# Set execute permissions
sudo chmod +x /usr/local/bin/docker-compose

# Check version
docker-compose --version
```

### 2.3 Docker Permissions Configuration for the User

```bash
# Add the user to the docker group
sudo usermod -aG docker $USER
sudo usermod -aG docker deployer

# Apply changes (log out and log in again)
# or run the following command
newgrp docker
```

### 2.4 Creating Directory Structure for Containers

```bash
# Create main Docker directories
sudo mkdir -p /opt/docker/monitoring/grafana
sudo mkdir -p /opt/docker/monitoring/loki
sudo mkdir -p /opt/docker/monitoring/prometheus
sudo mkdir -p /opt/docker/file
sudo mkdir -p /opt/docker/mail
sudo mkdir -p /opt/docker/apps/portfolio

# Change directory owner
sudo chown -R $USER:$USER /opt/docker

# Set appropriate permissions
sudo chmod -R 755 /opt/docker
```

## 3. NGINX Configuration as a Reverse Proxy

### 3.1 NGINX Installation

```bash
# Update packages
sudo apt update

# Install NGINX
sudo apt install -y nginx

# Start NGINX and enable autostart
sudo systemctl start nginx
sudo systemctl enable nginx
```

### 3.2 Basic NGINX Configuration

```bash
# Create directories for configuration
sudo mkdir -p /etc/nginx/sites-available
sudo mkdir -p /etc/nginx/sites-enabled
sudo mkdir -p /etc/nginx/snippets

# Edit the main configuration file
sudo nano /etc/nginx/nginx.conf
```

Paste the following configuration:

```nginx
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 1024;
    multi_accept on;
}

http {
    # Basic settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    # MIME
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logs
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    # Gzip
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Including configurations
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;

    # Proxy timeout limits
    proxy_connect_timeout 300;
    proxy_send_timeout 300;
    proxy_read_timeout 300;
    send_timeout 300;
}
```

### 3.3 Preparing SSL Snippets

```bash
# Creating SSL snippet
sudo nano /etc/nginx/snippets/ssl-params.conf
```

Paste the following configuration:

```nginx
# SSL/TLS protocols
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;

# Diffie-Hellman parameters
ssl_dhparam /etc/nginx/dhparam.pem;

# SSL sessions
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off;

# HSTS (15768000 seconds = 6 months)
add_header Strict-Transport-Security "max-age=15768000; includeSubDomains; preload";

# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;

# DNS resolver
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;

# Additional security headers
add_header X-Frame-Options SAMEORIGIN;
add_header X-Content-Type-Options nosniff;
add_header X-XSS-Protection "1; mode=block";
```
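
The HSTS comment in the snippet equates 15768000 seconds with six months. A quick way to check that figure is shell arithmetic, assuming a 365-day year; `days_to_seconds` is a hypothetical helper:

```bash
# Hypothetical helper: convert whole days to seconds (1 day = 86400 s).
days_to_seconds() {
  local days="$1"
  echo $(( days * 86400 ))
}

# Half of a 365-day year, the value used for max-age in the snippet.
half_year_seconds() {
  echo $(( $(days_to_seconds 365) / 2 ))
}
```

`half_year_seconds` prints `15768000`, matching the header. If you intend to submit the domain to a browser preload list, check the list's current requirements first: a `max-age` of a full year (`days_to_seconds 365`, i.e. `31536000`) is commonly expected alongside the `preload` flag.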

### 3.4 Generating Strong Diffie-Hellman Parameters

```bash
# Generating Diffie-Hellman parameters
sudo openssl dhparam -out /etc/nginx/dhparam.pem 2048
```

### 3.5 Creating Default Domain Configuration

```bash
# Creating configuration for the main domain
sudo nano /etc/nginx/sites-available/example.com  # Change to your domain
```

Paste the following configuration (change the domain to yours):

```nginx
server {
    listen 80;
    listen [::]:80;

    server_name example.com www.example.com;  # Change to your domain

    # Redirect to HTTPS (will be activated after SSL configuration)
    # return 301 https://$host$request_uri;

    location / {
        root /var/www/html;
        index index.html index.htm;
    }
}
```

```bash
# Create a directory for HTML files
sudo mkdir -p /var/www/html

# Create a basic index.html file
sudo nano /var/www/html/index.html
```

Add a simple HTML file:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Site Under Construction</title>
    <style>
      body {
        font-family: Arial, sans-serif;
        text-align: center;
        padding: 50px;
      }
      h1 {
        color: #333;
      }
    </style>
  </head>
  <body>
    <h1>Site Under Construction</h1>
    <p>Server is working correctly. Site is being configured.</p>
  </body>
</html>
```

Activate the configuration:

```bash
sudo ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/  # Change to your domain
sudo rm -f /etc/nginx/sites-enabled/default  # Removing default configuration
sudo nginx -t  # Checking syntax
sudo systemctl reload nginx  # Reloading configuration
```

## 4. Let's Encrypt Implementation for SSL

### 4.1 Certbot Installation

```bash
# Update packages
sudo apt update

# Install Certbot and NGINX plugin
sudo apt install -y certbot python3-certbot-nginx
```

### 4.2 Preparing Configuration for Subdomains

Before obtaining certificates, create a basic NGINX configuration for all subdomains:

```bash
# Create a configuration file for all subdomains
sudo nano /etc/nginx/sites-available/subdomains.example.com  # Change to your domain
```

Paste the following configuration (adjust domain names):

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name prometheus.example.com loki.example.com grafana.example.com mail.example.com files.example.com portfolio.example.com monitoring.example.com;

    location / {
        root /var/www/html;
        index index.html;
    }
}
```

Activate the configuration:

```bash
sudo ln -s /etc/nginx/sites-available/subdomains.example.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```

### 4.3 Obtaining Certificates for Domains

```bash
# Obtaining certificate for the main domain and all subdomains
sudo certbot --nginx -d example.com -d www.example.com -d prometheus.example.com -d loki.example.com -d grafana.example.com -d mail.example.com -d files.example.com -d portfolio.example.com -d monitoring.example.com
```

During the process, you will be asked to:

- Provide an email address for notifications
- Accept the terms of service
- Choose whether to redirect HTTP to HTTPS

### 4.4 Configuring Automatic Certificate Renewal

```bash
# Check if automatic renewal is configured
sudo systemctl status certbot.timer

# Test the renewal process (without actually renewing)
sudo certbot renew --dry-run
```
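
Beyond the dry run, it can be useful to know how many days a certificate has left, for example in a monitoring check. A minimal sketch, assuming GNU `date` (as shipped on Debian/Ubuntu); `days_until` is a hypothetical helper:

```bash
# Hypothetical helper: whole days between NOW (default: current time) and EXPIRY.
# Both arguments are date strings understood by GNU date; the result is truncated.
days_until() {
  local expiry="$1" now="${2:-now}"
  local expiry_s now_s
  expiry_s=$(date -u -d "$expiry" +%s) || return 1
  now_s=$(date -u -d "$now" +%s) || return 1
  echo $(( (expiry_s - now_s) / 86400 ))
}
```

The expiry date can be taken from the certificate itself, e.g. `days_until "$(sudo openssl x509 -enddate -noout -in /etc/letsencrypt/live/example.com/cert.pem | cut -d= -f2)"`. The path assumes the `example.com` lineage used throughout this guide.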

### 4.5 Configuring Subdomains

After obtaining certificates, create detailed configuration files for each subdomain:

```bash
# Monitoring (Grafana, Prometheus, Loki)
sudo nano /etc/nginx/sites-available/monitoring.example.com  # Change to your domain
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name prometheus.example.com loki.example.com grafana.example.com monitoring.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name monitoring.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name grafana.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name prometheus.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
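    include /etc/nginx/snippets/ssl-params.conf;

    # Prometheus is unauthenticated by default; protect it with basic auth
    # (the password file is created in section 5.5)
    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/.htpasswd;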

    location / {
        proxy_pass http://localhost:9090;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name loki.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://localhost:3100;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

```bash
# File server
sudo nano /etc/nginx/sites-available/files.example.com  # Change to your domain
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name files.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name files.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        return 200 "File server placeholder";
    }
}
```

```bash
# Mail server
sudo nano /etc/nginx/sites-available/mail.example.com  # Change to your domain
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name mail.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name mail.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        return 200 "Mail server placeholder";
    }
}
```

```bash
# Portfolio
sudo nano /etc/nginx/sites-available/portfolio.example.com  # Change to your domain
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name portfolio.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name portfolio.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        return 200 "Portfolio placeholder";
    }
}
```

Activate all configurations:

```bash
sudo ln -s /etc/nginx/sites-available/monitoring.example.com /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/files.example.com /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/mail.example.com /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/portfolio.example.com /etc/nginx/sites-enabled/

# Remove temporary configuration file
sudo rm /etc/nginx/sites-enabled/subdomains.example.com

sudo nginx -t
sudo systemctl reload nginx
```
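
Running `ln -s` a second time fails once the link exists, which makes the activation step awkward to repeat. One way around that is a small idempotent helper. `enable_site` is a hypothetical name, and its directory arguments default to the paths used in this guide:

```bash
# Hypothetical helper: (re)create the sites-enabled symlink for one site.
# Args: site name, optional sites-available dir, optional sites-enabled dir.
enable_site() {
  local name="$1"
  local available="${2:-/etc/nginx/sites-available}"
  local enabled="${3:-/etc/nginx/sites-enabled}"

  if [ ! -f "$available/$name" ]; then
    echo "missing: $available/$name" >&2
    return 1
  fi

  # -f replaces an existing link, -n avoids descending into a linked directory
  ln -sfn "$available/$name" "$enabled/$name"
}
```

Run from a root shell, a loop such as `for s in monitoring files mail portfolio; do enable_site "$s.example.com"; done` followed by `nginx -t && systemctl reload nginx` covers the four sites above.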

## 5. Monitoring Stack Configuration

### 5.1 Preparing docker-compose.yml for the Monitoring Stack

```bash
# Navigate to the monitoring directory
cd /opt/docker/monitoring

# Create docker-compose.yml file
nano docker-compose.yml
```

Paste the following configuration:

```yaml
version: '3.8'

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data: {}
  grafana_data: {}
  # loki_data: {}  # Uncomment if you want to use Loki

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - '9090:9090'
    networks:
      - monitoring
    labels:
      org.label-schema.group: 'monitoring'

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - '9100:9100'
    networks:
      - monitoring
    labels:
      org.label-schema.group: 'monitoring'

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - '8080:8080'
    networks:
      - monitoring
    labels:
      org.label-schema.group: 'monitoring'

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=<strong-password> # Change this to a secure password!
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - '3000:3000'
    networks:
      - monitoring
    labels:
      org.label-schema.group: 'monitoring'

  # Loki and Promtail are optional - uncomment if you want to use them
  # loki:
  #   image: grafana/loki:latest
  #   container_name: loki
  #   restart: unless-stopped
  #   volumes:
  #     - ./loki/config.yml:/etc/loki/config.yml
  #     - loki_data:/loki
  #   command: -config.file=/etc/loki/config.yml
  #   ports:
  #     - "3100:3100"
  #   networks:
  #     - monitoring
  #   labels:
  #     org.label-schema.group: "monitoring"
  #
  # promtail:
  #   image: grafana/promtail:latest
  #   container_name: promtail
  #   restart: unless-stopped
  #   volumes:
  #     - /var/log:/var/log
  #     - ./loki/promtail-config.yml:/etc/promtail/config.yml
  #   command: -config.file=/etc/promtail/config.yml
  #   networks:
  #     - monitoring
  #   labels:
  #     org.label-schema.group: "monitoring"
```

### 5.2 Preparing Prometheus Configuration

```bash
# Create directory for Prometheus configuration
mkdir -p /opt/docker/monitoring/prometheus

# Create Prometheus configuration file
nano /opt/docker/monitoring/prometheus/prometheus.yml
```

Paste the following configuration:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
```

### 5.3 Preparing Grafana Configuration

```bash
# Create directories for Grafana configuration
mkdir -p /opt/docker/monitoring/grafana/provisioning/datasources
mkdir -p /opt/docker/monitoring/grafana/provisioning/dashboards

# Create data sources configuration file
nano /opt/docker/monitoring/grafana/provisioning/datasources/datasource.yml
```

Paste the following configuration:

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
```

### 5.4 Starting the Monitoring Stack

```bash
# Start containers
cd /opt/docker/monitoring
docker-compose up -d
```
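
Containers can take a few seconds to become ready after `up -d`. A generic retry helper is one way to wait for them before moving on. This is a sketch, and `wait_until` is a hypothetical name:

```bash
# Hypothetical helper: run a command until it succeeds.
# Args: max attempts, delay in seconds between attempts, then the command to run.
wait_until() {
  local retries="$1" delay="$2"
  shift 2
  local attempt
  for (( attempt = 1; attempt <= retries; attempt++ )); do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    # Pause only when another attempt will follow
    if (( attempt < retries )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

With the ports published in the compose file above, `wait_until 15 2 curl -fsS http://localhost:9090/-/healthy` probes Prometheus and `wait_until 15 2 curl -fsS http://localhost:3000/api/health` probes Grafana.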

### 5.5 Securing Access to Prometheus

```bash
# Install password generation tool
sudo apt install -y apache2-utils

# Create password file (replace 'admin_user' with your own username; you will be prompted for the password)
sudo htpasswd -c /etc/nginx/.htpasswd admin_user
```

## 6. File Server Configuration (Nextcloud)

### 6.1 Preparing docker-compose.yml for Nextcloud

```bash
# Navigate to the Nextcloud directory
cd /opt/docker/file

# Create docker-compose.yml file
nano docker-compose.yml
```

Paste the following configuration:

```yaml
version: '3'

volumes:
  nextcloud_data:
  nextcloud_db:

services:
  db:
    image: mariadb:10.6
    container_name: nextcloud-db
    command: --transaction-isolation=READ-COMMITTED --log-bin=binlog --binlog-format=ROW
    restart: always
    volumes:
      - nextcloud_db:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=<strong-db-root-password> # Change this!
      - MYSQL_PASSWORD=<strong-db-password> # Change this!
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
    networks:
      - nextcloud_network

  app:
    image: nextcloud:stable
    container_name: nextcloud-app
    restart: always
    depends_on:
      - db
    volumes:
      - nextcloud_data:/var/www/html
    environment:
      - MYSQL_PASSWORD=<strong-db-password> # Change this!
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
      - MYSQL_HOST=db
      - NEXTCLOUD_TRUSTED_DOMAINS=files.example.com # Change to your domain
      - NEXTCLOUD_ADMIN_USER=admin
      - NEXTCLOUD_ADMIN_PASSWORD=<strong-admin-password> # Change this!
    networks:
      - nextcloud_network
    ports:
      - 8081:80 # We use port 8081 because 8080 is already used by cAdvisor

networks:
  nextcloud_network:
```
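
The `<strong-...>` placeholders above need real values. One option is to generate them from `/dev/urandom` as hex strings, which contain no characters that Compose would treat specially (a literal `$` in a value triggers variable interpolation). A sketch; `gen_secret` is a hypothetical helper:

```bash
# Hypothetical helper: print a random hex string built from N random bytes
# (default 24 bytes, i.e. 48 hex characters).
gen_secret() {
  local bytes="${1:-24}"
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}
```

Call it once per placeholder, e.g. `gen_secret 24`, and remember that `MYSQL_PASSWORD` must be identical in the `db` and `app` services.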

### 6.2 NGINX Configuration as a Reverse Proxy for Nextcloud

```bash
# Edit configuration file for the file server
sudo nano /etc/nginx/sites-available/files.example.com  # Change to your domain
```

Update the configuration:

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name files.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name files.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    # Nextcloud specific settings
    add_header Strict-Transport-Security "max-age=15768000; includeSubDomains; preload;" always;
    add_header Referrer-Policy "no-referrer" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Download-Options "noopen" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Permitted-Cross-Domain-Policies "none" always;
    add_header X-Robots-Tag "noindex, nofollow" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Disable size limits for file uploads
    client_max_body_size 0;

    location / {
        proxy_pass http://localhost:8081;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_buffering off;
        proxy_request_buffering off;

        # Increase timeouts for long-running operations
        proxy_connect_timeout 3600s;
        proxy_send_timeout 3600s;
        proxy_read_timeout 3600s;
    }
}
```

### 6.3 Starting Nextcloud

```bash
# Start containers
cd /opt/docker/file
docker-compose up -d
```

Check Nextcloud at https://files.example.com and log in using:

- Username: admin
- Password: <strong-admin-password> (or the one you set in docker-compose.yml)

## 7. Mail Server Configuration

### 7.1 Preparing docker-compose.yml for the Mail Server

```bash
# Navigate to the mail server directory
cd /opt/docker/mail

# Create docker-compose.yml file
nano docker-compose.yml
```

Paste the following configuration:

```yaml
version: '3'

networks:
  mail_network:
    driver: bridge

services:
  mailserver:
    image: docker.io/mailserver/docker-mailserver:latest
    container_name: mailserver
    hostname: mail.example.com # Change to your domain
    domainname: example.com # Change to your domain
    ports:
      - '25:25' # SMTP
      - '143:143' # IMAP
      - '587:587' # Submission
      - '993:993' # IMAPS
    volumes:
      - ./mail-data:/var/mail
      - ./mail-state:/var/mail-state
      - ./config:/tmp/docker-mailserver/
      - /etc/localtime:/etc/localtime:ro
    environment:
      - ENABLE_SPAMASSASSIN=1
      - ENABLE_CLAMAV=1
      - ENABLE_FAIL2BAN=1
      - SSL_TYPE=manual
      - SSL_CERT_PATH=/tmp/docker-mailserver/cert/cert.pem
      - SSL_KEY_PATH=/tmp/docker-mailserver/cert/privkey.pem
      - PERMIT_DOCKER=connected-networks
    networks:
      mail_network:
    restart: always

  webmail:
    image: roundcube/roundcubemail:latest
    container_name: roundcube
    depends_on:
      - mailserver
    environment:
      - ROUNDCUBEMAIL_DEFAULT_HOST=mailserver
      - ROUNDCUBEMAIL_DEFAULT_PORT=143
      - ROUNDCUBEMAIL_SMTP_SERVER=mailserver
      - ROUNDCUBEMAIL_SMTP_PORT=587
    ports:
      - '8082:80'
    networks:
      mail_network:
    restart: always
```

### 7.2 Preparing Certificates for the Mail Server

```bash
# Create directory structure
mkdir -p /opt/docker/mail/config/cert

# Copy certificates
sudo cp /etc/letsencrypt/live/example.com/fullchain.pem /opt/docker/mail/config/cert/cert.pem
sudo cp /etc/letsencrypt/live/example.com/privkey.pem /opt/docker/mail/config/cert/privkey.pem

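# Set appropriate permissions (the directory needs the execute bit to stay traversable)
sudo chown -R $USER:$USER /opt/docker/mail
sudo chmod 700 /opt/docker/mail/config/cert
sudo chmod 600 /opt/docker/mail/config/cert/*.pem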
```
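
The copies made above do not follow certificate renewals, so the mail server would keep serving the old certificate after certbot renews it. One possible approach is a certbot deploy hook, i.e. an executable script placed in `/etc/letsencrypt/renewal-hooks/deploy/`. The sketch below assumes the paths and the `mailserver` container name used in this guide; `sync_mail_certs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical deploy hook: copy renewed certificate files into the mail server config.

# Args: certbot lineage directory, destination directory.
sync_mail_certs() {
  local lineage="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # install copies the file contents (following symlinks) and sets the mode in one step
  install -m 600 "$lineage/fullchain.pem" "$dest/cert.pem" || return 1
  install -m 600 "$lineage/privkey.pem" "$dest/privkey.pem" || return 1
}

# certbot sets RENEWED_LINEAGE when it runs deploy hooks; do nothing otherwise.
if [ -n "${RENEWED_LINEAGE:-}" ]; then
  sync_mail_certs "$RENEWED_LINEAGE" /opt/docker/mail/config/cert \
    && docker restart mailserver
fi
```

Hooks in that directory run for every renewed certificate, so if more lineages are added later the script may need a check on `RENEWED_LINEAGE` before copying.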

### 7.3 Starting the Mail Server and Configuring Accounts

```bash
# Create required directories
mkdir -p /opt/docker/mail/mail-data
mkdir -p /opt/docker/mail/mail-state
mkdir -p /opt/docker/mail/mail-logs
mkdir -p /opt/docker/mail/config

# Start containers
cd /opt/docker/mail
docker-compose up -d

# Download the setup.sh wrapper (optional; the commands below call `setup` inside the container directly)
curl -L https://raw.githubusercontent.com/docker-mailserver/docker-mailserver/master/setup.sh -o setup.sh
chmod +x setup.sh

# Create an email account (example)
docker exec -it mailserver setup email add admin@example.com  # You will be prompted for a password
```

### 7.4 DKIM Configuration

```bash
# Generate DKIM keys
docker exec -it mailserver setup config dkim

# Display the generated DKIM key
docker exec -it mailserver cat /tmp/docker-mailserver/opendkim/keys/example.com/mail.txt
```
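
The generated `mail.txt` is written in zone-file style, with the key split across several quoted strings, while most DNS panels expect a single value. A minimal sketch for joining the pieces, assuming that quoted-string layout; `dkim_txt_value` is a hypothetical helper:

```bash
# Hypothetical helper: read a zone-file style DKIM record on stdin and print
# the concatenated contents of all double-quoted strings as one line.
dkim_txt_value() {
  grep -o '"[^"]*"' | tr -d '"\n'
  echo
}
```

Usage under those assumptions: `docker exec mailserver cat /tmp/docker-mailserver/opendkim/keys/example.com/mail.txt | dkim_txt_value`. The `-t` flag is left out on purpose so that no terminal control characters end up in the piped output.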

### 7.5 NGINX Configuration as a Reverse Proxy for Roundcube

```bash
# Edit configuration file for mail.example.com
sudo nano /etc/nginx/sites-available/mail.example.com  # Change to your domain
```

Paste the updated configuration:

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name mail.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name mail.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:8082;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

### 7.6 DNS Configuration for Mail

In your domain's DNS management panel, add the following records:

1. **MX Record**:
   - Type: MX
   - Name: @ (or your domain)
   - Value: mail.example.com
   - Priority: 10

2. **SPF Record**:
   - Type: TXT
   - Name: @ (or your domain)
   - Value: `v=spf1 a mx ip4:<your-server-ip> -all`

3. **DKIM Record**:
   - Type: TXT
   - Name: mail.\_domainkey
   - Value: copy from the output of `cat /tmp/docker-mailserver/opendkim/keys/example.com/mail.txt`

4. **DMARC Record**:
   - Type: TXT
   - Name: \_dmarc
   - Value: `v=DMARC1; p=none; rua=mailto:admin@example.com; ruf=mailto:admin@example.com; pct=100`

## 8. Portfolio Configuration

### 8.1 Directory Structure

```bash
# Create directory for the application
mkdir -p /opt/docker/apps/portfolio
```

### 8.2 Docker Configuration for the Portfolio

```bash
# Navigate to the portfolio directory
cd /opt/docker/apps/portfolio

# Create docker-compose.yml file
nano docker-compose.yml
```

Content of the docker-compose.yml file:

```yaml
version: '3.8'
services:
  portfolio:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - '3001:80'
    restart: always
```

### 8.3 Creating Dockerfile for React Application

```bash
# Create Dockerfile
nano Dockerfile
```

```dockerfile
FROM node:18 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
```

### 8.4 Preparing NGINX Configuration for the Container

```bash
# Create nginx.conf file
nano nginx.conf
```

```nginx
server {
    listen 80;

    location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
        try_files $uri $uri/ /index.html;
    }
}
```

### 8.5 NGINX Configuration as a Proxy for Portfolio

```bash
# Edit NGINX configuration file for portfolio
sudo nano /etc/nginx/sites-available/portfolio.example.com  # Change to your domain
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name portfolio.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name portfolio.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:3001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
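
The reverse-proxy blocks for mail (port 8082) and the portfolio (port 3001) differ only in server name and upstream port, so they can be produced from one template. A minimal sketch, assuming the shared certificate path and the `ssl-params.conf` snippet used throughout this guide; `render_proxy_site` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# render_proxy_site: hypothetical helper that prints an NGINX config to stdout.
# Arguments: <server_name> <local_port> [cert_domain, default example.com]
render_proxy_site() {
  local name="$1" port="$2" cert="${3:-example.com}"
  # NGINX variables are escaped (\$host) so the shell leaves them untouched.
  cat <<EOF
server {
    listen 80;
    listen [::]:80;
    server_name ${name};
    return 301 https://\$host\$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name ${name};

    ssl_certificate /etc/letsencrypt/live/${cert}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${cert}/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:${port};
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}

# Example: write the portfolio block from this section
# render_proxy_site portfolio.example.com 3001 | sudo tee /etc/nginx/sites-available/portfolio.example.com
```

Generating the file this way keeps the per-service blocks identical apart from the two values that actually change.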

### 8.6 Starting the Portfolio

```bash
# Test the NGINX configuration, then reload it
sudo nginx -t
sudo systemctl reload nginx

# Start portfolio container
cd /opt/docker/apps/portfolio
docker-compose up -d
```
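
Right after `docker-compose up -d` the container may need a moment before it answers, and NGINX returns 502 while the upstream port is not accepting connections. A small sketch of a generic retry loop for checking the service; `retry` is a hypothetical helper, and the commented example assumes `curl` is installed:

```bash
#!/usr/bin/env bash
# retry: hypothetical helper that runs a command until it succeeds
# or the attempts run out.
# Arguments: <attempts> <delay_seconds> <command...>
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: wait up to about 30 seconds for the portfolio container on port 3001
# retry 10 3 curl -fsS -o /dev/null http://localhost:3001/
```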

## 9. Additional Information

### 9.1 Security Notes

1. **Regularly update the system and applications**:

   ```bash
   sudo apt update && sudo apt upgrade -y
   ```

2. **Monitor system logs**:

   ```bash
   sudo journalctl -f
   ```

3. **Create backups of important data**:
   - SSL certificates
   - NGINX configuration
   - Docker container data
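
The backup targets listed above can be collected into one dated archive. A minimal sketch, not a full backup strategy: `backup_paths` and the destination `/var/backups/server` are hypothetical, and Docker named volumes live under Docker's own data directory (by default `/var/lib/docker/volumes`) rather than `/opt/docker`, so they would need to be added separately:

```bash
#!/usr/bin/env bash
# backup_paths: hypothetical helper that archives existing paths into a dated tarball.
# Arguments: <destination_dir> <path...>; prints the archive path on success.
backup_paths() {
  local dest="$1" archive p
  shift
  local existing=()
  for p in "$@"; do
    # Skip paths that are missing on this host instead of failing the whole run.
    if [ -e "$p" ]; then
      existing+=("$p")
    fi
  done
  if [ "${#existing[@]}" -eq 0 ]; then
    echo "nothing to back up" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  archive="$dest/backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" "${existing[@]}" || return 1
  echo "$archive"
}

# Example (run as root; paths as used in this guide):
# backup_paths /var/backups/server /etc/letsencrypt /etc/nginx /opt/docker
```

Copying the resulting archive off the server is what turns it into a real backup.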

### 9.2 Troubleshooting

1. **Checking service status**:

   ```bash
   systemctl status nginx
   docker ps
   ```

2. **Checking logs**:

   ```bash
   docker logs [container_name]
   sudo tail -f /var/log/nginx/error.log
   ```

3. **Testing NGINX configuration**:

   ```bash
   sudo nginx -t
   ```

4. **Restarting services**:

   ```bash
   sudo systemctl restart nginx
   docker-compose down && docker-compose up -d
   ```

### 9.3 Useful Commands

- **Updating Docker containers**:

  ```bash
  docker-compose pull
  docker-compose up -d
  ```

- **Cleaning unused Docker images**:

  ```bash
  docker system prune -a
  ```

- **Renewing SSL certificates**:

  ```bash
  sudo certbot renew
  ```

- **Adding new email accounts**:

  ```bash
  docker exec -it mailserver setup email add [email]
  ```

- **Checking disk usage**:

  ```bash
  df -h
  du -sh /opt/docker/*
  ```
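
One caveat about renewing SSL certificates: section 7.2 copies the certificate files into `/opt/docker/mail/config/cert`, so `certbot renew` alone leaves the mail server with the old copy. A sketch of a helper that refreshes the copy; `sync_mail_certs` is a hypothetical name, and wiring it in as a certbot deploy hook is an assumption about how you run renewals:

```bash
#!/usr/bin/env bash
# sync_mail_certs: hypothetical helper that copies renewed Let's Encrypt files
# to the file names docker-mailserver was configured with in section 7.1.
# Arguments: <live_dir> <mail_cert_dir>
sync_mail_certs() {
  local live="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # -L follows the symlinks that certbot keeps in the live/ directory.
  cp -L "$live/fullchain.pem" "$dest/cert.pem" || return 1
  cp -L "$live/privkey.pem" "$dest/privkey.pem" || return 1
  chmod 600 "$dest/cert.pem" "$dest/privkey.pem"
}

# Example deploy hook body (assumption: saved as an executable script and passed
# to certbot with --deploy-hook, or placed in /etc/letsencrypt/renewal-hooks/deploy/):
# sync_mail_certs /etc/letsencrypt/live/example.com /opt/docker/mail/config/cert \
#   && docker restart mailserver
```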

This guide covers a complete OVH server configuration: basic system setup, installation and configuration of each service, security measures, and troubleshooting of common issues.

---


## OVH - od zera do eksperta

**URL:** https://portfolio.sdet.pl/articles/ovh-server-zero-to-hero
**Published:** 2025-03-27
**Language:** pl
Tags: ovh, vps, infrastructure, devops

Kompleksowa instrukcja konfiguracji serwera OVH VPS - od czystej instalacji do produkcyjnej infrastruktury.

Poniżej znajduje się szczegółowa instrukcja konfiguracji serwera OVH od zera, obejmująca wszystkie podstawowe kroki.

## Spis treści

1. [Podstawowa konfiguracja użytkownika i bezpieczeństwa](#1-podstawowa-konfiguracja-użytkownika-i-bezpieczeństwa)
2. [Instalacja i konfiguracja środowiska Docker](#2-instalacja-i-konfiguracja-środowiska-docker)
3. [Konfiguracja NGINX jako reverse proxy](#3-konfiguracja-nginx-jako-reverse-proxy)
4. [Wdrożenie Let's Encrypt dla SSL](#4-wdrożenie-lets-encrypt-dla-ssl)
5. [Konfiguracja stosu monitorowania](#5-konfiguracja-stosu-monitorowania)
6. [Konfiguracja serwera plików (Nextcloud)](#6-konfiguracja-serwera-plików-nextcloud)
7. [Konfiguracja serwera poczty](#7-konfiguracja-serwera-poczty)
8. [Konfiguracja portfolio](#8-konfiguracja-portfolio)
9. [Informacje dodatkowe](#9-informacje-dodatkowe)

## 1. Podstawowa konfiguracja użytkownika i bezpieczeństwa

### 1.1 Utworzenie użytkownika deployer

```bash
# Dodaj nowego użytkownika
sudo adduser deployer

# Dodaj użytkownika do grupy sudo
sudo usermod -aG sudo deployer
```

### 1.2 Konfiguracja SSH i sudo dla nowego użytkownika

```bash
# Przełącz się na użytkownika deployer
sudo su - deployer

# Utwórz katalog .ssh i ustaw odpowiednie uprawnienia
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Utwórz plik authorized_keys
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Dodaj swój klucz publiczny do authorized_keys
echo "TWÓJ_KLUCZ_PUBLICZNY_SSH" >> ~/.ssh/authorized_keys

# Wróć do poprzedniego użytkownika
exit
```

### 1.3 Zabezpieczenie konfiguracji SSH

```bash
# Edytuj plik konfiguracyjny SSH
sudo nano /etc/ssh/sshd_config
```

Zmodyfikuj następujące linie:

```
# Wyłącz logowanie na roota
PermitRootLogin no

# Wyłącz logowanie hasłem (tylko klucze)
PasswordAuthentication no

# Określ port SSH (opcjonalnie możesz zmienić z 22 na inny)
Port 22

# Ograniczenie czasu bezczynności
ClientAliveInterval 300
ClientAliveCountMax 2
```

Zrestartuj usługę SSH:

```bash
sudo systemctl restart sshd
```
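
Przed restartem usługi warto sprawdzić, czy plik rzeczywiście zawiera oczekiwane wartości, ponieważ błąd w `sshd_config` przy wyłączonym logowaniu hasłem może odciąć dostęp do serwera. Poniżej minimalny szkic takiej kontroli. `sshd_option_is` to hipotetyczna funkcja pomocnicza (nie jest częścią OpenSSH) i nie uwzględnia plików dołączanych dyrektywą `Include`:

```bash
#!/usr/bin/env bash
# sshd_option_is: hipotetyczna funkcja pomocnicza.
# Sprawdza, czy pierwsze niezakomentowane wystąpienie opcji ma oczekiwaną wartość
# (sshd stosuje pierwszą napotkaną wartość danej opcji).
# Argumenty: <plik> <opcja> <oczekiwana_wartość>
sshd_option_is() {
  local file="$1" key="$2" expected="$3" actual
  # Nazwy opcji nie rozróżniają wielkości liter, wartości już tak.
  actual=$(awk -v k="$key" 'tolower($1) == tolower(k) { print $2; exit }' "$file")
  [ "$actual" = "$expected" ]
}

# Przykład użycia na serwerze:
# sshd_option_is /etc/ssh/sshd_config PermitRootLogin no && echo "OK"
# sudo sshd -t   # wbudowany test składni sshd; brak wyjścia oznacza poprawną konfigurację
```

Dobrą praktyką jest pozostawienie otwartej bieżącej sesji SSH do czasu potwierdzenia, że nowe logowanie kluczem działa.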

### 1.4 Konfiguracja sudo dla użytkownika deployer

```bash
# Utwórz plik konfiguracyjny sudo dla użytkownika deployer
sudo visudo -f /etc/sudoers.d/deployer
```

Dodaj następującą linię:

```
deployer ALL=(ALL) NOPASSWD: ALL
```

### 1.5 Konfiguracja firewall (UFW)

```bash
# Zainstaluj UFW
sudo apt update
sudo apt install ufw

# Ustaw domyślne zasady
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Zezwól na SSH
sudo ufw allow ssh

# Zezwól na HTTP/HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Włącz firewall
sudo ufw enable

# Sprawdź status
sudo ufw status verbose
```

## 2. Instalacja i konfiguracja środowiska Docker

### 2.1 Instalacja Docker Engine

```bash
# Aktualizacja pakietów
sudo apt update

# Instalacja wymaganych pakietów
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common

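# Dodanie oficjalnego klucza GPG Dockera (apt-key jest przestarzałe, klucz trafia do /etc/apt/keyrings)
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Dodanie repozytorium Docker
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null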

# Aktualizacja listy pakietów
sudo apt update

# Instalacja Dockera
sudo apt install -y docker-ce docker-ce-cli containerd.io
```

### 2.2 Instalacja Docker Compose

```bash
# Instalacja Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/download/v2.20.3/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

# Nadanie uprawnień wykonywania
sudo chmod +x /usr/local/bin/docker-compose

# Sprawdzenie wersji
docker-compose --version
```

### 2.3 Konfiguracja uprawnień Docker dla użytkownika

```bash
# Dodanie użytkownika do grupy docker
sudo usermod -aG docker $USER
sudo usermod -aG docker deployer

# Zastosowanie zmian (wylogowanie i zalogowanie ponownie)
# lub uruchomienie poniższej komendy
newgrp docker
```

### 2.4 Utworzenie struktury katalogów dla kontenerów

```bash
# Utworzenie głównych katalogów dla Dockera
sudo mkdir -p /opt/docker/monitoring/grafana
sudo mkdir -p /opt/docker/monitoring/loki
sudo mkdir -p /opt/docker/monitoring/prometheus
sudo mkdir -p /opt/docker/file
sudo mkdir -p /opt/docker/mail
sudo mkdir -p /opt/docker/apps/portfolio

# Zmiana właściciela katalogów
sudo chown -R $USER:$USER /opt/docker

# Nadanie odpowiednich uprawnień
sudo chmod -R 755 /opt/docker
```

## 3. Konfiguracja NGINX jako reverse proxy

### 3.1 Instalacja NGINX

```bash
# Aktualizacja pakietów
sudo apt update

# Instalacja NGINX
sudo apt install -y nginx

# Uruchomienie NGINX i włączenie autostartu
sudo systemctl start nginx
sudo systemctl enable nginx
```

### 3.2 Podstawowa konfiguracja NGINX

```bash
# Tworzenie katalogów dla konfiguracji
sudo mkdir -p /etc/nginx/sites-available
sudo mkdir -p /etc/nginx/sites-enabled
sudo mkdir -p /etc/nginx/snippets

# Edycja głównego pliku konfiguracyjnego
sudo nano /etc/nginx/nginx.conf
```

Wklej poniższą konfigurację:

```nginx
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 1024;
    multi_accept on;
}

http {
    # Podstawowe ustawienia
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    # MIME
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logi
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    # Gzip
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Dołączanie konfiguracji
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;

    # Limity czasowe dla proxy
    proxy_connect_timeout 300;
    proxy_send_timeout 300;
    proxy_read_timeout 300;
    send_timeout 300;
}
```

### 3.3 Przygotowanie snippetów pod SSL

```bash
# Tworzenie snippetu SSL
sudo nano /etc/nginx/snippets/ssl-params.conf
```

Wklej poniższą konfigurację:

```nginx
# Protokoły SSL/TLS
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;

# Parametry Diffie-Hellman
ssl_dhparam /etc/nginx/dhparam.pem;

# Sesje SSL
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off;

# HSTS (15768000 sekund = 6 miesięcy)
add_header Strict-Transport-Security "max-age=15768000; includeSubDomains; preload";

# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;

# Resolver DNS
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;

# Dodatkowe nagłówki bezpieczeństwa
add_header X-Frame-Options SAMEORIGIN;
add_header X-Content-Type-Options nosniff;
add_header X-XSS-Protection "1; mode=block";
```

### 3.4 Generowanie silnych parametrów Diffie-Hellman

```bash
# Generowanie parametrów Diffie-Hellman
sudo openssl dhparam -out /etc/nginx/dhparam.pem 2048
```

### 3.5 Utworzenie domyślnej konfiguracji dla domen

```bash
# Tworzenie konfiguracji dla domeny głównej
sudo nano /etc/nginx/sites-available/example.com  # Zmień na swoją domenę
```

Wklej poniższą konfigurację (zmień domenę na swoją):

```nginx
server {
    listen 80;
    listen [::]:80;

    server_name example.com www.example.com;  # Zmień na swoją domenę

    # Przekierowanie na HTTPS (będzie aktywowane po skonfigurowaniu SSL)
    # return 301 https://$host$request_uri;

    location / {
        root /var/www/html;
        index index.html index.htm;
    }
}
```

```bash
# Utwórz katalog dla plików HTML
sudo mkdir -p /var/www/html

# Utwórz podstawowy plik index.html
sudo nano /var/www/html/index.html
```

Dodaj prosty plik HTML:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Strona w budowie</title>
    <style>
      body {
        font-family: Arial, sans-serif;
        text-align: center;
        padding: 50px;
      }
      h1 {
        color: #333;
      }
    </style>
  </head>
  <body>
    <h1>Strona w budowie</h1>
    <p>Serwer działa poprawnie. Strona w trakcie konfiguracji.</p>
  </body>
</html>
```

Aktywowanie konfiguracji:

```bash
sudo ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/  # Zmień na swoją domenę
sudo rm -f /etc/nginx/sites-enabled/default  # Usunięcie domyślnej konfiguracji
sudo nginx -t  # Sprawdzenie składni
sudo systemctl reload nginx  # Przeładowanie konfiguracji
```

## 4. Wdrożenie Let's Encrypt dla SSL

### 4.1 Instalacja Certbot

```bash
# Aktualizacja pakietów
sudo apt update

# Instalacja Certbot i wtyczki dla NGINX
sudo apt install -y certbot python3-certbot-nginx
```

### 4.2 Przygotowanie konfiguracji dla subdomen

Przed uzyskaniem certyfikatów utwórz podstawową konfigurację NGINX dla wszystkich subdomen:

```bash
# Utwórz plik konfiguracyjny dla wszystkich subdomen
sudo nano /etc/nginx/sites-available/subdomains.example.com  # Zmień na swoją domenę
```

Wklej następującą konfigurację (dostosuj nazwy domen):

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name prometheus.example.com loki.example.com grafana.example.com mail.example.com files.example.com portfolio.example.com monitoring.example.com;

    location / {
        root /var/www/html;
        index index.html;
    }
}
```

Aktywuj konfigurację:

```bash
sudo ln -s /etc/nginx/sites-available/subdomains.example.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```

### 4.3 Pozyskanie certyfikatów dla domen

```bash
# Pozyskanie certyfikatu dla domeny głównej i wszystkich subdomen
sudo certbot --nginx -d example.com -d www.example.com -d prometheus.example.com -d loki.example.com -d grafana.example.com -d mail.example.com -d files.example.com -d portfolio.example.com -d monitoring.example.com
```

Podczas procesu zostaniesz poproszony o:

- Podanie adresu e-mail do powiadomień
- Akceptację warunków korzystania z usługi
- Wybór, czy chcesz przekierowywać HTTP na HTTPS

### 4.4 Konfiguracja automatycznego odnawiania certyfikatów

```bash
# Sprawdzenie czy automatyczne odnawianie jest skonfigurowane
sudo systemctl status certbot.timer

# Test procesu odnowienia (bez faktycznego odnowienia)
sudo certbot renew --dry-run
```

### 4.5 Konfiguracja subdomen

Po uzyskaniu certyfikatów utwórz szczegółowe pliki konfiguracyjne dla każdej subdomeny:

```bash
# Monitoring (Grafana, Prometheus, Loki)
sudo nano /etc/nginx/sites-available/monitoring.example.com  # Zmień na swoją domenę
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name prometheus.example.com loki.example.com grafana.example.com monitoring.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name monitoring.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name grafana.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name prometheus.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:9090;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name loki.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://localhost:3100;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

```bash
# Serwer plików
sudo nano /etc/nginx/sites-available/files.example.com  # Zmień na swoją domenę
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name files.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name files.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        return 200 "File server placeholder";
    }
}
```

```bash
# Serwer poczty
sudo nano /etc/nginx/sites-available/mail.example.com  # Zmień na swoją domenę
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name mail.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name mail.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        return 200 "Mail server placeholder";
    }
}
```

```bash
# Portfolio
sudo nano /etc/nginx/sites-available/portfolio.example.com  # Zmień na swoją domenę
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name portfolio.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name portfolio.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        return 200 "Portfolio placeholder";
    }
}
```

Aktywuj wszystkie konfiguracje:

```bash
sudo ln -s /etc/nginx/sites-available/monitoring.example.com /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/files.example.com /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/mail.example.com /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/portfolio.example.com /etc/nginx/sites-enabled/

# Usuń tymczasowy plik konfiguracyjny
sudo rm /etc/nginx/sites-enabled/subdomains.example.com

sudo nginx -t
sudo systemctl reload nginx
```
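
Krok aktywacji powtarza się dla każdej subdomeny, więc można go zapisać jako pętlę. Poniżej minimalny szkic przy założeniu układu katalogów z tego przewodnika; `enable_sites` to nazwa hipotetyczna:

```bash
#!/usr/bin/env bash
# enable_sites: hipotetyczna funkcja pomocnicza.
# Tworzy dowiązania symboliczne w katalogu sites-enabled dla podanych nazw plików.
# Argumenty: <katalog_available> <katalog_enabled> <nazwa...>
enable_sites() {
  local available="$1" enabled="$2" site
  shift 2
  for site in "$@"; do
    if [ ! -f "$available/$site" ]; then
      echo "brak pliku: $available/$site" >&2
      return 1
    fi
    # -sfn nadpisuje istniejące dowiązanie, więc funkcję można uruchamiać wielokrotnie.
    ln -sfn "$available/$site" "$enabled/$site"
  done
}

# Przykład użycia (wymaga uprawnień roota):
# enable_sites /etc/nginx/sites-available /etc/nginx/sites-enabled \
#   monitoring.example.com files.example.com mail.example.com portfolio.example.com \
#   && nginx -t && systemctl reload nginx
```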

## 5. Konfiguracja stosu monitorowania

### 5.1 Przygotowanie pliku docker-compose.yml dla stosu monitorowania

```bash
# Przejdź do katalogu monitorowania
cd /opt/docker/monitoring

# Utwórz plik docker-compose.yml
nano docker-compose.yml
```

Wklej poniższą konfigurację:

```yaml
version: '3.8'

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data: {}
  grafana_data: {}
  # loki_data: {}  # Odkomentuj, jeśli chcesz używać Loki

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - '9090:9090'
    networks:
      - monitoring
    labels:
      org.label-schema.group: 'monitoring'

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - '9100:9100'
    networks:
      - monitoring
    labels:
      org.label-schema.group: 'monitoring'

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - '8080:8080'
    networks:
      - monitoring
    labels:
      org.label-schema.group: 'monitoring'

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=<strong-password> # Zmień to na bezpieczne hasło!
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - '3000:3000'
    networks:
      - monitoring
    labels:
      org.label-schema.group: 'monitoring'

  # Loki i Promtail są opcjonalne - odkomentuj jeśli chcesz ich używać
  # loki:
  #   image: grafana/loki:latest
  #   container_name: loki
  #   restart: unless-stopped
  #   volumes:
  #     - ./loki/config.yml:/etc/loki/config.yml
  #     - loki_data:/loki
  #   command: -config.file=/etc/loki/config.yml
  #   ports:
  #     - "3100:3100"
  #   networks:
  #     - monitoring
  #   labels:
  #     org.label-schema.group: "monitoring"
  #
  # promtail:
  #   image: grafana/promtail:latest
  #   container_name: promtail
  #   restart: unless-stopped
  #   volumes:
  #     - /var/log:/var/log
  #     - ./loki/promtail-config.yml:/etc/promtail/config.yml
  #   command: -config.file=/etc/promtail/config.yml
  #   networks:
  #     - monitoring
  #   labels:
  #     org.label-schema.group: "monitoring"
```

### 5.2 Przygotowanie konfiguracji Prometheus

```bash
# Utwórz katalog dla konfiguracji Prometheus
mkdir -p /opt/docker/monitoring/prometheus

# Utwórz plik konfiguracyjny Prometheus
nano /opt/docker/monitoring/prometheus/prometheus.yml
```

Wklej poniższą konfigurację:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
```

### 5.3 Przygotowanie konfiguracji Grafana

```bash
# Utwórz katalogi dla konfiguracji Grafana
mkdir -p /opt/docker/monitoring/grafana/provisioning/datasources
mkdir -p /opt/docker/monitoring/grafana/provisioning/dashboards

# Utwórz plik konfiguracyjny źródeł danych
nano /opt/docker/monitoring/grafana/provisioning/datasources/datasource.yml
```

Wklej poniższą konfigurację:

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
```

### 5.4 Uruchomienie stosu monitorowania

```bash
# Uruchom kontenery
cd /opt/docker/monitoring
docker-compose up -d
```

### 5.5 Zabezpieczenie dostępu do Prometheus

```bash
# Zainstaluj narzędzie do generowania haseł
sudo apt install -y apache2-utils

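# Utwórz plik z hasłami (zastąp 'admin_user' własną nazwą użytkownika; o hasło zostaniesz poproszony)
sudo htpasswd -c /etc/nginx/.htpasswd admin_user
```

Plik `/etc/nginx/.htpasswd` jest używany przez dyrektywy `auth_basic` i `auth_basic_user_file` w bloku `loki.example.com` z sekcji 4.5. Aby chronić hasłem również Prometheus, dodaj te same dwie dyrektywy do bloku `prometheus.example.com`.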

## 6. Konfiguracja serwera plików (Nextcloud)

### 6.1 Przygotowanie docker-compose.yml dla Nextcloud

```bash
# Utwórz katalog dla Nextcloud
cd /opt/docker/file

# Utwórz plik docker-compose.yml
nano docker-compose.yml
```

Wklej poniższą konfigurację:

```yaml
version: '3'

volumes:
  nextcloud_data:
  nextcloud_db:

services:
  db:
    image: mariadb:10.6
    container_name: nextcloud-db
    command: --transaction-isolation=READ-COMMITTED --log-bin=binlog --binlog-format=ROW
    restart: always
    volumes:
      - nextcloud_db:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=<strong-db-root-password> # Zmień to!
      - MYSQL_PASSWORD=<strong-db-password> # Zmień to!
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
    networks:
      - nextcloud_network

  app:
    image: nextcloud:stable
    container_name: nextcloud-app
    restart: always
    depends_on:
      - db
    volumes:
      - nextcloud_data:/var/www/html
    environment:
      - MYSQL_PASSWORD=<strong-db-password> # Zmień to!
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
      - MYSQL_HOST=db
      - NEXTCLOUD_TRUSTED_DOMAINS=files.example.com # Zmień na swoją domenę
      - NEXTCLOUD_ADMIN_USER=admin
      - NEXTCLOUD_ADMIN_PASSWORD=<strong-admin-password> # Zmień to!
    networks:
      - nextcloud_network
    ports:
      - 8081:80 # Używamy portu 8081, ponieważ 8080 jest już używany przez cAdvisor

networks:
  nextcloud_network:
```
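
Pliki docker-compose.yml w tym przewodniku zawierają symbole zastępcze w rodzaju `<strong-db-password>`. Poniższy szkic pokazuje jeden ze sposobów wygenerowania wartości do ich podstawienia; `gen_password` to hipotetyczna funkcja pomocnicza, a długość 32 znaków jest przyjętym tu założeniem:

```bash
#!/usr/bin/env bash
# gen_password: hipotetyczna funkcja pomocnicza.
# Wypisuje losowe hasło złożone z liter i cyfr o podanej długości (domyślnie 32 znaki).
gen_password() {
  local length="${1:-32}"
  # Pobieramy z zapasem losowe bajty z /dev/urandom, tr zostawia tylko znaki
  # alfanumeryczne, a drugi head przycina wynik do żądanej długości.
  head -c "$((length * 16))" /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c "$length"
  echo
}

# Przykład: wygeneruj osobne hasła dla roota bazy, użytkownika bazy i administratora
# gen_password 32
```

Hasło użytkownika bazy musi być identyczne w sekcjach `db` i `app`, dlatego wygeneruj je raz i wklej w oba miejsca.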

### 6.2 Konfiguracja NGINX jako reverse proxy dla Nextcloud

```bash
# Edytuj plik konfiguracyjny dla serwera plików
sudo nano /etc/nginx/sites-available/files.example.com  # Zmień na swoją domenę
```

Zaktualizuj konfigurację:

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name files.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name files.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    # Ustawienia specyficzne dla Nextcloud
    add_header Strict-Transport-Security "max-age=15768000; includeSubDomains; preload;" always;
    add_header Referrer-Policy "no-referrer" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Download-Options "noopen" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Permitted-Cross-Domain-Policies "none" always;
    add_header X-Robots-Tag "noindex, nofollow" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Wyłączenie ograniczeń rozmiaru dla uploadu plików
    client_max_body_size 0;

    location / {
        proxy_pass http://localhost:8081;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_buffering off;
        proxy_request_buffering off;

        # Zwiększenie timeoutów dla długo trwających operacji
        proxy_connect_timeout 3600s;
        proxy_send_timeout 3600s;
        proxy_read_timeout 3600s;
    }
}
```

### 6.3 Uruchomienie Nextcloud

```bash
# Uruchom kontenery
cd /opt/docker/file
docker-compose up -d
```

Sprawdź działanie Nextcloud pod adresem https://files.example.com i zaloguj się używając danych:

- Użytkownik: admin
- Hasło: <strong-admin-password> (lub to, które ustawiłeś w docker-compose.yml)

## 7. Konfiguracja serwera poczty

### 7.1 Przygotowanie docker-compose.yml dla serwera poczty

```bash
# Przejdź do katalogu serwera poczty
cd /opt/docker/mail

# Utwórz plik docker-compose.yml
nano docker-compose.yml
```

Wklej poniższą konfigurację:

```yaml
version: '3'

networks:
  mail_network:
    driver: bridge

services:
  mailserver:
    image: docker.io/mailserver/docker-mailserver:latest
    container_name: mailserver
    hostname: mail.example.com # Zmień na swoją domenę
    domainname: example.com # Zmień na swoją domenę
    ports:
      - '25:25' # SMTP
      - '143:143' # IMAP
      - '587:587' # Submission
      - '993:993' # IMAPS
    volumes:
      - ./mail-data:/var/mail
      - ./mail-state:/var/mail-state
      - ./config:/tmp/docker-mailserver/
      - /etc/localtime:/etc/localtime:ro
    environment:
      - ENABLE_SPAMASSASSIN=1
      - ENABLE_CLAMAV=1
      - ENABLE_FAIL2BAN=1
      - SSL_TYPE=manual
      - SSL_CERT_PATH=/tmp/docker-mailserver/cert/cert.pem
      - SSL_KEY_PATH=/tmp/docker-mailserver/cert/privkey.pem
      - PERMIT_DOCKER=connected-networks
    networks:
      mail_network:
    restart: always

  webmail:
    image: roundcube/roundcubemail:latest
    container_name: roundcube
    depends_on:
      - mailserver
    environment:
      - ROUNDCUBEMAIL_DEFAULT_HOST=mailserver
      - ROUNDCUBEMAIL_DEFAULT_PORT=143
      - ROUNDCUBEMAIL_SMTP_SERVER=mailserver
      - ROUNDCUBEMAIL_SMTP_PORT=587
    ports:
      - '8082:80'
    networks:
      mail_network:
    restart: always
```

### 7.2 Przygotowanie certyfikatów dla serwera poczty

```bash
# Utwórz strukturę katalogów
mkdir -p /opt/docker/mail/config/cert

# Skopiuj certyfikaty
sudo cp /etc/letsencrypt/live/example.com/fullchain.pem /opt/docker/mail/config/cert/cert.pem
sudo cp /etc/letsencrypt/live/example.com/privkey.pem /opt/docker/mail/config/cert/privkey.pem

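# Ustaw odpowiednie uprawnienia (katalog potrzebuje bitu wykonywania, pliki są dostępne tylko dla właściciela)
sudo chown -R $USER:$USER /opt/docker/mail
chmod 700 /opt/docker/mail/config/cert
chmod 600 /opt/docker/mail/config/cert/*.pem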
```

### 7.3 Uruchomienie serwera poczty i konfiguracja kont

```bash
# Utwórz potrzebne katalogi
mkdir -p /opt/docker/mail/mail-data
mkdir -p /opt/docker/mail/mail-state
mkdir -p /opt/docker/mail/mail-logs
mkdir -p /opt/docker/mail/config

# Uruchom kontenery
cd /opt/docker/mail
docker-compose up -d

# Pobierz narzędzie setup.sh
curl -L https://raw.githubusercontent.com/docker-mailserver/docker-mailserver/master/setup.sh -o setup.sh
chmod +x setup.sh

# Utwórz konto e-mail (przykład)
docker exec -it mailserver setup email add admin@example.com  # Zostaniesz poproszony o hasło
```

### 7.4 Konfiguracja DKIM

```bash
# Wygeneruj klucze DKIM
docker exec -it mailserver setup config dkim

# Wyświetl wygenerowany klucz DKIM
docker exec -it mailserver cat /tmp/docker-mailserver/opendkim/keys/example.com/mail.txt
```

### 7.5 Konfiguracja NGINX jako reverse proxy dla Roundcube

```bash
# Edytuj plik konfiguracyjny dla mail.example.com
sudo nano /etc/nginx/sites-available/mail.example.com  # Zmień na swoją domenę
```

Wklej zaktualizowaną konfigurację:

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name mail.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name mail.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:8082;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

### 7.6 Konfiguracja DNS dla poczty

W panelu zarządzania DNS swojej domeny, dodaj następujące rekordy:

1. **Rekord MX**:
   - Typ: MX
   - Nazwa: @ (lub twoja domena)
   - Wartość: mail.example.com
   - Priorytet: 10

2. **Rekord SPF**:
   - Typ: TXT
   - Nazwa: @ (lub twoja domena)
   - Wartość: `v=spf1 a mx ip4:<your-server-ip> -all`

3. **Rekord DKIM**:
   - Typ: TXT
   - Nazwa: `mail._domainkey`
   - Wartość: skopiuj z wyjścia polecenia `cat /tmp/docker-mailserver/opendkim/keys/example.com/mail.txt`

4. **Rekord DMARC**:
   - Typ: TXT
   - Nazwa: `_dmarc`
   - Wartość: `v=DMARC1; p=none; rua=mailto:admin@example.com; ruf=mailto:admin@example.com; pct=100`

## 8. Konfiguracja portfolio

### 8.1 Struktura katalogów

```bash
# Utwórz katalog dla aplikacji
mkdir -p /opt/docker/apps/portfolio
```

### 8.2 Konfiguracja Docker dla portfolio

```bash
# Przejdź do katalogu portfolio
cd /opt/docker/apps/portfolio

# Utwórz plik docker-compose.yml
nano docker-compose.yml
```

Zawartość pliku docker-compose.yml:

```yaml
version: '3.8'
services:
  portfolio:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - '3001:80'
    restart: always
```

### 8.3 Utworzenie Dockerfile dla aplikacji React

```bash
# Utwórz plik Dockerfile
nano Dockerfile
```

```dockerfile
FROM node:18 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
```

### 8.4 Przygotowanie konfiguracji NGINX dla kontenera

```bash
# Utwórz plik nginx.conf
nano nginx.conf
```

```nginx
server {
    listen 80;

    location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
        try_files $uri $uri/ /index.html;
    }
}
```

### 8.5 Konfiguracja NGINX jako proxy dla portfolio

```bash
# Edytuj plik konfiguracyjny NGINX dla portfolio
sudo nano /etc/nginx/sites-available/portfolio.example.com  # Zmień na swoją domenę
```

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name portfolio.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name portfolio.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_pass http://localhost:3001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

### 8.6 Uruchomienie portfolio

```bash
# Przeładuj konfigurację NGINX
sudo nginx -t
sudo systemctl reload nginx

# Uruchom kontener portfolio
cd /opt/docker/apps/portfolio
docker-compose up -d
```

## 9. Informacje dodatkowe

### 9.1 Uwagi dotyczące bezpieczeństwa

1. **Regularnie aktualizuj system i aplikacje**:

   ```bash
   sudo apt update && sudo apt upgrade -y
   ```

2. **Monitoruj logi systemowe**:

   ```bash
   sudo journalctl -f
   ```

3. **Utwórz kopie zapasowe ważnych danych**:
   - Certyfikaty SSL
   - Konfiguracja NGINX
   - Dane z kontenerów Docker

### 9.2 Rozwiązywanie problemów

1. **Sprawdzanie statusu usług**:

   ```bash
   systemctl status nginx
   docker ps
   ```

2. **Sprawdzanie logów**:

   ```bash
   docker logs [nazwa_kontenera]
   sudo tail -f /var/log/nginx/error.log
   ```

3. **Testowanie konfiguracji NGINX**:

   ```bash
   sudo nginx -t
   ```

4. **Restart usług**:
   ```bash
   sudo systemctl restart nginx
   docker-compose down && docker-compose up -d
   ```

### 9.3 Przydatne polecenia

- **Aktualizacja kontenerów Docker**:

  ```bash
  docker-compose pull
  docker-compose up -d
  ```

- **Czyszczenie nieużywanych obrazów Docker**:

  ```bash
  docker system prune -a
  ```

- **Odnowienie certyfikatów SSL**:

  ```bash
  sudo certbot renew
  ```

- **Dodawanie nowych kont e-mail**:

  ```bash
  docker exec -it mailserver setup email add [email]
  ```

- **Sprawdzanie użycia dysku**:
  ```bash
  df -h
  du -sh /opt/docker/*
  ```

Ta instrukcja powinna umożliwić pełną konfigurację serwera na OVH, od podstawowej konfiguracji systemu, przez instalację i konfigurację wszystkich potrzebnych usług, aż po zabezpieczenia i rozwiązywanie typowych problemów.

---


## API Tests Playwright - MAF

**URL:** https://portfolio.sdet.it/articles/api-tests-playwright-maf
**Published:** 2025-03-24
**Language:** en
Tags: playwright, api-testing, typescript

Backend testing in the MAF application - a Playwright-based approach with a scalable structure.

## Introduction

API testing is a key element of quality assurance in modern web applications. In the MAF project (My Invoice Application), we adopted a comprehensive approach to backend testing using Playwright as the testing tool. In this article, I will discuss the implementation of API tests, present the test project structure, and show specific examples of tests with varying levels of complexity.

## Test Architecture

Backend tests in MAF were designed with modularity, code reusability, and clarity in mind. The project structure reflects a logical division into business domains (invoices, contractors) and separate shared components.

```
maf-api-tests/
├── common/
│   ├── api-base.ts         # Base class for all API action classes
│   └── types.ts            # Types and enums shared between modules
├── invoices/
│   ├── actions.ts          # API actions for invoices
│   ├── data.ts             # Test data generators
│   ├── test.ts             # Basic CRUD tests
│   └── complex.test.ts     # Complex test scenarios
└── contractors/
    ├── actions.ts          # API actions for contractors
    ├── data.ts             # Test data generators
    └── test.ts             # CRUD tests
```

This division keeps the tests easy to manage and read, and makes it quick to extend the suite with new functionality.

## Layered Approach

A key element of our test architecture is the division into four layers:

1. **Base Class** - provides functionality shared by all action classes
2. **Action Classes** - implement methods for interacting with specific APIs
3. **Data Generators** - provide test data
4. **Tests** - use the above elements to write test scenarios

### 1. Base Class (ApiBase)

The `ApiBase` class serves as the foundation for all other action classes, providing HTTP response handling and result formatting.

```typescript
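import type { APIRequestContext, APIResponse } from '@playwright/test';

export class ApiBase {
  protected readonly request: APIRequestContext;
  protected readonly baseUrl: string;

  constructor(request: APIRequestContext, baseUrl: string) {
    this.request = request;
    this.baseUrl = baseUrl;
  }

  // Normalizes any HTTP response into a { status, data } pair that tests can assert on
  protected async handleResponse(response: APIResponse) {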
    const status = response.status();
    let responseData;

    try {
      if (status >= 200 && status < 300) {
        if (status === 204) {
          responseData = null;
        } else {
          responseData = await response.json();
        }
      } else {
        const errorText = await response.text();
        console.error(`API Error (${status}):`, errorText);
        responseData = {
          error: true,
          statusCode: status,
          message: errorText.substring(0, 500),
        };
      }
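    } catch (error) {
      // Typically reached when a 2xx body is not valid JSON
      const textContent = await response.text();
      console.error('Failed to parse response:', textContent);
      responseData = {
        error: true,
        message: `Failed to parse JSON: ${textContent.substring(0, 200)}...`,
        // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
        parseError: error instanceof Error ? error.message : String(error),
      };
    }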

    return {
      status,
      data: responseData,
    };
  }
}
```
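`handleResponse` returns two kinds of `data`: the parsed body (or `null` for 204) on success, and an object with `error: true` otherwise. Below is a minimal sketch of how callers could tell the two apart with a type guard. `ApiErrorBody`, `ApiResult`, and `isApiError` are hypothetical names introduced for illustration, not part of the MAF code.

```typescript
// Hypothetical types describing what handleResponse returns; not part of the MAF code shown above.
interface ApiErrorBody {
  error: true;
  message: string;
  statusCode?: number;
  parseError?: string;
}

interface ApiResult<T> {
  status: number;
  data: T | ApiErrorBody | null;
}

// Narrows `data` to the error shape produced for non-2xx or unparsable responses.
function isApiError(data: unknown): data is ApiErrorBody {
  return typeof data === 'object' && data !== null && (data as { error?: unknown }).error === true;
}

// Example results mirroring the three outcomes: success, 204 No Content, and failure.
const created: ApiResult<{ id: number }> = { status: 201, data: { id: 7 } };
const noContent: ApiResult<never> = { status: 204, data: null };
const notFound: ApiResult<{ id: number }> = {
  status: 404,
  data: { error: true, statusCode: 404, message: 'Not Found' },
};

console.log(isApiError(created.data)); // false
console.log(isApiError(noContent.data)); // false
console.log(isApiError(notFound.data)); // true
```

With a guard like this, a test can fail early with the server's error message instead of tripping over a missing property later.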

### 2. Action Classes

Action classes such as `InvoiceActions` or `ContractorActions` inherit from `ApiBase` and implement methods for performing specific operations on the API.

```typescript
export class InvoiceActions extends ApiBase {
  // ... other methods

  async createInvoice(contractorId: number, invoiceData?: Invoice) {
    let data: Invoice;

    const lastNumberResult = await this.getLastInvoiceNumber();
    let nextNumber = 'FV/1/' + new Date().getFullYear();

    if (lastNumberResult.status === 200 && lastNumberResult.data) {
      // Logic for generating the next invoice number, e.g. "FV/41/2025" -> "FV/42/2025"
      const parts = lastNumberResult.data.split('/');
      if (parts.length === 3) {
        const prefix = parts[0];
        const number = parseInt(parts[1], 10);
        const year = parts[2];
        // Keep the default number if the middle segment is not numeric
        if (!Number.isNaN(number)) {
          nextNumber = `${prefix}/${number + 1}/${year}`;
        }
      }
    }

    if (invoiceData) {
      data = { ...invoiceData };
      data.number = nextNumber;
    } else {
      data = InvoiceData.generateRandomInvoice(contractorId);
      data.number = nextNumber;
      data = InvoiceData.calculateInvoiceTotals(data);
    }

    // Invoice declares createdAt as an optional string, so no cast is needed
    data.createdAt = new Date().toISOString();

    const response = await this.request.post(`${this.baseUrl}/api/Invoices`, {
      data: data,
      headers: {
        'Content-Type': 'application/json',
      },
    });

    const result = await this.handleResponse(response);
    return {
      ...result,
      requestData: data,
    };
  }
}
```
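The numbering rule inside `createInvoice` is easier to reason about as a pure function. The sketch below extracts it under that assumption. `nextInvoiceNumber` is a hypothetical helper name, and the `FV/<number>/<year>` format is taken from the code above.

```typescript
// Hypothetical helper: the same numbering rule as in createInvoice, extracted as a pure function.
function nextInvoiceNumber(last: string | null | undefined, year: number): string {
  const fallback = `FV/1/${year}`;
  if (!last) return fallback;

  const parts = last.split('/');
  if (parts.length !== 3) return fallback;

  const [prefix, rawNumber, lastYear] = parts;
  const current = Number.parseInt(rawNumber, 10);
  // A non-numeric middle segment would otherwise produce a number like "FV/NaN/2025".
  if (Number.isNaN(current)) return fallback;

  return `${prefix}/${current + 1}/${lastYear}`;
}

console.log(nextInvoiceNumber('FV/41/2025', 2025)); // FV/42/2025
console.log(nextInvoiceNumber(null, 2025)); // FV/1/2025
console.log(nextInvoiceNumber('garbage', 2025)); // FV/1/2025
```

Like the original logic, this keeps the year of the last invoice, so numbering does not restart when the calendar year changes.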

### 3. Test Data Generators

To create realistic test data, we use Faker (the `@faker-js/faker` package), which generates random but plausible values for our entities.

```typescript
export class InvoiceData {
  static generateRandomInvoice(contractorId: number): Invoice {
    const issueDate = new Date();
    const dueDate = new Date();
    dueDate.setDate(dueDate.getDate() + 14);

    return {
      number: `FV/${faker.number.int({ min: 1, max: 9999 })}/${new Date().getFullYear()}`,
      issueDate: issueDate.toISOString(),
      dueDate: dueDate.toISOString(),
      totalAmount: 0,
      paymentStatus: this.getRandomPaymentStatus(),
      paidAmount: 0,
      description: faker.commerce.productDescription(),
      contractorId: contractorId,
      paymentMethod: this.getRandomPaymentMethod(),
      invoiceItems: this.generateRandomInvoiceItems(faker.number.int({ min: 1, max: 5 })),
    };
  }

  static calculateInvoiceTotals(invoice: Invoice): Invoice {
    // Calculation logic
    let totalNet = 0;
    let totalVat = 0;

    for (const item of invoice.invoiceItems) {
      const itemNet = item.quantity * item.netPrice;
      let vatRateValue = 0;

      switch (item.vatRate) {
        case VatRate.Zero:
          vatRateValue = 0;
          break;
        case VatRate.Three:
          vatRateValue = 3;
          break;
        case VatRate.Five:
          vatRateValue = 5;
          break;
        case VatRate.Eight:
          vatRateValue = 8;
          break;
        case VatRate.TwentyThree:
          vatRateValue = 23;
          break;
      }

      const itemVat = itemNet * (vatRateValue / 100);
      totalNet += itemNet;
      totalVat += itemVat;
    }

    const totalGross = totalNet + totalVat;
    invoice.totalAmount = parseFloat(totalGross.toFixed(2));

    // Payment handling logic
    if (invoice.paymentStatus === PaymentStatus.Paid) {
      invoice.paidAmount = invoice.totalAmount;
    } else if (invoice.paymentStatus === PaymentStatus.PartiallyPaid) {
      invoice.paidAmount = parseFloat(
        (
          invoice.totalAmount * faker.number.float({ min: 0.1, max: 0.9, fractionDigits: 2 })
        ).toFixed(2),
      );
    } else {
      invoice.paidAmount = 0;
    }

    return invoice;
  }

  // ... other helper methods
}
```
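As a worked example of the arithmetic in `calculateInvoiceTotals`, the sketch below sums two line items. The `LineItem` shape and the numeric `vatPercent` field are assumptions (the article does not show `InvoiceItem` or `VatRate`), and rounding uses `Math.round` rather than `toFixed`.

```typescript
// Assumed shapes for illustration; the real VatRate enum and InvoiceItem type are not shown in the article.
type VatPercent = 0 | 3 | 5 | 8 | 23;

interface LineItem {
  quantity: number;
  netPrice: number;
  vatPercent: VatPercent;
}

// Rounds to two decimal places (currency precision).
function round2(value: number): number {
  return Math.round(value * 100) / 100;
}

// Sums net and VAT per item, then derives the gross amount.
function invoiceTotals(items: LineItem[]): { net: number; vat: number; gross: number } {
  let net = 0;
  let vat = 0;
  for (const item of items) {
    const itemNet = item.quantity * item.netPrice;
    net += itemNet;
    vat += itemNet * (item.vatPercent / 100);
  }
  return { net: round2(net), vat: round2(vat), gross: round2(net + vat) };
}

const totals = invoiceTotals([
  { quantity: 2, netPrice: 100, vatPercent: 23 }, // net 200, VAT 46
  { quantity: 1, netPrice: 50, vatPercent: 8 }, // net 50, VAT 4
]);
console.log(totals); // { net: 250, vat: 50, gross: 300 }
```

Rounding only once, at the end, avoids accumulating per-item rounding differences in the total.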

## Test Examples

### CRUD Tests

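CRUD (Create, Read, Update, Delete) tests verify that the core entity operations work correctly. The examples below cover create, read, and update; `contractorId` and `API_BASE_URL` are defined elsewhere in the test file: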

```typescript
test('should create a new invoice', async ({ request }) => {
  const api = new InvoiceActions(request, API_BASE_URL);
  const result = await api.createInvoice(contractorId);

  expect(result.status).toBe(201);
  expect(result.data).toHaveProperty('id');
  expect(result.data.number).toBe(result.requestData.number);
  expect(result.data.contractorId).toBe(contractorId);
});

test('should get an invoice by ID', async ({ request }) => {
  const api = new InvoiceActions(request, API_BASE_URL);

  const createResult = await api.createInvoice(contractorId);
  expect(createResult.status).toBe(201);

  const invoiceId = createResult.data.id;
  const getResult = await api.getInvoiceById(invoiceId);

  expect(getResult.status).toBe(200);
  expect(getResult.data).toHaveProperty('id', invoiceId);
  expect(getResult.data).toHaveProperty('invoiceItems');
  expect(Array.isArray(getResult.data.invoiceItems)).toBeTruthy();
});

test('should update an invoice', async ({ request }) => {
  const api = new InvoiceActions(request, API_BASE_URL);

  const createResult = await api.createInvoice(contractorId);
  expect(createResult.status).toBe(201);

  const invoiceId = createResult.data.id;
  const originalInvoice = createResult.data;

  const updatedInvoice = {
    ...originalInvoice,
    description: 'Updated description',
    paymentStatus: PaymentStatus.Paid,
    paidAmount: originalInvoice.totalAmount,
  };

  const updateResult = await api.updateInvoice(invoiceId, updatedInvoice);
  expect(updateResult.status).toBe(204);

  const getResult = await api.getInvoiceById(invoiceId);
  expect(getResult.status).toBe(200);
  expect(getResult.data.description).toBe(updatedInvoice.description);
  expect(getResult.data.paymentStatus).toBe(PaymentStatus.Paid);
  expect(getResult.data.paidAmount).toBe(updatedInvoice.paidAmount);
});
```

### Complex Scenario Tests

In the MAF application, we also test more complex business scenarios that reflect real use cases.

```typescript
test('should handle invoice payment status changes', async ({ request }) => {
  const invoiceApi = new InvoiceActions(request, API_BASE_URL);
  const contractorApi = new ContractorActions(request, API_BASE_URL);

  const createContractorResult = await contractorApi.createContractor();
  const contractorId = createContractorResult.data.id;

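  const invoiceData = InvoiceData.generateRandomInvoice(contractorId);
  invoiceData.paymentStatus = PaymentStatus.Unpaid;
  // createInvoice does not compute totals when invoiceData is passed in, so do it here;
  // otherwise totalAmount stays 0 (unless the backend recalculates it) and the partial payment below is 0 too.
  // For an unpaid invoice this also sets paidAmount to 0.
  InvoiceData.calculateInvoiceTotals(invoiceData);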

  // 1. Creating an unpaid invoice
  const createResult = await invoiceApi.createInvoice(contractorId, invoiceData);
  expect(createResult.status).toBe(201);

  const invoiceId = createResult.data.id;

  // 2. Partial payment
  const partialInvoice = {
    ...createResult.data,
    paymentStatus: PaymentStatus.PartiallyPaid,
    paidAmount: createResult.data.totalAmount / 2,
  };

  const partialResult = await invoiceApi.updateInvoice(invoiceId, partialInvoice);
  expect(partialResult.status).toBe(204);

  const getPartialResult = await invoiceApi.getInvoiceById(invoiceId);
  expect(getPartialResult.data.paymentStatus).toBe(PaymentStatus.PartiallyPaid);

  // 3. Full payment
  const paidInvoice = {
    ...getPartialResult.data,
    paymentStatus: PaymentStatus.Paid,
    paidAmount: getPartialResult.data.totalAmount,
  };

  const paidResult = await invoiceApi.updateInvoice(invoiceId, paidInvoice);
  expect(paidResult.status).toBe(204);

  const getPaidResult = await invoiceApi.getInvoiceById(invoiceId);
  expect(getPaidResult.data.paymentStatus).toBe(PaymentStatus.Paid);
  expect(getPaidResult.data.paidAmount).toBe(getPaidResult.data.totalAmount);
});
```
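The scenario above sets `paymentStatus` by hand at each step. A minimal sketch of the rule those steps imply, deriving the status from the amounts, could look as follows. `statusFor` is a hypothetical helper, the string union stands in for the article's `PaymentStatus` enum, and `Overdue` is left out because it depends on dates rather than amounts.

```typescript
// Stand-in for the PaymentStatus enum; the string values match the enum members in the article.
type PaymentStatusName = 'Paid' | 'PartiallyPaid' | 'Unpaid';

// Hypothetical rule: derive the status from the amounts instead of setting it by hand.
function statusFor(totalAmount: number, paidAmount: number): PaymentStatusName {
  if (paidAmount <= 0) return 'Unpaid';
  if (paidAmount < totalAmount) return 'PartiallyPaid';
  return 'Paid';
}

const invoiceTotal = 300;
console.log(statusFor(invoiceTotal, 0)); // Unpaid
console.log(statusFor(invoiceTotal, invoiceTotal / 2)); // PartiallyPaid
console.log(statusFor(invoiceTotal, invoiceTotal)); // Paid
```

A helper like this can also serve as an oracle in tests: after each update, the status returned by the API should match the one derived from `totalAmount` and `paidAmount`.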

### Mass Data Loading Tests

In the MAF project, we have also implemented tests that generate larger volumes of test data, which is useful during both development and application demos:

```typescript
test('Mass create invoices for database population', async ({ request }) => {
  const api = new InvoiceActions(request, API_BASE_URL);
  const contractorApi = new ContractorActions(request, API_BASE_URL);

  let contractorIds: number[] = [];
  const getContractorsResult = await contractorApi.getAllContractors();

  if (getContractorsResult.status === 200 && Array.isArray(getContractorsResult.data)) {
    contractorIds = getContractorsResult.data.map((c) => c.id);
  }

  if (contractorIds.length < 5) {
    const newContractors = await contractorApi.createMultipleContractors(10);
    contractorIds = [
      ...contractorIds,
      ...newContractors.map((c) => c.id).filter((id): id is number => id !== undefined),
    ];
  }

  const createdInvoices = await api.createMultipleInvoices(contractorIds, MASS_DATA_COUNT);

  expect(createdInvoices.length).toBe(MASS_DATA_COUNT);
  for (const invoice of createdInvoices) {
    expect(invoice).toHaveProperty('id');
  }
});
```
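The article does not show `createMultipleInvoices`. One plausible planning step inside such a method is deciding which contractor each invoice belongs to. The sketch below uses round-robin assignment; `assignContractors` is a hypothetical name and the strategy itself is an assumption.

```typescript
// Hypothetical planning step for something like createMultipleInvoices:
// decide which contractor each of the `count` invoices belongs to.
function assignContractors(contractorIds: number[], count: number): number[] {
  if (contractorIds.length === 0) {
    throw new Error('At least one contractor is required');
  }
  // Round-robin keeps the data evenly spread and the run reproducible.
  return Array.from({ length: count }, (_, i) => contractorIds[i % contractorIds.length]);
}

console.log(assignContractors([11, 12, 13], 7)); // [ 11, 12, 13, 11, 12, 13, 11 ]
```

Compared with picking a random contractor per invoice, a deterministic assignment makes a failed seeding run easier to reproduce.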

## Why Playwright for API Tests?

Although Playwright is primarily known as a UI testing tool, it also excels in API testing:

1. **Integrated HTTP client** - allows for easy execution of REST requests
2. **Consistent test environment** - we can use the same tool for front-end and back-end tests
3. **Excellent asynchronicity handling** - which is important when testing APIs
4. **Rich assertion set** - through integration with expect
5. **Parallel test execution** - for faster test suite runs

## Type Model

In our tests, we use strong TypeScript typing, which ensures consistency and helps detect potential issues at compile time:

```typescript
export enum PaymentStatus {
  Paid = 'Paid',
  PartiallyPaid = 'PartiallyPaid',
  Unpaid = 'Unpaid',
  Overdue = 'Overdue',
}

export interface Invoice {
  id?: number;
  createdAt?: string;
  number: string;
  issueDate: string;
  dueDate: string;
  totalAmount: number;
  paymentStatus: PaymentStatus;
  paidAmount: number;
  description: string;
  contractorId: number;
  paymentMethod: PaymentMethod;
  invoiceItems: InvoiceItem[];
}
```
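Compile-time types do not validate the JSON an API actually returns, so a runtime check can complement the model above. The sketch below is a minimal guard for payment status values. `PAYMENT_STATUSES` and `isPaymentStatus` are hypothetical names; the list repeats the enum's string values so that the example is standalone.

```typescript
// Standalone copy of the allowed values; in the project these would come from the PaymentStatus enum.
const PAYMENT_STATUSES = ['Paid', 'PartiallyPaid', 'Unpaid', 'Overdue'] as const;
type PaymentStatusValue = (typeof PAYMENT_STATUSES)[number];

// Hypothetical guard: checks at runtime what the compiler can only assume about JSON from the API.
function isPaymentStatus(value: unknown): value is PaymentStatusValue {
  return typeof value === 'string' && (PAYMENT_STATUSES as readonly string[]).includes(value);
}

const parsedBody: unknown = JSON.parse('{"paymentStatus":"Paid"}');
const rawStatus = (parsedBody as { paymentStatus?: unknown }).paymentStatus;

console.log(isPaymentStatus(rawStatus)); // true
console.log(isPaymentStatus('paid')); // false (values are case-sensitive)
console.log(isPaymentStatus(42)); // false
```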

## Conclusions

The approach to API testing in the MAF project provides:

1. **Modularity** - each business domain has its own set of tests
2. **Reusability** - action classes and data generators are shared between tests
3. **Readability** - thanks to a clear structure and separation of responsibilities
4. **Completeness** - tests cover both basic CRUD operations and complex business scenarios
5. **Scalability** - we can easily add new tests and extend existing ones

Organizing tests this way helps detect regressions, documents the expected API behavior, and ensures that new changes do not break existing functionality.

API tests are one of several quality assurance layers in the MAF project. They complement unit, integration, and end-to-end tests, and together these form a complete application testing strategy.

---



## API Tests Playwright - MAF

**URL:** https://portfolio.sdet.pl/articles/api-tests-playwright-maf
**Published:** 2025-03-24
**Language:** pl
Tags: playwright, api-testing, typescript

Testy backendu w aplikacji MAF - podejście oparte o Playwright, skalowalna struktura.

## Wprowadzenie

Testowanie API jest kluczowym elementem zapewnienia jakości w nowoczesnych aplikacjach webowych. W projekcie MAF (Moja Aplikacja Faktur) postawiliśmy na kompleksowe podejście do testów backendu, wykorzystując Playwright jako narzędzie testowe. W tym artykule omówię implementację testów API, przedstawię strukturę projektu testowego oraz pokażę konkretne przykłady testów z różnymi poziomami złożoności.

## Architektura testów

Testy backendu w MAF zostały zaprojektowane z myślą o modularności, możliwości ponownego użycia kodu oraz przejrzystości. Struktura projektu odzwierciedla logiczny podział na domeny biznesowe (faktury, kontrahenci) oraz wydzielone komponenty wspólne.

```
maf-api-tests/
├── common/
│   ├── api-base.ts         # Klasa bazowa dla wszystkich klas akcji API
│   └── types.ts            # Typy i enumy współdzielone między modułami
├── invoices/
│   ├── actions.ts          # Akcje API dla faktur
│   ├── data.ts             # Generatory danych testowych
│   ├── test.ts             # Podstawowe testy CRUD
│   └── complex.test.ts     # Złożone scenariusze testowe
└── contractors/
    ├── actions.ts          # Akcje API dla kontrahentów
    ├── data.ts             # Generatory danych testowych
    └── test.ts             # Testy CRUD
```

Ten podział pozwala na łatwe zarządzanie testami, utrzymanie wysokiej czytelności kodu oraz możliwość szybkiego rozszerzania zestawu testowego o nowe funkcjonalności.

## Podejście warstwowe

Kluczowym elementem naszej architektury testowej jest podział na cztery warstwy:

1. **Klasa bazowa** - zapewnia wspólne funkcjonalności dla wszystkich klas akcji
2. **Klasy akcji** - implementują metody do interakcji z konkretnym API
3. **Generatory danych** - dostarczają dane testowe
4. **Testy** - wykorzystują powyższe elementy do pisania scenariuszy testowych

### 1. Klasa bazowa (ApiBase)

Klasa `ApiBase` stanowi fundament wszystkich innych klas akcji, zapewniając obsługę odpowiedzi HTTP i formatowanie wyników.

```typescript
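import type { APIRequestContext, APIResponse } from '@playwright/test';

export class ApiBase {
  protected readonly request: APIRequestContext;
  protected readonly baseUrl: string;

  constructor(request: APIRequestContext, baseUrl: string) {
    this.request = request;
    this.baseUrl = baseUrl;
  }

  // Sprowadza każdą odpowiedź HTTP do pary { status, data }, na której testy mogą wykonywać asercje
  protected async handleResponse(response: APIResponse) {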
    const status = response.status();
    let responseData;

    try {
      if (status >= 200 && status < 300) {
        if (status === 204) {
          responseData = null;
        } else {
          responseData = await response.json();
        }
      } else {
        const errorText = await response.text();
        console.error(`API Error (${status}):`, errorText);
        responseData = {
          error: true,
          statusCode: status,
          message: errorText.substring(0, 500),
        };
      }
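    } catch (error) {
      // Zwykle trafiamy tu, gdy treść odpowiedzi 2xx nie jest poprawnym JSON-em
      const textContent = await response.text();
      console.error('Failed to parse response:', textContent);
      responseData = {
        error: true,
        message: `Failed to parse JSON: ${textContent.substring(0, 200)}...`,
        // W trybie strict `error` ma typ `unknown`, więc zawężamy go przed odczytem `.message`
        parseError: error instanceof Error ? error.message : String(error),
      };
    }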

    return {
      status,
      data: responseData,
    };
  }
}
```

### 2. Klasy akcji

Klasy akcji, jak `InvoiceActions` czy `ContractorActions`, dziedziczą po `ApiBase` i implementują metody do wykonywania konkretnych operacji na API.

```typescript
export class InvoiceActions extends ApiBase {
  // ... inne metody

  async createInvoice(contractorId: number, invoiceData?: Invoice) {
    let data: Invoice;

    const lastNumberResult = await this.getLastInvoiceNumber();
    let nextNumber = 'FV/1/' + new Date().getFullYear();

    if (lastNumberResult.status === 200 && lastNumberResult.data) {
      // Logika generowania kolejnego numeru faktury, np. "FV/41/2025" -> "FV/42/2025"
      const parts = lastNumberResult.data.split('/');
      if (parts.length === 3) {
        const prefix = parts[0];
        const number = parseInt(parts[1], 10);
        const year = parts[2];
        // Zostaw numer domyślny, jeśli środkowy segment nie jest liczbą
        if (!Number.isNaN(number)) {
          nextNumber = `${prefix}/${number + 1}/${year}`;
        }
      }
    }

    if (invoiceData) {
      data = { ...invoiceData };
      data.number = nextNumber;
    } else {
      data = InvoiceData.generateRandomInvoice(contractorId);
      data.number = nextNumber;
      data = InvoiceData.calculateInvoiceTotals(data);
    }

    // Invoice deklaruje createdAt jako opcjonalny string, więc rzutowanie nie jest potrzebne
    data.createdAt = new Date().toISOString();

    const response = await this.request.post(`${this.baseUrl}/api/Invoices`, {
      data: data,
      headers: {
        'Content-Type': 'application/json',
      },
    });

    const result = await this.handleResponse(response);
    return {
      ...result,
      requestData: data,
    };
  }
}
```

### 3. Generatory danych testowych

Do tworzenia realistycznych danych testowych wykorzystujemy bibliotekę Faker (pakiet `@faker-js/faker`), która generuje losowe, ale sensowne wartości dla naszych encji.

```typescript
export class InvoiceData {
  static generateRandomInvoice(contractorId: number): Invoice {
    const issueDate = new Date();
    const dueDate = new Date();
    dueDate.setDate(dueDate.getDate() + 14);

    return {
      number: `FV/${faker.number.int({ min: 1, max: 9999 })}/${new Date().getFullYear()}`,
      issueDate: issueDate.toISOString(),
      dueDate: dueDate.toISOString(),
      totalAmount: 0,
      paymentStatus: this.getRandomPaymentStatus(),
      paidAmount: 0,
      description: faker.commerce.productDescription(),
      contractorId: contractorId,
      paymentMethod: this.getRandomPaymentMethod(),
      invoiceItems: this.generateRandomInvoiceItems(faker.number.int({ min: 1, max: 5 })),
    };
  }

  static calculateInvoiceTotals(invoice: Invoice): Invoice {
    // Logika kalkulacji sum
    let totalNet = 0;
    let totalVat = 0;

    for (const item of invoice.invoiceItems) {
      const itemNet = item.quantity * item.netPrice;
      let vatRateValue = 0;

      switch (item.vatRate) {
        case VatRate.Zero:
          vatRateValue = 0;
          break;
        case VatRate.Three:
          vatRateValue = 3;
          break;
        case VatRate.Five:
          vatRateValue = 5;
          break;
        case VatRate.Eight:
          vatRateValue = 8;
          break;
        case VatRate.TwentyThree:
          vatRateValue = 23;
          break;
      }

      const itemVat = itemNet * (vatRateValue / 100);
      totalNet += itemNet;
      totalVat += itemVat;
    }

    const totalGross = totalNet + totalVat;
    invoice.totalAmount = parseFloat(totalGross.toFixed(2));

    // Logika obsługi płatności
    if (invoice.paymentStatus === PaymentStatus.Paid) {
      invoice.paidAmount = invoice.totalAmount;
    } else if (invoice.paymentStatus === PaymentStatus.PartiallyPaid) {
      invoice.paidAmount = parseFloat(
        (
          invoice.totalAmount * faker.number.float({ min: 0.1, max: 0.9, fractionDigits: 2 })
        ).toFixed(2),
      );
    } else {
      invoice.paidAmount = 0;
    }

    return invoice;
  }

  // ... inne metody pomocnicze
}
```

## Przykłady testów

### Testy CRUD

Podstawowe testy CRUD (Create, Read, Update, Delete) weryfikują, czy podstawowe operacje na encjach działają poprawnie:

```typescript
test('should create a new invoice', async ({ request }) => {
  const api = new InvoiceActions(request, API_BASE_URL);
  const result = await api.createInvoice(contractorId);

  expect(result.status).toBe(201);
  expect(result.data).toHaveProperty('id');
  expect(result.data.number).toBe(result.requestData.number);
  expect(result.data.contractorId).toBe(contractorId);
});

test('should get an invoice by ID', async ({ request }) => {
  const api = new InvoiceActions(request, API_BASE_URL);

  const createResult = await api.createInvoice(contractorId);
  expect(createResult.status).toBe(201);

  const invoiceId = createResult.data.id;
  const getResult = await api.getInvoiceById(invoiceId);

  expect(getResult.status).toBe(200);
  expect(getResult.data).toHaveProperty('id', invoiceId);
  expect(getResult.data).toHaveProperty('invoiceItems');
  expect(Array.isArray(getResult.data.invoiceItems)).toBeTruthy();
});

test('should update an invoice', async ({ request }) => {
  const api = new InvoiceActions(request, API_BASE_URL);

  const createResult = await api.createInvoice(contractorId);
  expect(createResult.status).toBe(201);

  const invoiceId = createResult.data.id;
  const originalInvoice = createResult.data;

  const updatedInvoice = {
    ...originalInvoice,
    description: 'Updated description',
    paymentStatus: PaymentStatus.Paid,
    paidAmount: originalInvoice.totalAmount,
  };

  const updateResult = await api.updateInvoice(invoiceId, updatedInvoice);
  expect(updateResult.status).toBe(204);

  const getResult = await api.getInvoiceById(invoiceId);
  expect(getResult.status).toBe(200);
  expect(getResult.data.description).toBe(updatedInvoice.description);
  expect(getResult.data.paymentStatus).toBe(PaymentStatus.Paid);
  expect(getResult.data.paidAmount).toBe(updatedInvoice.paidAmount);
});
```

### Testy złożonych scenariuszy

W aplikacji MAF testujemy także bardziej złożone scenariusze biznesowe, które odzwierciedlają realne przypadki użycia.

```typescript
test('should handle invoice payment status changes', async ({ request }) => {
  const invoiceApi = new InvoiceActions(request, API_BASE_URL);
  const contractorApi = new ContractorActions(request, API_BASE_URL);

  const createContractorResult = await contractorApi.createContractor();
  const contractorId = createContractorResult.data.id;

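  const invoiceData = InvoiceData.generateRandomInvoice(contractorId);
  invoiceData.paymentStatus = PaymentStatus.Unpaid;
  // createInvoice nie przelicza sum, gdy dostaje gotowe invoiceData, więc liczymy je tutaj;
  // inaczej totalAmount zostaje 0 (o ile backend sam go nie przelicza) i płatność częściowa też wyniesie 0.
  // Dla faktury nieopłaconej ustawia to również paidAmount na 0.
  InvoiceData.calculateInvoiceTotals(invoiceData);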

  // 1. Tworzenie faktury nieopłaconej
  const createResult = await invoiceApi.createInvoice(contractorId, invoiceData);
  expect(createResult.status).toBe(201);

  const invoiceId = createResult.data.id;

  // 2. Częściowa płatność
  const partialInvoice = {
    ...createResult.data,
    paymentStatus: PaymentStatus.PartiallyPaid,
    paidAmount: createResult.data.totalAmount / 2,
  };

  const partialResult = await invoiceApi.updateInvoice(invoiceId, partialInvoice);
  expect(partialResult.status).toBe(204);

  const getPartialResult = await invoiceApi.getInvoiceById(invoiceId);
  expect(getPartialResult.data.paymentStatus).toBe(PaymentStatus.PartiallyPaid);

  // 3. Pełna płatność
  const paidInvoice = {
    ...getPartialResult.data,
    paymentStatus: PaymentStatus.Paid,
    paidAmount: getPartialResult.data.totalAmount,
  };

  const paidResult = await invoiceApi.updateInvoice(invoiceId, paidInvoice);
  expect(paidResult.status).toBe(204);

  const getPaidResult = await invoiceApi.getInvoiceById(invoiceId);
  expect(getPaidResult.data.paymentStatus).toBe(PaymentStatus.Paid);
  expect(getPaidResult.data.paidAmount).toBe(getPaidResult.data.totalAmount);
});
```

### Testy masowego zasilania danymi

W projekcie MAF zaimplementowaliśmy także testy, które służą do generowania większej liczby danych testowych, co jest przydatne zarówno podczas rozwoju, jak i demonstracji aplikacji:

```typescript
test('Mass create invoices for database population', async ({ request }) => {
  const api = new InvoiceActions(request, API_BASE_URL);
  const contractorApi = new ContractorActions(request, API_BASE_URL);

  let contractorIds: number[] = [];
  const getContractorsResult = await contractorApi.getAllContractors();

  if (getContractorsResult.status === 200 && Array.isArray(getContractorsResult.data)) {
    contractorIds = getContractorsResult.data.map((c) => c.id);
  }

  if (contractorIds.length < 5) {
    const newContractors = await contractorApi.createMultipleContractors(10);
    contractorIds = [
      ...contractorIds,
      ...newContractors.map((c) => c.id).filter((id): id is number => id !== undefined),
    ];
  }

  const createdInvoices = await api.createMultipleInvoices(contractorIds, MASS_DATA_COUNT);

  expect(createdInvoices.length).toBe(MASS_DATA_COUNT);
  for (const invoice of createdInvoices) {
    expect(invoice).toHaveProperty('id');
  }
});
```

## Dlaczego Playwright do testów API?

Chociaż Playwright jest znany głównie jako narzędzie do testowania interfejsu użytkownika, świetnie sprawdza się również w testowaniu API:

1. **Zintegrowany klient HTTP** - pozwala na łatwe wykonywanie zapytań REST
2. **Spójne środowisko testowe** - możemy korzystać z tego samego narzędzia do testów front-end i back-end
3. **Doskonała obsługa asynchroniczności** - co jest ważne przy testowaniu API
4. **Bogaty zestaw asercji** - poprzez integrację z expect
5. **Równoległe wykonywanie testów** - dla szybszego uruchamiania zestawu testowego

## Model typów

W naszych testach wykorzystujemy silne typowanie TypeScript, co zapewnia spójność i pomaga wykryć potencjalne problemy już na etapie kompilacji:

```typescript
export enum PaymentStatus {
  Paid = 'Paid',
  PartiallyPaid = 'PartiallyPaid',
  Unpaid = 'Unpaid',
  Overdue = 'Overdue',
}

export interface Invoice {
  id?: number;
  createdAt?: string;
  number: string;
  issueDate: string;
  dueDate: string;
  totalAmount: number;
  paymentStatus: PaymentStatus;
  paidAmount: number;
  description: string;
  contractorId: number;
  paymentMethod: PaymentMethod;
  invoiceItems: InvoiceItem[];
}
```

## Wnioski

Podejście do testów API w projekcie MAF zapewnia:

1. **Modularność** - każda domena biznesowa ma swój zestaw testów
2. **Możliwość ponownego użycia** - klasy akcji i generatory danych są współdzielone między testami
3. **Czytelność** - dzięki jasnej strukturze i separacji odpowiedzialności
4. **Kompletność** - testy pokrywają zarówno podstawowe operacje CRUD, jak i złożone scenariusze biznesowe
5. **Skalowalność** - łatwo możemy dodawać nowe testy i rozszerzać istniejące

Taka organizacja testów pozwala na skuteczne wykrywanie regresji, dokumentowanie oczekiwanego zachowania API oraz zapewnienie, że wprowadzane zmiany nie naruszają istniejącej funkcjonalności.

Testy API stanowią jedną z wielu warstw zapewnienia jakości w projekcie MAF, uzupełniając testy jednostkowe, integracyjne i end-to-end, co wspólnie tworzy kompletną strategię testowania aplikacji.

---


## Reverse Proxy with NGINX

**URL:** https://portfolio.sdet.it/articles/reverse-proxy-nginx
**Published:** 2025-02-25
**Language:** en
Tags: nginx, infrastructure, reverse-proxy

How to tackle it? - SPA + Reverse Proxy with NGINX, production patterns.

## What is a Reverse Proxy?

A reverse proxy is an intermediary server that acts as a "gateway" between clients and application servers. Unlike a forward proxy, which acts on behalf of clients, a reverse proxy sits in front of the servers. A client's request first reaches the reverse proxy, which forwards it to the appropriate backend server, receives the response, and sends it back to the client.

## How Does a Reverse Proxy Work in NGINX?

1. Client sends an HTTP request to a domain (e.g., example.com)
2. The request reaches the NGINX server acting as a reverse proxy
3. NGINX, based on its configuration, forwards the request to the appropriate application server
4. The application server processes the request and sends a response to NGINX
5. NGINX forwards the response back to the client

The client communicates exclusively with NGINX, without having direct access to the backend servers.

## Advantages of Using NGINX as a Reverse Proxy

### 1. Enhanced Security

- **Backend server isolation** - application servers are not directly exposed to the internet
- **Traffic filtering** - ability to block malicious requests using security modules
- **DDoS mitigation** - limiting the number of connections and requests per client (`limit_conn`, `limit_req`)
- **Hiding infrastructure details** - clients don't know the internal network structure
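
The connection and request limits mentioned above can be sketched roughly as follows. The zone names, sizes, and rates are illustrative assumptions, not recommended values, and the backend address is reused from the examples in this article:

```nginx
# Goes in the http {} context: shared-memory zones keyed by client IP.
# Zone names, sizes, and rates are illustrative assumptions.
limit_req_zone  $binary_remote_addr zone=per_ip_req:10m rate=10r/s;
limit_conn_zone $binary_remote_addr zone=per_ip_conn:10m;

server {
    listen 80;
    server_name example.com;

    location / {
        # Allow short bursts of up to 20 extra requests, reject the rest
        limit_req  zone=per_ip_req burst=20 nodelay;
        # At most 20 simultaneous connections per client IP
        limit_conn per_ip_conn 20;
        # Answer with 429 instead of the default 503 when a limit is hit
        limit_req_status  429;
        limit_conn_status 429;

        proxy_pass http://backend_server:8080;
    }
}
```

Limits like these only soften abusive traffic from individual clients; a large distributed attack usually has to be handled further upstream as well.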

### 2. Load Balancing

- Built-in mechanisms for distributing traffic among multiple servers
- Various balancing algorithms (round-robin, least connections, ip-hash)
- Passive health checks (`max_fails`, `fail_timeout`) that temporarily take failing instances out of rotation; active health probes require NGINX Plus
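
A minimal sketch of a load-balanced pool, assuming three hypothetical backends (`app1.internal`, `app2.internal`, `app3.internal`); the thresholds are placeholders to tune per application:

```nginx
# Hypothetical backend pool; host names and ports are placeholders.
upstream app_backend {
    least_conn;  # pick the server with the fewest active connections

    # After 3 failed attempts within 30s, skip the server for 30s
    server app1.internal:8080 max_fails=3 fail_timeout=30s;
    server app2.internal:8080 max_fails=3 fail_timeout=30s;
    # Used only when the servers above are unavailable
    server app3.internal:8080 backup;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://app_backend;
        # Retry the next server on connection errors, timeouts, and 502/503
        proxy_next_upstream error timeout http_502 http_503;
    }
}
```

Removing `least_conn;` falls back to the default round-robin method, and `ip_hash;` can be used instead when a client should keep hitting the same backend (note that `backup` cannot be combined with `ip_hash`).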

### 3. Performance Improvement

- **Efficient caching** - storing static files and frequent responses
- **Gzip/Brotli compression** - reducing the size of transferred data (Brotli requires the separate `ngx_brotli` module)
- **SSL Termination** - offloading cryptographic operations from application servers
- **HTTP/2 and HTTP/3** - support for modern protocols that increase performance (HTTP/3 requires NGINX 1.25 or later)
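
As an illustration of response caching, here is a rough sketch using `proxy_cache`. The cache path, zone name, sizes, and lifetimes are assumptions; caching API responses is only safe for data that is not user-specific:

```nginx
# http {} context. The path, zone name, and sizes are illustrative assumptions.
proxy_cache_path /var/cache/nginx/api levels=1:2 keys_zone=api_cache:10m
                 max_size=1g inactive=60m;

server {
    listen 80;
    server_name example.com;

    location /api/ {
        proxy_pass http://backend_server:8080;

        proxy_cache api_cache;
        # Cache successful responses for 5 minutes, 404s for 1 minute
        proxy_cache_valid 200 5m;
        proxy_cache_valid 404 1m;
        # Serve a stale entry if the backend errors out or times out
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
        # Expose HIT/MISS/EXPIRED so caching can be verified from the client
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

The `X-Cache-Status` header is a common convention rather than anything NGINX sets by default; it makes it easy to confirm with `curl -I` whether a response came from the cache.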

### 4. Application Management

- **Virtual hosting** - handling multiple domains on a single server
- **URL redirections** - easy change of address structure without modifying the application
- **Header modification** - adding or modifying HTTP headers
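
The three management features above might look like this in practice. The domains, paths, and header values are hypothetical and only show the shape of the configuration:

```nginx
# Virtual hosting: a second (hypothetical) legacy domain on the same server,
# permanently redirected to the main one.
server {
    listen 80;
    server_name old.example.com;
    return 301 https://www.example.com$request_uri;
}

server {
    listen 80;
    server_name www.example.com;

    # URL redirection: keep an old path working after a restructure
    location = /old-page {
        return 301 /new-page;
    }

    location / {
        proxy_pass http://backend_server:8080;
        # Header modification: drop a backend header, add a response header
        proxy_hide_header X-Powered-By;
        add_header X-Served-By "nginx-edge";
    }
}
```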

## Basic NGINX Configuration as a Reverse Proxy

Here's a basic NGINX configuration acting as a reverse proxy:

```nginx
server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://backend_server:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

This configuration:

- Listens on port 80 for requests to example.com
- Forwards all requests to the backend server at http://backend_server:8080
- Sets the `Host`, `X-Real-IP`, `X-Forwarded-For`, and `X-Forwarded-Proto` headers so the backend knows the original host, the client's real IP address, and the original scheme

## NGINX Configuration for Single Page Applications (SPA)

Single Page Applications (SPAs) require special configuration because routing is handled on the client side by JavaScript, not by the server. A URL such as /dashboard/settings has no matching file on disk, so NGINX must serve index.html for every path that isn't a real file and let the client-side router take over.

Here's a complete NGINX configuration for an SPA operating behind a reverse proxy:

```nginx
server {
    listen 80;
    server_name spa-example.com;

    # Main folder with SPA static files
    root /var/www/spa-app/dist;

    # Forwarding API to the backend server
    location /api/ {
        proxy_pass http://api-server:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Handling static files with cache
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg)$ {
        expires 30d;
        add_header Cache-Control "public, no-transform";
    }

    # Key configuration for SPA - redirecting to index.html
    location / {
        try_files $uri $uri/ /index.html;
    }
}
```

### Explanation of SPA Configuration:

1. **`try_files $uri $uri/ /index.html;`** - This is the key directive for SPA:
   - First, NGINX looks for a file matching the requested URI ($uri)
   - If there is no such file, it looks for a directory ($uri/)
   - If neither exists, it internally redirects to /index.html; the URL in the browser stays unchanged, so the client-side router can resolve it

2. Forwarding API requests:
   - All requests to /api/ are proxied to the API server; because `proxy_pass` has no URI part, the /api/ prefix is kept
   - Everything else is handled as part of the SPA

3. Static file optimization:
   - We add cache headers for static files to increase performance
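
One detail of the API forwarding that is easy to miss is how the URI part of `proxy_pass` changes the path the backend receives. The sketch below contrasts the two forms; the `/legacy-api/` prefix is a hypothetical second location added only so both variants can sit in one valid server block:

```nginx
server {
    listen 80;
    server_name spa-example.com;

    # No URI part after host:port, so the request path is passed unchanged:
    #   /api/users  ->  http://api-server:3000/api/users
    location /api/ {
        proxy_pass http://api-server:3000;
    }

    # URI part "/" present, so the matched prefix is replaced by it:
    #   /legacy-api/users  ->  http://api-server:3000/users
    location /legacy-api/ {
        proxy_pass http://api-server:3000/;
    }
}
```

Which form is right depends on whether the backend itself mounts its routes under /api/.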

## Advanced NGINX Configuration for SPA with Reverse Proxy

Here's a more elaborate configuration with additional optimizations:

```nginx
server {
    listen 80;
    server_name spa.example.com;

    # HTTP to HTTPS redirect
    return 301 https://$host$request_uri;
}

server {
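    # On NGINX 1.25.1+ the `http2` listen parameter is deprecated;
    # use `listen 443 ssl;` together with `http2 on;` instead
    listen 443 ssl http2;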
    server_name spa.example.com;

    # SSL Configuration
    ssl_certificate /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';

    # Main folder with SPA files
    root /var/www/spa-app/dist;
    index index.html;

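    # Security headers
    # Note: a location inherits these only if it defines no add_header of its own;
    # locations below that set Cache-Control must repeat them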
    add_header X-Content-Type-Options nosniff;
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-XSS-Protection "1; mode=block";
    add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline';";

    # Compression
    gzip on;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Proxy buffering
    proxy_buffers 16 16k;
    proxy_buffer_size 16k;

    # Forwarding to API
    location /api/ {
        proxy_pass http://backend:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # Static files with long cache
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$ {
        # Cache-Control is set explicitly; combining it with `expires` would emit a second Cache-Control header
        add_header Cache-Control "public, immutable, max-age=31536000";
        try_files $uri =404;
    }

    # Robots.txt file
    location = /robots.txt {
        # default_type sets the response Content-Type; add_header would add a duplicate header
        default_type text/plain;
        return 200 "User-agent: *\nDisallow: /api/\n";
    }

    # SPA handling
    location / {
        try_files $uri $uri/ /index.html;
        add_header Cache-Control "no-cache, no-store, must-revalidate";
    }
}
```

This elaborate configuration includes:

- HTTP to HTTPS redirect
- HTTP/2 support for better performance
- Security headers
- Gzip compression for common text-based MIME types
- Larger proxy buffers for backend responses
- Different caching strategies for static files and the main index.html file
- A robots.txt response generated by NGINX itself
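
One caveat with the security headers: NGINX inherits `add_header` directives from the enclosing level only when the current level defines none of its own, so any location that sets its own `Cache-Control` silently drops the server-level headers. A common workaround, sketched here with a hypothetical snippet file, is to keep the shared headers in one file and include it wherever a location adds headers:

```nginx
# --- /etc/nginx/snippets/security-headers.conf (hypothetical file) ---
add_header X-Content-Type-Options nosniff always;
add_header X-Frame-Options SAMEORIGIN always;

# --- site configuration ---
server {
    listen 80;
    server_name spa.example.com;
    root /var/www/spa-app/dist;

    # Applies to locations that define no add_header themselves
    include snippets/security-headers.conf;

    location / {
        # This location sets its own add_header, so the server-level
        # headers are no longer inherited; include the snippet again.
        include snippets/security-headers.conf;
        add_header Cache-Control "no-cache, no-store, must-revalidate";
        try_files $uri $uri/ /index.html;
    }
}
```

The relative `include` path assumes the usual /etc/nginx configuration prefix; checking the response with `curl -I` confirms which headers actually reach the client.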

## NGINX Configuration for SPA with Multiple Environments

Often we need to handle several environments (for example staging and production) on the same server:

```nginx
# Upstream servers
upstream backend_production {
    server production-api:3000;
}

upstream backend_staging {
    server staging-api:3000;
}

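# Production
# (ssl_certificate / ssl_certificate_key directives omitted for brevity
# in both server blocks; NGINX requires them for `listen ... ssl`)
server {
    listen 443 ssl http2;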
    server_name app.example.com;

    root /var/www/production/dist;

    location /api/ {
        proxy_pass http://backend_production;
        # standard proxy headers
    }

    location / {
        try_files $uri $uri/ /index.html;
    }
}

# Staging
server {
    listen 443 ssl http2;
    server_name staging.example.com;

    # Basic authentication for staging environment
    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/.htpasswd;

    root /var/www/staging/dist;

    location /api/ {
        proxy_pass http://backend_staging;
        # standard proxy headers
    }

    location / {
        try_files $uri $uri/ /index.html;
    }
}
```

## Handling Multiple SPA Applications

To host several SPAs under different paths on one domain:

```nginx
server {
    # ssl_certificate / ssl_certificate_key omitted for brevity
    listen 443 ssl http2;
    server_name apps.example.com;

    # First SPA application
    location /app1/ {
        alias /var/www/app1/dist/;
        try_files $uri $uri/ /app1/index.html;
    }

    # Second SPA application
    location /app2/ {
        alias /var/www/app2/dist/;
        try_files $uri $uri/ /app2/index.html;
    }

    # Backend API
    location /api/ {
        proxy_pass http://backend:3000;
        # standard proxy headers
    }
}
```

## Best Practices for NGINX with SPA

1. **Always use HTTPS** - it encrypts traffic, and browsers only support HTTP/2 over TLS
2. **Set appropriate cache headers**:
   - Long cache for immutable files (js, css with hash in the name)
   - No cache for index.html to ensure quick application updates
3. **Use HTTP/2** - it multiplexes requests over a single connection, which helps SPAs that load many small assets
4. **Enable compression** - reduces the size of transferred data
5. **Monitor performance** - watch the access and error logs and the `stub_status` metrics
6. **Optimize timeouts** - adjust timeout settings to the specifics of your application
7. **Implement security measures** - use security headers, CORS, etc.
8. **Test configuration** - use `nginx -t` before applying changes

## Troubleshooting NGINX and SPA Issues

### Problem: Page refresh leads to 404 error

**Solution**: Make sure the `try_files $uri $uri/ /index.html;` directive is present in the `location /` block that serves the SPA. It makes NGINX fall back to the main SPA file for every path that doesn't match a file on disk.

### Problem: SPA cannot communicate with the API

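**Solution**: Make sure `proxy_pass` points at the right backend and that the path the backend expects matches what NGINX forwards. CORS headers are only needed when the SPA is loaded from a different origin than the API; if both are served through the same NGINX host, requests are same-origin: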

```nginx
location /api/ {
    proxy_pass http://backend:3000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;

    # CORS headers (only for cross-origin access; in production replace '*'
    # with the SPA's exact origin)
    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS, PUT, DELETE' always;
    add_header 'Access-Control-Allow-Headers' 'Origin, X-Requested-With, Content-Type, Accept, Authorization' always;
}
```
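
For cross-origin setups, browsers send an `OPTIONS` preflight before most non-trivial requests, and a backend that doesn't handle it will make the real request fail. A possible sketch that answers the preflight in NGINX itself is shown below; the origin `https://spa.example.com` and the 10-minute max age are assumptions to adapt:

```nginx
location /api/ {
    # Hypothetical SPA origin; replace with the real one
    add_header 'Access-Control-Allow-Origin' 'https://spa.example.com' always;
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS, PUT, DELETE' always;
    add_header 'Access-Control-Allow-Headers' 'Content-Type, Authorization' always;
    # Let browsers cache the preflight result for 10 minutes
    add_header 'Access-Control-Max-Age' 600 always;

    # Answer the preflight here instead of forwarding it to the backend
    if ($request_method = OPTIONS) {
        return 204;
    }

    proxy_pass http://backend:3000;
}
```

It is worth verifying the result with `curl -i -X OPTIONS` against the endpoint, since header inheritance around `if` blocks is a frequent source of surprises.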

### Problem: Application loading time is too long

**Solution**: Optimize cache and compression settings:

```nginx
# Compression
gzip on;
gzip_comp_level 5;
gzip_min_length 256;
gzip_proxied any;
gzip_vary on;
gzip_types
  application/javascript
  application/json
  application/x-javascript
  text/css
  text/javascript
  text/plain;

# Static files with appropriate cache
location ~* \.(js|css)$ {
    expires 1y;
    add_header Cache-Control "public, immutable";
}
```

## Summary

NGINX is a powerful tool for serving SPAs and proxying their API traffic. The key directive `try_files $uri $uri/ /index.html;` makes client-side routing work, which is essential for SPAs.

Thanks to advanced NGINX features such as load balancing, caching, compression, and SSL handling, we can significantly increase the performance, security, and reliability of our SPA applications.

Tailor each configuration to the needs of your application; the patterns presented here provide a solid foundation to build on.

---


## Reverse Proxy z NGINX

**URL:** https://portfolio.sdet.pl/articles/reverse-proxy-nginx
**Published:** 2025-02-25
**Language:** pl
Tags: nginx, infrastructure, reverse-proxy

Jak to ugryźć? - SPA + Reverse Proxy z NGINX, produkcyjne wzorce.

## Czym jest Reverse Proxy?

Reverse proxy to serwer pośredniczący, który działa jako "brama" między klientami a serwerami aplikacji. W przeciwieństwie do zwykłego proxy (forward proxy), które działa w imieniu klientów, reverse proxy stoi przed serwerami. Żądanie klienta trafia najpierw do reverse proxy, które przekazuje je do odpowiedniego serwera backend, odbiera odpowiedź i przesyła ją z powrotem do klienta.

## Jak działa Reverse Proxy w NGINX?

1. Klient wysyła żądanie HTTP do domeny (np. example.com)
2. Żądanie trafia do serwera NGINX działającego jako reverse proxy
3. NGINX, na podstawie konfiguracji, przekazuje żądanie do odpowiedniego serwera aplikacji
4. Serwer aplikacji przetwarza żądanie i wysyła odpowiedź do NGINX
5. NGINX przekazuje odpowiedź z powrotem do klienta

Klient komunikuje się wyłącznie z NGINX, nie mając bezpośredniego dostępu do serwerów backend.

## Zalety używania NGINX jako Reverse Proxy

### 1. Zwiększone bezpieczeństwo

- **Izolacja serwerów backend** - serwery aplikacji nie są bezpośrednio wystawione na internet
- **Filtrowanie ruchu** - możliwość blokowania złośliwych żądań za pomocą modułów bezpieczeństwa
- **Ograniczanie skutków ataków DDoS** - limitowanie liczby połączeń i żądań na klienta (`limit_conn`, `limit_req`)
- **Ukrywanie szczegółów infrastruktury** - klienci nie znają wewnętrznej struktury sieci

### 2. Równoważenie obciążenia (Load Balancing)

- Wbudowane mechanizmy dystrybucji ruchu między wieloma serwerami
- Różne algorytmy równoważenia (round-robin, least connections, ip-hash)
- Pasywne sprawdzanie stanu serwerów (`max_fails`, `fail_timeout`), które tymczasowo wyłącza niesprawne instancje z puli; aktywne health checki wymagają NGINX Plus

### 3. Poprawa wydajności

- **Wydajny caching** - przechowywanie statycznych plików i częstych odpowiedzi
- **Kompresja Gzip/Brotli** - zmniejszanie rozmiaru transferowanych danych (Brotli wymaga osobnego modułu `ngx_brotli`)
- **SSL Termination** - odciążenie serwerów aplikacji z operacji kryptograficznych
- **HTTP/2 i HTTP/3** - obsługa nowoczesnych protokołów zwiększających wydajność (HTTP/3 wymaga NGINX 1.25 lub nowszego)

### 4. Zarządzanie aplikacjami

- **Virtual hosting** - obsługa wielu domen na jednym serwerze
- **Przekierowania URL** - łatwa zmiana struktury adresów bez modyfikacji aplikacji
- **Zmiana nagłówków** - dodawanie lub modyfikacja nagłówków HTTP

## Podstawowa konfiguracja NGINX jako Reverse Proxy

Oto podstawowa konfiguracja NGINX działającego jako reverse proxy:

```nginx
server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://backend_server:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Ta konfiguracja:

- Nasłuchuje na porcie 80 dla żądań do example.com
- Przekazuje wszystkie żądania do serwera backend działającego pod adresem http://backend_server:8080
- Ustawia nagłówki `Host`, `X-Real-IP`, `X-Forwarded-For` i `X-Forwarded-Proto`, aby serwer backend znał oryginalny host, prawdziwy adres IP klienta i oryginalny protokół

## Konfiguracja NGINX dla Single Page Applications (SPA)

Aplikacje typu Single Page Application (SPA) wymagają specjalnej konfiguracji, ponieważ routing jest obsługiwany po stronie klienta przez JavaScript, a nie przez serwer. Adres taki jak /dashboard/settings nie ma odpowiadającego mu pliku na dysku, dlatego NGINX musi zwracać index.html dla każdej ścieżki, która nie jest rzeczywistym plikiem, i zostawić resztę routerowi po stronie klienta.

Oto kompletna konfiguracja NGINX dla SPA działającej za reverse proxy:

```nginx
server {
    listen 80;
    server_name spa-example.com;

    # Główny folder z plikami statycznymi SPA
    root /var/www/spa-app/dist;

    # Przekazywanie API do serwera backend
    location /api/ {
        proxy_pass http://api-server:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Obsługa plików statycznych z cache
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg)$ {
        expires 30d;
        add_header Cache-Control "public, no-transform";
    }

    # Kluczowa konfiguracja dla SPA - przekierowanie do index.html
    location / {
        try_files $uri $uri/ /index.html;
    }
}
```

### Wyjaśnienie konfiguracji dla SPA:

1. **`try_files $uri $uri/ /index.html;`** - Jest to kluczowa dyrektywa dla SPA:
   - Najpierw NGINX szuka pliku odpowiadającego żądanemu URI ($uri)
   - Jeśli takiego pliku nie ma, szuka katalogu ($uri/)
   - Jeśli nie istnieje ani plik, ani katalog, wykonuje wewnętrzne przekierowanie do /index.html; adres w przeglądarce się nie zmienia, więc router kliencki może go obsłużyć

2. Przekazywanie żądań API:
   - Wszystkie żądania do /api/ są przekazywane do serwera API; ponieważ `proxy_pass` nie zawiera części URI, prefiks /api/ zostaje zachowany
   - Reszta jest obsługiwana jako część SPA

3. Optymalizacja plików statycznych:
   - Dodajemy nagłówki cache dla plików statycznych, aby zwiększyć wydajność

## Zaawansowana konfiguracja NGINX dla SPA z Reverse Proxy

Oto bardziej rozbudowana konfiguracja z dodatkowymi optymalizacjami:

```nginx
server {
    listen 80;
    server_name spa.example.com;

    # Przekierowanie HTTP na HTTPS
    return 301 https://$host$request_uri;
}

server {
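    # Od NGINX 1.25.1 parametr `http2` w dyrektywie listen jest przestarzały;
    # zamiast niego użyj `listen 443 ssl;` razem z `http2 on;`
    listen 443 ssl http2;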
    server_name spa.example.com;

    # Konfiguracja SSL
    ssl_certificate /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';

    # Główny folder z plikami SPA
    root /var/www/spa-app/dist;
    index index.html;

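    # Nagłówki bezpieczeństwa
    # Uwaga: location dziedziczy je tylko wtedy, gdy sam nie definiuje add_header;
    # poniższe location ustawiające Cache-Control muszą je powtórzyć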
    add_header X-Content-Type-Options nosniff;
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-XSS-Protection "1; mode=block";
    add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline';";

    # Kompresja
    gzip on;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Buforowanie proxy
    proxy_buffers 16 16k;
    proxy_buffer_size 16k;

    # Przekazywanie do API
    location /api/ {
        proxy_pass http://backend:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouty
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # Pliki statyczne z długim cache
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$ {
        # Cache-Control ustawiamy jawnie; w połączeniu z `expires` odpowiedź zawierałaby drugi nagłówek Cache-Control
        add_header Cache-Control "public, immutable, max-age=31536000";
        try_files $uri =404;
    }

    # Plik Robots.txt
    location = /robots.txt {
        # default_type ustawia Content-Type odpowiedzi; add_header dodałby zduplikowany nagłówek
        default_type text/plain;
        return 200 "User-agent: *\nDisallow: /api/\n";
    }

    # Obsługa SPA
    location / {
        try_files $uri $uri/ /index.html;
        add_header Cache-Control "no-cache, no-store, must-revalidate";
    }
}
```

Ta rozbudowana konfiguracja zawiera:

- Przekierowanie HTTP na HTTPS
- Obsługę HTTP/2 dla lepszej wydajności
- Nagłówki bezpieczeństwa
- Kompresję gzip dla popularnych tekstowych typów MIME
- Większe bufory proxy dla odpowiedzi z backendu
- Różne strategie cache dla plików statycznych i głównego pliku index.html
- Odpowiedź robots.txt generowaną bezpośrednio przez NGINX

## Konfiguracja NGINX dla SPA z wieloma środowiskami

Często potrzebujemy obsługiwać kilka środowisk (na przykład staging i produkcję) na tym samym serwerze:

```nginx
# Upstream servers
upstream backend_production {
    server production-api:3000;
}

upstream backend_staging {
    server staging-api:3000;
}

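# Produkcja
# (dyrektywy ssl_certificate / ssl_certificate_key pominięto dla zwięzłości
# w obu blokach server; NGINX wymaga ich przy `listen ... ssl`)
server {
    listen 443 ssl http2;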
    server_name app.example.com;

    root /var/www/production/dist;

    location /api/ {
        proxy_pass http://backend_production;
        # standardowe nagłówki proxy
    }

    location / {
        try_files $uri $uri/ /index.html;
    }
}

# Staging
server {
    listen 443 ssl http2;
    server_name staging.example.com;

    # Podstawowa autentykacja dla środowiska staging
    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/.htpasswd;

    root /var/www/staging/dist;

    location /api/ {
        proxy_pass http://backend_staging;
        # standardowe nagłówki proxy
    }

    location / {
        try_files $uri $uri/ /index.html;
    }
}
```

## Obsługa wielu aplikacji SPA

Aby hostować kilka aplikacji SPA pod różnymi ścieżkami w jednej domenie:

```nginx
server {
    # ssl_certificate / ssl_certificate_key pominięto dla zwięzłości
    listen 443 ssl http2;
    server_name apps.example.com;

    # Pierwsza aplikacja SPA
    location /app1/ {
        alias /var/www/app1/dist/;
        try_files $uri $uri/ /app1/index.html;
    }

    # Druga aplikacja SPA
    location /app2/ {
        alias /var/www/app2/dist/;
        try_files $uri $uri/ /app2/index.html;
    }

    # Backend API
    location /api/ {
        proxy_pass http://backend:3000;
        # standardowe nagłówki proxy
    }
}
```

## Dobre praktyki dla NGINX z SPA

1. **Zawsze stosuj HTTPS** - szyfruje ruch, a przeglądarki obsługują HTTP/2 wyłącznie przez TLS
2. **Ustawiaj odpowiednie nagłówki cache**:
   - Długie cache dla niezmiennych plików (js, css z hash w nazwie)
   - Brak cache dla index.html, aby zapewnić szybkie aktualizacje aplikacji
3. **Używaj HTTP/2** - multipleksuje żądania w jednym połączeniu, co pomaga SPA ładującym wiele małych zasobów
4. **Włącz kompresję** - zmniejsza rozmiar przesyłanych danych
5. **Monitoruj wydajność** - obserwuj logi dostępu i błędów oraz metryki `stub_status`
6. **Optymalizuj timeouty** - dostosuj ustawienia timeoutów do specyfiki aplikacji
7. **Implementuj zabezpieczenia** - używaj nagłówków bezpieczeństwa, CORS, itp.
8. **Testuj konfigurację** - używaj `nginx -t` przed zastosowaniem zmian

## Rozwiązywanie problemów z NGINX i SPA

### Problem: Odświeżanie strony prowadzi do błędu 404

**Rozwiązanie**: Upewnij się, że dyrektywa `try_files $uri $uri/ /index.html;` znajduje się w bloku `location /`, który serwuje SPA. Dzięki niej NGINX zwraca główny plik SPA dla każdej ścieżki, która nie odpowiada plikowi na dysku.

### Problem: Aplikacja SPA nie może komunikować się z API

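**Rozwiązanie**: Upewnij się, że `proxy_pass` wskazuje właściwy backend i że ścieżka oczekiwana przez backend zgadza się z tą, którą przekazuje NGINX. Nagłówki CORS są potrzebne tylko wtedy, gdy SPA jest ładowana z innego originu niż API; jeśli obie części są serwowane przez ten sam host NGINX, żądania są same-origin: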

```nginx
location /api/ {
    proxy_pass http://backend:3000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;

    # Nagłówki CORS (tylko dla dostępu cross-origin; na produkcji zastąp '*'
    # dokładnym originem SPA)
    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS, PUT, DELETE' always;
    add_header 'Access-Control-Allow-Headers' 'Origin, X-Requested-With, Content-Type, Accept, Authorization' always;
}
```

### Problem: Zbyt długi czas ładowania aplikacji

**Rozwiązanie**: Zoptymalizuj ustawienia cache i kompresji:

```nginx
# Kompresja
gzip on;
gzip_comp_level 5;
gzip_min_length 256;
gzip_proxied any;
gzip_vary on;
gzip_types
  application/javascript
  application/json
  application/x-javascript
  text/css
  text/javascript
  text/plain;

# Pliki statyczne z odpowiednim cache
location ~* \.(js|css)$ {
    expires 1y;
    add_header Cache-Control "public, immutable";
}
```

## Podsumowanie

NGINX jest potężnym narzędziem do serwowania aplikacji SPA i przekazywania ich ruchu do API. Kluczowa dyrektywa `try_files $uri $uri/ /index.html;` zapewnia prawidłowe działanie routingu po stronie klienta, co jest niezbędne dla SPA.

Dzięki zaawansowanym funkcjom NGINX, takim jak równoważenie obciążenia, cache, kompresja i obsługa SSL, możemy znacznie zwiększyć wydajność, bezpieczeństwo i niezawodność naszych aplikacji SPA.

Każdą konfigurację dostosuj do potrzeb swojej aplikacji; przedstawione wzorce stanowią solidną podstawę, na której można budować dalej.

---


## Poring Over Code in 2025 - Sensible or Outdated?

**URL:** https://portfolio.sdet.it/articles/learning-coding-and-AI
**Published:** 2025-02-24
**Language:** en
Tags: ai, career, philosophy

Is it worth poring over code in the era of AI? Programmer vs machine - where manual effort still pays off.

Do you remember all those sleepless nights over code? Hours spent on Stack Overflow, desperately looking for answers to a bug that turned your project upside down? Or maybe those moments of enlightenment after your third coffee, when you finally find a typo in a variable? Well, welcome to the world of traditional programming! But wait... It's 2024, and artificial intelligence is knocking on our IDEs with the promise of ending all these frustrations. Does this mean we can put documentation on the shelf and let AI take the wheel?

## Old School, or How Steel Was Tempered

Let's start with what learning programming looked like "in the old days" (read: just a few years ago). Imagine a young adept of the programming art, who excitedly opens their first documentation. In front of them are hundreds of pages of technical jargon, and each line of code is like a hieroglyph requiring decryption. It was a world where Google and Stack Overflow were a programmer's best friends, and every problem solution resembled a detective investigation.

The traditional learning method has something of military training - it's difficult, sometimes painful, but it builds character. You spend hours analyzing code, debugging an application line by line, and each success tastes like a personal victory. It's in these moments of frustration that real problem-solving skills are born.

### The Charms of the "Old School"

Programming by the traditional method is like learning to ride a bicycle without training wheels. The beginnings are difficult and painful, but once you catch your balance, no hill is scary for you. Every error in the code is a lesson, every application crash is an opportunity to learn. I remember once spending three days looking for a bug in the code, only to discover that I had forgotten a semicolon. But do I regret that time? Absolutely not! It's exactly these experiences that teach humility and precision.

## New Era: AI Enters the Game

And now let's move to the present, where AI is like that smarter classmate who always has a solution at hand. GitHub Copilot suggests code before you have time to think about what you need. Claude and ChatGPT are ready to explain the most intricate programming concepts to you, and various AI tools practically write code for you.

Sounds like a programming utopia, right? Well, not so fast...

### The Bittersweet Taste of AI

Imagine the situation: you're working on a new project, and your AI friend generates code faster than you can read it. Everything works perfectly until... it doesn't. And then the real fun begins. Because how do you debug code that you don't fully understand? It's a bit like trying to repair a car when all you know how to do is press the gas and brake.

AI can be like an overprotective parent - it solves all your problems, but is it really doing you a favor? Sure, code is created instantly, but do you really understand what's happening "under the hood"?

## Real Stories from the Front

Meet Mark, a junior who decided to learn programming exclusively with AI help. Initially, everything went smoothly - projects were created quickly, and the code looked professional. The problem appeared during his first job interview when a question about the basics of asynchronicity in JavaScript came up. AI couldn't prompt him, and Mark... well, let's just say he didn't get that job.

On the other hand, we have Anna, a senior developer with 10 years of experience, who treats AI as her assistant. She uses it to automate tedious tasks, generate tests and documentation, but always carefully verifies each line of code. As she says herself: "AI is a great tool, but you need to know when and how to use it."

## The Golden Mean, or How Not to Get Lost in the AI World

The truth is that we don't have to choose between being a programming purist and complete dependence on AI. The best approach is... common sense! Think of AI as a very intelligent assistant. It's great that it helps you in your work, but you should make the final decisions yourself.

### Recipe for Success in the AI Era

Start with solid foundations - yes, this means some "slaving over code" and reading documentation. It's like learning the alphabet before trying to write a novel. Once you understand the basics, AI becomes your ally, not a ball and chain.

Use AI wisely - let it generate code for you, but always analyze it. Treat it like checking homework - trust, but verify. Remember that AI is a tool, not a magic wand solving all problems.

## What Will the Future Bring?

Will traditional programming be like knowledge of Latin in 10 years - respected but impractical? I seriously doubt it. Paradoxically, with the development of AI, a fundamental understanding of programming becomes even more important. Because who will verify and optimize the code generated by AI? Who will design the architecture of systems? Who will make key technological decisions?

### Epilogue: Programmer 2.0

The programmer of the future is someone who can combine the "old school" with new technologies. It's a person who understands the fundamentals but isn't afraid to use modern tools. It's someone who knows when it's worth spending an hour debugging and when to ask AI for help.

So is it worth learning programming by traditional methods in the AI era? The answer is: absolutely yes! But with wise use of new tools. Because in the end, it's not about being a purist or a technological revolutionary, but about being an effective programmer.

And now, if you'll excuse me, I have to get back to debugging code... which this time was generated by AI. Life can be ironic, can't it?

---


## Ślęczenie nad kodem w 2025 - sensowne czy przestarzałe?

**URL:** https://portfolio.sdet.pl/articles/learning-coding-and-AI
**Published:** 2025-02-24
**Language:** pl
Tags: ai, career, philosophy

Czy warto ślęczeć nad kodem w erze AI? Programista kontra maszyna - gdzie manualny wysiłek nadal się opłaca.

Pamiętasz te wszystkie nieprzespane noce nad kodem? Godziny spędzone na Stack Overflow, desperacko szukając odpowiedzi na błąd, który wywrócił twój projekt do góry nogami? A może te momenty olśnienia po trzeciej kawie, gdy wreszcie znajdujesz literówkę w nazwie zmiennej? Cóż, witaj w świecie tradycyjnego programowania! Ale czekaj... Jest rok 2025, a sztuczna inteligencja puka do naszych IDE z obietnicą końca tych wszystkich frustracji. Czy to oznacza, że możemy odłożyć dokumentację na półkę i pozwolić AI przejąć stery?

## Stara szkoła, czyli jak hartowała się stal

Zacznijmy od tego, jak wyglądała nauka programowania "za dawnych czasów" (czytaj: jeszcze kilka lat temu). Wyobraź sobie młodego adepta sztuki programowania, który z wypiekami na twarzy otwiera swoją pierwszą dokumentację. Przed nim setki stron technicznego żargonu, a każda linijka kodu to jak hieroglif wymagający deszyfracji. To był świat, gdzie Google i Stack Overflow były najlepszymi przyjaciółmi programisty, a każde rozwiązanie problemu przypominało detektywistyczne śledztwo.

Tradycyjna metoda nauki ma w sobie coś z treningu wojskowego - jest trudno, czasem boli, ale buduje charakter. Spędzasz godziny analizując kod, debugując aplikację linijka po linijce, a każdy sukces smakuje jak osobiste zwycięstwo. To właśnie w tych momentach frustracji rodzą się prawdziwe umiejętności rozwiązywania problemów.

### Uroki "starej szkoły"

Programowanie metodą tradycyjną to jak nauka jazdy na rowerze bez kółek bocznych. Początki są trudne i bolesne, ale gdy już złapiesz balans, żadna górka nie jest ci straszna. Każdy błąd w kodzie to lekcja, każdy crash aplikacji to okazja do nauki. Pamiętam, jak kiedyś spędziłem trzy dni szukając błędu w kodzie, tylko po to, by odkryć, że zapomniałem średnika. Ale czy żałuję tego czasu? Absolutnie nie! To właśnie takie doświadczenia uczą pokory i dokładności.

## Nowa era: AI wkracza do gry

A teraz przenieśmy się do teraźniejszości, gdzie AI jest jak ten mądrzejszy kolega z roku, który zawsze ma rozwiązanie pod ręką. GitHub Copilot podpowiada ci kod, zanim zdążysz pomyśleć, czego potrzebujesz. Claude i ChatGPT są gotowe wytłumaczyć ci najbardziej zawiłe koncepty programowania, a różne narzędzia AI praktycznie piszą kod za ciebie.

Brzmi jak programistyczna utopia, prawda? Cóż, nie tak szybko...

### Słodko-gorzki smak AI

Wyobraź sobie sytuację: siedzisz nad nowym projektem, a twój przyjaciel AI generuje kod szybciej niż możesz go przeczytać. Wszystko działa idealnie, dopóki... nie przestaje. I wtedy zaczyna się prawdziwa zabawa. Bo jak zdebugujesz kod, którego do końca nie rozumiesz? To trochę jak próba naprawy samochodu, gdy jedyne co umiesz, to wciskać gaz i hamulec.

AI potrafi być jak nadopiekuńczy rodzic - rozwiązuje wszystkie twoje problemy, ale czy na pewno robi ci tym przysługę? Jasne, kod powstaje błyskawicznie, ale czy na pewno rozumiesz, co się dzieje "pod maską"?

## Prawdziwe historie z frontu

Poznajmy Marka, juniora, który postanowił nauczyć się programowania wyłącznie z pomocą AI. Początkowo wszystko szło jak z płatka - projekty powstawały błyskawicznie, a kod wyglądał profesjonalnie. Problem pojawił się podczas pierwszej rozmowy rekrutacyjnej, gdy padło pytanie o podstawy działania asynchroniczności w JavaScript. AI nie mogło podpowiedzieć, a Marek... cóż, powiedzmy, że nie dostał tej pracy.

Z drugiej strony mamy Annę, senior developerkę z 10-letnim stażem, która traktuje AI jak swojego asystenta. Używa go do automatyzacji żmudnych zadań, generowania testów i dokumentacji, ale zawsze dokładnie weryfikuje każdą linijkę kodu. Jak sama mówi: "AI to świetne narzędzie, ale trzeba wiedzieć, kiedy i jak go używać".

## Złoty środek, czyli jak się nie zagubić w świecie AI

Prawda jest taka, że nie musimy wybierać między byciem programistycznym purystą a całkowitym uzależnieniem od AI. Najlepsze podejście to... zdrowy rozsądek! Wyobraź sobie AI jako bardzo inteligentnego asystenta. Świetnie, że pomaga ci w pracy, ale ostateczne decyzje powinieneś podejmować sam.

### Przepis na sukces w erze AI

Zacznij od solidnych podstaw - tak, to oznacza trochę "ślęczenia nad kodem" i czytania dokumentacji. To jak nauka alfabetu przed próbą napisania powieści. Gdy już rozumiesz podstawy, AI staje się twoim sprzymierzeńcem, a nie kulą u nogi.

Używaj AI mądrze - niech generuje dla ciebie kod, ale zawsze go analizuj. Traktuj to jak sprawdzanie pracy domowej - ufaj, ale sprawdzaj. Pamiętaj, że AI to narzędzie, a nie magiczna różdżka rozwiązująca wszystkie problemy.

## Co przyniesie przyszłość?

Czy za 10 lat tradycyjne programowanie będzie jak znajomość łaciny - szanowana, ale niepraktyczna? Szczerze wątpię. Paradoksalnie, wraz z rozwojem AI, fundamentalne zrozumienie programowania staje się jeszcze ważniejsze. Bo kto będzie weryfikował i optymalizował kod generowany przez AI? Kto będzie projektował architekturę systemów? Kto będzie podejmował kluczowe decyzje technologiczne?

### Epilog: Programista 2.0

Programista przyszłości to ktoś, kto potrafi połączyć "starą szkołę" z nowymi technologiami. To osoba, która rozumie fundamenty, ale nie boi się wykorzystywać nowoczesnych narzędzi. To ktoś, kto wie, kiedy warto spędzić godzinę na debugowaniu, a kiedy poprosić AI o pomoc.

Więc czy warto uczyć się programowania tradycyjnymi metodami w erze AI? Odpowiedź brzmi: absolutnie tak! Ale z mądrym wykorzystaniem nowych narzędzi. Bo w końcu nie chodzi o to, by być purystą albo technologicznym rewolucjonistą, ale o to, by być skutecznym programistą.

A teraz, jeśli wybaczysz, muszę wrócić do debugowania kodu... który tym razem wygenerowało AI. Życie bywa ironiczne, prawda?


---


## Avoiding waitForTimeout in tests

**URL:** https://portfolio.sdet.it/articles/avoiding-wait-for-timeout
**Published:** 2025-02-02
**Language:** en
Tags: playwright, testing, automation

Avoiding waitForTimeout in Playwright Testing: advantages, disadvantages, and alternatives.

In day-to-day test automation, we often need to wait for the user interface to change. A common approach is a fixed timeout, specifically the [waitForTimeout](https://playwright.dev/docs/api/class-page#page-wait-for-timeout) method. That choice tends to hurt both test performance and stability. In this article, we'll look at why fixed timeouts are worth avoiding, what the trade-offs are, and which alternatives based on dynamic waiting can replace them.

## Why Should We Avoid Fixed Timeouts?

Fixed timeouts, meaning constant delays implemented using **waitForTimeout**, have several significant limitations:

- **Suboptimal test execution time** - A fixed waiting time may be too long or too short. If we set a timeout that's too long, tests will run slower; too short, and elements might not have enough time to load.
- **Lower determinism** - Tests based on fixed waits can be unreliable because they depend on timing assumptions rather than the actual state of elements.
- **Maintenance difficulties** - When the interface changes, timeouts need to be manually adjusted in multiple places.
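
To make the first limitation concrete, the sketch below simulates an element that becomes ready after some delay and compares a fixed sleep with condition-based polling. It involves no Playwright at all: `makeFakeElement`, `fixedWait`, and `pollUntilReady` are hypothetical names used only for illustration.

```typescript
// Simulated UI element that becomes ready `readyAfterMs` milliseconds after creation.
// Purely illustrative: this stands in for a real page and is not a Playwright API.
function makeFakeElement(readyAfterMs: number): { isReady: () => boolean } {
  const createdAt = Date.now();
  return { isReady: () => Date.now() - createdAt >= readyAfterMs };
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Fixed timeout: always sleeps the full duration, then hopes the element is ready.
async function fixedWait(
  element: { isReady: () => boolean },
  delayMs: number
): Promise<boolean> {
  await sleep(delayMs);
  return element.isReady();
}

// Dynamic wait: returns as soon as the condition holds, or false after `timeoutMs`.
async function pollUntilReady(
  element: { isReady: () => boolean },
  timeoutMs: number,
  intervalMs: number = 10
): Promise<boolean> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    if (element.isReady()) return true;
    await sleep(intervalMs);
  }
  return element.isReady();
}
```

A fixed wait that is shorter than the real load time reports the element as not ready, and one that is longer wastes the difference on every run. The polling version pays only for the time the element actually needs, up to the timeout.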

## Alternatives - Waiting for Specific Element States

Instead of fixed timeouts, it's better to wait for a specific element state, such as **visible**, or for a change in an element's attributes. Tests then proceed as soon as the condition holds, which makes them more stable and more deterministic. The examples below show a custom **waitForState** helper, built on Playwright's `locator.waitFor()`, together with a few universal helper methods. They are written as static members of a helper class, referred to in the code as `Pb`:

### Example Helper Methods

#### Method for Waiting for a Value

```typescript
/**
 * Internal helper method that waits for an expected value.
 *
 * @param getValue - Function returning the current value (Promise<T | null>).
 * @param expectedValue - Expected value.
 * @param timeout - Maximum waiting time (default 5000 ms).
 * @param interval - Interval between attempts (default 100 ms).
 * @param useIncludes - If true, we check if the value contains expectedValue.
 * @returns Returns the found value or null.
 */
private static async waitForValueInternal<T>(
  getValue: () => Promise<T | null>,
  expectedValue: string,
  timeout: number = 5000,
  interval: number = 100,
  useIncludes: boolean = false
): Promise<T | null> {
  const startTime = Date.now();

  while (Date.now() - startTime < timeout) {
    const value = await getValue();
    if (useIncludes
      ? typeof value === "string" && value.includes(expectedValue)
      : value === expectedValue) {
      return value;
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }

  return null;
}

/**
 * Public method waiting for a value.
 *
 * @param getValue - Function returning the current value.
 * @param expectedValue - Expected value.
 * @param timeout - Maximum waiting time.
 * @param interval - Interval between attempts.
 * @param useIncludes - Whether to use includes method for comparison.
 * @returns Found value.
 * @throws Error if the value doesn't appear within the given time.
 */
static async waitForValue<T>(
  getValue: () => Promise<T | null>,
  expectedValue: string,
  timeout: number = 5000,
  interval: number = 100,
  useIncludes: boolean = false
): Promise<T> {
  const value = await Pb.waitForValueInternal(getValue, expectedValue, timeout, interval, useIncludes);
  if (value === null) {
    throw new Error(`Expected value "${expectedValue}" did not appear within the timeout period`);
  }
  return value;
}

/**
 * Method returning boolean based on waiting for a value.
 *
 * @param getValue - Function returning the current value.
 * @param expectedValue - Expected value.
 * @param timeout - Maximum waiting time.
 * @param interval - Interval between attempts.
 * @param useIncludes - Whether to use includes for comparison.
 * @returns True if the value appeared, otherwise false.
 */
static async waitForValueBoolean(
  getValue: () => Promise<string | null>,
  expectedValue: string,
  timeout: number = 5000,
  interval: number = 100,
  useIncludes: boolean = false
): Promise<boolean> {
  const value = await Pb.waitForValueInternal(getValue, expectedValue, timeout, interval, useIncludes);
  return value !== null;
}
```
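
As a usage illustration, here is a minimal sketch of calling such a helper with substring matching. To keep it runnable without a browser, it uses a condensed standalone copy of the helper and a simulated status label; `makeStatusLabel`, `demo`, the `#status` selector in the comment, and the sample text are assumptions, not part of the original code.

```typescript
// Condensed standalone copy of the polling helper, so this snippet runs on its own.
async function waitForValue(
  getValue: () => Promise<string | null>,
  expectedValue: string,
  timeout: number = 5000,
  interval: number = 100,
  useIncludes: boolean = false
): Promise<string> {
  const startTime = Date.now();
  while (Date.now() - startTime < timeout) {
    const value = await getValue();
    const matches = useIncludes
      ? value !== null && value.includes(expectedValue)
      : value === expectedValue;
    if (matches && value !== null) return value;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  throw new Error(`Expected value "${expectedValue}" did not appear within ${timeout} ms`);
}

// Hypothetical stand-in for a status label. In a real test the getter would
// typically be something like `() => page.locator("#status").textContent()`.
function makeStatusLabel(
  finalText: string,
  changesAfterMs: number
): () => Promise<string | null> {
  const createdAt = Date.now();
  return async () => (Date.now() - createdAt >= changesAfterMs ? finalText : "Loading...");
}

async function demo(): Promise<string> {
  const readStatus = makeStatusLabel("Order confirmed #123", 150);
  // Substring match: resolves once the label contains "confirmed".
  return waitForValue(readStatus, "confirmed", 2000, 25, true);
}
```

Passing a getter function rather than a locator keeps the helper independent of where the value comes from, so the same loop can poll text content, an attribute, or an API response.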

#### Methods Waiting for Element States

The methods below don't use waitForTimeout; they wait for a specific element state instead:

```typescript
/**
 * Universal method that waits until an element reaches a specific state,
 * then performs a given action.
 *
 * @param locator - Locator object.
 * @param state - Expected element state (e.g., Visible).
 * @param action - Callback with the action to perform.
 * @param timeout - Maximum waiting time.
 */
private static async performAction(
  locator: Locator,
  state: LocatorState,
  action: () => Promise<void>,
  timeout: number = 5000
) {
  await this.waitForState(locator, state, timeout);
  await action();
}

/**
 * Waits until an element becomes visible, then performs a click.
 *
 * @param locator - Locator object.
 */
static async waitAndClick(locator: Locator) {
  await this.performAction(locator, LocatorState.Visible, () => locator.click());
}

/**
 * Waits until an element becomes visible, fills it with the given value,
 * performs blur, then verifies the value.
 *
 * @param locator - Locator object.
 * @param value - Value to enter.
 * @throws Error if the entered value doesn't match the expected one.
 */
static async waitAndFill(locator: Locator, value: string) {
  await this.performAction(locator, LocatorState.Visible, async () => {
    await locator.fill(value);
    await locator.blur();
  });
  const inputValue = await locator.inputValue();
  if (inputValue !== value) {
    throw new Error(`Input value mismatch: expected "${value}", but got "${inputValue}"`);
  }
}

/**
 * Method waiting for an element state.
 *
 * @param locator - Locator object.
 * @param state - Expected state.
 * @param timeout - Maximum waiting time.
 * @throws Error if the element doesn't reach the expected state.
 */
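static async waitForState(locator: Locator, state: LocatorState, timeout: number = 5000): Promise<void> {
  // locator.count() does not wait: if nothing matches yet, it returns 0 and the loop
  // below would be skipped. Waiting on the first match closes that gap.
  try {
    await locator.first().waitFor({ state, timeout });
  } catch (error) {
    throw new Error(
      `First element with selector "${locator["_selector"]}" did not reach state "${state}" within ${timeout}ms.`
    );
  }

  const count = await locator.count();
  for (let i = 0; i < count; i++) {
    const element = locator.nth(i);
    try {
      await element.waitFor({ state, timeout });
    } catch (error) {
      // Note: "_selector" is an internal, undocumented Locator field used only for the message.
      throw new Error(
        `Element at index ${i} with selector "${locator["_selector"]}" did not reach state "${state}" within ${timeout}ms.`
      );
    }
  }
}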
```
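
The snippets above rely on a `LocatorState` type that the article does not show. One plausible definition, assuming it simply mirrors the four values that Playwright's `locator.waitFor()` accepts for its `state` option, is sketched below. It uses a const object plus a union type rather than a TypeScript `enum`, because TypeScript generally does not accept string enum members where a string literal type is expected. Both the definition and the `toWaitForOptions` helper are assumptions for illustration, not the author's actual code.

```typescript
// Assumed definition: values match the `state` option of locator.waitFor(),
// which accepts "attached" | "detached" | "visible" | "hidden".
const LocatorState = {
  Attached: "attached",
  Detached: "detached",
  Visible: "visible",
  Hidden: "hidden",
} as const;

// Union of the values above, usable as a parameter type: `state: LocatorState`.
type LocatorState = (typeof LocatorState)[keyof typeof LocatorState];

// Shape of the options object that waitForState passes to locator.waitFor().
interface WaitForOptions {
  state: "attached" | "detached" | "visible" | "hidden";
  timeout: number;
}

// Hypothetical helper showing that the values can be passed straight through.
function toWaitForOptions(state: LocatorState, timeout: number = 5000): WaitForOptions {
  return { state, timeout };
}
```

With this shape, `LocatorState.Visible` works as a value and `LocatorState` works as a type, which matches how both are used in the helper methods.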

For situations where we're waiting for a specific number of elements to appear, this method can also be helpful:

```typescript
/**
 * Waits until the number of elements matching the locator reaches a minimum value.
 *
 * @param locator - Locator object.
 * @param minCount - Minimum required number of elements.
 * @param timeout - Maximum waiting time.
 * @returns True when the condition is met.
 * @throws Error if the condition is not met.
 */
static async waitForMinimumCount(locator: Locator, minCount: number, timeout: number = 5000) {
  const startTime = Date.now();
  let currentCount = 0;

  while (Date.now() - startTime < timeout) {
    currentCount = await locator.count();
    if (currentCount >= minCount) {
      return true;
    }
    await new Promise((resolve) => setTimeout(resolve, 100));
  }

  throw new Error(`Expected at least ${minCount} elements, but found ${currentCount}.`);
}

```
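
Both `waitForValueInternal` and `waitForMinimumCount` repeat the same poll-check-sleep loop. One possible refactoring, sketched below, extracts that loop into a single generic helper and includes the last observed value in the error. `pollUntil` and `fakeCount` are hypothetical names, and the `count` parameter stands in for `() => locator.count()` so the sketch runs without a browser.

```typescript
// Generic polling loop: calls `read` until `accept` returns true or the timeout expires.
// Returns the accepted value, or throws with the last value observed.
async function pollUntil<T>(
  read: () => Promise<T>,
  accept: (value: T) => boolean,
  timeout: number = 5000,
  interval: number = 100
): Promise<T> {
  const startTime = Date.now();
  let last: T = await read();
  while (!accept(last)) {
    if (Date.now() - startTime >= timeout) {
      throw new Error(
        `Condition not met within ${timeout} ms; last value: ${JSON.stringify(last)}`
      );
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
    last = await read();
  }
  return last;
}

// waitForMinimumCount expressed through pollUntil.
async function waitForMinimumCount(
  count: () => Promise<number>,
  minCount: number,
  timeout: number = 5000
): Promise<number> {
  return pollUntil(count, (current) => current >= minCount, timeout);
}

// Simulated list that gains one item every 30 ms.
function fakeCount(): () => Promise<number> {
  const createdAt = Date.now();
  return async () => Math.floor((Date.now() - createdAt) / 30);
}
```

The condition is always checked at least once, even with a very short timeout, and a failure message that carries the last value tends to make timeouts easier to diagnose.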

## Advantages and Disadvantages of the Dynamic Waiting Approach

### Advantages

- **Test stability** - Actions are performed only when elements reach the expected state (e.g., become visible), which reduces the risk of flaky failures.
- **Better performance** - With no fixed delays (hardcoded waits), tests finish as soon as elements are ready, which is faster whenever they load sooner than anticipated.
- **Easier maintenance** - Changes to the waiting logic are made once, in the central helper methods, and apply across the entire test suite.

### Disadvantages

- **Additional implementation** - Implementing waiting methods may require extra effort and modifications to existing code.
- **Complex debugging** - When a wait fails, it can be harder to diagnose why an element didn't reach the expected state.
- **Possibility of unexpected timeouts** - If the interface changes or responses slow down, the waiting methods may exceed their timeout and fail the test.

## Summary

Avoiding fixed timeouts (**waitForTimeout**) in favor of dynamically waiting for specific element states significantly increases test stability and performance. By using universal helper methods that wait for specific conditions - such as element visibility, the appearance of a specific value, or reaching a minimum number of elements - we can build a more deterministic test suite that's resistant to minor application changes.

We encourage you to try these techniques in your Playwright projects to experience the benefits of a more intelligent approach to waiting for element states.

**Happy testing!**

---


## Unikanie waitForTimeout w testach Playwright

**URL:** https://portfolio.sdet.pl/articles/avoiding-wait-for-timeout
**Published:** 2025-02-02
**Language:** pl
Tags: playwright, testing, automation

Unikanie waitForTimeout w testach Playwright - zalety, wady i alternatywne podejścia.

W codziennej pracy nad automatyzacją testów często spotykamy się z koniecznością oczekiwania na pewne zmiany w interfejsie użytkownika. Standardowym podejściem bywa używanie sztywnych timeoutów, czyli metody [waitForTimeout](https://playwright.dev/docs/api/class-page#page-wait-for-timeout), jednak takie rozwiązanie może prowadzić do problemów związanych z wydajnością i stabilnością testów. W tym artykule przyjrzymy się, dlaczego warto unikać sztywnych timeoutów, jakie są ich zalety i wady oraz przedstawimy alternatywne podejścia oparte na dynamicznym oczekiwaniu.

## Dlaczego nie warto używać sztywnych timeoutów?

Sztywne timeouty, czyli stałe opóźnienia wprowadzane za pomocą **waitForTimeout**, mają kilka istotnych ograniczeń:

- **Nieoptymalny czas wykonania testów** - Ustalony czas oczekiwania może być zbyt długi lub za krótki. Jeśli ustawimy zbyt długi timeout, testy będą wykonywały się wolniej; zbyt krótki może spowodować, że elementy nie zdążą się załadować.
- **Niższa deterministyczność** - Testy oparte na sztywnych oczekiwaniach mogą być zawodne, gdyż opierają się na założeniach dotyczących czasu, a nie na faktycznym stanie elementów.
- **Trudności w utrzymaniu** - W przypadku zmian w interfejsie, konieczne jest ręczne modyfikowanie timeoutów w wielu miejscach.

## Alternatywy - oczekiwanie na konkretny stan elementu

Zamiast korzystać ze sztywnych timeoutów, warto wdrożyć podejścia oparte na oczekiwaniu na określony stan elementu, na przykład **visible**, lub na zmianę atrybutów elementu. Takie podejście zwiększa stabilność i deterministyczność testów. Przykładem może być wykorzystanie metody **waitForState** w połączeniu z uniwersalnymi metodami pomocniczymi:

### Przykładowe metody pomocnicze

#### Metoda oczekująca na wartość

```typescript
/**
 * Wewnętrzna metoda pomocnicza, która czeka na oczekiwaną wartość.
 *
 * @param getValue - Funkcja zwracająca aktualną wartość (Promise<T | null>).
 * @param expectedValue - Oczekiwana wartość.
 * @param timeout - Maksymalny czas oczekiwania (domyślnie 5000 ms).
 * @param interval - Interwał pomiędzy kolejnymi próbami (domyślnie 100 ms).
 * @param useIncludes - Jeśli true, sprawdzamy czy wartość zawiera expectedValue.
 * @returns Zwraca znalezioną wartość lub null.
 */
private static async waitForValueInternal<T>(
  getValue: () => Promise<T | null>,
  expectedValue: string,
  timeout: number = 5000,
  interval: number = 100,
  useIncludes: boolean = false
): Promise<T | null> {
  const startTime = Date.now();

  while (Date.now() - startTime < timeout) {
    const value = await getValue();
    if (useIncludes
      ? typeof value === "string" && value.includes(expectedValue)
      : value === expectedValue) {
      return value;
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }

  return null;
}

/**
 * Publiczna metoda oczekująca na wartość.
 *
 * @param getValue - Funkcja zwracająca aktualną wartość.
 * @param expectedValue - Oczekiwana wartość.
 * @param timeout - Maksymalny czas oczekiwania.
 * @param interval - Interwał pomiędzy próbami.
 * @param useIncludes - Czy używać metody includes przy porównaniu.
 * @returns Znaleziona wartość.
 * @throws Błąd, jeśli wartość nie pojawi się w zadanym czasie.
 */
static async waitForValue<T>(
  getValue: () => Promise<T | null>,
  expectedValue: string,
  timeout: number = 5000,
  interval: number = 100,
  useIncludes: boolean = false
): Promise<T> {
  const value = await Pb.waitForValueInternal(getValue, expectedValue, timeout, interval, useIncludes);
  if (value === null) {
    throw new Error(`Expected value "${expectedValue}" did not appear within the timeout period`);
  }
  return value;
}

/**
 * Metoda zwracająca boolean w oparciu o oczekiwanie na wartość.
 *
 * @param getValue - Funkcja zwracająca aktualną wartość.
 * @param expectedValue - Oczekiwana wartość.
 * @param timeout - Maksymalny czas oczekiwania.
 * @param interval - Interwał pomiędzy próbami.
 * @param useIncludes - Czy używać includes przy porównaniu.
 * @returns True, jeśli wartość pojawiła się, w przeciwnym razie false.
 */
static async waitForValueBoolean(
  getValue: () => Promise<string | null>,
  expectedValue: string,
  timeout: number = 5000,
  interval: number = 100,
  useIncludes: boolean = false
): Promise<boolean> {
  const value = await Pb.waitForValueInternal(getValue, expectedValue, timeout, interval, useIncludes);
  return value !== null;
}
```

#### Metody oczekujące na stan elementu

Poniżej przedstawiamy przykłady metod, które nie używają waitForTimeout, lecz czekają na określony stan elementu:

```typescript
/**
 * Uniwersalna metoda, która czeka, aż element osiągnie określony stan,
 * a następnie wykonuje zadaną akcję.
 *
 * @param locator - Obiekt Locator.
 * @param state - Oczekiwany stan elementu (np. Visible).
 * @param action - Callback z akcją do wykonania.
 * @param timeout - Maksymalny czas oczekiwania.
 */
private static async performAction(
  locator: Locator,
  state: LocatorState,
  action: () => Promise<void>,
  timeout: number = 5000
) {
  await this.waitForState(locator, state, timeout);
  await action();
}

/**
 * Czeka aż element stanie się widoczny, a następnie wykonuje kliknięcie.
 *
 * @param locator - Obiekt Locator.
 */
static async waitAndClick(locator: Locator) {
  await this.performAction(locator, LocatorState.Visible, () => locator.click());
}

/**
 * Czeka, aż element stanie się widoczny, wypełnia go podaną wartością,
 * wykonuje blur, a następnie weryfikuje wartość.
 *
 * @param locator - Obiekt Locator.
 * @param value - Wartość do wpisania.
 * @throws Błąd, jeśli wartość wpisana nie odpowiada oczekiwanej.
 */
static async waitAndFill(locator: Locator, value: string) {
  await this.performAction(locator, LocatorState.Visible, async () => {
    await locator.fill(value);
    await locator.blur();
  });
  const inputValue = await locator.inputValue();
  if (inputValue !== value) {
    throw new Error(`Input value mismatch: expected "${value}", but got "${inputValue}"`);
  }
}

/**
 * Metoda oczekująca na stan elementu.
 *
 * @param locator - Obiekt Locator.
 * @param state - Oczekiwany stan.
 * @param timeout - Maksymalny czas oczekiwania.
 * @throws Błąd, jeśli element nie osiągnie oczekiwanego stanu.
 */
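static async waitForState(locator: Locator, state: LocatorState, timeout: number = 5000): Promise<void> {
  // locator.count() nie czeka: jeśli nic jeszcze nie pasuje, zwraca 0 i pętla poniżej
  // zostałaby pominięta. Oczekiwanie na pierwszy pasujący element zamyka tę lukę.
  try {
    await locator.first().waitFor({ state, timeout });
  } catch (error) {
    throw new Error(
      `First element with selector "${locator["_selector"]}" did not reach state "${state}" within ${timeout}ms.`
    );
  }

  const count = await locator.count();
  for (let i = 0; i < count; i++) {
    const element = locator.nth(i);
    try {
      await element.waitFor({ state, timeout });
    } catch (error) {
      // Uwaga: "_selector" to wewnętrzne, nieudokumentowane pole Locatora, użyte tylko w komunikacie.
      throw new Error(
        `Element at index ${i} with selector "${locator["_selector"]}" did not reach state "${state}" within ${timeout}ms.`
      );
    }
  }
}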
```

Dodatkowo, dla sytuacji gdy oczekujemy na pojawienie się określonej liczby elementów, pomocna może być metoda:

```typescript
/**
 * Czeka, aż liczba elementów odpowiadających locatorowi osiągnie minimalną wartość.
 *
 * @param locator - Obiekt Locator.
 * @param minCount - Minimalna wymagana liczba elementów.
 * @param timeout - Maksymalny czas oczekiwania.
 * @returns True, gdy warunek zostanie spełniony.
 * @throws Błąd, jeśli warunek nie zostanie spełniony.
 */
static async waitForMinimumCount(locator: Locator, minCount: number, timeout: number = 5000) {
  const startTime = Date.now();
  let currentCount = 0;

  while (Date.now() - startTime < timeout) {
    currentCount = await locator.count();
    if (currentCount >= minCount) {
      return true;
    }
    await new Promise((resolve) => setTimeout(resolve, 100));
  }

  throw new Error(`Expected at least ${minCount} elements, but found ${currentCount}.`);
}

```

## Zalety i wady podejścia opartego na dynamicznym oczekiwaniu

### Zalety

- **Stabilność testów** - Akcje są wykonywane dopiero, gdy elementy osiągną oczekiwany stan (np. stają się widoczne), co minimalizuje ryzyko wystąpienia błędów.
- **Lepsza wydajność** - Brak sztywnych opóźnień (hardcoded wait) sprawia, że testy kończą się szybciej, gdy elementy ładują się szybciej niż zakładano.
- **Łatwiejsze utrzymanie** - Zmiany w logice oczekiwania można wprowadzić w centralnych metodach, co wpływa na całą bazę testów.

### Wady

- **Dodatkowa implementacja** - Wdrożenie metod oczekujących może wymagać dodatkowego wysiłku i modyfikacji istniejącego kodu.
- **Skomplikowane debugowanie** - W przypadku awarii może być trudniej zdiagnozować, dlaczego element nie osiągnął oczekiwanego stanu.
- **Możliwość nieoczekiwanych timeoutów** - Jeśli warunki w interfejsie ulegną zmianie lub wystąpią opóźnienia, metody oczekujące mogą spowodować przekroczenie czasu oczekiwania.

## Podsumowanie

Unikanie sztywnych timeoutów (**waitForTimeout**) na rzecz dynamicznego oczekiwania na określony stan elementów znacząco podnosi stabilność i wydajność testów. Stosując uniwersalne metody pomocnicze, które czekają na określone warunki - takie jak widoczność elementów, pojawienie się konkretnej wartości lub osiągnięcie minimalnej liczby elementów - możemy zbudować bardziej deterministyczną i odporną na drobne zmiany aplikacji bazę testów.

Zachęcamy do wypróbowania przedstawionych technik w swoich projektach Playwright, by doświadczyć korzyści płynących z bardziej inteligentnego podejścia do oczekiwania na stan elementów.

**Miłego testowania!**

---


---

# From the Field series (36)

Flagship EN series - case studies from applying AI to QA and developer
workflow. Every 2 weeks. 3 LinkedIn parts (Tue/Wed/Thu) + full case study
on portfolio.


## 6 portals agent-ready in 70 minutes: the discovery layer 99% miss

**URL:** https://portfolio.sdet.it/from-the-field/agent-ready-portals-70min
**Published:** 2026-06-23
**Language:** en
Tags: agentic-web, discovery-layer, llms-txt, mcp, from-the-field

Agentic web is not a future trend. I shipped 6 portals for it in 70 minutes. Here is what the discovery layer actually looks like in production.

# 6 portals agent-ready in 70 minutes: the discovery layer 99% miss

**Coming Tuesday 23 June 2026, 9:00 CET**

Agentic web is not a future trend - it's a present capability most sites are not ready for. I shipped 6 production portals with a proper discovery layer (llms.txt, MCP read-only, sitemap, JSON-LD) in 70 minutes. Here is the build, the receipts, and the 5 lessons.

## What you'll get in 3 parts

- **Part 1 (23.06)** - The discovery layer 99% of sites miss
- **Part 2 (24.06)** - 70 minutes for 6 portals - the build
- **Part 3 (25.06)** - Smoke tests + 5 lessons

---

**Bookmark this page.** First part publishes on 23 June 2026 at 9:00 CET.

→ See the [Series #03 manifesto](/from-the-field/context-first-qa-part-1) for the bigger picture this episode fits into.

---


## 6 portali agent-ready w 70 minut: discovery layer którego 99% nie widzi

**URL:** https://portfolio.sdet.pl/from-the-field/agent-ready-portals-70min
**Published:** 2026-06-23
**Language:** pl
Tags: agentic-web, discovery-layer, llms-txt, mcp, from-the-field

Agentic web to nie przyszły trend. Wysłałem 6 portali do niego w 70 minut. Oto jak naprawdę wygląda discovery layer na produkcji.

# 6 portali agent-ready w 70 minut: discovery layer którego 99% nie widzi

**Premiera wtorek 23 czerwca 2026, 9:00 CET**

Agentic web to nie przyszły trend - to teraźniejsza zdolność na którą większość stron nie jest gotowa. Wysłałem 6 produkcyjnych portali z porządną discovery layer (llms.txt, MCP read-only, sitemap, JSON-LD) w 70 minut. Tu jest build, paragony i 5 lekcji.

## Co dostajesz w 3 częściach

- **Część 1 (23.06)** - Discovery layer którego 99% stron nie widzi
- **Część 2 (24.06)** - 70 minut na 6 portali - build
- **Część 3 (25.06)** - Smoke testy + 5 lekcji

---

**Zakładkuj.** Pierwsza część publikuje się 23 czerwca 2026 o 9:00 CET.

→ Zobacz [manifest Series #03](/pl/from-the-field/context-first-qa-part-1) jeśli chcesz większy obraz w którym ten odcinek się mieści.

---


## Figma-to-code deterministic: no LLM at the data layer

**URL:** https://portfolio.sdet.it/from-the-field/figma-to-code-deterministic
**Published:** 2026-06-16
**Language:** en
Tags: figma, design-tokens, mcp, codegen, from-the-field

Design tokens, CSS diff, pixel-perfect validation. No LLM at the data layer. A deterministic Figma pipeline that does not hallucinate.

# Figma-to-code deterministic: no LLM at the data layer

**Coming Tuesday 16 June 2026, 9:00 CET**

Design tokens flow into CSS diff, pixel-perfect validation closes the loop, and the LLM never reads raw Figma JSON. The data layer is deterministic by construction - that is why this pipeline doesn't hallucinate even on long components.

## What you'll get in 3 parts

- **Part 1 (16.06)** - Why an LLM at the data layer hallucinates
- **Part 2 (17.06)** - MCP + element auto-detection
- **Part 3 (18.06)** - Codegen spec + commercial

---

**Bookmark this page.** First part publishes on 16 June 2026 at 9:00 CET.

→ See the [Series #03 manifesto](/from-the-field/context-first-qa-part-1) for the bigger picture this episode fits into.

---


## Figma-to-code deterministycznie: bez LLM w warstwie danych

**URL:** https://portfolio.sdet.pl/from-the-field/figma-to-code-deterministic
**Published:** 2026-06-16
**Language:** pl
Tags: figma, design-tokens, mcp, codegen, from-the-field

Design tokens, CSS diff, pixel-perfect validation. Bez LLM w warstwie danych. Deterministyczny pipeline Figma który nie halucynuje.

# Figma-to-code deterministycznie: bez LLM w warstwie danych

**Premiera wtorek 16 czerwca 2026, 9:00 CET**

Design tokeny lecą do CSS diff, pixel-perfect validation zamyka pętlę, a LLM nigdy nie czyta surowego Figma JSON. Warstwa danych jest deterministyczna z założenia - dlatego ten pipeline nie halucynuje nawet na długich komponentach.

## Co dostajesz w 3 częściach

- **Część 1 (16.06)** - Dlaczego LLM w warstwie danych halucynuje
- **Część 2 (17.06)** - MCP + element auto-detection
- **Część 3 (18.06)** - Codegen spec + commercial

---

**Zakładkuj.** Pierwsza część publikuje się 16 czerwca 2026 o 9:00 CET.

→ Zobacz [manifest Series #03](/pl/from-the-field/context-first-qa-part-1) jeśli chcesz większy obraz w którym ten odcinek się mieści.

---


## CDAT pattern: Page Objects reinvented in 4 layers

**URL:** https://portfolio.sdet.it/from-the-field/cdat-pattern
**Published:** 2026-06-09
**Language:** en
Tags: testing, playwright, patterns, page-objects, from-the-field

Page Objects do not scale past 50 tests. Here is a 4-layer pattern (data, actions, components, test) battle-tested across 9 projects.

# CDAT pattern: Page Objects reinvented in 4 layers

**Coming Tuesday 9 June 2026, 9:00 CET**

Page Objects don't scale past ~50 tests. The 4-layer CDAT pattern (data, actions, components, test) keeps contracts clean, mocking honest, and migrations cheap. Battle-tested across 9 projects since 2024, now packaged as a distilled repo.

## What you'll get in 3 parts

- **Part 1 (09.06)** - Where Page Objects break
- **Part 2 (10.06)** - 4 layers, clear contracts
- **Part 3 (11.06)** - Migration path + commercial

---

**Bookmark this page.** First part publishes on 9 June 2026 at 9:00 CET.

→ See the [Series #03 manifesto](/from-the-field/context-first-qa-part-1) for the bigger picture this episode fits into.

---


## Wzorzec CDAT: Page Objects od nowa w 4 warstwach

**URL:** https://portfolio.sdet.pl/from-the-field/cdat-pattern
**Published:** 2026-06-09
**Language:** pl
Tags: testing, playwright, patterns, page-objects, from-the-field

Page Objects nie skalują się powyżej 50 testów. Oto 4-warstwowy wzorzec (data, actions, components, test) battle-tested na 9 projektach.

# Wzorzec CDAT: Page Objects od nowa w 4 warstwach

**Premiera wtorek 9 czerwca 2026, 9:00 CET**

Page Objects nie skalują się powyżej ~50 testów. 4-warstwowy wzorzec CDAT (data, actions, components, test) trzyma kontrakty czyste, mockowanie uczciwe, migracje tanie. Battle-tested na 9 projektach od 2024, teraz spakowany jako destylat repo.

## Co dostajesz w 3 częściach

- **Część 1 (09.06)** - Gdzie Page Objects się sypią
- **Część 2 (10.06)** - 4 warstwy, jasne kontrakty
- **Część 3 (11.06)** - Ścieżka migracji + commercial

---

**Zakładkuj.** Pierwsza część publikuje się 9 czerwca 2026 o 9:00 CET.

→ Zobacz [manifest Series #03](/pl/from-the-field/context-first-qa-part-1) jeśli chcesz większy obraz w którym ten odcinek się mieści.

---


## Multi-page WCAG, Part 2: what site-wide compliance is actually worth

**URL:** https://portfolio.sdet.it/from-the-field/multipage-wcag-v04-build-part-2
**Published:** 2026-06-03
**Language:** en
Tags: wcag, accessibility, compliance, eaa, business-case, from-the-field

Your homepage passed its accessibility audit. That says almost nothing about whether your site is compliant - and since June 2025, that gap is a liability, not a nice-to-have.

Your homepage passed its accessibility audit. That says almost nothing about your site.

Part 1 was the engineering: how the discovery layer finds pages the homepage audit never touches, and how the deduper turns 5,808 instances of one bug into a single finding. This part is the question a CTO actually asks. What is that worth, and what does it cost to not have it?

No code here. Just the part that shows up on a budget.

## Single-page green is a false sign-off

You point an audit at your homepage. It comes back green. You sign off.

Meanwhile the checkout was never audited. Neither was the account area, the search results, the article templates, the booking flow. The pages where conversion and compliance actually live are exactly the pages a single-URL audit can't see.

Single-page-green doesn't mean the site is accessible. It means that one page is. I learned this on my own portfolio: the homepage was clean across three audit rounds, and the moment I pointed discovery at the router I found nine SERIOUS findings on three pages I'd never have audited by hand.

That gap - between "this page passes" and "this site complies" - used to be a quality concern. Now it's a legal one.

## Since June 2025, this stopped being optional

The European Accessibility Act (Directive 2019/882) applies from 28 June 2025. It covers e-commerce, consumer banking, ticketing and transport, e-books, and more. The technical baseline is, in practice, WCAG 2.1 AA, reached through the harmonised standard EN 301 549. Enforcement and penalties are set by each member state.

Read the scope carefully: the obligation is the service. The whole product. Not the landing page.

So you have a law scoped to the entire user journey, and an audit scoped to one URL. The distance between those two scopes is your exposure. A complaint doesn't arrive about the page you audited. It arrives about the page you didn't.

## The cost isn't "audit more pages"

Here's where the math usually goes wrong.

A manual accessibility audit across a real site - dozens to hundreds of pages - is weeks of specialist time. And the deliverable is a flat list of thousands of findings that nobody can action. Expensive to produce, useless to act on.

The lever isn't auditing more pages. It's reporting by cause instead of count. The 5,816 findings on my portfolio were three bugs. You don't pay anyone to triage 5,808 duplicate tickets - you fix one CSS variable and 34 pages go green at once. That deduplication is the cost saving. It just happens to look like a technical detail.

Same method, real numbers from my own work: a full performance audit took 7 hours with this approach, versus 16 billable hours the manual way and a week the classic way. The portfolio went from a failing grade to A in eight commits, 75 minutes of work. Accessibility has the same shape. The audit-and-fix loop runs in hours, not a multi-week engagement, because the tool collapses symptoms into causes before a human ever reads the report.

## What you're actually buying

The ability to say your whole site is compliant. Every template, every gated route, every page where money changes hands. With a remediation path that reads "one change, thirty-four pages green" - and with it done before an EAA complaint finds the page you forgot existed.

A single-page audit answers "is this page accessible." A multi-page audit answers "is my business compliant." Only one of those questions shows up in a legal letter.

## Where the line is

The public toolkit does the core of this - discovery, cross-page dedup, A-F grading - open source under AGPL-3.0. Clone it, run it, see the method.

Production scale is a different job: authenticated routes behind a login, parallel execution across hundreds of pages, audit gating in CI so a regression never ships, token federation across a design system used by ten repos. That's the Pro tier and the consulting work, and it's where "compliant homepage" becomes "compliant company."

If your accessibility sign-off currently covers one page, you're signing off on a guess. That's the part worth fixing first.

---

*This is Part 2 of the multi-page WCAG build story. Part 1 covers the architecture: [the discovery layer and the deduper](https://portfolio.sdet.it/from-the-field/multipage-wcag-v04-build-part-1/). The toolkit is open source: [sdet-wcag-toolkit](https://github.com/darco81/sdet-wcag-toolkit), AGPL-3.0.*

---


## Multi-page WCAG, część 2: ile faktycznie warta jest zgodność całego serwisu

**URL:** https://portfolio.sdet.pl/from-the-field/multipage-wcag-v04-build-part-2
**Published:** 2026-06-03
**Language:** pl
Tags: wcag, accessibility, compliance, eaa, business-case, from-the-field

Twoja strona główna przeszła audyt dostępności. O zgodności całego serwisu to nie mówi prawie nic - a od czerwca 2025 ta luka to ryzyko prawne, nie miły dodatek.

Twoja strona główna przeszła audyt dostępności. O twoim serwisie to nie mówi prawie nic.

Część 1 była inżynierią: jak warstwa discovery znajduje strony, których audyt strony głównej nie tyka, i jak deduper zamienia 5 808 wystąpień jednego buga w jedno znalezisko. Ta część to pytanie, które faktycznie zadaje CTO. Ile to jest warte i ile kosztuje tego nie mieć?

Bez kodu. Tylko ta część, która pokazuje się w budżecie.

## Single-page-green to fałszywy odbiór

Celujesz audytem w stronę główną. Świeci się na zielono. Podpisujesz odbiór.

A koszyka nikt nie audytował. Konta użytkownika nie. Wyników wyszukiwania nie. Szablonów artykułów nie. Ścieżki rezerwacji nie. Strony, na których faktycznie żyją konwersja i zgodność, to dokładnie te, których audyt jednego URL-a nie widzi.

Single-page-green nie znaczy, że serwis jest dostępny. Znaczy, że ta jedna strona jest. Nauczyłem się tego na własnym portfolio: strona główna była czysta przez trzy rundy audytu, a w momencie, w którym skierowałem discovery na router, znalazłem dziewięć znalezisk SERIOUS na trzech stronach, których ręcznie nigdy bym nie zaudytował.

Ta przepaść - między "ta strona przechodzi" a "ten serwis jest zgodny" - była kiedyś kwestią jakości. Teraz jest kwestią prawną.

## Od czerwca 2025 to przestało być opcją

European Accessibility Act (dyrektywa 2019/882) obowiązuje od 28 czerwca 2025. Obejmuje e-commerce, bankowość konsumencką, bilety i transport, e-booki i więcej. Baseline techniczny to zgodność na poziomie WCAG. Egzekwowanie i kary ustala każde państwo członkowskie osobno.

Przeczytaj zakres uważnie: obowiązek dotyczy usługi. Całego produktu. Nie strony lądowania.

Masz więc prawo wycelowane w całą ścieżkę użytkownika i audyt wycelowany w jeden URL. Odległość między tymi dwoma zakresami to twoje ryzyko. Skarga nie przychodzi o stronę, którą zaudytowałeś. Przychodzi o tę, której nie.

## Koszt to nie "audytuj więcej stron"

I tu matematyka zwykle się sypie.

Ręczny audyt dostępności całego serwisu - dziesiątki do setek stron - to tygodnie pracy specjalisty. A na wyjściu płaska lista tysięcy znalezisk, której nikt nie ogarnie. Drogo zrobić, bezużytecznie wdrożyć.

Dźwignia to nie audytowanie większej liczby stron. To raportowanie po przyczynie zamiast po liczbie. 5 816 znalezisk na moim portfolio to były trzy bugi. Nie płacisz nikomu za przeklikanie 5 808 duplikatów - poprawiasz jedną zmienną CSS i 34 strony robią się zielone naraz. Ta deduplikacja jest oszczędnością. Po prostu wygląda jak techniczny szczegół.

Ta sama metoda, realne liczby z mojej roboty: pełny audyt performance zajął 7 godzin tym podejściem wobec 16 godzin billowalnych ręcznie, wobec tygodnia klasycznie. Portfolio przeszło z oceny niedostatecznej na A w ośmiu commitach, 75 minut pracy. Dostępność ma ten sam kształt. Pętla audyt-i-fix leci w godzinach, nie w wielotygodniowym zleceniu, bo narzędzie zwija objawy w przyczyny, zanim człowiek w ogóle przeczyta raport.

## Co faktycznie kupujesz

Możliwość powiedzenia, że cały twój serwis jest zgodny. Każdy szablon, każda trasa za logowaniem, każda strona, na której zmieniają się pieniądze. Ze ścieżką naprawy, która brzmi "jedna zmiana, trzydzieści cztery strony na zielono" - i z tym zrobionym, zanim skarga z tytułu EAA znajdzie stronę, o której zapomniałeś, że istnieje.

Audyt jednej strony odpowiada na "czy ta strona jest dostępna". Audyt multi-page odpowiada na "czy mój biznes jest zgodny". Tylko jedno z tych pytań pojawia się w piśmie prawnym.

## Gdzie jest granica

Publiczny toolkit robi rdzeń tego - discovery, cross-page dedup, ocenę A-F - open source na AGPL-3.0. Sklonuj, odpal, zobacz metodę.

Skala produkcyjna to inna robota: trasy za logowaniem, równoległe wykonanie po setkach stron, bramkowanie audytu w CI, żeby regresja nigdy nie wyszła, federacja tokenów w design systemie używanym przez dziesięć repo. To Pro tier i robota konsultingowa, i to tam "zgodna strona główna" zamienia się w "zgodną firmę".

Jeśli twój odbiór dostępności obejmuje teraz jedną stronę, podpisujesz się pod zgadywanką. To jest ta część, którą warto poprawić najpierw.

---

*To część 2 historii budowy multi-page WCAG. Część 1 omawia architekturę: [warstwa discovery i deduper](https://portfolio.sdet.pl/from-the-field/multipage-wcag-v04-build-part-1/). Toolkit jest open source: [sdet-wcag-toolkit](https://github.com/darco81/sdet-wcag-toolkit), AGPL-3.0.*

---


## Multi-page WCAG, Part 1: the machine behind 5,816 to 7

**URL:** https://portfolio.sdet.it/from-the-field/multipage-wcag-v04-build-part-1
**Published:** 2026-06-02
**Language:** en
Tags: wcag, accessibility, multi-page, architecture, ai-tooling, from-the-field

5,816 findings across 35 pages. Four CSS commits. Seven false positives left. This is the machine behind it - the discovery layer, and why its default chain runs three strategies, not four.

You already saw the number.

In the WCAG toolkit series I ran a sitemap audit across the full published surface of this portfolio. Thirty-five pages. **5,816 findings.** Four CSS-level commits later: seven, all false positives from a subsystem I wrote myself. Zero SERIOUS. Zero AA failures.

That post was about *what happened*. People asked the better question afterwards: **how does an audit find pages the homepage audit never touched?** How does one tool turn 5,808 instances of the same bug into a single line in a report instead of 5,808 tickets?

This is the machine. No reveal, no twist - you know how the story ends. Part 1 is the architecture.

## Single-page audit is a Lighthouse extension

I'll say the uncomfortable part first, because it's the whole reason this feature exists.

A single-URL accessibility audit - point it at a page, get a grade - is a solved problem. Lighthouse does it. axe does it. My own toolkit did it at v0.3. It's useful, and it tells you almost nothing about whether your *site* is accessible.

Round 3 of the portfolio audit converged on the homepage. Three runs, zero new findings. "We're done." Then I pointed discovery at the router instead of a URL and found nine SERIOUS findings on three pages the homepage audit had no way to see. The homepage was clean. The article pages were not. The episode listings were not. The archive was not.

Convergence on one URL doesn't mean the site converged. It means that URL did. That gap - between "this page passes" and "this site passes" - is the entire problem class multi-page audit exists to handle.

## A flag, not a rewrite

The whole multi-page capability hangs off one flag: `--multi-page`.

Without it, the toolkit behaves exactly as it did at v0.3. Same single-page audit, byte-identical output. That was a hard design constraint, not a nice-to-have. The moment a tool silently changes what it does between versions, you've broken every CI pipeline that trusted it. So multi-page is strictly opt-in, and the old path is frozen.

What the flag plugs in is small to describe and the reason the rest works: a **discovery layer** in front of the dynamic tester, and a **deduper** behind it.

```
single-page (v0.3):   --url  ->  audit  ->  report

multi-page (v0.4):    --url  ->  discover routes  ->  audit each  ->  dedup  ->  report
                                 [discovery layer]                    [deduper]
```

The audit step in the middle is the same engine I already had. Multi-page doesn't make the audit smarter. It makes the audit run against *the right set of pages*, and it makes the output legible when one bug shows up on forty of them.
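A minimal sketch of that shape, in Python for brevity. This is not the toolkit's code: `run_audit`, `audit_page`, `discover` and `dedup` are hypothetical names, and the finding shape is an assumption. The point it illustrates is the opt-in contract: without the flag, the call goes straight to the old path and nothing else runs.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional

Finding = dict  # hypothetical shape, e.g. {"ruleId": ..., "url": ...}


def run_audit(
    url: str,
    audit_page: Callable[[str], list[Finding]],
    multi_page: bool = False,
    discover: Optional[Callable[[str], Iterable[str]]] = None,
    dedup: Optional[Callable[[list[Finding]], list[Finding]]] = None,
) -> list[Finding]:
    """Single-page path is untouched unless multi_page is explicitly set."""
    if not multi_page:
        # Old behaviour: one URL in, that page's findings out. No discovery, no dedup.
        return audit_page(url)

    if discover is None or dedup is None:
        raise ValueError("multi-page mode needs a discovery function and a deduper")

    findings: list[Finding] = []
    for page_url in discover(url):
        # Same audit engine as the single-page path, just run against more pages.
        findings.extend(audit_page(page_url))
    return dedup(findings)
```

The audit function is passed in rather than rewritten, which is the whole claim of this section: the engine stays the same, only the set of pages and the shape of the report change.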

## Discovery: three strategies by default, AI on request

Here's the decision I'd defend hardest, because it's the one that looks wrong until you've paid an API bill.

The discovery dispatcher runs a fallback chain. Default order:

```
sitemap  ->  router-scan  ->  json-config
```

It tries the first. If that comes back empty, it falls to the next. You can pin one explicitly with `--strategy=<name>` and skip the chain.

Notice what's *not* in the default chain: the AI agent. There are four strategies, but only three run automatically.
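The chain logic can be sketched in a few lines. This is an illustration in Python, not the toolkit's dispatcher: `discover_routes` and the `strategies` mapping are hypothetical, and the "AI already enabled" activation path is left out to keep it short.

```python
from __future__ import annotations

from typing import Callable, Optional

Strategy = Callable[[], list[str]]

# The AI strategy is deliberately absent: it only runs when pinned.
DEFAULT_CHAIN = ("sitemap", "router-scan", "json-config")


def discover_routes(
    strategies: dict[str, Strategy],
    pinned: Optional[str] = None,
) -> tuple[str, list[str]]:
    """Return (strategy_name, routes) from the first strategy that yields routes."""
    order = (pinned,) if pinned else DEFAULT_CHAIN
    for name in order:
        routes = strategies[name]()
        if routes:  # an empty result means "fall through to the next one"
            return name, routes
    raise LookupError(f"no routes found; tried: {', '.join(order)}")
```

Pinning a strategy replaces the chain instead of reordering it, so a pinned run never falls through to something you did not ask for.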

**Sitemap** is the cheapest truth available. If the site ships a `sitemap.xml`, that's the post-build reality of what's actually published - 35 routes for this portfolio. One HTTP fetch, parse, filter out the noise (`/og/`, `/api/`, feeds). Confidence 1.0, because it's not a guess, it's the build output.
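Roughly what "parse, filter out the noise" looks like, as a Python sketch using only the standard library. The function name is hypothetical, the HTTP fetch is left to the caller, and the suffix rule for recognising feeds is my assumption; the `/og/` and `/api/` prefixes come from the description above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
NOISE_PREFIXES = ("/og/", "/api/")          # named in the text
FEED_SUFFIXES = (".xml", ".rss", ".atom")   # assumption: how feeds are recognised


def routes_from_sitemap(xml_text: str) -> list[str]:
    """Extract page paths from a sitemap.xml body, dropping non-page URLs."""
    root = ET.fromstring(xml_text)
    routes = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        path = urlparse((loc.text or "").strip()).path or "/"
        if path.startswith(NOISE_PREFIXES) or path.endswith(FEED_SUFFIXES):
            continue
        routes.append(path)
    return routes
```

A sitemap index (a sitemap that points at other sitemaps) would need one more level of fetching, which this sketch does not handle.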

**Router-scan** is the deterministic fallback when there's no sitemap or you're running against a local dev server. It reads `package.json`, identifies the framework, and walks the source: `src/pages/**/*.astro`, App Router and Pages Router for Next, `vite-plugin-pages` config for Vue, and so on. No network, no model, no tokens. It found 11 routes from this portfolio's source skeleton.
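For the Astro case, the source walk is close to a one-screen function. A Python sketch under stated assumptions: `astro_routes` is a hypothetical name, it only covers Astro's file-based convention for `.astro` pages, and it skips dynamic segments like `[slug]` because resolving them needs build-time data.

```python
from pathlib import Path


def astro_routes(project_root: str) -> list[str]:
    """Map src/pages/**/*.astro files to static route paths."""
    pages = Path(project_root) / "src" / "pages"
    routes = set()
    for file in pages.rglob("*.astro"):
        parts = list(file.relative_to(pages).with_suffix("").parts)
        if parts and parts[-1] == "index":
            parts.pop()  # blog/index.astro serves /blog
        if any(part.startswith("[") for part in parts):
            continue  # dynamic segment: not resolvable from the file tree alone
        routes.add("/" + "/".join(parts))
    return sorted(routes)
```

The skipped dynamic segments are one plausible reason a source walk returns fewer routes than the sitemap does for the same project: the sitemap sees every generated page, the file tree sees only the templates.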

**JSON config** is the escape hatch. A `wcag.config.json` with an explicit page list and optional auth hooks, for when discovery can't infer what you want - gated routes, a staging subset, a hand-picked critical path.

**The AI agent** is strategy four, and it's deliberately *out* of the default chain. It dispatches a route-discovery agent through a Claude Code session; the agent reads the framework configs and returns a structured route list with confidence scoring. It's the most flexible strategy and the only one that costs money to run. So it's opt-in: `--strategy=ai`, or it activates if you already have AI enabled. Nobody gets a surprise token bill because a routine audit decided to think.

That's the principle the whole series runs on, applied to one feature: **deterministic by default, AI only where it earns its place.** A sitemap parse and a source walk solve the discovery problem for most projects without a single model call. The agent is there for the projects that need it, not as the front door.

## Frameworks: four that know, four that warn

The live teaser for this episode says "8 frameworks." That's true in the narrow sense that the detector recognises eight, and misleading in the sense that matters, so here's the honest version.

**Four have a real route-discovery detector:** Astro, Next (both App Router and Pages Router), Vue (`vite-plugin-pages`), and Nuxt (which rides the Vue detector). Point router-scan at any of these and it walks the actual routing structure.

**Four are recognised but not implemented:** SvelteKit, Remix, Gatsby, React Router. The detector identifies them from `package.json` and then warns "no detector yet." You don't get a route list - you get a clear message telling you to use `--strategy=ai` or write a config.
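The recognised-versus-implemented split is easy to show. A Python sketch, not the toolkit's detector: `detect_framework` is hypothetical, and the npm package names used as signals are my assumption about how each framework would be spotted in `package.json`. The warning wording follows the message quoted in this post.

```python
from __future__ import annotations

import json
from typing import Optional

# (framework, npm package that signals it). Order matters: Nuxt projects may also
# list vue, and Remix projects may also pull in react-router, so the more
# specific framework is checked first. Package names are assumptions.
SIGNALS = [
    ("astro", "astro"),
    ("next", "next"),
    ("nuxt", "nuxt"),
    ("vue", "vue"),
    ("sveltekit", "@sveltejs/kit"),
    ("remix", "@remix-run/react"),
    ("gatsby", "gatsby"),
    ("react-router", "react-router-dom"),
]
IMPLEMENTED = {"astro", "next", "vue", "nuxt"}


def detect_framework(package_json_text: str) -> tuple[Optional[str], Optional[str]]:
    """Return (framework, warning). warning is None when a route detector exists."""
    pkg = json.loads(package_json_text)
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    for framework, package in SIGNALS:
        if package in deps:
            if framework in IMPLEMENTED:
                return framework, None
            return framework, (
                f"detected {framework} but no detector implemented yet; "
                "use --strategy=ai or write a config"
            )
    return None, "no known framework found in package.json"
```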

I left a note to myself in the troubleshooting docs about this. If the tool ever tells you "detected next but no detector implemented yet," that message is lying - Next has a detector. If you actually see it, you've hit one of the four that don't. Future me will know what that means. Now you do too.

Eight recognised, four fully supported. If you're a tester, you'd have caught the gap the first time you ran it on SvelteKit, so I'd rather say it up front.

## The deduper is the part that matters

Discovery gets you the right pages. The deduper is what makes 5,816 findings survivable.

Run an audit across 35 pages and the naive output is 35 pages × findings per page. The same broken token in a shared component shows up on every page that renders it. The Shiki code-block theme leak in this portfolio appeared on every page with a code block - **5,808 instances of one bug.** As 5,808 line items, that report is unreadable and unfixable. It looks like a catastrophe. It's one CSS variable.

So findings don't aggregate by count. They aggregate by *cause*. The deduper groups on a four-part key:

```
(ruleId, sourceFile, line, selector)
```

Same rule, same source location, same selector = same bug, regardless of how many URLs it surfaced on. The group collapses to a single canonical finding, and the URLs roll up into an `affectedPages` array hanging off it.
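The grouping itself is a dictionary keyed on that tuple. A Python sketch, assuming each raw finding is a dict carrying the four key fields plus the `url` it was found on; `dedup` and the `instances` counter are my additions for illustration, while `affectedPages` is the field named above.

```python
from __future__ import annotations


def dedup(findings: list[dict]) -> list[dict]:
    """Collapse findings that share (ruleId, sourceFile, line, selector)."""
    groups: dict[tuple, dict] = {}
    for finding in findings:
        key = (finding["ruleId"], finding["sourceFile"], finding["line"], finding["selector"])
        if key not in groups:
            # The first occurrence becomes the canonical finding, minus its URL.
            canonical = {k: v for k, v in finding.items() if k != "url"}
            canonical["affectedPages"] = []
            canonical["instances"] = 0  # hypothetical counter, not a documented field
            groups[key] = canonical
        group = groups[key]
        group["instances"] += 1
        if finding["url"] not in group["affectedPages"]:
            group["affectedPages"].append(finding["url"])
    return list(groups.values())
```

The URL is deliberately not part of the key. Put it in, and every page gets its own copy of the same bug again.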

5,808 instances become **one finding** that says "this appears on these 34 pages." Fix the variable once, and the next audit shows all 34 green. That's the line in the report that's worth the whole feature: *single fix -> many pages green*. The dependency graph of your bugs, not a flat list of symptoms.

This is also why "5,816 findings" was never the disaster it sounds like. The right question isn't how many findings - it's how many distinct bugs. The answer was three: one Shiki config leak (5,808 instances), one badge color, and seven keyboard cycles that turned out to be false positives. Multi-page audit didn't multiply the work. It surfaced the structure underneath it.

## What shipped, and what didn't

Public v0.4.1 (30 April) ships all of this: the `route-discovery` package, the three-plus-one strategies, the multi-page orchestrator, cross-page dedup, and heat-map reporting. AGPL-3.0, same as the rest of the toolkit. The discovery layer went in with 47 new hermetic tests on top of the existing suite - sitemap edge cases, dispatcher chain exhaustion, the dedup logic itself.

What did *not* ship, so I don't oversell it: the Pro tier's niche specialists - a modal-specialist and an ecommerce-journey agent - are stubs right now, marked "do not dispatch." They're on the roadmap, not in a release. The Pro tier features that *are* real - trace recording, screenshot sequences, authenticated routes, parallel execution - are the subject of a later part. I'll draw that line clearly when I get there.

## Tomorrow: what this is worth to the person signing off compliance

Part 1 was the engineering. Part 2 is the other half of the question, and it's the one a CTO actually cares about: if your homepage passes its accessibility audit, what does that tell you about your site? (Less than you'd hope.) What does single-page-green actually cost you when the European Accessibility Act applies to the whole surface, not the landing page? And what does a discovery-driven audit change about that math?

That one's tomorrow. No code, just the part that shows up on a budget.

---

*The multi-page audit is open source: [sdet-wcag-toolkit](https://github.com/darco81/sdet-wcag-toolkit), AGPL-3.0. Part 2 covers the business case - what site-wide compliance is actually worth and where single-page audits leave you exposed.*

---


## Multi-page WCAG, część 1: maszyna za 5 816 do 7

**URL:** https://portfolio.sdet.pl/from-the-field/multipage-wcag-v04-build-part-1
**Published:** 2026-06-02
**Language:** pl
Tags: wcag, accessibility, multi-page, architecture, ai-tooling, from-the-field

5 816 znalezisk na 35 stronach. Cztery commity CSS. Zostało siedem false positive. Oto maszyna za tym - warstwa discovery i czemu jej domyślny łańcuch leci trzema strategiami, nie czterema.

Liczbę już widzieliście.

W serii o toolkicie WCAG puściłem audyt po sitemapie na całej opublikowanej powierzchni tego portfolio. Trzydzieści pięć stron. **5 816 znalezisk.** Cztery commity na poziomie CSS później: siedem, wszystkie false positive z podsystemu, który sam napisałem. Zero SERIOUS. Zero błędów AA.

Tamten wpis był o tym, *co się stało*. Potem padło lepsze pytanie: **jak audyt znajduje strony, których audyt strony głównej nie miał jak zobaczyć?** Jak jedno narzędzie zamienia 5 808 wystąpień tego samego buga w jedną linijkę w raporcie zamiast 5 808 ticketów?

To jest ta maszyna. Bez reveal, bez zwrotu akcji - wiecie, jak kończy się ta historia. Część 1 to architektura.

## Audyt jednej strony to wtyczka do Lighthouse

Najpierw powiem niewygodną rzecz, bo to cały powód, dla którego ta funkcja istnieje.

Audyt dostępności jednego URL-a - wskazujesz stronę, dostajesz ocenę - to problem rozwiązany. Robi to Lighthouse. Robi to axe. Robił to mój własny toolkit w v0.3. Jest przydatny i nie mówi prawie nic o tym, czy twój *serwis* jest dostępny.

Runda 3 audytu portfolio osiągnęła zbieżność na stronie głównej. Trzy przebiegi, zero nowych znalezisk. "Skończone." Potem skierowałem discovery na router zamiast na URL i znalazłem dziewięć znalezisk SERIOUS na trzech stronach, których audyt strony głównej nie miał jak zobaczyć. Strona główna była czysta. Strony artykułów nie. Listingi odcinków nie. Archiwum nie.

Zbieżność na jednym URL-u nie znaczy, że serwis osiągnął zbieżność. Znaczy, że ten URL ją osiągnął. Ta przepaść - między "ta strona przechodzi" a "ten serwis przechodzi" - to cała klasa problemu, do której obsługi istnieje audyt multi-page.

## Flaga, nie przepisywanie

Cała funkcjonalność multi-page wisi na jednej fladze: `--multi-page`.

Bez niej toolkit zachowuje się dokładnie tak jak w v0.3. Ten sam audyt jednej strony, output bajt w bajt identyczny. To było twarde ograniczenie projektowe, nie miły dodatek. W momencie, w którym narzędzie po cichu zmienia to, co robi, między wersjami, masz zepsuty każdy pipeline CI, który mu ufał. Więc multi-page jest ściśle opt-in, a stara ścieżka jest zamrożona.

To, co flaga podpina, jest proste do opisania i jest powodem, dla którego reszta działa: **warstwa discovery** przed dynamicznym testerem i **deduper** za nim.

```
single-page (v0.3):   --url  ->  audyt  ->  raport

multi-page (v0.4):    --url  ->  znajdź trasy  ->  audyt każdej  ->  dedup  ->  raport
                                 [warstwa discovery]                 [deduper]
```

Krok audytu w środku to ten sam silnik, który już miałem. Multi-page nie robi audytu mądrzejszym. Sprawia, że audyt leci po *właściwym zbiorze stron*, i sprawia, że output jest czytelny, gdy jeden bug pojawia się na czterdziestu z nich.

## Discovery: domyślnie trzy strategie, AI na żądanie

To jest decyzja, której broniłbym najmocniej, bo to ta, która wygląda na błędną, dopóki nie zapłacisz rachunku za API.

Dyspozytor discovery leci łańcuchem fallbacku. Domyślna kolejność:

```
sitemap  ->  router-scan  ->  json-config
```

Próbuje pierwszej. Jeśli ta wróci pusta, spada do następnej. Możesz przypiąć jedną jawnie przez `--strategy=<nazwa>` i pominąć łańcuch.

Zauważcie, czego *nie ma* w domyślnym łańcuchu: agenta AI. Strategie są cztery, ale automatycznie lecą tylko trzy.

**Sitemap** to najtańsza dostępna prawda. Jeśli serwis wystawia `sitemap.xml`, to jest rzeczywistość po buildzie - co faktycznie jest opublikowane. 35 tras dla tego portfolio. Jeden fetch HTTP, parsowanie, odfiltrowanie szumu (`/og/`, `/api/`, feedy). Pewność 1.0, bo to nie zgadywanka, to output buildu.

**Router-scan** to deterministyczny fallback, gdy nie ma sitemapy albo lecisz po lokalnym dev serverze. Czyta `package.json`, identyfikuje framework i chodzi po źródłach: `src/pages/**/*.astro`, App Router i Pages Router dla Nexta, konfig `vite-plugin-pages` dla Vue i tak dalej. Bez sieci, bez modelu, bez tokenów. Znalazł 11 tras ze szkieletu źródeł tego portfolio.

**JSON config** to wyjście awaryjne. `wcag.config.json` z jawną listą stron i opcjonalnymi hookami auth, na wypadek gdy discovery nie umie wywnioskować, czego chcesz - trasy za logowaniem, podzbiór ze stagingu, ręcznie wybrana ścieżka krytyczna.

**Agent AI** to strategia czwarta i jest świadomie *poza* domyślnym łańcuchem. Dispatchuje agenta route-discovery przez sesję Claude Code, czyta konfigi frameworka i zwraca ustrukturyzowaną listę tras z oceną pewności. To najbardziej elastyczna strategia i jedyna, która kosztuje przy uruchomieniu. Więc jest opt-in: `--strategy=ai`, albo aktywuje się, jeśli masz już AI włączone. Nikt nie dostaje niespodziewanego rachunku za tokeny, bo rutynowy audyt postanowił pomyśleć.

To jest zasada, na której leci cała seria, zastosowana do jednej funkcji: **domyślnie deterministycznie, AI tylko tam, gdzie zarabia na siebie.** Parsowanie sitemapy i przejście po źródłach rozwiązują problem discovery dla większości projektów bez ani jednego wywołania modelu. Agent jest dla projektów, które go potrzebują, nie jako drzwi wejściowe.

## Frameworki: cztery, które wiedzą, cztery, które ostrzegają

Żywa zajawka tego odcinka mówi "8 frameworków". To prawda w wąskim sensie, że detektor rozpoznaje osiem, i mylące w sensie, który ma znaczenie, więc oto wersja uczciwa.

**Cztery mają realny detektor route-discovery:** Astro, Next (zarówno App Router, jak i Pages Router), Vue (`vite-plugin-pages`) i Nuxt (który jedzie na detektorze Vue). Skieruj router-scan na którykolwiek z nich, a przejdzie po faktycznej strukturze routingu.

**Cztery są rozpoznawane, ale niezaimplementowane:** SvelteKit, Remix, Gatsby, React Router. Detektor identyfikuje je z `package.json`, a potem ostrzega "no detector yet". Nie dostajesz listy tras - dostajesz jasny komunikat mówiący, żeby użyć `--strategy=ai` albo napisać config.

Zostawiłem sobie notatkę w dokach troubleshootingu na ten temat. Jeśli narzędzie kiedykolwiek powie ci "detected next but no detector implemented yet", ten komunikat kłamie - Next ma detektor. Jeśli faktycznie go zobaczysz, trafiłeś na jeden z tych czterech, które nie mają. Przyszły ja będzie wiedział, co to znaczy. Teraz wy też.

Osiem rozpoznawanych, cztery w pełni obsługiwane. Jeśli jesteś testerem, wyłapałbyś tę dziurę przy pierwszym uruchomieniu na SvelteKicie, więc wolę powiedzieć to z góry.

## Deduper to ta część, która ma znaczenie

Discovery daje ci właściwe strony. Deduper sprawia, że 5 816 znalezisk da się przeżyć.

Puść audyt po 35 stronach, a naiwny output to 35 stron × znaleziska na stronę. Ten sam zepsuty token we współdzielonym komponencie pojawia się na każdej stronie, która go renderuje. Wyciek motywu bloków kodu Shiki w tym portfolio pojawił się na każdej stronie z blokiem kodu - **5 808 wystąpień jednego buga.** Jako 5 808 pozycji ten raport jest nieczytelny i niefixowalny. Wygląda na katastrofę. To jedna zmienna CSS.

Więc znaleziska nie agregują się po liczbie. Agregują się po *przyczynie*. Deduper grupuje po czteroczęściowym kluczu:

```
(ruleId, sourceFile, line, selector)
```

Ta sama reguła, ta sama lokalizacja w źródle, ten sam selektor = ten sam bug, niezależnie od tego, na ilu URL-ach wypłynął. Grupa zwija się do jednego kanonicznego znaleziska, a URL-e rolują się w tablicę `affectedPages` przyczepioną do niego.

5 808 wystąpień staje się **jednym znaleziskiem**, które mówi "to pojawia się na tych 34 stronach". Napraw zmienną raz, a następny audyt pokaże wszystkie 34 na zielono. To jest ta linijka w raporcie, która jest warta całej funkcji: *jeden fix -> wiele stron na zielono*. Graf zależności twoich bugów, nie płaska lista objawów.

To też powód, dla którego "5 816 znalezisk" nigdy nie było katastrofą, na jaką brzmi. Właściwe pytanie nie brzmi ile znalezisk - brzmi ile odrębnych bugów. Odpowiedź to trzy: jeden wyciek konfigu Shiki (5 808 wystąpień), jeden kolor odznaki i siedem cykli klawiatury, które okazały się false positive. Audyt multi-page nie zwielokrotnił roboty. Wydobył strukturę pod nią.

## Co weszło, a co nie

Public v0.4.1 (30 kwietnia) wysyła to wszystko: pakiet `route-discovery`, strategie trzy-plus-jeden, orchestrator multi-page, cross-page dedup i raporty z heat-mapą. AGPL-3.0, tak samo jak reszta toolkitu. Warstwa discovery weszła z 47 nowymi hermetycznymi testami na wierzchu istniejącego zestawu - edge case'y sitemapy, wyczerpanie łańcucha dyspozytora, sama logika dedupu.

Czego *nie* wysłałem, żeby tego nie przereklamować: niszowi specjaliści Pro tier - modal-specialist i agent ecommerce-journey - są teraz stubami, oznaczonymi "do not dispatch". Są na roadmapie, nie w release'ie. Funkcje Pro tier, które *są* realne - nagrywanie trace, sekwencje screenshotów, trasy za auth, równoległe wykonanie - to temat późniejszej części. Postawię tę granicę jasno, jak do niej dojdę.

## Jutro: ile to jest warte dla osoby, która podpisuje zgodność

Część 1 była inżynierią. Część 2 to druga połowa pytania i ta, która faktycznie obchodzi CTO: jeśli twoja strona główna przechodzi audyt dostępności, co to mówi o twoim serwisie? (Mniej, niż byś chciał.) Ile faktycznie kosztuje cię single-page-green, gdy European Accessibility Act dotyczy całej powierzchni, nie strony lądowania? I co audyt sterowany discovery zmienia w tej matematyce?

To jutro. Bez kodu, tylko ta część, która pokazuje się w budżecie.

---

*Audyt multi-page jest open source: [sdet-wcag-toolkit](https://github.com/darco81/sdet-wcag-toolkit), AGPL-3.0. Część 2 omawia uzasadnienie biznesowe - ile faktycznie warta jest zgodność na całym serwisie i gdzie audyty jednej strony zostawiają cię odsłoniętym.*

---


## Multi-page WCAG: 4 frameworks with full route-discovery, plus 4 recognised

**URL:** https://portfolio.sdet.it/from-the-field/multipage-wcag-v04-build
**Published:** 2026-06-02
**Language:** en
Tags: wcag, accessibility, multi-page, tooling, from-the-field

Single-page audit = a Lighthouse extension. Multi-page = a different problem class. Route-discovery for 4 frameworks (Astro/Next/Vue/Nuxt), 4 more recognised with a warning.

**Both parts are live.**

Single-page audit is a Lighthouse extension with extra polish. Multi-page audit is a different problem class - routing, auth, framework conventions, parallel runs, baseline state. The toolkit discovers routes for 4 frameworks with full support (Astro, Next, Vue, Nuxt) and recognises 4 more with a warning. A Pro tier is available as an opt-in.

## What you'll get in 2 parts

- **Part 1 (02.06)** - the architecture, live now: **[read Part 1](/from-the-field/multipage-wcag-v04-build-part-1)**
- **Part 2 (03.06)** - the business case, live now: **[read Part 2](/from-the-field/multipage-wcag-v04-build-part-2)**

---

**Start with [Part 1](/from-the-field/multipage-wcag-v04-build-part-1), then [Part 2](/from-the-field/multipage-wcag-v04-build-part-2).**

→ See the [Series #03 manifesto](/from-the-field/context-first-qa-part-1) for the bigger picture this episode fits into.

---


## Multi-page WCAG: 4 frameworki z pełnym route-discovery, plus 4 rozpoznawane

**URL:** https://portfolio.sdet.pl/from-the-field/multipage-wcag-v04-build
**Published:** 2026-06-02
**Language:** pl
Tags: wcag, accessibility, multi-page, tooling, from-the-field

Single-page audit = Lighthouse extension. Multi-page = inna klasa problemu. Route-discovery dla 4 frameworków (Astro/Next/Vue/Nuxt), 4 kolejne rozpoznawane z ostrzeżeniem.

**Obie części są live.**

Single-page audit to Lighthouse extension z dodatkowym polerem. Multi-page audit to inna klasa problemu - routing, auth, konwencje frameworka, parallel runs, baseline state. Toolkit wykrywa trasy dla 4 frameworków z pełnym wsparciem (Astro, Next, Vue, Nuxt) i rozpoznaje 4 kolejne z ostrzeżeniem, plus opcjonalny tier Pro.

## Co dostajesz w 2 częściach

- **Część 1 (02.06)** - architektura, live: **[czytaj Część 1](/pl/from-the-field/multipage-wcag-v04-build-part-1)**
- **Część 2 (03.06)** - perspektywa biznesowa, live: **[czytaj Część 2](/pl/from-the-field/multipage-wcag-v04-build-part-2)**

---

**Zacznij od [Części 1](/pl/from-the-field/multipage-wcag-v04-build-part-1), potem [Część 2](/pl/from-the-field/multipage-wcag-v04-build-part-2).**

→ Zobacz [manifest Series #03](/pl/from-the-field/context-first-qa-part-1), jeśli chcesz większy obraz, w którym ten odcinek się mieści.

---


## Performance audit, Part 3: Where the method scales

**URL:** https://portfolio.sdet.it/from-the-field/performance-audit-part-3
**Published:** 2026-05-28
**Language:** en
Tags: performance, ai-tooling, consulting, commercial-gate, from-the-field

Seven anti-patterns from a static pass, the line between a public AGPL toolkit and production work on three ecommerce platforms, and why /perf:fix never auto-fixes architecture.

The public toolkit audits one route and shows you the truth about it. The method behind it ran on three ecommerce platforms - the kind with carts, variant selectors, payment steps and a hundred routes. This is the part about the gap between the two.

## First, the tool finding something

Part 1 was a clean site. Five A's, nothing to fix - which made the point about a single score lying, but isn't much of a demo. So the toolkit ships a `slow-demo`: a deliberately mis-built Nuxt 3 page where every problem is invented on purpose and clearly labelled as synthetic. No client data, ever. Just the shape of the real patterns.

Pointed at it, the deterministic static pass alone finds seven anti-patterns, no browser and no model required:

`runtimeCompiler: true` in the Nuxt config, `components: { global: true }`, an image with no width or height, an image with no `loading` attribute, a `watch` with `deep: true`, and two dependencies - lodash and moment - declared and never imported.

Here is the report the static pass prints, headline first:

- Core Web Vitals: unmeasured (run with `--url` for a measured grade)
- Lighthouse perf: n/a (source-only run)
- Provisional findings grade: C (64/100)

| Area | Grade | Score | Findings |
| --- | --- | --- | --- |
| bundle | B | 81 | 4 |
| runtime | A | 95 | 1 |
| network | A | 100 | 0 |
| ssr-hydration | A | 100 | 0 |
| assets | B | 88 | 2 |

Quick Wins (Impact x Effort 6-9):

| Problem | Area | File/Resource | Impact | Effort | Score | Fix |
| --- | --- | --- | --- | --- | --- | --- |
| `runtimeCompiler: true` ships the Vue template compiler to the client. | bundle | examples/slow-demo/nuxt.config.ts:6 | 3 | 3 | 9 | Remove runtimeCompiler and precompile templates at build time, unless runtime templates are genuinely required. |
| `<img>` is missing explicit width/height, which can cause layout shift (CLS). | assets | examples/slow-demo/pages/index.vue:8 | 3 | 3 | 9 | Add intrinsic width and height (or aspect-ratio) so the browser reserves space before the image loads. |
| Global component registration (`global: true`) forces every component into the entry bundle. | bundle | examples/slow-demo/nuxt.config.ts:10 | 2 | 3 | 6 | Drop `global: true` and rely on Nuxt auto-import / explicit imports so components code-split per route. |

Medium (Impact x Effort 3-5):

| Problem | Area | File/Resource | Impact | Effort | Score | Fix |
| --- | --- | --- | --- | --- | --- | --- |
| Deep watcher (`deep: true`) recursively tracks every nested property. | runtime | examples/slow-demo/pages/index.vue:40 | 2 | 2 | 4 | Watch a specific getter/key, use `shallowRef`/`shallowReactive`, or restructure state so a deep watch is unnecessary. |
| `<img>` has no loading attribute. | assets | examples/slow-demo/pages/index.vue:8 | 1 | 3 | 3 | Add loading="lazy" for below-the-fold images; keep the LCP/above-the-fold image eager. |
| Dependency "lodash" is never imported in source (candidate for removal). | bundle | - | 1 | 3 | 3 | Confirm it is not used in config/runtime, then remove it. |
| Dependency "moment" is never imported in source (candidate for removal). | bundle | - | 1 | 3 | 3 | Confirm it is not used in config/runtime, then remove it. |

Every one comes with a location (`nuxt.config.ts:6`, `index.vue:8`), the impact stated in the message, a concrete fix, and an Impact x Effort score that sorts it into Quick Wins, Medium or Low. That is the whole report a developer needs: where, what, how, and in what order.
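The ordering rule can be sketched in a few lines. This is a minimal sketch, not the toolkit's code: it assumes both ratings are on a 1-3 scale, that a higher Effort value means an easier fix (the report rates the one-attribute `<img>` fix as Effort 3), and that anything scoring below 3 lands in Low. Only the 6-9 and 3-5 bands come from the report above; `priorityOf`, `rankFindings` and `Bucket` are hypothetical names.

```typescript
type Bucket = "Quick Win" | "Medium" | "Low";

interface PriorityResult {
  score: number;
  bucket: Bucket;
}

// Hypothetical helper: multiplies the two ratings and maps the product onto
// the bands printed in the report (6-9 Quick Wins, 3-5 Medium).
// The "Low" cut-off below 3 is an assumption, not taken from the tool.
function priorityOf(impact: number, effort: number): PriorityResult {
  const score = impact * effort;
  const bucket: Bucket = score >= 6 ? "Quick Win" : score >= 3 ? "Medium" : "Low";
  return { score, bucket };
}

// Return a sorted copy so the highest-scoring finding is handled first.
function rankFindings<T extends { impact: number; effort: number }>(findings: T[]): T[] {
  return [...findings].sort(
    (a, b) => priorityOf(b.impact, b.effort).score - priorityOf(a.impact, a.effort).score,
  );
}
```

With the values from the tables, `priorityOf(3, 3)` gives score 9 in Quick Wins and `priorityOf(2, 2)` gives score 4 in Medium.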

Those are the definite ones - pattern-matchable without running anything. The other half of the method only shows up in a full audit, with a URL to measure: the five AI specialists read the measured floor and the source and add what needs interpretation rather than pattern-matching. A style object literal inside a `v-for` that a static rule can't safely flag. A run of independent awaits that should have been parallel. A payload pulling every field when it needs three. Those are judgement calls on top of measurement, and they are exactly what the static pass deliberately does not guess at.

The split is the whole architecture in one example. The static pass catches what is definitely wrong, on its own, fast. The specialists, reading real measurements, catch what is contextually wrong. Neither could do the other's job.

## A small, honest moment

While building the static pass, the slow-demo caught a bug in my own analyzer. The deep-watcher check was matching `deep: true` anywhere it appeared - including in plain data objects that have nothing to do with a Vue watcher. A false positive. The fixture surfaced it, and the fix was to require the match to sit next to an actual `watch()` call.
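The shape of that fix can be sketched as follows. The analyzer's actual implementation is not shown in this article, so this is an illustrative heuristic only: it assumes a regex plus a parenthesis counter is good enough, it ignores parentheses inside strings and comments, and `findDeepWatchers` is a hypothetical name.

```typescript
// Report the 1-based line of every watch(...) call whose argument list
// contains `deep: true`. A bare `deep: true` outside a watch() call is ignored.
function findDeepWatchers(source: string): number[] {
  const hits: number[] = [];
  const callStart = /\bwatch\s*\(/g;
  let m: RegExpExecArray | null;
  while ((m = callStart.exec(source)) !== null) {
    // Walk forward from the opening parenthesis to the one that closes this call.
    let depth = 0;
    let end = -1;
    for (let i = m.index + m[0].length - 1; i < source.length; i++) {
      const ch = source[i];
      if (ch === "(") depth++;
      else if (ch === ")") {
        depth--;
        if (depth === 0) {
          end = i;
          break;
        }
      }
    }
    if (end === -1) continue; // unbalanced call, nothing safe to report
    const call = source.slice(m.index, end + 1);
    if (/\bdeep\s*:\s*true\b/.test(call)) {
      hits.push(source.slice(0, m.index).split("\n").length);
    }
  }
  return hits;
}
```

A plain `const options = { deep: true }` produces no hit, while `watch(state, cb, { deep: true })` is reported at its own line.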

I mention it because a tool that finds problems should be held to the standard it sets. The fixture that demonstrates the tool also stress-tested it, and the analyzer was wrong before it was right. That is what dogfooding is for.

## The line between public and Pro

Everything I have described is the public toolkit, AGPL-3.0, clone and run. It is the complete method on a single route: the deterministic floor, the five specialists, the split headline, a guided fix walkthrough. If you want to understand how a context-first performance audit works, it is all there, working, on real measurements.

What it is not is the production engagement. The three ecommerce audits this distillate came from needed things that don't belong in a public, single-route tool:

Multiple routes, audited as a set, with findings deduplicated across them - because the same bloated payload on forty product pages is one root cause, not forty findings.

A local runtime - the specialists running through Ollama on the machine, no source leaving the building - because some client repositories cannot have their code sent to a hosted model, full stop.

A real auto-fix engine with a verifier loop - apply a change, re-measure, confirm the metric actually moved - rather than the guided walkthrough the public tool ships.

React, Svelte and Angular specialists, because the public v0.1 is honestly Vue/Nuxt-first and I would rather ship one framework done properly than four done from guesswork.

That is the Pro line. Not features hidden behind a paywall for the sake of it - the parts that only make sense at production scale, on real client constraints, with the years of ecommerce auditing that the niche specialists encode.
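The verifier loop named above (apply, re-measure, confirm) has a simple control flow even though the production engine is not public. The following is a hypothetical sketch under stated assumptions: the metric is lower-is-better (for example LCP in ms), measurement and patching are injected as callbacks, and rolling back a change that did not help is an assumption of this sketch rather than something the article describes. `applyAndVerify` and `VerifyResult` are made-up names.

```typescript
interface VerifyResult {
  kept: boolean;
  before: number;
  after: number;
}

// measure/apply/revert are injected so the sketch stays independent of any
// real measurement or patching machinery.
function applyAndVerify(
  measure: () => number,
  apply: () => void,
  revert: () => void,
  minGain: number,
): VerifyResult {
  const before = measure();
  apply();
  const after = measure();
  const kept = before - after >= minGain;
  if (!kept) revert(); // assumed behaviour: undo a change that did not move the metric
  return { kept, before, after };
}
```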

## The other half lives somewhere else

One thing the three audits included that this tool deliberately does not: the backend. Load testing, server telemetry, database and search root-cause, log mining - all of that was real work on those platforms, and none of it fits a frontend performance tool. It is a different product with a different shape, and it gets its own story another time. A frontend audit that pretends to also be a load test is two tools done badly. This one stays honest about its edge: it audits the frontend, and it says so.

## Why `/perf:fix` doesn't just fix it

The public tool has a `/perf:fix` command, and it is deliberately modest. It will walk you through a finding - explain what is wrong, show the before and after, let you apply it - and it will auto-apply at most one or two trivial mechanical changes as a demo. A `loading="lazy"` here, a `font-display: swap` there.

This is the shape of one finding as `/perf:fix` surfaces it - where, what, how, and at what priority, all from the tool's own report:

| Where | What | How | Priority |
| --- | --- | --- | --- |
| assets / index.vue:8 | `<img>` missing width/height (CLS risk) | Add intrinsic width/height or aspect-ratio | Quick Win (Impact 3 x Effort 3 = 9) |

It will not auto-apply the rest, and that is a design decision, not a missing feature.

Performance fixes are mostly architectural. Splitting a bundle, restructuring hydration, parallelising a request waterfall, adding route rules - these are decisions with trade-offs, not mechanical patches. The WCAG toolkit taught me this lesson in a different domain: the auto-fixable share of real findings is small, and the honest framing is that the tool discovers and explains while a human decides. Anyone selling you an AI that auto-fixes performance is selling you a tool that will confidently make your site worse.

So the tool finds, measures, root-causes and ranks. You fix. If you want someone who has done the fixing across three production ecommerce platforms, that is the conversation.

## Where this leaves the series

Three parts, one thesis, proven on real measurements throughout.

A single performance score lies, and my own portfolio proved it - C on the score, green on every vital, A on every area. The floor underneath is deterministic and stable where it counts - score, verdict and grade identical run to run - which is what lets the AI on top be trusted at all. And the method scales from this one-route distillate to multi-route, multi-framework, local-runtime production work - the part that was never going to fit in a public repo.

The public toolkit is the education. The production engagement is the niche. Both are honest about which is which.

More on the Pro tier and consulting: sdet.it/services.

Series #04 ends here.

#FromTheField

---


## Audyt performance, część 3: gdzie metoda skaluje

**URL:** https://portfolio.sdet.pl/from-the-field/performance-audit-part-3
**Published:** 2026-05-28
**Language:** pl
Tags: performance, ai-tooling, consulting, commercial-gate, from-the-field

Siedem anti-patternów ze statycznego passa, linia między publicznym toolkitem AGPL a produkcją na trzech platformach ecommerce i czemu /perf:fix nigdy nie auto-fixuje architektury.

Publiczny toolkit audytuje jeden route i pokazuje ci o nim prawdę. Metoda za nim odpaliła na trzech platformach ecommerce - takich z koszykami, selektorami wariantów, krokami płatności i setką routes. To jest część o przepaści między jednym a drugim.

## Najpierw - tool coś znajdujący

Część 1 była czystą stroną. Pięć A, nic do naprawienia - co robiło pointę o kłamiącym pojedynczym score, ale nie jest specjalnym demem. Więc toolkit shippuje `slow-demo`: celowo źle zbudowaną stronę Nuxt 3, gdzie każdy problem jest wymyślony na potrzebę i jasno oznaczony jako syntetyczny. Zero danych klienta, nigdy. Tylko kształt prawdziwych patternów.

Wycelowany w nią, sam deterministyczny statyczny pass znajduje siedem anti-patternów, bez przeglądarki i bez modelu:

`runtimeCompiler: true` w configu Nuxt, `components: { global: true }`, obrazek bez width i height, obrazek bez atrybutu `loading`, `watch` z `deep: true` i dwie zależności - lodash i moment - zadeklarowane i nigdy nieimportowane.

Oto raport, który drukuje statyczny pass, najpierw nagłówek:

- Core Web Vitals: unmeasured (odpal z `--url` po zmierzoną ocenę)
- Lighthouse perf: n/a (run tylko na source)
- Provisional findings grade: C (64/100)

| Area | Grade | Score | Findings |
| --- | --- | --- | --- |
| bundle | B | 81 | 4 |
| runtime | A | 95 | 1 |
| network | A | 100 | 0 |
| ssr-hydration | A | 100 | 0 |
| assets | B | 88 | 2 |

Quick Wins (Impact x Effort 6-9):

| Problem | Area | File/Resource | Impact | Effort | Score | Fix |
| --- | --- | --- | --- | --- | --- | --- |
| `runtimeCompiler: true` ships the Vue template compiler to the client. | bundle | examples/slow-demo/nuxt.config.ts:6 | 3 | 3 | 9 | Remove runtimeCompiler and precompile templates at build time, unless runtime templates are genuinely required. |
| `<img>` is missing explicit width/height, which can cause layout shift (CLS). | assets | examples/slow-demo/pages/index.vue:8 | 3 | 3 | 9 | Add intrinsic width and height (or aspect-ratio) so the browser reserves space before the image loads. |
| Global component registration (`global: true`) forces every component into the entry bundle. | bundle | examples/slow-demo/nuxt.config.ts:10 | 2 | 3 | 6 | Drop `global: true` and rely on Nuxt auto-import / explicit imports so components code-split per route. |

Medium (Impact x Effort 3-5):

| Problem | Area | File/Resource | Impact | Effort | Score | Fix |
| --- | --- | --- | --- | --- | --- | --- |
| Deep watcher (`deep: true`) recursively tracks every nested property. | runtime | examples/slow-demo/pages/index.vue:40 | 2 | 2 | 4 | Watch a specific getter/key, use `shallowRef`/`shallowReactive`, or restructure state so a deep watch is unnecessary. |
| `<img>` has no loading attribute. | assets | examples/slow-demo/pages/index.vue:8 | 1 | 3 | 3 | Add loading="lazy" for below-the-fold images; keep the LCP/above-the-fold image eager. |
| Dependency "lodash" is never imported in source (candidate for removal). | bundle | - | 1 | 3 | 3 | Confirm it is not used in config/runtime, then remove it. |
| Dependency "moment" is never imported in source (candidate for removal). | bundle | - | 1 | 3 | 3 | Confirm it is not used in config/runtime, then remove it. |

Każdy ma lokalizację (`nuxt.config.ts:6`, `index.vue:8`), impact w treści komunikatu, konkretny fix i score Impact x Effort, który sortuje go do Quick Wins, Medium albo Low. To cały raport, którego dev potrzebuje: gdzie, co, jak i w jakiej kolejności.

To te pewne - pattern-matchowalne bez uruchamiania czegokolwiek. Druga połowa metody pojawia się dopiero w pełnym audycie, z URL-em do zmierzenia: pięciu AI specjalistów czyta zmierzony floor i source i dorzuca to, co wymaga interpretacji, nie pattern-matchingu. Style object literal w `v-for`, którego statyczna reguła nie może bezpiecznie zgłosić. Seria niezależnych awaitów, które miały być równoległe. Payload ciągnący każde pole, gdy potrzebuje trzech. To decyzje osądu na wierzchu pomiaru i dokładnie to, czego statyczny pass celowo nie zgaduje.

Ten split to cała architektura w jednym przykładzie. Statyczny pass łapie to, co na pewno złe, sam, szybko. Specjaliści, czytając prawdziwe pomiary, łapią to, co kontekstowo złe. Żaden nie zrobi roboty drugiego.

## Mały, uczciwy moment

Przy budowaniu statycznego passa slow-demo złapało buga w moim własnym analyzerze. Check na deep watcher matchował `deep: true` wszędzie, gdzie się pojawiło - włącznie ze zwykłymi obiektami danych, które nie mają nic wspólnego z watcherem Vue. False positive. Fixture go ujawnił, a fix to wymóg, żeby match siedział obok faktycznego `watch()`.

Wspominam, bo tool, który znajduje problemy, trzeba trzymać przy standardzie, który ustawia. Fixture, który demonstruje tool, jednocześnie go przetestował pod obciążeniem, i był błędny, zanim był poprawny. Po to jest dogfooding.

## Linia między public a Pro

Wszystko, co opisałem, to publiczny toolkit, AGPL-3.0, klonuj i odpalaj. To kompletna metoda na jednym route: deterministyczny floor, pięciu specjalistów, split nagłówek, guided fix walkthrough. Jeśli chcesz zrozumieć, jak działa context-first performance audit, jest tam całe, działające, na prawdziwych pomiarach.

Czym nie jest, to produkcyjne zlecenie. Trzy audyty ecommerce, z których przyszedł ten distylat, potrzebowały rzeczy, które nie należą do publicznego single-route toola:

Wiele routes, audytowanych jako zestaw, z findingami zdeduplikowanymi w poprzek - bo ten sam rozdęty payload na czterdziestu stronach produktów to jeden root cause, nie czterdzieści findings.

Lokalny runtime - specjaliści odpalający przez Ollama na maszynie, zero source wychodzącego poza budynek - bo niektóre repozytoria klientów nie mogą wysłać kodu do hostowanego modelu, kropka.

Prawdziwy auto-fix engine z verifier loop - zaaplikuj zmianę, zmierz ponownie, potwierdź, że metryka faktycznie się ruszyła - zamiast guided walkthrough, który shippuje publiczny tool.

Specjaliści React, Svelte i Angular, bo publiczny v0.1 jest uczciwie Vue/Nuxt-first, a wolę zshippować jeden framework zrobiony porządnie niż cztery zrobione na zgadywankę.

To jest linia Pro. Nie funkcje schowane za paywallem dla samego chowania - części, które mają sens dopiero w skali produkcyjnej, na prawdziwych ograniczeniach klienta, z latami audytowania ecommerce, które zakodowali niszowi specjaliści.

## Druga połowa mieszka gdzie indziej

Jedna rzecz, którą te trzy audyty obejmowały, a której ten tool celowo nie robi: backend. Load testing, telemetria serwera, root-cause bazy i wyszukiwarki, log mining - to wszystko była prawdziwa robota na tych platformach i nic z tego nie pasuje do frontendowego performance toola. To inny produkt o innym kształcie i dostanie własną historię kiedy indziej. Frontendowy audyt udający też load test to dwa narzędzia zrobione źle. Ten zostaje uczciwy co do swojej krawędzi: audytuje frontend i mówi to wprost.

## Czemu `/perf:fix` po prostu tego nie naprawia

Publiczny tool ma komendę `/perf:fix` i jest celowo skromna. Przeprowadzi cię przez finding - wyjaśni, co źle, pokaże before i after, da ci zaaplikować - i auto-zaaplikuje najwyżej jedną-dwie trywialne mechaniczne zmiany jako demo. `loading="lazy"` tu, `font-display: swap` tam.

Tak wygląda jeden finding, kiedy `/perf:fix` go pokazuje - gdzie, co, jak i z jakim priorytetem, wszystko z własnego raportu toola:

| Gdzie | Co | Jak | Priorytet |
| --- | --- | --- | --- |
| assets / index.vue:8 | `<img>` bez width/height (ryzyko CLS) | Dodaj intrinsic width/height albo aspect-ratio | Quick Win (Impact 3 x Effort 3 = 9) |

Reszty nie auto-zaaplikuje i to decyzja projektowa, nie brakująca funkcja.

Performance fixy są w większości architektoniczne. Rozbicie bundla, przebudowa hydration, zrównoleglenie request waterfalla, dodanie route rules - to decyzje z trade-offami, nie mechaniczne patche. WCAG toolkit nauczył mnie tej lekcji w innej domenie: auto-fixowalna część prawdziwych findings jest mała, a uczciwy framing to taki, że tool odkrywa i wyjaśnia, a człowiek decyduje. Każdy, kto sprzedaje ci AI auto-fixujące performance, sprzedaje ci tool, który pewnie pogorszy twoją stronę.

Więc tool znajduje, mierzy, robi root-cause i ustawia priorytety. Ty naprawiasz. Jeśli chcesz kogoś, kto naprawiał to na trzech produkcyjnych platformach ecommerce - to jest ta rozmowa.

## Gdzie to zostawia serię

Trzy części, jedna teza, udowodniona na prawdziwych pomiarach od początku do końca.

Pojedynczy performance score kłamie, a moje własne portfolio to udowodniło - C na score, zielono na każdym vitalu, A na każdym obszarze. Floor pod spodem jest deterministyczny i stabilny - score, werdykt i ocena identyczne między runami - co w ogóle pozwala ufać AI na wierzchu. A metoda skaluje od tego jednoroute'owego distylatu do multi-route, multi-framework, lokalnego runtime'u produkcyjnej roboty - części, która nigdy nie miała zmieścić się w publicznym repo.

Publiczny toolkit to edukacja. Produkcyjne zlecenie to nisza. Oba są uczciwe co do tego, które jest które.

Więcej o tierze Pro i konsultingu: sdet.it/uslugi.

Seria #04 kończy się tutaj.

#FromTheField

---


## Performance audit, Part 2: A deterministic floor you can trust

**URL:** https://portfolio.sdet.it/from-the-field/performance-audit-part-2
**Published:** 2026-05-27
**Language:** en
Tags: performance, ai-tooling, web-vitals, parallel-agents, from-the-field

The hard part of a performance tool is not the AI - it is numbers you can trust without a human pasting them in. How I proved the measured floor holds before letting any AI speak.

The hard part of a performance tool is not the AI. It is getting numbers you can trust without a human pasting them in.

Every performance audit I have ever shipped started the same way: open the page, run Lighthouse, copy the numbers into a document by hand. The measurement was real, but the workflow was manual, and a manual workflow doesn't run in CI and doesn't run twice the same way. The whole point of building a tool was to make the measurement automatic without making it flaky.

So before I trusted a single finding, I had to prove one thing: that the floor is stable.

## What the floor is

Five sources, all deterministic, none of them a model.

Lighthouse runs five times in a fresh headless Chrome each run, and the tool takes the median. Fresh Chrome per run, so no warm-cache bias leaks between them. Five runs because a single Lighthouse shot has real run-to-run variance and a single shot is how you end up with a grade that changes every time you look at it.

web-vitals runs through Playwright with a scripted interaction, producing LCP, CLS, TTFB and INP from the live page. I label these synthetic-field, not field, because they come from a scripted browser and not a real user. The distinction matters and I keep it visible.

The resource trace opens the page in Playwright, scrolls, lets it settle, and reads the Resource Timing API plus a buffered long-task observer - per-request sizes and timings, render-blocking status straight from Chromium, third-party origins, and main-thread blocking attribution.
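Once the entries are out of the browser, summarising them is plain data work. A minimal sketch, assuming entries shaped like `PerformanceResourceTiming` (its `name`, `transferSize` and Chromium's `renderBlockingStatus` fields are real; the `ResourceEntry` and `TraceSummary` types and `summariseTrace` are hypothetical), and assuming "third-party" simply means a different origin from the page:

```typescript
interface ResourceEntry {
  name: string; // absolute URL of the resource
  transferSize: number; // bytes transferred over the network
  renderBlockingStatus?: string; // "blocking" or "non-blocking" in Chromium
}

interface TraceSummary {
  totalBytes: number;
  renderBlocking: string[];
  thirdPartyOrigins: string[];
}

function summariseTrace(pageUrl: string, entries: ResourceEntry[]): TraceSummary {
  const pageOrigin = new URL(pageUrl).origin;
  const thirdParty = new Set<string>();
  const renderBlocking: string[] = [];
  let totalBytes = 0;
  for (const entry of entries) {
    totalBytes += entry.transferSize;
    if (entry.renderBlockingStatus === "blocking") renderBlocking.push(entry.name);
    const entryOrigin = new URL(entry.name).origin;
    if (entryOrigin !== pageOrigin) thirdParty.add(entryOrigin); // assumption: origin mismatch = third party
  }
  return { totalBytes, renderBlocking, thirdPartyOrigins: Array.from(thirdParty).sort() };
}
```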

The bundle analyzer parses build stats from Vite, Rollup or `nuxi analyze` and scans dependencies without installing anything - chunk sizes, duplicate versions, declared-but-unused packages.
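The declared-but-unused check is the easiest of these to illustrate. A minimal sketch, assuming the declared names come from `package.json` and the sources are already loaded as strings; it matches import specifiers with a regex rather than a real parser, and `packageNameOf`, `importedPackages` and `unusedDependencies` are hypothetical names, not the toolkit's API:

```typescript
// Reduce an import specifier to its package name:
// "lodash/debounce" -> "lodash", "@vue/shared/dist" -> "@vue/shared".
function packageNameOf(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/")) return null; // local file
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : (parts[0] ?? null);
}

// Collect package names from static imports, dynamic import() and require().
function importedPackages(sources: string[]): Set<string> {
  const pattern = /(?:\bfrom\s*|\bimport\s*\(\s*|\brequire\s*\(\s*|\bimport\s+)["']([^"']+)["']/g;
  const found = new Set<string>();
  for (const source of sources) {
    pattern.lastIndex = 0; // reset the global regex before each file
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(source)) !== null) {
      const pkg = packageNameOf(m[1] ?? "");
      if (pkg) found.add(pkg);
    }
  }
  return found;
}

function unusedDependencies(declared: string[], sources: string[]): string[] {
  const used = importedPackages(sources);
  return declared.filter((dep) => !used.has(dep));
}
```

A result from a check like this is only a candidate: a package can still be referenced from config or loaded at runtime without ever appearing in an import statement.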

And a static source pass catches the handful of anti-patterns you can find without a browser at all: an image with no dimensions, a deep watcher, `runtimeCompiler: true`. Definite things. No interpretation needed.

That is the floor. It is the boring part. It is also the part everything else stands on, which is why it had to be proven first.

## The variance proof

I ran the full audit on portfolio.sdet.it twice. Median-of-five each time. Back to back. Here is what came out.

| | Audit #1 | Audit #2 | Δ |
|---|---|---|---|
| Lighthouse score | 72 | 72 | 0 pts |
| CWV verdict | PASS | PASS | same |
| Area grades | all A | all A | same |
| LCP median | 991 ms | 926 ms | 65 ms |
| Grade | C | C | same band |

The same two runs, laid out the way the tool prints them:

```
Run-to-run variance - https://portfolio.sdet.it
(Lighthouse median-of-5 + headless Chromium CWV, two back-to-back runs)

                        Run #1        Run #2
  Lighthouse perf       72/100        72/100
  LCP (lab)             991 ms        926 ms
  CLS (lab)             0.000         0.000
  TTFB                  42 ms         34 ms
  Core Web Vitals       PASS          PASS

  bundle                A (100)       A (100)
  runtime               A (100)       A (100)
  network               A (100)       A (100)
  ssr-hydration         A (100)       A (100)
  assets                A (100)       A (100)
```

The verdict is rock stable where it counts: identical Lighthouse score, identical PASS, identical area grades, identical letter grade. The raw LCP drifted 65 ms between runs - and that is the honest, interesting part. The underlying metric moves a little run to run, as it always does on a live network. Both values sit comfortably inside Google's "good" band, so the verdict never wobbles.

That 65 ms is exactly why the tool runs five Lighthouse passes and takes the median instead of trusting a single shot. A single shot catches whichever number the network handed you that second. The median is what lets the grade hold steady while the raw metric breathes - and it is the difference between a number you can put in front of a client and a number you have to apologise for.
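The median itself is a few lines. A minimal sketch; the function name and the sample values in the note below are illustrative, not measurements from the audit:

```typescript
// Median of a list of numeric samples (for example five Lighthouse scores).
// For an even count this averages the two middle values.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("median of an empty sample set");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid]! : (sorted[mid - 1]! + sorted[mid]!) / 2;
}
```

Given five hypothetical scores `[70, 72, 74, 72, 68]`, the median is 72: the outliers at 68 and 74 shift the mean but leave the median where it was.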

Ten Lighthouse runs across the two audits. Zero timeouts. Zero crashes. Deterministic medians every time.

The thesis of this entire tool - context before LLM - rests on this table. If the measured verdict wobbled, then everything the AI says on top of it would be built on sand. It doesn't. So the AI gets to speak.

## The five specialists

On top of the floor sit five AI specialists, one per area: bundle, runtime, network, SSR/hydration, assets. They run in parallel - a single dispatch, five agents, each handed its slice of the measured floor plus the source it needs to read.

Each one has a focused job and a Vue/Nuxt idiom behind it, because the method came from auditing Vue and Nuxt ecommerce - three production platforms - not from a generic checklist. The anti-patterns the specialists hunt are the ones that actually bit real carts and product listings, not textbook examples.

Bundle reads chunk sizes and dependency graphs - code-splitting gaps, duplicate dependency versions, packages shipped but never used.

Runtime looks at main-thread cost and the Vue-specific traps: deep watchers, inline style object literals inside a `v-for` that destabilise props on every render.

Network reads the resource trace for the waterfall - sequential awaits that should have been parallel, payloads fetched without field filtering, third-party origins.

SSR/hydration looks at TTFB, hydration strategy, route rules, island boundaries.

Assets handles images, fonts, the LCP element, icon-set bloat.

The crucial thing: the specialists read the measured output. They do not generate metrics. The number came from Lighthouse and the trace. The specialist explains why the number is what it is and what to do about it. Measurement is fact. Interpretation is the model's job. The two never blur.

## The overlap problem, and the protocol

Five specialists looking at one page will step on each other. A slow hydration pass shows up as a runtime cost, a network delay and an SSR issue all at once. Left alone, you get the same root problem reported three times with three different owners, and a findings count that lies by inflation.

SSR/hydration is the worst offender - it shares a seam with all four others. This is the same double-report risk I hit building the WCAG toolkit, where a modal focus trap and a keyboard handler kept claiming the same finding.

The protocol is simple and deterministic: the collector that surfaced a finding decides its owner. If the trace surfaced it, it belongs to whoever owns that signal, not to whoever else could plausibly claim it. On top of that, a deterministic dedup key collapses genuine duplicates - same check, same file and line, or same metric and route. Two specialists can both notice a problem; only one finding survives, attributed once.
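A minimal sketch of such a key, reading the rule as two cases: source-located findings key on check plus file and line, measured findings key on metric plus route. That reading, the `PerfFinding` shape and the first-finding-wins ordering are assumptions for illustration, not the toolkit's actual types:

```typescript
interface PerfFinding {
  check: string;
  owner: string; // area of the collector that surfaced the finding
  file?: string;
  line?: number;
  metric?: string;
  route?: string;
}

// Source-located findings key on check + file + line;
// measured findings key on metric + route.
function dedupKey(f: PerfFinding): string {
  if (f.file !== undefined && f.line !== undefined) {
    return `src|${f.check}|${f.file}|${f.line}`;
  }
  return `metric|${f.metric ?? ""}|${f.route ?? ""}`;
}

// Keep the first finding per key. Callers are expected to list the
// collector-surfaced finding first so that its owner is the one that survives.
function dedupe(findings: PerfFinding[]): PerfFinding[] {
  const seen = new Map<string, PerfFinding>();
  for (const f of findings) {
    const key = dedupKey(f);
    if (!seen.has(key)) seen.set(key, f);
  }
  return Array.from(seen.values());
}
```

Because the key is built only from fields of the finding, the same input always collapses to the same output, which is what keeps the findings count stable between runs.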

It is not glamorous. It is the difference between a report a developer trusts and a report they argue with.

## Why the headline is a split, not a grade

Part 1 showed my portfolio scoring C with every Core Web Vital green and every area graded A. That contradiction is not a bug to paper over. It is the reason the headline is built the way it is.

Performance has no single axis. Core Web Vitals are user and business truth. The Lighthouse score is a lab diagnostic. The area grades are where the work is. Those are three different questions, and a single letter answers none of them honestly - it averages them into a number that is wrong in a specific, misleading way.

So the headline reports two axes:

```
Core Web Vitals:  PASS / FAIL   (per-metric good / needs-improvement / poor)
Lighthouse perf:  NN/100        (lab, throttled)
```

The A-to-F grade did not get thrown away. It moved down a level, to per-area grades, where a single axis actually applies - because "how is the bundle doing" is a question with one answer.

And there is a third state that most tools skip: unmeasured. Run the tool on source only, with no URL to hit, and it cannot honestly say PASS or FAIL on vitals it never measured. So it says unmeasured, and hands back a provisional findings-grade with a banner that says exactly that. A tool that prints a confident grade for a measurement it never took is a tool lying to you politely. This one refuses to.
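The three-state verdict can be sketched like this. The thresholds are Google's published Core Web Vitals boundaries; the rule that PASS requires every measured metric to rate "good", and all names here (`rate`, `cwvVerdict`, `THRESHOLDS`), are assumptions for illustration rather than the tool's implementation:

```typescript
type Rating = "good" | "needs-improvement" | "poor";
type Verdict = "PASS" | "FAIL" | "unmeasured";
type VitalName = "LCP" | "CLS" | "INP";

// [upper bound of "good", upper bound of "needs-improvement"].
// LCP and INP are in milliseconds, CLS is unitless.
const THRESHOLDS: Record<VitalName, [number, number]> = {
  LCP: [2500, 4000],
  CLS: [0.1, 0.25],
  INP: [200, 500],
};

function rate(name: VitalName, value: number): Rating {
  const [good, poor] = THRESHOLDS[name];
  if (value <= good) return "good";
  return value <= poor ? "needs-improvement" : "poor";
}

// No measured metrics means no verdict: the function refuses to guess.
function cwvVerdict(measured: Partial<Record<VitalName, number>>): Verdict {
  const names = (Object.keys(measured) as VitalName[]).filter((n) => measured[n] !== undefined);
  if (names.length === 0) return "unmeasured";
  return names.every((n) => rate(n, measured[n] as number) === "good") ? "PASS" : "FAIL";
}
```

Fed the lab values reported earlier (LCP 991 ms, CLS 0.000), `cwvVerdict` returns `PASS`; called with an empty object, as in a source-only run, it returns `unmeasured`.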

## The honest CI reality

A median-of-five audit with throttling and web-vitals takes about two to three minutes per URL. That is fine for a CI sample on a representative page. It is also exactly why v0.1 audits one route and not the whole site - N routes is N times that cost, and multi-route is a deliberate later-version problem rather than a thing I pretended to solve now.

One real gotcha worth naming for anyone who clones this: Lighthouse finds Chrome through chrome-launcher, which on my machine discovered the system Chrome. A clean Linux CI runner has no system Chrome. The fix is to point Lighthouse at the Chromium that Playwright already installed, which is roughly 150 MB and already there. I would rather tell you that up front than let you hit it on your first green-to-red CI run.

## Tomorrow

Part 3: where this method actually came from. Not a toy site - three ecommerce platforms, the kind with carts and payment steps and a hundred routes. I will show the tool on a deliberately broken demo so you can see it find things instead of finding nothing, walk through the honest line between the public distillate and the production engagement, and explain why `/perf:fix` will guide you through a fix but won't pretend to auto-apply the architectural ones.

#FromTheField

---


## Audyt performance, część 2: deterministyczny floor, któremu można ufać

**URL:** https://portfolio.sdet.pl/from-the-field/performance-audit-part-2
**Published:** 2026-05-27
**Language:** pl
Tags: performance, ai-tooling, web-vitals, parallel-agents, from-the-field

Trudna część performance toola to nie AI - to liczby, którym można ufać, bez człowieka wklejającego je ręcznie. Jak udowodniłem, że zmierzony floor trzyma, zanim jakiekolwiek AI dostało głos.

Trudna część performance toola to nie AI. To wyciągnięcie liczb, którym można ufać, bez człowieka wklejającego je ręcznie.

Każdy performance audit, jaki kiedykolwiek oddałem, zaczynał się tak samo: otwórz stronę, odpal Lighthouse, przepisz liczby do dokumentu ręcznie. Pomiar był prawdziwy, ale workflow był ręczny, a ręczny workflow nie odpala się w CI i nie odpala się dwa razy tak samo. Cały sens budowania toola to zrobić pomiar automatycznym, nie robiąc go flaky.

Więc zanim zaufałem jakiemukolwiek findingowi, musiałem udowodnić jedno: że floor jest stabilny.

## Czym jest floor

Pięć źródeł, wszystkie deterministyczne, żadne nie jest modelem.

Lighthouse odpala pięć razy, świeży headless Chrome co run, tool bierze medianę. Świeży Chrome co run, więc żaden warm-cache bias nie przecieka między nimi. Pięć runów, bo pojedynczy strzał Lighthouse ma realny run-to-run variance, a pojedynczy strzał to sposób, w jaki kończysz z oceną, która zmienia się za każdym spojrzeniem.

web-vitals odpala przez Playwright ze scripted interaction, produkując LCP, CLS, TTFB i INP z żywej strony. Oznaczam je jako synthetic-field, nie field, bo pochodzą ze scripted przeglądarki, nie od prawdziwego użytkownika. Rozróżnienie ma znaczenie i trzymam je widoczne.

Resource trace otwiera stronę w Playwright, scrolluje, daje się ustabilizować i czyta Resource Timing API plus buforowany long-task observer - rozmiary i timingi per request, render-blocking status prosto z Chromium, third-party origins, atrybucja main-thread blocking.

Bundle analyzer parsuje build stats z Vite, Rollup albo `nuxi analyze` i skanuje zależności bez instalowania czegokolwiek - rozmiary chunków, zduplikowane wersje, paczki zadeklarowane i nigdy nieużyte.

I statyczny pass po source łapie tę garść anti-patternów, które znajdziesz bez przeglądarki: obrazek bez wymiarów, deep watcher, `runtimeCompiler: true`. Rzeczy pewne. Bez interpretacji.

To jest floor. Nudna część. I część, na której stoi cała reszta - dlatego musiała być udowodniona pierwsza.

## Dowód na variance

Odpaliłem pełny audyt na portfolio.sdet.it dwa razy. Mediana z pięciu za każdym. Jeden po drugim. Oto co wyszło.

| | Audyt #1 | Audyt #2 | Δ |
|---|---|---|---|
| Lighthouse score | 72 | 72 | 0 pkt |
| Werdykt CWV | PASS | PASS | ten sam |
| Oceny obszarów | same A | same A | te same |
| LCP mediana | 991 ms | 926 ms | 65 ms |
| Ocena | C | C | ten sam band |

Te same dwa runy, ułożone tak, jak drukuje je tool:

```
Run-to-run variance - https://portfolio.sdet.it
(Lighthouse median-of-5 + headless Chromium CWV, two back-to-back runs)

                        Run #1        Run #2
  Lighthouse perf       72/100        72/100
  LCP (lab)             991 ms        926 ms
  CLS (lab)             0.000         0.000
  TTFB                  42 ms         34 ms
  Core Web Vitals       PASS          PASS

  bundle                A (100)       A (100)
  runtime               A (100)       A (100)
  network               A (100)       A (100)
  ssr-hydration         A (100)       A (100)
  assets                A (100)       A (100)
```

Werdykt jest stabilny jak skała tam, gdzie się liczy: identyczny Lighthouse score, identyczny PASS, identyczne oceny obszarów, identyczna litera. Surowe LCP zdryfowało 65 ms między runami - i to jest uczciwa, ciekawa część. Metryka pod spodem rusza się trochę run do run, jak zawsze na żywej sieci. Obie wartości siedzą wygodnie w środku bandu "good" Google, więc werdykt nigdy się nie chwieje.

Te 65 ms to dokładnie powód, dla którego tool odpala pięć passów Lighthouse i bierze medianę zamiast ufać pojedynczemu strzałowi. Pojedynczy strzał łapie tę liczbę, którą sieć podała ci w tej sekundzie. Mediana to to, co pozwala ocenie stać stabilnie, podczas gdy surowa metryka oddycha - i to różnica między liczbą, którą możesz postawić przed klientem, a liczbą, za którą musisz przepraszać.

Dziesięć runów Lighthouse przez dwa audyty. Zero timeoutów. Zero crashy. Deterministyczne mediany za każdym razem.

Teza całego toola - context before LLM - stoi na tej tabeli. Gdyby zmierzony werdykt się chwiał, to wszystko, co AI mówi na wierzchu, byłoby zbudowane na piasku. Nie chwieje się. Więc AI dostaje głos.

## Pięciu specjalistów

Na floorze siedzi pięciu AI specjalistów, jeden na obszar: bundle, runtime, network, SSR/hydration, assets. Odpalają równolegle - jeden dispatch, pięciu agentów, każdy dostaje swój wycinek zmierzonego floora plus source, który musi przeczytać.

Każdy ma wąską robotę i Vue/Nuxt idiom za sobą, bo metoda przyszła z audytowania ecommerce na Vue i Nuxt - trzech produkcyjnych platform - nie z generycznej checklisty. Anti-patterny, na które polują specjaliści, to te, które realnie ugryzły prawdziwe koszyki i listingi produktów, nie podręcznikowe przykłady.

Bundle czyta rozmiary chunków i grafy zależności - luki w code-splittingu, zduplikowane wersje, paczki wysłane i nigdy nieużyte.

Runtime patrzy na koszt main-thread i Vue-specific pułapki: deep watchery, inline style object literale w `v-for`, które destabilizują propsy przy każdym renderze.

Network czyta resource trace pod kątem waterfalla - sekwencyjne awaity, które miały być równoległe, payloady pobierane bez filtrowania pól, third-party origins.

SSR/hydration patrzy na TTFB, strategię hydration, route rules, granice islandów.

Assets ogarnia obrazki, fonty, element LCP, bloat zestawów ikon.

Rzecz kluczowa: specjaliści czytają zmierzony output. Nie generują metryk. Liczba przyszła z Lighthouse i z trace. Specjalista wyjaśnia, czemu liczba jest, jaka jest, i co z tym zrobić. Pomiar to fakt. Interpretacja to robota modelu. Te dwie rzeczy nigdy się nie mieszają.

## Problem nakładania się i protokół

Pięciu specjalistów patrzących na jedną stronę będzie deptać sobie po nogach. Wolny pass hydration pokazuje się jako koszt runtime, opóźnienie network i problem SSR naraz. Zostawione samo sobie - dostajesz ten sam root problem zgłoszony trzy razy z trzema różnymi ownerami i licznik findings, który kłamie przez zawyżenie.

SSR/hydration to najgorszy przypadek - dzieli styk ze wszystkimi czterema. To ten sam risk podwójnego zgłoszenia, który złapałem budując WCAG toolkit, gdzie focus trap modala i keyboard handler ciągle zgłaszały ten sam finding.

Protokół jest prosty i deterministyczny: collector, który ujawnił finding, decyduje o jego ownerze. Jeśli ujawnił go trace, należy do tego, kto jest ownerem tego sygnału, nie do tego, kto jeszcze mógłby go z sensem zgłosić. Na wierzchu deterministyczny dedup key zwija prawdziwe duplikaty - ten sam check, ten sam plik i linia, albo ta sama metryka i route. Dwóch specjalistów może zauważyć ten sam problem; przeżywa jeden finding, zaatrybuowany raz.
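Protokół da się zapisać w kilkunastu liniach. Poniżej minimalny szkic w Pythonie przy kilku założeniach: nazwy collectorów, mapa `COLLECTOR_OWNER`, pola `Finding` i przykładowy finding są hipotetyczne, a prawdziwy tool może to trzymać inaczej. Z tekstu pochodzą dwie reguły: ownera wyznacza collector, a dedup key to check + plik + linia albo metryka + route.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping: which specialist owns each collector's signal.
COLLECTOR_OWNER = {
    "bundle-parse": "bundle",
    "long-task-trace": "runtime",
    "resource-trace": "network",
}


@dataclass(frozen=True)
class Finding:
    check: str                    # id of the rule that fired
    collector: str                # collector that surfaced the finding
    reported_by: str              # specialist that wrote it up
    file: Optional[str] = None    # set for source-level findings
    line: Optional[int] = None
    metric: Optional[str] = None  # set for measured findings
    route: Optional[str] = None


def dedup_key(f: Finding) -> tuple:
    """Same check + file + line, or same metric + route, is the same finding."""
    if f.file is not None:
        return ("source", f.check, f.file, f.line)
    return ("metric", f.metric, f.route)


def dedupe(findings: list[Finding]) -> dict[tuple, str]:
    """Collapse duplicates; the surfacing collector decides the single owner."""
    owners: dict[tuple, str] = {}
    for f in findings:
        owners.setdefault(dedup_key(f), COLLECTOR_OWNER[f.collector])
    return owners


# Hypothetical case: three specialists write up the same slow hydration pass.
reports = [
    Finding("slow-hydration", "long-task-trace", who, metric="TBT", route="/cart")
    for who in ("runtime", "network", "ssr-hydration")
]
deduped = dedupe(reports)
```

Trzech specjalistów zgłasza tu ten sam wolny pass hydration; przeżywa jeden finding, z ownerem wyznaczonym przez collector, nie przez to, kto go opisał.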

To nie jest efektowne. To różnica między raportem, któremu dev ufa, a raportem, z którym się kłóci.

## Czemu nagłówek to split, nie ocena

Część 1 pokazała moje portfolio z oceną C, każdym Core Web Vital na zielono i każdym obszarem ocenionym na A. Ta sprzeczność to nie bug do zaklejenia. To powód, dla którego nagłówek jest zbudowany tak, jak jest.

Performance nie ma jednej osi. Core Web Vitals to prawda użytkownika i biznesu. Lighthouse score to diagnostyka labowa. Oceny obszarów to miejsce, gdzie jest robota. To trzy różne pytania, a pojedyncza litera nie odpowiada uczciwie na żadne - uśrednia je w liczbę, która jest błędna w konkretny, mylący sposób.

Więc nagłówek raportuje dwie osie:

```
Core Web Vitals:  PASS / FAIL   (per-metryka good / needs-improvement / poor)
Lighthouse perf:  NN/100        (lab, throttled)
```

Ocena A-F nie została wyrzucona. Zeszła poziom niżej, do ocen per-obszar, gdzie pojedyncza oś faktycznie ma zastosowanie - bo "jak się ma bundle" to pytanie z jedną odpowiedzią.

I jest trzeci stan, który większość narzędzi pomija: unmeasured. Odpal tool tylko na source, bez URL-a do uderzenia, i nie może uczciwie powiedzieć PASS ani FAIL na vitalsach, których nigdy nie zmierzył. Więc mówi unmeasured i oddaje provisional findings-grade z bannerem, który mówi dokładnie to. Tool, który drukuje pewną ocenę dla pomiaru, którego nigdy nie zrobił, to tool, który grzecznie cię okłamuje. Ten odmawia.
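Trzy stany nagłówka można naszkicować tak. To szkic w Pythonie, nie implementacja toola: nazwy `rate` i `cwv_verdict` są hipotetyczne, a reguła, że każdy zmierzony vital poza bandem good daje FAIL, to założenie tego szkicu. Progi to publiczne progi Google dla LCP, CLS i INP.

```python
from typing import Optional

# Google's published thresholds: (good up to, poor above).
THRESHOLDS = {
    "LCP": (2500.0, 4000.0),  # milliseconds
    "CLS": (0.1, 0.25),       # unitless
    "INP": (200.0, 500.0),    # milliseconds
}


def rate(metric: str, value: Optional[float]) -> str:
    """Rate one vital; a missing value is reported as unmeasured, never guessed."""
    if value is None:
        return "unmeasured"
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs-improvement"
    return "poor"


def cwv_verdict(measured: dict[str, Optional[float]]) -> str:
    """Return PASS, FAIL, or unmeasured for the headline."""
    ratings = [rate(name, measured.get(name)) for name in THRESHOLDS]
    known = [r for r in ratings if r != "unmeasured"]
    if not known:
        return "unmeasured"  # nothing was measured, so no verdict is claimed
    # Assumption of this sketch: any measured vital outside "good" fails.
    return "PASS" if all(r == "good" for r in known) else "FAIL"


# Values from the portfolio run; INP was not measured there.
portfolio = cwv_verdict({"LCP": 991.0, "CLS": 0.0, "INP": None})
# Source-only audit: no URL, so no vitals at all.
source_only = cwv_verdict({})
```

Dla liczb z portfolio (LCP 991 ms, CLS 0, INP niezmierzone) szkic zwraca PASS, a dla audytu tylko na source, bez żadnego zmierzonego vitala, zwraca unmeasured zamiast zgadywać.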

## Uczciwa rzeczywistość CI

Audyt mediana-z-pięciu z throttlingiem i web-vitals zajmuje jakieś dwie-trzy minuty na URL. To OK dla próbki CI na reprezentatywnej stronie. To też dokładnie powód, dla którego v0.1 audytuje jeden route, nie całą stronę - N routes to N razy tyle, a multi-route to świadomy problem na późniejszą wersję, nie rzecz, którą udawałem, że rozwiązałem teraz.

Jeden realny haczyk wart nazwania dla każdego, kto to sklonuje: Lighthouse znajduje Chrome przez chrome-launcher, który na mojej maszynie wykrył systemowy Chrome. Czysty linuksowy runner CI nie ma systemowego Chrome. Fix to wskazać Lighthouse na Chromium, który Playwright już zainstalował, jakieś 150 MB i już jest. Wolę ci to powiedzieć z góry, niż żebyś wpadł na tym przy pierwszym green-to-red w CI.

## Jutro

Część 3: skąd ta metoda naprawdę przyszła. Nie zabawkowa strona - trzy platformy ecommerce, takie z koszykami, krokami płatności i setką routes. Pokażę tool na celowo zepsutym demo, żebyś zobaczył, jak coś znajduje, zamiast nie znajdować nic, przejdę przez uczciwą linię między publicznym distylatem a produkcyjnym zleceniem i wyjaśnię, czemu `/perf:fix` przeprowadzi cię przez fix, ale nie będzie udawać, że auto-aplikuje te architektoniczne.

#FromTheField

---


## Performance audit, Part 1: My own tool gave my portfolio a C

**URL:** https://portfolio.sdet.it/from-the-field/performance-audit-part-1
**Published:** 2026-05-26
**Language:** en
Tags: performance, ai-tooling, web-vitals, core-web-vitals, from-the-field

My own performance tool gave my portfolio a C while every Core Web Vital stayed green - LCP 991 ms, CLS zero, five areas graded A. Why one score lies and two axes tell the truth.

I spent the last stretch auditing performance on three ecommerce platforms - the kind with carts, variant selectors and payment steps, where a slow page is lost revenue. Then I distilled the method into a tool, pointed it at my own portfolio first, and it handed me a C.

Then I looked at the Core Web Vitals it measured. All green. LCP 991 ms. CLS zero. TTFB 42 ms. So which is it - slow, or fast?

That contradiction is the whole reason this tool shows two numbers where most tools show one. This is the field report on why.

## The setup

The target is portfolio.sdet.it. It is about as thin as a website gets: static Astro, 3.3 KB of JavaScript, 21.7 KB total, six requests. No framework runtime shipped to the browser. No hydration. Nothing clever.

The tool is `sdet-perf-toolkit` v0.1 - a frontend performance audit that measures in a real browser first and lets AI interpret the measurement second. It runs Lighthouse five times in headless Chrome and takes the median, injects web-vitals through Playwright, reads the resource trace, and parses the bundle. The measured floor comes first. The AI never invents a number.

I ran it on my own site. Dogfooding before anyone else sees the tool. Here is the headline it produced:

```
Core Web Vitals:  PASS   (LCP good · CLS good · TTFB good · INP unmeasured)
Lighthouse perf:  72/100 (lab, throttled)
```

The measured Core Web Vitals behind that PASS:

| Metric | Value | Rating | Role |
| --- | --- | --- | --- |
| LCP | 991 ms | good | core |
| CLS | 0.000 | good | core |
| TTFB | 42 ms | good | supporting |

Pass on the vitals. 72 on Lighthouse. A grade of C if you collapse it to a single letter.

And then the part that matters. The tool grades five areas - bundle, runtime, network, SSR/hydration, assets. Every one of them came back A. Zero findings. Each.

Five A's and a C. Same page. Same audit. Same second.

## So which number is lying

Neither, exactly. They measure different things, and that is the point most people miss.

Core Web Vitals are what a real user feels and what Google ranks. Largest Contentful Paint at 991 ms means the main content paints fast. Cumulative Layout Shift at zero means nothing jumps around as it loads. Those are good, by Google's own thresholds, with room to spare.

The Lighthouse performance score is a lab diagnostic. Total Blocking Time carries the largest single weight in it (30% in Lighthouse 10), and on this page TBT came back around 1740 ms. That single metric drags the composite score down to 72 even though every vital passes.

So the honest reading is: the site is fast for a user, and a lab metric is unhappy about something. A single C-grade would launder that nuance into a lie. A single 100 would launder it the other way. Two axes tell the truth.
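The arithmetic behind that drag can be sketched in a few lines. This is a minimal illustration, not the tool's code: the weights are the ones documented for Lighthouse 10 (whether the audit ran that exact version is an assumption), the `composite` helper is hypothetical, and the per-metric scores are a deliberately extreme made-up case, since the article does not report the real sub-scores.

```python
# Lighthouse 10 performance-score weights (they sum to 1.0).
WEIGHTS = {
    "FCP": 0.10,
    "SI": 0.10,
    "LCP": 0.25,
    "TBT": 0.30,
    "CLS": 0.25,
}


def composite(metric_scores: dict[str, float]) -> int:
    """Weighted average of per-metric scores (each 0.0 to 1.0), scaled to 0-100."""
    total = sum(WEIGHTS[name] * metric_scores[name] for name in WEIGHTS)
    return round(total * 100)


# Hypothetical extreme: every metric perfect except TBT, which scores zero.
scores = {"FCP": 1.0, "SI": 1.0, "LCP": 1.0, "TBT": 0.0, "CLS": 1.0}
floor_with_dead_tbt = composite(scores)
```

With TBT contributing nothing, the composite cannot rise above 70 no matter how good the other metrics are. A 72 is roughly what you would expect from a page where TBT scores near zero and everything else is near perfect.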

But I wanted to know *what* the lab was unhappy about. A number you can't explain is a number you can't trust.

## The dig

The trace collector exists for exactly this. It opens the page in Playwright, scrolls, waits for the page to settle, and reads the Resource Timing API plus a buffered long-task observer. Real timings, real render-blocking status, real main-thread attribution.

I ran it three times against the live site. Stable to within five milliseconds.

The 1740 ms of blocking time maps to a single long task of about 810 ms. Its attribution: `unknown`. That is the browser doing its own work - parsing, style, layout, paint. Not my JavaScript. Not a third-party script. There is barely any JavaScript to blame; the whole page ships 3.3 KB of it.

What actually happened is this. Lighthouse runs with 4x CPU throttling to simulate a mid-tier device. Under that throttle, the browser's own parse-and-layout pass on the document stretches out far enough to register as a long task. TBT counts it. The score drops. A real user on a real machine never experiences this as a long task at all - which is exactly why the measured vitals come back green.

There is an honest limit here, and I will name it. The Long Tasks API attributes self-work as `unknown` and won't split it finer than that. Pinning it down to parse-versus-layout-versus-paint needs a full CDP performance trace, which is a heavier capture than v0.1 does. So I can tell you the long task is browser self-work under throttling. I can't yet tell you which microsecond went where. That is a roadmap item, and I would rather say so than pretend the tool knows more than it does.

## The lesson

Here is what this whole exercise is really about.

If my tool had only shown you "Grade: C", you would have walked away thinking my portfolio is slow. It isn't. Every vital that maps to user experience is green, and every area the tool can audit came back clean.

If it had only shown you "Lighthouse: 100" - which it doesn't, but plenty of tools chase that number - you would have learned nothing about the TBT behaviour under throttling, which on a heavier site genuinely matters.

A performance score is not a performance verdict. The score is a useful lab signal. The vitals are the user truth. The area grades are where the work is, or in this case isn't. You need all of them, and you need them kept apart, because the moment you average them into one letter you have thrown away the only information worth having.

That is why the headline is a split, not a grade. It is the one design decision in this tool I care about most, and my own portfolio is the proof. A site that scores C and deserves five A's is the entire argument in a single screenshot.

| Area | Grade | Score | Findings |
| --- | --- | --- | --- |
| bundle | A | 100 | 0 |
| runtime | A | 100 | 0 |
| network | A | 100 | 0 |
| ssr-hydration | A | 100 | 0 |
| assets | A | 100 | 0 |

## What's under the hood

The measurement I have been describing is the deterministic floor: Lighthouse median-of-five, synthetic web-vitals, the resource trace, the bundle parse, plus a static source pass for the handful of anti-patterns you can catch without a browser at all. None of that involves a model. It is the same boring, reproducible measurement a careful engineer would do by hand - except it runs in two minutes and produces the same answer twice.

On top of that floor sit five AI specialists, one per area, each reading the measured output and the source to explain root cause and rank what to fix. On my portfolio they had nothing to say, because there was nothing to fix. On a site that is actually mis-built, they have plenty - which is what Part 2 is about.

That ordering is the thesis, and it has a name I keep coming back to: context before LLM. Measure first. Let the model interpret what was measured. Never let it guess the numbers. It is the same discipline that held up across three production ecommerce audits, where a wrong number doesn't cost you a grade - it costs the client a checkout.

## Tomorrow

Part 2: the architecture in full, and the thing I most needed to prove before trusting any of this - that the measured floor is stable. I ran the same audit twice, median-of-five each time, back to back. Identical score, identical grade, identical pass - the kind of stability you can put in front of a client. I'll show you the numbers, the five specialists, and why a split headline beats a single grade once you've seen what a single grade hides.

Then Part 3: where the method actually came from - three ecommerce platforms, not a toy - and where the distillate ends and the real engagement begins.

#FromTheField

---


## Audyt performance, część 1: własny tool dał mojemu portfolio C

**URL:** https://portfolio.sdet.pl/from-the-field/performance-audit-part-1
**Published:** 2026-05-26
**Language:** pl
Tags: performance, ai-tooling, web-vitals, core-web-vitals, from-the-field

Mój własny tool performance dał portfolio ocenę C, a każdy Core Web Vital był zielony - LCP 991 ms, CLS zero, pięć obszarów na A. Czemu jedna liczba kłamie, a dwie osie mówią prawdę.

Audytowałem ostatnio performance trzech platform ecommerce. Takich z koszykiem, wariantami produktu i krokiem płatności, gdzie wolna strona to utracony przychód. Potem zdestylowałem metodę w tool, odpaliłem najpierw na własnym portfolio i dostałem C.

Spojrzałem na zmierzone Core Web Vitals. Wszystkie zielone. LCP 991 ms. CLS zero. TTFB 42 ms. No to jak - wolno czy szybko?

Ta sprzeczność to cały powód, dla którego ten tool pokazuje dwie liczby tam, gdzie większość narzędzi pokazuje jedną.

## Setup

Cel to portfolio.sdet.it. Strona tak chuda, jak się da: statyczne Astro, 3.3 KB JavaScriptu, 21.7 KB całości, sześć requestów. Zero runtime frameworka w przeglądarce. Zero hydration. Nic wymyślnego.

Tool to `sdet-perf-toolkit` v0.1 - audyt frontend performance, który najpierw mierzy w prawdziwej przeglądarce, a dopiero potem pozwala AI interpretować pomiar. Odpala Lighthouse pięć razy w headless Chrome i bierze medianę, wstrzykuje web-vitals przez Playwright, czyta resource trace, parsuje bundle. Zmierzony floor jest pierwszy. AI nigdy nie zmyśla liczby.

Odpaliłem na własnej stronie. Dogfooding zanim ktokolwiek zobaczy tool. Oto nagłówek:

```
Core Web Vitals:  PASS   (LCP good · CLS good · TTFB good · INP unmeasured)
Lighthouse perf:  72/100 (lab, throttled)
```

Zmierzone Core Web Vitals stojące za tym PASS:

| Metric | Value | Rating | Role |
| --- | --- | --- | --- |
| LCP | 991 ms | good | core |
| CLS | 0.000 | good | core |
| TTFB | 42 ms | good | supporting |

Pass na vitalsach. 72 na Lighthouse. C, jeśli sprowadzisz to do jednej litery.

I teraz część, która ma znaczenie. Tool ocenia pięć obszarów - bundle, runtime, network, SSR/hydration, assets. Każdy wrócił z A. Zero findings. Każdy.

Pięć A i jedno C. Ta sama strona. Ten sam audyt. Ta sama sekunda.

## Która liczba kłamie

Żadna. Mierzą różne rzeczy i o to właśnie chodzi.

Core Web Vitals to to, co czuje użytkownik i co rankuje Google. LCP 991 ms znaczy, że główna treść maluje się szybko. CLS zero znaczy, że nic nie skacze przy ładowaniu. Dobre, według progów Google, z zapasem.

Lighthouse performance score to diagnostyka labowa. Dominuje w nim Total Blocking Time, a na tej stronie TBT wyszło około 1740 ms. Ta jedna metryka ściąga composite score do 72, mimo że każdy vital przechodzi.

Uczciwy odczyt: strona jest szybka dla użytkownika, a metryka labowa jest z czegoś niezadowolona. Pojedyncza ocena C wyprałaby ten niuans w kłamstwo. Pojedyncza setka wyprałaby go w drugą stronę. Dwie osie mówią prawdę.

Ale chciałem wiedzieć, z *czego* lab jest niezadowolony. Liczba, której nie umiesz wyjaśnić, to liczba, której nie możesz ufać.

## Kopanie

Trace collector jest dokładnie do tego. Otwiera stronę w Playwright, scrolluje, czeka aż się ustabilizuje, czyta Resource Timing API plus buforowany long-task observer. Prawdziwe timingi, prawdziwy render-blocking status, prawdziwa atrybucja main-thread.

Odpaliłem trzy razy na żywej stronie. Stabilne do pięciu milisekund.

Te 1740 ms blocking time mapuje się na jeden long task, około 810 ms. Atrybucja: `unknown`. To przeglądarka robiąca własną robotę - parsing, style, layout, paint. Nie mój JavaScript. Nie third-party. Nie ma czego obwiniać; cała strona wysyła 3.3 KB JS-a.

Co się naprawdę dzieje: Lighthouse odpala z 4x throttlingiem CPU, żeby zasymulować średni telefon. Pod tym throttlingiem własny pass przeglądarki na parsing i layout rozciąga się na tyle, żeby zarejestrować się jako long task. TBT to liczy. Score leci w dół. Prawdziwy użytkownik na prawdziwej maszynie nigdy tego nie odczuje jako long task - i dlatego zmierzone vitalsy wracają zielone.

Jest tu uczciwy limit i go nazwę. Long Tasks API atrybuuje self-work jako `unknown` i nie rozbije tego drobniej. Przypięcie tego do parsing-kontra-layout-kontra-paint wymaga pełnego CDP performance trace, cięższego niż to, co robi v0.1. Więc mogę ci powiedzieć, że long task to self-work przeglądarki pod throttlingiem. Nie umiem jeszcze powiedzieć, która mikrosekunda poszła gdzie. To roadmap, i wolę to powiedzieć, niż udawać, że tool wie więcej, niż wie.

## Lekcja

O to tu naprawdę chodzi.

Gdyby tool pokazał tylko "Grade: C", odszedłbyś z przekonaniem, że moje portfolio jest wolne. Nie jest. Każdy vital, który mapuje się na user experience, jest zielony, a każdy obszar, jaki tool umie zaudytować, wrócił czysty.

Gdyby pokazał tylko "Lighthouse: 100" - czego nie robi, ale mnóstwo narzędzi goni za tą liczbą - nie nauczyłbyś się nic o zachowaniu TBT pod throttlingiem, które na cięższej stronie naprawdę ma znaczenie.

Performance score to nie performance verdict. Score to przydatny sygnał labowy. Vitalsy to prawda użytkownika. Oceny obszarów to miejsce, gdzie jest robota, albo - jak tu - gdzie jej nie ma. Potrzebujesz wszystkich i potrzebujesz ich osobno, bo w momencie, w którym uśrednisz je w jedną literę, wyrzuciłeś jedyną informację wartą posiadania.

Dlatego nagłówek to split, nie ocena. To jedyna decyzja projektowa w tym toolu, na której mi najbardziej zależy, a moje własne portfolio jest dowodem. Strona, która dostaje C, a zasługuje na pięć A, to cały argument w jednym screenshocie.

| Area | Grade | Score | Findings |
| --- | --- | --- | --- |
| bundle | A | 100 | 0 |
| runtime | A | 100 | 0 |
| network | A | 100 | 0 |
| ssr-hydration | A | 100 | 0 |
| assets | A | 100 | 0 |

## Co jest pod maską

Pomiar, który opisywałem, to deterministyczny floor: Lighthouse mediana z pięciu, synthetic web-vitals, resource trace, bundle parse, plus statyczny pass po source dla tych kilku anti-patternów, które złapiesz bez przeglądarki. Nic z tego nie dotyka modelu. To ten sam nudny, powtarzalny pomiar, który uważny inżynier zrobiłby ręcznie - tyle że odpala się w dwie minuty i daje tę samą odpowiedź dwa razy.

Na tym floorze siedzi pięciu AI specjalistów, jeden na obszar, każdy czyta zmierzony output i source, żeby wyjaśnić root cause i ustawić priorytety. Na moim portfolio nie mieli nic do powiedzenia, bo nie było co naprawiać. Na stronie faktycznie źle zbudowanej mają sporo - i o tym jest część 2.

Ta kolejność to teza, która ma nazwę, do której wracam: context before LLM. Najpierw zmierz. Pozwól modelowi interpretować to, co zmierzone. Nigdy nie pozwól mu zgadywać liczb. Ta sama dyscyplina trzymała się przez trzy produkcyjne audyty ecommerce, gdzie zła liczba nie kosztuje cię oceny - kosztuje klienta checkoutu.

## Jutro

Część 2: architektura w całości i rzecz, którą najbardziej musiałem udowodnić, zanim czemukolwiek z tego zaufałem - że zmierzony floor jest stabilny. Odpaliłem ten sam audyt dwa razy, mediana z pięciu za każdym, jeden po drugim. Identyczny score, identyczna ocena, identyczny pass - taka stabilność, jaką możesz postawić przed klientem. Pokażę liczby, pięciu specjalistów i czemu split nagłówek bije pojedynczą ocenę, jak już zobaczysz, co pojedyncza ocena ukrywa.

#FromTheField

---


## Context-First QA, Part 3: The Roadmap

**URL:** https://portfolio.sdet.it/from-the-field/context-first-qa-part-3
**Published:** 2026-05-21
**Language:** en
Tags: ai-qa, roadmap, consulting, mini-portal, commercial-gate

Two days ago: the thesis. Yesterday: the map. Today: the calendar. Nine weeks. Four code drops. Four pitch-mode. Plus how to bring me in if the pattern fits - and why June 15 is the forcing function.

Two days ago: the thesis. 1000 tasks, $700 vs $40, 70% never hits an LLM.

Yesterday: the map. Ten layers, A through J. Three-Layer Architecture as backbone.

Today: the calendar.

Nine weeks. Five standalone series episodes. Four code drops. Plus how to engage if you want this pattern in your stack - and why June 15 is your forcing function.

---

## The ten layers, one more time

Before we land in the calendar, the quick recap:

- **A** Input Layer (typed `QAContext` from messy sources)
- **B** Decision Layer (70% without LLM, the deterministic gate)
- **C** Output Layer (Atlassian ADF, the 303 redirect trick)
- **D** Orchestration (partial continuations, heartbeats)
- **E** HITL Safety (action queue state machine)
- **F** Vendor-Agnostic Infra (swap providers, not orchestrators)
- **G** Cost & Telemetry (per-task attribution)
- **H** Operational Discipline (five production gotchas)
- **I** Multi-Process Glue (one script, three backends)
- **J** Approval UX (HITL UI that doesn't feel like work)

Out of these ten, here's what publishes when.

## The 9-week sequence

| Week | # | Theme | Mode |
|---|---|---|---|
| May 26-28 | #04 | Performance audit 5-agent (WCAG ecosystem) | standalone, code |
| Jun 2-4 | #05 | Multi-page WCAG (V0.4 build story) | standalone, code |
| Jun 9-11 | #06 | CDAT pattern (Page Objects reinvented) | standalone, code |
| Jun 16-18 | #07 | Figma-to-code deterministic | standalone, code |
| Jun 23-25 | #08 | 6 portals agent-ready in 70 minutes | standalone, narrative |
| Jun 30 - Jul 2 | #09 | **Episode B "70% Without an LLM"** | mini-portal, code drop |
| Jul 7-9 | #10 | **Episode C "ADF Without Tears"** ⭐ | mini-portal, code drop |
| Jul 14-16 | #11 | **Episode A "Input Layer"** | mini-portal, code drop |
| Jul 21-23 | #12 | **Episode E "HITL Safety"** | mini-portal, code drop |

Five standalone series episodes first - they're already-shipped pieces of the broader ecosystem (WCAG, CDAT, Figma, agent-ready portals). Then four mini-portal code drops from the Context-First QA series itself: B, C, A, E. Each lands in a repo branch you can clone, read, and adapt.

The mini-portal repo: `darco81/context-first-qa-patterns`, AGPL-3.0. Main branch is an index; each episode branch contains the distilled code for that layer. After publication, branches merge to main with an aggregated README.

## The full loop

Here's the end-to-end production flow. One ticket in Jira, one comment out, full audit trail in between.

```mermaid
sequenceDiagram
    actor Dev as Engineer
    participant Jira
    participant ETL as Deterministic ETL
    participant AI as AI Judge
    participant ADF as ADF Publisher

    Dev->>Jira: ticket created
    Jira->>ETL: webhook trigger
    ETL->>ETL: parallel fetch (Jira+Figma+Playwright)
    ETL->>ETL: enrichment, typing, validation

    alt 70% case (deterministic verdict)
        ETL->>ADF: pass/fail report
        ADF->>Jira: comment with audit trail
    else 30% case (needs judgment)
        ETL->>AI: structured QAContext
        AI->>AI: bounded decision (small scope)
        AI->>ADF: validated output schema
        ADF->>Jira: comment with audit trail
    end
```

The map in Part 2 listed components. The loop shows how they integrate: each layer from Part 2 maps to a participant or a transition in this diagram.
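The `alt` branch in the diagram is the deterministic gate, and it can be sketched as plain routing logic. This is a minimal sketch under stated assumptions, not the production code: the `QAContext` fields, the check names, and the `route` helper are all hypothetical, and the real gate presumably weighs far more signals than a dictionary of booleans.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QAContext:
    """Typed context built by the deterministic ETL (fields are hypothetical)."""
    ticket_id: str
    # Each check yields True (pass), False (fail), or None (cannot decide).
    checks: dict[str, Optional[bool]] = field(default_factory=dict)


def deterministic_verdict(ctx: QAContext) -> Optional[str]:
    """Return "pass" or "fail" when the checks alone settle it, else None."""
    results = list(ctx.checks.values())
    if not results or any(r is None for r in results):
        return None
    return "pass" if all(results) else "fail"


def route(ctx: QAContext) -> tuple[str, Optional[str]]:
    """Pick the next participant in the loop for this ticket."""
    verdict = deterministic_verdict(ctx)
    if verdict is not None:
        return ("adf_publisher", verdict)  # deterministic branch: no LLM call
    return ("ai_judge", None)              # judgment branch: bounded LLM decision


# Hypothetical tickets: one settled by checks, one that needs judgment.
clear = route(QAContext("QA-1", {"status_200": True, "no_console_errors": True}))
unclear = route(QAContext("QA-2", {"status_200": True, "matches_design": None}))
```

The point of the shape is that the model is only reachable through the `None` path. A ticket that the checks can settle never produces an LLM call, which is what keeps the deterministic share of the traffic off the invoice.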

## What I'm NOT publishing - and why

Six of the ten layers (D, F, G, H, I, plus production-grade versions of A and E) stay in pitch-mode for now. They don't become public code drops in this window.

Why:
- **D Orchestration** is multi-tenant and tied to a specific dispatcher infrastructure. The architecture is teachable; the operational scaffolding is not.
- **F Vendor-Agnostic Infra** is mid-refactor as I write this. I'd rather publish it once it's done than publish a snapshot that breaks in a month.
- **G Cost & Telemetry** has compliance-adjacent observability concerns. A public distillate would require enough redaction to be misleading.
- **H Operational Discipline** is the "five gotchas in production" piece. Each gotcha is one paragraph; the pattern is teachable in a single article rather than a code drop.
- **I Multi-Process Glue** is the most production-environment-coupled. It assumes a specific shell setup, a specific CI pipeline, a specific dev workflow. Better as consulting work than a public repo.

This isn't withholding for its own sake. It's the three-tier model in action:

- **Tier 1 (public, AGPL):** the method. Architecture, decisions, working code on representative data. You clone, you see, you adapt.
- **Tier 2 (commercial):** production-ready implementation. Multi-tenant, compliance-aware, scaled.
- **Tier 3 (enterprise):** design system federation, cross-repo audit, full toolchain integration.

Public version gives you the method. Production version takes 2-4 weeks of work to implement against your stack. That's where I come in.

## June 15: the forcing function

If your team is already running Agent SDK pipelines for QA, performance audits, or accessibility checks, June 15, 2026 is the date in your calendar.

That's when Anthropic separates programmatic SDK usage from interactive Claude Code subscription windows, and bills SDK traffic at full API rates from a dedicated monthly credit. Today, agentic pipelines borrow against the same rate-limit budget your developers use to write code. After June 15, they don't - they have their own line item, billed per token.

Which means the cost math from Part 1 stops being theoretical. From mid-June, every naive "LLM everywhere" workflow you ship is an explicit invoice item. Every deterministic floor you add subtracts directly from that invoice.

If you've been pricing this work as "fits within my Max plan," the answer changes in four weeks. The deterministic-first pipelines I'm publishing over the next nine episodes were built before this announcement - but they're now the most direct way to keep your QA AI costs predictable through Q3.

Better to ship the floor in May than to discover the bill in July.

## How to engage

I work with 3-5 teams a year. Long engagements - not "AI tool for QA" consultations, not "let's prototype something," not "can you write some tests for us."

What I do:
- Build the deterministic floor under your existing QA, WCAG, or performance audit work.
- Wire LLM judgment into the right slot - bounded, testable, attributable.
- Set up the toolchain so the next person inheriting it can read it.

What I don't do:
- Hand you an "AI test writer." The premise of this series is that you don't want one.
- Promise specific accuracy numbers without measuring your codebase first.
- Take on work where the goal is volume over reproducibility.

If you're shopping for "AI test writer" - we won't be a fit. If you have an existing QA or audit pipeline and you want to add deterministic intelligence on top, DM is the right next step.

More on services: sdet.it/services (the portal is mid-launch, the DM works today).

## Next Tuesday

Series #03 wraps here. The mini-portal Context-First QA is now public infrastructure - bookmark the calendar, the episodes land on schedule.

Next up: Performance audit, 5 specialists in parallel dispatch. Same Lead orchestrator pattern as the WCAG toolkit (Series #01, May 5-7). Different domain. Different numbers - 7 hours with AI vs 16 billable vs a full week classic.

Map is upside-down. Loop is bounded. Audit trail is trustable.

That's the deal.

---


## QA z kontekstem na pierwszym miejscu, część 3: roadmapa

**URL:** https://portfolio.sdet.pl/from-the-field/context-first-qa-part-3
**Published:** 2026-05-21
**Language:** pl
Tags: ai-qa, roadmap, consulting, mini-portal, commercial-gate

Dwa dni temu: teza. Wczoraj: mapa. Dziś: kalendarz. Dziewięć tygodni. Cztery code dropy. Cztery pitch-mode. Plus jak mnie zaprosić jeśli wzorzec pasuje - i dlaczego 15 czerwca to forcing function.

Dwa dni temu: teza. 1000 zadań, $700 vs $40, 70% nigdy nie dociera do LLM.

Wczoraj: mapa. Dziesięć warstw, A do J. Three-Layer Architecture jako kręgosłup.

Dziś: kalendarz.

Dziewięć tygodni. Pięć odcinków standalone serii. Cztery code dropy. Plus jak mnie zaprosić jeśli chcesz ten wzorzec w swoim stacku - i dlaczego 15 czerwca to twój forcing function.

---

## Dziesięć warstw, raz jeszcze

Zanim wylądujemy w kalendarzu, szybka powtórka:

- **A** Input Layer (otypowany `QAContext` z bałaganiarskich źródeł)
- **B** Decision Layer (70% bez LLM, deterministyczna brama)
- **C** Output Layer (Atlassian ADF, trik z 303 redirect)
- **D** Orchestration (częściowe kontynuacje, heartbeats)
- **E** HITL Safety (action queue state machine)
- **F** Vendor-Agnostic Infra (zamień providera, nie orchestrator)
- **G** Cost & Telemetry (per-task attribution)
- **H** Operational Discipline (pięć produkcyjnych gotchas)
- **I** Multi-Process Glue (jeden skrypt, trzy backendy)
- **J** Approval UX (HITL UI które nie czuje się jak praca)

Z tych dziesięciu, oto co i kiedy publikuję.

## Sekwencja 9 tygodni

| Tydzień | # | Temat | Tryb |
|---|---|---|---|
| 26-28 maja | #04 | Performance audit 5-agent (WCAG ecosystem) | standalone, kod |
| 2-4 czerwca | #05 | Multi-page WCAG (build story V0.4) | standalone, kod |
| 9-11 czerwca | #06 | CDAT pattern (Page Objects od nowa) | standalone, kod |
| 16-18 czerwca | #07 | Figma-to-code deterministyczny | standalone, kod |
| 23-25 czerwca | #08 | 6 portali agent-ready w 70 minut | standalone, narracja |
| 30 czerwca - 2 lipca | #09 | **Odcinek B "70% bez LLM"** | mini-portal, code drop |
| 7-9 lipca | #10 | **Odcinek C "ADF bez łez"** ⭐ | mini-portal, code drop |
| 14-16 lipca | #11 | **Odcinek A "Input Layer"** | mini-portal, code drop |
| 21-23 lipca | #12 | **Odcinek E "HITL Safety"** | mini-portal, code drop |

Pięć odcinków standalone najpierw - to już-wysłane kawałki szerszego ekosystemu (WCAG, CDAT, Figma, agent-ready portale). Potem cztery code dropy mini-portalu z samej serii Context-First QA: B, C, A, E. Każdy ląduje w branchu repo który klonujesz, czytasz, adaptujesz.

Mini-portal repo: `darco81/context-first-qa-patterns`, AGPL-3.0. Main branch to index; każdy episode branch zawiera destylat kodu dla tej warstwy. Po publikacji branche mergują do main z zagregowanym README.

## Pełna pętla

Oto end-to-end produkcyjny flow. Jeden ticket w Jirze wchodzi, jeden komentarz wychodzi, pełen audit trail między.

```mermaid
sequenceDiagram
    actor Dev as Inżynier
    participant Jira
    participant ETL as Deterministyczny ETL
    participant AI as AI Judge
    participant ADF as ADF Publisher

    Dev->>Jira: ticket stworzony
    Jira->>ETL: webhook trigger
    ETL->>ETL: równoległe fetch (Jira+Figma+Playwright)
    ETL->>ETL: enrichment, typowanie, walidacja

    alt 70% case (deterministyczny werdykt)
        ETL->>ADF: raport pass/fail
        ADF->>Jira: komentarz z audit trail
    else 30% case (potrzeba osądu)
        ETL->>AI: ustrukturyzowany QAContext
        AI->>AI: ograniczona decyzja (mały scope)
        AI->>ADF: zwalidowana output schema
        ADF->>Jira: komentarz z audit trail
    end
```

Mapy były komponentami. Pętla to integracja. Każda warstwa z części 2 mapuje na uczestnika albo przejście w tym diagramie.

## Czego NIE publikuję - i dlaczego

Sześć z dziesięciu warstw (D, F, G, H, I, plus produkcyjne wersje A i E) zostaje w pitch-mode na razie. Nie staje się publicznymi code dropami w tym oknie.

Dlaczego:
- **D Orchestration** jest multi-tenant i przywiązane do specyficznej infrastruktury dispatchera. Architektura jest do nauczenia; operational scaffolding nie.
- **F Vendor-Agnostic Infra** jest mid-refactor jak piszę te słowa. Wolę opublikować jak skończone niż wypuścić snapshot który się rozjedzie za miesiąc.
- **G Cost & Telemetry** ma compliance-adjacent observability concerns. Publiczny destylat wymagałby redakcji do poziomu który wprowadza w błąd.
- **H Operational Discipline** to kawałek "pięć gotchas na produkcji". Każdy gotcha to jeden paragraf; wzorzec da się nauczyć w jednym artykule a nie w code dropie.
- **I Multi-Process Glue** jest najbardziej production-environment-coupled. Zakłada konkretny shell setup, konkretny CI, konkretny dev workflow. Lepsze jako praca konsultingowa niż publiczne repo.

To nie withholding dla samego withholdingu. To three-tier model w akcji:

- **Tier 1 (public, AGPL):** metoda. Architektura, decyzje, działający kod na reprezentatywnych danych. Klonujesz, widzisz, adaptujesz.
- **Tier 2 (commercial):** produkcyjna implementacja. Multi-tenant, compliance-aware, scaled.
- **Tier 3 (enterprise):** design system federation, cross-repo audit, pełna integracja toolchain.

Publiczna wersja daje ci metodę. Produkcyjna wersja zajmuje 2-4 tygodnie pracy żeby ją zaimplementować pod twój stack. Tu wchodzę ja.

## 15 czerwca: forcing function

Jeśli twój team już uruchamia Agent SDK pipelines dla QA, performance audits, accessibility checks - 15 czerwca 2026 to data w twoim kalendarzu.

To dzień kiedy Anthropic odseparowuje programmatic SDK usage od interaktywnych okien subskrypcji Claude Code, i billuje SDK traffic po pełnych stawkach API z dedykowanego miesięcznego kredytu. Dziś agentic pipelines pożyczają z tego samego rate-limit budgetu którego twoi devsi używają do pisania kodu. Po 15 czerwca - nie. Mają własną linijkę, billowaną per token.

Co oznacza że cost math z części 1 przestaje być teoretyczna. Od połowy czerwca, każdy naiwny "LLM wszędzie" workflow który wysyłasz to wprost linijka faktury. Każda deterministyczna podłoga którą dorzucasz odejmuje wprost z tej faktury.

Jeśli wyceniałeś tę pracę jako "mieści się w moim Max planie", odpowiedź zmieni się za cztery tygodnie. Deterministic-first pipelines które publikuję przez następne osiem odcinków były zbudowane przed tym ogłoszeniem - ale teraz są najbardziej bezpośrednim sposobem żeby trzymać koszty AI QA przewidywalne przez Q3.

Lepiej wysłać podłogę w maju niż odkryć rachunek w lipcu.

## Jak się dogadać

Pracuję z 3-5 zespołami rocznie. Długie zaangażowania - nie "konsultacja AI tool for QA", nie "sprototypujmy coś", nie "możesz napisać dla nas testy?".

Co robię:
- Buduję deterministyczną podłogę pod istniejące QA, WCAG, performance audit.
- Wpinam osąd LLM w właściwy slot - bounded, testable, attributable.
- Stawiam toolchain tak, żeby następna osoba która to dziedziczy mogła to przeczytać.

Czego nie robię:
- Nie daję ci "AI test writer". Cała seria zakłada że tego nie chcesz.
- Nie obiecuję konkretnych liczb dokładności bez przemierzenia twojego kodu.
- Nie biorę pracy gdzie celem jest wolumen ponad reprodukowalność.

Jeśli szukasz "AI test writer" - nie będziemy pasować. Jeśli masz istniejący QA albo audit pipeline i chcesz dodać deterministyczną inteligencję na wierzch, DM to właściwy następny krok.

Więcej o usługach: sdet.it/services (portal mid-launch, DM działa już dziś).

## Wtorek

Series #03 kończy się tutaj. Mini-portal Context-First QA jest teraz publiczną infrastrukturą - zakładkuj kalendarz, odcinki lądują zgodnie z planem.

Następnie: Performance audit, 5 specjalistów w parallel dispatch. Ten sam wzorzec Lead orchestrator co WCAG toolkit (Series #01, 5-7 maja). Inna domena. Inne liczby - 7 godzin z AI vs 16 billowalne vs tydzień klasycznie.

Mapa jest do góry nogami. Pętla jest ograniczona. Audit trail jest do zaufania.

I tyle.

---


## Context-First QA, Part 2: The 10 Maps

**URL:** https://portfolio.sdet.it/from-the-field/context-first-qa-part-2
**Published:** 2026-05-20
**Language:** en
Tags: ai-qa, architecture, three-layer, llm-engineering, jarvis

Yesterday I showed the math. Today I show the map. Ten architectural layers that need to be deterministic before AI can do anything useful.

Yesterday I showed the math. 1000 tasks, $700 vs $40, 70% never hits an LLM.

Today I show the map.

Ten architectural layers that need to be deterministic before AI can do anything useful. Four of them I'm publishing as code drops over the next 8 weeks. The other six stay in pitch-mode and I'll tell you why.

---

## Recap: the sandwich

If you missed Part 1: deterministic input, AI middle, deterministic output. The LLM lives in a small bounded slot in the middle. Code is the bun, model is the patty. Everything else is condiments.

But "deterministic in, AI middle, deterministic out" is the spine. Now we put meat on the bones.

## Three-Layer Architecture

Every piece of the pipeline follows the same internal pattern.

```mermaid
graph TB
    subgraph Raw["Layer 1: Raw (deterministic capture)"]
        J[Jira API raw]
        F[Figma MCP raw]
        P[Playwright snap raw]
    end

    subgraph Enriched["Layer 2: Enriched (deterministic ETL)"]
        TE[Token extraction]
        AE[ARIA + computed CSS]
        CE[Context bundle]
    end

    subgraph Builders["Layer 3: Builders (structured output)"]
        TC[Task context object]
        BP[Baseline + current pair]
        QC[QA prompt context]
    end

    Raw --> Enriched --> Builders --> AI[AI Decision]
    AI --> O[Validated output]
```

Three internal layers, repeated across every component:

1. **Raw** - the wild west. Jira's HTML, Figma's design-tokens JSON blob, Playwright's full DOM dump. Source-of-truth fetch, nothing more.
2. **Enriched** - deterministic ETL. Parse the ADF. Walk the Figma tree. Compute styles. Still no LLM, still no interpretation - just code that knows the shape of each source.
3. **Builders** - typed objects. Whatever passes from here forward has a schema, a contract, a test. The LLM, when it eventually shows up, sees only Builder output.

This is the same pattern that powers my WCAG toolkit (Series #01, May 5-7). Same pattern shows up at scale in multi-page audits (Series #05, coming June 2-4). Same pattern in Figma-to-code (Series #07, June 16-18). It's not novel. It's just disciplined.
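To make the three layers concrete, here is a minimal sketch for a single source: an ADF-shaped ticket description. It is an illustration under stated assumptions, not the production code. The names `AdfNode`, `TaskSummary`, `enrichDescription`, and `buildTaskSummary` are hypothetical, and only the basic ADF shape (a `doc` root with nested `content` and `text` nodes) is assumed.

```typescript
// Hypothetical sketch of Raw -> Enriched -> Builder for one source.
// All names here are illustrative, not the author's API.

// Layer 1: Raw - an ADF-shaped document exactly as fetched.
interface AdfNode {
  type: string;
  text?: string;
  content?: AdfNode[];
}

// Layer 2: Enriched - deterministic walk that collects text per top-level block.
function collectText(node: AdfNode): string {
  if (node.type === "text") return node.text ?? "";
  return (node.content ?? []).map(collectText).join("");
}

function enrichDescription(doc: AdfNode): string[] {
  return (doc.content ?? [])
    .map(collectText)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Layer 3: Builder - typed object with a contract; it throws instead of guessing.
interface TaskSummary {
  taskId: string;
  paragraphs: string[];
}

function buildTaskSummary(taskId: string, doc: AdfNode): TaskSummary {
  if (doc.type !== "doc") {
    throw new Error(`Expected ADF root of type "doc", got "${doc.type}"`);
  }
  const paragraphs = enrichDescription(doc);
  if (paragraphs.length === 0) {
    throw new Error(`Task ${taskId} has an empty description`);
  }
  return { taskId, paragraphs };
}

const rawDoc: AdfNode = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      content: [
        { type: "text", text: "Button is " },
        { type: "text", text: "misaligned" },
      ],
    },
    { type: "paragraph", content: [] }, // empty block, dropped during enrichment
  ],
};

const summary = buildTaskSummary("QA-1", rawDoc);
console.log(summary.paragraphs); // [ "Button is misaligned" ]
```

The point of the builder throwing on an unexpected root or an empty description is that a malformed source fails in code, before any model is involved.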

## The 10 maps

Ten themes. Each one is a layer that needs to be deterministic before the AI gets involved. Here's the full map.

**A: Input Layer** - capturing the wild west deterministically  →  Episode #11, July 14-16

Jira API + Figma MCP + Playwright snap. Three messy sources. One typed `QAContext` object. AI never reads raw HTML.

**B: Decision Layer** - 70% without an LLM  →  Episode #09, June 30 - July 2 ⭐

The decision gate from Part 1. CSS diff + pixel diff + a11y regression check = deterministic verdict. 70% of tasks exit here.

**C: Output Layer** - Atlassian ADF without tears  →  Episode #10, July 7-9 ⭐ TOP FLAGSHIP

Jira ADF inline images via a 303 redirect. The trick that solves a famous Jira pain. Plus markdown-to-ADF bridges and multi-stage uploads.

**D: Orchestration** - many agents, one story  →  August/September

Partial continuations, heartbeats, retry semantics. Multi-agent dispatch where every agent knows its scope and its boundary.

**E: HITL Safety** - every WRITE asks permission  →  Episode #12, July 21-23 ⭐

Action queue state machine. Atomic approve-and-execute with partial-failure tracking. The dedup cautionary tale where my UNIQUE constraint was wrong.

**F: Vendor-Agnostic Infra** - swap provider, not orchestrator  →  September

Extracting vendor-agnostic QA infra from a Claude-bound dispatcher. The pattern that makes a multi-LLM future cheap.

**G: Cost & Telemetry** - cents, not dollars  →  Q4 2026

Per-task cost attribution. Token telemetry per worker. When to cache, when to batch, when to fail fast.

**H: Operational Discipline** - calendars, dedup, UNIQUE was wrong  →  Q4 2026

Five gotchas I hit in production. UNIQUE on `(task, run)` seemed obvious - until partial retries broke it.

**I: Multi-Process Glue** - terminals, tmux, CI  →  Q4 2026

One script, three backends. Same shell across Claude Code, OpenCode subprocess, direct API.

**J: Approval UX** - HITL UI that doesn't feel like work  →  Episode #13, August 4-6

Drag-drop approval queues. Countdowns. Editable QA reports. UX I built so I'd actually use my own agent.
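To make one of these concrete: the action queue from layer E can be sketched as a small state machine. This is a hedged illustration only. The state names, the `QueuedAction` shape, and `approveAndExecute` are assumptions, and "atomic" here just means one synchronous call; a real implementation would presumably lean on a database transaction.

```typescript
// Hypothetical action-queue state machine; states and shapes are assumptions.
type ActionState =
  | "pending"
  | "approved"
  | "executing"
  | "done"
  | "partial"
  | "failed"
  | "rejected";

// Allowed transitions. Anything not listed is an error.
const transitions: { [S in ActionState]: ActionState[] } = {
  pending: ["approved", "rejected"],
  approved: ["executing"],
  executing: ["done", "partial", "failed"],
  done: [],
  partial: [],
  failed: [],
  rejected: [],
};

interface QueuedAction {
  id: string;
  state: ActionState;
  steps: Array<() => boolean>; // each WRITE step reports success or failure
  completedSteps: number;      // partial-failure tracking
}

function transition(action: QueuedAction, next: ActionState): void {
  if (transitions[action.state].indexOf(next) === -1) {
    throw new Error(`Illegal transition ${action.state} -> ${next}`);
  }
  action.state = next;
}

// Approve and execute in one call, so an approved action never sits unexecuted.
function approveAndExecute(action: QueuedAction): QueuedAction {
  transition(action, "approved");
  transition(action, "executing");
  for (const step of action.steps) {
    if (!step()) break; // stop at the first failing WRITE
    action.completedSteps += 1;
  }
  const total = action.steps.length;
  const outcome: ActionState =
    action.completedSteps === total
      ? "done"
      : action.completedSteps === 0
        ? "failed"
        : "partial";
  transition(action, outcome);
  return action;
}

const queued: QueuedAction = {
  id: "comment-QA-1",
  state: "pending",
  steps: [() => true, () => false, () => true],
  completedSteps: 0,
};

approveAndExecute(queued);
console.log(queued.state, queued.completedSteps); // partial 1
```

Because terminal states have no outgoing transitions, approving the same action twice throws instead of repeating the WRITE.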

## Why ten

Why not three? Why not twenty?

Because each of these layers is **independently swappable**, **deterministic** (or has bounded LLM scope, no creep), and has its own **test surface**. Deterministic layers get unit tests. LLM layers get contract tests against schemas. Both kinds of layer can fail loudly, and both can be debugged in isolation.

The anti-pattern this avoids: the "agent does everything" trap. When one component has no boundary, it has no testable failure mode. When ten components each own one thing, you can fix what's broken without rewriting what works.
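A contract test for an LLM layer can be as small as a type guard plus a parser that refuses anything off-schema. The sketch below assumes a hypothetical `JudgeOutput` shape (`verdict`, `reason`, `confidence`); the real schema is not shown in this article.

```typescript
// Hypothetical contract for the LLM slot's output; field names are assumptions.
interface JudgeOutput {
  verdict: "PASS" | "FAIL";
  reason: string;
  confidence: number; // expected range 0..1
}

// Type guard: the only way model output enters the typed part of the pipeline.
function isJudgeOutput(value: unknown): value is JudgeOutput {
  if (typeof value !== "object" || value === null) return false;
  const { verdict, reason, confidence } = value as {
    verdict?: unknown;
    reason?: unknown;
    confidence?: unknown;
  };
  return (
    (verdict === "PASS" || verdict === "FAIL") &&
    typeof reason === "string" &&
    reason.length > 0 &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1
  );
}

// Parse raw model text; fail loudly instead of passing malformed output on.
function parseJudgeOutput(rawText: string): JudgeOutput {
  const parsed: unknown = JSON.parse(rawText);
  if (!isJudgeOutput(parsed)) {
    throw new Error("LLM output violates the JudgeOutput contract");
  }
  return parsed;
}

const good = parseJudgeOutput(
  '{"verdict":"FAIL","reason":"contrast below AA","confidence":0.9}'
);
console.log(good.verdict, good.reason);
```

A contract test then feeds recorded model responses, good and bad, through `parseJudgeOutput` and asserts which ones are accepted. No live model call is needed to run it.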

## The input layer up close

Layer A is the most interesting to start with. It's where everyone makes the first mistake.

The mistake: feeding raw Jira HTML to an LLM. The model reads the markup, hallucinates the field structure, misses an attachment, and produces a verdict based on half the ticket. You wouldn't know until production.

The fix: a single-call task aggregator that returns a typed `QAContext` object. Here's the shape.

```typescript
// Single-call task aggregator - one webhook, full context
// This is what AI eventually sees. Pre-validated, structured, typed.

async function assembleContext(taskId: string): Promise<QAContext> {
  // Parallel deterministic fetches
  const [jiraRaw, figmaRaw, snapRaw] = await Promise.all([
    fetchJiraTask(taskId),         // n8n webhook, full task + comments + attachments
    fetchFigmaBaseline(taskId),    // MCP call, design tokens + components
    capturePlaywrightSnap(taskId), // headless browser, aria + computed CSS
  ]);

  // Layer 2: enrichment (still deterministic - no LLM)
  const enriched = {
    jira: parseJiraADF(jiraRaw),
    figma: extractTokens(figmaRaw),
    snap: { aria: snapRaw.tree, css: snapRaw.computedStyles },
  };

  // Layer 3: builder pattern - typed output AI can trust
  return {
    taskMeta: { id: taskId, type: enriched.jira.type, severity: enriched.jira.severity },
    baseline: enriched.figma,
    current: enriched.snap,
    diff: computeStructuralDiff(enriched.figma, enriched.snap),
  };
}
```

Three parallel fetches. Each returns raw data. Each gets enriched separately. Then a builder produces a typed `QAContext` - that's what the LLM eventually sees.

Token-wise: the naive version sends ~8K tokens to the model (raw Jira HTML alone is huge). The structured version sends ~2K. That's 4x fewer input tokens, so roughly 4x cheaper on the input side, before any decision-gate routing.

And the AI **literally cannot** read raw HTML in this architecture. It only sees `QAContext`. If the parser breaks, the build breaks. If the model misinterprets a typed field, that's a model failure with a logged input, not a parsing hallucination.

## What's next

Tomorrow: the calendar.

Four of these ten layers (B, C, A, E) publish as full code drops over the next 8 weeks. The other six stay in pitch-mode for now - they're production-specific, multi-tenant, compliance-sensitive, or simply not the right scope for a public distillation.

I'll show you exactly which week each one lands, what the repo branch looks like, and how to engage if you want this pattern in your stack.

Map is upside-down for most teams. Let me show you mine.

---


## QA z kontekstem na pierwszym miejscu, część 2: 10 map

**URL:** https://portfolio.sdet.pl/from-the-field/context-first-qa-part-2
**Published:** 2026-05-20
**Language:** pl
Tags: ai-qa, architecture, three-layer, llm-engineering, jarvis

Wczoraj pokazałem matematykę. Dziś pokazuję mapę. Dziesięć warstw architektonicznych które muszą być deterministyczne zanim AI zrobi cokolwiek użytecznego.

Wczoraj pokazałem matematykę. 1000 zadań, $700 vs $40, 70% nigdy nie dociera do LLM.

Dziś pokazuję mapę.

Dziesięć warstw architektonicznych które muszą być deterministyczne zanim AI zrobi cokolwiek użytecznego. Cztery z nich publikuję jako code dropy przez następne 8 tygodni. Pozostałe sześć zostaje w pitch-mode i powiem dlaczego.

---

## Powtórka kanapki

Jeśli przegapiłeś część 1: deterministyczny input, AI w środku, deterministyczny output. LLM siedzi w małym ograniczonym slocie po środku. Kod to bułka, model to kotlet. Reszta to dodatki.

Ale "deterministyczne wejście, AI w środku, deterministyczne wyjście" to kręgosłup. Teraz zakładamy mięso na kości.

## Three-Layer Architecture

Każdy kawałek pipeline'u trzyma się tego samego wewnętrznego wzorca.

```mermaid
graph TB
    subgraph Raw["Warstwa 1: Raw (deterministyczne pobranie)"]
        J[Jira API raw]
        F[Figma MCP raw]
        P[Playwright snap raw]
    end

    subgraph Enriched["Warstwa 2: Enriched (deterministyczny ETL)"]
        TE[Ekstrakcja tokenów]
        AE[ARIA + computed CSS]
        CE[Context bundle]
    end

    subgraph Builders["Warstwa 3: Builders (ustrukturyzowany output)"]
        TC[Task context object]
        BP[Baseline + current pair]
        QC[QA prompt context]
    end

    Raw --> Enriched --> Builders --> AI[Decyzja AI]
    AI --> O[Zwalidowany output]
```

Trzy wewnętrzne warstwy, powtarzane w każdym komponencie:

1. **Raw** - dziki zachód. HTML z Jiry, blob JSON design-tokens z Figmy, pełny DOM dump z Playwrighta. Pobranie source-of-truth, nic więcej.
2. **Enriched** - deterministyczny ETL. Parse ADF. Walk po drzewie Figmy. Compute styles. Nadal bez LLM, nadal bez interpretacji - po prostu kod który zna kształt każdego źródła.
3. **Builders** - otypowane obiekty. Cokolwiek przechodzi dalej ma schemę, kontrakt, test. LLM, kiedy w końcu się pojawia, widzi tylko output Builderów.

To ten sam wzorzec który napędza mój WCAG toolkit (Series #01, 5-7 maja). Ten sam wzorzec pojawia się przy skali w multi-page audits (Series #05, 2-4 czerwca). Ten sam wzorzec w Figma-to-code (#07, 16-18 czerwca). To nie nowość. To po prostu dyscyplina.

## 10 map

Dziesięć tematów. Każdy to warstwa która musi być deterministyczna zanim AI się włączy. Oto pełna mapa.

**A: Input Layer** - łapanie dzikiego zachodu deterministycznie  →  Odcinek #11, 14-16 lipca

Jira API + Figma MCP + Playwright snap. Trzy bałaganiarskie źródła. Jeden otypowany obiekt `QAContext`. AI nigdy nie czyta surowego HTML.

**B: Decision Layer** - 70% bez LLM  →  Odcinek #09, 30 czerwca - 2 lipca ⭐

Brama decyzyjna z części 1. CSS diff + pixel diff + a11y regression check = deterministyczny werdykt. 70% zadań wychodzi tutaj.

**C: Output Layer** - Atlassian ADF bez łez  →  Odcinek #10, 7-9 lipca ⭐ TOP FLAGSHIP

Inline obrazki w Jira ADF przez 303 redirect. Trik który rozwiązuje słynny ból Jiry. Plus mostki markdown-do-ADF i multi-stage uploads.

**D: Orchestration** - wielu agentów, jedna historia  →  Sierpień/wrzesień

Częściowe kontynuacje, heartbeats, retry semantics. Multi-agent dispatch gdzie każdy agent zna swój scope i swoją granicę.

**E: HITL Safety** - każdy WRITE prosi o zgodę  →  Odcinek #12, 21-23 lipca ⭐

Action queue state machine. Atomic approve-and-execute z trackingiem częściowych failures. Cautionary tale o dedup gdzie mój UNIQUE constraint był zły.

**F: Vendor-Agnostic Infra** - zmień providera, nie orchestrator  →  Wrzesień

Wyciąganie vendor-agnostic QA infra z dispatchera związanego z Claude. Wzorzec który robi multi-LLM future tanim.

**G: Cost & Telemetry** - grosze, nie dolary  →  Q4 2026

Per-task cost attribution. Token telemetry per worker. Kiedy cache'ować, kiedy batch'ować, kiedy fail fast.

**H: Operational Discipline** - kalendarze, dedup, UNIQUE był zły  →  Q4 2026

Pięć gotchas które złapałem na produkcji. UNIQUE na `(task, run)` wydawało się oczywiste - dopóki częściowe retry tego nie zepsuły.

**I: Multi-Process Glue** - terminale, tmux, CI  →  Q4 2026

Jeden skrypt, trzy backendy. Ten sam shell przez Claude Code, OpenCode subprocess, direct API.

**J: Approval UX** - HITL UI które nie czuje się jak praca  →  Odcinek #13, 4-6 sierpnia

Drag-drop approval queues. Countdowns. Edytowalne raporty QA. UX który zbudowałem żebym sam chciał używać własnego agenta.

## Dlaczego dziesięć

Dlaczego nie trzy? Dlaczego nie dwadzieścia?

Bo każda z tych warstw jest **niezależnie wymienna**, **deterministyczna** (lub ma ograniczony scope LLM, bez scope creep), i ma własny **test surface**. Deterministyczne warstwy dostają unit testy. Warstwy LLM dostają contract testy przeciwko schemom. Obie warstwy mogą fail loudly i obie da się debugować w izolacji.

Anti-pattern, którego to unika: pułapka "agent robi wszystko". Kiedy jeden komponent nie ma granicy, nie ma testowalnego failure mode. Kiedy dziesięć komponentów posiada po jednej rzeczy, możesz naprawić co jest zepsute bez przepisywania tego co działa.

## Input layer z bliska

Warstwa A jest najciekawsza do zaczęcia. Tu wszyscy popełniają pierwszy błąd.

Błąd: skarmianie surowego HTML z Jiry do LLM. Model czyta markup, halucynuje strukturę pól, gubi attachment, produkuje werdykt na bazie połowy ticketu. Nie dowiesz się dopóki produkcja się nie wywali.

Fix: single-call task aggregator który zwraca otypowany obiekt `QAContext`. Oto kształt.

```typescript
// Single-call task aggregator - jeden webhook, pełen kontekst
// To jest to co AI w końcu widzi. Pre-walidowane, ustrukturyzowane, otypowane.

async function assembleContext(taskId: string): Promise<QAContext> {
  // Równoległe deterministyczne pobrania
  const [jiraRaw, figmaRaw, snapRaw] = await Promise.all([
    fetchJiraTask(taskId),         // n8n webhook, full task + komentarze + attachments
    fetchFigmaBaseline(taskId),    // MCP call, design tokens + komponenty
    capturePlaywrightSnap(taskId), // headless browser, aria + computed CSS
  ]);

  // Warstwa 2: enrichment (nadal deterministycznie - bez LLM)
  const enriched = {
    jira: parseJiraADF(jiraRaw),
    figma: extractTokens(figmaRaw),
    snap: { aria: snapRaw.tree, css: snapRaw.computedStyles },
  };

  // Warstwa 3: builder pattern - otypowany output któremu AI może zaufać
  return {
    taskMeta: { id: taskId, type: enriched.jira.type, severity: enriched.jira.severity },
    baseline: enriched.figma,
    current: enriched.snap,
    diff: computeStructuralDiff(enriched.figma, enriched.snap),
  };
}
```

Trzy równoległe pobrania. Każde zwraca surowe dane. Każde wzbogacone osobno. Potem builder produkuje otypowany `QAContext` - to widzi LLM.

Token-wise: naiwna wersja wysyła ~8K tokenów do modelu (samo surowe HTML z Jiry jest ogromne). Ustrukturyzowana wersja wysyła ~2K. To 4x taniej tylko z input layer, przed jakimkolwiek routingiem przez decision-gate.

I AI **literalnie nie może** czytać surowego HTML w tej architekturze. Widzi tylko `QAContext`. Jeśli parser się sypie, build się sypie. Jeśli model źle interpretuje otypowane pole, to jest model failure z zalogowanym inputem, nie halucynacja parsera.

## Co dalej

Jutro: kalendarz.

Cztery z tych dziesięciu warstw (B, C, A, E) publikują się jako pełne code dropy przez następne 8 tygodni. Pozostałe sześć zostaje w pitch-mode na razie - są produkcyjno-specyficzne, multi-tenant, compliance-sensitive, albo po prostu nie są właściwym scope na publiczny destylat.

Pokażę dokładnie który tydzień każda ląduje, jak wygląda branch w repo, i jak się ze mną dogadać jeśli chcesz ten wzorzec w swoim stacku.

Mapa jest do góry nogami dla większości teamów. Pokażę ci moją.

---


## Context-First QA, Part 1: The Thesis

**URL:** https://portfolio.sdet.it/from-the-field/context-first-qa-part-1
**Published:** 2026-05-19
**Language:** en
Tags: ai-qa, manual-testing, architecture, llm-engineering, cost-engineering

I never let AI write my tests. I let it make decisions inside a deterministic harness. Here's the math, and why June 15 makes it visible.

1000 QA tasks. Naive approach: every one through an LLM. Cost: a small mortgage.

My approach: most tickets never see an LLM. Cost: cents.

Same accuracy. Same audit trail. Better sleep.

And in four weeks, the bill stops hiding.

---

## The wrong default

Every AI-in-QA post I scroll past says some version of the same thing. **Let AI write your tests.** Let AI generate scenarios. Let AI explore your app. Let AI judge whether things look right.

It demos beautifully. It crashes on production.

The pattern: an agent reads raw Jira HTML, parses it, guesses element selectors, hallucinates the user's intent, writes a test, runs it. Half the time the test passes for the wrong reason. The other half it fails because Playwright timed out on a popup the LLM didn't know existed.

I built this stack three months ago. It worked on demos. It failed on real tickets.

Then I inverted the architecture. The map was upside-down.

## Context before LLM

AI is the **second** step, not the first.

The first step is a deterministic engineering pipeline that hands AI clean, structured, verified context. The LLM never sees raw Jira. It never parses HTML. It never guesses CSS selectors. By the time the model gets a token, every input has been fetched, normalized, typed, and validated.

Think of it this way: AI as a senior developer doing code review on a well-prepared PR vs. AI as an intern told to "figure out the Jira." Same model. Opposite results.

The principle: **deterministic harness around the LLM, not the other way around.** Code that yields the same output for the same input. Always. No probability. No vibes. The LLM is small, contained, and verifiable - one component in a system, not the system itself.

It's a sandwich.

## The sandwich and the math

Three layers:

1. **Deterministic input** - parallel fetches from Jira, Figma, Playwright. Parsing, enrichment, typing. All pure code.
2. **AI decision** - small, bounded scope. Model sees structured JSON, returns one of N validated verdicts.
3. **Deterministic output** - typed object goes to a publisher that writes to Jira, attaches images, files an audit record.
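The three layers can be wired together in a few lines. This is a minimal sketch, assuming hypothetical names (`PreparedContext`, `runSandwich`, `Judge`) and a single deterministic check; the model is injected as a plain function so a stub can stand in for it.

```typescript
// Hypothetical wiring of the three layers; all names are illustrative.
type FinalVerdict = "PASS" | "FAIL" | "NEEDS_HUMAN";

interface PreparedContext {
  taskId: string;
  cssChanges: number; // count of computed-style differences vs. baseline
}

interface PublishedResult {
  taskId: string;
  verdict: FinalVerdict;
  usedLlm: boolean;
}

// Deterministic input side: null means "no clear-cut answer".
function deterministicVerdict(ctx: PreparedContext): FinalVerdict | null {
  return ctx.cssChanges === 0 ? "PASS" : null;
}

// AI middle: injected, so tests can swap in a stub instead of a real model.
type Judge = (ctx: PreparedContext) => string;

// Deterministic output side: only validated verdicts reach the publisher.
function runSandwich(ctx: PreparedContext, judge: Judge): PublishedResult {
  const early = deterministicVerdict(ctx);
  if (early !== null) {
    return { taskId: ctx.taskId, verdict: early, usedLlm: false };
  }
  const answer = judge(ctx);
  const verdict: FinalVerdict =
    answer === "PASS" || answer === "FAIL" ? answer : "NEEDS_HUMAN";
  return { taskId: ctx.taskId, verdict, usedLlm: true };
}

// A stub judge that answers off-schema, to show the fallback.
const stubJudge: Judge = () => "looks fine to me";

const clean = runSandwich({ taskId: "QA-1", cssChanges: 0 }, stubJudge);
const ambiguous = runSandwich({ taskId: "QA-2", cssChanges: 3 }, stubJudge);
console.log(clean.verdict, clean.usedLlm);         // PASS false
console.log(ambiguous.verdict, ambiguous.usedLlm); // NEEDS_HUMAN true
```

The `NEEDS_HUMAN` fallback is an assumption of this sketch: an answer outside the allowed set is never published as a verdict.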

The LLM sees maybe 5% of the pipeline's scope. The other 95% is code with tests.

Here's what that means at scale. Take a real e-commerce QA workload: 1000 tickets a month.

```
Scenario: 1000 e-commerce QA tasks/month

Naive "LLM everywhere":
- Avg 8K tokens input + 2K output per task
- $0.03/1K input + $0.15/1K output (Claude Sonnet)
- Per task: $0.24 + $0.30 = $0.54
- Monthly: $540 + retries + failures ~ $700-900

Deterministic-first (mine):
- ~70% tasks: zero LLM (static checks, pattern matching, CSS diff)
- ~30% tasks: LLM judgment only on prepared JSON context
- Avg 2K input + 500 output per LLM call
- Per task (LLM ones): $0.06 + $0.075 = $0.135
- Monthly: $0.135 x 300 = $40.50 (~$40)

Same accuracy. ~18x cheaper. Audit trail intact.
```

Take the numbers. Recalculate for your scenario. The math doesn't lie.

The 70/30 split is a representative baseline, not a hardcoded SLA. In real production runs it moves with the workload - sometimes 60%, sometimes 80% exits without the LLM. Cosmetic regressions skew it toward more deterministic exits; ambiguous UX judgement skews it toward more LLM calls. Either way: the deterministic floor catches first, the LLM only sees what's left.
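For recalculating, here is a small sketch that reproduces the scenario's arithmetic. The `CostInputs` shape and `monthlyCost` are hypothetical helpers; the rates and token counts are the ones from the scenario above, and retries and failures are deliberately left out.

```typescript
// Recomputes the scenario above; plug in your own volumes and rates.
interface CostInputs {
  tasksPerMonth: number;
  llmShare: number;        // fraction of tasks that reach the LLM, 0..1
  inputTokensK: number;    // thousands of input tokens per LLM call
  outputTokensK: number;   // thousands of output tokens per LLM call
  inputRatePer1K: number;  // USD per 1K input tokens
  outputRatePer1K: number; // USD per 1K output tokens
}

function monthlyCost(c: CostInputs): number {
  const perCall =
    c.inputTokensK * c.inputRatePer1K + c.outputTokensK * c.outputRatePer1K;
  return c.tasksPerMonth * c.llmShare * perCall;
}

const naiveCost = monthlyCost({
  tasksPerMonth: 1000,
  llmShare: 1,
  inputTokensK: 8,
  outputTokensK: 2,
  inputRatePer1K: 0.03,
  outputRatePer1K: 0.15,
});

const deterministicFirstCost = monthlyCost({
  tasksPerMonth: 1000,
  llmShare: 0.3,
  inputTokensK: 2,
  outputTokensK: 0.5,
  inputRatePer1K: 0.03,
  outputRatePer1K: 0.15,
});

console.log(naiveCost.toFixed(2), deterministicFirstCost.toFixed(2)); // 540.00 40.50
```

Before retries that is $540 vs $40.50, about 13x. The ~18x figure appears to correspond to the $700 lower bound that includes retries and failures. Moving `llmShare` between 0.2 and 0.4 shows how the 60-80% range shifts the bill.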

## Two notes if you're on a Claude Max subscription

Right now, May 2026, Agent SDK calls and `claude -p` invocations share your Claude Code subscription window. So naive "LLM everywhere" doesn't show up as a line item on your invoice - it shows up as your 5-hour window evaporating before lunch. The dollar math hides, but your daily productive hours don't.

**That changes on June 15, 2026.**

Anthropic is splitting programmatic usage off into its own monthly credit, billed at full API rates. Claude Agent SDK, `claude -p`, Claude Code GitHub Actions, and third-party tools built on the Agent SDK all move to a separate budget at standard API pricing. Your interactive Claude Code stays on subscription. Your agentic pipelines move to pay-per-token.

So the cost table above stops being theoretical for anyone running Agent SDK-driven QA workflows. From June 15 forward, naive "LLM everywhere" is an explicit line item on your bill - and "deterministic floor first" stops being a nice-to-have. It becomes a forcing function.

Better to design for the deterministic floor now than to discover the bill in July.

## The decision gate

Most QA decisions are decidable with deterministic checks - CSS diff against baseline, pixel diff with tolerance, accessibility regression detection. No interpretation needed. Pass or fail, with the reason.

Here's the shape of the gate:

```typescript
// Deterministic decision gate - most tasks exit here
// AI sees nothing. No tokens spent.

function decideQAPath(task: QATask): QAVerdict {
  // Layer 1: static deterministic checks
  if (task.cssBaseline && cssDiff(task.current, task.cssBaseline) === 0) {
    return { verdict: "PASS", reason: "css-identical", llmNeeded: false };
  }

  if (task.visualBaseline && pixelDiff(task.snap, task.visualBaseline) < 0.001) {
    return { verdict: "PASS", reason: "visual-identical", llmNeeded: false };
  }

  // Layer 2: structural checks (still deterministic)
  if (hasA11yRegression(task.aria, task.baselineAria)) {
    return { verdict: "FAIL", reason: "a11y-regression", llmNeeded: false };
  }

  // Only here LLM enters - structured JSON context already prepared
  return { verdict: "AMBIGUOUS", reason: "needs-judgment", llmNeeded: true };
}
```

Three deterministic checks. Three early exits. Token cost: zero. If none of the three fires, the task falls through to the AMBIGUOUS verdict and the LLM gets called - but only on tasks that genuinely need judgment.

The majority of tickets exit before any model wakes up. The minority hit the LLM with a typed `QATask` object that the pipeline has already validated. Tokens go toward judgment, not parsing.
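The gate calls `cssDiff` and `pixelDiff` without showing them. Below is one way they could look, as a sketch under assumptions: computed styles are taken to be flat property-to-value maps, images are same-size RGBA byte buffers, and the optional `tolerance` parameter is an addition of this sketch. The production helpers may differ.

```typescript
// Hypothetical implementations of the two diff helpers the gate calls.
type StyleMap = { [property: string]: string };

// Number of properties whose value differs or that exist on only one side.
function cssDiff(current: StyleMap, baseline: StyleMap): number {
  const keys = new Set([...Object.keys(current), ...Object.keys(baseline)]);
  let changed = 0;
  keys.forEach((key) => {
    if (current[key] !== baseline[key]) changed += 1;
  });
  return changed;
}

// Fraction of pixels that differ by more than `tolerance` on any RGBA channel.
function pixelDiff(snap: Uint8Array, baseline: Uint8Array, tolerance = 0): number {
  if (snap.length !== baseline.length || snap.length % 4 !== 0) {
    throw new Error("Buffers must be RGBA and the same size");
  }
  const pixels = snap.length / 4;
  if (pixels === 0) return 0;
  let different = 0;
  for (let i = 0; i < snap.length; i += 4) {
    for (let c = 0; c < 4; c += 1) {
      if (Math.abs(snap[i + c] - baseline[i + c]) > tolerance) {
        different += 1;
        break; // count each pixel at most once
      }
    }
  }
  return different / pixels;
}

const changedProps = cssDiff(
  { color: "red", margin: "0px" },
  { color: "red", margin: "4px" }
);

// Two pixels; the second differs by 5 on the red channel.
const snapPixels = new Uint8Array([0, 0, 0, 255, 255, 255, 255, 255]);
const basePixels = new Uint8Array([0, 0, 0, 255, 250, 255, 255, 255]);

const strictRatio = pixelDiff(snapPixels, basePixels);
const tolerantRatio = pixelDiff(snapPixels, basePixels, 8);
console.log(changedProps, strictRatio, tolerantRatio); // 1 0.5 0
```

Both functions are pure: same inputs, same number, every time. That is what lets the gate's thresholds (`=== 0`, `< 0.001`) be unit-tested.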

## Why this matters at scale

The headline number from my own production: **94 QA tickets processed in 2.5 days** on a single project. Classic manual approach for the same scope: a full week of team work, easily two. With this stack: one engineer, half the week, full audit trail.

The other things that hold up at scale:

- Deterministic exits keep the cost curve flat. You don't pay tokens for clear-cut diffs.
- Audit trail intact - every verdict has a reason code, every LLM call has its input bundle attached. You can replay any decision tomorrow.
- Failure modes are bounded. A flaky test still fails - but it fails the same way every time, with the same logs, and you fix it once.
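The replay claim can be illustrated with a few lines. This is a sketch with hypothetical names (`AuditRecord`, `recordDecision`, `replayMatches`), not the actual audit schema: a record stores the exact input next to the verdict and reason code, and replay re-runs the deterministic decision on that stored input.

```typescript
// Hypothetical audit record; field names are assumptions.
interface AuditRecord<I> {
  taskId: string;
  input: I;        // the exact bundle the decision was made on
  verdict: string;
  reason: string;  // reason code, e.g. "css-identical"
}

type Decide<I> = (input: I) => { verdict: string; reason: string };

function recordDecision<I>(
  taskId: string,
  input: I,
  decide: Decide<I>
): AuditRecord<I> {
  const { verdict, reason } = decide(input);
  return { taskId, input, verdict, reason };
}

// Replay: re-run the decision on the stored input and compare the outcome.
function replayMatches<I>(rec: AuditRecord<I>, decide: Decide<I>): boolean {
  const again = decide(rec.input);
  return again.verdict === rec.verdict && again.reason === rec.reason;
}

// A toy gate over the number of CSS changes.
const toyGate: Decide<number> = (cssChanges) =>
  cssChanges === 0
    ? { verdict: "PASS", reason: "css-identical" }
    : { verdict: "AMBIGUOUS", reason: "needs-judgment" };

const auditEntry = recordDecision("QA-1", 0, toyGate);
console.log(auditEntry.reason, replayMatches(auditEntry, toyGate)); // css-identical true
```

For deterministic exits a replay should always match. For LLM calls the stored input bundle does not guarantee the same answer, but it does let you resubmit exactly what the model saw.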

The deeper reason isn't cost. It's blast radius. When AI runs at the I/O layer, every Playwright flake compounds with every LLM hallucination. Failures stack. You can't debug them, you can't reproduce them, you can't trust them.

When AI runs at the judgment layer only, with a deterministic floor underneath and a deterministic publisher above, failures are bounded. The line I keep coming back to: **if your AI agent is doing more than judging, you have a deterministic floor problem.**

## What's next

This is just the spine. The actual production system has ten specific architecture layers, each with its own pattern, its own gotchas, its own reason to be deterministic before the LLM enters.

Tomorrow I publish the full map. Ten layers, A through J. What goes in each. Which four I'm releasing as code drops over the next 8 weeks. Which six stay in pitch-mode for now and why.

If you've ever wondered why your "AI QA" demo worked but production crashed - tomorrow's piece names every gap I had to fix.

It's not magic. It's a sandwich.

---


## QA z kontekstem na pierwszym miejscu, część 1: teza

**URL:** https://portfolio.sdet.pl/from-the-field/context-first-qa-part-1
**Published:** 2026-05-19
**Language:** pl
Tags: ai-qa, manual-testing, architecture, llm-engineering, cost-engineering

Nigdy nie pozwalam AI pisać moich testów. Pozwalam mu podejmować decyzje wewnątrz deterministycznego harnessu. Oto matematyka - i dlaczego 15 czerwca robi rachunek widocznym.

1000 zadań QA. Naiwne podejście: każde przez LLM. Koszt: drobny kredyt hipoteczny.

Moje podejście: większość ticketów nigdy nie widzi LLM. Koszt: grosze.

Ta sama dokładność. Ten sam audit trail. Lepszy sen.

A za cztery tygodnie rachunek przestaje się chować.

---

## Zły domyślny ruch

Każdy post o AI w QA który przewija mi się na feedzie mówi tę samą rzecz w różnych wariantach. **Niech AI napisze ci testy.** Niech AI wygeneruje scenariusze. Niech AI sprawdzi czy aplikacja wygląda OK.

Na demo wygląda pięknie. Na produkcji się sypie.

Wzorzec: agent czyta surowe HTML z Jiry, parsuje, zgaduje selektory, halucynuje intencję użytkownika, pisze test, uruchamia. Połowa razów test przechodzi z niewłaściwego powodu. Druga połowa wywala się bo Playwright timeoutuje na popupie którego LLM nie widział.

Budowałem to trzy miesiące temu. Działało na demo. Sypało się na realnych ticketach.

Potem odwróciłem architekturę. Mapa była do góry nogami.

## Kontekst przed LLM

AI to **drugi** krok, nie pierwszy.

Pierwszy krok to deterministyczny pipeline inżynierski który podaje AI czysty, ustrukturyzowany, zwalidowany kontekst. LLM nigdy nie widzi surowej Jiry. Nigdy nie parsuje HTML. Nigdy nie zgaduje selektorów CSS. Kiedy model dostaje pierwszy token, każdy input został już pobrany, znormalizowany, otypowany i zwalidowany.

Pomyśl o tym tak: AI jako senior dev robiący code review na dobrze przygotowanym PR vs. AI jako stażysta z poleceniem "ogarnij tę Jirę". Ten sam model. Przeciwne wyniki.

Zasada: **deterministyczny harness wokół LLM, nie odwrotnie.** Kod który daje ten sam output dla tego samego inputu. Zawsze. Bez prawdopodobieństwa. Bez czujki. LLM jest mały, ograniczony, weryfikowalny - jeden komponent w systemie, nie sam system.

To jest kanapka.

## Kanapka i matematyka

Trzy warstwy:

1. **Deterministyczny input** - równoległe pobieranie z Jiry, Figmy, Playwrighta. Parsing, wzbogacanie, typowanie. Czysty kod.
2. **Decyzja AI** - mały, ograniczony zakres. Model widzi ustrukturyzowany JSON, zwraca jedno z N zwalidowanych werdyktów.
3. **Deterministyczny output** - typowany obiekt idzie do publishera który pisze do Jiry, dołącza obrazki, robi audit record.

LLM widzi może 5% scope'u pipeline'u. Pozostałe 95% to kod z testami.

Oto co to znaczy przy skali. Weź realny e-commerce workload QA: 1000 ticketów miesięcznie.

```
Scenariusz: 1000 zadań QA e-commerce miesięcznie

Naiwnie "LLM wszędzie":
- Średnio 8K tokenów input + 2K output na zadanie
- $0.03/1K input + $0.15/1K output (Claude Sonnet)
- Na zadanie: $0.24 + $0.30 = $0.54
- Miesięcznie: $540 + retry + failures ~ $700-900

Deterministic-first (mój):
- ~70% zadań: zero LLM (static checks, pattern matching, CSS diff)
- ~30% zadań: LLM tylko na przygotowanym JSON context
- Średnio 2K input + 500 output na wywołanie LLM
- Na zadanie (te z LLM): $0.06 + $0.075 = $0.135
- Miesięcznie: $0.135 x 300 = $40.50 (~$40)

Ta sama dokładność. ~18x taniej. Audit trail nienaruszony.
```

Weź tabelę. Przelicz dla swojego scenariusza. Matematyka nie kłamie.

Podział 70/30 to representative baseline, nie hardcoded SLA. W realnych produkcyjnych runach rusza się z workloadem - czasem 60%, czasem 80% wychodzi bez LLM. Kosmetyczne regresje skrzywiają w stronę więcej deterministycznych wyjść; niejednoznaczne osądy UX skrzywiają w stronę więcej wywołań LLM. Tak czy inaczej: deterministyczna podłoga łapie pierwsza, LLM widzi tylko co zostaje.

## Dwie uwagi jeśli jesteś na subskrypcji Claude Max

Teraz, maj 2026, wywołania Agent SDK i `claude -p` dzielą okno twojej subskrypcji Claude Code. Więc naiwne "LLM wszędzie" nie pojawia się jako linijka na fakturze - pojawia się jako 5-godzinne okno parujące przed obiadem. Matematyka dolarowa się ukrywa, ale produktywne godziny dnia nie.

**To się zmienia 15 czerwca 2026.**

Anthropic wydziela programmatic usage do osobnego miesięcznego kredytu, billowanego po pełnych stawkach API. Claude Agent SDK, `claude -p`, Claude Code GitHub Actions i third-party narzędzia zbudowane na Agent SDK - wszystko przechodzi na osobny budget po standardowej cenie API. Twój interaktywny Claude Code zostaje na subskrypcji. Twoje agentic pipelines przechodzą na pay-per-token.

Czyli tabela kosztów powyżej przestaje być teoretyczna dla każdego, kto uruchamia QA workflows oparte na Agent SDK. Od 15 czerwca naiwne "LLM wszędzie" to wprost linijka na rachunku - a "deterministyczna podłoga najpierw" przestaje być nice-to-have. Staje się forcing function.

Lepiej zaprojektować deterministyczną podłogę teraz, niż odkryć rachunek w lipcu.

## Brama decyzyjna

Większość decyzji QA daje się rozstrzygnąć deterministycznymi sprawdzeniami - CSS diff przeciwko baseline, pixel diff z tolerancją, detekcja regresji a11y. Bez interpretacji. Pass albo fail, z powodem.

Oto kształt bramy:

```typescript
// Deterministyczna brama decyzyjna - większość zadań kończy tutaj
// AI nic nie widzi. Zero tokenów.

function decideQAPath(task: QATask): QAVerdict {
  // Warstwa 1: statyczne deterministyczne sprawdzenia
  if (task.cssBaseline && cssDiff(task.current, task.cssBaseline) === 0) {
    return { verdict: "PASS", reason: "css-identical", llmNeeded: false };
  }

  if (task.visualBaseline && pixelDiff(task.snap, task.visualBaseline) < 0.001) {
    return { verdict: "PASS", reason: "visual-identical", llmNeeded: false };
  }

  // Warstwa 2: strukturalne sprawdzenia (nadal deterministyczne)
  if (hasA11yRegression(task.aria, task.baselineAria)) {
    return { verdict: "FAIL", reason: "a11y-regression", llmNeeded: false };
  }

  // Dopiero tu wchodzi LLM - ustrukturyzowany JSON context już gotowy
  return { verdict: "AMBIGUOUS", reason: "needs-judgment", llmNeeded: true };
}
```

Trzy deterministyczne sprawdzenia. Trzy wczesne wyjścia. Koszt tokenów: zero. Jeśli żadne z nich nie rozstrzygnie sprawy i zadanie dojdzie do AMBIGUOUS, LLM zostaje wywołany - ale tylko na zadaniach, które naprawdę wymagają osądu.

Większość ticketów wychodzi, zanim jakikolwiek model się obudzi. Pozostała mniejszość trafia do LLM z otypowanym obiektem `QATask`, który ktoś albo coś już zwalidował. Tokeny idą na osąd, nie na parsowanie.

## Dlaczego to ma znaczenie przy skali

Główna liczba z mojej produkcji: **94 tickety QA przetworzone w 2,5 dnia** na jednym projekcie. Klasyczne manualne podejście dla tego samego scope'u: tydzień pracy zespołu, lekko dwa. Z tym stackiem: jeden inżynier, pół tygodnia, pełen audit trail.

Pozostałe rzeczy, które się trzymają przy skali:

- Deterministyczne wyjścia trzymają krzywą kosztów płaską. Nie płacisz tokenów za jasne diffy.
- Audit trail nienaruszony - każdy werdykt ma reason code, każde wywołanie LLM ma swój input bundle przyklejony. Możesz odtworzyć każdą decyzję jutro.
- Failure modes są bounded. Flaky test nadal się wywali - ale wywala się tak samo za każdym razem, z tymi samymi logami, i fixujesz go raz.

Głębszy powód to nie koszt. To blast radius. Kiedy AI siedzi w warstwie I/O, każdy flaky Playwright łączy się z każdą halucynacją LLM. Failures się stackują. Nie da się ich debugować, reprodukować, zaufać im.

Kiedy AI siedzi tylko w warstwie osądu, z deterministyczną podłogą pod spodem i deterministycznym publisherem nad nią, failures są bounded. Linijka, do której ciągle wracam: **jeśli twój agent AI robi więcej niż osądza, masz problem z deterministyczną podłogą.**

## Co dalej

To dopiero kręgosłup. Realny produkcyjny system ma dziesięć konkretnych warstw architektonicznych, każda z własnym wzorcem, własnymi gotchas, własnym powodem, żeby być deterministyczna, zanim LLM wejdzie.

Jutro publikuję pełną mapę. Dziesięć warstw, A do J. Co siedzi w każdej. Które cztery wypuszczam jako code drops przez najbliższe 8 tygodni. Które sześć zostaje w pitch-mode na razie i dlaczego.

Jeśli kiedykolwiek zastanawiałeś się, dlaczego twoje demo "AI QA" działało, a produkcja się sypała - jutrzejszy odcinek nazywa każdą lukę, którą musiałem załatać.

To nie magia. To kanapka.

---


## When This Pays Off - And When Grep Is Already Enough

**URL:** https://portfolio.sdet.it/from-the-field/jarvis-brain-part-3
**Published:** 2026-05-14
**Language:** en
Tags: ai-tooling, mcp, context-engineering, from-the-field

Day 3 of three. Should you adopt jarvis-brain at all? When graph-backed context pays off, when Grep is still enough, and what the V0.5 enterprise tier adds. From the field #02 Part 3.

Part 1 framed the problem. Part 2 shipped the engine and the benchmark. Part 3 is the post you should read before deciding to adopt this - because the honest answer to "should I run jarvis-brain on my codebase" is "it depends on your codebase, and here is how to tell."

Plus the V0.5 tier - what the same engine looks like when the unit of work stops being a repo and starts being a design system org with multiple consumer fronts.

## When it pays off

The benchmark numbers from yesterday were averaged across fifty questions on a 5-repo monorepo. That word - monorepo - is doing real work. Brain wins by the largest margin on architecture and cross-repo questions, where the graph carries information the native tools have to reconstruct from scratch every time.

Concretely, the signals that say "this engine will pay off on your codebase":

- **Three or more repos with shared code.** A core package consumed by multiple consumer apps. A design system used by several frontends. Microservices that import from a common toolkit. The federation layer is where brain earns its place.
- **Tens of thousands of files or more.** At under fifty files, Glob is fast and the model holds the whole thing in working memory. At fifty thousand, the model is doing exploration burn on every cross-cutting question.
- **Design system or shared primitives that get overridden per consumer.** If your consumers are doing `:root { --color-brand: ... }` overrides on tokens defined in a shared library, you have a problem class that `brain_ffcss` was specifically built for.
- **Multiple engineers asking architectural questions of the same codebase.** "Where do god-nodes live", "what depends on this primitive", "what gets touched if I change this contract". These questions take fifty-three percent less wall time with brain than without.
- **Long-lived projects where exploration burn compounds.** If your AI sessions consistently spend the first five minutes re-discovering the project layout, that is the thing being eliminated.

Brain does not eliminate the LLM's job. It eliminates the part where the LLM has to re-derive structural facts from raw source on every question.

## When Grep is already enough

The honest counterpart. The signals that say "you do not need this yet":

- **Single repo under fifty files.** Glob and Grep are already optimal here. The benchmark caught this directly - code discovery questions ran twenty percent slower with brain than without.
- **Solo developer working on a project they wrote.** You have the structure in your head. The LLM is helping with execution, not navigation. The marginal value of pre-computed graphs is small.
- **Throwaway scripts, prototypes, exploratory notebooks.** Anything you might delete in two weeks. The cost of building and maintaining the graph exceeds the cost of the exploration burn.
- **Codebases that change shape every week.** The graph staleness becomes a maintenance burden of its own. Stick with what reads source on every query.
- **No cross-repo concern.** If your work is bounded inside one repo and you do not need to trace anything across boundaries, the federation layer adds complexity without payoff.

The trap to avoid: adopting tooling because it is impressive in a benchmark, then paying maintenance cost on something you did not need. Engineering judgment here looks like "where does my time actually go when I work with the AI" - if the answer is mostly "writing code" and not "explaining the codebase to it", you probably do not need graph-backed context. If the answer is the other way around, you probably do.

## The daily use case - a multi-brand commerce setup

To make this concrete, here is the shape of the work where brain earns its keep daily.

The setup: one shared `core` package - components, composables, types, design tokens, the routing primitives. On top of it, five brand-variant frontends - same product family, different visual identities, different feature gating, different checkout flows per market. Each consumer overrides design tokens, adds brand-specific routes, plugs in market-specific payment integrations.

A typical question: "If I change the contract on `useBaseCart` in core, what breaks across the five fronts, and where would I need to update tests."

Without brain: open `core`, find `useBaseCart`, read the signature. Grep for `useBaseCart` across all six repos. Open each hit. Read enough context to understand the call site. Decide if it breaks. Move to the next hit. Realistically: forty-five minutes of context-switching for a moderately complex change.

With brain: `brain_explain` on the node `useBaseCart` returns inbound edges - every consumer that depends on it, with the file and line of the call site, plus inferred contract usage. One tool call. The model has the full impact surface before it touches a single source file. The "what breaks" answer arrives in five minutes instead of forty-five. The remaining time is for actually writing the change and the test updates.
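
At its core, the lookup described above is a reverse-edge query on the graph. A minimal sketch, assuming a flat edge list; the `Edge` shape, the repo names, and `inboundEdges` are illustrative and are not the actual `brain_explain` schema:

```typescript
// Hypothetical edge shape; the real graph.json schema may differ.
interface Edge {
  from: string; // caller node id
  to: string;   // callee node id
  repo: string; // repo that contains the call site
  file: string;
  line: number;
}

// Returns every call site that depends on `node`, grouped by repo.
function inboundEdges(edges: Edge[], node: string): Map<string, Edge[]> {
  const byRepo = new Map<string, Edge[]>();
  for (const e of edges) {
    if (e.to !== node) continue;
    const bucket: Edge[] = byRepo.get(e.repo) ?? [];
    bucket.push(e);
    byRepo.set(e.repo, bucket);
  }
  return byRepo;
}

// Made-up sample data for illustration.
const edges: Edge[] = [
  { from: "CartDrawer", to: "useBaseCart", repo: "front-a", file: "components/CartDrawer.vue", line: 12 },
  { from: "CheckoutPage", to: "useBaseCart", repo: "front-a", file: "pages/checkout.vue", line: 8 },
  { from: "MiniCart", to: "useBaseCart", repo: "front-b", file: "components/MiniCart.vue", line: 5 },
  { from: "MiniCart", to: "useBasePrice", repo: "front-b", file: "components/MiniCart.vue", line: 6 },
];

const impact = inboundEdges(edges, "useBaseCart");
```

The grouping by repo is the part that matters for the "what breaks across the five fronts" question: the model gets one list per consumer instead of a flat pile of Grep hits.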

This is what the benchmark architecture-category numbers feel like in production. Fifty-three percent less wall time on the questions that previously felt like archaeology.

## The V0.5 tier - federation at scale

The public repo is the engine. The private version of the engine includes the multi-tenant scaffolding. The V0.5 tier is the engine plus a different kind of capability on top - the one that matters when the unit of analysis is no longer "this repo" but "this organization's design system and everything that consumes it."

Three things the V0.5 tier adds:

**Cross-repo deduplication.** Run a WCAG audit, or a design token audit, or a contract audit across ten consumer repos at once. Findings get clustered: an `aria-label` problem that appears in nine of ten repos is one finding with nine manifestations, not nine findings. The cluster reports the canonical fix once and the affected consumers as a list. This changes review economics by an order of magnitude when you are managing a design system as a product.
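
The clustering idea can be sketched as a group-by on a stable key. This is an illustration under assumptions, not the V0.5 implementation: the `Finding` shape, the choice of `rule` plus `component` as the cluster key, and `clusterFindings` are all hypothetical.

```typescript
// Hypothetical finding shape for a cross-repo audit.
interface Finding {
  rule: string;      // e.g. an accessibility rule id
  component: string; // shared component the finding points at
  repo: string;
  file: string;
}

interface Cluster {
  key: string;
  rule: string;
  component: string;
  affectedRepos: string[];
  manifestations: Finding[];
}

// Groups findings that share a rule and a component into one cluster.
function clusterFindings(findings: Finding[]): Cluster[] {
  const byKey = new Map<string, Cluster>();
  for (const f of findings) {
    const key = `${f.rule}::${f.component}`;
    let c = byKey.get(key);
    if (!c) {
      c = { key, rule: f.rule, component: f.component, affectedRepos: [], manifestations: [] };
      byKey.set(key, c);
    }
    c.manifestations.push(f);
    if (!c.affectedRepos.includes(f.repo)) c.affectedRepos.push(f.repo);
  }
  // Most widespread clusters first.
  return Array.from(byKey.values()).sort(
    (a, b) => b.affectedRepos.length - a.affectedRepos.length,
  );
}

// Made-up sample data for illustration.
const findings: Finding[] = [
  { rule: "aria-label-missing", component: "IconButton", repo: "front-a", file: "Header.vue" },
  { rule: "aria-label-missing", component: "IconButton", repo: "front-b", file: "Header.vue" },
  { rule: "aria-label-missing", component: "IconButton", repo: "front-c", file: "Nav.vue" },
  { rule: "contrast-low", component: "Badge", repo: "front-a", file: "Badge.vue" },
];

const clustered = clusterFindings(findings);
```

Four raw findings collapse into two clusters here; the reviewer reads the `IconButton` fix once and gets the three affected repos as a list.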

**Design system token federation.** Tokens defined in the core library, overridden in consumers, sometimes overridden again in feature flags. The federation layer tracks the canonical definition, every override edge, every consumer that bypasses the system entirely (DRY violation). `brain_ffcss` already shows you this for one group; V0.5 makes it queryable across an organization.
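
A rough way to picture the bookkeeping behind this: one report per token that records where the canonical definition lives, who overrides it, and who bypasses it. The sketch below is an assumption-heavy illustration; `TokenDecl`, `TokenReport`, the three `kind` values, and `federateTokens` are invented for the example and do not describe the `brain_ffcss` output format.

```typescript
// Hypothetical record of a CSS custom property seen during extraction.
interface TokenDecl {
  token: string; // e.g. "--color-brand"
  repo: string;
  kind: "definition" | "override" | "hardcoded";
}

interface TokenReport {
  canonicalRepo: string | null;
  overriddenIn: string[];
  bypassedIn: string[]; // consumers that hardcode a value instead of using the token
}

// Builds a per-token report: canonical definition, override edges, bypasses.
// Definitions outside the core repo are ignored in this simplified version.
function federateTokens(decls: TokenDecl[], coreRepo: string): Map<string, TokenReport> {
  const reports = new Map<string, TokenReport>();
  for (const d of decls) {
    const r: TokenReport =
      reports.get(d.token) ?? { canonicalRepo: null, overriddenIn: [], bypassedIn: [] };
    if (d.kind === "definition" && d.repo === coreRepo) r.canonicalRepo = d.repo;
    else if (d.kind === "override") r.overriddenIn.push(d.repo);
    else if (d.kind === "hardcoded") r.bypassedIn.push(d.repo);
    reports.set(d.token, r);
  }
  return reports;
}

// Made-up sample data for illustration.
const decls: TokenDecl[] = [
  { token: "--color-brand", repo: "core", kind: "definition" },
  { token: "--color-brand", repo: "front-a", kind: "override" },
  { token: "--color-brand", repo: "front-b", kind: "override" },
  { token: "--color-brand", repo: "front-c", kind: "hardcoded" },
  { token: "--space-md", repo: "core", kind: "definition" },
];

const tokenReports = federateTokens(decls, "core");
```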

**Three audit modes.** `--scope component` audits a single component across all consumers. `--scope page` audits a routing-level slice (the checkout page, across five fronts, comparing implementations). `--scope full` does the org-wide pass. The same engine, different traversal strategies, different result shapes. The point is that you ask the question in the shape that matches the work, not in the shape the tool happens to support.
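
One way to read "same engine, different traversal strategies" is as a single selection function over a tagged scope. The sketch below assumes that reading; the `AuditScope` union, the `SourceFile` shape, and `selectTargets` are hypothetical and only mirror the three `--scope` values named above.

```typescript
// Hypothetical model of the three scopes as a discriminated union.
type AuditScope =
  | { scope: "component"; component: string }
  | { scope: "page"; route: string }
  | { scope: "full" };

interface SourceFile {
  repo: string;
  path: string;
  component?: string;
  route?: string;
}

// Selects the files a run would visit. Same input, different slice per scope.
function selectTargets(files: SourceFile[], s: AuditScope): SourceFile[] {
  switch (s.scope) {
    case "component":
      return files.filter((f) => f.component === s.component);
    case "page":
      return files.filter((f) => f.route === s.route);
    case "full":
      return files;
  }
}

// Made-up sample data for illustration.
const files: SourceFile[] = [
  { repo: "front-a", path: "components/PriceTag.vue", component: "PriceTag" },
  { repo: "front-b", path: "components/PriceTag.vue", component: "PriceTag" },
  { repo: "front-a", path: "pages/checkout.vue", route: "/checkout" },
  { repo: "front-b", path: "pages/checkout.vue", route: "/checkout" },
  { repo: "front-a", path: "pages/index.vue", route: "/" },
];

const componentRun = selectTargets(files, { scope: "component", component: "PriceTag" });
const pageRun = selectTargets(files, { scope: "page", route: "/checkout" });
const fullRun = selectTargets(files, { scope: "full" });
```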

This is the version that justifies a commercial engagement. Multi-tenant production deployment, cost optimization across hundreds of audit runs, integration with whatever ticketing and design tool the org already runs. The shape is open-source and educational; the operational layer is not.

## The three tiers

To make the offer surface crisp:

```mermaid
graph LR
    A[jarvis-brain-core<br/>Public AGPL-3.0] --> B[jarvis-brain<br/>Private full<br/>multi-tenant prod]
    B --> C[V0.5 Enterprise<br/>Cross-repo dedup<br/>Token federation<br/>3 audit modes]
```

- **Tier 1 - jarvis-brain-core (public, AGPL-3.0).** The engine. Clone it, run it, learn from it. Build your own on top if you want.
- **Tier 2 - jarvis-brain (private, commercial).** The production deployment. Multi-tenant, auth, webhook orchestration, cost tracking, alerting. Not open-source. Available as a commercial engagement.
- **Tier 3 - V0.5 Enterprise (commercial + design system focus).** Tier 2 plus cross-repo federation, design system token tracking at organization scale, three audit modes. The version that makes sense for design system orgs with five or more consumer fronts.

If you run a setup that matches the "when it pays off" list - especially if you have a shared design system or core library consumed by multiple fronts - the productive conversation is at the tier 2 and tier 3 level. The public repo is enough to evaluate whether the method is real. It is not enough to run in production.

DM is open. The brief that helps me most: how many repos, what they share, where exploration burn currently eats your time. If you have those three answers I can tell you within one conversation whether brain is worth pursuing for your context, or whether your bottleneck is somewhere else entirely.

`sdet.it/services` for the longer version of the offer.

## Series wrap

Three days. One problem class. One engine. One benchmark. One enterprise tier.

What I tried to do across this series: show the actual decision process, not the polished outcome. Part 1 had the moment I almost shipped naive RAG and deleted it. Part 2 had the benchmark category where brain loses to Grep by twenty percent. Part 3 had the cases where you should not adopt this at all. The honest version of any architecture story includes the parts that did not work.

If you spent forty-five minutes on this series, here is what I hope sticks: graph-backed context is not a magic upgrade to your AI workflow. It is a specific solution to a specific class of problem - structural exploration on codebases big enough or distributed enough that the LLM cannot hold them in working memory. If you have that problem, the engine pays off measurably. If you do not, Grep is still the right answer.

## Next week

Series #03 lands next Tuesday: context-first QA. The premise: most AI-in-QA content says "let AI write your tests." I do the opposite - the AI never writes the tests, but it does almost everything else around them. Why that distinction matters, what it looks like in practice, and what the failure modes are.

#FromTheField - new series Tuesday morning.

---


## Kiedy się opłaca - a kiedy Grep już wystarcza

**URL:** https://portfolio.sdet.pl/from-the-field/jarvis-brain-part-3
**Published:** 2026-05-14
**Language:** pl
Tags: ai-tooling, mcp, context-engineering, from-the-field

Dzień 3 z trzech. Czy w ogóle adoptować jarvis-brain? Kiedy graf-backed context się opłaca, kiedy Grep wystarcza i co dodaje warstwa V0.5 enterprise. From the field #02 Part 3.

Część 1 obramowała problem. Część 2 wypchnęła silnik i benchmark. Część 3 to post, który warto przeczytać przed decyzją o adopcji - bo uczciwa odpowiedź na pytanie "czy mam wdrożyć jarvis-brain u siebie" brzmi "zależy od twojego codebase'u, i tutaj jest sposób, żeby to ocenić".

Plus warstwa V0.5 - jak ten sam silnik wygląda, gdy jednostką pracy przestaje być repo, a staje się design system org z wieloma konsumenckimi frontami.

## Kiedy się opłaca

Wczorajsze liczby z benchmarku były uśrednione po pięćdziesięciu pytaniach na monorepo z pięcioma repo. To słowo - monorepo - robi tu prawdziwą robotę. Brain wygrywa największym marginesem na pytaniach o architekturę i cross-repo, gdzie graf niesie informację, którą natywne narzędzia muszą rekonstruować od zera za każdym razem.

Konkretnie, sygnały które mówią "ten silnik się opłaci na twoim codebase":

- **Trzy lub więcej repo ze współdzielonym kodem.** Core package konsumowany przez wiele aplikacji. Design system używany przez kilka frontów. Mikroserwisy importujące ze wspólnego toolkitu. Warstwa federacji to miejsce, gdzie brain zarabia na swoją obecność.
- **Dziesiątki tysięcy plików albo więcej.** Poniżej pięćdziesięciu plików Glob jest szybki, a model trzyma całość w pamięci roboczej. Przy pięćdziesięciu tysiącach model robi exploration burn przy każdym przekrojowym pytaniu.
- **Design system albo współdzielone prymitywy, które są nadpisywane per konsument.** Jeśli twoi konsumenci robią override'y `:root { --color-brand: ... }` na tokenach zdefiniowanych w shared library, masz klasę problemu, dla której `brain_ffcss` został zbudowany specjalnie.
- **Wielu inżynierów zadaje pytania architektoniczne o ten sam codebase.** "Gdzie siedzą god-nody", "co od tego zależy", "co się zepsuje jak zmienię ten kontrakt". Te pytania zajmują pięćdziesiąt trzy procent mniej wall time z brain niż bez.
- **Długo żyjące projekty, gdzie exploration burn się kumuluje.** Jeśli twoje sesje AI konsekwentnie spędzają pierwsze pięć minut re-odkrywając layout projektu, to jest dokładnie ta rzecz, która zostaje wyeliminowana.

Brain nie eliminuje pracy LLM. Eliminuje tę część, gdzie LLM musi re-derywować strukturalne fakty z surowego źródła przy każdym pytaniu.

## Kiedy Grep już wystarcza

Uczciwy odpowiednik. Sygnały, które mówią "jeszcze tego nie potrzebujesz":

- **Pojedyncze repo poniżej pięćdziesięciu plików.** Glob i Grep są tu już optymalne. Benchmark wyłapał to wprost - pytania code discovery działały dwadzieścia procent wolniej z brain niż bez.
- **Solo developer pracujący na projekcie, który sam napisał.** Masz strukturę w głowie. LLM pomaga z wykonaniem, nie nawigacją. Marginalna wartość pre-komputowanych grafów jest mała.
- **Throwaway scripts, prototypy, exploratory notebooks.** Wszystko, co możesz skasować za dwa tygodnie. Koszt budowy i utrzymania grafu przekracza koszt exploration burn.
- **Codebases zmieniające kształt co tydzień.** Nieaktualność grafu staje się obciążeniem utrzymaniowym sama w sobie. Trzymaj się narzędzi, które czytają źródło przy każdym zapytaniu.
- **Brak cross-repo problemu.** Jeśli twoja praca jest ograniczona do jednego repo i nie potrzebujesz nic śledzić przez granice, warstwa federacji dodaje komplikacji bez opłacalności.

Pułapka do uniknięcia: adopcja narzędzia, bo robi wrażenie w benchmarku, a potem płacenie maintenance kosztów za coś, czego nie potrzebowałeś. Inżynierska ocena tutaj wygląda jak "gdzie faktycznie idzie mój czas, gdy pracuję z AI" - jeśli odpowiedź to głównie "pisanie kodu", a nie "tłumaczenie codebase'u modelowi", to prawdopodobnie nie potrzebujesz graf-backed context. Jeśli jest odwrotnie - prawdopodobnie potrzebujesz.

## Daily use case - multi-brand commerce setup

Żeby to było konkretne, oto kształt pracy, w której brain zarabia codziennie.

Setup: jeden współdzielony `core` package - komponenty, composables, typy, design tokeny, prymitywy routingu. Na tym - pięć brand-variant frontendów. Ta sama rodzina produktów, różne tożsamości wizualne, różny feature gating, różne flow checkoutu per market. Każdy konsument nadpisuje design tokens, dodaje brand-specific routes, podłącza market-specific integracje płatnicze.

Typowe pytanie: "Jeśli zmienię kontrakt na `useBaseCart` w core, co się zepsuje w tych pięciu frontach i gdzie trzeba zaktualizować testy?"

Bez brain: otwierasz `core`, znajdujesz `useBaseCart`, czytasz signature. Grepujesz `useBaseCart` przez wszystkie sześć repo. Otwierasz każdy hit. Czytasz wystarczająco kontekstu, żeby zrozumieć call site. Decydujesz, czy się zepsuje. Przechodzisz do następnego hita. Realistycznie: czterdzieści pięć minut context-switchingu dla średnio złożonej zmiany.

Z brain: `brain_explain` na węźle `useBaseCart` zwraca krawędzie inbound - każdy konsument, który od tego zależy, z plikiem i linią call site, plus wnioskowany kontrakt użycia. Jedno wywołanie narzędzia. Model ma cały impact surface zanim dotknie pojedynczego pliku źródłowego. Odpowiedź "co się zepsuje" przychodzi w pięć minut zamiast czterdziestu pięciu. Pozostały czas idzie na faktyczne pisanie zmiany i aktualizację testów.

Tak właśnie odczuwa się w produkcji liczba z kategorii architecture benchmarka. Pięćdziesiąt trzy procent mniej wall time na pytaniach, które wcześniej odczuwało się jak archeologia.

## Warstwa V0.5 - federacja w skali

Publiczne repo to silnik. Prywatna wersja silnika zawiera multi-tenantowe rusztowanie. Warstwa V0.5 to silnik plus inny rodzaj capability na wierzchu - ten, który ma znaczenie, gdy jednostką analizy nie jest już "to repo", tylko "design system tej organizacji i wszystko, co go konsumuje".

Trzy rzeczy, które dodaje V0.5:

**Deduplikacja cross-repo.** Uruchom audyt WCAG, design tokenów albo kontraktów przez dziesięć konsumenckich repo naraz. Findingi są klastrowane: problem z `aria-label`, który pojawia się w dziewięciu z dziesięciu repo, to jeden finding z dziewięcioma manifestacjami, nie dziewięć findingów. Klaster raportuje kanoniczny fix raz, a affected consumers jako listę. To zmienia ekonomię review o rząd wielkości, gdy zarządzasz design systemem jak produktem.

**Federacja design tokens.** Tokeny zdefiniowane w core library, nadpisane w konsumentach, czasem nadpisane znowu w feature flagach. Warstwa federacji śledzi kanoniczną definicję, każdą krawędź override, każdego konsumenta, który całkowicie obchodzi system (naruszenie DRY). `brain_ffcss` już ci to pokazuje dla jednej grupy; V0.5 robi to query'owalne w skali organizacji.

**Trzy tryby audytu.** `--scope component` audytuje pojedynczy komponent przez wszystkich konsumentów. `--scope page` audytuje slice routingowy (stronę checkout, przez pięć frontów, porównując implementacje). `--scope full` robi org-wide pass. Ten sam silnik, różne strategie traversalu, różne kształty wyników. Sens jest taki, że pytasz w kształcie, który pasuje do pracy, a nie w kształcie, który narzędzie akurat obsługuje.

To jest wersja, która uzasadnia commercial engagement. Multi-tenantowy production deployment, optymalizacja kosztów przy setkach audytów, integracja z dowolnym narzędziem ticketowym i designerskim, którego organizacja już używa. Kształt jest open-source i edukacyjny; warstwa operacyjna nie.

## Trzy poziomy

Żeby surface oferty był ostry:

```mermaid
graph LR
    A[jarvis-brain-core<br/>Public AGPL-3.0] --> B[jarvis-brain<br/>Private full<br/>multi-tenant prod]
    B --> C[V0.5 Enterprise<br/>Cross-repo dedup<br/>Token federation<br/>3 audit modes]
```

- **Tier 1 - jarvis-brain-core (public, AGPL-3.0).** Silnik. Klonuj, odpalaj, ucz się. Buduj swoje na tym, jeśli chcesz.
- **Tier 2 - jarvis-brain (private, commercial).** Production deployment. Multi-tenant, auth, webhook orchestration, cost tracking, alerting. Nie open-source. Dostępny jako commercial engagement.
- **Tier 3 - V0.5 Enterprise (commercial + design system focus).** Tier 2 plus cross-repo federation, design system token tracking w skali organizacji, trzy tryby audytu. Wersja, która ma sens dla design system orgs z pięcioma plus konsumenckimi frontami.

Jeśli prowadzisz setup pasujący do listy "kiedy się opłaca" - szczególnie jeśli masz współdzielony design system albo core library konsumowany przez wiele frontów - produktywna rozmowa jest na poziomie tier 2 i tier 3. Publiczne repo wystarcza do oceny, czy metoda jest prawdziwa. Nie wystarcza do uruchomienia produkcyjnego.

DM jest otwarte. Brief, który pomaga mi najbardziej: ile repo, co dzielą, gdzie exploration burn obecnie zżera ci czas. Jeśli masz te trzy odpowiedzi, mogę w jednej rozmowie powiedzieć, czy brain jest wart zachodu w twoim kontekście, czy twoje wąskie gardło jest zupełnie gdzie indziej.

`sdet.it/services` dla dłuższej wersji oferty.

## Wrap serii

Trzy dni. Jedna klasa problemu. Jeden silnik. Jeden benchmark. Jedna warstwa enterprise.

Co próbowałem zrobić w tej serii: pokazać faktyczny proces decyzyjny, nie wypolerowany wynik. Część 1 miała moment, w którym prawie shipnąłem naive RAG i go skasowałem. Część 2 miała kategorię benchmarka, gdzie brain przegrywa z Grep o dwadzieścia procent. Część 3 miała przypadki, w których w ogóle nie powinieneś tego adoptować. Uczciwa wersja każdej historii architektonicznej zawiera części, które nie zadziałały.

Jeśli spędziłeś czterdzieści pięć minut na tej serii, oto co mam nadzieję, że zostaje: graf-backed context nie jest magicznym upgrade'em twojego AI workflow. To konkretne rozwiązanie konkretnej klasy problemu - strukturalna eksploracja na codebase'ach na tyle dużych albo rozproszonych, że LLM nie utrzyma ich w pamięci roboczej. Jeśli masz ten problem, silnik się opłaca mierzalnie. Jeśli nie masz - Grep nadal jest właściwą odpowiedzią.

## Co za tydzień

Series #03 ląduje w przyszły wtorek: context-first QA. Założenie: większość contentu AI-in-QA mówi "niech AI pisze twoje testy". Ja robię odwrotnie - AI nigdy nie pisze testów, ale robi prawie wszystko inne wokół nich. Dlaczego ta dystynkcja ma znaczenie, jak wygląda w praktyce i jakie są failure modes.

#FromTheField - nowa seria wtorek rano.

---


## Architecture of the Indexing Engine

**URL:** https://portfolio.sdet.it/from-the-field/jarvis-brain-part-2
**Published:** 2026-05-13
**Language:** en
Tags: ai-tooling, mcp, context-engineering, from-the-field

Day 2 of three. Inside jarvis-brain: code-to-graph extraction, the FTS5 camelCase trick, 5 MCP tools, and 50 questions worth of benchmark numbers. From the field #02 Part 2.

Yesterday's post framed the problem: Claude Code on a 5-repo monorepo burns thousands of fresh input tokens per cross-repo question, because Glob and Grep have no memory of code structure. Today's post is the engine that fixes it - how the graph gets built, how it gets queried, and where it does and does not beat the native primitives.

The public repo dropped this morning under AGPL-3.0. It is an educational distillation, not a production deployment kit. The shape is in there; the multi-tenant scaffolding is not.

## Two paths into the graph

Code has to become a graph before Claude Code can query it. There are two ways to make that happen, and jarvis-brain supports both.

```mermaid
graph TD
    A[Source repos] --> B[Path A: LLM extraction]
    A --> C[Path B: CC-local bootstrap]
    B -->|Qwen local / Gemini fallback| D[Per-repo graph.json]
    C -->|/brain-extract skill, zero LLM cost| D
    D --> E[Federation: merge + cross-repo edges + design tokens]
    E --> F[Group master graph]
    F --> G[FTS5 index + JSON traversal]
    G --> H[5 MCP tools served to Claude Code]
```

**Path A is the LLM pipeline.** Triggered by a webhook on push, or manually via the admin API. The extractor reads source files and asks a language model to identify nodes (functions, components, composables, types) and relationships (imports, calls, overrides, parent-child layer chains). For day-to-day work it runs against a local Qwen instance reachable through a reverse SSH tunnel - free, fast, good enough for incremental updates under twenty changed files. For deep re-extractions or when Qwen is offline, it falls back to Gemini Flash, with Pro available for the harder runs. Cost is metered and tracked. The output is one `graph.json` per repo.

**Path B is CC-local bootstrap.** Sometimes you want a graph for a repo you have not seeded yet, and you do not want to pay Gemini tokens to do the first pass. So you open Claude Code in the target repo, run the `/brain-extract` skill, and let Claude Code's native analysis produce the same `graph.json` schema. You push the result to the graphs repo, the server picks it up, and the graph is live. Zero LLM API cost - Claude Code's subscription does the work.

Both paths produce identical graph schemas. Both land in the same federation pipeline. The server does not care which one wrote the file. This matters because the audit-precision path (Path B) and the day-to-day-merges path (Path A) have very different cost profiles, and you want them to coexist on the same engine.

Federation is the next stage. Per-repo graphs get merged into a per-group master graph. Cross-repo imports are detected (the core primitives consumed by each front). Design system tokens (CSS custom properties, SCSS variables) are tracked across consumers - canonical definitions flagged, DRY violations flagged, override chains recorded. The result is a single master graph per group that knows the full ecosystem.
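
The cross-repo import detection in that merge step can be sketched with an ownership map: record which repo defines each node, then flag any import whose target belongs to a different repo. This is a simplified illustration, not the shipped federation code; `RepoGraph`, `MasterEdge`, and `federate` are hypothetical names, and design token tracking is left out.

```typescript
// Hypothetical per-repo graph; the real graph.json schema may differ.
interface RepoGraph {
  repo: string;
  nodes: string[];                         // node ids defined in this repo
  imports: { from: string; to: string }[]; // `to` may live in another repo
}

interface MasterEdge {
  from: string;
  to: string;
  fromRepo: string;
  toRepo: string;
  crossRepo: boolean;
}

// Merges per-repo graphs and marks edges whose target is defined elsewhere.
function federate(graphs: RepoGraph[]): MasterEdge[] {
  const owner = new Map<string, string>();
  for (const g of graphs) {
    for (const n of g.nodes) owner.set(n, g.repo);
  }

  const merged: MasterEdge[] = [];
  for (const g of graphs) {
    for (const imp of g.imports) {
      const toRepo = owner.get(imp.to);
      if (toRepo === undefined) continue; // external dependency, not in the group
      merged.push({
        from: imp.from,
        to: imp.to,
        fromRepo: g.repo,
        toRepo,
        crossRepo: toRepo !== g.repo,
      });
    }
  }
  return merged;
}

// Made-up sample data for illustration.
const coreGraph: RepoGraph = { repo: "core", nodes: ["useBaseCart"], imports: [] };
const frontGraph: RepoGraph = {
  repo: "front-a",
  nodes: ["CartDrawer", "formatPrice"],
  imports: [
    { from: "CartDrawer", to: "useBaseCart" },
    { from: "CartDrawer", to: "formatPrice" },
    { from: "CartDrawer", to: "vue" },
  ],
};

const masterEdges = federate([coreGraph, frontGraph]);
```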

## The camelCase FTS5 trick

The query layer is SQLite FTS5. This is mostly boring infrastructure, except for one decision that determines whether the whole system feels good or feels broken: how do you handle identifier names like `useBaseCart`?

FTS5's default `unicode61` tokenizer splits on whitespace and punctuation. `useBaseCart` is one token. A search for "Base" returns nothing, because FTS5 matches whole tokens (or token prefixes), not substrings inside a token.

You can write a custom tokenizer. That is the textbook answer. It is also a maintenance burden, an upgrade-path landmine, and a C dependency. I did not want it.

The trick is preprocessing at index time instead of a custom tokenizer. Every identifier emits two values into the index: the original (`useBaseCart`) plus a space-split version (`use Base Cart`). FTS5's default tokenizer indexes both. A user searching for "Base" hits the space-split version. A user searching for the exact name hits the original. Same column, same query, same tokenizer. One extra line of Python in the indexer.

It is less elegant than a custom tokenizer. It also takes ten minutes to implement, has no upgrade risk, and survives every SQLite version. The decision is recorded in the architecture notes as "FTS5 camelCase = preprocessing at index time, not custom tokenizer". The same pattern works for kebab-case, snake_case, or any compound identifier convention.
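
The preprocessing step itself is small enough to show. The indexer is described above as Python; the sketch below uses TypeScript only to stay consistent with the other examples in this feed, and `splitIdentifier` and `indexValue` are hypothetical names rather than functions from the repo.

```typescript
// Splits a compound identifier into space-separated words.
// Handles camelCase, PascalCase with acronyms, kebab-case and snake_case.
function splitIdentifier(identifier: string): string {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")    // useBase -> use Base
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // HTMLParser -> HTML Parser
    .replace(/[-_]+/g, " ")                    // color-brand -> color brand
    .trim();
}

// Value written to the FTS5-indexed column: the original plus the split form.
function indexValue(identifier: string): string {
  const split = splitIdentifier(identifier);
  return split === identifier ? identifier : `${identifier} ${split}`;
}

const indexed = indexValue("useBaseCart");
```

Because `unicode61` folds case, a query for "Base" or "base" matches the `Base` token from the split form, while a query for the full name still matches the original token.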

## The 5 MCP tools

Once the graph is built and indexed, it gets served to Claude Code through five MCP tools. They are not a web search interface. They are MCP-native primitives that Claude Code sees in the same tool list as `Glob` and `Read`, and picks based on the shape of the question.

| Tool | What it does |
|---|---|
| `brain_query` | Free-text search with FTS5, returns ranked hits plus two-hop neighbors and cross-repo hints |
| `brain_graph` | Returns the raw `graph.json` for a repo or the group master, for traversal in code |
| `brain_path` | Shortest path between two nodes - "how does this core primitive reach this UI feature" |
| `brain_explain` | Node detail plus inbound/outbound neighbors plus git-blame provenance, zero LLM cost |
| `brain_ffcss` | Design system tokens: list them, count usage per repo, surface DRY violations |

The point of MCP-native is that Claude Code does not need a system prompt update to use them. They are tools in a list. The model decides. Adding a sixth tool tomorrow is one API contract, not a re-engineering of how questions get routed.

`brain_explain` is the underrated one. Zero LLM cost - it is a pure graph lookup with git-blame metadata attached. For "who wrote this and what depends on it" questions it replaces a `Read` + `Grep` + `git log` sequence with a single tool call.
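
For the `brain_path` row in the table, the underlying technique is an ordinary shortest-path search. A minimal sketch using breadth-first search over an adjacency list, assuming unweighted edges; the `Graph` shape and the node names are made up, and the real tool may traverse differently:

```typescript
// Adjacency list keyed by node id (hypothetical shape).
type Graph = Record<string, string[]>;

// Breadth-first search: returns the shortest path by hop count, or null if unreachable.
function shortestPath(graph: Graph, start: string, goal: string): string[] | null {
  const previous = new Map<string, string | null>([[start, null]]);
  const queue: string[] = [start];

  for (let i = 0; i < queue.length; i++) {
    const node = queue[i] as string;
    if (node === goal) {
      // Walk the predecessor chain back to the start.
      const path: string[] = [];
      let cur: string | null = node;
      while (cur !== null) {
        path.unshift(cur);
        cur = previous.get(cur) ?? null;
      }
      return path;
    }
    for (const next of graph[node] ?? []) {
      if (!previous.has(next)) {
        previous.set(next, node);
        queue.push(next);
      }
    }
  }
  return null;
}

// Made-up sample graph: a core primitive reaching a UI feature.
const sampleGraph: Graph = {
  useBaseCart: ["useCartStore"],
  useCartStore: ["CartDrawer", "MiniCart"],
  CartDrawer: ["CheckoutPage"],
  MiniCart: [],
  CheckoutPage: [],
};

const route = shortestPath(sampleGraph, "useBaseCart", "CheckoutPage");
```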

## The benchmark - 50 questions on a 5-repo monorepo

Two runs, fifty questions each, same code, same model. One run with Claude Code native (`Glob`, `Grep`, `Read` only). One run with jarvis-brain MCP enabled on top. Categories: code discovery, usage tracing, cross-repo, dependency path, architecture - ten questions each.

Headline numbers:

| Metric | Baseline (CC native) | With jarvis-brain | Delta |
|---|---:|---:|---:|
| Total wall time | 36m 44s | 26m 03s | **-29.1%** |
| Fresh input tokens | 4,145 | 2,003 | **-51.7%** |
| Total dollar cost | $12.21 | $12.30 | +0.7% |
| Tool calls (avg) | 4.38 | 4.58 | +5% |
| Errors | 0 | 1 | - |

The time saving is just over ten and a half minutes across the full run. The fresh-token saving is the model reading about half as much source material the first time. The dollar number is flat because Anthropic's prompt cache absorbs almost all of the input-token differential - cached reads cost a fraction of fresh reads, and the cache is identical between the two runs at the system-prompt level.
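
The deltas in the headline table are plain relative changes against the baseline. A small sketch that recomputes the wall-time and fresh-token rows from the figures in the table; the helper names are arbitrary:

```typescript
// Relative change from baseline to candidate, in percent (negative = reduction).
function percentDelta(baseline: number, candidate: number): number {
  return ((candidate - baseline) / baseline) * 100;
}

// Converts a minutes/seconds pair into seconds.
function toSeconds(minutes: number, seconds: number): number {
  return minutes * 60 + seconds;
}

const baselineTime = toSeconds(36, 44); // 36m 44s
const brainTime = toSeconds(26, 3);     // 26m 03s

const wallTimeDelta = percentDelta(baselineTime, brainTime);
const freshTokenDelta = percentDelta(4145, 2003);
const savedSeconds = baselineTime - brainTime;
```

Rounded to one decimal, these reproduce the -29.1% and -51.7% rows, and the saved time comes out at 641 seconds. The same function applies to the per-category means further down.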

By category, the breakdown is sharper:

| Category | Baseline mean | Brain mean | Delta |
|---|---:|---:|---:|
| Architecture | 105.6s | 49.9s | **-53%** |
| Cross-repo | 50.5s | 39.3s | -22% |
| Usage tracing | 19.6s | 18.0s | -8% |
| Dependency path | 26.7s | 27.6s | +3% |
| Code discovery | 17.9s | 21.5s | **+20%** |

Architecture questions are where the graph carries the most signal: "what are the god-nodes in this codebase", "where do cross-repo overrides cluster", "which layer has the densest internal edges". Cross-repo questions next, because the federation pre-computes what Grep would otherwise have to derive from scratch.

## Where brain does not win

Code discovery loses. Twenty percent worse, on average. The question shape is "find file X" or "find files starting with Y" - exactly what `Glob` is built for. Going through `brain_query` adds a hop without changing the answer. The model still ends up at the same `.vue` file; it just took an extra tool call to get there.

Dependency path is essentially a tie. The graph has the data but the model is just as happy to chase imports through `Grep` and `Read` on simple chains. Brain wins when the chain is long or crosses repo boundaries; otherwise the native approach is equivalent.

One error in fifty - a question about circular dependency detection where the graph traversal hit an edge case and returned no answer at all. Baseline got it right with `Grep`. The honest number is 49/50 correct, not 50/50. The fix is queued; not shipped yet.

Cost is the one I expected to win on and did not. Anthropic's cache is aggressive enough that the savings on fresh input tokens evaporate at the bill level. If you are paying API costs the way you pay them today, brain does not cut your bill. What it cuts is wall time and exploration burn - the things that affect how fast you ship, not how much your provider charges.

## What ships in the public repo today

`github.com/darco81/jarvis-brain-core`, AGPL-3.0. The shape of the engine:

- `brain/extractors/` - how source becomes structured node/edge JSON
- `brain/federation/` - merging per-repo graphs, detecting cross-repo edges, design token federation
- `brain/llm/prompts.py` - the extraction prompts
- `brain/api/mcp.py` plus `mcp_tools.py` - the five MCP tools and their schemas
- `brain/api/query.py` plus `query_path.py` - FTS5 with the camelCase preprocessing trick
- `brain/viz/` - the graph visualization adapter
- `benchmark/` - methodology, sample questions, runner

What is not in there is the production scaffolding: auth, webhook handlers, admin UI, cost tracking, alerting, the worker queue, deployment configs, the multi-tenant config schema. You can clone this and read the method. You cannot clone this and run a multi-tenant production deployment of it. That is the line, and it is the line on purpose.

Live demo still at `brain.sdet.it`. Benchmark report rendered as a static page at `brain.sdet.it/benchmark/` - all fifty questions, all categories, both runs side by side.

## Setup for tomorrow

Tomorrow is Part 3, and it is the post for the question you should actually ask before adopting this: when does the engine pay off, and when is `Grep` already enough.

Plus the V0.5 tier - what the same engine looks like when you stop pretending you have one repo and start treating a design system org with ten consumer fronts as the unit of work. Cross-repo dedup, token federation, atomic patches across consumers. Different problem class, same architecture underneath.

#FromTheField - day 3 lands tomorrow morning.

---


## Architektura silnika indeksującego

**URL:** https://portfolio.sdet.pl/from-the-field/jarvis-brain-part-2
**Published:** 2026-05-13
**Language:** pl
Tags: ai-tooling, mcp, context-engineering, from-the-field

Dzień 2 z trzech. Bebechy jarvis-brain: ekstrakcja kodu do grafu, trik FTS5 z camelCase, 5 narzędzi MCP i 50 pytań benchmarku. From the field #02 Part 2.

Wczorajszy post obramował problem: Claude Code na monorepo z pięcioma repozytoriami spala tysiące świeżych input tokens przy każdym pytaniu cross-repo, bo Glob i Grep nie mają pamięci struktury kodu. Dzisiejszy post to silnik, który to naprawia - jak graf jest budowany, jak jest odpytywany i gdzie wygrywa, a gdzie nie wygrywa z natywnymi prymitywami.

Publiczne repo wpadło dziś rano na AGPL-3.0. To destylat edukacyjny, nie kit do production deployment. Kształt jest w środku; multi-tenantowe rusztowanie - nie.

## Dwie ścieżki do grafu

Kod musi stać się grafem, zanim Claude Code będzie mógł go odpytywać. Są dwa sposoby, żeby tak się stało, i jarvis-brain wspiera oba.

```mermaid
graph TD
    A[Źródłowe repo] --> B[Ścieżka A: ekstrakcja LLM]
    A --> C[Ścieżka B: bootstrap CC-local]
    B -->|Qwen local / Gemini fallback| D[graph.json per repo]
    C -->|skill /brain-extract, zero LLM cost| D
    D --> E[Federacja: merge + krawędzie cross-repo + design tokens]
    E --> F[Master graf grupy]
    F --> G[FTS5 + przeszukiwanie JSON]
    G --> H[5 narzędzi MCP serwowanych do Claude Code]
```

**Ścieżka A to pipeline LLM.** Wyzwalana webhookiem na push albo manualnie przez admin API. Ekstraktor czyta źródła i prosi model językowy o zidentyfikowanie węzłów (funkcje, komponenty, composables, typy) oraz relacji (importy, calle, override'y, łańcuchy parent-child między warstwami). Do codziennej roboty leci na lokalnym Qwenie po reverse SSH tunnelu - za darmo, szybko, wystarczająco dobrze przy inkrementalnych aktualizacjach poniżej dwudziestu zmienionych plików. Do głębokich re-ekstrakcji albo gdy Qwen jest offline - fallback na Gemini Flash, z Pro dostępnym dla cięższych przebiegów. Koszty są mierzone i logowane. Output to jeden `graph.json` per repo.

**Ścieżka B to bootstrap CC-local.** Czasem chcesz graf dla repo, którego jeszcze nie zasiałeś, i nie chcesz płacić Gemini tokenami za pierwszy przebieg. Otwierasz więc Claude Code w docelowym repo, uruchamiasz skill `/brain-extract` i pozwalasz natywnej analizie Claude Code wygenerować ten sam schemat `graph.json`. Pushujesz wynik do graphs repo, serwer go odbiera, graf jest live. Zero kosztów API - subskrypcja Claude Code robi robotę.

Obie ścieżki produkują identyczny schemat grafu. Obie lądują w tym samym pipelinie federacyjnym. Serwer nie obchodzi, która go zapisała. To istotne, bo ścieżka precyzji audytowej (B) i ścieżka codziennych merge'y (A) mają zupełnie różne profile kosztowe i chcesz, żeby koegzystowały na jednym silniku.

Federacja to kolejny etap. Grafy per-repo są łączone w master graf per-grupa. Wykrywane są importy cross-repo (core primitives konsumowane przez każdy front). Design tokeny (CSS custom properties, zmienne SCSS) są trackowane między konsumentami - kanoniczne definicje flagowane, naruszenia DRY flagowane, łańcuchy override'ów rejestrowane. Wynik to jeden master graf per grupa, który zna cały ekosystem.

## Trik FTS5 z camelCase

Warstwa query to SQLite FTS5. Z grubsza nudna infrastruktura, oprócz jednej decyzji, która decyduje, czy cały system odczuwa się dobrze czy odczuwa się jak zepsuty: jak obsłużyć nazwy identyfikatorów typu `useBaseCart`?

Domyślny tokenizer FTS5 (`unicode61`) dzieli na białych znakach i znakach interpunkcyjnych. `useBaseCart` to jeden token. Użytkownik szukający "Base" dostaje nic - substring nie jest granicą tokena.

Możesz napisać własny tokenizer. To podręcznikowa odpowiedź. To również obciążenie utrzymaniowe, mina na ścieżce upgrade'u i dodatkowa zależność w C. Nie chciałem tego.

Trik to preprocessing w czasie indeksowania, nie tokenizacja w czasie zapytania. Każdy identyfikator emituje dwie wartości do indeksu: oryginał (`useBaseCart`) plus wersję rozdzieloną spacjami (`use Base Cart`). Domyślny tokenizer FTS5 indeksuje oba. Użytkownik szukający "Base" trafia w wersję rozdzieloną. Użytkownik szukający dokładnej nazwy trafia w oryginał. Ta sama kolumna, to samo zapytanie, ten sam tokenizer. Jedna dodatkowa linia Pythona w indekserze.

Mniej eleganckie niż własny tokenizer. Też wdrożenie w dziesięć minut, zero ryzyka przy upgrade'ach SQLite, działa na każdej wersji. Decyzja zapisana w notatkach architektury jako "FTS5 camelCase = preprocessing w czasie indeksowania, nie własny tokenizer". Ten sam wzorzec działa dla kebab-case, snake_case albo dowolnej konwencji złożonych identyfikatorów.
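
Poniżej minimalny szkic tego pomysłu, nie kod z repo. Nazwy `expand_identifier` i `search` oraz tabela `nodes` są wymyślone na potrzeby przykładu; zakładam też, że SQLite w twoim Pythonie ma wkompilowane FTS5. Regex rozbija camelCase, kebab-case i snake_case, a do indeksu trafia oryginał plus wersja rozdzielona spacjami.

```python
import re
import sqlite3

# Granica: mała litera lub cyfra przed wielką literą, albo ciąg "_" / "-".
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|[_\-]+")


def expand_identifier(name: str) -> str:
    """Zwraca oryginał plus wersję rozdzieloną spacjami (jeśli się różni)."""
    split = " ".join(part for part in _BOUNDARY.split(name) if part)
    return name if split == name else f"{name} {split}"


conn = sqlite3.connect(":memory:")
# Domyślny tokenizer, żadnego własnego kodu w C.
conn.execute("CREATE VIRTUAL TABLE nodes USING fts5(name, tokenize='unicode61')")
for ident in ["useBaseCart", "cart-button", "AddToCart"]:
    conn.execute("INSERT INTO nodes(name) VALUES (?)", (expand_identifier(ident),))


def search(term: str) -> list[str]:
    """Zwraca oryginalne nazwy identyfikatorów pasujących do zapytania."""
    rows = conn.execute(
        "SELECT name FROM nodes WHERE nodes MATCH ? ORDER BY rowid", (term,)
    )
    return [row[0].split(" ")[0] for row in rows]


print(search("Base"))
print(search("useBaseCart"))
```

Zapytanie o `Base` trafia w wersję rozdzieloną, zapytanie o pełną nazwę trafia w oryginał, a tokenizer zostaje domyślny.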

## Pięć narzędzi MCP

Gdy graf jest zbudowany i zaindeksowany, serwowany jest do Claude Code przez pięć narzędzi MCP. To nie jest interfejs web search. To natywne prymitywy MCP, które Claude Code widzi w tej samej liście narzędzi co `Glob` i `Read`, i wybiera na podstawie kształtu pytania.

| Narzędzie | Co robi |
|---|---|
| `brain_query` | Wyszukiwanie freetext z FTS5, zwraca ranked hits plus sąsiadów 2-hop plus podpowiedzi cross-repo |
| `brain_graph` | Zwraca surowy `graph.json` dla repo albo master grupy, do przeszukiwania w kodzie |
| `brain_path` | Najkrótsza ścieżka między dwoma węzłami - "jak ten core primitive dochodzi do tej feature w UI" |
| `brain_explain` | Detal węzła plus inbound/outbound sąsiedzi plus git-blame, zero LLM cost |
| `brain_ffcss` | Design tokeny: lista, count usage per repo, surfowanie naruszeń DRY |

Sens MCP-native jest taki, że Claude Code nie potrzebuje aktualizacji system prompta, żeby ich użyć. To narzędzia na liście. Model decyduje. Dodanie szóstego narzędzia jutro to jeden kontrakt API, nie re-engineering tego, jak pytania są routowane.

`brain_explain` to niedoceniany jeden. Zero LLM cost - to czysty lookup grafu z metadanymi git-blame doczepionymi. Dla pytań "kto to napisał i co od tego zależy" zastępuje sekwencję `Read` + `Grep` + `git log` jednym wywołaniem narzędzia.

## Benchmark - 50 pytań na monorepo z pięcioma repo

Dwa przebiegi, pięćdziesiąt pytań każdy, ten sam kod, ten sam model. Jeden przebieg z Claude Code natywnym (`Glob`, `Grep`, `Read` only). Drugi z jarvis-brain MCP włączonym na wierzchu. Kategorie: code discovery, usage tracing, cross-repo, dependency path, architecture - po dziesięć pytań na kategorię.

Liczby nagłówkowe:

| Metryka | Baseline (CC natywny) | Z jarvis-brain | Delta |
|---|---:|---:|---:|
| Total wall time | 36m 44s | 26m 03s | **-29,1%** |
| Świeże input tokens | 4 145 | 2 003 | **-51,7%** |
| Total koszt $ | $12,21 | $12,30 | +0,7% |
| Wywołania narzędzi (avg) | 4,38 | 4,58 | +5% |
| Błędy | 0 | 1 | - |

Oszczędność czasu to dziesięć i pół minuty na całym przebiegu. Oszczędność świeżych tokens to model czytający o połowę mniej źródła za pierwszym razem. Liczba dolarowa jest płaska, bo prompt cache Anthropic absorbuje prawie cały różnicowy input - cached reads kosztują ułamek fresh reads, a cache jest identyczny między oboma przebiegami na poziomie system prompta.

Per kategoria breakdown jest ostrzejszy:

| Kategoria | Mean baseline | Mean brain | Delta |
|---|---:|---:|---:|
| Architecture | 105,6s | 49,9s | **-53%** |
| Cross-repo | 50,5s | 39,3s | -22% |
| Usage tracing | 19,6s | 18,0s | -8% |
| Dependency path | 26,7s | 27,6s | +3% |
| Code discovery | 17,9s | 21,5s | **+20%** |

Pytania o architekturę to miejsce, gdzie graf niesie najwięcej sygnału: "jakie są god-nody w tym kodzie", "gdzie skupiają się override'y cross-repo", "która warstwa ma najgęstsze krawędzie wewnętrzne". Pytania cross-repo następne, bo federacja prekomputuje to, co Grep musiałby derywować od zera.

## Gdzie brain nie wygrywa

Code discovery przegrywa. Dwadzieścia procent gorzej, średnio. Kształt pytania to "znajdź plik X" albo "znajdź pliki zaczynające się od Y" - dokładnie to, do czego zbudowany jest `Glob`. Przejście przez `brain_query` dodaje hop bez zmiany odpowiedzi. Model i tak ląduje na tym samym pliku `.vue`; tylko jedno wywołanie narzędzia więcej zajęło.

Dependency path to w zasadzie remis. Graf ma dane, ale model jest równie zadowolony ścigając importy przez `Grep` i `Read` na prostych łańcuchach. Brain wygrywa, gdy łańcuch jest długi albo przechodzi granice repo; w przeciwnym razie podejście natywne jest równoważne.

Jeden błąd na pięćdziesiąt - pytanie o wykrycie cyklicznej zależności, gdzie traversal grafu trafił na edge case i nie zwrócił nic. Baseline trafił z `Grep`. Uczciwa liczba to 49/50 poprawnych, nie 50/50. Fix jest w kolejce; jeszcze nie shipped.

Koszt to ten, na którym spodziewałem się wygrać i nie wygrałem. Cache Anthropic jest na tyle agresywny, że oszczędność na świeżych input tokens paruje na poziomie rachunku. Jeśli płacisz za API tak, jak płacisz dziś, brain nie obetnie ci rachunku. Co obetnie - to wall time i exploration burn - rzeczy, które wpływają na to, jak szybko shipujesz, nie ile bierze od ciebie provider.

## Co wpada do publicznego repo dziś

`github.com/darco81/jarvis-brain-core`, AGPL-3.0. Kształt silnika:

- `brain/extractors/` - jak źródło staje się ustrukturyzowanym JSON-em node/edge
- `brain/federation/` - merge grafów per-repo, detekcja krawędzi cross-repo, federacja design tokens
- `brain/llm/prompts.py` - prompty ekstrakcyjne
- `brain/api/mcp.py` plus `mcp_tools.py` - pięć narzędzi MCP i ich schematy
- `brain/api/query.py` plus `query_path.py` - FTS5 z trikiem preprocessing camelCase
- `brain/viz/` - adapter wizualizacji grafu
- `benchmark/` - metodologia, przykładowe pytania, runner

Czego tam nie ma: production scaffolding - auth, handlery webhooków, admin UI, cost tracking, alerting, worker queue, deployment configs, schemat config multi-tenant. Możesz to sklonować i przeczytać metodę. Nie możesz tego sklonować i odpalić multi-tenantowego production deployment. To jest granica i jest tam celowo.

Live demo nadal na `brain.sdet.it`. Raport benchmarku jako static page na `brain.sdet.it/benchmark/` - wszystkie pięćdziesiąt pytań, wszystkie kategorie, oba przebiegi obok siebie.

## Setup na jutro

Jutro Part 3 - post na pytanie, które powinieneś faktycznie zadać przed adopcją tego: kiedy silnik się opłaca, a kiedy `Grep` już wystarcza.

Plus warstwa V0.5 - jak ten sam silnik wygląda, gdy przestajesz udawać, że masz jedno repo, i zaczynasz traktować design system org z dziesięcioma konsumenckimi frontami jako jednostkę pracy. Cross-repo dedup, federacja tokens, atomic patches przez wszystkich konsumentów. Inna klasa problemu, ta sama architektura pod spodem.

#FromTheField - dzień 3 ląduje jutro rano.

---


## Stop CC From Burning Tokens on Grep/Glob

**URL:** https://portfolio.sdet.it/from-the-field/jarvis-brain-part-1
**Published:** 2026-05-12
**Language:** en
Tags: ai-tooling, mcp, context-engineering, from-the-field

Day 1 of three. Why Claude Code burns tokens on Grep/Glob in a 5-repo monorepo, and the architectural pivot that became jarvis-brain. From the field #02 Part 1.

I have a 5-repo monorepo. Shared core, four brand-variant fronts, one admin module. Every day I ask Claude Code questions like "where is `useBaseCart` consumed across the platform" or "what overrides the cart button in BrandA versus BrandB". And every day Claude Code does the same thing.

It runs `Glob` to find files. Then it runs `Grep` to find references. Then it reads the top three results. Then it asks itself if that was enough. Then it runs `Glob` again. Then `Grep` again.

Fourteen tool calls. Forty seconds. A few thousand fresh input tokens, every time.

This is fine for one question. It is not fine when it is the third cross-repo question of the day and you are watching tokens burn on exploration that the model could have learned once and reused forever. Today's post is about how I got tired of that, what I tried first, what I deleted, and why jarvis-brain looks the way it does.

## What "burning tokens" actually looks like

I ran a benchmark. Fifty questions across five categories - code discovery, usage tracing, cross-repo, dependency path, architecture. Ten questions each. Two runs: Claude Code native (only Glob, Grep, Read) versus Claude Code plus jarvis-brain as an MCP tool.

The headline I care about today is the baseline. Without brain, on the same fifty questions:

- **4,145 fresh input tokens** spent on exploration
- **14 tool calls** on the hardest cross-repo questions
- **44 seconds average** for architecture-deep questions, **189 seconds** on the worst single one
- Total wall time across all fifty: **36 minutes**

The cache helps. Anthropic's prompt cache is aggressive, and dollar cost stays low because of it. But the **fresh** input tokens - the ones the model has to read for the first time on every call - those scale with how often Claude Code re-explores the same code. And it re-explores constantly, because tool results do not become permanent memory.

This is not a Claude Code bug. Glob and Grep are the right primitives when you have no other map of the codebase. They are universal. They work on every repo with zero setup. The cost is paying tokens for exploration every time.

## The use case nobody talks about

The pain scales with what kind of code you have. My pain is what I will call, with the names changed, a multi-brand commerce platform.

One core - the shared engine. Composables, base components, business logic, around eighty percent of the actual code. Four brand-variant fronts on top - same engine, different design system per brand, different content, occasional component overrides where a brand needs something the core does not give it. Plus an admin module that consumes a couple of those brands.

This is a V-commerce pattern. White-label e-commerce platforms work this way. Multi-tenant SaaS frontends work this way. Test suites for consistent applications work this way - one test harness, N variants of the same flow. Agencies that fork a core for each client work this way.

If you have ever asked Claude Code "is there a local override of `AddToCart.vue` in BrandA" you know the shape of the problem. It is not "find the file". It is "find the file, check three other repos for variants, check which one wins by Nuxt layer priority, check who calls it, check if there are brand-specific composables in the way". This is not what Glob is for. Glob will find you ten `AddToCart.vue` files across five repos and leave you to figure out which one matters.

## What I tried first, and deleted

The obvious move was a naive RAG. Embed the codebase with Voyage or OpenAI, dump it into a vector store, give Claude Code a search tool. People do this. There are starter repos for it.

I built a prototype. I deleted it.

Two problems. First, embeddings of source code are bad at code structure. They are good at "find me a function that does X conceptually". They are bad at "find me every consumer of `useBaseCart` in `forge-core`". The first question is semantic. The second one is structural. A vector store does not know that `useBaseCart` is a name, not a phrase.

Second, naive RAG is another tool to learn. Claude Code already has `Glob`, `Grep`, `Read`. They are native, predictable, and cheap to call. A custom search tool sits next to them and requires prompt engineering to use well. Every new tool is friction. The tool that fixes "Claude Code burns tokens on Glob" should not be "here is a tool Claude Code has to remember to use instead of Glob".

I considered a smarter Grep wrapper. Same problem in a different wrapper - still fourteen tool calls, still token burn, just with marginally better filtering.

What I needed was not a better retrieval tool. It was a different access path. The structure of the codebase should be pre-computed once, stored, and served to Claude Code as a native MCP tool that feels like Glob but answers like a senior engineer who has read the code.

That is the pivot. Not "build a search engine". Build the map, then expose it through the protocol Claude Code already speaks.

## The architecture in one paragraph

jarvis-brain extracts a graph from your code. Nodes are functions, components, composables, types, files. Edges are imports, calls, overrides, parent-child layer relationships. Built once per repo, then merged across repos into a federated master graph that knows which brand front overrides which core component, and which design system token gets used where. The graph lives in a SQLite FTS5 index for full-text queries, plus a JSON structure for traversal. It is served to Claude Code through five MCP tools that look and feel like built-in primitives - `brain_query`, `brain_graph`, `brain_path`, `brain_explain`, `brain_ffcss`.

That last detail is the point. They are not yet another search interface. They are MCP-native. Claude Code sees them in the same tool list as `Glob` and decides when to use them based on what the question needs.
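
For a sense of what "a JSON structure for traversal" buys you, here is a minimal sketch of a `brain_path`-style query: a breadth-first search over directed edges. The edge list, the node names, and the `shortest_path` function are hypothetical; this is a generic BFS, not the repo's implementation.

```python
from collections import deque

# Hypothetical edges (source -> target); the real graph is extracted from code.
EDGES = [
    ("BrandA/AddToCart.vue", "core/AddToCart.vue"),  # override
    ("core/AddToCart.vue", "useBaseCart"),           # import
    ("useBaseCart", "cartStore"),                    # call
    ("BrandB/CartPage.vue", "core/AddToCart.vue"),   # import
]


def shortest_path(edges, start, goal):
    """Breadth-first search over directed edges; returns a node list or None."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(EDGES, "BrandA/AddToCart.vue", "cartStore"))
```

With the map pre-computed, "how does this brand override reach the cart store" is one lookup over stored edges instead of a chain of `Grep` and `Read` calls.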

Tomorrow's post is how the indexing actually works, the FTS5 trick that makes `useBaseCart` queryable by typing "Base", and the benchmark breakdown across all five categories - including the categories where brain does **not** beat Grep.

## What you can poke at right now

`brain.sdet.it` is the live demo. Brain Website on the front, jarvis-brain backend behind it. Log in to a public demo group, browse the graph, query it, look at the FFCSS token federation. The interesting part is `brain.sdet.it/benchmark/` - the full numbers from the fifty-question benchmark, rendered as a static report.

Tomorrow the public repo drops. An educational distillate under AGPL-3.0 - extractors, federation, prompts, MCP tools, FTS5 query layer, benchmark methodology. Enough to learn the pattern, not enough to run a production multi-tenant deployment. That is the line, and Part 3 is where I explain why the line is there.

For today, one question worth sitting with: how many fresh input tokens did your Claude Code session burn yesterday on exploration that did not need to happen?

#FromTheField - series continues tomorrow.

---


## Zatrzymaj CC zanim spali wszystkie tokeny na Grep/Glob

**URL:** https://portfolio.sdet.pl/from-the-field/jarvis-brain-part-1
**Published:** 2026-05-12
**Language:** pl
Tags: ai-tooling, mcp, context-engineering, from-the-field

Dzień 1 trzydniowej serii. Czemu Claude Code pali tokeny na Grep/Glob w 5-repo monorepo i jaki pivot architektoniczny stał się jarvis-brain. From the field #02 Część 1.

Mam 5-repo monorepo. Shared core, cztery brand-variant fronty, jeden moduł admina. Codziennie zadaję Claude Code pytania w stylu "gdzie jest konsumowany `useBaseCart` przez całą platformę" albo "co nadpisuje cart button w BrandA w stosunku do BrandB". I codziennie Claude Code robi to samo.

Puszcza `Glob` żeby znaleźć pliki. Potem `Grep` żeby znaleźć referencje. Potem czyta top trzy wyniki. Potem zastanawia się, czy to wystarczyło. Potem znowu `Glob`. Potem znowu `Grep`.

Czternaście tool calli. Czterdzieści sekund. Kilka tysięcy fresh input tokenów. Za każdym razem.

Jest OK przy jednym pytaniu. Nie jest OK kiedy to trzecie cross-repo pytanie tego dnia i patrzysz, jak tokeny się palą na eksplorację, której model mógłby się raz nauczyć i pamiętać na zawsze. Dzisiejszy post jest o tym, jak mi się to znudziło, co próbowałem najpierw, co skasowałem i czemu jarvis-brain wygląda tak jak wygląda.

## Jak konkretnie wygląda "burning tokens"

Puściłem benchmark. Pięćdziesiąt pytań w pięciu kategoriach - code discovery, usage tracing, cross-repo, dependency path, architecture. Po dziesięć w każdej. Dwa przebiegi: Claude Code native (tylko Glob, Grep, Read) versus Claude Code plus jarvis-brain jako MCP tool.

Headline który mnie dzisiaj interesuje to baseline. Bez braina, na tych samych pięćdziesięciu pytaniach:

- **4 145 fresh input tokenów** spalonych na eksplorację
- **14 tool calli** przy najtrudniejszych cross-repo pytaniach
- **44 sekundy średnio** dla architecture-deep pytań, **189 sekund** dla najgorszego pojedynczego
- Total wall time przez całe pięćdziesiąt: **36 minut**

Cache pomaga. Anthropic prompt cache jest agresywny i dolarowy koszt zostaje niski przez niego. Ale **fresh** input tokeny - te, które model musi przeczytać po raz pierwszy przy każdym wywołaniu - skalują się z tym, jak często Claude Code re-eksploruje ten sam kod. A re-eksploruje stale, bo tool results nie stają się trwałą pamięcią.

To nie jest bug Claude Code. Glob i Grep to właściwe prymitywy, kiedy nie masz innej mapy kodebase'u. Są uniwersalne. Działają na każdym repo bez setupu. Koszt to płacenie tokenami za eksplorację za każdym razem.

## Use case, o którym nikt nie mówi

Ból skaluje się z tym, jaki masz kod. Mój ból to coś, co - ze zmienionymi nazwami - nazwę multi-brand commerce platform.

Jeden core - shared engine. Composable'y, base components, business logic, około osiemdziesiąt procent faktycznego kodu. Cztery brand-variant fronty na wierzchu - ten sam engine, inny design system per brand, inny content, sporadyczne override'y komponentów tam, gdzie marka potrzebuje czegoś, czego core nie daje. Plus moduł admina, który konsumuje parę z tych brandów.

To wzorzec V-commerce. White-label e-commerce działa w ten sposób. Multi-tenant SaaS frontendy działają w ten sposób. Test suite'y dla spójnych aplikacji działają w ten sposób - jeden test harness, N wariantów tego samego flow. Agencje, które forkują core pod każdego klienta, działają w ten sposób.

Jeśli kiedykolwiek pytałeś Claude Code "czy jest lokalny override `AddToCart.vue` w BrandA", znasz kształt problemu. To nie jest "znajdź plik". To jest "znajdź plik, sprawdź trzy inne repo pod warianty, sprawdź który wygrywa przez priorytet Nuxt layers, sprawdź kto go woła, sprawdź czy są brand-specific composable'y po drodze". To nie jest to, do czego jest Glob. Glob znajdzie ci dziesięć plików `AddToCart.vue` w pięciu repo i zostawi cię z myśleniem, który z nich ma znaczenie.

## Co próbowałem najpierw i skasowałem

Oczywisty ruch to był naive RAG. Zaembeddingowanie kodebase'u przez Voyage albo OpenAI, wrzucenie do vector store'a, danie Claude Code tool do search. Ludzie tak robią. Są starter repo do tego.

Zbudowałem prototyp. Skasowałem go.

Dwa problemy. Po pierwsze, embeddingi source code są słabe na strukturze kodu. Są dobre w "znajdź funkcję która robi X konceptualnie". Są słabe w "znajdź każdego konsumenta `useBaseCart` w `forge-core`". Pierwsze pytanie jest semantyczne. Drugie strukturalne. Vector store nie wie, że `useBaseCart` to nazwa, nie fraza.

Po drugie, naive RAG to kolejne narzędzie do nauczenia. Claude Code ma już `Glob`, `Grep`, `Read`. Są native, przewidywalne, tanie w wywołaniu. Custom search tool siedzi obok nich i wymaga prompt engineeringu żeby go używać dobrze. Każde nowe narzędzie to friction. Narzędzie, które ma naprawić "Claude Code pali tokeny na Globie", nie powinno być "tu jest narzędzie, które Claude Code musi pamiętać żeby użyć zamiast Globa".

Rozważałem mądrzejszy Grep wrapper. Ten sam problem w innym opakowaniu - dalej czternaście tool calli, dalej token burn, tylko z marginalnie lepszym filtrem.

To, czego potrzebowałem, to nie było lepsze retrieval tool. To była inna ścieżka dostępu. Struktura kodebase'u powinna być pre-computed raz, zachowana i serwowana Claude Code'owi jako native MCP tool, który czuje się jak Glob, ale odpowiada jak senior engineer, który przeczytał kod.

To jest pivot. Nie "zbuduj search engine". Zbuduj mapę i wystaw ją przez protokół, którym Claude Code już mówi.

## Architektura w jednym akapicie

jarvis-brain ekstraktuje graf z twojego kodu. Nody to funkcje, komponenty, composable'y, typy, pliki. Edge'e to importy, calle, override'y, parent-child layer relationship. Zbudowany raz per repo, potem zmergowany przez repo w federated master graph, który wie, który brand front nadpisuje który core component, i który design system token jest używany gdzie. Graf żyje w SQLite FTS5 indexie dla full-text query, plus JSON struktura dla traversal. Serwowany Claude Code'owi przez pięć MCP toolsów, które wyglądają i czują się jak built-in primitives - `brain_query`, `brain_graph`, `brain_path`, `brain_explain`, `brain_ffcss`.

Ten ostatni detal to puenta. To nie jest kolejny search interface. To MCP-native. Claude Code widzi je w tej samej liście toolsów co `Glob` i decyduje kiedy ich użyć na podstawie tego, czego pytanie wymaga.

Jutrzejszy post to to, jak faktycznie działa indeksowanie, FTS5 trick, który robi `useBaseCart` query'owalnym przez wpisanie "Base", i benchmark breakdown przez wszystkie pięć kategorii - włącznie z kategoriami, gdzie brain **nie** wygrywa z Grepem.

## Czym się dzisiaj pobawić

`brain.sdet.it` to live demo. Brain Website z przodu, jarvis-brain backend z tyłu. Zaloguj się do publicznej demo grupy, przeglądaj graf, query'uj, zobacz FFCSS token federation. Ciekawa część to `brain.sdet.it/benchmark/` - pełne liczby z pięćdziesiąt-pytaniowego benchmarku, wyrenderowane jako static report.

Jutro public repo wjeżdża. Edukacyjny destylat na AGPL-3.0 - extractory, federacja, prompty, MCP tools, FTS5 query layer, benchmark methodology. Tyle, żeby nauczyć się wzorca, nie tyle, żeby postawić production multi-tenant deployment. To jest linia i Part 3 wytłumaczy, czemu linia jest tam, gdzie jest.

Na dzisiaj jedno pytanie warte przemyślenia: ile fresh input tokenów spaliła twoja wczorajsza sesja Claude Code na eksplorację, która nie musiała się wydarzyć?

#FromTheField - seria leci dalej jutro.

---


## Scale Beyond the Distillate: F to A in 8 Commits, Plus What Pro Tier Actually Adds

**URL:** https://portfolio.sdet.it/from-the-field/wcag-toolkit-part-3
**Published:** 2026-05-07
**Language:** en
Tags: wcag, accessibility, tooling, astro, ai-tooling, commercial, from-the-field

Day 3 of a three-day live audit. F to A in 8 commits. Multiplicative token fix story. Pro tier walk-through: multi-runtime, auto-fix, niche specialists. From the field #01 Part 3.

Three days. Three tool versions. Same portfolio. F (0/100) to A (100/100) in 8 commits, 75 minutes of Claude Code work, 16 unique WCAG findings caught and resolved.

Day 1 ran V0.2 public, the boring tier. Static TypeScript analyzer plus Playwright with axe-core. Three real findings on portfolio.sdet.it after stripping the noise floor. CI-grade work, deterministic, no AI in the loop.

Day 2 ran V0.3 public, the same backbone plus 5 AI specialists reading source. Two productive audit runs found 16 unique findings between them. Round 3 found zero new findings. Convergence, not regression.

Today, Day 3, runs V0.4 Pro on the same project. Same prompts. Same portfolio. Same WCAG 2.2 AA rules. What changes: the runtime, the auto-fix layer, the specialist roster.

Today: what Pro tier adds, and why it adds it.

## The multiplicative token fix, in detail

Yesterday I mentioned a 4-line edit that resolved 6 findings. Worth zooming in.

Round 2 of yesterday's audit returned 7 new findings. Six of them were contrast issues spread across four files: `Footer.astro` (twice), `Topbar.astro` (twice), `ArticleCard.astro`, `MatrixToggle.astro`. Different selectors. Different components. Different parts of the page. Same root cause underneath.

The diagnosis came from the AI color-contrast specialist, not from the static analyzer. Static reports 6 contrast violations and stops there. The AI specialist reads the CSS source, follows `var()` indirection across files, recognizes the pattern: 6 visible symptoms, 2 design tokens, 1 root cause.

The fix:

```diff
:root {
  /* dark theme default, text on #141414 */
- --color-text-subtle: #737373;   /* 3.4:1, AA fail */
- --color-text-muted:  #a1a1a1;   /* 6.5:1, AA pass but tight */
+ --color-text-subtle: #a3a3a3;   /* 6.7:1, AA pass */
+ --color-text-muted:  #b3b3b3;   /* 8.8:1, AA pass with headroom */
}
```

The four files containing the six symptoms were not opened: Footer, Topbar, ArticleCard, MatrixToggle. They didn't need editing. Every component using `--color-text-muted` or `--color-text-subtle` against the dark background got the new ratio for free.

The architectural lesson is small but important. Design-tokens-first means single source of truth. Six symptoms map to two tokens map to one fix. The spot-fix alternative was 6 separate edits, 6 commits, 6 chances for drift. Token fix: 1 file (`global.css`), 4 lines, all 6 findings resolved.

Findability and fixability are different problems. Static rule engines find 6 contrast issues. That's information. AI specialists find 6 issues and point to the 2 design tokens that cause them. That's action.
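
The step from "6 issues" to "2 tokens" is, mechanically, a group-by. A minimal sketch under stated assumptions: the file names come from the audit, but the selectors, the token each finding resolves to, and the `group_by_token` helper are made up for illustration. The hard part the specialist does, resolving each selector's color through `var()` to a token, is taken as already done here.

```python
from collections import defaultdict

# Hypothetical finding records: selectors and token assignments are invented.
FINDINGS = [
    {"file": "Footer.astro", "selector": ".footer-note", "token": "--color-text-subtle"},
    {"file": "Footer.astro", "selector": ".footer-link", "token": "--color-text-muted"},
    {"file": "Topbar.astro", "selector": ".topbar-meta", "token": "--color-text-subtle"},
    {"file": "Topbar.astro", "selector": ".topbar-link", "token": "--color-text-muted"},
    {"file": "ArticleCard.astro", "selector": ".card-date", "token": "--color-text-subtle"},
    {"file": "MatrixToggle.astro", "selector": ".toggle-label", "token": "--color-text-muted"},
]


def group_by_token(findings):
    """Collapse per-selector findings into one entry per design token."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding["token"]].append(f'{finding["file"]} {finding["selector"]}')
    return dict(groups)


root_causes = group_by_token(FINDINGS)
for token, symptoms in root_causes.items():
    print(f"{token}: {len(symptoms)} findings")
```

The report shrinks from six rows to two, and each row names the token to change.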

This is also where the Pro tier story starts. AI doesn't just find issues, it identifies root cause. V0.5 enterprise extends the same idea across repos: the same `--color-text-muted` issue in 5 consumer repos of a design-system organization, fixed with one atomic patch across all 5. Federation rather than spot-fixes.

AI semantic understanding times design-tokens-first architecture equals multiplicative impact. Get either right, you save time. Get both right, you get this.

## Pro tier walk-through, V0.4 alpha.3

Pro V0.4 alpha.3 ships today on the same project. Here's what it adds over public V0.3.

**Multi-runtime.** The public toolkit assumes Claude Code in-session. That's the optimal path for solo developers running audits during code review. Pro adds two more runtimes. OpenCode subprocess works without a Claude Code session and is model-agnostic, so the same audit pipeline runs against GPT-5, Claude, or any model OpenCode supports. Ollama local runtime sends nothing out: the model runs on the auditor's machine, source code never leaves the laptop, every prompt and response stays local.

The Ollama path matters for one reason: client compliance. Audit work for regulated industries (financial services, healthcare, public-sector contractors) cannot send client source through hosted LLMs. Ollama local runtime is the absolute minimum requirement to do that work at all. Public toolkit can't do this. Pro can, by design.

The pipeline stays runtime-agnostic. Same `WcagFinding` shape, same dedupe step, same A through F grade. The runtime is a swappable backbone, not a feature.
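One way to picture a swappable runtime is a narrow interface the pipeline depends on, with each runtime as an implementation behind it. The sketch below is an assumption, not the Pro codebase: `AuditRuntime`, `parseFindings`, and `runSpecialist` are hypothetical names, the `WcagFinding` fields follow the description in this series but their exact names are guesses, and the severity levels borrow axe-core's impact values.

```typescript
// Field names are assumed; the series only lists what a finding carries.
interface WcagFinding {
  ruleId: string;
  file: string;
  line: number;
  severity: "minor" | "moderate" | "serious" | "critical";
  wcagSc: string;       // e.g. "1.4.3"
  suggestedFix: string;
}

// Hypothetical runtime contract: prompt in, raw model text out.
// Claude Code, OpenCode, and Ollama would each sit behind this interface.
interface AuditRuntime {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Shared parsing step: every runtime must return a JSON array of findings.
function parseFindings(raw: string, runtimeName: string): WcagFinding[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) {
    throw new Error(`${runtimeName}: expected a JSON array of findings`);
  }
  return parsed as WcagFinding[];
}

// The pipeline only sees AuditRuntime, so merge, dedupe, and grading stay identical.
async function runSpecialist(runtime: AuditRuntime, prompt: string): Promise<WcagFinding[]> {
  return parseFindings(await runtime.complete(prompt), runtime.name);
}
```

Because the runtime only produces text, the merge, dedupe, and grading steps never need to know which model answered.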

**Auto-fix engine.** Two deterministic patchers ship in alpha.3: `ImageAltPatcher` for missing alt attributes on `<img>` elements, `HtmlLangPatcher` for missing `lang` on `<html>`. Both produce predictable output and write atomic commits per fix. On portfolio.sdet.it, the auto-fix engine handles 1 of 22 findings from the original audit, which is 4.5% coverage.

That number is the honest version of the story. Auto-fix engine handles roughly 5%, the mechanical patches like missing alt or missing html-lang. The other 95% are author and designer decisions. `aria-label` content needs human judgment. Color tokens need design-system buy-in. Heading restructure needs editorial decisions about content. AI specialists discover these. Humans fix them. That's the public/Pro split.
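To show why this class of fix is mechanical, here is a minimal sketch of what a deterministic `lang` patch could look like. It is an illustration, not the shipped `HtmlLangPatcher`: the `addHtmlLang` function name, the regex approach, and the caller-supplied language code are all assumptions.

```typescript
// Add a lang attribute to the opening <html> tag when it is missing.
// Returns the input unchanged if lang is already set or there is no <html> tag.
// The language code is supplied by the caller; this sketch does not guess it.
function addHtmlLang(source: string, lang: string): string {
  return source.replace(/<html\b([^>]*)>/i, (tag: string, attrs: string) =>
    // Match a standalone lang attribute, not xml:lang or data-lang.
    /(?:^|\s)lang\s*=/i.test(attrs) ? tag : `<html lang="${lang}"${attrs}>`,
  );
}
```

The function is idempotent, so running it twice gives the same output as running it once. That property is what makes one atomic commit per fix safe. Nothing comparable exists for `aria-label` wording or heading structure, which is why those stay with humans.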

Auto-fix saves the boring stuff. AI specialists save the architectural stuff. Pro is the integration of both.

## V0.4 alpha.4 preview, modal and ecommerce

Two specialists land in V0.4 alpha.4: modal-specialist and ecommerce-journey. Both Pro-only. Both niche. Both in the next sprint backlog.

This bit is honest scaffolding for the section. The alpha.4 sprint runs May 6-9, 2026, so neither agent has shipped at the time of this write-up. Treat the rest of this section as a plan, not an inventory.

**Modal-specialist.** Focus trap timing (when the trap engages, what it traps). Focus restoration on close (which element gets focus when the dialog closes, the trigger or the body). Escape key handling. `aria-modal` validation, including the cookie-banner anti-pattern where `aria-modal="true"` declares a region modal that the page is still operable around. The decision tree between `dialog` and `alertdialog`. Scroll-lock behavior on the body element when the modal opens.

The legal context isn't decorative. A cookie banner with `aria-modal="true"` is the EAA pattern that gets sites sued. European Accessibility Act, June 2025 deadline. EU e-commerce is now legally exposed when assistive-tech users are told the page is fully blocked by a region that is in fact still partly operable. Generic ARIA agents flag attribute presence. modal-specialist flags the wrong choice.

**Ecommerce-journey.** Variant change announcements through `aria-live` regions when the user picks size M then size L (price update, availability update, both spoken). Payment review step (WCAG 3.3.4 plus 3.3.6, financial-transaction-specific error prevention). Color-only stock indicators (green dot without text, red border without label). Cart toast `aria-live` politeness, polite vs assertive depending on context. Filter facet count updates announced after each filter toggle.

Modal-specialist isn't one more agent. It's 8 years of e-commerce audits in one prompt. You can rebuild it. It'll take you the years I spent auditing 50+ e-commerce sites. Or you hire me.

The split is sharper than it looks. A generic keyboard agent flags missing `onKeyDown`. Modal-specialist checks which specific element receives focus when the dialog closes. A generic forms agent checks labels. ecommerce-journey checks the 3.3.4 review step on payment forms specifically. Public toolkit gives you 5 specialists. Pro adds 2 you can't write yourself unless you've audited e-commerce for years.

## V0.5 enterprise, jarvis-brain and cross-repo

V0.5 enterprise tier (planned, post alpha.4): jarvis-brain integration. Different problem class.

The story is a scope shift, not a feature shift. Solo developer with one repo, one audit, one fix fits comfortably in V0.4 Pro. Design-system organization with 10 repos consuming shared tokens does not. The multiplicative token fix from earlier in this write-up, the one that resolved 6 findings on a single repo, scales differently when the same `--color-text-muted` issue exists across 5 consumer repos. Five spot-fixes, five PRs, five chances for drift. Or one atomic patch coordinated across the design system.

jarvis-brain is the multi-tenant knowledge vault that makes that coordination possible. Token federation across repos, DRY violation detection at the design-system layer, cross-repo deep dedupe (same defect in semantically different file paths collapses to one finding, not five).
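Since V0.5 is still planned, the following is only a sketch of what "collapses to one finding, not five" could mean in code. Every name here (`RepoFinding`, `FederatedFinding`, `federate`) is hypothetical, and the fingerprint of rule plus design token is an assumed stand-in for whatever semantic matching jarvis-brain ends up using.

```typescript
// Hypothetical cross-repo finding. The fingerprint ignores repo and file path.
interface RepoFinding {
  repo: string;
  file: string;
  ruleId: string;
  token: string; // design token at the root of the defect
}

interface FederatedFinding {
  ruleId: string;
  token: string;
  occurrences: { repo: string; file: string }[];
}

// Collapse findings that share a rule and a root token into one entry,
// keeping every repo and file where the defect shows up.
function federate(findings: RepoFinding[]): FederatedFinding[] {
  const byFingerprint = new Map<string, FederatedFinding>();
  for (const f of findings) {
    const fingerprint = `${f.ruleId}|${f.token}`;
    let entry = byFingerprint.get(fingerprint);
    if (!entry) {
      entry = { ruleId: f.ruleId, token: f.token, occurrences: [] };
      byFingerprint.set(fingerprint, entry);
    }
    entry.occurrences.push({ repo: f.repo, file: f.file });
  }
  return Array.from(byFingerprint.values());
}
```

The occurrences list is what turns one finding into a coordinated patch plan: one entry to fix, with the full list of repos that will pick up the change.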

V0.5 backlog also picks up three audit modes that the current public single-mode audit does not have. `--scope component` runs a Storybook-style audit on one component. `--scope page` runs a route-level audit, following imports two levels deep. `--scope full` runs the whole project, the default. Different audit unit, different output, different price point.
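A sketch of how the planned flag could be parsed, assuming a plain `argv` array and no CLI library. `AuditScope` and `parseScope` are hypothetical names. Only the three values and the `full` default come from the plan above.

```typescript
type AuditScope = "component" | "page" | "full";

// Read --scope from an argv-style array. Falls back to "full" when the flag is absent.
function parseScope(argv: string[]): AuditScope {
  const index = argv.indexOf("--scope");
  if (index === -1) return "full";
  const value = argv[index + 1];
  if (value === "component" || value === "page" || value === "full") return value;
  // Fail loudly rather than silently auditing the wrong unit.
  throw new Error(`Unknown --scope value: ${String(value)}`);
}
```

Rejecting unknown values matters because each scope is a different audit unit with a different output. A typo that quietly ran a full audit would be both slow and misleading.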

Plus a positive findings section in the report. What the audit confirmed working, not just what's broken. Trust signal for design-system maintainers who need to demonstrate progress, not just remaining work.

Three tiers. Three problems. Three pricing levels. Aligned by audit scope, not arbitrary feature gating.

## Tier comparison

Visual reference for the tier model:

```mermaid
graph LR
    subgraph Public["Public V0.3 (AGPL-3.0)"]
        A1[Static TS]
        A2[Dynamic Playwright]
        A3[5 AI Specialists]
        A4[Lead Orchestrator]
        A5[A-F Grading]
        A6["/wcag:audit skill"]
    end

    subgraph Pro["Pro V0.4 alpha.3 (Commercial)"]
        B1[Everything from Public]
        B2[Multi-runtime CC/OpenCode/Ollama]
        B3[Auto-fix Engine]
        B4[wcag.config.ts]
    end

    subgraph Pro4["Pro V0.4 alpha.4 (Next sprint)"]
        C1[+modal-specialist]
        C2[+ecommerce-journey]
    end

    subgraph Enterprise["Pro V0.5 Enterprise (Planned)"]
        D1[Cross-repo via jarvis-brain]
        D2[Design System Federation]
        D3[Three audit modes]
        D4[Deep dedupe semantic]
        D5[Positive findings]
    end

    Public -.imports.-> Pro
    Pro --> Pro4
    Pro4 --> Enterprise
```

Pro imports from Public. That's the architectural relationship, not a marketing tagline. Each upper tier wraps the lower one. The 5 specialists in Pro are the 5 from Public. The audit pipeline in Enterprise is the same one from Pro. Public is the foundation, not a crippled demo.

Read left to right: Public is what runs in CI on every commit, Pro is what runs daily on a maintained project, Enterprise is what runs across an organization's repos. Different audit unit at each tier. Pricing aligned to that unit, not to feature gating.

## Honest commercial framing

Last word on the public/Pro split. I want to be clear about what you're paying for.

This isn't "we hide features for money." It's "we charge for niche expertise you can't write yourself unless you've audited e-commerce for years." Public is education. Pro is the niche.

Three things you buy with a Pro license. First, niche expertise: modal-specialist and ecommerce-journey encode patterns from years of audit work, not generic ARIA scans. Second, multi-runtime: Ollama local for sensitive client repos, no tokens out, the absolute minimum compliance bar for regulated industries. Third, maintenance: the rules evolve as the spec moves from WCAG 2.2 toward WCAG 3.0, prompts get patched against new anti-patterns, the toolkit stays current without you tracking the spec.

Three things you do not buy. The architecture (clone the public toolkit, learn from it, rebuild it). The 5 baseline specialists (those are public, AGPL-3.0, free). Magic (the audit still requires human review of findings, AI discovers, you decide).

You can rebuild the niche specialists. It'll take you the years I spent auditing 50+ e-commerce sites. Or you hire me.

Public toolkit on GitHub: [github.com/darco81/sdet-wcag-toolkit](https://github.com/darco81/sdet-wcag-toolkit) (AGPL-3.0). Pro tier at [sdet.it/services](https://sdet.it/services). But before #01 wraps, one more thing.

## Coming in #05: the audit that found 5,816 findings

While writing this, I ran V0.4's new multi-page audit on the same portfolio. Same project, different scope. Round 4 router-scan: eleven routes, nine findings on three pages the homepage audit never touched. Token-level fix on /projects: one ruleset edit, six findings down. Same multiplicative pattern from Day 2, scaled across the site.

Then sitemap audit. Thirty-five routes, full published surface including articles and episode pages. **Five thousand, eight hundred sixteen findings.**

Four CSS-level commits later: seven. All seven are runtime false positives from a toolkit subsystem I wrote myself. Zero SERIOUS, zero AA failures. **5,816 → 7. Three orders of magnitude. Four commits. No JavaScript.**

That's #05's story - June 2-4. Multi-page audit shows you the dependency graph of bugs, not just a URL list. Single fix, multiplicative cleanup. The difference between an audit tool and a Lighthouse extension.

## Series wrap, plus what's next

From the field #01 wraps. Three days, one project, three tool versions, real numbers throughout.

Day 1: V0.2 baseline, 3 real findings caught by static plus dynamic. Day 2: V0.3 plus AI, 16 unique findings across triangulation runs, Round 3 finding zero new (convergence proof). Day 3: F to A in 8 commits, 75 minutes total Claude Code work, plus the Pro tier walk-through.

Series continuity tease. Week 2: jarvis-brain, the system that stops Claude Code from burning tokens on Grep and Glob by precomputing a semantic map served via MCP. Week 3: context-first QA workflow, the platform behind my daily Jira and Tempo automation. Week 4: performance audit 5-agent pipeline. Week 5: V0.4 multi-runtime build process.

If you're building accessibility tools, design systems, or AI-driven dev workflows, follow #FromTheField. Real production, real numbers, real engineering humility. Next week: jarvis-brain.

---


## Skala poza destylat: F do A w 8 commitach + co dodaje Pro tier

**URL:** https://portfolio.sdet.pl/from-the-field/wcag-toolkit-part-3
**Published:** 2026-05-07
**Language:** pl
Tags: wcag, accessibility, tooling, astro, ai-tooling, commercial, from-the-field

Dzień 3 trzydniowego live audytu. F do A w 8 commitach. Multiplicative token fix story. Pro tier walk-through: multi-runtime, auto-fix, niche specialists. From the field #01 Część 3.

Trzy dni. Trzy wersje toola. To samo portfolio. F (0/100) do A (100/100) w 8 commitach, 75 minut Claude Code work, 16 unique WCAG findings złapanych i rozwiązanych.

Dzień 1 puścił V0.2 public, nudną warstwę. Statyczny analizator TypeScript plus Playwright z axe-core. Trzy realne findings na portfolio.sdet.it po odsianiu podłogi szumu. Robota CI-grade, deterministyczna, bez AI w pętli.

Dzień 2 puścił V0.3 public, ten sam backbone plus 5 AI specjalistów czytających source. Dwa produktywne runy audytu znalazły 16 unique findings między nimi. Round 3 znalazł zero nowych findings. Konwergencja, nie regresja.

Dzisiaj, Dzień 3, puszcza V0.4 Pro na tym samym projekcie. Te same prompty. To samo portfolio. Te same reguły WCAG 2.2 AA. Co się zmienia: runtime, warstwa auto-fixu, roster specjalistów.

Dzisiaj: co dodaje Pro tier i czemu to dodaje.

## Multiplicative token fix, w szczegółach

Wczoraj wspomniałem 4-linijkowy edit, który rozwiązał 6 findings. Warto powiększyć.

Round 2 wczorajszego audytu zwrócił 7 nowych findings. Sześć z nich to issues kontrastu w sześciu miejscach w czterech plikach: `Footer.astro` (dwa razy), `Topbar.astro` (dwa razy), `ArticleCard.astro`, `MatrixToggle.astro`. Inne selektory. Inne komponenty. Inne kawałki strony. Ten sam root cause pod spodem.

Diagnoza przyszła od AI specjalisty color-contrast, nie od statycznego analizatora. Statyczny raportuje 6 contrast violations i kończy. AI specjalista czyta CSS source, idzie za `var()` indirectionem przez pliki, rozpoznaje wzorzec: 6 widocznych symptomów, 2 design tokens, 1 root cause.

Fix:

```diff
:root {
  /* dark theme default, text on #141414 */
- --color-text-subtle: #737373;   /* 3.4:1, AA fail */
- --color-text-muted:  #a1a1a1;   /* 6.5:1, AA pass but tight */
+ --color-text-subtle: #a3a3a3;   /* 6.7:1, AA pass */
+ --color-text-muted:  #b3b3b3;   /* 8.8:1, AA pass with headroom */
}
```

Cztery pliki zawierające te sześć symptomów nie zostały otwarte: Footer, Topbar, ArticleCard, MatrixToggle. Nie potrzebowały edycji. Każdy komponent używający `--color-text-muted` albo `--color-text-subtle` na ciemnym tle dostał nowe ratio za darmo.

Lekcja architektoniczna jest mała, ale ważna. Design-tokens-first oznacza single source of truth. Sześć symptomów mapuje się do dwóch tokenów, a te do jednego fixu. Alternatywa spot-fix to 6 osobnych edycji w 4 plikach, 6 commitów, sześć szans na drift. Token fix: 1 plik (`global.css`), 4 linijki, wszystkie 6 findings rozwiązane.

Findability i fixability to różne problemy. Static rule engine znajduje 6 contrast issues. To informacja. AI specjalista znajduje 6 issues i wskazuje 2 design tokens, które je powodują. To akcja.

Tutaj też zaczyna się Pro tier story. AI nie tylko znajduje issues, identyfikuje root cause. V0.5 enterprise rozszerza ten sam pomysł na cross-repo: ten sam issue `--color-text-muted` w 5 consumer repo w organizacji design-system, atomic patch wszystkich 5. Federation zamiast spot-fixów.

AI semantic understanding razy design-tokens-first architecture daje multiplicative impact. Trafisz w jedno, oszczędzasz czas. Trafisz w oba, dostajesz to.

## Pro tier walk-through, V0.4 alpha.3

Pro V0.4 alpha.3 shippuje dzisiaj na tym samym projekcie. Co dodaje ponad public V0.3.

**Multi-runtime.** Public toolkit zakłada Claude Code in-session. To optymalna ścieżka dla solo devów puszczających audyty podczas code review. Pro dodaje dwa kolejne runtime'y. OpenCode subprocess działa bez sesji Claude Code i jest model-agnostic, więc ten sam pipeline audytu chodzi przeciwko GPT-5, Claude albo dowolnemu modelowi, który OpenCode wspiera. Ollama local runtime nie wysyła nic na zewnątrz: model chodzi na maszynie audytora, source code nie opuszcza laptopa, każdy prompt i response zostaje lokalnie.

Ścieżka Ollama ma znaczenie z jednego powodu: client compliance. Robota audytu dla regulowanych branż (financial services, healthcare, public-sector kontraktorzy) nie może wysyłać client source przez hostowane LLMy. Ollama local runtime to absolutne minimum, żeby w ogóle robić tę robotę. Public toolkit tego nie umie. Pro umie, z założenia.

Pipeline zostaje runtime-agnostic. Ten sam kształt `WcagFinding`, ten sam dedupe step, ten sam grade A-F. Runtime to swappable backbone, nie feature.

**Auto-fix engine.** Dwa deterministyczne patchery shippują w alpha.3: `ImageAltPatcher` dla brakujących atrybutów alt na elementach `<img>`, `HtmlLangPatcher` dla brakującego `lang` na `<html>`. Oba produkują przewidywalny output i piszą atomic commits per fix. Na portfolio.sdet.it auto-fix engine ogarnia 1 z 22 findings z oryginalnego audytu, czyli 4.5% pokrycia.

Ta liczba to uczciwa wersja story. Auto-fix engine ogarnia mniej więcej 5%, mechaniczne łatki jak brakujący alt albo brakujący html-lang. Pozostałe 95% to decyzje autora i designera. Zawartość `aria-label` wymaga ludzkiego osądu. Tokeny kolorów wymagają buy-in design-systemu. Restrukturyzacja nagłówków wymaga decyzji edytorskich o content. AI specjaliści to odkrywają. Ludzie to naprawiają. To jest split public/Pro.

Auto-fix oszczędza tę nudną robotę. AI specjaliści oszczędzają tę architektoniczną. Pro to integracja obu.

## V0.4 alpha.4 preview, modal i ecommerce

Dwóch specjalistów ląduje w V0.4 alpha.4: modal-specialist i ecommerce-journey. Oboje Pro-only. Oboje niche. Oboje w backlogu następnego sprintu.

Ten kawałek to uczciwy scaffolding sekcji. Sprint alpha.4 leci 06-09.05.2026, więc żaden z agentów nie jest zashippowany w momencie tego write-upu. Traktuj resztę sekcji jako plan, nie inwentarz.

**Modal-specialist.** Timing focus trapu (kiedy trap się włącza, co łapie). Focus restoration na close (który element dostaje focus, gdy dialog się zamyka, trigger czy body). Obsługa klawisza Escape. Walidacja `aria-modal`, włącznie z anti-patternem cookie-banner, gdzie `aria-modal="true"` deklaruje region modalny, podczas gdy strona jest nadal częściowo operowalna dookoła. Drzewo decyzyjne między `dialog` a `alertdialog`. Zachowanie scroll-lock na elemencie body, gdy modal się otwiera.

Kontekst legalny nie jest dekoracyjny. Cookie banner z `aria-modal="true"` to wzorzec EAA, za który strony są pozywane. European Accessibility Act, deadline czerwiec 2025. EU e-commerce jest teraz prawnie wystawione, gdy userzy assistive-tech dostają informację, że strona jest w pełni zablokowana przez region, który tak naprawdę jest częściowo operowalny. Generic agenci ARIA flagują obecność atrybutu. modal-specialist flaguje zły wybór.

**Ecommerce-journey.** Announcement przy zmianie wariantu przez regiony `aria-live`, gdy user wybiera rozmiar M potem rozmiar L (price update, availability update, oba odczytane). Payment review step (WCAG 3.3.4 plus 3.3.6, error prevention specyficzny dla transakcji finansowych). Stock indicators tylko-kolorem (zielona kropka bez tekstu, czerwona ramka bez labelki). Politeness toastu z koszyka `aria-live`, polite vs assertive zależnie od kontekstu. Aktualizacje liczników filtrów ogłaszane po każdym filter toggle.

Modal-specialist to nie "kolejny agent". To 8 lat e-commerce audytów w jednym promptcie. Możesz to zbudować od zera. Zajmie Ci to lata, które ja spędziłem audytując 50+ e-commerce sites. Albo mnie zatrudnisz.

Split jest ostrzejszy, niż wygląda. Generic keyboard agent flaguje brakujący `onKeyDown`. Modal-specialist flaguje focus restoration na close do którego konkretnego elementu. Generic forms agent sprawdza labels. ecommerce-journey sprawdza 3.3.4 review-step na payment forms specyficznie. Public toolkit daje Ci 5 specjalistów. Pro dodaje 2, których sam nie napiszesz, jeśli nie audytowałeś e-commerce przez lata.

## V0.5 enterprise, jarvis-brain i cross-repo

V0.5 enterprise tier (planowane, post alpha.4): integracja jarvis-brain. Inna klasa problemu.

Story to scope shift, nie feature shift. Solo developer z jednym repo, jednym audytem, jednym fixem mieści się komfortowo w V0.4 Pro. Organizacja design-system z 10 repo konsumującymi shared tokens nie. Multiplicative token fix z wcześniejszej części tego write-upu, ten, który rozwiązał 6 findings na pojedynczym repo, skaluje się inaczej, gdy ten sam issue `--color-text-muted` istnieje w 5 consumer repo. Pięć spot-fixów, pięć PR, pięć szans na drift. Albo jeden atomic patch skoordynowany przez design system.

jarvis-brain to multi-tenant knowledge vault, który tę koordynację umożliwia. Token federation cross-repo, detekcja DRY violations na warstwie design-systemu, cross-repo deep dedupe (ten sam defekt w semantycznie różnych ścieżkach plików zwija się do jednego findingu, nie pięciu).

Backlog V0.5 łapie też trzy tryby audytu, których obecny single-mode audyt public nie ma. `--scope component` puszcza audyt w stylu Storybook na jednym komponencie. `--scope page` puszcza audyt na poziomie route, idąc za importami dwa poziomy w głąb. `--scope full` puszcza cały projekt, default.

Plus sekcja positive findings w raporcie. Co audyt potwierdził jako działające, nie tylko co jest popsute. Sygnał zaufania dla maintainerów design-systemów, którzy muszą pokazywać postęp, nie tylko zostającą robotę.

Trzy tiery. Trzy problemy. Trzy poziomy cenowe. Wyrównane przez audit scope, nie arbitralne feature gating.

## Tier comparison

Wizualna referencja modelu tierów:

```mermaid
graph LR
    subgraph Public["Public V0.3 (AGPL-3.0)"]
        A1[Static TS]
        A2[Dynamic Playwright]
        A3[5 AI Specialists]
        A4[Lead Orchestrator]
        A5[A-F Grading]
        A6["/wcag:audit skill"]
    end

    subgraph Pro["Pro V0.4 alpha.3 (Commercial)"]
        B1[Everything from Public]
        B2[Multi-runtime CC/OpenCode/Ollama]
        B3[Auto-fix Engine]
        B4[wcag.config.ts]
    end

    subgraph Pro4["Pro V0.4 alpha.4 (Next sprint)"]
        C1[+modal-specialist]
        C2[+ecommerce-journey]
    end

    subgraph Enterprise["Pro V0.5 Enterprise (Planned)"]
        D1[Cross-repo via jarvis-brain]
        D2[Design System Federation]
        D3[Three audit modes]
        D4[Deep dedupe semantic]
        D5[Positive findings]
    end

    Public -.imports.-> Pro
    Pro --> Pro4
    Pro4 --> Enterprise
```

Pro importuje z Public. To architektoniczna relacja, nie marketing tagline. Każdy wyższy tier owija niższy. Pięciu specjalistów w Pro to ci pięciu z Public. Pipeline audytu w Enterprise to ten sam, co w Pro. Public to fundament, nie crippled demo.

Czytaj od lewej do prawej: Public to to, co chodzi w CI na każdym commicie, Pro to to, co chodzi codziennie na utrzymywanym projekcie, Enterprise to to, co chodzi w organizacji przez repo. Inna jednostka audytu w każdym tierze. Wycena wyrównana do tej jednostki, nie do feature gating.

## Uczciwy commercial framing

Ostatnie słowo o splicie public/Pro. Chcę być jasny co do tego, za co płacisz.

To nie jest "ukrywamy features za pieniądze". To jest "bierzemy pieniądze za niche expertise, której nie napiszesz sam, jeśli nie audytowałeś e-commerce przez lata". Public to edukacja. Pro to ekspertyza.

Trzy rzeczy, które kupujesz licencją Pro. Po pierwsze, niche expertise: modal-specialist i ecommerce-journey kodują wzorce z lat audytu, nie generic ARIA scans. Po drugie, multi-runtime: Ollama lokalnie dla wrażliwych client repo, zero tokenów na zewnątrz, absolutne minimum compliance dla regulowanych branż. Po trzecie, maintenance: reguły ewoluują wraz z WCAG 2.2 stającym się WCAG 3.0, prompty są patchowane przeciwko nowym anti-patternom, toolkit zostaje aktualny bez Twojego śledzenia speca.

Trzy rzeczy, których nie kupujesz. Architektury (sklonuj public toolkit, naucz się z niego, zbuduj od zera). Pięciu baseline specjalistów (są public, AGPL-3.0, free). Magii (audyt nadal wymaga ludzkiej rewizji findings, AI odkrywa, Ty decydujesz).

Możesz odbudować niche specjalistów. Zajmie Ci to lata, które ja spędziłem audytując 50+ e-commerce sites. Albo mnie zatrudnisz.

Public toolkit na GitHub: [github.com/darco81/sdet-wcag-toolkit](https://github.com/darco81/sdet-wcag-toolkit) (AGPL-3.0). Pro tier na [sdet.it/services](https://sdet.it/services). Ale zanim #01 się zamknie, jeszcze jedna rzecz.

## Coming w #05: audyt, który znalazł 5 816 findings

Pisząc ten artykuł odpaliłem V0.4 multi-page audit na tym samym portfolio. Ten sam projekt, inny scope. Round 4 router-scan: jedenaście route'ów, dziewięć findings na trzech stronach, których homepage audit nigdy nie tknął. Token-level fix na /projects: edycja jednego ruleset, sześć findings down. Ten sam multiplicative pattern z Dnia 2, skalowany przez cały site.

Potem sitemap audit. Trzydzieści pięć route'ów, pełen published surface włącznie z artykułami i odcinkami. **Pięć tysięcy osiemset szesnaście findings.**

Cztery CSS-level commity później: siedem. Wszystkie siedem to runtime false positives z toolkit subsystem, który sam napisałem. Zero SERIOUS, zero AA failures. **5 816 → 7. Trzy rzędy wielkości. Cztery commity. Bez JavaScript.**

To story #05 - 2-4 czerwca. Multi-page audit pokazuje dependency graph bugów, nie listę URL'i. Single fix, multiplicative cleanup. Różnica między audit tool a Lighthouse extension.

## Series wrap, plus co dalej

From the field #01 wrap. Trzy dni, jeden projekt, trzy wersje toola, realne liczby przez całość.

Dzień 1: V0.2 baseline, 3 realne findings złapane przez static plus dynamic. Dzień 2: V0.3 plus AI, 16 unique findings przez runy triangulacji, Round 3 znajdujący zero nowych (proof konwergencji). Dzień 3: F do A w 8 commitach, 75 minut total Claude Code work, plus walk-through Pro tier.

Series continuity tease. Tydzień 2: jarvis-brain, system, który zatrzymuje Claude Code przed paleniem tokenów na Grep i Glob, pre-computing semantic mapy serwowanej przez MCP. Tydzień 3: context-first QA workflow, platforma stojąca za moją codzienną automatyzacją Jira i Tempo. Tydzień 4: performance audit 5-agent pipeline. Tydzień 5: V0.4 multi-runtime build process.

Jeśli budujesz narzędzia accessibility, design systemy albo AI-driven dev workflows, śledź #FromTheField. Realna produkcja, realne liczby, realna engineering humility. Tydzień następny: jarvis-brain.

---


## Triangulation: AI Specialists Across Three Audit Runs

**URL:** https://portfolio.sdet.it/from-the-field/wcag-toolkit-part-2
**Published:** 2026-05-06
**Language:** en
Tags: wcag, accessibility, tooling, astro, ai-tooling, from-the-field

Day 2 of a three-day live audit. V0.3 adds 5 AI specialists. Two runs find 16 unique findings, third finds zero. Convergence + dogfooding bug fix. From the field #01 Part 2.

Yesterday's V0.2 audit on this same portfolio found 3 real findings. Static plus dynamic, no LLM in the loop, deterministic floor doing what it does. Today I'm running V0.3 public on the same target, same routes, same Astro 5 build. V0.3 adds 5 AI specialists reading source through Read/Grep/Glob.

Two productive audit runs, eight hours apart. 16 unique production findings caught between them. A third run, after fixing the first two batches, finds zero new findings. That's the number worth opening on: 16 caught, 0 remaining.

Two productive rounds, one convergence round. That last round is the part most AI-audit case studies leave out, because it's the part that requires running the same tool against the same project a third time and admitting if anything new shows up. Nothing did. Three independent runs, three different LLM scan surfaces, all converged on no production issues.

This is the most honest accessibility audit data I've shipped. It's also the part of the toolkit story I had no way to tell yesterday. Today's tier reads the JSX, not the rendered HTML. Different layer, different jobs, different findings.

## V0.3 architecture

V0.3 adds five specialists. Each reads source. Each has focused scope. They run in parallel.

The five are: semantic-structure (heading hierarchy, landmark coverage, lang attributes, modal heading rank), aria-patterns (ARIA misuse, live region politeness, dialog type taxonomy), keyboard-interaction (composite widgets, focus management, onClick without onKeyDown, APG keyboard tables), color-contrast-static (CSS contrast computed from source, color-only indicators, prefers-* media queries), and forms-accessibility (labels, validation timing, autocomplete, payment review steps).

Each specialist receives a focused prompt and has access to Read, Grep, Glob, and LS over the project source. No write access, no shell access, no internet. They open files, look at code, return JSON findings with `ruleId`, `file:line`, severity, WCAG SC, and a suggested fix. They don't run the project, don't render anything, don't compute pixel values. They read.

A Lead orchestrator dispatches all five through Claude Code's Task tool in a single parallel batch. Five specialists run concurrently. The orchestrator waits, collects, merges, and dedupes by `(ruleId, file:line, url)`. Findings that match the static or dynamic backbone collapse to one entry.
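The dedupe step can be sketched as a keyed merge. This is an illustration under assumptions, not the toolkit source: the `Finding` type and `dedupe` function are hypothetical, and "first source wins" is an assumed tie-break that the write-up doesn't specify.

```typescript
// Hypothetical minimal finding: just the fields that form the dedupe key.
interface Finding {
  ruleId: string;
  location: string; // "file:line", e.g. "Hero.astro:17"
  url: string;
  source: "static" | "dynamic" | "ai";
}

// Collapse findings that share (ruleId, file:line, url) into one entry.
// Order of the input decides which source is kept for a duplicate.
function dedupe(findings: Finding[]): Finding[] {
  const seen = new Map<string, Finding>();
  for (const finding of findings) {
    // JSON.stringify on a tuple avoids collisions from separators inside values.
    const key = JSON.stringify([finding.ruleId, finding.location, finding.url]);
    if (!seen.has(key)) seen.set(key, finding);
  }
  return Array.from(seen.values());
}
```

Passing static and dynamic findings ahead of AI findings keeps the deterministic report as the surviving entry whenever a specialist rediscovers something the backbone already caught.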

```mermaid
graph TB
    A[Project Source] --> B[Lead Orchestrator]
    B --> C[5 AI Specialists in parallel]
    C --> D1[semantic-structure]
    C --> D2[aria-patterns]
    C --> D3[keyboard-interaction]
    C --> D4[color-contrast-static]
    C --> D5[forms-accessibility]
    D1 --> E[Read/Grep/Glob source]
    D2 --> E
    D3 --> E
    D4 --> E
    D5 --> E
    A --> F[Static TS Analyzer]
    A --> G[Dynamic Playwright + axe]
    E --> H[Findings Merge + Dedupe]
    F --> H
    G --> H
    H --> I[Score + Grade A-F]
    I --> J[Reports: dev + exec]
```

Same merge step as V0.2. Same `WcagFinding` shape. Same dedupe logic. The static and dynamic backbones still run, still emit findings, still feed the same penalty model and A through F grade. AI is the third source, not the replacement.

## Run 1 (morning), what AI caught that V0.2 missed

Morning audit on the same portfolio. V0.2 found 3 contrast findings. V0.3 found 9, including those 3.

The 9 findings split into three clusters. The first is aria misuse: three findings under WCAG 4.1.2 Name, Role, Value. `Hero.astro:17` had `aria-label="C64 boot"` on a `<p>` element, overriding the visible text. `ProjectCard.astro:45` had `aria-label="Tech stack"` on a `<ul>`. `EcosystemCard.astro:40` had `aria-label="Key metrics"` on a `<dl>`.

Static missed all three because aria-label values are strings. The regex matches the attribute, not whether the attribute is appropriate on a paragraph that already has implicit role and visible text. Dynamic missed them too. Axe checks attribute presence, not semantic appropriateness, and the rendered HTML still has the aria-label intact. The aria-patterns specialist read the JSX, recognized that paragraphs and definition lists already carry semantics, and flagged the overrides as an anti-pattern.

The second cluster was contrast across a broader surface. Five findings, all WCAG 1.4.3. Three came back from V0.2 dynamic (the `.hero-load` line and the three `.license` badges from yesterday). The AI color-contrast specialist also caught all three by reading the CSS source directly. The new one was `ArticleCard.astro:136` `.meta` at 2.98:1 on a dark gradient background. Dynamic axe didn't catch it, because axe walks visited pages and computes contrast against the actual rendered state. The article cards in question never landed in axe's sweep with that exact gradient combination. The AI specialist read the CSS, noticed `--color-text-subtle` against the dark background token, and computed the ratio manually.
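For reference, the ratio itself comes from the WCAG 2.x relative-luminance formula. Below is a minimal sketch for solid 6-digit hex colors. The function names are mine, not the toolkit's, and it ignores the harder parts the specialist has to reason about, such as opacity and gradient backgrounds.

```typescript
// Linearize one 8-bit sRGB channel (WCAG 2.x relative luminance definition).
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as "#a3a3a3".
function relativeLuminance(hex: string): number {
  const m = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`Expected a 6-digit hex color, got: ${hex}`);
  const r = linearize(parseInt(m[1], 16));
  const g = linearize(parseInt(m[2], 16));
  const b = linearize(parseInt(m[3], 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05); argument order does not matter.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG 1.4.3 at level AA: 4.5:1 for normal text, 3:1 for large text.
function passesAA(ratio: number, largeText = false): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}
```

White on black comes out at 21:1, up to floating-point error, which is the maximum the formula allows. Anything under 4.5:1 for body text is a 1.4.3 failure at AA.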

The third cluster was a single minor finding: `contact.astro:52` had `<ul role="list">`. Redundant ARIA. WCAG 4.1.2. The semantic-structure specialist flagged it with a one-line fix: drop the role attribute, the implicit role is already there.

Six issues V0.2 couldn't see, plus the three it could. Different layer, different findings. The aria misuse is invisible to anything that doesn't read source. The new contrast finding is invisible to anything that doesn't follow CSS variable indirection. That's the discovery payoff for source-reading agents.

## Round 1 fix

I fixed all 9. Six commits, twenty-five minutes. Quick details:

| Fix                                                          | File                       | WCAG SC | Commit  |
| ------------------------------------------------------------ | -------------------------- | ------- | ------- |
| Remove `aria-label` from `.hero-load` `<p>`                  | `Hero.astro:17`            | 4.1.2   | 0c0e19e |
| `aria-label` → sr-only `<h4>` + `aria-labelledby` (tech stack) | `ProjectCard.astro:45`     | 4.1.2   | 9bca4bf |
| Same pattern (key metrics)                                   | `EcosystemCard.astro:40`   | 4.1.2   | b141c52 |
| Remove redundant `role="list"`                               | `contact.astro:52`         | 4.1.2   | 356ae8c |
| `.meta` color: `text-subtle` → `text-muted`                  | `ArticleCard.astro:136`    | 1.4.3   | 3498856 |
| `.license` color: `accent-muted` → `accent`                  | `ProjectCard.astro`        | 1.4.3   | 4929186 |

The aria-label fixes weren't all the same. On the `<p>`, removing the attribute was the right call: the visible text already says it. On the `<ul>` and `<dl>`, the labels were structural ("tech stack", "key metrics"), so I added a screen-reader-only `<h4>` plus `aria-labelledby` pointing at it. Different replacement strategy per element, same anti-pattern caught.

The `.license` fix swapped a single design token (`--color-accent-muted` to `--color-accent`), which resolved all three license badge findings in one commit. First taste of token-level multiplicative impact.

I re-ran the audit, expecting a clean grade. I got 7 NEW findings instead. That's where it gets interesting.

## Run 2 (afternoon), the triangulation insight

Same portfolio, post-Round-1, second audit. Seven new findings. None of them regressions. All of them in places Run 1 didn't touch.

The list, in order of severity: `Footer.astro:269` `.footer-links .note` at 1.77:1 (the worst ratio I've shipped, dark theme). `Footer.astro:259` `.footer-links a` at 2.80:1. `ArticleCard.astro:123` `.description` at 2.80:1. `Topbar.astro:194` `.icon-button` at 2.94:1. `Topbar.astro:167` `.nav-desktop a` at 2.94:1. `MatrixToggle.astro:69` `.matrix-hint` at 3.15:1. And one outside the contrast cluster: `articles/index.astro:47` filter pills missing `aria-pressed` (WCAG 3.3.2 plus 4.1.2).

The pattern: six of the seven were contrast, all using `--color-text-subtle` or `--color-text-muted` from the dark theme stylesheet. Single root cause, six visible symptoms, four different files. The seventh was a forms issue: a toggle button missing programmatic state.

Why didn't Run 1 surface these? The honest answer is that LLM specialists scan a slightly different surface on each run. Different prompt activation, different file traversal order, different attention. Run 1 hit the aria-misuse cluster because the morning prompts steered the specialists toward semantic elements with explicit ARIA. Run 2 hit the layout shells (Footer, Topbar) and content surfaces (ArticleCard description, MatrixToggle hint) because the prompts in that run, working on a freshly fixed codebase, opened a different set of files.

This is not a bug. This is the design. AI specialists trade determinism for breadth. One run sees one slice. Three runs see the project.

The counter-narrative I want to push back against is the loud one: "AI auditors are unreliable because they're nondeterministic." That framing measures the wrong thing. The right measure isn't "did you produce identical output twice?" It's "did multiple runs converge on the same final state?" Convergence on no findings across multiple independent runs is the strongest quality signal a probabilistic auditor can give you. Static rules can't even produce that signal, because they only check what they were programmed to check.

Two complementary modes, not competitors. Static catches what's there in a deterministic, CI-friendly way. AI catches what's intended through source semantic understanding, with breadth that requires multiple runs to fully harvest. The boring tier and the AI tier do different jobs.

I had a choice at this point: ship knowing only Run 1, or fix what Run 2 found. I went deeper.

## Round 2 fix and the multiplicative token impact

Two commits. Twenty-five minutes. Seven findings resolved. One commit fixed six findings with four lines of CSS.

```diff
:root {
  /* dark theme default, text on #141414 */
- --color-text-subtle: #737373;   /* 3.4:1, AA fail */
- --color-text-muted:  #a1a1a1;   /* 6.5:1, AA pass but tight */
+ --color-text-subtle: #a3a3a3;   /* 6.7:1, AA pass */
+ --color-text-muted:  #b3b3b3;   /* 8.8:1, AA pass with headroom */
}
```

That single commit (`03ae229`) touched `global.css` and exactly nothing else. The four component files containing the six symptoms (Footer×2, Topbar×2, ArticleCard, MatrixToggle) were not opened. They didn't need to be. Every component using `--color-text-muted` or `--color-text-subtle` against the dark background got the new ratio for free.
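
For readers who want to check a token change like this by hand, here is a minimal sketch of the WCAG 2.x contrast-ratio formula in TypeScript. The function names (`relativeLuminance`, `contrastRatio`, `passesAA`) are my own and are not the toolkit's API. It assumes flat 6-digit hex colors, so it will not reproduce a ratio that was measured against a gradient or a different background than `#141414`.

```typescript
// Sketch only: helper names are hypothetical, not taken from the toolkit.

// Convert one 0-255 sRGB channel to its linear value (WCAG 2.x definition).
function channelToLinear(value: number): number {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as "#a3a3a3".
function relativeLuminance(hex: string): number {
  const clean = hex.replace("#", "");
  const r = parseInt(clean.slice(0, 2), 16);
  const g = parseInt(clean.slice(2, 4), 16);
  const b = parseInt(clean.slice(4, 6), 16);
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG 1.4.3 AA threshold for normal-size text.
const AA_NORMAL_TEXT = 4.5;

function passesAA(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= AA_NORMAL_TEXT;
}

// Check the old and new token values from the diff against the dark background.
const background = "#141414";
for (const token of ["#737373", "#a1a1a1", "#a3a3a3", "#b3b3b3"]) {
  const ratio = contrastRatio(token, background).toFixed(2);
  console.log(`${token} on ${background}: ${ratio}:1, AA ${passesAA(token, background) ? "pass" : "fail"}`);
}
```

Because every component resolves the same two custom properties, running a check like this once per token covers every place the token is used on that background.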

The forms fix was a separate commit (`ab4d716`): added `aria-pressed` plus a small JS toggle sync to the filter pills in `articles/index.astro`. Filter pills now announce "pressed" or "not pressed" to screen readers, which is what WCAG 4.1.2 (Name, Role, Value), WCAG 3.3.2 (Labels or Instructions), and the toggle button pattern from APG all expect.

Static analyzers report 6 contrast violations. The AI specialist reads source, recognizes the pattern: 6 visible symptoms, 2 design tokens, 1 root cause. One commit, four lines, six findings resolved. That's design-tokens-first architecture meeting AI semantic understanding. More on that pattern tomorrow.

Re-audit time. Run 3.

## Run 3, convergence

Run 3, same skill, same portfolio, post-Round-1-and-2. All 5 AI specialists return empty arrays.

The numbers: AI specialists 0 findings each (5 of 5 succeeded). Static analyzer 2 findings, both `playwright-report/index.html` (the Playwright HTML reporter output, generated, never deployed). Dynamic 4 raw findings deduped to 1 unique, all `<astro-dev-toolbar>` (the Astro dev-mode injection that doesn't ship to production). Total real findings, after stripping the documented noise floor: 0. Score: 100/100. Grade: A.

Three independent runs across 8 hours. Three different LLM scan surfaces, no shared context between runs. All converging on no production issues for the same codebase. That's the convergence signal. It's not a single point measurement of whether an LLM is right. It's a multi-run aggregate that asks whether the project is right.

Most AI-audit case studies stop at run 1. "Tool found N issues." One audit, no verification, no convergence test, no honest framing of what the tool sees vs. what it doesn't. The data here goes further: 9 found in run 1, 7 found in run 2 (different surface, not regression), 0 found in run 3 across independent specialists. 16 unique findings caught, 0 remaining. Multiple runs are the unit, not single runs.

Convergence is the design point. Multiple runs aggregate to broader coverage than any single run can give. Run 3 finding zero new issues is what "done" looks like for an AI-driven audit. Not "the LLM said no problems," but "three independent passes, three different scan surfaces, all said no problems."
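
The bookkeeping behind that claim is small enough to sketch. The TypeScript below is an illustration, not the toolkit's code: `trackConvergence` and the key format are hypothetical. It treats each run as a list of finding keys and calls the audit converged when the latest of several runs adds nothing new.

```typescript
// Sketch only: names and key format are assumptions for illustration.
type FindingKey = string; // e.g. "ruleId|file:line"

interface ConvergenceReport {
  newPerRun: number[];  // findings first seen in each run
  uniqueTotal: number;  // distinct findings across all runs
  converged: boolean;   // true when the latest of 2+ runs found nothing new
}

function trackConvergence(runs: FindingKey[][]): ConvergenceReport {
  const seen = new Set<FindingKey>();
  const newPerRun: number[] = [];

  for (const run of runs) {
    let fresh = 0;
    for (const key of run) {
      if (!seen.has(key)) {
        seen.add(key);
        fresh += 1;
      }
    }
    newPerRun.push(fresh);
  }

  const last = newPerRun[newPerRun.length - 1];
  return {
    newPerRun,
    uniqueTotal: seen.size,
    converged: runs.length > 1 && last === 0,
  };
}

// Synthetic keys shaped like the three runs above: 9 findings, 7 findings, none.
const run1 = Array.from({ length: 9 }, (_, i) => `run1-finding-${i}`);
const run2 = Array.from({ length: 7 }, (_, i) => `run2-finding-${i}`);
const run3: FindingKey[] = [];

console.log(trackConvergence([run1, run2, run3]));
```

A single run can never report `converged: true` in this sketch, which is the point: one pass with zero findings says less than a later pass that adds nothing to an already-worked list.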

## But convergence on homepage isn't convergence on the site

Three runs on the homepage signaled convergence. Same homepage, every audit pass, eight hours apart, three different LLM scan surfaces, zero new findings. Clean signal - for one URL.

Then V0.4 landed. Multi-page audit with 4-strategy auto-discovery: sitemap, router-scan, AI agent reading the project structure, JSON config when none of those work. Same toolkit, broader surface. I ran V0.4 router-scan on the same portfolio.
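
One way to read "4-strategy auto-discovery" is as an ordered fallback chain. The sketch below assumes exactly that. The `Strategy` type, `discoverRoutes`, and the stubbed strategies are hypothetical, and the real toolkit may let you pick a strategy explicitly rather than fall through.

```typescript
// Sketch only: an assumed fallback order, not the toolkit's implementation.
interface Strategy {
  name: string;
  // Returns the discovered routes, or null when the strategy does not apply.
  discover: () => string[] | null;
}

interface DiscoveryResult {
  strategy: string;
  routes: string[];
}

// Try each strategy in order and return the first non-empty result.
function discoverRoutes(strategies: Strategy[]): DiscoveryResult {
  for (const strategy of strategies) {
    const routes = strategy.discover();
    if (routes !== null && routes.length > 0) {
      // Deduplicate and sort so repeated audits visit routes in a stable order.
      return { strategy: strategy.name, routes: Array.from(new Set(routes)).sort() };
    }
  }
  throw new Error("No discovery strategy produced any routes");
}

// Stubbed strategies: no sitemap available, router-scan finds three pages.
const strategies: Strategy[] = [
  { name: "sitemap", discover: () => null },
  { name: "router-scan", discover: () => ["/privacy", "/numbers", "/projects", "/privacy"] },
  { name: "ai-agent", discover: () => null },
  { name: "json-config", discover: () => ["/"] },
];

console.log(discoverRoutes(strategies));
```

Throwing when nothing is discovered is a deliberate choice in this sketch: an audit that quietly covers zero routes would look like a clean result.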

Nine new findings on three pages the homepage audit never touched. `/privacy`. `/numbers`. `/projects`. Structural HTML - `<ul>` markup where it should have been `<dl>`, `.archived` badge color contrast at 3.0:1, missing definition list semantics on stat blocks. None of them visible from the homepage's perspective, because none of those pages were in the homepage audit's surface to begin with.

Round 4 fix sprint: 35 minutes, three atomic commits. Re-audit on router-scan: clean. Then I switched to sitemap strategy and ran the audit on the full published site. 35 routes.

5,816 findings.

Long pause.

99.86% of those 5,816 were ONE bug. Shiki light-theme tokens leaking on light backgrounds across every code block on every article page. Single config change in the Astro Markdown integration, 5,808 findings cleared. The remaining 8 split into 1 residual `.archived` badge token (one-line fix) and 7 `keyboard-trap-runtime` false positives on long article pages - filed as a toolkit issue for v0.5+ (natural focus-cycle completion misclassified as a trap).

Pattern: single-page audit measures one URL. It's a Lighthouse extension at best. Multi-page audit with 4-strategy auto-discovery measures the actual production surface. That's what earns the word "professional".

Convergence and coverage are independent quality dimensions. Three runs converged on the homepage. The site needed a different tool to find the rest.

## Quick honest moment, the dogfooding bug

I shipped V0.3 on Friday. Sunday I dogfooded it on this same portfolio. The audit returned 10 findings, but with a footnote: "All 5 AI specialists returned errors."

The bug was in the `/wcag:audit` skill, not the toolkit code. The SKILL.md instructed Claude Code to invoke the CLI via Bash subprocess with `--use-ai`. That subprocess runs Node directly, where there is no `globalThis.Task`, because the Task tool only exists inside Claude Code's JS runtime. The 5 specialists failed silently at dispatch and the orchestrator fell through to static plus dynamic only. The CLI's design was correct (graceful degradation when Task is unavailable). The skill's design was wrong (routing AI through a layer where Task can't reach).
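
The degradation path can be sketched in a few lines. This is an illustration under assumptions, not the CLI's actual code: `hasTaskTool`, `planAudit`, and the `AuditPlan` shape are hypothetical, and the only detail taken from the text is that the Task tool shows up as `globalThis.Task` inside the Claude Code runtime and is absent in a plain Node subprocess.

```typescript
// Sketch only: function and type names are assumptions for illustration.
type Layer = "static" | "dynamic" | "ai";

interface AuditPlan {
  layers: Layer[];
  warnings: string[];
}

// The scope parameter exists so the check can be exercised without a real host.
function hasTaskTool(scope: object = globalThis): boolean {
  return typeof (scope as { Task?: unknown }).Task === "function";
}

// Decide which layers run; report a skipped AI layer instead of hiding it.
function planAudit(useAi: boolean, scope: object = globalThis): AuditPlan {
  const plan: AuditPlan = { layers: ["static", "dynamic"], warnings: [] };
  if (!useAi) return plan;

  if (hasTaskTool(scope)) {
    plan.layers.push("ai");
  } else {
    plan.warnings.push("AI specialists skipped: Task tool is not available in this runtime");
  }
  return plan;
}

// In a plain Node process this prints a two-layer plan plus the warning.
console.log(planAudit(true));
```

Putting the warning in the plan is what turns a silent fallback into a visible one, which is the footnote that gave the bug away during dogfooding.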

I missed it in the smoke test for an embarrassing reason. The smoke test ran from inside a Claude Code session, where Task is available. Task calls succeeded, output looked clean, I shipped. The production user flow (clone repo, open Claude Code, run `/wcag:audit`) hits the Bash subprocess path, where Task isn't available. Different code path, different result.

The fix took 45 minutes and one file. Refactored the skill to dispatch Task calls directly inside the Claude Code session (5 parallel calls), reserving the CLI subprocess for static plus dynamic only. Pattern: the skill is the orchestration document, the CLI is the deterministic engine, don't mix layers. v0.1 of the toolkit's earlier `wcag-static-analyze` skill got this right. v0.3 regressed. Now it doesn't. One file changed, 134 insertions, 37 deletions, no code change. Verified on a react-basic fixture: 19 findings, matches the Pro alpha.2 baseline of 19. Parity confirmed.

Public toolkit shipped Friday. Bug discovered Sunday. Fixed Monday. One file changed. That's why dogfooding before publication matters.

Smoke test in Claude Code is not the real user flow through a skill. Caught it. Fixed it. The numbers above are from the fixed skill.

## Tomorrow

Final piece tomorrow. F to A in 8 commits, 75 minutes total Claude Code work. One token edit, four lines, six findings resolved, in detail. Pro tier on the same project: multi-runtime (Claude Code, OpenCode subprocess, Ollama local for sensitive client repos), auto-fix engine with deterministic patchers (image-alt, html-lang), niche specialists landing in alpha.4 (modal-specialist, ecommerce-journey). Live auto-fix demo with before-and-after grade. Honest commercial framing: public is education, Pro is the niche, you can rebuild it or you can hire me. Series tease: next week, jarvis-brain.

If you're shipping accessibility work in 2026 and using AI in your stack, Part 3 is for you. #FromTheField.

---


## Triangulacja: AI specjaliści w trzech audytach

**URL:** https://portfolio.sdet.pl/from-the-field/wcag-toolkit-part-2
**Published:** 2026-05-06
**Language:** pl
Tags: wcag, accessibility, tooling, astro, ai-tooling, from-the-field

Dzień 2 trzydniowego live audytu. V0.3 dodaje 5 AI specjalistów. Dwa runy znajdują 16 unique findings, trzeci znajduje zero. Konwergencja + dogfooding bug fix. From the field #01 Część 2.

Wczorajszy audyt V0.2 na tym samym portfolio znalazł 3 realne findings. Static plus dynamic, zero LLM w pętli, deterministyczna podłoga robi swoje. Dziś puszczam V0.3 public na ten sam target, te same route'y, ten sam build Astro 5. V0.3 dodaje 5 AI specjalistów czytających source przez Read/Grep/Glob.

Dwa produktywne runy audytu, osiem godzin odstępu. 16 unique produkcyjnych findings złapanych między nimi. Trzeci run, po naprawieniu pierwszych dwóch batchy, znajduje zero nowych findings. To jest liczba, na której warto otworzyć: 16 złapanych, 0 zostających.

Dwie produktywne rundy, jedna runda konwergencji. Ta ostatnia runda to ten kawałek, który większość AI-audit case studies pomija, bo to ten kawałek, który wymaga puszczenia tego samego toola na ten sam projekt trzeci raz i przyznania, czy coś nowego się pojawi. Nic się nie pojawiło. Trzy niezależne runy, trzy różne powierzchnie skanu LLM, wszystkie zbiegły się na zero produkcyjnych issues.

To są najuczciwsze dane audytu accessibility, jakie do tej pory shippowałem. To też jest kawałek toolkit story, której wczoraj nie miałem jak opowiedzieć. Dzisiejsza warstwa czyta JSX, nie wyrenderowany HTML. Inna warstwa, inne zadania, inne findings.

## Architektura V0.3

V0.3 dodaje pięciu specjalistów. Każdy czyta source. Każdy ma węższy zakres. Lecą równolegle.

Pięcioro to: semantic-structure (hierarchia nagłówków, pokrycie landmarków, atrybuty lang, ranga nagłówka modala), aria-patterns (ARIA misuse, politeness live regionów, taksonomia typów dialogu), keyboard-interaction (composite widgets, focus management, onClick bez onKeyDown, tabele klawiatury z APG), color-contrast-static (CSS contrast policzony z source, color-only indicators, prefers-* media queries) i forms-accessibility (labels, validation timing, autocomplete, payment review steps).

Każdy specjalista dostaje wąsko sformułowanego prompta i ma dostęp do Read, Grep, Glob i LS po projektowym source. Bez write access, bez shell access, bez internetu. Otwierają pliki, patrzą w kod, zwracają JSON findings z `ruleId`, `file:line`, severity, WCAG SC i sugerowanym fixem. Nie puszczają projektu, nie renderują niczego, nie liczą wartości pikseli. Czytają.

Lead orchestrator dispatch'uje wszystkich pięciu przez Task tool Claude Code w jednym równoległym batchu. Pięciu specjalistów leci concurrently. Orchestrator czeka, zbiera, merguje i deduplikuje po `(ruleId, file:line, url)`. Findings, które matchują static albo dynamic backbone, zwijają się do jednej entry.

```mermaid
graph TB
    A[Project Source] --> B[Lead Orchestrator]
    B --> C[5 AI Specialists in parallel]
    C --> D1[semantic-structure]
    C --> D2[aria-patterns]
    C --> D3[keyboard-interaction]
    C --> D4[color-contrast-static]
    C --> D5[forms-accessibility]
    D1 --> E[Read/Grep/Glob source]
    D2 --> E
    D3 --> E
    D4 --> E
    D5 --> E
    A --> F[Static TS Analyzer]
    A --> G[Dynamic Playwright + axe]
    E --> H[Findings Merge + Dedupe]
    F --> H
    G --> H
    H --> I[Score + Grade A-F]
    I --> J[Reports: dev + exec]
```

Ten sam merge step co w V0.2. Ten sam kształt `WcagFinding`. Ta sama logika dedupe. Static i dynamic backbone nadal lecą, nadal emitują findings, nadal zasilają ten sam model kar i grade A-F. AI to trzecie źródło, nie zastępca.

## Run 1 (rano), co AI złapał, czego V0.2 nie zauważył

Audyt poranny na tym samym portfolio. V0.2 znalazł 3 contrast findings. V0.3 znalazł 9, włączając te 3.

Te 9 findings dzieli się na trzy klastry. Pierwszy to misuse aria: trzy findings pod WCAG 4.1.2 Name, Role, Value. `Hero.astro:17` miał `aria-label="C64 boot"` na elemencie `<p>`, przesłaniając widoczny tekst. `ProjectCard.astro:45` miał `aria-label="Tech stack"` na `<ul>`. `EcosystemCard.astro:40` miał `aria-label="Key metrics"` na `<dl>`.

Statyczny pominął wszystkie trzy, bo wartości aria-label to stringi. Regex matchuje atrybut, nie to, czy atrybut jest sensowny na paragrafie, który już ma implicit role i widoczny tekst. Dynamic też je pominął. Axe sprawdza obecność atrybutu, nie semantyczną sensowność, a wyrenderowany HTML nadal ma aria-label nietknięte. Specjalista aria-patterns przeczytał JSX, rozpoznał, że paragrafy i definition lists już mają semantykę, i oflagował override jako anti-pattern.

Drugi klaster to kontrast na szerszej powierzchni. Pięć findings, wszystkie WCAG 1.4.3. Trzy wróciły z dynamic V0.2 (linia `.hero-load` i trzy badge'y `.license` ze wczoraj). AI specjalista color-contrast też złapał wszystkie trzy, czytając CSS source bezpośrednio. Nowy był `ArticleCard.astro:136` `.meta` przy 2.98:1 na ciemnym gradiencie. Dynamic axe go nie złapał, bo axe chodzi po odwiedzonych stronach i liczy kontrast względem faktycznego renderowanego stanu. Karty artykułów w grze nigdy nie wylądowały w sweepie axe z tą dokładną kombinacją gradientu. AI specjalista przeczytał CSS, zauważył `--color-text-subtle` na tle dark backgroundu i policzył ratio ręcznie.

Trzeci klaster to jeden minor finding: `contact.astro:52` miał `<ul role="list">`. Redundantne ARIA. WCAG 4.1.2. Specjalista semantic-structure oflagował z jednolinijkowym fixem: zdjąć atrybut role, implicit role już tam jest.

Sześć issues, których V0.2 nie mógł zobaczyć, plus te trzy, które mógł. Inna warstwa, inne findings. Aria misuse jest niewidoczne dla wszystkiego, co nie czyta source. Nowy finding kontrastu jest niewidoczny dla wszystkiego, co nie idzie za indirectionem CSS variables. To jest discovery payoff dla agentów czytających source.

## Round 1 fix

Naprawiłem wszystkie 9. Sześć commitów, dwadzieścia pięć minut. Krótkie szczegóły:

| Fix                                                          | Plik                       | WCAG SC | Commit  |
| ------------------------------------------------------------ | -------------------------- | ------- | ------- |
| Zdjąć `aria-label` z `.hero-load` `<p>`                      | `Hero.astro:17`            | 4.1.2   | 0c0e19e |
| `aria-label` -> sr-only `<h4>` + `aria-labelledby` (tech stack) | `ProjectCard.astro:45`     | 4.1.2   | 9bca4bf |
| Ten sam pattern (key metrics)                                | `EcosystemCard.astro:40`   | 4.1.2   | b141c52 |
| Zdjąć redundantne `role="list"`                              | `contact.astro:52`         | 4.1.2   | 356ae8c |
| `.meta` color: `text-subtle` -> `text-muted`                 | `ArticleCard.astro:136`    | 1.4.3   | 3498856 |
| `.license` color: `accent-muted` -> `accent`                 | `ProjectCard.astro`        | 1.4.3   | 4929186 |

Fix'y aria-label nie były wszystkie takie same. Na `<p>` zdjęcie atrybutu było słuszne: widoczny tekst już to mówi. Na `<ul>` i `<dl>` labelki były strukturalne ("tech stack", "key metrics"), więc dodałem screen-reader-only `<h4>` plus `aria-labelledby` wskazujące na nie. Inna strategia podmiany per element, ten sam anti-pattern złapany.

Fix `.license` podmienił jeden design token (`--color-accent-muted` na `--color-accent`), co rozwiązało wszystkie trzy findings badge'y license w jednym commicie. Pierwszy smak token-level multiplicative impact.

Re-run audytu, oczekuję czystego grade'u. Dostaję 7 NOWYCH findings zamiast tego. I tu się robi ciekawie.

## Run 2 (popołudniu), insight triangulacji

To samo portfolio, post-Round-1, drugi audyt. Siedem nowych findings. Żadne z nich to nie regresja. Wszystkie w miejscach, których Run 1 nie tknął.

Lista, w kolejności severity: `Footer.astro:269` `.footer-links .note` przy 1.77:1 (najgorsze ratio, jakie shippowałem, dark theme). `Footer.astro:259` `.footer-links a` przy 2.80:1. `ArticleCard.astro:123` `.description` przy 2.80:1. `Topbar.astro:194` `.icon-button` przy 2.94:1. `Topbar.astro:167` `.nav-desktop a` przy 2.94:1. `MatrixToggle.astro:69` `.matrix-hint` przy 3.15:1. I jedno spoza klastra kontrastu: `articles/index.astro:47` filter pille bez `aria-pressed` (WCAG 3.3.2 plus 4.1.2).

Wzorzec: sześć z siedmiu to kontrast, wszystkie używające `--color-text-subtle` albo `--color-text-muted` z arkusza dark theme. Jeden root cause, sześć widocznych symptomów, cztery różne pliki. Siódme to issue forms, toggle button bez programmatic state.

Czemu Run 1 tego nie wyciągnął? Uczciwa odpowiedź jest taka, że specjaliści LLM skanują nieco inną powierzchnię w każdym runie. Inna aktywacja prompta, inna kolejność traversal plików, inna uwaga. Run 1 trafił w klaster aria-misuse, bo poranne prompty kierowały specjalistów na elementy semantyczne z jawnym ARIA. Run 2 trafił w skorupy layoutu (Footer, Topbar) i powierzchnie content (ArticleCard description, MatrixToggle hint), bo prompty w tym runie, pracujące na świeżo naprawionym codebase, otworzyły inny zestaw plików.

To nie jest bug. To jest design. AI specjaliści wymieniają determinizm na szerokość. Jeden run widzi jeden wycinek. Trzy runy widzą projekt.

Counter-narratywa, którą chcę odbić, to ta głośna: "AI auditorzy są zawodni, bo są niedeterministyczni". To framing mierzy nie to, co trzeba. Słuszna miara to nie "czy wyprodukowałeś identyczny output dwa razy?". To "czy wiele runów zbiega się do tego samego stanu końcowego?". Konwergencja na zero findings przez wiele niezależnych runów to najsilniejszy sygnał jakości, jaki probabilistic auditor może Ci dać. Statyczne reguły nie mogą nawet wyprodukować takiego sygnału, bo sprawdzają tylko to, do czego były zaprogramowane.

Dwa komplementarne tryby, nie konkurenci. Static łapie to, co jest, w sposób deterministyczny i CI-friendly. AI łapie to, co jest intencją, przez source semantic understanding, z szerokością, która wymaga wielu runów dla pełnego harvestu. Nudna warstwa i warstwa AI robią różne rzeczy.

Mam wybór w tym punkcie: shippować z wiedzą tylko z Run 1 albo naprawić to, co Run 2 znalazł. Poszedłem głębiej.

## Round 2 fix i multiplicative impact tokenu

Dwa commity. Dwadzieścia pięć minut. Siedem findings rozwiązanych. Jeden commit naprawił sześć findings czterema linijkami CSS.

```diff
:root {
  /* dark theme default, text on #141414 */
- --color-text-subtle: #737373;   /* 3.4:1, AA fail */
- --color-text-muted:  #a1a1a1;   /* 6.5:1, AA pass but tight */
+ --color-text-subtle: #a3a3a3;   /* 6.7:1, AA pass */
+ --color-text-muted:  #b3b3b3;   /* 8.8:1, AA pass with headroom */
}
```

Ten jeden commit (`03ae229`) tknął `global.css` i dokładnie nic więcej. Cztery pliki z sześcioma symptomami (Footer x2, Topbar x2, ArticleCard, MatrixToggle) nie zostały otwarte. Nie potrzebowały. Każdy komponent używający `--color-text-muted` albo `--color-text-subtle` na ciemnym tle dostał nowe ratio za darmo.

Fix forms był osobnym commitem (`ab4d716`): dodałem `aria-pressed` plus mały JS toggle sync do filter pilli w `articles/index.astro`. Filter pille teraz announce'ują "pressed" albo "not pressed" do screen readerów, co oczekują WCAG 3.3.2 (Labels or Instructions) i toggle button pattern z APG.

Statyczne analizatory raportują 6 contrast violations. AI specjalista czyta source, rozpoznaje wzorzec: 6 widocznych symptomów, 2 design tokens, 1 root cause. Jeden commit, cztery linijki, sześć findings rozwiązane. To jest design-tokens-first architecture spotykająca AI semantic understanding. Więcej o tym wzorcu jutro.

Czas na re-audyt. Round 3.

## Run 3, konwergencja

Round 3, ten sam skill, to samo portfolio, post-Round-1-i-2. Wszyscy 5 AI specjalistów zwraca puste tablice.

Liczby: AI specjaliści 0 findings each (5 z 5 z sukcesem). Statyczny analizator 2 findings, oba `playwright-report/index.html` (output Playwright HTML reportera, generowany, nigdy nie deployowany). Dynamic 4 raw findings dedupowane do 1 unique, wszystkie `<astro-dev-toolbar>` (Astro dev-mode injection, który nie shippuje na produkcję). Total realnych findings, po odsianiu udokumentowanej podłogi szumu: 0. Score: 100/100. Grade: A.

Trzy niezależne runy w 8 godzin. Trzy różne powierzchnie skanu LLM, zero shared context między runami. Wszystkie zbiegające się na zero produkcyjnych issues dla tego samego codebase. To jest sygnał konwergencji. To nie jest pojedynczy point measurement, czy LLM ma rację. To multi-run aggregate, który pyta, czy projekt ma rację.

Większość AI-audit case studies zatrzymuje się na run 1. "Tool znalazł N issues". Jeden audyt, zero weryfikacji, zero testu konwergencji, zero uczciwego framingu, co tool widzi vs czego nie widzi. Dane tutaj idą dalej: 9 znalezionych w run 1, 7 znalezionych w run 2 (inna powierzchnia, nie regresja), 0 znalezionych w run 3 między niezależnymi specjalistami. 16 unique findings złapanych, 0 zostających. Wiele runów to jednostka, nie pojedyncze runy.

Konwergencja to design point. Wiele runów agreguje się do szerszego pokrycia, niż jakikolwiek pojedynczy run może dać. Round 3 znajdujący zero nowych to tak wygląda "done" dla audytu AI-driven. Nie "LLM powiedział że nie ma problemów", ale "trzy niezależne przebiegi, trzy różne powierzchnie skanu, wszystkie powiedziały że nie ma problemów".

## Ale konwergencja na homepage to nie konwergencja na całym site

Trzy runy na homepage powiedziały konwergencję. Ta sama strona, każde przejście audytu, osiem godzin między nimi, trzy różne powierzchnie skanu LLM, zero nowych findings. Czysty sygnał - dla jednego URL.

Potem wylądowało V0.4. Multi-page audit z 4-strategy auto-discovery: sitemap, router-scan, AI agent czytający strukturę projektu, JSON config gdy żadne z tych nie działa. Ten sam toolkit, szersza powierzchnia. Puściłem V0.4 router-scan na tym samym portfolio.

Dziewięć nowych findings na trzech stronach, których audyt homepage nigdy nie tknął. `/privacy`. `/numbers`. `/projects`. Strukturalne problemy HTML - `<ul>` tam, gdzie powinno być `<dl>`, kontrast `.archived` badge przy 3.0:1, brak semantyki definition list na blokach statystyk. Żadne z nich niewidoczne z perspektywy homepage, bo żadnej z tych stron nie było w surface'ie audytu homepage.

Round 4 fix sprint: 35 minut, trzy atomic commits. Re-audit na router-scan: clean. Potem przełączyłem na sitemap strategy i odpaliłem audyt na całym opublikowanym site. 35 routes.

5,816 findings.

Długa pauza.

99.86% z tych 5,816 to JEDEN bug. Shiki light-theme tokens przeciekały na jasne tła w każdym code blocku na każdej stronie artykułu. Jedna zmiana konfiguracji w Astro Markdown integration, 5,808 findings wyczyszczone. Pozostałe 8 podzieliło się na 1 residual `.archived` badge token (one-line fix) i 7 `keyboard-trap-runtime` false positives na długich stronach artykułów - zgłoszone jako toolkit issue dla v0.5+ (naturalna konwergencja focus cycle błędnie klasyfikowana jako pułapka).

Wzorzec: single-page audit mierzy jeden URL. To w najlepszym razie rozszerzenie Lighthouse'a. Multi-page audit z 4-strategy auto-discovery mierzy realną produkcyjną powierzchnię. Dopiero to zarabia słowo "professional".

Konwergencja i pokrycie to niezależne wymiary jakości. Trzy runy zbiegły się na homepage. Site potrzebował innego narzędzia, żeby znaleźć resztę.

## Krótki uczciwy moment, dogfooding bug

Shippowałem V0.3 w piątek. W niedzielę dogfoodowałem na tym samym portfolio. Audyt zwrócił 10 findings, ale z przypisem: "Wszyscy 5 AI specjalistów zwróciło błędy".

Bug był w skillu `/wcag:audit`, nie w kodzie toolkit. SKILL.md instruował Claude Code, żeby wywołać CLI przez Bash subprocess z `--use-ai`. Ten subprocess puszcza Node bezpośrednio, gdzie nie ma `globalThis.Task`, bo Task tool istnieje tylko w runtime JS Claude Code. Pięciu specjalistów silent-failowało przy dispatchu, a orchestrator spadł do static plus dynamic only. Design CLI był poprawny (graceful degradation, gdy Task jest niedostępny). Design skilla był zły (routowanie AI przez warstwę, gdzie Task nie sięga).

Pominąłem to w smoke teście z żenującego powodu. Smoke test leciał z wewnątrz sesji Claude Code, gdzie Task jest dostępny. Wywołania Task się powiodły, output wyglądał czysto, shippowałem. Produkcyjny user flow (clone repo, otwórz Claude Code, puść `/wcag:audit`) trafia w ścieżkę Bash subprocess, gdzie Task jest niedostępny. Inna ścieżka kodu, inny wynik.

Fix zajął 45 minut i jeden plik. Refaktoryzowałem skilla, żeby dispatch'ował wywołania Task bezpośrednio wewnątrz sesji Claude Code (5 równoległych wywołań), zostawiając CLI subprocess tylko dla static plus dynamic. Wzorzec: skill to dokument orkiestracji, CLI to deterministyczny silnik, nie mieszać warstw. v0.1 wcześniejszego skilla `wcag-static-analyze` toolkit miał to dobrze. v0.3 zregresjonował. Teraz nie. Jeden plik zmieniony, 134 insertions, 37 deletions, zero zmian w kodzie. Zweryfikowane na fixturze react-basic: 19 findings, matchuje baseline Pro alpha.2 19. Parytet potwierdzony.

Public toolkit shipped w piątek. Bug znaleziony w niedzielę. Fix w poniedziałek. Jeden plik zmieniony. Dlatego dogfooding przed publikacją ma znaczenie.

Smoke test w Claude Code to nie jest realny user flow przez skill. Złapałem. Naprawiłem. Liczby wyżej są z naprawionego skilla.

## Jutro

Ostatni kawałek jutro. F do A w 8 commitach, 75 minut total Claude Code work. Jeden token edit, cztery linijki, sześć findings rozwiązane, w szczegółach. Pro tier na tym samym projekcie: multi-runtime (Claude Code, OpenCode subprocess, Ollama lokalnie dla wrażliwych client repo), auto-fix engine z deterministycznymi patcherami (image-alt, html-lang), niche specjaliści lądujący w alpha.4 (modal-specialist, ecommerce-journey). Live demo auto-fixu z grade'em before-and-after. Uczciwy commercial framing: public to edukacja, Pro to ekspertyza, możesz to zbudować od zera albo mnie zatrudnić. Series tease: tydzień następny, jarvis-brain.

Jeśli shippujesz robotę accessibility w 2026 i używasz AI w swoim stacku, Część 3 jest dla Ciebie. #FromTheField.

---


## When axe-core Isn't Enough: Auditing My Own Portfolio with V0.2 Public

**URL:** https://portfolio.sdet.it/from-the-field/wcag-toolkit-part-1
**Published:** 2026-05-05
**Language:** en
Tags: wcag, accessibility, tooling, astro, from-the-field

Day 1 of a three-day live audit. V0.2 public WCAG toolkit catches 3 real findings on portfolio.sdet.it. Static + dynamic baseline, no AI yet. From the field #01 Part 1.

I'm spending three days auditing my own portfolio with my own toolkit, live, in public. Today is day one, and I'm running the boring tier.

Portfolio.sdet.it is an Astro 5 dual-domain site I'm shipping for real. Not a fixture, not a contrived demo. Real content, real components, real hosting on a VPS I configured a week ago. Three days of audits, three tool versions, same target.

Day 1 (today) is V0.2 of my public WCAG toolkit. Static TypeScript analyzer plus Playwright with axe-core. Deterministic, CI-friendly, no LLM in the loop. The kind of tier you trust to fail your build at 2am.

Day 2 will run V0.3 public, the same toolkit plus 5 AI specialists reading source through Read/Grep/Glob. Day 3 will run V0.4 Pro, the commercial tier. Each day adds capability. Each day surfaces findings the previous tier missed.

Today's tier is what runs in CI. It catches what's there in HTML and CSS. It misses what's intended in source. Here's what that looks like on a real project.

## A quick word on how I got here

Backstory before we look at numbers: this isn't where I planned to be.

Day 1 of the WCAG sprint, four hours in, I had a working TypeScript analyzer that pattern-matched HTML for missing alt attributes, missing lang, missing landmarks. AI agents wrapped around it, translating axe-style findings into prettier output. Same regex matching at the core, fancier syntax on top. Another axe-core wrapper.

I deleted it.

A regex doesn't know that `onClick={handler}` without `onKeyDown={handler}` breaks keyboard users. AI reading the JSX does. That was the pivot. AI specialists read source through Read/Grep/Glob and write findings in their own words. Static rules become the deterministic fallback for CI, not the main flow. Two layers, different jobs.

That's why V0.2 looks the way it does. It's not "the whole tool." It's the deterministic floor of a tool whose discovery layer is AI. CI-friendly, fast, predictable. You run it on every commit and it tells you what's rendered wrong. It does not tell you what's structurally wrong in source.

The full pivot story is its own post. For now: V0.2 is what you keep when you decide AI is the discovery layer, not the wrapper.

## V0.2 architecture

V0.2 public has two paths. Both deterministic. Both fast. Neither AI.

The static TypeScript analyzer pattern-matches HTML and CSS source files. Missing img alt attributes, missing html lang, missing landmark roles, redundant ARIA roles. Things you can find with a regex over source text. Zero LLM calls, zero tokens, sub-second on a portfolio-sized repo. CI-friendly by design.
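To make the static path concrete, here is a minimal sketch of two of the checks named above, assuming a plain regex pass over raw HTML text. The `StaticFinding` shape, the `scanHtml` name, and the rule ids are illustrative assumptions, not the toolkit's actual code.

```typescript
// Hypothetical finding shape for this sketch only.
interface StaticFinding {
  ruleId: string;
  file: string;
  line: number;
  message: string;
}

// Scan raw HTML text for <img> without alt and <html> without lang.
function scanHtml(file: string, source: string): StaticFinding[] {
  const findings: StaticFinding[] = [];
  // Convert a character offset into a 1-based line number.
  const lineOf = (index: number): number => source.slice(0, index).split("\n").length;

  const imgTag = /<img\b[^>]*>/gi;
  let match: RegExpExecArray | null;
  while ((match = imgTag.exec(source)) !== null) {
    if (!/\salt\s*=/i.test(match[0])) {
      findings.push({
        ruleId: "img-alt",
        file,
        line: lineOf(match.index),
        message: "<img> is missing an alt attribute",
      });
    }
  }

  const htmlTag = /<html\b[^>]*>/i.exec(source);
  if (htmlTag !== null && !/\slang\s*=/i.test(htmlTag[0])) {
    findings.push({
      ruleId: "html-lang",
      file,
      line: lineOf(htmlTag.index),
      message: "<html> is missing a lang attribute",
    });
  }
  return findings;
}

const sample = `<html>
<body>
  <img src="a.png">
  <img src="b.png" alt="Logo">
</body>
</html>`;

const staticFindings = scanHtml("index.html", sample);
```

The pass only sees text. It has no notion of computed styles, templates, or component props, which is the limitation the rest of this post keeps running into.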

The dynamic path runs Playwright against the dev server, then runs axe-core inside a real browser. Computed contrast ratios, focus indicators, keyboard-reachable controls, ARIA in the rendered DOM. Catches what's actually shipped to users, not just what's in source. Slower than static (about 2 seconds for a 4-route audit), but it sees what regex can't compute.

```mermaid
graph LR
    A[Project Source] --> B[Static TS Analyzer]
    A --> C[Dev Server]
    C --> D[Dynamic Playwright + axe]
    B --> E[Findings Merge]
    D --> E
    E --> F[Markdown Reports]
```

Both paths emit the same `WcagFinding` shape: `ruleId`, `file:line` (or `url:selector`), severity, WCAG SC reference, suggested fix. The orchestrator merges them, dedupes by `(ruleId, file:line, url)`, scores against a penalty model (-15 critical, -10 serious, -5 moderate, -2 minor, floor at 0), and emits an A through F grade.
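A rough sketch of that merge step, under stated assumptions: only `ruleId`, the dedupe key, and the penalty values come from the description above. The other field names, the function names, and especially the grade cut-offs are placeholders for illustration.

```typescript
type Severity = "critical" | "serious" | "moderate" | "minor";

// Field names other than ruleId are assumptions based on the prose.
interface WcagFinding {
  ruleId: string;
  location: string; // "file:line" (static) or "url:selector" (dynamic)
  url?: string; // present only for dynamic findings
  severity: Severity;
  wcagSc: string; // e.g. "1.4.3"
  suggestedFix: string;
}

const PENALTY: Record<Severity, number> = {
  critical: 15,
  serious: 10,
  moderate: 5,
  minor: 2,
};

// Keep the first finding seen for each (ruleId, location, url) key.
function dedupe(findings: WcagFinding[]): WcagFinding[] {
  const seen = new Set<string>();
  return findings.filter((finding) => {
    const key = [finding.ruleId, finding.location, finding.url ?? ""].join("|");
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Start at 100, subtract each penalty, never go below 0.
function score(findings: WcagFinding[]): number {
  const penalty = findings.reduce((sum, finding) => sum + PENALTY[finding.severity], 0);
  return Math.max(0, 100 - penalty);
}

// ASSUMPTION: the grade cut-offs are not stated in the article; these are placeholders.
function grade(value: number): "A" | "B" | "C" | "D" | "F" {
  if (value >= 90) return "A";
  if (value >= 80) return "B";
  if (value >= 70) return "C";
  if (value >= 60) return "D";
  return "F";
}

const contrastFinding = (location: string): WcagFinding => ({
  ruleId: "color-contrast",
  location,
  severity: "serious",
  wcagSc: "1.4.3",
  suggestedFix: "Raise the contrast ratio to at least 4.5:1",
});

// The duplicate .hero-load entry collapses into one finding.
const merged = dedupe([
  contrastFinding("/:.hero-load"),
  contrastFinding("/:.hero-load"),
  contrastFinding("/projects:.license"),
]);
const mergedScore = score(merged);
const mergedGrade = grade(mergedScore);
```

With these weights, ten serious findings add up to a penalty of 100, which floors the score at 0.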

That's the deterministic floor. It's not nothing. It's just not enough.

## Numbers from the V0.2 baseline

Here's what V0.2 saw on portfolio.sdet.it.

| Source                      | Findings | Real | Notes                                                       |
| --------------------------- | -------: | ---: | ----------------------------------------------------------- |
| Static TS                   |        2 |    0 | Both `playwright-report/index.html` (test artifact)         |
| Dynamic axe                 |        4 |    3 | `.hero-load` + `.license` × 3 contrast                      |
| Dynamic focus-visibility    |        4 |    0 | All `<astro-dev-toolbar>` (Astro dev injection)             |
| **Total**                   |   **10** | **3** |                                                            |

Severity breakdown: 0 critical, 10 serious, 0 moderate, 0 minor. Score: 0/100 (penalty 100, floored). Grade: F. Wall time: 2.14 seconds.

That's a failing grade with a clean ten-finding report. Looks dramatic. Reality is more boring: 7 of 10 findings were noise. Test artifacts and dev-mode injections that never ship to production.

The file `playwright-report/index.html` exists because the Playwright HTML reporter writes it after every test run. That HTML is autogenerated and never deployed, but the static analyzer scans it because it lives in the repo. Fixable with a `.wcagignore` glob, which is on the roadmap.

The `<astro-dev-toolbar>` findings are the Astro dev-mode toolbar injection. It only exists when you run `astro dev`. It never ships to production. Re-running against a `pnpm preview` build eliminates them. Documented noise.

Honest number after stripping noise: 3 real bugs. All contrast. All in the light theme. All three would have slipped past V0.2 without the dynamic path rendering them in a browser. Static caught zero of them, because all three hide behind CSS variable indirection and design tokens.

Three findings on a 23-page site. Clean. But that's because static + dynamic only catches what's there in HTML and CSS. The bugs in source are still in source.

## The 3 real findings

Three real bugs. Same root-cause shape: design token misuse hiding behind CSS variables. Static didn't see them. Dynamic did.

Bug 1 lives at `src/components/sections/Hero.astro` line 17. The hero load line ("KERNAL READY") renders `#38935a` on white at 3.82:1. WCAG 1.4.3 wants at least 4.5:1 for normal text. Source uses `var(--color-accent)` plus `opacity: 0.85`, which static analysis can't multiply. Dynamic axe ran the page, got the computed color, computed the ratio, flagged it.

Bugs 2-4 are the same root cause across three project cards. File: `src/components/cards/ProjectCard.astro` lines 199-202. The `.license` badges (AGPL-3.0, MIT featured, MIT third card) render `#22c55e` on `#fafafa` at 2.18:1. That's well below the 3:1 floor for non-text contrast, let alone 4.5:1 for text.
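Both ratios can be checked by hand with the WCAG relative-luminance formula. The sketch below uses hypothetical helper names (`parseHex`, `relativeLuminance`, `contrastRatio`) and takes the already-computed colors as input, which is exactly the step static analysis cannot reach.

```typescript
// Parse "#rrggbb" into 0-255 channels.
function parseHex(hex: string): [number, number, number] {
  const value = parseInt(hex.replace("#", ""), 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Linearize one sRGB channel as defined by WCAG 2.x.
// (The sRGB standard uses 0.04045 as the threshold; for 8-bit channels
// both thresholds give the same result.)
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio = (lighter + 0.05) / (darker + 0.05).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

const heroRatio = contrastRatio("#38935a", "#ffffff");
const licenseRatio = contrastRatio("#22c55e", "#fafafa");
const heroPassesAA = heroRatio >= 4.5;
const licensePassesAA = licenseRatio >= 4.5;
```

Both results land within 0.01 of the ratios quoted above, and both fail the 4.5:1 threshold for normal text.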

One design token (`--color-accent-muted`, resolving to a too-light green in light theme) feeds three rendered components. Static analysis sees three separate `.license` selectors with `var(--color-accent-muted)` and can't follow the indirection to know what RGB value comes out the other side. Dynamic axe walks each project page, computes the actual rendered color, reports three findings. Three visible symptoms, one root cause, but V0.2 reports them as three separate findings because that's all it can see.

These are the wins of dynamic testing. The losses come tomorrow.

## What V0.2 is missing

Here's the kicker. There are nine more production WCAG bugs in this codebase. V0.2 won't find them. Not in this run, not in ten more runs.

Sneak preview without spoilers (those are for Part 2):

- aria-label misuse on semantic elements: `aria-label="C64 boot"` on a `<p>`, `aria-label="Tech stack"` on a `<ul>`, `aria-label="Key metrics"` on a `<dl>`. Three instances, three different elements, one anti-pattern.
- Heading hierarchy gaps in MDX content.
- A token-level contrast issue affecting six components on the dark theme through a single `--color-text-subtle` value.
- Filter pills missing toggle state (`aria-pressed`).
- A few smaller things.

Why static can't see them: aria-label values are strings. A regex matches the attribute presence, not whether the attribute is appropriate on the element. Token misuse propagates through `var()` indirection across files. Toggle state requires understanding interaction context, not text.
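A small sketch of the presence-only problem, assuming a simple regex over markup. The three flagged attributes are the ones listed above; the element contents and the `<nav>` line are illustrative, the latter added as a common legitimate use of `aria-label` on a landmark.

```typescript
// Matches any opening tag that carries an aria-label attribute.
const ariaLabelPresent = /<([a-z][a-z0-9]*)\b[^>]*\saria-label\s*=\s*"([^"]*)"/gi;

const markup = [
  '<p aria-label="C64 boot">Boot text</p>',
  '<ul aria-label="Tech stack"><li>Item</li></ul>',
  '<dl aria-label="Key metrics"></dl>',
  '<nav aria-label="Main"><a href="/">Home</a></nav>',
].join("\n");

const ariaHits: { tag: string; label: string }[] = [];
let ariaMatch: RegExpExecArray | null;
while ((ariaMatch = ariaLabelPresent.exec(markup)) !== null) {
  ariaHits.push({ tag: ariaMatch[1], label: ariaMatch[2] });
}

// Every hit has the same shape. Nothing in the match says which labels
// are appropriate for the element they sit on.
const ariaTags = ariaHits.map((hit) => hit.tag);
```

The regex returns four structurally identical hits. Separating the landmark label from the other three takes knowledge of what each element already exposes to assistive technology, which is not in the pattern.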

Why dynamic can't see them either: most aria misuse is in JSX or Astro templates, and the rendered HTML still has the attribute (axe checks presence, not appropriateness). The visited-pages-only sweep misses internal article routes. ARIA attribute presence does not equal semantic correctness, and judging correctness is a semantic call, not a pattern match.

These are the wins of AI specialists reading source. They open a file, recognize that a `<p>` already has an implicit role and visible text, and flag the aria-label as an override anti-pattern. They follow `var()` chains across files. They reason about toggle state.

Tomorrow Part 2.

## Honest cliffhanger

If V0.2 were the whole tool, I'd ship it as "use this in CI, manually audit the rest." That's where most static toolkits stop, and it's a reasonable place to stop. Three production bugs caught on a live portfolio is real value. CI failing on contrast 3.82:1 versus required 4.5:1 is exactly the kind of guardrail a deterministic tier exists to provide.

But it's not enough. Nine more bugs sit in source right now, and no amount of re-running V0.2 will surface them. Different scan surface, different layer. Static asks "what is rendered." Dynamic asks "what is computed." Neither asks "what is intended."

Tomorrow Part 2: same portfolio, V0.3 public adds 5 AI specialists reading source through Read/Grep/Glob. Spoiler: 16 unique findings caught across two audit runs. Triangulation, not regression. #FromTheField.

## Repo and clone

Repo: [github.com/darco81/sdet-wcag-toolkit](https://github.com/darco81/sdet-wcag-toolkit) (AGPL-3.0).

Quick start:

```bash
pnpm install
pnpm -r build
wcag-toolkit audit . --url http://localhost:4321
```

That gets you the V0.2 baseline (static + dynamic). For the full V0.3 AI tier, open Claude Code in your project root and run `/wcag:audit`. The skill dispatches 5 specialists in parallel through the Task tool, merges findings with the deterministic backbone, emits an A-F grade.

Pro tier at [sdet.it/services](https://sdet.it/services): multi-runtime (Claude Code, OpenCode, Ollama local for sensitive client repos), auto-fix engine, niche specialists.

Series continues tomorrow. Part 2: triangulation.

---


## Kiedy axe-core to za mało: audyt mojego portfolio z V0.2 Public

**URL:** https://portfolio.sdet.pl/from-the-field/wcag-toolkit-part-1
**Published:** 2026-05-05
**Language:** pl
Tags: wcag, accessibility, tooling, astro, from-the-field

Dzień 1 trzydniowego live audytu. V0.2 public WCAG toolkit łapie 3 real findings na portfolio.sdet.it. Static + dynamic baseline, jeszcze bez AI. From the field #01 Część 1.

Trzy dni audytu własnego portfolio własnym toolkitem, na żywo, publicznie. Dziś jest dzień pierwszy i puszczam tę nudną warstwę.

Portfolio.sdet.it to dwujęzyczny site na Astro 5, który shippuję na produkcję. Nie fixture, nie wykombinowane demo. Realny content, realne komponenty, realny hosting na VPS, który skonfigurowałem tydzień temu. Trzy dni audytów, trzy wersje toola, ten sam target.

Dzień 1 (dzisiaj) to V0.2 mojego public WCAG toolkitu. Statyczny analizator TypeScript plus Playwright z axe-core. Deterministyczny, CI-friendly, zero LLM w pętli. Warstwa, której ufasz że wywali Ci build o drugiej w nocy.

Dzień 2 puści V0.3 public, ten sam toolkit plus 5 AI specjalistów czytających source przez Read/Grep/Glob. Dzień 3 puści V0.4 Pro, warstwę komercyjną. Każdy dzień dodaje capability. Każdy dzień łapie findings, których poprzednia warstwa nie widziała.

Dzisiejsza warstwa to to, co chodzi w CI. Łapie to, co jest w HTML i CSS. Pomija to, co jest intencją w source. Tak to wygląda na realnym projekcie.

## Krótko o tym, jak tu trafiłem

Backstory zanim spojrzymy w liczby: to nie jest miejsce, w którym planowałem być.

Dzień 1 sprintu WCAG, cztery godziny w środku, miałem działający analizator TypeScript, który pattern-matchował HTML pod brakujące atrybuty alt, brak lang, brak landmarków. AI agenci owinięci wokół niego, tłumaczący axe-style findings na ładniejszy output. Ten sam regex matching w środku, ładniejszy syntax na wierzchu. Kolejny wrapper na axe-core.

Skasowałem go.

Regex nie wie, że `onClick={handler}` bez `onKeyDown={handler}` rozwala keyboard userów. AI czytający JSX wie. To był pivot. AI specjaliści czytają source przez Read/Grep/Glob i piszą findings własnymi słowami. Statyczne reguły zostają deterministycznym fallbackiem dla CI, nie głównym flow. Dwie warstwy, dwa różne zadania.

Dlatego V0.2 wygląda tak, jak wygląda. To nie jest "cały tool". To deterministyczna podłoga toola, którego warstwa odkrywania jest AI. CI-friendly, szybka, przewidywalna. Puszczasz na każdym commicie i mówi Ci, co jest wyrenderowane źle. Nie powie Ci, co jest źle strukturalnie w source.

Pełna historia pivotu to osobny post. Na teraz: V0.2 to to, co zostawiasz, kiedy zdecydujesz, że AI jest warstwą odkrywania, a nie wrapperem.

## Architektura V0.2

V0.2 public ma dwie ścieżki. Obie deterministyczne. Obie szybkie. Żadna AI.

Statyczny analizator TypeScript pattern-matchuje pliki HTML i CSS source. Brakujące atrybuty alt na img, brak html lang, brak ról landmarków, redundantne role ARIA. Rzeczy, które znajdziesz regexem na tekście source. Zero wywołań LLM, zero tokenów, sub-sekundowo na repo wielkości portfolio. CI-friendly z założenia.

Ścieżka dynamiczna puszcza Playwrighta na dev serwer, potem axe-core w realnej przeglądarce. Computed contrast ratio, focus indicators, keyboard-reachable controls, ARIA w wyrenderowanym DOM. Łapie to, co faktycznie shippuje się do userów, nie tylko to, co jest w source. Wolniejsza od statycznej (około 2 sekundy dla audytu 4 routów), ale widzi to, czego regex nie policzy.

```mermaid
graph LR
    A[Project Source] --> B[Static TS Analyzer]
    A --> C[Dev Server]
    C --> D[Dynamic Playwright + axe]
    B --> E[Findings Merge]
    D --> E
    E --> F[Markdown Reports]
```

Obie ścieżki emitują ten sam kształt `WcagFinding`: `ruleId`, `file:line` (albo `url:selector`), severity, WCAG SC reference, sugerowany fix. Orchestrator merguje je, deduplikuje po `(ruleId, file:line, url)`, scoruje wg modelu kar (-15 critical, -10 serious, -5 moderate, -2 minor, podłoga 0) i emituje grade A-F.

To jest deterministyczna podłoga. To nie jest nic. To jest po prostu za mało.

## Liczby z baseline V0.2

Co V0.2 zobaczył na portfolio.sdet.it.

| Źródło                      | Findings | Real | Notatki                                                       |
| --------------------------- | -------: | ---: | ------------------------------------------------------------- |
| Static TS                   |        2 |    0 | Oba `playwright-report/index.html` (artefakt testów)          |
| Dynamic axe                 |        4 |    3 | `.hero-load` + `.license` x 3 contrast                        |
| Dynamic focus-visibility    |        4 |    0 | Wszystkie `<astro-dev-toolbar>` (Astro dev injection)         |
| **Total**                   |   **10** | **3** |                                                               |

Severity breakdown: 0 critical, 10 serious, 0 moderate, 0 minor. Score: 0/100 (penalty 100, podłoga). Grade: F. Wall time: 2.14 sekundy.

Failing grade z czystym raportem na dziesięć findings. Wygląda dramatycznie. Rzeczywistość jest nudniejsza: 7 z 10 to noise. Artefakty testów i dev-mode injecty, które nigdy nie shippują na produkcję.

Plik `playwright-report/index.html` siedzi w `playwright-report/`, bo Playwright HTML reporter zapisuje tam po każdym test runie. To autogenerowany HTML, nigdy nie deployowany, ale statyczny analizator skanuje go, bo siedzi w repo. Naprawialne `.wcagignore` globem, na roadmapie.

Findings z `<astro-dev-toolbar>` to injection toolbara dev-mode od Astro. Istnieje tylko, kiedy puszczasz `astro dev`. Nigdy nie shippuje na produkcję. Re-run na buildzie z `pnpm preview` je eliminuje. Udokumentowany szum.

Uczciwa liczba po odsianiu noise: 3 realne bugi. Wszystkie kontrast. Wszystkie w light theme. Wszystkie prześlizgnęłyby się obok V0.2, gdyby nie ścieżka dynamiczna renderująca stronę w przeglądarce. Statyczny złapał zero z nich, bo wszystkie trzy chowają się za indirectionem CSS variables i design tokens.

Trzy findings na 23-stronicowym site. Czysto. Ale to dlatego, że static plus dynamic łapie tylko to, co jest w HTML i CSS. Bugi w source nadal siedzą w source.

## 3 realne findings

Trzy realne bugi. Ten sam kształt root cause: misuse design tokenów chowający się za zmiennymi CSS. Statyczny ich nie zobaczył. Dynamic zobaczył.

Bug 1 siedzi w `src/components/sections/Hero.astro` linia 17. Linia hero load ("KERNAL READY") renderuje `#38935a` na białym przy 3.82:1. WCAG 1.4.3 chce minimum 4.5:1 dla normalnego tekstu. Source używa `var(--color-accent)` plus `opacity: 0.85`, czego analiza statyczna nie pomnoży. Dynamic axe puścił stronę, dostał computed color, policzył ratio, oflagował.

Bugi 2-4 to ten sam root cause na trzech project cards. Plik: `src/components/cards/ProjectCard.astro` linie 199-202. Badge'e `.license` (AGPL-3.0, MIT featured, MIT trzecia karta) renderują `#22c55e` na `#fafafa` przy 2.18:1. To dobrze poniżej podłogi 3:1 dla non-text contrast, nie mówiąc o 4.5:1 dla tekstu.

Jeden design token (`--color-accent-muted`, rozwiązujący się do za jasnego zielonego w light theme) zasila trzy wyrenderowane komponenty. Analiza statyczna widzi trzy osobne selektory `.license` z `var(--color-accent-muted)` i nie potrafi pójść za indirectionem, żeby wiedzieć, jaka wartość RGB wychodzi z drugiej strony. Dynamic axe chodzi po każdej stronie projektu, liczy faktyczny renderowany kolor, raportuje trzy findings. Trzy widoczne symptomy, jeden root cause, ale V0.2 raportuje je jako trzy osobne findings, bo tyle zobaczy.

To są wygrane testowania dynamicznego. Przegrane przyjdą jutro.

## Czego V0.2 nie widzi

I to jest ten haczyk. W tym codebase siedzi jeszcze dziewięć produkcyjnych bugów WCAG. V0.2 ich nie znajdzie. Ani w tym runie, ani w dziesięciu kolejnych.

Sneak preview bez spoilerów (te są na Część 2):

- aria-label misuse na elementach semantycznych: `aria-label="C64 boot"` na `<p>`, `aria-label="Tech stack"` na `<ul>`, `aria-label="Key metrics"` na `<dl>`. Trzy instancje, trzy różne elementy, jeden anti-pattern.
- Dziury w hierarchii nagłówków w MDX content.
- Issue kontrastu na poziomie tokena, dotykający sześciu komponentów na dark theme przez jedną wartość `--color-text-subtle`.
- Filter pille bez stanu toggle (`aria-pressed`).
- Parę pomniejszych rzeczy.

Dlaczego statyczny tego nie zobaczy: wartości aria-label to stringi. Regex matchuje obecność atrybutu, nie to, czy atrybut jest sensowny na tym elemencie. Token misuse propaguje się przez `var()` indirection w wielu plikach. Stan toggle wymaga rozumienia kontekstu interakcji, nie tekstu.

Dlaczego dynamic też tego nie zobaczy: większość misuse aria jest w JSX albo Astro template'ach, a wyrenderowany HTML nadal ma ten atrybut (axe sprawdza obecność, nie sensowność). Sweep tylko po odwiedzonych stronach pomija wewnętrzne route'y artykułów. Obecność atrybutu ARIA nie równa się semantycznej poprawności, a to jest semantyczna decyzja, nie pattern.

To są wygrane AI specjalistów czytających source. Otwierają plik, rozpoznają, że `<p>` ma już implicit role i widoczny tekst, i flagują aria-label jako anti-pattern przesłaniający override. Idą za łańcuchem `var()` przez pliki. Rozumują o stanie toggle.

Jutro Część 2.

## Uczciwy cliffhanger

Gdyby V0.2 było całym toolem, shippowałbym jako "użyj tego w CI, resztę audytuj manualnie". Tam się większość statycznych toolkitów zatrzymuje i to jest sensowne miejsce zatrzymania. Trzy produkcyjne bugi złapane na żywym portfolio to realna wartość. CI failujące na kontraście 3.82:1 vs wymagane 4.5:1 to dokładnie ten guardrail, dla którego deterministyczna warstwa istnieje.

Ale to za mało. Dziewięć kolejnych bugów siedzi teraz w source i żadna ilość re-runów V0.2 ich nie wyciągnie. Inna powierzchnia skanu, inna warstwa. Statyczny pyta "co jest renderowane". Dynamic pyta "co jest computed". Żaden nie pyta "co jest intencją".

Jutro Część 2: to samo portfolio, V0.3 public dodaje 5 AI specjalistów czytających source przez Read/Grep/Glob. Spoiler: 16 unique findings złapanych przez dwa runy audytów. Triangulacja, nie regression. #FromTheField.

## Repo i clone

Repo: [github.com/darco81/sdet-wcag-toolkit](https://github.com/darco81/sdet-wcag-toolkit) (AGPL-3.0).

Quick start:

```bash
pnpm install
pnpm -r build
wcag-toolkit audit . --url http://localhost:4321
```

To daje Ci baseline V0.2 (static + dynamic). Dla pełnej warstwy AI V0.3 otwórz Claude Code w korzeniu projektu i puść `/wcag:audit`. Skill dispatch'uje 5 specjalistów równolegle przez Task tool, merguje findings z deterministycznym backbone, emituje grade A-F.

Pro tier na [sdet.it/services](https://sdet.it/services): multi-runtime (Claude Code, OpenCode, Ollama lokalnie dla wrażliwych client repo), auto-fix engine, niche specjaliści.

Seria leci jutro. Część 2: triangulacja.

---


---

# How I Do It - Methodology (10)

Working methodology reference: bug reporting, test architecture, test case,
test plan, Playwright class patterns.


## Bug Reporting Methodology

**URL:** https://portfolio.sdet.it/how-i-do-it/bug-reporting
**Published:** 2025-03-10
**Language:** en
Tags: qa, bug-reporting, methodology

How I report bugs - systematic approach with reproducible steps, impact assessment, and clear acceptance criteria.

### An important aspect of software development is proper bug reporting

**[ECOMMERCE] Logical error when canceling an order: exception when attempting to refund the total amount ("Refund value cannot exceed order amount")**

When performing the order cancellation procedure with the "Refund All" option, an exception appears in the system logs indicating an attempt to increase the refund amount above the original order value. Despite this, the operation in the user interface completes successfully and no error message is presented to the customer.

**Steps to reproduce:**

1. Go to the _Orders_ section.
2. Cancel an order using the "Refund All" option.
3. Observe the application behavior (no errors in the interface) and system logs (exception).

**Expected behavior:**

- The system should properly handle the order refund logic.
- The refund value should not exceed the original order amount.
- The operation should complete without any exceptions in the system logs.

**Actual behavior:**

- The cancellation operation with the "Refund All" option completes successfully in the interface, but an exception is generated in the logs.

**Error logs:**

```
ERROR [RefundService] (transaction-handler-thread): Transaction error: Value cannot exceed original order amount
at com.onlineshop.payment.RefundProcessor.validateAmount(RefundProcessor.java:142)
at com.onlineshop.payment.RefundProcessor.processFullRefund(RefundProcessor.java:89)
```

---

**[INVENTORY] Product status in the Warehouse panel doesn't update after delivery approval**

In the **Warehouse** panel on the product list, after approving a delivery, the product status remains as "awaiting delivery" and doesn't update to the correct status. This issue hampers warehouse inventory monitoring and can mislead warehouse staff.

**Steps to reproduce:**

1. Go to the **Warehouse** panel.
2. Approve a product delivery, which should change the product status.
3. Check the product list - notice that the status remains set as "awaiting delivery," even though the delivery receipt operation has been completed.

**Expected behavior:**
After approving a delivery, the product status should be updated to the appropriate state (e.g., "available," "in stock," or another appropriate status) to clearly indicate the current availability of the product.

**Actual behavior:**
The product status doesn't automatically change after delivery approval, causing inconsistency between the actual warehouse state and the information displayed in the system.

---

**[USER MANAGEMENT] Context menu in the Users section doesn't disappear after clicking on the account activation option**

In the **User Management** view, the context menu expands correctly after right-clicking on a user profile. The problem is that after clicking on the account activation option (both for a single user and when trying to activate multiple accounts simultaneously), the context menu doesn't close. It should automatically disappear when the user clicks outside the menu or selects an option from the menu.

**Steps to reproduce:**

1. Go to the **Users** section and open the **User Management** view.
2. Right-click on a user profile to invoke the context menu - the menu expands.
3. Click on the "Activate account" option in the context menu.
4. Observe that the context menu doesn't disappear, even though it should close.

**Expected behavior:**
The context menu should automatically disappear after performing the following actions:

- Clicking on any menu option (e.g., "Activate account," "Block access").
- Clicking outside the menu.
- Pressing the Escape key.

**Actual behavior:**
The context menu remains visible and doesn't respond to clicks on menu options or clicks outside the menu, causing administrator confusion and requiring additional actions to close the menu.

### It goes without saying that screenshots, short videos, console output, or network data from DevTools should be attached to the report.

---


## Metodologia raportowania bugów

**URL:** https://portfolio.sdet.pl/how-i-do-it/bug-reporting
**Published:** 2025-03-10
**Language:** pl
Tags: qa, bug-reporting, methodology

Jak raportuję bugi - systematyczne podejście z krokami do reprodukcji, oceną wpływu i jasnymi kryteriami akceptacji.

### Istotnym aspektem rozwoju oprogramowania jest właściwe raportowanie błędów

**[ECOMMERCE] Błąd logiczny przy anulowaniu zamówienia: wyjątek przy próbie zwrotu całkowitej kwoty ("Wartość refundacji nie może przekraczać kwoty zamówienia")**

Podczas wykonywania procedury anulowania zamówienia z opcją "Zwróć całość", w logach systemu pojawia się wyjątek, który wskazuje na próbę zwiększenia kwoty refundacji powyżej pierwotnej wartości zamówienia. Pomimo tego, operacja w interfejsie użytkownika kończy się poprawnie i żaden komunikat błędu nie jest prezentowany klientowi.

**Kroki do odtworzenia:**

1. Wejdź do sekcji _Zamówienia_
2. Wykonaj anulowanie z opcją "Zwróć całość".
3. Obserwuj zachowanie aplikacji (brak błędów w interfejsie) oraz logi systemowe (wyjątek).

**Oczekiwane zachowanie:**

- System powinien odpowiednio obsługiwać logikę refundacji kwoty zamówienia.
- Wartość refundacji nie powinna przekraczać oryginalnej kwoty zamówienia.
- Operacja powinna zakończyć się bez wyjątku w logach systemu.

**Rzeczywiste zachowanie:**

- Operacja anulowania z opcją "Zwróć całość" kończy się poprawnie w interfejsie, ale w logach generowany jest wyjątek.

**Logi błędu:**

```
ERROR [RefundService] (transaction-handler-thread): Transaction error: Value cannot exceed original order amount
at com.onlineshop.payment.RefundProcessor.validateAmount(RefundProcessor.java:142)
at com.onlineshop.payment.RefundProcessor.processFullRefund(RefundProcessor.java:89)
```

---

**[INVENTORY] Status produktu w panelu Magazyn nie aktualizuje się po zatwierdzeniu dostawy**

W panelu **Magazyn** na liście produktów, po zatwierdzeniu dostawy, status produktu pozostaje jako "oczekiwanie na dostawę" i nie aktualizuje się na prawidłowy status. Problem ten utrudnia monitorowanie stanu magazynu i może wprowadzać w błąd pracowników magazynu.

**Kroki do odtworzenia:**

1. Przejdź do panelu **Magazyn**.
2. Zatwierdź dostawę produktu, co powinno zmienić status produktu.
3. Sprawdź listę produktów - zauważ, że status pozostaje ustawiony jako "oczekiwanie na dostawę", mimo że operacja przyjęcia dostawy została zakończona.

**Oczekiwane zachowanie:**
Po zatwierdzeniu dostawy status produktu powinien zostać zaktualizowany do właściwego stanu (np. "dostępny", "w magazynie" lub inny odpowiedni status) w celu jednoznacznego wskazania aktualnej dostępności produktu.

**Rzeczywiste zachowanie:**
Status produktu nie zmienia się automatycznie po zatwierdzeniu dostawy, co powoduje niespójność między rzeczywistym stanem magazynowym a informacją wyświetlaną w systemie.

---

**[USER MANAGEMENT] Menu kontekstowe w sekcji Użytkownicy nie znika po kliknięciu w opcję aktywacji konta**

W widoku **Zarządzania użytkownikami** menu kontekstowe rozwija się poprawnie po kliknięciu prawym przyciskiem myszy na profilu użytkownika. Problem polega na tym, że po kliknięciu w opcję aktywacji konta (zarówno dla pojedynczego użytkownika, jak i przy próbie aktywacji wielu kont jednocześnie) menu kontekstowe nie zamyka się. Powinno ono automatycznie zniknąć, gdy użytkownik kliknie poza menu, lub gdy wybierze opcję z menu.

**Kroki do odtworzenia:**

1. Przejdź do sekcji **Użytkownicy** i otwórz widok **Zarządzanie użytkownikami**.
2. Kliknij prawym przyciskiem myszy na profilu użytkownika, aby wywołać menu kontekstowe - menu się rozwija.
3. Kliknij w opcję "Aktywuj konto" w menu kontekstowym.
4. Obserwuj, że menu kontekstowe nie znika, mimo że powinno zostać zamknięte.

**Oczekiwane zachowanie:**
Menu kontekstowe powinno automatycznie znikać po wykonaniu następujących akcji:

- Kliknięciu w dowolną opcję menu (np. "Aktywuj konto", "Zablokuj dostęp").
- Kliknięciu poza menu.
- Naciśnięciu klawisza Escape.

**Rzeczywiste zachowanie:**
Menu kontekstowe pozostaje widoczne i nie reaguje na kliknięcia w opcje menu ani na kliknięcia poza menu, co powoduje dezorientację administratora i wymaga dodatkowych działań w celu zamknięcia menu.

### Wiadomą rzeczą jest dołączenie do zgłoszenia zrzutu ekranu czy też krótkiego video / zawartości konsoli czy network z devtools.

---


## Playwright Class Patterns

**URL:** https://portfolio.sdet.it/how-i-do-it/playwright-class
**Published:** 2025-03-12
**Language:** en
Tags: playwright, cdat, patterns, typescript

How I structure Playwright test classes - Page Object reinvented, 4-layer separation (precursor to CDAT Pattern).

## Advanced Object-Oriented Approach to Automated Tests with TypeScript

## Table of contents

1. [Introduction](#introduction)
2. [Solution Architecture](#solution-architecture)
3. [Design Patterns](#design-patterns)
4. [Key Components](#key-components)
5. [Practical Application](#practical-application)
6. [Type and Interface Management](#type-and-interface-management)
7. [Advanced TypeScript Mechanisms](#advanced-typescript-mechanisms)
8. [Summary](#summary)

## Introduction

The presented solution demonstrates an advanced approach to creating automated tests using TypeScript and Playwright. The main goal was to create a reusable, easy-to-maintain, and extensible architecture for testing user interfaces, with special emphasis on data filtering operations in web applications.

The framework utilizes modern design practices such as:

- Object-oriented programming
- Strategy design pattern
- Abstraction and separation of concerns
- Interfaces and generic classes
- Static typing

## Solution Architecture

The solution is based on a multi-layered architecture that separates:

1. **Interfaces** - defining contracts for implementing classes
2. **Abstract classes** - providing base functionality
3. **Concrete implementations** - specific to tested views
4. **Action classes** - implementing UI interaction logic

Project structure diagram:

```
├── common/
│   ├── IElementComponents.ts      # Basic interfaces
│   ├── BaseElementComponents.ts   # Abstract class
│   └── elementActions.ts          # Main action class
├── module-specific/
│   ├── components.ts              # Concrete implementation for the module
│   └── test.ts                    # Tests for the given module
└── utils/
    └── index.ts                   # Helper tools
```

## Design Patterns

### 1. Strategy Pattern

The solution relies heavily on the strategy pattern, where different algorithms (strategies) are encapsulated and interchangeable. Implementation example:

```typescript
// Excerpt from a method of the generic action class (T is the class's type parameter).
// Strategies for filling different types of fields
const fillStrategy: Record<
  ElementType,
  (key: T, val: RandomDataValue, elementId: string) => Promise<void>
> = {
  [ElementType.NUMERIC_RANGE]: this.fillNumericRange.bind(this),
  [ElementType.MULTISELECT]: this.fillMultiselect.bind(this),
  [ElementType.TEXT]: this.fillText.bind(this),
  [ElementType.DATE_RANGE]: this.fillDateRange.bind(this),
  // Other strategies...
};

// Using the appropriate strategy
await fillStrategy[type](key, searchValue, elementId);
```
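The excerpt above depends on the surrounding class. As a self-contained sketch of the same dispatch idea, the version below replaces Playwright calls with a log array so it can run anywhere. `FilterFiller`, `FieldType`, and `FieldValue` are illustrative names, not the framework's real ones.

```typescript
enum FieldType {
  TEXT = "TEXT",
  NUMERIC_RANGE = "NUMERIC_RANGE",
  MULTISELECT = "MULTISELECT",
}

type FieldValue = string | string[] | { from: number; to: number };
type FillStrategy = (elementId: string, value: FieldValue) => Promise<void>;

class FilterFiller {
  // Stands in for real browser interaction in this sketch.
  public readonly log: string[] = [];

  // One strategy per field type; Record forces every enum member to be covered.
  private readonly strategies: Record<FieldType, FillStrategy> = {
    [FieldType.TEXT]: async (id, value) => {
      if (typeof value !== "string") throw new Error(`TEXT expects a string for ${id}`);
      this.log.push(`${id}: type "${value}"`);
    },
    [FieldType.NUMERIC_RANGE]: async (id, value) => {
      if (typeof value !== "object" || Array.isArray(value)) {
        throw new Error(`NUMERIC_RANGE expects { from, to } for ${id}`);
      }
      this.log.push(`${id}: range ${value.from}..${value.to}`);
    },
    [FieldType.MULTISELECT]: async (id, value) => {
      if (!Array.isArray(value)) throw new Error(`MULTISELECT expects string[] for ${id}`);
      this.log.push(`${id}: select ${value.join(", ")}`);
    },
  };

  // Callers pick the strategy by type; no switch or if-chain is needed.
  public async fill(type: FieldType, elementId: string, value: FieldValue): Promise<void> {
    await this.strategies[type](elementId, value);
  }
}

async function demoFill(): Promise<string[]> {
  const filler = new FilterFiller();
  await filler.fill(FieldType.TEXT, "customer", "Ada");
  await filler.fill(FieldType.NUMERIC_RANGE, "amount", { from: 10, to: 20 });
  await filler.fill(FieldType.MULTISELECT, "status", ["new", "paid"]);
  return filler.log;
}
```

Typing the map as `Record<FieldType, FillStrategy>` means adding a new enum member without a matching strategy fails at compile time rather than at test time.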

### 2. Template Method Pattern

The abstract class `BaseElementComponents` defines the algorithm skeleton, delegating the implementation of specific steps to subclasses:

```typescript
export abstract class BaseElementComponents<
  T extends string | number,
> implements IElementComponents<T> {
  // Common implementations
  public getDataFieldLocator(dataField: string): Locator {
    return this.page.locator(`[data-field="${dataField}"]`);
  }

  // Abstract methods to be implemented by subclasses
  // (names match the IElementComponents contract)
  abstract getOptionByIndex(index: T): Locator;
  abstract getInputField(elementId: string, type?: ElementType): Locator;
  // Other methods...
}
```

### 3. Inversion of Control and Dependency Injection

Action classes accept dependencies through the constructor, which facilitates testing and increases flexibility:

```typescript
export class TestActions<T extends string | number> {
  constructor(
    private page: Page,
    private elements: IElementComponents<T>,
  ) {}

  // Methods using injected dependencies
}
```
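To show why constructor injection helps, here is a minimal sketch that swaps the real components for a recording fake. It avoids importing Playwright: `ClickableLocator` is a hypothetical stand-in for the one `Locator` method used, and `PanelComponents`, `PanelActions`, and `FakePanelComponents` are illustrative names.

```typescript
// Stand-in for the single Locator method this sketch needs.
interface ClickableLocator {
  click(): Promise<void>;
}

// Contract the action class depends on (a cut-down IElementComponents).
interface PanelComponents {
  mainButton: ClickableLocator;
  getApplyButton(elementId: string): ClickableLocator;
}

// The action class only knows the contract, never a concrete implementation.
class PanelActions {
  constructor(private readonly elements: PanelComponents) {}

  public async openAndApply(elementId: string): Promise<void> {
    await this.elements.mainButton.click();
    await this.elements.getApplyButton(elementId).click();
  }
}

// Fake implementation that records clicks instead of driving a browser.
class FakePanelComponents implements PanelComponents {
  public readonly clicks: string[] = [];
  public readonly mainButton: ClickableLocator = this.record("mainButton");

  public getApplyButton(elementId: string): ClickableLocator {
    return this.record(`apply:${elementId}`);
  }

  private record(name: string): ClickableLocator {
    return {
      click: async () => {
        this.clicks.push(name);
      },
    };
  }
}

async function demoInjection(): Promise<string[]> {
  const fake = new FakePanelComponents();
  const actions = new PanelActions(fake);
  await actions.openAndApply("status");
  return fake.clicks;
}
```

Because `PanelActions` receives its dependency instead of constructing it, the same class can run against a module-specific implementation in a real test and against the fake in a fast unit check.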

## Key Components

### IElementComponents - Base Interface

Defines the basic contract that all UI components must implement:

```typescript
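// ElementType and ElementDefinition are project types declared elsewhere.
import type { Locator } from '@playwright/test';

export interface IElementComponents<T extends string | number> {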
  mainButton: Locator;
  closeIcons: Locator;
  elementDefinitions: Record<T, ElementDefinition>;

  getOptionByIndex(index: T): Locator;
  getApplyButton(elementId: string): Locator;
  getCancelButton(elementId: string): Locator;
  getInputField(elementId: string, type?: ElementType): Locator;
  getRangeInput(elementId: string, type: ElementType): { from: Locator; to: Locator };
  getDataFieldLocator(dataField: string): Locator;
}
```

### BaseElementComponents - Abstract Class

Provides a partial implementation of the interface, leaving specific elements to be implemented by derived classes:

```typescript
export abstract class BaseElementComponents<
  T extends string | number,
> implements IElementComponents<T> {
  abstract mainButton: Locator;
  abstract closeIcons: Locator;
  abstract elementDefinitions: Record<T, ElementDefinition>;

  // Implementation of common methods

  protected constructor(protected page: Page) {}

  public getDataFieldLocator(dataField: string): Locator {
    return this.page.locator(`[data-field="${dataField}"]`);
  }

  // Remaining abstract methods...
}
```

### ModuleSpecificComponents - Concrete Implementation

Implements the abstract base class, providing module-specific selectors and functions:

```typescript
export class ModuleSpecificComponents extends BaseElementComponents<TestElementIndex> {
  constructor(page: Page) {
    super(page);
  }

  // Implementation of module-specific selectors
  public readonly elementDefinitions: Record<TestElementIndex, ElementDefinition> = {
    [TestElementIndex.IDENTIFIER]: {
      label: 'Identifier',
      locator: () => this.identifierElement,
      dataField: 'identifier',
      type: ElementType.TEXT,
    },
    // Other element definitions...
  };

  // Getters for element selectors
  get mainButton(): Locator {
    return this.page.locator(this.MAIN_BUTTON_SELECTOR);
  }

  // Remaining implementations consistent with the interface
}
```

### TestActions - Main Action Class

Central class implementing operations performed on interface elements:

```typescript
export class TestActions<T extends string | number> {
  constructor(
    private page: Page,
    private elements: IElementComponents<T>,
  ) {}

  /**
   * Gets the number of defined elements.
   */
  public getElementCount(): number {
    return Object.keys(this.elements.elementDefinitions).length;
  }

  /**
   * Opens an element and optionally pins it.
   * @param key - Key of the element to open
   * @param pin - Whether the element should be pinned
   */
  public async openElement(key: T, pin: boolean = false): Promise<Locator> {
    // Implementation
  }

  /**
   * Extracts values from a data field based on element type.
   */
  public async extractAllDataValues(elementIndex: T, option: TestOption): Promise<string[]> {
    // Data extraction implementation
  }

  // Other action methods...
}
```

## Practical Application

Example of a test using the created architecture:

```typescript
test('Element1.GivenUserIsOnPage_WhenApplyingFilter_ThenListIsFiltered @regression', async () => {
  // ARRANGE
  const elementIndex = TestElementIndex.IDENTIFIER;
  const searchValue = await testActions.getRandomValue(elementIndex);

  // ACT
  await testActions.applyElementAndCompareLabel(elementIndex, searchValue);
  const result = await testActions.verifyFilteredResults(elementIndex, searchValue);

  // ASSERT
  expect(result).toBeTruthy();
});
```

## Type and Interface Management

### Enumeration Types (Enums)

Define available options and states:

```typescript
export enum ElementType {
  TEXT = 'text',
  MULTISELECT = 'multiselect',
  NUMERIC_RANGE = 'numericRange',
  DATE_RANGE = 'dateRange',
  SWITCH = 'switch',
  STATUS = 'status',
}

export enum TestOption {
  SHORT_TEXT = 'SHORT_TEXT',
  FULL_TEXT = 'FULL_TEXT',
  WITH_EMPTY = 'WITH_EMPTY',
}

export enum RangeOption {
  Both = 'both',
  FromOnly = 'fromOnly',
  ToOnly = 'toOnly',
}
```

### Type Guards

Ensure safe operations on types:

```typescript
// Complex type
type RandomDataValue = string | { from: number; to: number } | { from: string; to: string } | null;

// Type guards: accept `unknown` and cover both range shapes from RandomDataValue
function isRangeValue(val: unknown): val is { from: string | number; to: string | number } {
  return typeof val === 'object' && val !== null && 'from' in val && 'to' in val;
}

function isStringValue(val: unknown): val is string {
  return typeof val === 'string';
}

// Example usage
if (isRangeValue(value)) {
  // TypeScript knows that value has from and to properties
  console.log(value.from, value.to);
} else if (isStringValue(value)) {
  // TypeScript knows that value is a string
  console.log(value.toUpperCase());
}
```
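
The union above contains two range shapes, so a guard that only checks for `from` and `to` cannot tell them apart. A minimal sketch of one way to separate them, assuming both bounds of a range always share a type. The aliases `NumericRange` and `TextRange` and the function `describeValue` are hypothetical names, not part of the original code:

```typescript
type NumericRange = { from: number; to: number };
type TextRange = { from: string; to: string };
type RandomDataValue = string | NumericRange | TextRange | null;

// Assumption: `from` and `to` always have the same type, so checking `from` is enough.
function isNumericRange(val: RandomDataValue): val is NumericRange {
  return typeof val === 'object' && val !== null && typeof val.from === 'number';
}

function describeValue(val: RandomDataValue): string {
  if (val === null) return 'empty';
  if (typeof val === 'string') return `text:${val}`;
  if (isNumericRange(val)) return `numbers:${val.to - val.from}`;
  // Only TextRange remains here, so both bounds are strings.
  return `dates:${val.from}..${val.to}`;
}

console.log(describeValue({ from: 1, to: 5 }));
console.log(describeValue({ from: '2025-01-01', to: '2025-01-31' }));
```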

## Advanced TypeScript Mechanisms

### Generic Types

The solution makes heavy use of generic types to provide both flexibility and type safety:

```typescript
export class TestActions<T extends string | number> {
  // T is a generic parameter constrained to string or number
  // This allows using enums as indexes for objects
}
```
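
A short sketch of what the `T extends string | number` constraint buys in practice: a string enum can serve as the key type of a `Record`, and the compiler rejects keys outside that enum. `InvoiceFilter` and `LabelRegistry` are hypothetical names used only for this illustration:

```typescript
// Hypothetical enum and registry, used only to illustrate the constraint.
enum InvoiceFilter {
  InvoiceNumber = 'invoiceNumber',
  Status = 'status',
}

class LabelRegistry<T extends string | number> {
  constructor(private readonly labels: Record<T, string>) {}

  // Only keys of type T are accepted, so a misspelled key fails at compile time.
  public labelFor(key: T): string {
    return this.labels[key];
  }
}

const registry = new LabelRegistry<InvoiceFilter>({
  [InvoiceFilter.InvoiceNumber]: 'Invoice number',
  [InvoiceFilter.Status]: 'Status',
});

console.log(registry.labelFor(InvoiceFilter.Status));
```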

### Type Mapping and Records

Used to create object types with dynamic keys:

```typescript
// Dynamic type mapping
const rangeExtractors: { [key in ElementType]?: (values: string[], side: RangeOption) => RandomDataValue } = {
  [ElementType.NUMERIC_RANGE]: this.getNumericRangeValue.bind(this),
  [ElementType.DATE_RANGE]: this.getDateRangeValue.bind(this),
};

// Typical records
private labelFormatters: Record<ElementType, (label: string, value: RandomDataValue) => string> = {
  [ElementType.TEXT]: (label, value) => `${label}:Contains "${value}"`,
  [ElementType.NUMERIC_RANGE]: (label, value) => {
    // Implementation of formatting for a numeric range
  },
  // Other formatters...
};
```
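
The two declarations above differ in one practical way: `Record<ElementType, ...>` must cover every enum member, while the optional mapped type may cover only some, so its lookups can be `undefined`. A self-contained sketch of that difference follows; `FieldKind`, `formatters`, and `rangeParsers` are hypothetical names, and the `from-to` string format is an assumption made for the example:

```typescript
enum FieldKind {
  Text = 'text',
  NumericRange = 'numericRange',
  DateRange = 'dateRange',
}

// Record<FieldKind, ...> must list every enum member, or compilation fails.
const formatters: Record<FieldKind, (label: string, value: string) => string> = {
  [FieldKind.Text]: (label, value) => `${label}:Contains "${value}"`,
  [FieldKind.NumericRange]: (label, value) => `${label}:${value}`,
  [FieldKind.DateRange]: (label, value) => `${label}:${value}`,
};

// The optional mapped type may cover only some members.
const rangeParsers: { [K in FieldKind]?: (raw: string) => [string, string] } = {
  [FieldKind.NumericRange]: (raw) => {
    const [from = '', to = ''] = raw.split('-');
    return [from, to];
  },
};

// Lookups on the partial map have to handle the missing case.
const parseText = rangeParsers[FieldKind.Text];
const parsedText = parseText ? parseText('1-5') : null;
const parsedRange = rangeParsers[FieldKind.NumericRange]?.('1-5') ?? null;
const label = formatters[FieldKind.Text]('Identifier', 'INV');

console.log(parsedText, parsedRange, label);
```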

### Destructuring and Spread

Used for readable object manipulation:

```typescript
// Destructuring
const { searchArea, applyButton } = await this.getElementLocators(elementId, type);

// Spread operator
const filteredTexts = Array.from(new Set([...existingTexts, ...additionalTexts]));
```

## Summary

The presented solution demonstrates advanced programming skills in:

1. **Object-oriented design** - proper application of inheritance, interfaces, and abstraction
2. **Typing** - effective use of TypeScript's type system to ensure type safety
3. **Design patterns** - implementation of recognized patterns to increase code flexibility and reusability
4. **Clean Code** - code structuring according to SOLID and DRY principles
5. **Testing** - tests that are readable, concise, and easy to maintain

This solution can be easily extended with new modules and element types while maintaining architectural consistency and high code quality.

---


## Wzorce klas w Playwright

**URL:** https://portfolio.sdet.pl/how-i-do-it/playwright-class
**Published:** 2025-03-12
**Language:** pl
Tags: playwright, cdat, patterns, typescript

Jak strukturalizuję klasy testów Playwright - Page Object na nowo, 4-warstwowa separacja (prekursor wzorca CDAT).

## Zaawansowane Podejście Obiektowe w Testach Automatycznych z TypeScript

## Spis treści

1. [Wprowadzenie](#wprowadzenie)
2. [Architektura Rozwiązania](#architektura-rozwiązania)
3. [Wzorce Projektowe](#wzorce-projektowe)
4. [Kluczowe Komponenty](#kluczowe-komponenty)
5. [Praktyczne Zastosowanie](#praktyczne-zastosowanie)
6. [Zarządzanie Typami i Interfejsami](#zarządzanie-typami-i-interfejsami)
7. [Zaawansowane Mechanizmy TypeScript](#zaawansowane-mechanizmy-typescript)
8. [Podsumowanie](#podsumowanie)

## Wprowadzenie

Przedstawione rozwiązanie demonstruje zaawansowane podejście do tworzenia testów automatycznych z wykorzystaniem TypeScript i Playwright. Głównym celem było stworzenie reużywalnej, łatwej w utrzymaniu i rozszerzalnej architektury do testowania interfejsu użytkownika, ze szczególnym uwzględnieniem operacji filtrowania danych w aplikacjach webowych.

Framework wykorzystuje nowoczesne praktyki projektowe, takie jak:

- Programowanie zorientowane obiektowo
- Wzorzec projektowy strategii
- Abstrakcja i separacja odpowiedzialności
- Interfejsy i klasy generyczne
- Typowanie statyczne

## Architektura Rozwiązania

Rozwiązanie opiera się na wielowarstwowej architekturze, która oddziela:

1. **Interfejsy** - definiujące kontrakty dla klas implementujących
2. **Klasy abstrakcyjne** - dostarczające bazową funkcjonalność
3. **Konkretne implementacje** - specyficzne dla testowanych widoków
4. **Klasy akcji** - implementujące logikę interakcji z UI

Diagram struktury projektu:

```
├── common/
│   ├── IElementComponents.ts      # Interfejsy podstawowe
│   ├── BaseElementComponents.ts   # Klasa abstrakcyjna
│   └── elementActions.ts          # Główna klasa akcji
├── module-specific/
│   ├── components.ts              # Konkretna implementacja dla modułu
│   └── test.ts                    # Testy dla danego modułu
└── utils/
    └── index.ts                   # Narzędzia pomocnicze
```

## Wzorce Projektowe

### 1. Wzorzec Strategii

Rozwiązanie intensywnie korzysta z wzorca strategii, gdzie różne algorytmy (strategie) są hermetyzowane i mogą być wymieniane. Przykład implementacji:

```typescript
// Strategie wypełniania różnych typów pól
const fillStrategy: Record<
  ElementType,
  (key: T, val: RandomDataValue, elementId: string) => Promise<void>
> = {
  [ElementType.NUMERIC_RANGE]: this.fillNumericRange.bind(this),
  [ElementType.MULTISELECT]: this.fillMultiselect.bind(this),
  [ElementType.TEXT]: this.fillText.bind(this),
  [ElementType.DATE_RANGE]: this.fillDateRange.bind(this),
  // Inne strategie...
};

// Użycie odpowiedniej strategii
await fillStrategy[type](key, searchValue, elementId);
```

### 2. Wzorzec Szablonowy (Template Method)

Klasa abstrakcyjna `BaseElementComponents` definiuje szkielet algorytmu, delegując implementację konkretnych kroków do podklas:

```typescript
export abstract class BaseElementComponents<
  T extends string | number,
> implements IElementComponents<T> {
  // Wspólne implementacje
  public getDataFieldLocator(dataField: string): Locator {
    return this.page.locator(`[data-field="${dataField}"]`);
  }

  // Metody abstrakcyjne do implementacji przez podklasy
  abstract getOptionByIndex(index: T): Locator;
  abstract getInputField(elementId: string, type?: ElementType): Locator;
  // Inne metody...
}
```

### 3. Inversion of Control i Wstrzykiwanie Zależności

Klasy akcji przyjmują zależności poprzez konstruktor, co ułatwia testowanie i zwiększa elastyczność:

```typescript
export class TestActions<T extends string | number> {
  constructor(
    private page: Page,
    private elements: IElementComponents<T>,
  ) {}

  // Metody wykorzystujące wstrzyknięte zależności
}
```

## Kluczowe Komponenty

### IElementComponents - Interfejs Bazowy

Definiuje podstawowy kontrakt, który muszą implementować wszystkie komponenty UI:

```typescript
export interface IElementComponents<T extends string | number> {
  mainButton: Locator;
  closeIcons: Locator;
  elementDefinitions: Record<T, ElementDefinition>;

  getOptionByIndex(index: T): Locator;
  getApplyButton(elementId: string): Locator;
  getCancelButton(elementId: string): Locator;
  getInputField(elementId: string, type?: ElementType): Locator;
  getRangeInput(elementId: string, type: ElementType): { from: Locator; to: Locator };
  getDataFieldLocator(dataField: string): Locator;
}
```

### BaseElementComponents - Klasa Abstrakcyjna

Dostarcza częściową implementację interfejsu, pozostawiając specyficzne elementy do implementacji przez klasy pochodne:

```typescript
export abstract class BaseElementComponents<
  T extends string | number,
> implements IElementComponents<T> {
  abstract mainButton: Locator;
  abstract closeIcons: Locator;
  abstract elementDefinitions: Record<T, ElementDefinition>;

  // Implementacja wspólnych metod

  protected constructor(protected page: Page) {}

  public getDataFieldLocator(dataField: string): Locator {
    return this.page.locator(`[data-field="${dataField}"]`);
  }

  // Pozostałe metody abstrakcyjne...
}
```

### ModuleSpecificComponents - Konkretna Implementacja

Implementuje abstrakcyjną klasę bazową, dostarczając specyficzne dla modułu selektory i funkcje:

```typescript
export class ModuleSpecificComponents extends BaseElementComponents<TestElementIndex> {
  constructor(page: Page) {
    super(page);
  }

  // Implementacja selektorów specyficznych dla modułu
  public readonly elementDefinitions: Record<TestElementIndex, ElementDefinition> = {
    [TestElementIndex.IDENTIFIER]: {
      label: 'Identyfikator',
      locator: () => this.identifierElement,
      dataField: 'identifier',
      type: ElementType.TEXT,
    },
    // Inne definicje elementów...
  };

  // Gettery dla selektorów elementów
  get mainButton(): Locator {
    return this.page.locator(this.MAIN_BUTTON_SELECTOR);
  }

  // Pozostałe implementacje zgodne z interfejsem
}
```

### TestActions - Główna Klasa Akcji

Centralna klasa implementująca operacje wykonywane na elementach interfejsu:

```typescript
export class TestActions<T extends string | number> {
  constructor(
    private page: Page,
    private elements: IElementComponents<T>,
  ) {}

  /**
   * Pobiera liczbę zdefiniowanych elementów.
   */
  public getElementCount(): number {
    return Object.keys(this.elements.elementDefinitions).length;
  }

  /**
   * Otwiera element i opcjonalnie przypina go.
   * @param key - Klucz elementu do otwarcia
   * @param pin - Czy element powinien zostać przypięty
   */
  public async openElement(key: T, pin: boolean = false): Promise<Locator> {
    // Implementacja
  }

  /**
   * Ekstrahuje wartości z pola danych na podstawie typu elementu.
   */
  public async extractAllDataValues(elementIndex: T, option: TestOption): Promise<string[]> {
    // Implementacja ekstrakcji danych
  }

  // Inne metody akcji...
}
```

## Praktyczne Zastosowanie

Przykład testu wykorzystującego stworzoną architekturę:

```typescript
test('Element1.GivenUserIsOnPage_WhenApplyingFilter_ThenListIsFiltered @regression', async () => {
  // ARRANGE
  const elementIndex = TestElementIndex.IDENTIFIER;
  const searchValue = await testActions.getRandomValue(elementIndex);

  // ACT
  await testActions.applyElementAndCompareLabel(elementIndex, searchValue);
  const result = await testActions.verifyFilteredResults(elementIndex, searchValue);

  // ASSERT
  expect(result).toBeTruthy();
});
```

## Zarządzanie Typami i Interfejsami

### Typy Wyliczeniowe (Enums)

Definiują dostępne opcje i stany:

```typescript
export enum ElementType {
  TEXT = 'text',
  MULTISELECT = 'multiselect',
  NUMERIC_RANGE = 'numericRange',
  DATE_RANGE = 'dateRange',
  SWITCH = 'switch',
  STATUS = 'status',
}

export enum TestOption {
  SHORT_TEXT = 'SHORT_TEXT',
  FULL_TEXT = 'FULL_TEXT',
  WITH_EMPTY = 'WITH_EMPTY',
}

export enum RangeOption {
  Both = 'both',
  FromOnly = 'fromOnly',
  ToOnly = 'toOnly',
}
```

### Strażnicy Typów (Type Guards)

Zapewniają bezpieczne operacje na typach:

```typescript
// Typ złożony
type RandomDataValue = string | { from: number; to: number } | { from: string; to: string } | null;

// Strażnicy typu: przyjmują `unknown` i obejmują oba kształty zakresu z RandomDataValue
function isRangeValue(val: unknown): val is { from: string | number; to: string | number } {
  return typeof val === 'object' && val !== null && 'from' in val && 'to' in val;
}

function isStringValue(val: unknown): val is string {
  return typeof val === 'string';
}

// Przykład użycia
if (isRangeValue(value)) {
  // TypeScript wie, że value ma właściwości from i to
  console.log(value.from, value.to);
} else if (isStringValue(value)) {
  // TypeScript wie, że value jest stringiem
  console.log(value.toUpperCase());
}
```

## Zaawansowane Mechanizmy TypeScript

### Typy Generyczne

Rozwiązanie intensywnie wykorzystuje typy generyczne do zapewnienia elastyczności i bezpieczeństwa typów:

```typescript
export class TestActions<T extends string | number> {
  // T jest parametrem generycznym ograniczonym do string lub number
  // Umożliwia to używanie enumów jako indeksów do obiektów
}
```

### Mapowanie Typów i Rekordy

Używane do tworzenia typów obiektów o dynamicznych kluczach:

```typescript
// Dynamiczne mapowanie typów
const rangeExtractors: { [key in ElementType]?: (values: string[], side: RangeOption) => RandomDataValue } = {
  [ElementType.NUMERIC_RANGE]: this.getNumericRangeValue.bind(this),
  [ElementType.DATE_RANGE]: this.getDateRangeValue.bind(this),
};

// Typowe rekordy
private labelFormatters: Record<ElementType, (label: string, value: RandomDataValue) => string> = {
  [ElementType.TEXT]: (label, value) => `${label}:Contains "${value}"`,
  [ElementType.NUMERIC_RANGE]: (label, value) => {
    // Implementacja formatowania dla zakresu liczbowego
  },
  // Inne formatery...
};
```

### Destrukturyzacja i Spread

Używane do czytelnej manipulacji obiektami:

```typescript
// Destrukturyzacja
const { searchArea, applyButton } = await this.getElementLocators(elementId, type);

// Spread operator
const filteredTexts = Array.from(new Set([...existingTexts, ...additionalTexts]));
```

## Podsumowanie

Przedstawione rozwiązanie demonstruje zaawansowane umiejętności programistyczne w zakresie:

1. **Projektowania obiektowego** - prawidłowe zastosowanie dziedziczenia, interfejsów i abstrakcji
2. **Typowania** - skuteczne wykorzystanie systemu typów TypeScript w celu zapewnienia bezpieczeństwa typów
3. **Wzorców projektowych** - implementacja uznanych wzorców w celu zwiększenia elastyczności i reużywalności kodu
4. **Clean Code** - strukturyzacja kodu zgodnie z zasadami SOLID i DRY
5. **Testowania** - testy czytelne, zwięzłe i łatwe w utrzymaniu

Rozwiązanie to można łatwo rozszerzać o nowe moduły i typy elementów, zachowując spójność architektury i wysoką jakość kodu.

---


## Test Architecture

**URL:** https://portfolio.sdet.it/how-i-do-it/test-architecture
**Published:** 2025-03-14
**Language:** en
Tags: architecture, testing, cdat, scalability

How I design test architecture - layers, ownership, and scalability across 3000+ tests in production without a maintenance nightmare.

## Test Architecture Based on a Hybrid Approach of Vertical Slice and Page Object Model Using Playwright

```mermaid
flowchart TD
    Start([Start]) --> ProjectStructure[Project Structure Setup]

    ProjectStructure --> AppDir[app/]
    ProjectStructure --> FeaturesDir[features/]

    AppDir --> ConfigFile[config.ts]
    AppDir --> ComponentsDir[components/]

    ComponentsDir --> InputComp[input/]
    ComponentsDir --> TableComp[table/]
    ComponentsDir --> OtherComp[other components...]

    FeaturesDir --> UserManagement[user-management/]

    UserManagement --> CreateUser[create-user/]
    UserManagement --> UserProfile[user-profile/]

    CreateUser --> CrUserCompFile[components.ts]
    CreateUser --> CrUserActFile[actions.ts]
    CreateUser --> CrUserDataFile[data.ts]
    CreateUser --> CrUserTestFile[test.ts]

    UserProfile --> GeneralInfo[general-info/]
    UserProfile --> PermissionSettings[permission-settings/]

    GeneralInfo --> GenInfoCompFile[components.ts]
    GeneralInfo --> GenInfoActFile[actions.ts]
    GeneralInfo --> GenInfoDataFile[data.ts]
    GeneralInfo --> GenInfoTestFile[test.ts]

    PermissionSettings --> PermCompFile[components.ts]
    PermissionSettings --> PermActFile[actions.ts]
    PermissionSettings --> PermDataFile[data.ts]
    PermissionSettings --> PermTestFile[test.ts]

    subgraph Dependencies [Dependencies between files]
        Components[Components - No dependencies]
        Data[Data - No dependencies]
        Actions[Actions - Depends on Components and Data]
        Tests[Tests - Depends on Components, Data and Actions]
    end

    CrUserTestFile --> TestExecution[Test Execution]
    GenInfoTestFile --> TestExecution
    PermTestFile --> TestExecution

    TestExecution --> ReportGen[Report Generation]
    ReportGen --> End([End])
```

My approach to automated testing is based on a hybrid architecture combining the concepts of Vertical Slice and Page Object Model (POM). It's important to distinguish between these two concepts:

- **Vertical Slice Architecture** is an application architecture pattern that organizes code by business functionality (vertically) rather than by technical layers (horizontally). Traditionally, it's used in application development, not in testing.

- **Page Object Model (POM)** is a classic design pattern in automated testing where each page of the application is represented as a separate class with methods to interact with the elements on that page.

My approach combines these concepts: I organize test code around business functionalities (as in Vertical Slice), but within each functionality, I apply a structure similar to POM with a clear separation of responsibilities.

#### Project Structure

```
├── app/
│   ├── config.ts
│   └── components/
│       ├── input/
│       ├── table/
│       └── ...
└── features/
    └── user-management/
        ├── create-user/
        │   ├── components.ts
        │   ├── actions.ts
        │   ├── data.ts
        │   └── test.ts
        └── user-profile/
            ├── general-info/
            │   ├── components.ts
            │   ├── actions.ts
            │   ├── data.ts
            │   └── test.ts
            └── permission-settings/
                ├── components.ts
                ├── actions.ts
                ├── data.ts
                └── test.ts
```

#### Responsibilities of Individual Files

Each functionality module contains four key types of files with a strict separation of responsibilities:

1. **Components (components.ts)**

- Contains only UI element locators (similar to POM)
- No dependencies on other files
- Example:

  ```typescript
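  import type { Locator, Page } from '@playwright/test';

  export class CreateUserComponents {
    readonly addButton: Locator;
    readonly nameField: Locator;
    readonly saveButton: Locator;

    // Assigned here, not in field initializers, so `page` is set before use.
    constructor(private readonly page: Page) {
      this.addButton = this.page.locator('text="+ Create User"');
      this.nameField = this.page.locator('[data-testid="name-field"]');
      this.saveButton = this.page.locator('text="Save"');
    }
  }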
  ```

2. **Data (data.ts)**

- Contains test data and required types
- No dependencies on other files
- Example:

  ```typescript
  export const UserData = {
    Valid: {
      role: 'admin',
      name: 'John Smith',
      email: 'john.smith@example.com',
      // other data
    },
    Invalid: {
      EmptyName: {
        role: 'admin',
        name: '',
        email: 'john.smith@example.com',
        // other data
      },
      // other sets of invalid data
    },
  };
  ```

3. **Actions (actions.ts)**

- Contains page interactions without assertions (equivalent to methods in POM)
- Depends on Components and Data
- Example:

  ```typescript
  export class CreateUserActions {
    private components: CreateUserComponents;

    constructor(private page: Page) {
      this.components = new CreateUserComponents(page);
    }

    async fillForm(data: typeof UserData.Valid) {
      await this.components.nameField.fill(data.name);
      // filling other fields
    }

    async submitForm() {
      await this.components.saveButton.click();
    }
  }
  ```

4. **Tests (test.ts)**

- Contains test cases with assertions
- Depends on Components, Data, and Actions
- Example:

```typescript
test.describe('CreateUser', () => {
  test.beforeEach(async ({ page }) => {
    await new AuthActions(page).loginAsAdmin();
  });

  test('TC_User_001.GivenValidUserData_WhenSubmitForm_ThenUserIsCreated', async ({ page }) => {
    // Keep the instance: destructuring its methods would detach them from `this`.
    const actions = new CreateUserActions(page);
    await actions.fillForm(UserData.Valid);
    await actions.submitForm();
    await expect(page.locator('.notification')).toContainText('User created successfully');
  });

  test('TC_User_002.GivenMissingName_WhenSubmitForm_ThenErrorDisplayed', async ({ page }) => {
    const actions = new CreateUserActions(page);
    await actions.fillForm(UserData.Invalid.EmptyName);
    await actions.submitForm();
    await expect(page.locator('.field-error')).toBeVisible();
  });
});
```

The example above uses `page.locator` and a hardcoded message for brevity; in real tests I try to avoid both.
Locators belong in component files and data in data files, so each change is made in one place.

**This is an example - real code is in a GitHub repository.**
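
The rule about keeping locators and literal data out of test files could look like this in practice. This is a sketch under stated assumptions: `PageLike` and `LocatorLike` are minimal structural stand-ins for Playwright's `Page` and `Locator` so the snippet runs without a browser, and `notification`, `nameFieldError`, and `UserMessages` are hypothetical names. The selectors and the message text are taken from the test example above:

```typescript
// Minimal structural stand-ins; a real suite would use Playwright's Page and Locator.
interface LocatorLike {
  selector: string;
}

interface PageLike {
  locator(selector: string): LocatorLike;
}

// components.ts: the only place that knows selectors.
class CreateUserComponents {
  readonly notification: LocatorLike;
  readonly nameFieldError: LocatorLike;

  constructor(page: PageLike) {
    this.notification = page.locator('.notification');
    this.nameFieldError = page.locator('.field-error');
  }
}

// data.ts: the only place that knows expected texts.
const UserMessages = {
  created: 'User created successfully',
} as const;

// test.ts would then reference both instead of raw strings, for example:
//   await expect(components.notification).toContainText(UserMessages.created);
const fakePage: PageLike = { locator: (selector) => ({ selector }) };
const components = new CreateUserComponents(fakePage);

console.log(components.notification.selector, UserMessages.created);
```

With this split, a changed selector or message text is edited once, and the assertions in the test file stay untouched.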

#### Differences from Standard POM

In the classic Page Object Model:

- Code is organized around pages/views (e.g., LoginPage, DashboardPage)
- Each Page Object class contains both locators and methods for interaction

In my approach:

- Code is organized around business functionalities (e.g., create-user, user-profile)
- For each functionality, we apply an additional division into components, actions, data, and tests

#### Benefits of this Architecture

1. **Clear Separation of Responsibilities**

- Each file has a single responsibility
- Dependencies flow in one direction

2. **Reusability**

- Components and actions can be reused across multiple tests
- Data patterns can be templated and extended

3. **Maintainability**

- Locator changes need to be updated only in component files
- Business logic changes affect only action files

4. **Readability**

- Tests follow the Given-When-Then pattern
- Descriptive test names provide documentation

5. **Scalability**

- New features can be added without modifying existing ones
- Common patterns can be standardized across the codebase

This hybrid architecture works particularly well for testing complex applications, especially when functionalities have multiple states and variants, such as the user management system described above.

---


## Architektura testów

**URL:** https://portfolio.sdet.pl/how-i-do-it/test-architecture
**Published:** 2025-03-14
**Language:** pl
Tags: architecture, testing, cdat, scalability

Jak projektuję architekturę testów - warstwy, odpowiedzialność, skalowalność na 3000+ testów w produkcji bez koszmaru utrzymania.

## Architektura testów oparta na hybrydowym podejściu Vertical Slice i Page Object Model z wykorzystaniem Playwright

```mermaid
flowchart TD
    Start([Start]) --> ProjectStructure[Project Structure Setup]

    ProjectStructure --> AppDir[app/]
    ProjectStructure --> FeaturesDir[features/]

    AppDir --> ConfigFile[config.ts]
    AppDir --> ComponentsDir[components/]

    ComponentsDir --> InputComp[input/]
    ComponentsDir --> TableComp[table/]
    ComponentsDir --> OtherComp[other components...]

    FeaturesDir --> UserManagement[user-management/]

    UserManagement --> CreateUser[create-user/]
    UserManagement --> UserProfile[user-profile/]

    CreateUser --> CrUserCompFile[components.ts]
    CreateUser --> CrUserActFile[actions.ts]
    CreateUser --> CrUserDataFile[data.ts]
    CreateUser --> CrUserTestFile[test.ts]

    UserProfile --> GeneralInfo[general-info/]
    UserProfile --> PermissionSettings[permission-settings/]

    GeneralInfo --> GenInfoCompFile[components.ts]
    GeneralInfo --> GenInfoActFile[actions.ts]
    GeneralInfo --> GenInfoDataFile[data.ts]
    GeneralInfo --> GenInfoTestFile[test.ts]

    PermissionSettings --> PermCompFile[components.ts]
    PermissionSettings --> PermActFile[actions.ts]
    PermissionSettings --> PermDataFile[data.ts]
    PermissionSettings --> PermTestFile[test.ts]

    subgraph Dependencies [Dependencies between files]
        Components[Components - No dependencies]
        Data[Data - No dependencies]
        Actions[Actions - Depends on Components and Data]
        Tests[Tests - Depends on Components, Data and Actions]
    end

    CrUserTestFile --> TestExecution[Test Execution]
    GenInfoTestFile --> TestExecution
    PermTestFile --> TestExecution

    TestExecution --> ReportGen[Report Generation]
    ReportGen --> End([End])
```

Moje podejście do testów automatycznych opiera się na hybrydowej architekturze łączącej koncepcje Vertical Slice i Page Object Model (POM). Warto rozróżnić te dwa pojęcia:

- **Vertical Slice Architecture** to wzorzec architektury aplikacji organizujący kod według funkcjonalności biznesowych (pionowo), a nie warstw technicznych (poziomo). Tradycyjnie jest stosowany w rozwoju aplikacji, nie w testach.

- **Page Object Model (POM)** to klasyczny wzorzec projektowy w testach automatycznych, gdzie każda strona aplikacji jest reprezentowana jako osobna klasa z metodami do interakcji z elementami tej strony.

Moje podejście łączy te koncepcje: organizuję kod testowy wokół funkcjonalności biznesowych (jak w Vertical Slice), ale wewnątrz każdej funkcjonalności stosuję strukturę podobną do POM z wyraźnym podziałem odpowiedzialności.

#### Struktura projektu

```
├── app/
│   ├── config.ts
│   └── components/
│       ├── input/
│       ├── table/
│       └── ...
└── features/
    └── user-management/
        ├── create-user/
        │   ├── components.ts
        │   ├── actions.ts
        │   ├── data.ts
        │   └── test.ts
        └── user-profile/
            ├── general-info/
            │   ├── components.ts
            │   ├── actions.ts
            │   ├── data.ts
            │   └── test.ts
            └── permission-settings/
                ├── components.ts
                ├── actions.ts
                ├── data.ts
                └── test.ts
```

#### Odpowiedzialność poszczególnych plików

Każdy moduł funkcjonalności zawiera cztery kluczowe typy plików ze ścisłym podziałem odpowiedzialności:

1. **Components (components.ts)**

- Zawiera tylko lokatory elementów UI (podobnie jak w POM)
- Brak zależności od innych plików
- Przykład:

  ```typescript
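  import type { Locator, Page } from '@playwright/test';

  export class CreateUserComponents {
    readonly addButton: Locator;
    readonly nameField: Locator;
    readonly saveButton: Locator;

    // Przypisanie w konstruktorze, nie w inicjalizatorach pól - `page` jest już ustawione.
    constructor(private readonly page: Page) {
      this.addButton = this.page.locator('text="+ Create User"');
      this.nameField = this.page.locator('[data-testid="name-field"]');
      this.saveButton = this.page.locator('text="Save"');
    }
  }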
  ```

2. **Data (data.ts)**

- Zawiera dane testowe i wymagane typy
- Brak zależności od innych plików
- Przykład:

  ```typescript
  export const UserData = {
    Valid: {
      role: 'admin',
      name: 'John Smith',
      email: 'john.smith@example.com',
      // inne dane
    },
    Invalid: {
      EmptyName: {
        role: 'admin',
        name: '',
        email: 'john.smith@example.com',
        // inne dane
      },
      // inne zestawy niepoprawnych danych
    },
  };
  ```

3. **Actions (actions.ts)**

- Zawiera interakcje ze stroną bez asercji (odpowiednik metod w POM)
- Zależy od Components i Data
- Przykład:

  ```typescript
  export class CreateUserActions {
    private components: CreateUserComponents;

    constructor(private page: Page) {
      this.components = new CreateUserComponents(page);
    }

    async fillForm(data: typeof UserData.Valid) {
      await this.components.nameField.fill(data.name);
      // wypełnianie innych pól
    }

    async submitForm() {
      await this.components.saveButton.click();
    }
  }
  ```

4. **Tests (test.ts)**

- Zawiera przypadki testowe z asercjami
- Zależy od Components, Data i Actions
- Przykład:

```typescript
test.describe('CreateUser', () => {
  test.beforeEach(async ({ page }) => {
    await new AuthActions(page).loginAsAdmin();
  });

  test('TC_User_001.GivenValidUserData_WhenSubmitForm_ThenUserIsCreated', async ({ page }) => {
    // Zachowaj instancję: destrukturyzacja metod odłączyłaby je od `this`.
    const actions = new CreateUserActions(page);
    await actions.fillForm(UserData.Valid);
    await actions.submitForm();
    await expect(page.locator('.notification')).toContainText('User created successfully');
  });

  test('TC_User_002.GivenMissingName_WhenSubmitForm_ThenErrorDisplayed', async ({ page }) => {
    const actions = new CreateUserActions(page);
    await actions.fillForm(UserData.Invalid.EmptyName);
    await actions.submitForm();
    await expect(page.locator('.field-error')).toBeVisible();
  });
});
```

Powyższy przykład dla zwięzłości używa `page.locator` i danych wpisanych na sztywno, choć w prawdziwych testach staram się unikać jednego i drugiego.
Lokatory należą do components, a dane do data - dzięki temu każdą zmianę wprowadza się w jednym miejscu.

**To przykład - prawdziwy kod znajduje się w repozytorium na GitHubie.**

#### Różnice względem standardowego POM

W klasycznym Page Object Model:

- Kod organizowany jest wokół stron/widoków (np. LoginPage, DashboardPage)
- Każda klasa Page Object zawiera zarówno lokatory jak i metody do interakcji

W moim podejściu:

- Kod organizowany jest wokół funkcjonalności biznesowych (np. create-user, user-profile)
- Dla każdej funkcjonalności stosujemy dodatkowy podział na components, actions, data i tests

#### Korzyści tej architektury

1. **Wyraźny podział odpowiedzialności**

- Każdy plik ma jedną odpowiedzialność
- Zależności płyną w jednym kierunku

2. **Możliwość ponownego wykorzystania**

- Komponenty i akcje mogą być ponownie używane w wielu testach
- Wzorce danych można templować i rozszerzać

3. **Łatwość utrzymania**

- Zmiany lokatorów muszą być aktualizowane tylko w plikach komponentów
- Zmiany logiki biznesowej wpływają tylko na pliki akcji

4. **Czytelność**

- Testy podążają za wzorcem Given-When-Then
- Opisowe nazwy testów zapewniają dokumentację

5. **Skalowalność**

- Nowe funkcje można dodawać bez modyfikowania istniejących
- Wspólne wzorce można standaryzować w całej bazie kodu

Ta hybrydowa architektura sprawdza się szczególnie dobrze w testowaniu złożonych aplikacji, zwłaszcza gdy mamy do czynienia z funkcjonalnościami posiadającymi wiele stanów i wariantów, jak system zarządzania użytkownikami opisany powyżej.

---


## Test Case Writing

**URL:** https://portfolio.sdet.it/how-i-do-it/test-case
**Published:** 2025-03-16
**Language:** en
Tags: qa, test-case, methodology

How I write test cases - Given/When/Then structure, single responsibility, deterministic assertions.

## Test Cases - Invoice Module (light version)

These test cases cover the most important scenarios for the invoice module:

1. **TC-FAK-001** - Successful creation and approval of an invoice:
   - Tests the basic "happy path" for invoice creation
   - Verifies the correctness of amount and tax calculations
   - Checks state transitions from Draft to Approved invoice

2. **TC-FAK-002** - Validation of incorrect invoice data:
   - Tests data validation mechanisms
   - Verifies if the system correctly detects various types of errors
   - Checks error messages

3. **TC-FAK-003** - Creation of a correction invoice for an issued invoice:
   - Tests the correction invoice process
   - Verifies required information (reason for correction)
   - Checks that the original invoice is marked as corrected and references the correction invoice


## Detailed test cases - Invoice Module (solid version)

#### **TC_FAK_001.GivenUserCreatesNewInvoiceAndFillsAllRequiredFields_WhenUserValidatesAndApproves_ThenInvoiceIsSuccessfullyCreated**

**Title:** Successful creation and approval of a new invoice.

**Description**: The test case covers the complete process of creating a new invoice through the graphical user interface: filling in all required fields, then approving and issuing the invoice.

| **No.** | **Action**                                                                     | **Effect**                                                                                                                                   |
| :-----: | ------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------- |
|  **1**  | Click on the invoice icon in the side menu.                                    | The _Invoices_ section opens                                                                                                                 |
|  **2**  | Click on the _New invoice_ button in the upper right corner of the application | Expect a _snackbar_ to appear informing about the successful creation of the invoice. The system creates a new invoice in the "Draft" state. |
|  **3**  | In the _Customer_ section, click the _Select_ button                           | A dialog box with a list of available customers will open.                                                                                   |
|  **4**  | Select a customer from the list (e.g., "XYZ Company")                          | The customer data will be entered into the invoice form.                                                                                     |
|  **5**  | In the _Invoice data_ section, set the issue date to the current day           | The issue date will be updated.                                                                                                              |
|  **6**  | Set the sale date to the current day                                           | The sale date will be updated.                                                                                                               |
|  **7**  | Set the payment term to 14 days from the issue date                            | The payment term will be updated.                                                                                                            |
|  **8**  | In the _Payment method_ section, select: _Bank transfer_                       | The payment method will be set to "Bank transfer".                                                                                           |
|  **9**  | Click the _Add item_ button                                                    | An invoice item addition window will open.                                                                                                   |
| **10**  | In the add item window, select "Product A" from the list                       | The product will be selected.                                                                                                                |
| **11**  | Set the quantity to "2"                                                        | The quantity will be updated.                                                                                                                |
| **12**  | Set the net unit price to "100.00"                                             | The price will be updated.                                                                                                                   |
| **13**  | Select VAT rate "23%"                                                          | The VAT rate will be updated.                                                                                                                |
| **14**  | Click the _Add_ button                                                         | The item will be added to the invoice. The system will automatically calculate the net, VAT, and gross values.                               |
| **15**  | Click the _Verify_ button                                                      | The system will start the invoice data verification process.                                                                                 |
| **16**  | Wait for the verification to complete                                          | The system will change the invoice state to "Approved" and display an appropriate message.                                                   |
| **17**  | Click the _Issue invoice_ button                                               | The invoice will be issued (state change to "Issued") and will receive a number.                                                             |
| **18**  | Click the _Download PDF_ button                                                | The system will generate a PDF file with the invoice.                                                                                        |

**Expected result**: After performing all steps, an invoice will be created, approved, and issued with correctly calculated amounts (net: 200.00, VAT: 46.00, gross: 246.00). The invoice will be available for download in PDF format.
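
The expected amounts can be checked with a short calculation. The sketch below is only an illustration: the `InvoiceItem` shape, the `calculateTotals` helper, and the per-item rounding policy are assumptions, not part of the tested system. Amounts are kept in integer cents to avoid floating-point noise.

```typescript
// Hypothetical types and helpers for illustration only.
interface InvoiceItem {
  name: string;
  quantity: number;
  netUnitPriceCents: number; // net unit price in cents
  vatRatePercent: number;
}

interface InvoiceTotals {
  netCents: number;
  vatCents: number;
  grossCents: number;
}

function calculateTotals(items: InvoiceItem[]): InvoiceTotals {
  let netCents = 0;
  let vatCents = 0;
  for (const item of items) {
    const itemNet = item.quantity * item.netUnitPriceCents;
    netCents += itemNet;
    // Assumption: VAT is rounded to the nearest cent per item.
    vatCents += Math.round((itemNet * item.vatRatePercent) / 100);
  }
  return { netCents, vatCents, grossCents: netCents + vatCents };
}

function formatAmount(cents: number): string {
  return (cents / 100).toFixed(2);
}

// Data from steps 10-13: "Product A", quantity 2, net price 100.00, VAT 23%.
const totals = calculateTotals([
  { name: "Product A", quantity: 2, netUnitPriceCents: 10000, vatRatePercent: 23 },
]);

console.log(
  `net: ${formatAmount(totals.netCents)}, ` +
    `VAT: ${formatAmount(totals.vatCents)}, ` +
    `gross: ${formatAmount(totals.grossCents)}`
);
// prints "net: 200.00, VAT: 46.00, gross: 246.00"
```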

---

#### **TC_FAK_002.GivenUserStartsEditingExistingInvoice_WhenInvalidDataIsEntered_ThenValidationErrorMessagesAreDisplayed**

**Title:** Validation of incorrect data during invoice editing.

**Description**: The test case verifies validation mechanisms during editing of an existing invoice by entering incorrect data in various fields and checking error messages.

| **No.** | **Action**                                                           | **Effect**                                                                                                                                                                                                                    |
| :-----: | -------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|  **1**  | Click on the invoice icon in the side menu.                          | The _Invoices_ section opens                                                                                                                                                                                                  |
|  **2**  | From the invoice list, select an invoice in the "Draft" state        | The invoice details view will open.                                                                                                                                                                                           |
|  **3**  | Click the _Edit_ button                                              | The invoice form will switch to edit mode.                                                                                                                                                                                    |
|  **4**  | In the _Customer_ section, click the _Clear_ button                  | The customer data will be cleared.                                                                                                                                                                                            |
|  **5**  | In the _Invoice data_ section, set the issue date to the current day | The issue date will be updated.                                                                                                                                                                                               |
|  **6**  | Set the sale date to 40 days before the issue date                   | The sale date will be updated.                                                                                                                                                                                                |
|  **7**  | Set the payment term to the day before the issue date                | The payment term will be updated.                                                                                                                                                                                             |
|  **8**  | Click the _Add item_ button                                          | An invoice item addition window will open.                                                                                                                                                                                    |
|  **9**  | In the add item window, select "Product B" from the list             | The product will be selected.                                                                                                                                                                                                 |
| **10**  | Set the quantity to "-2" (negative quantity)                         | The quantity will be entered.                                                                                                                                                                                                 |
| **11**  | Set the net unit price to "100.00"                                   | The price will be updated.                                                                                                                                                                                                    |
| **12**  | Select VAT rate "23%"                                                | The VAT rate will be updated.                                                                                                                                                                                                 |
| **13**  | Click the _Add_ button                                               | The system will display an error message: "Quantity must be greater than zero".                                                                                                                                               |
| **14**  | Correct the quantity to "2" and click the _Add_ button               | The item will be added to the invoice.                                                                                                                                                                                        |
| **15**  | Click the _Verify_ button                                            | The system will start the invoice data verification process and display errors: "Customer is required", "Sale date cannot be earlier than 30 days from the issue date", "Payment term cannot be earlier than the issue date". |
| **16**  | Correct the customer data by selecting a company from the list       | The customer data will be completed.                                                                                                                                                                                          |
| **17**  | Correct the sale date to the current day                             | The sale date will be updated.                                                                                                                                                                                                |
| **18**  | Correct the payment term to 14 days from the issue date              | The payment term will be updated.                                                                                                                                                                                             |
| **19**  | Click the _Verify_ button again                                      | The system will perform verification and change the invoice state to "Approved".                                                                                                                                              |

**Expected result**: The system will correctly identify all errors in the invoice data and display appropriate validation messages. After correcting the errors, the invoice will successfully pass the verification process and will be approved.
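
The validation rules exercised by this test case could be modelled roughly as follows. This is a minimal sketch: the `InvoiceDraft` shape and the function names are hypothetical, and the 30-day rule is read as "the sale date may be at most 30 days before the issue date". Only the error messages are taken from the steps above.

```typescript
// Hypothetical model of the invoice header; field names are assumptions.
interface InvoiceDraft {
  customer: string | null;
  issueDate: Date;
  saleDate: Date;
  paymentDueDate: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function addDays(date: Date, days: number): Date {
  return new Date(date.getTime() + days * MS_PER_DAY);
}

function daysBetween(from: Date, to: Date): number {
  return Math.round((to.getTime() - from.getTime()) / MS_PER_DAY);
}

// Item-level check from step 13.
function validateQuantity(quantity: number): string | null {
  return quantity > 0 ? null : "Quantity must be greater than zero";
}

// Invoice-level checks from step 15.
function validateInvoice(invoice: InvoiceDraft): string[] {
  const errors: string[] = [];
  if (!invoice.customer) {
    errors.push("Customer is required");
  }
  if (daysBetween(invoice.saleDate, invoice.issueDate) > 30) {
    errors.push("Sale date cannot be earlier than 30 days from the issue date");
  }
  if (invoice.paymentDueDate.getTime() < invoice.issueDate.getTime()) {
    errors.push("Payment term cannot be earlier than the issue date");
  }
  return errors;
}

const issueDate = new Date(Date.UTC(2025, 2, 16));

// Invalid data entered in steps 4-7.
const invalidDraft: InvoiceDraft = {
  customer: null,
  issueDate,
  saleDate: addDays(issueDate, -40),
  paymentDueDate: addDays(issueDate, -1),
};

// Corrected data from steps 16-18.
const correctedDraft: InvoiceDraft = {
  customer: "XYZ Company",
  issueDate,
  saleDate: issueDate,
  paymentDueDate: addDays(issueDate, 14),
};

console.log(validateQuantity(-2));
console.log(validateInvoice(invalidDraft));
console.log(validateInvoice(correctedDraft));
```

Returning a list of messages instead of throwing on the first failure mirrors step 15, where the system reports all three errors at once.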

---

#### **TC_FAK_003.GivenInvoiceIsInIssuedState_WhenUserSelectsCorrectionOption_ThenCorrectionInvoiceIsCreatedWithProperReference**

**Title:** Creating a correction invoice.

**Description**: The test case covers the process of creating a correction invoice for an existing, issued invoice, including specifying the reason for the correction and linking it to the original invoice.

| **No.** | **Action**                                                                               | **Effect**                                                                                                                                                        |
| :-----: | ---------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|  **1**  | Click on the invoice icon in the side menu.                                              | The _Invoices_ section opens                                                                                                                                      |
|  **2**  | From the invoice list, select an invoice in the "Issued" state                           | The invoice details view will open.                                                                                                                               |
|  **3**  | Click the _Create correction_ button                                                     | A form for creating a correction invoice will open with predefined data from the original invoice.                                                                |
|  **4**  | In the _Correction reason_ section, enter "Change in product quantity"                   | The correction reason will be saved.                                                                                                                              |
|  **5**  | Verify that the form displays the number of the invoice being corrected                  | The system should display information about the original invoice number.                                                                                          |
|  **6**  | In the invoice items section, find the "Product A" item                                  | The item will be displayed with data from the original invoice.                                                                                                   |
|  **7**  | Click the _Edit_ button next to the "Product A" item                                     | An item editing window will open.                                                                                                                                 |
|  **8**  | Change the quantity from "2" to "1"                                                      | The quantity will be updated.                                                                                                                                     |
|  **9**  | Click the _Save_ button                                                                  | The item will be updated, and the system will automatically calculate new net, VAT, and gross values, as well as the difference compared to the original invoice. |
| **10**  | Verify that the system displays both the original values and the values after correction | The system should show: Before correction (2 pcs., 200.00 net), After correction (1 pc., 100.00 net), Difference (-1 pc., -100.00 net).                           |
| **11**  | Click the _Verify_ button                                                                | The system will start the correction invoice data verification process.                                                                                           |
| **12**  | Wait for the verification to complete                                                    | The system will change the correction invoice state to "Approved" and display an appropriate message.                                                             |
| **13**  | Click the _Issue correction invoice_ button                                              | The correction invoice will be issued and receive a number in the format "CORRECTION/[original invoice number]".                                                  |
| **14**  | Click the _Download PDF_ button                                                          | The system will generate a PDF file with the correction invoice.                                                                                                  |
| **15**  | Return to the invoice list                                                               | The list of invoices will be displayed.                                                                                                                           |
| **16**  | Check if the original invoice is marked as "Corrected"                                   | The original invoice should be marked as "Corrected" with a reference to the correction invoice.                                                                  |

**Expected result**: After performing all steps, a correction invoice will be created and issued with a correct reference to the original invoice. The correction invoice will contain information about the differences in values before and after the correction. The original invoice will be marked as corrected.
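
The before/after/difference values from step 10 and the numbering rule from step 13 can be expressed as a small calculation. The types, function names, and the sample invoice number `INV/2025/001` below are hypothetical and serve only as an illustration.

```typescript
// Hypothetical snapshot of a single invoice item; amounts in cents.
interface ItemSnapshot {
  quantity: number;
  netCents: number;
}

interface CorrectionLine {
  before: ItemSnapshot;
  after: ItemSnapshot;
  difference: ItemSnapshot;
}

function buildCorrectionLine(before: ItemSnapshot, after: ItemSnapshot): CorrectionLine {
  return {
    before,
    after,
    difference: {
      quantity: after.quantity - before.quantity,
      netCents: after.netCents - before.netCents,
    },
  };
}

// Number format taken from step 13: "CORRECTION/[original invoice number]".
function correctionNumber(originalNumber: string): string {
  return `CORRECTION/${originalNumber}`;
}

// "Product A": quantity changed from 2 to 1 at a net unit price of 100.00.
const line = buildCorrectionLine(
  { quantity: 2, netCents: 20000 },
  { quantity: 1, netCents: 10000 }
);

console.log(line.difference); // { quantity: -1, netCents: -10000 }
console.log(correctionNumber("INV/2025/001")); // sample number is an assumption
```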

## Benefits of a systematic approach to test documentation

Detailed test cases in the GIVEN-WHEN-THEN format, with numbered IDs and steps presented in tables, bring several concrete benefits for the entire development team:

### Data consistency and understanding within the team

1. **Uniform interpretation of requirements** - Precise test cases ensure that all team members understand the functionality in the same way. This eliminates interpretative discrepancies between developers, testers, and business analysts.

2. **Consistent mental model** - The GIVEN-WHEN-THEN format creates a common language and mental model for the entire team, which facilitates communication and reduces errors resulting from misunderstandings.

3. **Clarity of states and transitions between them** - The presentation of the invoice state diagram and related test cases gives a comprehensive picture of the document lifecycle, which ensures a consistent understanding of data flow.
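
The states named in the test cases above (Draft, Approved, Issued, Corrected) can be captured as a small transition table. This is a sketch under assumptions: the event names `verify`, `issue`, and `correct` are invented for illustration, and only the transitions that appear in the three detailed test cases are included.

```typescript
type InvoiceState = "Draft" | "Approved" | "Issued" | "Corrected";
type InvoiceEvent = "verify" | "issue" | "correct"; // hypothetical event names

// Allowed transitions, derived from the test steps above.
const transitions: Record<InvoiceState, Partial<Record<InvoiceEvent, InvoiceState>>> = {
  Draft: { verify: "Approved" },
  Approved: { issue: "Issued" },
  Issued: { correct: "Corrected" },
  Corrected: {},
};

function nextState(state: InvoiceState, event: InvoiceEvent): InvoiceState {
  const target = transitions[state][event];
  if (target === undefined) {
    throw new Error(`Transition "${event}" is not allowed from state "${state}"`);
  }
  return target;
}

function runEvents(initial: InvoiceState, events: InvoiceEvent[]): InvoiceState {
  return events.reduce((state, event) => nextState(state, event), initial);
}

// Happy path from TC_FAK_001: Draft -> Approved -> Issued.
console.log(runEvents("Draft", ["verify", "issue"])); // prints "Issued"
```

Keeping the transitions in one table makes it easy to see which state changes each test case covers and which ones still lack a test.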

### Usefulness for automated tests

1. **Ready basis for automation** - Detailed test steps with precisely defined actions and expected results can be directly translated into automation scripts, saving time on analysis and test design.

2. **Easier assertion implementation** - Each "THEN" in the GIVEN-WHEN-THEN structure can be directly translated into assertions in automated tests, which increases the accuracy and completeness of verification.

3. **Easier regression detection** - Precise test scenarios allow regressions to be detected quickly by automatically executing all defined steps and verifying their results.

4. **Test maintainability** - When automated tests are based on well-defined test cases, changes in functionality can be more easily tracked and both documentation and test code can be updated.
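
One way to picture the translation from table rows to automation is a step list in which every row keeps its Action and Effect text next to the code that performs and verifies it. The sketch below is an assumption-heavy illustration: `Step`, `runSteps`, and the in-memory `FakeInvoiceForm` are hypothetical and stand in for real UI automation.

```typescript
// One row of a test case table, paired with its automation.
interface Step<T> {
  no: number;
  action: string; // text of the "Action" column
  effect: string; // text of the "Effect" column
  act: (sut: T) => void; // performs the action
  check: (sut: T) => boolean; // assertion derived from the effect
}

// Executes steps in order and collects a message for every failed check.
function runSteps<T>(sut: T, steps: Step<T>[]): string[] {
  const failures: string[] = [];
  for (const step of steps) {
    step.act(sut);
    if (!step.check(sut)) {
      failures.push(`Step ${step.no} failed: expected "${step.effect}"`);
    }
  }
  return failures;
}

// Hypothetical in-memory stand-in for the add item window.
interface FakeInvoiceForm {
  quantity: number;
  netUnitPrice: number;
}

const form: FakeInvoiceForm = { quantity: 0, netUnitPrice: 0 };

// Steps 11 and 12 of TC_FAK_001.
const steps: Step<FakeInvoiceForm>[] = [
  {
    no: 11,
    action: 'Set the quantity to "2"',
    effect: "The quantity will be updated.",
    act: (f) => { f.quantity = 2; },
    check: (f) => f.quantity === 2,
  },
  {
    no: 12,
    action: 'Set the net unit price to "100.00"',
    effect: "The price will be updated.",
    act: (f) => { f.netUnitPrice = 100; },
    check: (f) => f.netUnitPrice === 100,
  },
];

const failures = runSteps(form, steps);
console.log(failures.length === 0 ? "all steps passed" : failures.join("\n"));
```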

### Documentation for future team members

1. **Shortened onboarding time** - New team members can quickly understand how the system works by analyzing state diagrams and detailed test cases, without the need to dig through source code.

2. **Source of domain knowledge** - Test cases serve not only as technical documentation but also contain valuable information about business logic and domain rules.

3. **Living documentation** - Regularly maintained test cases constitute current system documentation that evolves along with the product, unlike traditional documentation that often becomes outdated.

4. **Self-documenting requirements** - The GIVEN-WHEN-THEN format combined with test steps allows new team members to understand not only HOW the system works but also WHY it works in a certain way.

### Additional business benefits

1. **Reduced error costs** - Precise tests reduce the number of errors that make it to production, which directly translates into financial savings and better product reputation.

2. **Increased transparency for stakeholders** - Test cases in an understandable format can also be presented to non-technical stakeholders, which increases their trust in the development process.

3. **Accelerated requirements validation** - Test cases in the GIVEN-WHEN-THEN format can be verified by the business even before implementation, which allows for early detection of inconsistencies in requirements.

Adopting such a systematic approach to testing and documentation creates a positive cycle in which each new test enriches the team's knowledge base, improves communication, and enhances the quality of the end product.

---


## Pisanie przypadków testowych

**URL:** https://portfolio.sdet.pl/how-i-do-it/test-case
**Published:** 2025-03-16
**Language:** pl
Tags: qa, test-case, methodology

Jak piszę przypadki testowe - struktura Given/When/Then, pojedyncza odpowiedzialność, deterministyczne asercje.

## Przypadki testowe - Moduł faktur (light version)

Te przypadki testowe obejmują najważniejsze scenariusze dla modułu faktur:

1. **TC-FAK-001** - Pomyślne utworzenie i zatwierdzenie faktury:
   - Testuje podstawowy "happy path" dla tworzenia faktury
   - Weryfikuje poprawność obliczeń kwot i podatków
   - Sprawdza przejścia stanów od Projektu do Zatwierdzonej faktury

2. **TC-FAK-002** - Walidacja niepoprawnych danych faktury:
   - Testuje mechanizmy walidacji danych
   - Weryfikuje, czy system prawidłowo wykrywa różne typy błędów
   - Sprawdza komunikaty o błędach

3. **TC-FAK-003** - Utworzenie faktury korygującej do wystawionej faktury:
   - Testuje proces tworzenia faktury korygującej
   - Weryfikuje wymagane informacje (powód korekty)
   - Sprawdza oznaczenie faktury pierwotnej jako skorygowanej wraz z odwołaniem do faktury korygującej


## Szczegółowe przypadki testowe - Moduł faktur (solid version)

#### **TC_FAK_001.GivenUserCreatesNewInvoiceAndFillsAllRequiredFields_WhenUserValidatesAndApproves_ThenInvoiceIsSuccessfullyCreated**

**Tytuł:** Pomyślne utworzenie i zatwierdzenie nowej faktury.

**Opis**: Przypadek testowy obejmuje kompletny proces tworzenia nowej faktury z wypełnieniem wszystkich wymaganych pól oraz jej zatwierdzenie za pomocą interfejsu graficznego użytkownika.

| **Lp.** | **Akcja**                                                         | **Efekt**                                                                                                                     |
| :-----: | ----------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
|  **1**  | Kliknij na ikonę faktury w bocznym menu.                          | Otwiera się sekcja _Faktury_                                                                                                  |
|  **2**  | Kliknij w prawym górnym rogu aplikacji na przycisk _Nowa faktura_ | Oczekuj pojawienia się _snackbar_ informującego o sukcesie utworzenia faktury. System tworzy nową fakturę w stanie "Projekt". |
|  **3**  | W sekcji _Kontrahent_ kliknij przycisk _Wybierz_                  | Otworzy się okno dialogowe z listą dostępnych kontrahentów.                                                                   |
|  **4**  | Wybierz kontrahenta z listy (np. "Firma XYZ")                     | Dane kontrahenta zostaną wprowadzone do formularza faktury.                                                                   |
|  **5**  | W sekcji _Dane faktury_ ustaw datę wystawienia na dzień bieżący   | Data wystawienia zostanie zaktualizowana.                                                                                     |
|  **6**  | Ustaw datę sprzedaży na dzień bieżący                             | Data sprzedaży zostanie zaktualizowana.                                                                                       |
|  **7**  | Ustaw termin płatności na 14 dni od daty wystawienia              | Termin płatności zostanie zaktualizowany.                                                                                     |
|  **8**  | W sekcji _Metoda płatności_ wybierz: _Przelew_                    | Metoda płatności zostanie ustawiona na "Przelew".                                                                             |
|  **9**  | Kliknij przycisk _Dodaj pozycję_                                  | Otworzy się okno dodawania pozycji faktury.                                                                                   |
| **10**  | W oknie dodawania pozycji wybierz produkt "Produkt A" z listy     | Produkt zostanie wybrany.                                                                                                     |
| **11**  | Ustaw ilość na wartość "2"                                        | Ilość zostanie zaktualizowana.                                                                                                |
| **12**  | Ustaw cenę jednostkową netto na "100,00 zł"                       | Cena zostanie zaktualizowana.                                                                                                 |
| **13**  | Wybierz stawkę VAT "23%"                                          | Stawka VAT zostanie zaktualizowana.                                                                                           |
| **14**  | Kliknij przycisk _Dodaj_                                          | Pozycja zostanie dodana do faktury. System automatycznie obliczy wartości netto, VAT i brutto.                                |
| **15**  | Kliknij przycisk _Weryfikuj_                                      | System rozpocznie proces weryfikacji danych faktury.                                                                          |
| **16**  | Poczekaj na zakończenie weryfikacji                               | System zmieni stan faktury na "Zatwierdzona" i wyświetli stosowny komunikat.                                                  |
| **17**  | Kliknij przycisk _Wystaw fakturę_                                 | Faktura zostanie wystawiona (zmiana stanu na "Wystawiona") i otrzyma numer.                                                   |
| **18**  | Kliknij przycisk _Pobierz PDF_                                    | System wygeneruje plik PDF z fakturą.                                                                                         |

**Oczekiwany wynik**: Po wykonaniu wszystkich kroków zostanie utworzona, zatwierdzona i wystawiona faktura z poprawnie obliczonymi kwotami (netto: 200,00 zł, VAT: 46,00 zł, brutto: 246,00 zł). Faktura będzie dostępna do pobrania w formacie PDF.

---

#### **TC_FAK_002.GivenUserStartsEditingExistingInvoice_WhenInvalidDataIsEntered_ThenValidationErrorMessagesAreDisplayed**

**Tytuł:** Walidacja błędnych danych podczas edycji faktury.

**Opis**: Przypadek testowy weryfikuje mechanizmy walidacji podczas edycji istniejącej faktury, poprzez wprowadzenie niepoprawnych danych w różnych polach i sprawdzenie komunikatów o błędach.

| **Lp.** | **Akcja**                                                       | **Efekt**                                                                                                                                                                                                                                        |
| :-----: | --------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|  **1**  | Kliknij na ikonę faktury w bocznym menu.                        | Otwiera się sekcja _Faktury_                                                                                                                                                                                                                     |
|  **2**  | Z listy faktur wybierz fakturę w stanie "Projekt"               | Otworzy się widok szczegółów faktury.                                                                                                                                                                                                            |
|  **3**  | Kliknij przycisk _Edytuj_                                       | Formularz faktury przejdzie w tryb edycji.                                                                                                                                                                                                       |
|  **4**  | W sekcji _Kontrahent_ kliknij przycisk _Wyczyść_                | Dane kontrahenta zostaną usunięte.                                                                                                                                                                                                               |
|  **5**  | W sekcji _Dane faktury_ ustaw datę wystawienia na dzień bieżący | Data wystawienia zostanie zaktualizowana.                                                                                                                                                                                                        |
|  **6**  | Ustaw datę sprzedaży na 40 dni przed datą wystawienia           | Data sprzedaży zostanie zaktualizowana.                                                                                                                                                                                                          |
|  **7**  | Ustaw termin płatności na dzień przed datą wystawienia          | Termin płatności zostanie zaktualizowany.                                                                                                                                                                                                        |
|  **8**  | Kliknij przycisk _Dodaj pozycję_                                | Otworzy się okno dodawania pozycji faktury.                                                                                                                                                                                                      |
|  **9**  | W oknie dodawania pozycji wybierz produkt "Produkt B" z listy   | Produkt zostanie wybrany.                                                                                                                                                                                                                        |
| **10**  | Ustaw ilość na wartość "-2" (ujemna ilość)                      | Ilość zostanie wprowadzona.                                                                                                                                                                                                                      |
| **11**  | Ustaw cenę jednostkową netto na "100,00 zł"                     | Cena zostanie zaktualizowana.                                                                                                                                                                                                                    |
| **12**  | Wybierz stawkę VAT "23%"                                        | Stawka VAT zostanie zaktualizowana.                                                                                                                                                                                                              |
| **13**  | Kliknij przycisk _Dodaj_                                        | System wyświetli komunikat o błędzie: "Ilość musi być większa od zera".                                                                                                                                                                          |
| **14**  | Popraw ilość na "2" i kliknij przycisk _Dodaj_                  | Pozycja zostanie dodana do faktury.                                                                                                                                                                                                              |
| **15**  | Kliknij przycisk _Weryfikuj_                                    | System rozpocznie proces weryfikacji danych faktury i wyświetli błędy: "Kontrahent jest wymagany", "Data sprzedaży nie może być wcześniejsza niż 30 dni od daty wystawienia", "Termin płatności nie może być wcześniejszy niż data wystawienia". |
| **16**  | Popraw dane kontrahenta wybierając firmę z listy                | Dane kontrahenta zostaną uzupełnione.                                                                                                                                                                                                            |
| **17**  | Popraw datę sprzedaży na dzień bieżący                          | Data sprzedaży zostanie zaktualizowana.                                                                                                                                                                                                          |
| **18**  | Popraw termin płatności na 14 dni od daty wystawienia           | Termin płatności zostanie zaktualizowany.                                                                                                                                                                                                        |
| **19**  | Kliknij ponownie przycisk _Weryfikuj_                           | System przeprowadzi weryfikację i zmieni stan faktury na "Zatwierdzona".                                                                                                                                                                         |

**Oczekiwany wynik**: System poprawnie zidentyfikuje wszystkie błędy w danych faktury i wyświetli odpowiednie komunikaty walidacyjne. Po poprawieniu błędów, faktura przejdzie pomyślnie proces weryfikacji i zostanie zatwierdzona.

---

#### **TC_FAK_003.GivenInvoiceIsInIssuedState_WhenUserSelectsCorrectionOption_ThenCorrectionInvoiceIsCreatedWithProperReference**

**Tytuł:** Tworzenie faktury korygującej.

**Opis**: Przypadek testowy obejmuje proces utworzenia faktury korygującej do istniejącej, wystawionej faktury, wraz z określeniem powodu korekty i powiązaniem z fakturą oryginalną.

| **Lp.** | **Akcja**                                                                              | **Efekt**                                                                                                                                         |
| :-----: | -------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
|  **1**  | Kliknij na ikonę faktury w bocznym menu.                                               | Otwiera się sekcja _Faktury_                                                                                                                      |
|  **2**  | Z listy faktur wybierz fakturę w stanie "Wystawiona"                                   | Otworzy się widok szczegółów faktury.                                                                                                             |
|  **3**  | Kliknij przycisk _Utwórz korektę_                                                      | Otworzy się formularz tworzenia faktury korygującej z predefiniowanymi danymi z faktury oryginalnej.                                              |
|  **4**  | W sekcji _Powód korekty_ wpisz "Zmiana ilości towaru"                                  | Powód korekty zostanie zapisany.                                                                                                                  |
|  **5**  | Zweryfikuj, że w formularzu wyświetla się numer faktury korygowanej                    | System powinien wyświetlać informację o numerze faktury oryginalnej.                                                                              |
|  **6**  | W sekcji pozycji faktury, znajdź pozycję "Produkt A"                                   | Pozycja zostanie wyświetlona z danymi z faktury oryginalnej.                                                                                      |
|  **7**  | Kliknij przycisk _Edytuj_ przy pozycji "Produkt A"                                     | Otworzy się okno edycji pozycji.                                                                                                                  |
|  **8**  | Zmień ilość z "2" na "1"                                                               | Ilość zostanie zaktualizowana.                                                                                                                    |
|  **9**  | Kliknij przycisk _Zapisz_                                                              | Pozycja zostanie zaktualizowana, a system automatycznie obliczy nowe wartości netto, VAT i brutto oraz różnicę w stosunku do faktury oryginalnej. |
| **10**  | Zweryfikuj, że system wyświetla zarówno pierwotne wartości, jak i wartości po korekcie | System powinien pokazywać: Przed korektą (2 szt., 200,00 zł netto), Po korekcie (1 szt., 100,00 zł netto), Różnica (-1 szt., -100,00 zł netto).   |
| **11**  | Kliknij przycisk _Weryfikuj_                                                           | System rozpocznie proces weryfikacji danych faktury korygującej.                                                                                  |
| **12**  | Poczekaj na zakończenie weryfikacji                                                    | System zmieni stan faktury korygującej na "Zatwierdzona" i wyświetli stosowny komunikat.                                                          |
| **13**  | Kliknij przycisk _Wystaw fakturę korygującą_                                           | Faktura korygująca zostanie wystawiona i otrzyma numer w formacie "KOREKTA/[numer oryginalnej faktury]".                                          |
| **14**  | Kliknij przycisk _Pobierz PDF_                                                         | System wygeneruje plik PDF z fakturą korygującą.                                                                                                  |
| **15**  | Wróć do listy faktur                                                                   | Zostanie wyświetlona lista faktur.                                                                                                                |
| **16**  | Sprawdź, czy faktura oryginalna ma oznaczenie "Skorygowana"                            | Faktura oryginalna powinna być oznaczona jako "Skorygowana" z odnośnikiem do faktury korygującej.                                                 |

**Oczekiwany wynik**: Po wykonaniu wszystkich kroków zostanie utworzona i wystawiona faktura korygująca z poprawnym odniesieniem do faktury oryginalnej. Faktura korygująca będzie zawierać informacje o różnicach wartości przed i po korekcie. Faktura oryginalna zostanie oznaczona jako skorygowana.
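
Wyliczenie z kroku 10 można zapisać jako krótki szkic w TypeScript. To tylko ilustracja przy założeniu hipotetycznych nazw (`LineItem`, `correctionDiff`, `formatPln`) i kwot trzymanych w groszach; cena jednostkowa 100,00 zł wynika z danych kroku (2 szt. = 200,00 zł netto), a sam kod nie pochodzi z aplikacji:

```typescript
// Hipotetyczna pozycja faktury; kwoty w groszach, żeby uniknąć błędów zaokrągleń.
interface LineItem {
  quantity: number;
  unitNetGr: number;
}

interface LineTotals {
  quantity: number;
  netGr: number;
}

function lineTotals(item: LineItem): LineTotals {
  return { quantity: item.quantity, netGr: item.quantity * item.unitNetGr };
}

// Różnica = wartości po korekcie minus wartości przed korektą.
function correctionDiff(original: LineItem, corrected: LineItem): LineTotals {
  const before = lineTotals(original);
  const after = lineTotals(corrected);
  return {
    quantity: after.quantity - before.quantity,
    netGr: after.netGr - before.netGr,
  };
}

// Formatowanie w stylu używanym w tabeli, np. "100,00 zł".
function formatPln(gr: number): string {
  return (gr / 100).toFixed(2).replace(".", ",") + " zł";
}

const originalItem: LineItem = { quantity: 2, unitNetGr: 10000 };
const correctedItem: LineItem = { quantity: 1, unitNetGr: 10000 };
const correction = correctionDiff(originalItem, correctedItem);

console.log(`Przed korektą: ${originalItem.quantity} szt., ${formatPln(lineTotals(originalItem).netGr)} netto`);
console.log(`Po korekcie: ${correctedItem.quantity} szt., ${formatPln(lineTotals(correctedItem).netGr)} netto`);
console.log(`Różnica: ${correction.quantity} szt., ${formatPln(correction.netGr)} netto`);
```

Ostatnia linia wypisuje `Różnica: -1 szt., -100,00 zł netto`, czyli wartość, którą test porównuje z ekranem w kroku 10.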

## Korzyści wynikające z systematycznego podejścia do dokumentacji testowej

Zastosowanie szczegółowych przypadków testowych w formacie GIVEN-WHEN-THEN z numeracją i tabelarycznym przedstawieniem kroków przynosi szereg wymiernych korzyści dla całego zespołu deweloperskiego:

### Spójność danych i zrozumienia w zespole

1. **Jednolita interpretacja wymagań** - Precyzyjne przypadki testowe zapewniają, że wszyscy członkowie zespołu identycznie rozumieją sposób działania funkcjonalności. Eliminuje to rozbieżności interpretacyjne między deweloperami, testerami i analitykami biznesowymi.

2. **Spójny model mentalny** - Format GIVEN-WHEN-THEN tworzy wspólny język i model mentalny dla całego zespołu, co ułatwia komunikację i redukuje błędy wynikające z nieporozumień.

3. **Przejrzystość stanów i przejść między nimi** - Prezentacja diagramu stanów faktury oraz powiązanych przypadków testowych daje całościowy obraz cyklu życia dokumentu, co zapewnia spójne zrozumienie przepływu danych.

### Użyteczność pod kątem testów automatycznych

1. **Gotowa podstawa do automatyzacji** - Szczegółowe kroki testowe z precyzyjnie określonymi akcjami i oczekiwanymi rezultatami można bezpośrednio przełożyć na skrypty automatyzacyjne, oszczędzając czas na analizę i projektowanie testów.

2. **Łatwiejsza implementacja asercji** - Każdy "THEN" w strukturze GIVEN-WHEN-THEN może być bezpośrednio przetłumaczony na asercje w testach automatycznych, co zwiększa dokładność i kompletność weryfikacji.

3. **Ułatwione wykrywanie regresji** - Precyzyjne scenariusze testowe pozwalają na szybkie wykrycie regresji poprzez automatyczne wykonanie wszystkich zdefiniowanych kroków i weryfikację rezultatów.

4. **Utrzymywalność testów** - Gdy testy automatyczne są oparte na dobrze zdefiniowanych przypadkach testowych, zmiany w funkcjonalności można łatwiej śledzić i aktualizować zarówno dokumentację, jak i kod testów.
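
Przełożenie kolumny _Efekt_ na asercje można naszkicować tak. To szkic poglądowy: klasa `InvoiceForm` jest hipotetycznym modelem w pamięci, który zastępuje prawdziwy interfejs, a kroki 13-14 pochodzą z powyższego przypadku testowego z walidacją ilości. W prawdziwym teście w miejscu `when` i `then` znalazłyby się wywołania frameworka testowego:

```typescript
// Hipotetyczny model formularza w pamięci; nie jest to kod aplikacji.
class InvoiceForm {
  items: { product: string; quantity: number }[] = [];
  lastError: string | null = null;

  addItem(product: string, quantity: number): void {
    if (quantity <= 0) {
      this.lastError = "Ilość musi być większa od zera";
      return;
    }
    this.lastError = null;
    this.items.push({ product, quantity });
  }
}

// Jeden wiersz tabeli: "Akcja" staje się funkcją when, "Efekt" asercją then.
interface Step {
  lp: number;
  when: (form: InvoiceForm) => void;
  then: (form: InvoiceForm) => boolean;
}

const steps: Step[] = [
  {
    lp: 13,
    when: (form) => form.addItem("Produkt B", -2),
    then: (form) =>
      form.lastError === "Ilość musi być większa od zera" && form.items.length === 0,
  },
  {
    lp: 14,
    when: (form) => form.addItem("Produkt B", 2),
    then: (form) => form.lastError === null && form.items.length === 1,
  },
];

// Wykonuje kroki po kolei i zwraca numery tych, których asercja nie przeszła.
function runSteps(form: InvoiceForm, stepsToRun: Step[]): number[] {
  const failed: number[] = [];
  for (const step of stepsToRun) {
    step.when(form);
    if (!step.then(form)) failed.push(step.lp);
  }
  return failed;
}

const failedSteps = runSteps(new InvoiceForm(), steps);
console.log(
  failedSteps.length === 0
    ? "Wszystkie kroki zaliczone"
    : `Niezaliczone kroki: ${failedSteps.join(", ")}`
);
```

Numer kroku (`lp`) zostaje w danych, więc raport z automatu wskazuje ten sam wiersz, który tester widzi w dokumentacji.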

### Dokumentacja dla przyszłych członków zespołu

1. **Skrócenie czasu onboardingu** - Nowi członkowie zespołu mogą szybko zrozumieć działanie systemu poprzez analizę diagramów stanów i szczegółowych przypadków testowych, bez konieczności przekopywania się przez kod źródłowy.

2. **Źródło wiedzy domenowej** - Przypadki testowe stanowią nie tylko dokumentację techniczną, ale również zawierają cenne informacje o logice biznesowej i regułach domeny.

3. **Żywa dokumentacja** - Utrzymywane na bieżąco przypadki testowe stanowią aktualną dokumentację systemu, która ewoluuje wraz z produktem, w przeciwieństwie do tradycyjnej dokumentacji, która często staje się nieaktualna.

4. **Samodokumentujące się wymagania** - Format GIVEN-WHEN-THEN w połączeniu z krokami testowymi pozwala nowym członkom zespołu zrozumieć nie tylko JAK działa system, ale również DLACZEGO działa w określony sposób.

### Dodatkowe korzyści biznesowe

1. **Redukcja kosztów błędów** - Precyzyjne testy zmniejszają liczbę błędów przechodzących do produkcji, co bezpośrednio przekłada się na oszczędności finansowe i lepszą reputację produktu.

2. **Zwiększona transparentność dla interesariuszy** - Przypadki testowe w zrozumiałym formacie mogą być prezentowane również nietechnicznym interesariuszom, co zwiększa ich zaufanie do procesu wytwórczego.

3. **Przyspieszona walidacja wymagań** - Przypadki testowe w formacie GIVEN-WHEN-THEN mogą być weryfikowane przez biznes jeszcze przed implementacją, co pozwala na wczesne wykrycie nieścisłości w wymaganiach.

Przyjęcie takiego systematycznego podejścia do testów i dokumentacji tworzy pozytywny cykl, w którym każdy nowy test wzbogaca bazę wiedzy zespołu, usprawnia komunikację i podnosi jakość produktu końcowego.

---


## Test Plan Template

**URL:** https://portfolio.sdet.it/how-i-do-it/test-plan
**Published:** 2025-03-18
**Language:** en
Tags: qa, test-plan, methodology

How I write test plans - risk-based, traceable to requirements, with an explicit out-of-scope section.

### Automated Testing Methodology for the Invoice Module

My approach to automating tests for the invoice module follows a fixed sequence: understand the feature manually, derive scenarios and test cases from that knowledge, and only then automate. The steps below describe my typical process:

#### **1. Manual Testing as the Foundation**

Before automating anything, I always test the functionality manually. This lets me understand the invoicing system and spot potential issues that may need special attention during automation.

#### **2. Documentation Analysis**

- If documentation is available (specifications, user stories, use cases), I analyze it thoroughly
- If documentation is missing, I use exploratory testing to discover how the invoice module behaves

#### **3. Testing Positive Paths**

I start with the "happy path" - the basic, correct user path. For invoices, this includes the full invoice lifecycle - from creation, through approval, issuance, to payment.

#### **4. Creating Test Scenarios**

Based on what I have learned, I create test scenarios for invoices, from which I then derive specific test cases. Example scenarios for invoices:

- Adding a new invoice
- Editing an invoice in Draft state
- Validation of required invoice fields
- Issuing an invoice
- Canceling an invoice
- Creating a correction invoice

#### **5. Testing Negative Paths**

After testing positive paths, I focus on negative tests - checking how the system responds to incorrect invoice data, exceeding value limits, or unusual user interactions.
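
A minimal sketch of table-driven negative cases, assuming a hypothetical `validateInvoice` function and `InvoiceDraft` shape (neither is taken from the real module). The rules mirror the validation messages quoted in the Polish invoice test cases in this feed, translated by me; treating a sale date exactly 30 days back as still valid is my assumption:

```typescript
// Hypothetical draft shape; field names are illustrative only.
interface InvoiceDraft {
  contractor: string | null;
  issueDate: Date;
  saleDate: Date;
  paymentDue: Date;
  quantity: number; // quantity of a single line item, kept flat for brevity
}

const DAY_MS = 24 * 60 * 60 * 1000;

function addDays(date: Date, days: number): Date {
  return new Date(date.getTime() + days * DAY_MS);
}

// Stand-in validator: returns every violated rule; an empty array means valid.
function validateInvoice(draft: InvoiceDraft): string[] {
  const errors: string[] = [];
  if (!draft.contractor) errors.push("Contractor is required");
  if (draft.saleDate.getTime() < addDays(draft.issueDate, -30).getTime()) {
    errors.push("Sale date cannot be more than 30 days before the issue date");
  }
  if (draft.paymentDue.getTime() < draft.issueDate.getTime()) {
    errors.push("Payment due date cannot be earlier than the issue date");
  }
  if (draft.quantity <= 0) errors.push("Quantity must be greater than zero");
  return errors;
}

const issueDate = new Date("2025-03-18T00:00:00Z");
const validDraft: InvoiceDraft = {
  contractor: "Example Ltd.",
  issueDate,
  saleDate: issueDate,
  paymentDue: addDays(issueDate, 14),
  quantity: 2,
};

// One row per negative case: a single field is broken, the rest stay valid.
const negativeCases: { label: string; draft: InvoiceDraft; expected: string }[] = [
  { label: "missing contractor", draft: { ...validDraft, contractor: null },
    expected: "Contractor is required" },
  { label: "sale date 40 days back", draft: { ...validDraft, saleDate: addDays(issueDate, -40) },
    expected: "Sale date cannot be more than 30 days before the issue date" },
  { label: "payment due before issue", draft: { ...validDraft, paymentDue: addDays(issueDate, -1) },
    expected: "Payment due date cannot be earlier than the issue date" },
  { label: "negative quantity", draft: { ...validDraft, quantity: -2 },
    expected: "Quantity must be greater than zero" },
];

for (const testCase of negativeCases) {
  const errors = validateInvoice(testCase.draft);
  const rejected = errors.length === 1 && errors[0] === testCase.expected;
  console.log(`${testCase.label}: ${rejected ? "rejected as expected" : "unexpected result"}`);
}
```

Breaking exactly one field per row keeps a failure easy to read: if a case reports more than one error, the rules overlap or the valid baseline has drifted.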

#### **6. Applying Testing Techniques**

I use various testing techniques, such as:

- Equivalence partitioning - grouping invoice data into classes that the system should treat the same way (e.g., different types of invoices: VAT, proforma, advance payment)
- State transition diagrams - especially useful in the invoicing process, where an invoice goes through various states from draft to closure

```mermaid
stateDiagram-v2
    [*] --> Draft

    Draft --> Verification: Filling in data

    Verification --> Draft: Verification rejection
    Verification --> Approved: Positive validation

    Approved --> Draft: Return to editing
    Approved --> Issued: Assigning number

    Issued --> Paid: Payment registration
    Issued --> Corrected: Invoice correction
    Issued --> Canceled: Cancellation

    Corrected --> Paid: Payment registration
    Corrected --> Canceled: Cancellation

    Paid --> Closed: Accounting closure
    Paid --> Corrected: Invoice correction

    Canceled --> [*]
    Closed --> [*]
```
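
The diagram above can be turned into data that drives both positive and negative transition tests. The sketch below is one possible encoding, not the module's actual implementation: the `transitions` table is transcribed from the diagram, while `canTransition`, `isValidPath`, and the pair lists are hypothetical helper names.

```typescript
type InvoiceState =
  | "Draft" | "Verification" | "Approved" | "Issued"
  | "Paid" | "Corrected" | "Canceled" | "Closed";

// Allowed target states per source state, transcribed from the diagram.
const transitions: Record<InvoiceState, InvoiceState[]> = {
  Draft: ["Verification"],
  Verification: ["Draft", "Approved"],
  Approved: ["Draft", "Issued"],
  Issued: ["Paid", "Corrected", "Canceled"],
  Corrected: ["Paid", "Canceled"],
  Paid: ["Closed", "Corrected"],
  Canceled: [],
  Closed: [],
};

function canTransition(from: InvoiceState, to: InvoiceState): boolean {
  return transitions[from].indexOf(to) !== -1;
}

// True when every consecutive pair in the path is an allowed transition.
function isValidPath(path: InvoiceState[]): boolean {
  for (let i = 1; i < path.length; i++) {
    if (!canTransition(path[i - 1], path[i])) return false;
  }
  return true;
}

const states = Object.keys(transitions) as InvoiceState[];

// Split every ordered (from, to) pair into positive and negative test candidates.
const validPairs: [InvoiceState, InvoiceState][] = [];
const invalidPairs: [InvoiceState, InvoiceState][] = [];
for (const from of states) {
  for (const to of states) {
    if (from === to) continue; // self-transitions are left out of this sketch
    if (canTransition(from, to)) {
      validPairs.push([from, to]);
    } else {
      invalidPairs.push([from, to]);
    }
  }
}

const happyPath: InvoiceState[] = [
  "Draft", "Verification", "Approved", "Issued", "Paid", "Closed",
];

console.log(`valid transitions: ${validPairs.length}`);     // 12 edges in the diagram
console.log(`invalid transitions: ${invalidPairs.length}`); // 56 ordered pairs minus 12
console.log(`happy path allowed: ${isValidPath(happyPath)}`);
```

Each entry in `validPairs` is a candidate positive test, and each entry in `invalidPairs` is a candidate negative test asserting that the system refuses the move (for example, `Draft` straight to `Issued`).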

#### **7. Test Automation**

After thorough manual analysis, I proceed to automation, which includes:

- Selecting the appropriate testing framework - obviously Playwright with TS :D
- Implementing test scripts for various invoicing scenarios
- Executing tests and analyzing results
- Integration with a CI/CD system for continuous testing of the invoice module

The state diagram in section 6 presents the states and transitions of an invoice in the system, and it forms the basis for creating both manual and automated tests.
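
For the CI/CD integration step in section 7, a Playwright configuration along these lines is a common starting point. This is a sketch under assumptions, not the configuration of any project listed here: the `testDir` path, the `BASE_URL` variable, and the single Chromium project are placeholders.

```typescript
import { defineConfig, devices } from "@playwright/test";

// Sketch only: paths, the environment variable name, and the project list are assumptions.
export default defineConfig({
  testDir: "./tests/invoices",
  // Fail the CI run if a stray test.only was committed.
  forbidOnly: !!process.env.CI,
  // Retry only on CI, where flakiness is more expensive to triage by hand.
  retries: process.env.CI ? 2 : 0,
  reporter: "html",
  use: {
    // Hypothetical variable pointing at the environment under test.
    baseURL: process.env.BASE_URL,
    // Record a trace when a test is retried, to make CI failures debuggable.
    trace: "on-first-retry",
  },
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
  ],
});
```

With a file like this in place, the pipeline step itself can stay as small as `npx playwright test`.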

---


## Szablon planu testów

**URL:** https://portfolio.sdet.pl/how-i-do-it/test-plan
**Published:** 2025-03-18
**Language:** pl
Tags: qa, test-plan, methodology

Jak piszę plany testów - oparte na ryzyku, śledzone do wymagań, z jawną sekcją out-of-scope.

### Metodologia testowania automatycznego dla modułu Faktur

Moje podejście do automatyzacji testów modułu faktur opiera się na solidnym procesie, który zapewnia wysoką jakość i skuteczność testowania. Poniżej opisuję mój typowy proces:

#### **1. Testowanie manualne jako podstawa**

Zanim przystąpię do automatyzacji, zawsze najpierw testuję funkcjonalność manualnie. Pozwala mi to zrozumieć system fakturowania i zidentyfikować potencjalne problemy, które mogą wymagać szczególnej uwagi podczas automatyzacji.

#### **2. Analiza dokumentacji**

- Jeśli dostępna jest dokumentacja (specyfikacja, user stories, przypadki użycia) - analizuję ją dokładnie
- W przypadku braku dokumentacji stosuję podejście eksploracyjne, aby odkryć wszystkie aspekty modułu faktur

#### **3. Testowanie ścieżek pozytywnych**

Zaczynam od tzw. "happy path" - czyli podstawowej, poprawnej ścieżki użytkownika. Dla faktur obejmuje to pełny cykl życia faktury - od utworzenia, przez zatwierdzenie, wystawienie, aż po opłacenie.

#### **4. Tworzenie scenariuszy testowych**

Na podstawie zdobytej wiedzy tworzę scenariusze testowe dla faktur, które następnie pomagają mi wyodrębnić konkretne przypadki testowe. Przykładowe scenariusze dla faktur:

- Dodawanie nowej faktury
- Edycja faktury w stanie Projekt
- Walidacja wymaganych pól faktury
- Wystawianie faktury
- Anulowanie faktury
- Tworzenie faktury korygującej

#### **5. Testowanie ścieżek negatywnych**

Po przetestowaniu ścieżek pozytywnych skupiam się na testach negatywnych - sprawdzam, jak system reaguje na niepoprawne dane faktur, przekroczenie limitów wartości czy nietypowe interakcje użytkownika.

#### **6. Zastosowanie technik testowych**

Wykorzystuję różne techniki testowe, takie jak:

- Klasy równoważności - grupowanie danych faktur w kategorie (np. różne typy faktur: VAT, proforma, zaliczkowa)
- Diagramy przejścia stanów - szczególnie przydatne w procesie fakturowania, gdzie faktura przechodzi przez różne stany od projektu do zamknięcia

```mermaid
stateDiagram-v2
    [*] --> Projekt

    Projekt --> Weryfikacja: Wypełnienie danych

    Weryfikacja --> Projekt: Odrzucenie weryfikacji
    Weryfikacja --> Zatwierdzona: Pozytywna walidacja

    Zatwierdzona --> Projekt: Powrót do edycji
    Zatwierdzona --> Wystawiona: Nadanie numeru

    Wystawiona --> Opłacona: Rejestracja płatności
    Wystawiona --> Skorygowana: Korekta faktury
    Wystawiona --> Anulowana: Anulowanie

    Skorygowana --> Opłacona: Rejestracja płatności
    Skorygowana --> Anulowana: Anulowanie

    Opłacona --> Zamknięta: Zamknięcie księgowe
    Opłacona --> Skorygowana: Korekta faktury

    Anulowana --> [*]
    Zamknięta --> [*]
```

#### **7. Automatyzacja testów**

Po dokładnej analizie manualnej przystępuję do automatyzacji, która obejmuje:

- Wybór odpowiedniego frameworka testowego - wiadomo, Playwright z TS :D
- Implementację skryptów testowych dla różnych scenariuszy fakturowania
- Wykonanie testów i analizę wyników
- Integrację z systemem CI/CD dla ciągłego testowania modułu faktur

Diagram stanów z punktu 6 prezentuje stany i przejścia faktury w systemie, co stanowi podstawę do tworzenia zarówno testów manualnych, jak i automatycznych.

---


---

# Projects (13)


## AI QA Manual · Jarvis destylat

**Category:** ai-tooling
**Status:** coming-soon
**License:** AGPL-3.0
**Stack:** TypeScript, Node.js, Claude Agent SDK, Playwright, MCP, n8n, PostgreSQL


Production-ready manual QA workflow extracted from Jarvis. Context-first pipeline: Figma MCP + Jira webhook + Playwright CLI + Claude Agent SDK. Scale: 100-200 tasks in 2-3 days, versus a team-week with the classical approach.

- Request early access: /contact?interest=ai-qa-manual


## CDAT Pattern · Playwright architecture

**Category:** testing
**Status:** public
**GitHub:** https://github.com/darco81/cdat-pattern
**License:** MIT
**Stack:** TypeScript, Playwright, Node.js


Components-Data-Actions-Tests - 4-layer architectural pattern for Playwright + TypeScript. Alternative to Page Object Model. Battle-tested across 9 production systems over 2 years.

- Deep dive: /articles/cdat-pattern-deep-dive
- Docs & Landing: https://cdat.sdet.it
- Source on GitHub: https://github.com/darco81/cdat-pattern


## sdet-wcag-toolkit · Public WCAG 2.2 AA pipeline

**Category:** ai-tooling
**Status:** public
**GitHub:** https://github.com/darco81/sdet-wcag-toolkit
**License:** AGPL-3.0
**Stack:** TypeScript, Claude Agent SDK, Playwright, axe-core, MCP


Public AGPL-3.0 distillate of multi-agent WCAG audit pipeline. 5 AI specialists reading source via Read/Grep/Glob, plus static TypeScript analyzer and Playwright + axe-core dynamic testing. A-F grading. Case study in From the Field series #01.

- Source on GitHub: https://github.com/darco81/sdet-wcag-toolkit
- From the Field series: /from-the-field


## sdet-brain · Persistent RAG over MCP

**Category:** ai-tooling
**Status:** public
**GitHub:** https://github.com/darco81/sdet-brain
**License:** source-available
**Stack:** Python, FastAPI, FastMCP, Qdrant, MLX, MCP, Apple Silicon, Docker


Local-first persistent RAG for personal Markdown corpus. Qdrant + MLX + FastAPI + FastMCP 3.0. 11 MCP tools, 213 tests, source-available. Replaces copy-paste of context across Claude Desktop / Code / OpenCode chats.

- Source on GitHub: https://github.com/darco81/sdet-brain
- Case study: /articles/sdet-brain


## skills-radar · Lazy-loading skill discovery for Claude Code

**Category:** ai-tooling
**Status:** public
**GitHub:** https://github.com/darco81/skills-radar
**License:** MIT
**Stack:** Python, FastMCP, ChromaDB, Qdrant, MLX, sentence-transformers, BM25, Apple Silicon


Open-source MCP server fixing Claude Code skill bloat. Two-Tier Discovery: ~1k token mini-index always preloaded, full SKILL.md loaded on demand. 68% token reduction at 60 skills, roughly flat at 500. Hybrid retrieval (BM25 + dense), trust tiers, 100% local Apple Silicon stack via MLX (Qwen3-Embedding-8B + Qwen3-Coder-30B rewriter/reranker). No Ollama, no HTTP, no network.

- Source on GitHub: https://github.com/darco81/skills-radar
- Case study: /articles/skills-radar


## Claude VSCode Controller

**Category:** ai-tooling
**Status:** public
**GitHub:** https://github.com/darco81/claude-vscode-controller
**License:** MIT
**Stack:** TypeScript, VSCode Extension API, WebSocket, MCP, Claude SDK


Cross-platform AI-IDE bridge. Real-time integration between Claude Desktop and VSCode using Extension API, WebSocket, and Model Context Protocol. 30+ native IDE commands via natural language.

- Source on GitHub: https://github.com/darco81/claude-vscode-controller
- Case study: /articles/claude-vscode-controller


## MAF E2E Tests · Playwright

**Category:** testing
**Status:** public
**GitHub:** https://github.com/darco81/maf-e2e-pw
**Stack:** Playwright, TypeScript, CDAT


Automated E2E test suite for MAF app. CDAT pattern in production: UI testing, data validation, stability across user scenarios.

- Source on GitHub: https://github.com/darco81/maf-e2e-pw


## K6 Performance Dashboard

**Category:** testing
**Status:** public
**GitHub:** https://github.com/darco81/k6-dashboard
**Stack:** React, Node.js, K6, Clean Architecture


Enterprise-grade web app for K6 performance test visualization. Clean Architecture, live terminal, clone-and-run test repos without local DEV/QA environment.

- Source on GitHub: https://github.com/darco81/k6-dashboard


## MAF API Tests · Playwright

**Category:** testing
**Status:** public
**GitHub:** https://github.com/darco81/M-A-F
**Stack:** Playwright, TypeScript, API Testing


How to avoid going crazy and create a scalable structure for stable API testing. Part of MAF monorepo.

- Source on GitHub: https://github.com/darco81/M-A-F/tree/main/maf-api-tests
- Article: /articles/api-tests-playwright-maf


## MAF · Moja Aplikacja Faktur

**Category:** utility
**Status:** public
**GitHub:** https://github.com/darco81/M-A-F
**Stack:** Vue, TypeScript, Node.js, PostgreSQL


Full-stack invoicing app. Real-time financial analysis tool built for practical use - the backbone for all MAF testing case studies (API tests, UI tests, 4-layer pattern).

- Live demo: http://maf.sdet.pl
- Source on GitHub: https://github.com/darco81/M-A-F


## Confluence Headers Manager

**Category:** utility
**Status:** public
**GitHub:** https://github.com/darco81/confluence-headers-manager-pro
**Stack:** TypeScript, Confluence API


Intelligent tool that solves one of the most frustrating problems for Confluence users - bulk header management and consistent structure across pages.

- Source on GitHub: https://github.com/darco81/confluence-headers-manager-pro


## DEMO BI · Business Intelligence Dashboard

**Category:** infrastructure
**Status:** public
**GitHub:** https://github.com/darco81/demo-bi
**Stack:** Next.js, TypeScript, Tailwind, PostgreSQL, Docker


Production-ready BI platform built from scratch. Next.js 14, TypeScript, Tailwind, PostgreSQL 15, Docker. Real-time KPI dashboards, AI-powered analytics (forecasting, anomaly detection), RBAC, glassmorphism UI.

- Live demo: http://54.36.174.173:3200/
- Source on GitHub: https://github.com/darco81/demo-bi


## Portfolio v1 · Matrix-style legacy

**Category:** archive
**Status:** archived
**GitHub:** https://github.com/darco81/portfolio
**Stack:** React, Framer Motion, Vite


Archived first version of sdet.pl portfolio. React + Framer Motion + Matrix animations. Kept for historical continuity - proves brand presence since early 2025.

- Source on GitHub: https://github.com/darco81/portfolio


---

# Ecosystem Highlights (5)

AI tooling ecosystem - Jarvis platform + destylaty (open-source PoCs).


## Jarvis · Production multi-agent QA platform

**Status:** private
**Subtitle:** Private · 34K LOC · 9 microservices · event-driven


Private multi-agent platform powering real-world QA workflows. 9 microservices, event-driven architecture, real-time UI, NATS messaging, PostgreSQL, WebSocket, 15 production pipelines. WCAG 7-agent + Performance 5-agent auditing at production scale. Modules published as standalone destylaty below.

**Metrics:**
- Lines of code: 34K
- Microservices: 9
- Production pipelines: 15
- UI components: 57


## Jarvis Manual · QA workflow destylat

**Status:** coming-soon
**Subtitle:** Context-first pipeline · human-in-the-loop


Open-source destylat of the Jarvis manual QA workflow. Figma MCP + Jira webhook + Playwright CLI + Claude Agent SDK. Human approval checkpoints. Educational, not operational - method shown on dummy data, production integration = 2-4 weeks.


## Jarvis Figma · deterministic design pipeline

**Status:** coming-soon
**Subtitle:** CSS token mapping · visual diff · codegen spec


Deterministic Figma pipeline. Extract design tokens via Figma MCP (no LLM at data layer), map to CSS variables, diff expected vs computed, pixel-level visual diff via odiff, codegen spec generation. LLM only in logic layer.


## Tempo Timesheet · audit & automation destylat

**Status:** coming-soon
**Subtitle:** Git commits vs Tempo worklogs · risk scoring


Compare git commits vs Tempo worklogs, detect overtime patterns, risk scoring per developer. Jira + Tempo API integration, scheduled batch sync, human-in-the-loop approval queue.


## jarvis-brain · cross-project knowledge vault

**Status:** coming-soon
**Subtitle:** Dev meta-tool · stops Claude Code from burning tokens


Private dev tool - cross-project knowledge graph for Claude Code. FastAPI + PostgreSQL + Redis + Docker. 175 tests, mypy strict. MCP server exposes searchable project context so CC doesn't re-scan files on each session.

**Metrics:**
- Tests: 175
- Phase: P2-S1


---

# End of content

For programmatic access, use MCP endpoint: https://portfolio.sdet.pl/mcp (SSE, read-only).
For HTML pages, start from: https://portfolio.sdet.pl/