Power BI report visual monitoring

Long-form documentation: rendered output as an observability signal, how the open-source monitor works, and how to run it safely. Project: Power-bi-report-visual-monitoring.

Definition

Visual monitoring here means: open a published report in a controlled browser session, capture a rendered bitmap after the platform has executed the model (DAX, relationships, filters), compare it to a stored baseline, and persist a compact record (status, difference score, hashes, optional compressed delta). The observable is the user-visible state of the report, not an internal engine trace.

This is distinct from Usage Metrics (who consumed what) and from the Performance Analyzer (per-visual timing inside Desktop). Those tools answer different questions. Visual monitoring answers: “Did the rendered surface drift from what we agreed was correct?”

The approach is complementary to warehouse-level tests: you still want data tests where they matter, but the report layer adds layout, interactions, and measure semantics that are expensive to re-specify as pure SQL invariants for every tile.

Why it is effective

Many production defects surface first as layout, conditional formatting, filter interactions, or silently wrong measures while refresh and gateway health remain green. A pixel-oriented or perceptual-hash-oriented check catches regressions that are tedious to encode as invariant SQL on the warehouse, because the bug lives in the last mile of semantics inside the report layer.

A quadtree/MSE-style comparison (as in this codebase) trades absolute fidelity for bounded work: cost scales with canvas structure and thresholds, not with the cardinality of every fact table touched by the model.
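To make the bounded-work idea concrete, here is a minimal pure-Python sketch of a quadtree/MSE comparison. It is not the project's domain service: the function names, the grayscale list-of-lists image format, and the default parameter values are all assumptions made for illustration. A block whose mean squared error stays at or below the threshold is pruned; otherwise it is split into four quadrants until a minimum size or maximum depth is reached.

```python
from typing import List

Image = List[List[int]]  # assumption: grayscale rows with values 0..255


def block_mse(a: Image, b: Image, x: int, y: int, w: int, h: int) -> float:
    """Mean squared error between two images over one rectangular block."""
    total = 0
    for row in range(y, y + h):
        for col in range(x, x + w):
            d = a[row][col] - b[row][col]
            total += d * d
    return total / (w * h)


def changed_pixels(a: Image, b: Image, x: int, y: int, w: int, h: int,
                   mse_threshold: float, min_block: int,
                   max_depth: int, depth: int = 0) -> int:
    """Count pixels that lie in blocks flagged as different."""
    if block_mse(a, b, x, y, w, h) <= mse_threshold:
        return 0  # similar enough: prune this subtree, no further work
    if min(w, h) <= max(min_block, 1) or depth >= max_depth:
        return w * h  # cannot split further: flag the whole block
    hw, hh = w // 2, h // 2
    quadrants = [
        (x, y, hw, hh), (x + hw, y, w - hw, hh),
        (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh),
    ]
    return sum(
        changed_pixels(a, b, qx, qy, qw, qh,
                       mse_threshold, min_block, max_depth, depth + 1)
        for qx, qy, qw, qh in quadrants
    )


def diff_percent(a: Image, b: Image, mse_threshold: float = 10.0,
                 min_block: int = 2, max_depth: int = 8) -> float:
    """Percentage of the canvas area flagged as different."""
    height, width = len(a), len(a[0])
    if height != len(b) or width != len(b[0]):
        raise ValueError("images must have identical dimensions")
    flagged = changed_pixels(a, b, 0, 0, width, height,
                             mse_threshold, min_block, max_depth)
    return 100.0 * flagged / (width * height)


baseline = [[0] * 8 for _ in range(8)]
current = [row[:] for row in baseline]
for r in range(2):
    for c in range(2):
        current[r][c] = 255  # simulate a changed 2x2 region
print(diff_percent(baseline, current))
```

Because pruning happens on the block average, a very small change inside a large, otherwise identical block can be averaged below the threshold. That is the fidelity being traded away in exchange for work that depends only on image size and parameters.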

From an operations perspective, a scheduled visual check is also a contract on the external appearance of the service: if the screenshot changes, something in the chain (model, filters, theme, or host embedding) changed enough to matter for readers of that canvas.

Dashboard geometry and KPI sensitivity

Large operational dashboards often allocate a dominant share of canvas area to headline KPIs, trend cards, and variance callouts. Under typical design patterns, those visuals carry disproportionate business attention relative to peripheral navigation chrome.

Monitoring the rendered canvas therefore attaches high semantic leverage per check: a visible shift in a large KPI tile is likely to be noticed by humans even when underlying row-level aggregates would require multiple ad hoc queries to reconstruct the same gestalt.

That does not mean small visuals are unimportant; it means prioritization for monitoring capacity often starts with the largest, most-read tiles when you have a limited render budget.

Observability cost model (render + diff vs repeated warehouse probes)

Consider two families of validation strategy:

Strategy A — run several heavy SQL (or DAX-equivalent) probes directly against the same warehouse that feeds the model, recreating fragments of each KPI definition. Each probe may scan large partitions, join dimensions, and recompute complex measures. Total latency and load often grow roughly with the number and depth of probes, especially when KPI logic is non-linear in data volume.
Strategy B — let the report engine render once, then compute a bounded visual diff against a baseline and write a small log row (status, diff_percent, perceptual hash, duration). The expensive data movement has already been amortized by the BI platform; the incremental monitoring step is dominated by render stability and image comparison, whose cost is tied to image resolution and diff parameters rather than to issuing k separate full-model interrogations.

In many deployments, Strategy A with a moderate number k of non-trivial reconciliation queries imposes higher cumulative warehouse pressure and tail latency than Strategy B at a fixed schedule, provided render time is acceptable for the SLA. This is a qualitative ordering statement, not a universal constant: a trivial single-query invariant can be cheaper than a long headless render. The useful claim is comparative under realistic KPI complexity, not “always faster than any SQL”.
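One way to write the comparison down is the rough cost sketch below. The symbols are assumptions introduced here for illustration only: c_i is the cost of the i-th warehouse probe, and the three terms of C_B are the render, the image diff, and the log write.

```latex
C_A = \sum_{i=1}^{k} c_i^{\text{probe}},
\qquad
C_B = c^{\text{render}} + c^{\text{diff}} + c^{\text{log}}

% Strategy B is the cheaper option when
C_B < C_A

% With k probes of similar cost \bar{c}, this reduces to
k > \frac{c^{\text{render}} + c^{\text{diff}} + c^{\text{log}}}{\bar{c}}
```

The inequality only restates the qualitative ordering: for a single cheap probe (small k, small probe cost) the right-hand side can easily exceed k, and Strategy A wins.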

Concrete millisecond figures belong in a controlled benchmark appendix once measured in your environment; they are not asserted here.

Architecture and data flow

Configuration sources

The monitor reads report definitions from reports.json (or the path in REPORTS_FILE): each entry carries id, name, url, interval, threshold, and enabled. Those values are not duplicated inside PostgreSQL; the database stores outcomes of checks only.
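As an illustration of the fields listed above, the following sketch parses report definitions and keeps only the enabled ones. The top-level JSON array layout, the field types, the interval unit, and the example URLs are assumptions; the actual file format is defined by the repository.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReportConfig:
    id: str
    name: str
    url: str
    interval: int      # assumption: an integer interval; the unit is not stated here
    threshold: float   # 0-100, used for metrics only
    enabled: bool


def parse_reports(raw: str) -> List[ReportConfig]:
    """Parse report definitions and skip disabled entries."""
    reports = []
    for entry in json.loads(raw):
        report = ReportConfig(
            id=str(entry["id"]),
            name=entry["name"],
            url=entry["url"],
            interval=int(entry["interval"]),
            threshold=float(entry["threshold"]),
            enabled=bool(entry.get("enabled", True)),
        )
        if report.enabled:
            reports.append(report)
    return reports


# Hypothetical content; the URLs are placeholders, not real reports.
SAMPLE = """
[
  {"id": "sales", "name": "Sales KPIs", "url": "https://example.com/r/1",
   "interval": 600, "threshold": 1.5, "enabled": true},
  {"id": "ops", "name": "Ops board", "url": "https://example.com/r/2",
   "interval": 900, "threshold": 2.0, "enabled": false}
]
"""
print([r.id for r in parse_reports(SAMPLE)])
```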

Runtime settings (PostgreSQL connection, screenshot size, diff policy, retry policy, Selenium wait, optional Basic auth for Power BI) come from environment variables loaded via Settings. See the repository README for the full variable list.

On-disk artifacts

Under the configured screenshots_dir (default ./Data), the tool maintains:

Data/baselines/<report_id>_init_baseline.png — the first accepted full-page capture used as the stable reference for XOR delta encoding.
Data/changes/<report_id>/last_baseline.png — the rolling baseline image used for the next comparison.
Data/changes/<report_id>/current_screenshot.png — the latest capture before it is promoted to last_baseline.png when the check succeeds.
Data/changes/<report_id>/current_screenshot_diff.png — optional diff overlay when diff rendering is enabled.
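The layout above can be expressed as a small path helper. The function name and dictionary keys are hypothetical; only the directory and file names come from the list above.

```python
from pathlib import Path
from typing import Dict


def artifact_paths(screenshots_dir: str, report_id: str) -> Dict[str, Path]:
    """Build the on-disk artifact paths for one report."""
    root = Path(screenshots_dir)
    changes = root / "changes" / report_id
    return {
        "init_baseline": root / "baselines" / f"{report_id}_init_baseline.png",
        "last_baseline": changes / "last_baseline.png",
        "current": changes / "current_screenshot.png",
        "diff": changes / "current_screenshot_diff.png",
    }


for name, path in artifact_paths("Data", "sales").items():
    print(name, path.as_posix())
```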

Runtime pipeline (happy path)

For each scheduled run the CheckReport use case: ensures directories exist; determines whether a baseline already exists (row in baselines plus both baseline images on disk); calls Selenium to capture either the initial baseline or the current page; on first run copies the init image to last_baseline.png, computes a perceptual hash (dhash), upserts baselines, and writes a baseline_created row to monitoring_checks.
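For readers unfamiliar with the perceptual hash mentioned above, here is a minimal sketch of a difference hash (dhash). It assumes the screenshot has already been reduced to an 8-row by 9-column grayscale grid; the resizing step, the bit convention (left brighter than right gives 1), and the function names are assumptions and may differ from the library the project uses.

```python
from typing import List

Grid = List[List[int]]  # assumption: 8 rows x 9 columns of grayscale values


def dhash_bits(grid: Grid) -> int:
    """Build a 64-bit hash from horizontal brightness gradients."""
    if len(grid) != 8 or any(len(row) != 9 for row in grid):
        raise ValueError("expected an 8x9 grayscale grid")
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def dhash_hex(grid: Grid) -> str:
    """Hex form of the hash, convenient for storing in a text column."""
    return f"{dhash_bits(grid):016x}"


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest similar images."""
    return bin(a ^ b).count("1")


rising = [list(range(9)) for _ in range(8)]            # brightness grows left to right
falling = [list(range(8, -1, -1)) for _ in range(8)]   # brightness shrinks left to right
print(dhash_hex(rising), dhash_hex(falling))
print(hamming(dhash_bits(rising), dhash_bits(falling)))
```

A hash like this is a quick fingerprint: two captures with the same hash are very likely visually close, while the quadtree/MSE diff remains the source of diff_percent.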

On subsequent runs it compares the new capture to last_baseline.png using the configured diff policy (quadtree / MSE path in the domain service), optionally builds a delta against the init baseline, updates metrics, sets changed when diff_percent > 0 else unchanged, replaces last_baseline.png with the new image on success, updates the hash in baselines, and inserts a row into monitoring_checks including gzip-compressed XOR delta bytes when produced.
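The gzip-compressed XOR delta can be sketched as follows. This assumes the delta is computed over raw, equally sized pixel buffers rather than over encoded PNG bytes, and the function names are hypothetical. The useful property is that XOR is its own inverse, so the current frame can be rebuilt from the init baseline plus the stored delta.

```python
import gzip


def make_xor_delta(init_pixels: bytes, current_pixels: bytes) -> bytes:
    """XOR the current frame against the init baseline and gzip the result."""
    if len(init_pixels) != len(current_pixels):
        raise ValueError("pixel buffers must have the same length")
    raw = bytes(a ^ b for a, b in zip(init_pixels, current_pixels))
    return gzip.compress(raw)


def apply_xor_delta(init_pixels: bytes, delta: bytes) -> bytes:
    """Rebuild the current frame from the init baseline and a stored delta."""
    raw = gzip.decompress(delta)
    if len(raw) != len(init_pixels):
        raise ValueError("delta does not match the baseline size")
    return bytes(a ^ b for a, b in zip(init_pixels, raw))


init = bytes([10] * 1000)
current = bytearray(init)
current[500:510] = bytes([200] * 10)  # a small changed region
delta = make_xor_delta(init, bytes(current))
print(len(delta) < len(init))
print(apply_xor_delta(init, delta) == bytes(current))
```

Unchanged pixels XOR to zero, so a mostly stable report yields long zero runs that compress well, which is why the stored payload stays small.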

Persistence layer

PostgreSQL holds baselines (one row per report id) and the append-oriented log monitoring_checks. Views v_latest_checks and v_report_stats support dashboards and ad hoc queries. The schema is applied with python -m pbimonitor --init-db, which creates the schema if needed and executes schema.sql.

Scheduling and workers

An in-process scheduler dispatches enabled reports to worker threads with backpressure between runs. Failures in the worker that escape the use case are logged separately; normally errors are persisted as status = error rows so history stays coherent.
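The dispatch pattern described here can be sketched with the standard library. This is not the project's scheduler: the function names, the semaphore-based backpressure, and the worker counts are assumptions. The point it illustrates is that the dispatcher blocks when too many checks are pending, and that an exception inside a check becomes an error status instead of killing the worker.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_batch(report_ids: Iterable[str], check: Callable[[str], str],
              max_workers: int = 2, max_pending: int = 4) -> Dict[str, str]:
    """Dispatch checks to worker threads with a bound on pending work."""
    slots = threading.BoundedSemaphore(max_pending)
    results: Dict[str, str] = {}
    lock = threading.Lock()

    def guarded(report_id: str) -> None:
        try:
            try:
                status = check(report_id)
            except Exception:
                status = "error"  # recorded as a result, not re-raised
            with lock:
                results[report_id] = status
        finally:
            slots.release()  # free a slot so the dispatcher can continue

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for report_id in report_ids:
            slots.acquire()  # backpressure: wait while the queue is full
            pool.submit(guarded, report_id)
    # leaving the with-block waits for all submitted checks to finish
    return results


def fake_check(report_id: str) -> str:
    """Stand-in for a real check; fails for one hypothetical report id."""
    if report_id == "broken":
        raise RuntimeError("render failed")
    return "unchanged"


print(run_batch(["sales", "ops", "broken"], fake_check))
```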

Check statuses

Each attempt produces one row in monitoring_checks with a string status:

baseline_created — first successful capture for that report: baseline images exist on disk, baselines row is written, diff_percent is 0, no delta payload.
unchanged — baseline existed; diff pipeline returned diff_percent == 0; the new screenshot becomes the next last_baseline.png.
changed — baseline existed; diff_percent > 0; indicates a non-zero visual delta under the current policy before promotion of the screenshot.
error — any exception in the check path (Selenium, diff, storage). diff_percent and screenshot hash are cleared in the stored record; error carries the message for triage.

The per-report threshold from reports.json is used for metrics (tracking how often the diff score sits below that threshold); it does not by itself flip changed vs unchanged in code, where zero vs positive diff area is the branch.
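The separation between the status branch and the threshold metric can be shown in a few lines. The function names are hypothetical, and the strict "less than" comparison in the rate is an assumption about how "below the threshold" is counted.

```python
from typing import Sequence


def classify(diff_percent: float) -> str:
    """Status branch: any positive diff area counts as changed."""
    return "changed" if diff_percent > 0 else "unchanged"


def below_threshold_rate(diffs: Sequence[float], threshold: float) -> float:
    """Share of checks whose diff score sits below the per-report threshold."""
    if not diffs:
        return 0.0
    return sum(1 for d in diffs if d < threshold) / len(diffs)


history = [0.0, 0.3, 2.0, 7.5]
print([classify(d) for d in history])
print(below_threshold_rate(history, threshold=1.0))
```

Note that the 0.3 check is reported as changed even though it falls under a threshold of 1.0: the threshold shapes the analytics, not the status.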

Configuration overview

reports.json

Beyond URL and interval, pay attention to threshold (0–100) as a reporting knob for “small visual drift” tracking, and to start_time if you later align schedules with business hours. Disabled reports are skipped entirely when loading.

Environment highlights

PG_* and PG_SCHEMA_ALLOWLIST — database target and safety rail for schema application.
PAGE_LOAD_WAIT, SCREENSHOT_WIDTH, SCREENSHOT_HEIGHT — trade stability vs fidelity; oversized viewports slow rendering.
MSE_THRESHOLD, MIN_BLOCK_SIZE, MAX_DEPTH, DIFF_ENABLED — control quadtree sensitivity and whether diff images are drawn.
RETRY_* — backoff for flaky networks or transient auth.
POWERBI_USERNAME / POWERBI_PASSWORD and POWERBI_AUTH_SERVER_WHITELIST — optional Basic auth and host allowlisting for embedded flows.
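A minimal sketch of reading a few of these variables into a typed object follows. The variable names come from the list above; the class name, the value types, the boolean parsing, and every default value are assumptions and should not be read as the project's Settings implementation.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MonitorSettings:
    page_load_wait: float
    screenshot_width: int
    screenshot_height: int
    mse_threshold: float
    min_block_size: int
    max_depth: int
    diff_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> MonitorSettings:
    """Read settings from a mapping; all defaults here are hypothetical."""
    truthy = {"1", "true", "yes", "on"}
    return MonitorSettings(
        page_load_wait=float(env.get("PAGE_LOAD_WAIT", "10")),
        screenshot_width=int(env.get("SCREENSHOT_WIDTH", "1920")),
        screenshot_height=int(env.get("SCREENSHOT_HEIGHT", "1080")),
        mse_threshold=float(env.get("MSE_THRESHOLD", "10.0")),
        min_block_size=int(env.get("MIN_BLOCK_SIZE", "8")),
        max_depth=int(env.get("MAX_DEPTH", "6")),
        diff_enabled=env.get("DIFF_ENABLED", "true").strip().lower() in truthy,
    )


print(load_settings({"SCREENSHOT_WIDTH": "1280", "DIFF_ENABLED": "0"}))
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.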

Sensitivity and tuning

When diffs are noisy

Animations, live clocks, rotating ads, or non-deterministic web fonts can create pixel churn without meaningful business change. Mitigations: lengthen the page-load wait (PAGE_LOAD_WAIT), narrow the viewport to the stable report region if your layout allows, increase MSE_THRESHOLD or MIN_BLOCK_SIZE cautiously, and verify that the report itself is not rendering timestamps into the canvas.

When changes are missed

Very small single-pixel shifts or low-contrast edits may fall under thresholds. Tighten MSE_THRESHOLD, reduce MIN_BLOCK_SIZE within performance limits, and confirm last_baseline.png is updating (so you are not comparing stale captures).

Per-report threshold

Use threshold consistently with how you read below_diff_threshold_rate in metrics: it is a product-level knob for “nuisance vs signal” analytics, not a second hidden diff gate in the status transition.

Operations

Bootstrap sequence

Prepare PostgreSQL credentials and allowlist, copy .env.example to .env.
Install dependencies, set PYTHONPATH=src (or install the package editable).
python -m pbimonitor --init-db — applies DDL.
python -m pbimonitor --build — establishes baselines for all enabled reports.
python -m pbimonitor --check <id> — smoke a single report.
python -m pbimonitor --start — continuous scheduler loop.

Docker

docker compose up --build -d brings up the monitor process alongside PostgreSQL; an optional Redis service is reserved for future distributed queueing and is not required for the current in-process scheduler.

Operational prerequisites

The host running Chrome must reach the report URLs; integrated authentication flows may need allowlists or manual sign-in strategies beyond Basic auth. Keep secrets out of git; treat Data/ as sensitive cache because screenshots may contain confidential numbers.

Limitations and fit

Visual monitoring does not prove that numbers are arithmetically correct, only that the rendered page matches or diverges from a reference image under the chosen diff policy. Pair it with data quality checks where correctness is defined at row level.

Headless rendering consumes CPU and memory; dense schedules across many large viewports can contend with interactive users if collocated. Row-level security and per-user render differences mean baselines must be captured under representative credentials.

Features listed in the repository roadmap (user click scripts, OCR, external API probes, clustered deployment) are forward-looking until shipped; today’s scope is the visual detection pipeline described above.

Glossary

Init baseline — first stable screenshot for XOR delta reference.
Last baseline — rolling image used for the next comparison.
diff_percent — percentage of canvas area flagged as different by the quadtree/MSE pipeline.
dhash — perceptual hash of the baseline or current capture for quick fingerprinting.
XOR delta — compressed binary difference between current pixels and the init baseline for forensic reconstruction.

Power BI report visual monitoring

Extended documentation: the rendered report as an observability signal, how the open-source monitor is built, and how to operate it safely. Repository: Power-bi-report-visual-monitoring.

Definition

Visual monitoring here means: open a published report in a controlled browser session, take a raster capture after the platform has executed the model (DAX, relationships, filters), compare it with the stored baseline, and record a compact result (status, difference metric, hashes, optionally a compressed delta). The observable is the state of the report as the user sees it, not an internal engine trace.

This is not the same as Usage Metrics (who viewed what), nor is it the Performance Analyzer (visual timings in Desktop). Visual monitoring answers the question: “Has the rendered surface drifted from the agreed reference?”

The approach complements checks on the data mart: data still needs testing wherever row-level correctness is critical, while the report layer adds layout, interactions, and measure semantics that are hard to express as a pure SQL invariant for every tile.

Why it works

Many production defects show up as layout, conditional formatting, filters, or “silently” wrong measures while refresh and gateway stay green. A pixel-oriented or hash-oriented check catches regressions that are hard to encode as invariant SQL over the data mart, because the error sits in the last mile of semantics in the report layer.

A quadtree/MSE-style comparison (as in the project code) trades absolute precision for bounded work: cost is tied to canvas structure and thresholds, not to the cardinality of every fact table in the model.

From an operations point of view, a periodic visual check is also a contract on the external appearance of the service: if the capture changed, something in the chain (model, filters, theme, or embedding host) changed strongly enough to matter to the reader of that canvas.

Dashboard geometry and KPI sensitivity

Large operational dashboards often give a large share of canvas area to headline KPIs, trends, and variances. Such visuals concentrate business attention more strongly than navigation “chrome”.

Monitoring the rendered canvas gives high semantic return per check: a person will notice a shift in a large KPI tile sooner than they could reconstruct the same gestalt with several ad hoc queries against aggregates.

This does not devalue small visuals; with a limited render budget, prioritization usually starts with the largest and most-read tiles.

Observability cost model (render + diff vs repeated warehouse queries)

Two validation strategies:

Strategy A: several heavy SQL queries (or equivalent checks) against the same data mart that feeds the model, rebuilding fragments of the KPI definitions. Each check may scan large partitions, join dimensions, and recompute complex measures. Total latency and load on the DWH often grow with the number and depth of queries.
Strategy B: render the report once with the BI engine, then run a bounded visual diff against the baseline and write a compact log row (status, diff_percent, hash, duration). The expensive data movement has already been amortized by the platform; the monitoring step is limited by render stability and image comparison, whose cost is tied to resolution and diff parameters rather than to k separate full interrogations of the model on the source side.

In many scenarios, A with a moderate k of non-trivial queries produces more cumulative load and a longer latency tail than B on a fixed schedule, provided render time fits within the SLA. This is a qualitative comparison, not a universal constant: a single simple SELECT can be cheaper than a long headless render. The claim is useful for realistically complex KPIs, not in the form “always faster than any SQL”.

Concrete millisecond figures belong only after a benchmark in your environment; none are given here.

Architecture and data flow

Configuration sources

Report definitions are read from reports.json (or the path in REPORTS_FILE): each entry has id, name, url, interval, threshold, enabled. These fields are not duplicated in PostgreSQL: the database stores only check results.

Environment parameters (PostgreSQL connection, screenshot size, diff policy, Selenium retries, page-load wait, optional Basic auth for Power BI) are set through variables and the Settings class. The full list is in the repository README.

On-disk files

The screenshots_dir directory (default ./Data) holds:

Data/baselines/<report_id>_init_baseline.png: the first accepted full-page capture, the reference for the XOR delta.
Data/changes/<report_id>/last_baseline.png: the rolling baseline for the next comparison.
Data/changes/<report_id>/current_screenshot.png: the latest capture before it is promoted to last_baseline.png on success.
Data/changes/<report_id>/current_screenshot_diff.png: an optional difference image, if diff rendering is enabled.

Execution pipeline (success scenario)

The CheckReport use case: ensures the directories exist; checks whether a baseline exists (a row in baselines and both files on disk); calls Selenium to capture into init_baseline or current; on the first run copies init to last_baseline.png, computes dhash, upserts into baselines, and writes the baseline_created status to monitoring_checks.

On subsequent runs it compares the new frame with last_baseline.png according to the diff policy (quadtree/MSE in the domain service), builds a delta against init when needed, updates metrics, sets changed when diff_percent > 0 and unchanged otherwise, replaces last_baseline.png on success, updates the hash in baselines, and inserts a row into monitoring_checks with the gzip-compressed XOR delta bytes if they exist.

Storage layer

PostgreSQL contains baselines (one row per report_id) and the monitoring_checks log. The views v_latest_checks and v_report_stats simplify dashboards and incident analysis. The schema is applied with the command python -m pbimonitor --init-db (schema creation if needed, plus schema.sql).

Scheduler and workers

An in-process scheduler runs with a queue and a thread pool that limits load. Exceptions in a worker outside the use case are logged separately; normally, errors land in monitoring_checks as status = error so that the history stays coherent.

Check statuses

Each attempt yields a row in monitoring_checks with a string status:

baseline_created: first successful capture for the report: baseline files on disk, a row in baselines, diff_percent = 0, no delta.
unchanged: a baseline already existed; the diff pipeline returned diff_percent == 0; the new capture becomes the next last_baseline.png.
changed: a baseline existed; diff_percent > 0; a non-zero visual difference under the current policy before the capture is promoted.
error: an exception in the chain (Selenium, diff, storage). diff_percent and the screenshot hash are cleared in the record; the error field holds the text for triage.

The threshold from reports.json is used for metrics (the share of checks where the difference is below the threshold); in code it is not a second hidden condition for the changed / unchanged branch, which compares zero against a positive difference share.

Configuration (overview)

reports.json

Besides URL and interval, the important fields are threshold (0–100) as a scale for “small drift” analytics, and start_time if you later align the schedule with business hours. Disabled reports do not reach the scheduler at load time.

Key environment variables

PG_* and PG_SCHEMA_ALLOWLIST: target database and protection against applying the schema in the wrong place.
PAGE_LOAD_WAIT, SCREENSHOT_WIDTH, SCREENSHOT_HEIGHT: balance between stability and detail; an overly large viewport slows rendering.
MSE_THRESHOLD, MIN_BLOCK_SIZE, MAX_DEPTH, DIFF_ENABLED: quadtree sensitivity and rendering of the diff image.
RETRY_*: retries for an unstable network or session.
POWERBI_USERNAME / POWERBI_PASSWORD and POWERBI_AUTH_SERVER_WHITELIST: optional Basic auth and host allowlist.

Sensitivity and tuning

When the diff is noisy

Animations, live clocks, rotating banners, or non-deterministic web fonts produce pixel noise with no business meaning. What to do: increase the wait, narrow the capture area to the stable part of the report where possible, raise MSE_THRESHOLD or MIN_BLOCK_SIZE carefully, and make sure the report itself does not draw the time inside the canvas.

When changes are missed

Very small shifts or low contrast can fall under the threshold. Tighten MSE_THRESHOLD, reduce MIN_BLOCK_SIZE within reasonable limits, and check that last_baseline.png is being updated.

Per-report threshold

Align threshold with how you read below_diff_threshold_rate: it is a product-level “noise vs signal” lever in metrics, not a second hidden status gate in code.

Operations

First-run sequence

Prepare PostgreSQL and the allowlist, copy .env.example to .env.
Install dependencies, set PYTHONPATH=src (or use an editable install).
python -m pbimonitor --init-db: apply the DDL.
python -m pbimonitor --build: build baselines for all enabled reports.
python -m pbimonitor --check <id>: check a single report.
python -m pbimonitor --start: continuous scheduler loop.

Docker

docker compose up --build -d brings up the monitor process and PostgreSQL; the Redis profile in compose is reserved for a future distributed queue and is not required for the current in-process scheduler.

Prerequisites

The host with Chrome must be able to reach the report URLs; integrated authentication may require an allowlist or sign-in scenarios more complex than Basic. Do not put secrets in git; treat the Data/ directory as a sensitive cache, because captures may contain confidential figures.

Limitations and fit

Visual monitoring does not prove that the numbers are arithmetically correct, only that the raster matches or diverges from the reference under the chosen diff policy. For row-level correctness of the data mart, keep dedicated data-quality checks.

Headless rendering consumes CPU and memory; a dense schedule across many large viewports can compete with interactive users on the same machine. RLS and differences in user permissions mean the baseline must be captured under representative credentials.

Roadmap items (clicks, OCR, external APIs, cluster) lie ahead until they are merged into a release; the current scope is the visual pipeline described above.

Glossary

Init baseline: the first stable capture, the reference for the XOR delta.
Last baseline: the rolling frame for the next comparison.
diff_percent: the share of canvas area flagged as different by quadtree/MSE.
dhash: a perceptual hash of the capture, used as a fingerprint.
XOR delta: the compressed binary XOR of the current frame against the init baseline, for later reconstruction.