Notifications
A failing job that nobody knows about is a worse problem than a failing job. Nomaflow's notification layer routes job events — failures, long-running detections, optional successes — to Slack, email or generic webhooks.
The setup is layered:
- Transports (Slack workspace, SMTP server, webhook URLs) are configured once at framework level — they're a property of the install, not the job.
- Job alerts blocks decide which events to emit and who to address.
- Routing picks a transport for each (job tag, recipient) pair.
This page covers the wiring end-to-end.
Transports — framework-level setup
Open Settings → Notifications. The page lists every configured channel.
Each transport carries:
| Field | Notes |
|---|---|
| URL / host | Slack webhook URL · SMTP host:port · generic webhook URL. |
| Credentials | 🔒 encrypted at rest (Slack URL counts as a secret; SMTP password; webhook bearer / signing secret). |
| Default recipient | The channel / address / endpoint used when a job's alerts block doesn't specify recipients. |
| Test button | Sends a one-line "test from Nomaflow" message — confirms the wiring works before a real failure puts it to the test. |
Slack
The default mapping: Slack receives a one-line message styled with the job's state colour (red for failure, yellow for long-run, green for success).
| Setting | Required | Notes |
|---|---|---|
| Webhook URL | Yes | Get this from Slack admin → Apps → Incoming Webhooks. One webhook can fan-out to multiple channels through Slack's own routing. |
| Default channel | No | Overrides the channel baked into the webhook. Use #nomaflow-alerts if you have one. |
| Username | No | The bot name. Defaults to "Nomaflow". |
| Icon emoji | No | Defaults to :gear:. |
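As a rough illustration of how these settings might map onto a webhook payload, here is a minimal sketch. The function name `build_slack_payload`, the hex colours, and the `LONG_RUNNING` / `SUCCEEDED` state names are assumptions made for the example; the page only states the colour names and the `FAILED` / `RUNNING` states. Depending on the webhook type, Slack may ignore the channel, username, and icon overrides.

```python
import json

# Hypothetical colour mapping: the page only says red / yellow / green.
STATE_COLOURS = {
    "FAILED": "#d00000",
    "LONG_RUNNING": "#e0a800",
    "SUCCEEDED": "#2eb67d",
}


def build_slack_payload(text, state, channel=None,
                        username="Nomaflow", icon_emoji=":gear:"):
    """Build a JSON-serialisable payload for a Slack incoming webhook."""
    payload = {
        "username": username,
        "icon_emoji": icon_emoji,
        # The attachment colour bar carries the job's state colour.
        "attachments": [
            {"color": STATE_COLOURS.get(state, "#808080"), "text": text}
        ],
    }
    if channel:  # only override the webhook's own channel when one is set
        payload["channel"] = channel
    return payload


payload = build_slack_payload(
    "reporting-nightly-sync run failed", "FAILED", channel="#nomaflow-alerts"
)
print(json.dumps(payload, indent=2))
```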
Email
Standard SMTP. The framework sends a small HTML message with a link back to the Run detail page.
| Setting | Required | Notes |
|---|---|---|
| Host | Yes | SMTP server hostname. |
| Port | Yes | 587 for STARTTLS, 465 for TLS. |
| Username / Password | If the server requires auth. | Password is 🔒 encrypted. |
| Default from | Yes | Address the mail comes from. |
| TLS mode | No | Defaults to STARTTLS. Pick TLS for legacy servers that need it. |
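The two TLS modes differ in when encryption starts: STARTTLS upgrades a plain connection, while TLS wraps the socket from the first byte. The sketch below shows that difference using Python's standard `smtplib`. The helper names `build_message` and `send` are hypothetical, and the body is plain text for brevity, whereas the framework sends HTML.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender, recipient, job_id, state, run_url):
    """Build a small alert mail with a link back to the Run detail page."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[Nomaflow] {state} {job_id}"
    msg.set_content(f"{job_id} finished in state {state}.\nRun detail: {run_url}")
    return msg


def send(msg, host, port, tls_mode="STARTTLS", username=None, password=None):
    """Send msg over SMTP; tls_mode is 'STARTTLS' (port 587) or 'TLS' (port 465)."""
    context = ssl.create_default_context()
    if tls_mode == "TLS":
        # Implicit TLS: the connection is encrypted from the start.
        smtp = smtplib.SMTP_SSL(host, port, context=context)
    else:
        smtp = smtplib.SMTP(host, port)
    with smtp:
        if tls_mode != "TLS":
            # STARTTLS: upgrade the plain connection before authenticating.
            smtp.starttls(context=context)
        if username:
            smtp.login(username, password)
        smtp.send_message(msg)
```

Calling `send(msg, "smtp.corp.local", 587)` would use STARTTLS; the hostname here is a placeholder.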
Generic webhook
For OpsGenie, PagerDuty, Mattermost, your own dispatcher — anything that accepts a JSON POST.
| Setting | Required | Notes |
|---|---|---|
| URL | Yes | The endpoint. |
| Headers | No | Auth headers, content-type override. Header values can be 🔒 encrypted. |
| Body template | No | Override the default body shape. Variables: ${job_id}, ${run_id}, ${state}, ${error}, ${started_at}. |
The default body shape:
{
"job_id": "reporting-nightly-sync",
"run_id": "run_a8c4d",
"state": "FAILED",
"triggered_by": "cron",
"started_at": "2026-05-26T02:00:00Z",
"finished_at": "2026-05-26T02:14:22Z",
"error": "OperationalError: …",
"url": "https://liberty.corp.local/nomaflow/runs/run_a8c4d/"
}
The url is the link operators click to reach the Run detail page directly.
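The `${...}` placeholder syntax of the body template happens to match Python's `string.Template`, so a custom template can be previewed locally. This is only a sketch: how Nomaflow itself renders and escapes templates is not documented here, and both the `render_body` helper and the sample error text are made up for the example.

```python
import json
from string import Template


def render_body(template_text, run):
    """Substitute ${...} placeholders, JSON-escaping each value first."""
    # json.dumps(...)[1:-1] escapes quotes and newlines and drops the outer
    # quotes, so a value stays valid inside a quoted JSON string.
    escaped = {key: json.dumps(str(value))[1:-1] for key, value in run.items()}
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(template_text).safe_substitute(escaped)


template_text = '{"summary": "${job_id} is ${state}", "details": "${error}"}'
run = {
    "job_id": "reporting-nightly-sync",
    "run_id": "run_a8c4d",
    "state": "FAILED",
    "error": 'OperationalError: could not connect to "orders"',  # sample text
    "started_at": "2026-05-26T02:00:00Z",
}

body = json.loads(render_body(template_text, run))
print(body["summary"])
```

Escaping matters because error messages often contain quotes; without it, a rendered body can stop being valid JSON.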
Job alerts block
Inside the Job editor's Alerts section:
| Field | Default | Notes |
|---|---|---|
| on_failure | true (when the block exists) | Emit on a FAILED run. The most common setting; leave it on. |
| on_long_run_minutes | none | Emit a warning if the run is still RUNNING after N minutes. The run keeps going: this is a heads-up, not an abort. |
| recipients | [] | Channel-specific identifiers. Empty = use the transport's default recipient. |
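The defaults in the table can be read as a small data structure. The sketch below models them with a dataclass. The `Alerts` class and `parse_alerts` helper are hypothetical, as is the assumption that the alerts block arrives as a plain dict. The `on_success` field is included because it lives in the same block and is off by default.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Alerts:
    on_failure: bool = True                    # true whenever the block exists
    on_long_run_minutes: Optional[int] = None  # None = no long-run warning
    on_success: bool = False                   # off by default
    recipients: list[str] = field(default_factory=list)  # empty = transport default


def parse_alerts(raw):
    """Return None when the job has no alerts block, else Alerts with defaults."""
    if raw is None:
        return None
    return Alerts(**raw)


print(parse_alerts({"on_long_run_minutes": 30, "recipients": ["#data-oncall"]}))
```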
Recipients per transport
The recipients field is type-aware: a single list can mix identifiers for several transports, and the format of each entry determines where it goes.
| Transport | Recipient format | Example |
|---|---|---|
| Slack | #channel or @user | ["#data-oncall", "@alice"] |
| Email | RFC 5322 address | ["data-team@corp.local"] |
| Webhook | Endpoint id (registered at framework level) | ["opsgenie-data"] |
When a job is fired and an alert is due, the framework iterates over the recipient list. Each recipient matches one transport (by format / by registration); each matched transport gets the alert.
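The matching step can be sketched as follows. The matching order (Slack prefix first, then registered webhook ids, then anything containing `@` as email) and the function names are assumptions. The email check is deliberately simplistic and does not attempt full RFC 5322 validation.

```python
def classify_recipient(recipient, webhook_ids):
    """Return 'slack', 'webhook', 'email', or None for an unmatched recipient."""
    if recipient.startswith(("#", "@")):
        return "slack"    # #channel or @user
    if recipient in webhook_ids:
        return "webhook"  # endpoint id registered at framework level
    if "@" in recipient:
        return "email"    # crude check, not full RFC 5322 parsing
    return None


def group_by_transport(recipients, webhook_ids):
    """Group a mixed recipient list by the transport each entry matches."""
    grouped = {}
    for recipient in recipients:
        transport = classify_recipient(recipient, webhook_ids)
        if transport is not None:
            grouped.setdefault(transport, []).append(recipient)
    return grouped


recipients = ["#data-oncall", "@alice", "data-team@corp.local", "opsgenie-data"]
print(group_by_transport(recipients, webhook_ids={"opsgenie-data"}))
```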
When recipients is empty
The transport's default recipient is used (#nomaflow-alerts for Slack, the SMTP default-from for mail, the webhook's primary URL). This is the right default for most installs: the job's alerts block sets policy (which events to emit), and the transport decides where the message goes.
Per-transport routing by tag
Some installs want different teams to receive different jobs. The framework's notification routing supports tag rules:
| Job tag | Routes to | Configured in |
|---|---|---|
| team-data | Slack #data-team | Settings → Notifications → Routing rules. |
| team-security | Email security@corp.local | Same. |
| team-platform | PagerDuty webhook | Same. |
A job tagged team-data, etl flows through both the team-data rule and any tag-less default. The rule engine de-duplicates so a single recipient doesn't get the same message twice.
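A minimal sketch of tag routing with de-duplication, assuming rules are a simple tag-to-destination mapping and that the tag-less default always applies. The rule table mirrors the example above; the `pagerduty` endpoint id, the default route, and the function name are illustrative, not Nomaflow's actual rule engine.

```python
# Hypothetical in-memory form of the routing rules table.
ROUTING_RULES = {
    "team-data": ("slack", "#data-team"),
    "team-security": ("email", "security@corp.local"),
    "team-platform": ("webhook", "pagerduty"),
}
DEFAULT_ROUTE = ("slack", "#nomaflow-alerts")  # assumed tag-less default


def resolve_routes(tags):
    """Return the de-duplicated (transport, recipient) pairs for a job's tags."""
    routes = [ROUTING_RULES[tag] for tag in tags if tag in ROUTING_RULES]
    routes.append(DEFAULT_ROUTE)
    # dict.fromkeys keeps first-seen order while dropping repeated pairs.
    return list(dict.fromkeys(routes))


print(resolve_routes(["team-data", "etl"]))
```

The `etl` tag matches no rule, so the job receives only the team-data route and the default.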
What events emit what
| Event | Triggered by | Default level |
|---|---|---|
| Run failed | on_failure = true (default). | High — pages people. |
| Run long-running | on_long_run_minutes = N set, run still in flight past N. | Medium — warning. |
| Run succeeded | Set on_success = true (off by default — most installs don't want it). | Low. |
| Job re-enabled / disabled | Operator toggled the catalogue card. | Low — informational, off by default. |
| Run cancelled by user | Operator clicked ✕ Cancel. | Low — visible in the run history; alert is opt-in. |
on_success is intentionally off by default. A job that runs hourly successfully ten thousand times a year shouldn't generate ten thousand "OK!" messages. Turn it on for high-value jobs where success itself is news ("the monthly report was delivered").
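The run-level rows of the table reduce to a small decision function, sketched below. The event names and the `SUCCEEDED` state name are assumptions; only `FAILED` and `RUNNING` appear on this page. A real implementation would also need to remember that a long-run warning was already sent, which this sketch leaves out.

```python
from datetime import datetime, timedelta, timezone


def due_events(state, started_at, now, on_failure=True,
               on_success=False, on_long_run_minutes=None):
    """Return the alert events due for a run, given its alerts settings."""
    events = []
    if state == "FAILED" and on_failure:
        events.append("run_failed")
    elif state == "SUCCEEDED" and on_success:
        events.append("run_succeeded")
    elif state == "RUNNING" and on_long_run_minutes is not None:
        # Warn only once the run has been in flight longer than the threshold.
        if now - started_at > timedelta(minutes=on_long_run_minutes):
            events.append("run_long_running")
    return events


started = datetime(2026, 5, 26, 2, 0, tzinfo=timezone.utc)
print(due_events("RUNNING", started, started + timedelta(minutes=45),
                 on_long_run_minutes=30))
```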
Anatomy of a failure alert
When a run fails:
- The runner writes the FAILED state to the run row.
- The notifications layer reads the job's alerts block.
- For each matched recipient, it builds a transport-specific message:
| Transport | Message |
|---|---|
| Slack | One-line red message: ❌ reporting-nightly-sync run failed at step copy-orders · OperationalError: connection refused with a "View run" link. |
| Email | Subject: [Nomaflow] FAILED reporting-nightly-sync. Body: same one-liner plus the full traceback as a code block, plus a link. |
| Webhook | JSON POST as described above. |
- The HTTP / SMTP call is fire-and-forget. If the upstream is unreachable, the notification fails — but the run's failure is already recorded. Nomaflow doesn't retry notification delivery (a flaky notification path shouldn't fail a job).
The framework log records every notification attempt with its outcome — search there if a recipient reports "I didn't get the page".
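The fire-and-forget behaviour described above can be sketched as a single attempt that logs its outcome and never raises. The logger name and the `dispatch` signature are hypothetical. The split between the two failure outcomes is a guess for illustration: network-level errors map to FAILED_TRANSPORT and anything else to FAILED_DELIVERY. The page does not define them.

```python
import logging

log = logging.getLogger("nomaflow.notifications")  # hypothetical logger name


def dispatch(send, recipient, message):
    """Call send(recipient, message) once; log the outcome and never raise."""
    try:
        send(recipient, message)
    except OSError:
        # Assumed meaning: the upstream could not be reached at all.
        outcome = "FAILED_TRANSPORT"
    except Exception:
        # Assumed meaning: the upstream was reached but the send failed.
        outcome = "FAILED_DELIVERY"
    else:
        outcome = "SENT"
    # No retry: the run's own state is already recorded elsewhere.
    log.info("notification recipient=%s outcome=%s", recipient, outcome)
    return outcome


def unreachable(recipient, message):
    raise ConnectionRefusedError("connection refused")


print(dispatch(unreachable, "#data-oncall", "run failed"))
```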
Sending success alerts conditionally
A common pattern: alert on success only for jobs where success itself matters. Two ways to do it:
| Pattern | How |
|---|---|
| Per-job flag. | Set on_success = true in the job's alerts block. Fires on every successful run. |
| In-step push. | Add an HTTP step at the end of the job that POSTs to the webhook. Fires only when the preceding steps succeed (because steps run in order). Gives you full control over the message body. |
The second pattern is what the Scheduled DB sync recipe uses — the success notification is just another step.
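For the in-step push, the final step only needs to POST a JSON body to the webhook. The sketch below uses Python's standard `urllib` and separates building the request from sending it. The function names, the payload fields beyond those in the default body shape, the `SUCCEEDED` state name, and the bearer-token header are assumptions. How the step receives its inputs depends on the step interface, which this page does not cover.

```python
import json
import urllib.request


def build_success_request(webhook_url, job_id, run_id, token=None):
    """Build the POST request for a custom success notification."""
    body = json.dumps(
        {"job_id": job_id, "run_id": run_id, "state": "SUCCEEDED"}
    ).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        webhook_url, data=body, headers=headers, method="POST"
    )


def push_success(webhook_url, job_id, run_id, token=None, timeout=10):
    """Send the notification; runs only if all earlier steps succeeded."""
    request = build_success_request(webhook_url, job_id, run_id, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Because this runs as a job step, an exception from `push_success` would presumably fail the step, unlike the framework's own fire-and-forget alerts; wrap the call if that is not wanted.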
Quiet hours
Some installs don't want pages at 03:00 for low-priority jobs. Two approaches:
| Pattern | Behaviour |
|---|---|
| Tag-based routing. | A low-priority tag routes to email (no page); a high-priority tag routes to PagerDuty (pages). The setting is per job. |
| Recipient-side rules. | The recipient channel (PagerDuty's own service policies, Slack's notification preferences) handles quiet hours. Nomaflow always sends; the receiver mutes. |
The second approach scales better — Nomaflow has one notification policy (alert on every failure), the receivers decide what to do with it. Adding a "quiet hours" mode to Nomaflow itself would multiply the moving parts.
Common pitfalls
| Mistake | Symptom | Fix |
|---|---|---|
| Webhook URL stored in plain text in jobs.toml. | URL leaks into version control. | Always store at framework level (🔒 encrypted), reference by transport name. |
| Test button green, real alert never lands. | Network blocks production-time traffic that the test allowed. | Check firewall rules; the test uses the same transport but at config save time. |
| on_success = true on every job. | Channel is full of green ticks; failures get lost in the noise. | Turn off on_success except where it matters. |
| on_long_run_minutes = 1 on an ETL that always takes 5. | Spurious warnings every night. | Tune to the job's normal runtime + headroom. |
| Multiple recipients on the same channel. | Same message delivered twice. | The de-dup engine should catch this; if not, narrow the recipients list. |
Inspecting notification history
The framework log records every notification dispatch with its outcome (SENT, FAILED_TRANSPORT, FAILED_DELIVERY). Search the log for notification to find delivery issues.
For a long-term audit (six months back: who got paged for what?), some installs add a small notification audit table populated by a Python helper. The framework doesn't ship this by default — most teams find the framework log sufficient.
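Such a helper could look like the sketch below, which uses SQLite from the standard library. The table name, columns, and function names are all hypothetical, since the framework ships nothing like this. A real install would point the connection at a durable database rather than an in-memory one.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS notification_audit (
    sent_at   TEXT NOT NULL,
    job_id    TEXT NOT NULL,
    run_id    TEXT NOT NULL,
    transport TEXT NOT NULL,
    recipient TEXT NOT NULL,
    outcome   TEXT NOT NULL
)
"""


def record(conn, job_id, run_id, transport, recipient, outcome):
    """Append one notification attempt to the audit table."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO notification_audit VALUES (?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), job_id, run_id,
         transport, recipient, outcome),
    )
    conn.commit()


def paged_for(conn, recipient):
    """Answer 'who got paged for what?' for a single recipient."""
    rows = conn.execute(
        "SELECT job_id, run_id, outcome FROM notification_audit "
        "WHERE recipient = ? ORDER BY sent_at",
        (recipient,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
record(conn, "reporting-nightly-sync", "run_a8c4d", "slack", "#data-oncall", "SENT")
print(paged_for(conn, "#data-oncall"))
```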
What's next
- Administration — restart behaviour and multi-replica notifications.
- Recipe — Scheduled DB sync — uses the failure alert path end-to-end.
- Custom Python steps — push notifications from within a step.