Alerts Quick Reference & Cheatsheet | Nife Deploy

Quick answers to common alert questions.


I Want To...#

Create an Alert Rule#

Alerts → Alert Rules → New Rule
↓
Choose what to monitor (CPU, errors, etc.)
↓
Set threshold (> 80%, < 10%, etc.)
↓
Choose severity (Critical/Warning/Info)
↓
Name it clearly
↓
Save and Enable

Time: 2-3 minutes
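
To make the pieces of a rule concrete, here is a minimal sketch of a rule as data: what to monitor, a comparison, a threshold, a severity, and a name. The `AlertRule` class, its field names, and the metric name `cpu_percent` are illustrative assumptions, not part of the Nife Deploy product or API.

```python
from dataclasses import dataclass
import operator

# Hypothetical comparison operators; the real rule editor may offer others.
OPERATORS = {">": operator.gt, "<": operator.lt}
SEVERITIES = ("Critical", "Warning", "Info")


@dataclass
class AlertRule:
    name: str          # clear, specific name
    metric: str        # what to monitor, e.g. "cpu_percent" (made-up name)
    op: str            # ">" or "<"
    threshold: float
    severity: str
    enabled: bool = True

    def __post_init__(self):
        if self.op not in OPERATORS:
            raise ValueError(f"unsupported operator: {self.op}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")

    def should_fire(self, value: float) -> bool:
        """True when the rule is enabled and the value crosses the threshold."""
        return self.enabled and OPERATORS[self.op](value, self.threshold)


rule = AlertRule("api-prod CPU above 80%", "cpu_percent", ">", 80.0, "Warning")
print(rule.should_fire(91.5))  # True
print(rule.should_fire(42.0))  # False
```

Note that a disabled rule never fires in this sketch, which mirrors the Enabled toggle described in this cheatsheet.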


Get Alert Notifications#

Alerts → Alert Config → Add Channel
↓
Choose type (Email/Slack/PagerDuty/Webhook)
↓
Enter your details
↓
Test the channel
↓
Link to your alert rule

Time: 5-10 minutes


Respond to a Firing Alert#

Get notification
↓
Click link or go to SRE → Alerts
↓
Find the alert with Firing status
↓
Click Acknowledge
↓
Investigate and fix the issue
↓
Click Resolve

Time: Varies with issue complexity
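
The response flow above can be read as a small state machine: Firing, then Acknowledged, then Resolved. The sketch below models only that documented path. Whether the product allows resolving a Firing alert without acknowledging it first is not stated here, so the sketch rejects it; the `Status` enum and `transition` function are illustrative names.

```python
from enum import Enum


class Status(Enum):
    FIRING = "Firing"
    ACKNOWLEDGED = "Acknowledged"
    RESOLVED = "Resolved"


# Assumption: only the path described in this cheatsheet is allowed.
ALLOWED = {
    Status.FIRING: {Status.ACKNOWLEDGED},
    Status.ACKNOWLEDGED: {Status.RESOLVED},
    Status.RESOLVED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not on the documented path."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


status = Status.FIRING
status = transition(status, Status.ACKNOWLEDGED)  # click Acknowledge
status = transition(status, Status.RESOLVED)      # click Resolve
print(status.value)  # Resolved
```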


Find a Specific Alert#

Use the filters:

Status: Firing / Acknowledged / Resolved
Severity: Critical / Warning / Info

Or scroll through the list at SRE → Alerts


Fix Too Many Alerts#

Step 1: Find the noisy rule

Alerts → Alert Rules → Look for the rule that triggers most often

Step 2: Increase threshold

Example: Change from CPU > 70% to CPU > 85%

Step 3: Test and adjust as needed


Test if Notifications Work#

Alerts → Alert Config
↓
Find your channel
↓
Click "Test"
↓
Check if you received it

Disable Alerts Temporarily#

Alerts → Alert Rules
↓
Find the rule
↓
Toggle Enabled OFF
↓
(Remember to turn back ON later!)

Quick Reference Tables#

Alert Statuses#

Status | Icon | Color | Meaning | Action
Firing | 🔔 | Red | Active alert, needs attention | Acknowledge and fix
Acknowledged | ⏱️ | Yellow | Someone is investigating | Wait or help
Resolved | ✓ | Green | Issue is fixed | None

Severity Levels#

Severity | Icon | When | Response Time
Critical | 🔴 | Immediate action needed | Minutes
Warning | 🟠 | Important, needs attention | 1 hour
Info | 🟡 | FYI information | As time allows

Notification Channels#

Channel | Speed | Best For | Setup Time
Email | Slow | Documentation | 1 min
Slack | Fast | Team visibility | 2 min
PagerDuty | Immediate | Critical alerts | 5 min
Webhook | Depends | Custom integration | 10+ min
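
One way to apply the table above is to decide up front which channels each severity should reach. The mapping below is a sketch of one possible policy drawn from the "Best For" column. It is not a built-in Nife Deploy setting, and the `ROUTES` table and `channels_for` function are hypothetical names.

```python
# One possible policy: page for Critical, post Warnings to the team,
# and keep Info as an email record. Adjust to your own team's needs.
ROUTES = {
    "Critical": ["PagerDuty", "Slack"],
    "Warning": ["Slack"],
    "Info": ["Email"],
}


def channels_for(severity: str) -> list[str]:
    """Return the channels to link to a rule of the given severity."""
    # Unknown severities fall back to Email so nothing is silently dropped.
    return ROUTES.get(severity, ["Email"])


print(channels_for("Critical"))  # ['PagerDuty', 'Slack']
print(channels_for("Info"))      # ['Email']
```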

Common Alert Examples#

Website/API#

Alert | Threshold | Severity
Service Down | HTTP 503 | Critical
Slow Response | > 5 seconds | Warning
High Error Rate | > 5% | Warning
CPU High | > 80% | Warning

Database#

Alert | Threshold | Severity
Service Down | Not responding | Critical
CPU High | > 80% | Warning
Memory High | > 90% | Warning
Low Disk Space | < 10% free | Critical

Infrastructure#

Alert | Threshold | Severity
Server Down | Unreachable | Critical
Disk Full | < 5% free | Critical
High Latency | > 100ms | Warning
Network Error | > 1% packet loss | Warning

Troubleshooting#

Alert Won't Fire#

Check:

  1. Is rule Enabled? (toggle should be ON)
  2. Is threshold correct? (Test with extreme value)
  3. Is metric actually hitting threshold?

Fix:

  • Enable the rule
  • Lower threshold temporarily to test
  • Verify correct metric is selected
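
The three checks above can be walked through in order. The sketch below does this for a "greater than" rule, given the rule's enabled flag, its threshold, and a list of recent metric values you have collected yourself. The function name and its inputs are illustrative assumptions, not a Nife Deploy API.

```python
def diagnose_silent_rule(enabled, threshold, recent_values):
    """Return the most likely reason a '>' rule has not fired."""
    if not enabled:
        return "Rule is disabled: switch the Enabled toggle ON."
    if not recent_values:
        return "No data for this metric: verify the correct metric is selected."
    peak = max(recent_values)
    if peak <= threshold:
        return f"Metric peaked at {peak}, which never exceeded the threshold of {threshold}."
    return "Threshold was exceeded: the rule should have fired, so check notifications."


print(diagnose_silent_rule(True, 80, [42, 61, 77]))
print(diagnose_silent_rule(False, 80, [95]))
```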

Not Getting Notifications#

Check:

  1. Click "Test" on your channel
  2. Did you receive test?
  3. Check email spam folder
  4. Check Slack channel settings
  5. Verify email/phone is correct

Fix:

  • Resend test
  • Check spam folder
  • Verify email/channel settings
  • Check connection/credentials

Too Many Alerts#

Solutions:

  1. Increase threshold (less sensitive)
  2. Disable low-priority rules
  3. Use a digest instead of individual emails
  4. Combine related alerts

Example:

Before: CPU > 70% = 50 alerts/day
After: CPU > 85% = 5 alerts/day
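
Before changing a threshold, it can help to estimate how many alerts each candidate value would have produced on past data. The sketch below counts how often a series rises above a threshold. The CPU samples are made-up illustration values, and counting one alert per upward crossing is an assumption about how alerts are raised.

```python
def count_alerts(samples, threshold):
    """Count how many times the series rises from at-or-below to above the threshold."""
    alerts = 0
    above = False
    for value in samples:
        if value > threshold and not above:
            alerts += 1
        above = value > threshold
    return alerts


# Made-up CPU readings (percent), for illustration only.
cpu = [55, 72, 68, 74, 88, 71, 66, 73, 69, 90, 86, 60]
print(count_alerts(cpu, 70))  # 4
print(count_alerts(cpu, 85))  # 2
```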

False Alarms#

Why: Threshold too sensitive

Fix:

  1. Review when it fired
  2. Increase threshold
  3. Make condition more specific
  4. Or delete if not important
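
One common way to make a condition more specific is to require the threshold to be exceeded for several consecutive samples rather than once. Whether Nife Deploy alert rules offer a duration setting is not stated in this cheatsheet, so treat the sketch below as an illustration of the idea only; `SustainedCondition` is a hypothetical name.

```python
from collections import deque


class SustainedCondition:
    """Fires only when the last `required_samples` values all exceed the threshold."""

    def __init__(self, threshold, required_samples):
        self.threshold = threshold
        # Keeps only the most recent comparison results.
        self.window = deque(maxlen=required_samples)

    def update(self, value) -> bool:
        self.window.append(value > self.threshold)
        return len(self.window) == self.window.maxlen and all(self.window)


cond = SustainedCondition(threshold=80, required_samples=3)
results = [cond.update(v) for v in [95, 60, 85, 90, 92, 70]]
print(results)  # [False, False, False, False, True, False]
```

The single spike to 95 does not fire; only the run of 85, 90, 92 does.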

Duplicate Alerts#

Problem: Same issue creates multiple alerts

Fix:

  1. Delete duplicate rules
  2. Keep only one rule per metric
  3. Or adjust thresholds so only one fires
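
To spot duplicates, group alerts by what they describe and keep one per group. The sketch below keys on service and metric and keeps the most severe alert. The dictionary fields (`service`, `metric`, `severity`) are assumed for illustration and do not reflect a documented export format.

```python
SEVERITY_RANK = {"Critical": 0, "Warning": 1, "Info": 2}


def deduplicate(alerts):
    """Keep one alert per (service, metric), preferring the most severe."""
    best = {}
    for alert in alerts:
        key = (alert["service"], alert["metric"])
        kept = best.get(key)
        if kept is None or SEVERITY_RANK[alert["severity"]] < SEVERITY_RANK[kept["severity"]]:
            best[key] = alert
    return list(best.values())


alerts = [
    {"service": "api", "metric": "cpu", "severity": "Warning"},
    {"service": "api", "metric": "cpu", "severity": "Critical"},
    {"service": "db", "metric": "disk", "severity": "Critical"},
]
unique = deduplicate(alerts)
print(len(unique))  # 2
```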

Time-Saving Tips#

Fastest to Respond#

Get Slack notification (2 sec)
↓
Click link (1 sec)
↓
Acknowledge (5 sec)
↓
Investigate (5-30 min)
↓
Resolve (5 sec)
Time to acknowledge: < 1 minute

Fastest to Create Rule#

New Rule → Pick template → Set threshold
→ Name it → Save → Enable
Total: ~1 minute

Fastest to Test Channel#

Alert Config → Find channel → Click Test
→ Receive notification
Total: ~30 seconds

Key Locations#

Need | Path
Create rule | Alerts → Alert Rules
Configure notifications | Alerts → Alert Config
View alerts | SRE → Alerts
Help | ? button on page

Common Mistakes#

Mistake | Impact | Fix
Too many alerts | Alert fatigue | Increase thresholds
Vague names | Confusion | Use specific names
No testing | Alerts don't work | Always test
Never adjusting | Gets worse over time | Review monthly
Not documenting | Team confusion | Document each rule

Best Threshold Examples#

CPU Usage#

Normal: 20-50%
Peak: 60-70%
Alert: 80% ← Not too sensitive
Critical: 95% ← Only if really bad

Memory Usage#

Normal: 30-60%
Peak: 70-80%
Alert: 90% ← Getting concerning
Critical: 95% ← Critically high

Response Time#

Normal: 100-300ms
Slow: 500-1000ms
Alert: 2000ms ← Warning level
Critical: 5000ms ← Very slow

Error Rate#

Normal: 0-0.5%
Alert: 1% ← One in a hundred
Critical: 5% ← One in twenty
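
The example thresholds in this section can be collected into one lookup to check where a reading falls. In the sketch below the metric keys are made-up names, and treating each level as a strict "greater than" comparison is an assumption chosen to match the "> 80%" style used in the example tables.

```python
# (alert level, critical level) taken from the examples in this section.
THRESHOLDS = {
    "cpu_percent": (80, 95),
    "memory_percent": (90, 95),
    "response_ms": (2000, 5000),
    "error_rate_percent": (1, 5),
}


def classify(metric: str, value: float) -> str:
    """Return 'Critical', 'Alert', or 'Normal' for a reading of the given metric."""
    alert_at, critical_at = THRESHOLDS[metric]
    if value > critical_at:
        return "Critical"
    if value > alert_at:
        return "Alert"
    return "Normal"


print(classify("cpu_percent", 85))   # Alert
print(classify("cpu_percent", 97))   # Critical
print(classify("response_ms", 250))  # Normal
```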

Checklist: First Alert Setup#

  • Created alert rule
  • Set realistic threshold
  • Chose appropriate severity
  • Added clear name
  • Set up notification channel
  • Tested notification
  • Linked rule to channel
  • Enabled the rule
  • Documented the rule
  • Tested end-to-end

Do's and Don'ts#

Do ✅#

  • ✅ Create alerts for important things
  • ✅ Use clear, specific names
  • ✅ Test before relying on an alert
  • ✅ Respond quickly to alerts
  • ✅ Review and adjust regularly

Don't ❌#

  • ❌ Create too many rules
  • ❌ Use vague names
  • ❌ Skip testing
  • ❌ Ignore alerts
  • ❌ Let the system grow stale

Quick Definitions#

Alert Rule: Defines WHEN an alert should fire (the condition)

Notification Channel: Defines HOW you get notified (email, Slack, etc.)

Severity: How serious the alert is (Critical/Warning/Info)

Threshold: The value that triggers the alert (> 80%, < 10%, etc.)

Status: Current state of the alert (Firing/Acknowledged/Resolved)


Response Times#

Action | Time
Create rule | 2-3 min
Set up channel | 5-10 min
Acknowledge alert | 30 sec
Resolve alert | 30 sec
Test channel | 30 sec

Support & Links#

Need | Contact
Questions | [email protected]
Help on page | ? button
Documentation | Alerts Overview
Best Practices | Best Practices


Last Updated: January 2026

Need Help? [email protected]