
Alerts Quick Reference

Quick answers to common alert questions.


I Want To...

Create an Alert Rule

Alerts → Alert Rules → New Rule

Choose what to monitor (CPU, errors, etc.)

Set threshold (> 80%, < 10%, etc.)

Choose severity (Critical/Warning/Info)

Name it clearly

Save and Enable

Time: 2-3 minutes
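As a rough mental model, the choices made in the steps above (what to monitor, threshold, severity, name, enabled) can be sketched as a small data structure. This is an illustrative sketch only: the `AlertRule` class, its field names, and the metric name `cpu_percent` are assumptions for this example, not the product's actual schema or API.

```python
from dataclasses import dataclass
import operator

# Hypothetical model of an alert rule; names are illustrative, not a product API.
COMPARATORS = {">": operator.gt, "<": operator.lt}


@dataclass
class AlertRule:
    name: str          # clear, specific name
    metric: str        # what to monitor, e.g. "cpu_percent" (assumed name)
    comparator: str    # ">" or "<"
    threshold: float   # e.g. 80.0 for "> 80%"
    severity: str      # "Critical", "Warning", or "Info"
    enabled: bool = False

    def should_fire(self, value: float) -> bool:
        """True only when the rule is enabled and the value crosses the threshold."""
        return self.enabled and COMPARATORS[self.comparator](value, self.threshold)


rule = AlertRule("API servers: CPU above 80%", "cpu_percent", ">", 80.0, "Warning")
rule.enabled = True  # corresponds to the "Save and Enable" step

print(rule.should_fire(91.5))  # True
print(rule.should_fire(42.0))  # False
```

A rule that is saved but not enabled never fires in this model, which mirrors the first check under "Alert Won't Fire".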


Get Alert Notifications

Alerts → Alert Config → Add Channel

Choose type (Email/Slack/PagerDuty/Webhook)

Enter your details

Test the channel

Link to your alert rule

Time: 5-10 minutes


Respond to a Firing Alert

Get notification

Click link or go to SRE → Alerts

Find the alert with Firing status

Click Acknowledge

Investigate and fix the issue

Click Resolve

Time: Varies with issue complexity
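The response steps above move an alert through three statuses: Firing, Acknowledged, Resolved. A minimal sketch of that lifecycle follows. It assumes an alert must be acknowledged before it can be resolved; the source does not say whether the product enforces this, and the `advance` function is hypothetical.

```python
# Hypothetical status lifecycle. The strict ordering is an assumption
# made for this sketch, not documented product behavior.
ALLOWED = {
    "Firing": {"Acknowledged"},
    "Acknowledged": {"Resolved"},
    "Resolved": set(),
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current} to {target}")
    return target


status = "Firing"
status = advance(status, "Acknowledged")  # someone clicks Acknowledge
status = advance(status, "Resolved")      # issue is fixed, someone clicks Resolve
print(status)  # Resolved
```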


Find a Specific Alert

Use the filters:

Status: Firing / Acknowledged / Resolved
Severity: Critical / Warning / Info

Or scroll through the list at SRE → Alerts
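Conceptually, the filters narrow the alert list by status, by severity, or by both. The sketch below shows that logic on made-up data; the `Alert` class and `filter_alerts` function are hypothetical and only illustrate how combined filters behave.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    name: str
    status: str    # "Firing", "Acknowledged", or "Resolved"
    severity: str  # e.g. "Critical"


def filter_alerts(alerts: List[Alert],
                  status: Optional[str] = None,
                  severity: Optional[str] = None) -> List[Alert]:
    """Keep alerts matching every filter that is set; None means 'any'."""
    return [
        a for a in alerts
        if (status is None or a.status == status)
        and (severity is None or a.severity == severity)
    ]


# Made-up sample data for illustration.
alerts = [
    Alert("Service Down", "Firing", "Critical"),
    Alert("CPU High", "Acknowledged", "Warning"),
    Alert("Low Disk Space", "Resolved", "Critical"),
]

firing_critical = filter_alerts(alerts, status="Firing", severity="Critical")
print([a.name for a in firing_critical])  # ['Service Down']
```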


Fix Too Many Alerts

Step 1: Find the noisy rule

Alerts → Alert Rules → Look for the rule that triggers most often

Step 2: Increase threshold

Example: Change from CPU > 70% to CPU > 85%

Step 3: Test and adjust as needed
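Before changing a threshold, it can help to estimate how many alerts the new value would have produced on recent data. The sketch below does that for the 70% and 85% thresholds from the example. The sample values are invented, and `count_alerts` is a hypothetical helper that assumes every sample above the threshold produces one alert.

```python
def count_alerts(samples, threshold):
    """Count samples strictly above the threshold (one alert per sample assumed)."""
    return sum(1 for value in samples if value > threshold)


# Invented CPU readings (percent), for illustration only.
cpu_samples = [62, 71, 74, 88, 69, 73, 91, 77, 68, 72]

print(count_alerts(cpu_samples, 70))  # 7
print(count_alerts(cpu_samples, 85))  # 2
```

If the higher threshold still catches the readings that mattered (here 88 and 91), the change reduces noise without hiding real problems.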


Test if Notifications Work

Alerts → Alert Config

Find your channel

Click "Test"

Check if you received it

Disable Alerts Temporarily

Alerts → Alert Rules

Find the rule

Toggle Enabled OFF

(Remember to turn it back ON later!)

Quick Reference Tables

Alert Statuses

| Status | Icon | Color | Meaning | Action |
| --- | --- | --- | --- | --- |
| Firing | 🔔 | Red | Active alert, needs attention | Acknowledge & Fix |
| Acknowledged | ⏱️ | Yellow | Someone investigating | Wait or help |
| Resolved | | Green | Issue is fixed | None |

Severity Levels

| Severity | Icon | When | Response Time |
| --- | --- | --- | --- |
| Critical | 🔴 | Immediate action needed | Minutes |
| Warning | 🟠 | Important, needs attention | 1 hour |
| Info | 🟡 | FYI information | As time allows |

Notification Channels

| Channel | Speed | Best For | Setup Time |
| --- | --- | --- | --- |
| Email | Slow | Documentation | 1 min |
| Slack | Fast | Team visibility | 2 min |
| PagerDuty | Immediate | Critical alerts | 5 min |
| Webhook | Depends | Custom integration | 10+ min |
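One way to apply the table above is to route each severity to the channels that suit it. The mapping below is an assumed example policy, not a product default: Critical goes to PagerDuty and Slack, Warning to Slack, Info to Email. `ROUTES` and `channels_for` are hypothetical names.

```python
# Assumed example routing policy derived from the "Best For" column;
# adjust to your own team's needs.
ROUTES = {
    "Critical": ["PagerDuty", "Slack"],  # immediate, plus team visibility
    "Warning": ["Slack"],                # fast, visible to the team
    "Info": ["Email"],                   # slow is fine, kept as a record
}


def channels_for(severity: str) -> list:
    """Return channels for a severity, falling back to Email if unknown."""
    return ROUTES.get(severity, ["Email"])


print(channels_for("Critical"))  # ['PagerDuty', 'Slack']
print(channels_for("Info"))      # ['Email']
```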

Common Alert Examples

Website/API

| Alert | Threshold | Severity |
| --- | --- | --- |
| Service Down | HTTP 503 | Critical |
| Slow Response | > 5 seconds | Warning |
| High Error Rate | > 5% | Warning |
| CPU High | > 80% | Warning |

Database

| Alert | Threshold | Severity |
| --- | --- | --- |
| Service Down | Not responding | Critical |
| CPU High | > 80% | Warning |
| Memory High | > 90% | Warning |
| Low Disk Space | < 10% free | Critical |

Infrastructure

| Alert | Threshold | Severity |
| --- | --- | --- |
| Server Down | Unreachable | Critical |
| Disk Full | < 5% free | Critical |
| High Latency | > 100ms | Warning |
| Network Error | > 1% loss | Warning |

Troubleshooting

Alert Won't Fire

Check:

  1. Is rule Enabled? (toggle should be ON)
  2. Is threshold correct? (Test with extreme value)
  3. Is metric actually hitting threshold?

Fix:

  • Enable the rule
  • Lower threshold temporarily to test
  • Verify correct metric is selected

Not Getting Notifications

Check:

  1. Click "Test" on your channel
  2. Did you receive test?
  3. Check email spam folder
  4. Check Slack channel settings
  5. Verify email/phone is correct

Fix:

  • Resend test
  • Check spam folder
  • Verify email/channel settings
  • Check connection/credentials

Too Many Alerts

Solutions:

  1. Increase threshold (less sensitive)
  2. Disable low-priority rules
  3. Use digest instead of single emails
  4. Combine related alerts

Example:

Before: CPU > 70% = 50 alerts/day
After: CPU > 85% = 5 alerts/day
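Solution 3 above (a digest instead of single emails) amounts to collecting alerts over a period and sending one summary. The sketch below shows the grouping step only. `build_digest` is a hypothetical helper, and how the product actually formats or schedules digests is not described in this reference.

```python
from collections import Counter


def build_digest(fired_rule_names):
    """Summarize fired alerts as one line per rule, most frequent first."""
    counts = Counter(fired_rule_names)
    lines = [f"{name}: fired {n} time(s)" for name, n in counts.most_common()]
    return "\n".join(lines)


# Invented list of rule names that fired during one digest period.
fired = ["CPU High", "Slow Response", "CPU High", "CPU High"]
print(build_digest(fired))
```

Four separate notifications become a single two-line message, and the noisiest rule appears first, which also helps with finding the rule to tune.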

False Alarms

Why: Threshold too sensitive

Fix:

  1. Review when it fired
  2. Increase threshold
  3. Make condition more specific
  4. Or delete if not important
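One common way to make a condition more specific is to require the threshold to be exceeded for several consecutive readings, so a single spike does not fire the alert. This is a general technique shown as a sketch; whether the product offers such an option is not stated here, and `sustained_breach` is a hypothetical function.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """True if at least min_consecutive readings in a row exceed the threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0  # reset the run on any normal reading
        if run >= min_consecutive:
            return True
    return False


# Invented CPU readings (percent).
print(sustained_breach([50, 95, 60, 55], threshold=80, min_consecutive=3))  # False: one spike
print(sustained_breach([85, 90, 92, 70], threshold=80, min_consecutive=3))  # True: three in a row
```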

Duplicate Alerts

Problem: Same issue creates multiple alerts

Fix:

  1. Delete duplicate rules
  2. Keep only one rule per metric
  3. Or adjust thresholds so only one fires
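To find candidates for cleanup, it can help to group rules by the metric they watch and flag any metric covered by more than one rule. The sketch below assumes you have the rule list as (name, metric) pairs, for example copied from the Alert Rules page; `find_duplicate_rules` and the metric names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_rules(rules):
    """Map each metric watched by more than one rule to those rule names.

    rules: iterable of (rule_name, metric) pairs.
    """
    by_metric = defaultdict(list)
    for name, metric in rules:
        by_metric[metric].append(name)
    return {metric: names for metric, names in by_metric.items() if len(names) > 1}


# Invented rule list for illustration.
rules = [
    ("CPU High", "cpu_percent"),
    ("CPU Warning (old)", "cpu_percent"),
    ("Memory High", "memory_percent"),
]
print(find_duplicate_rules(rules))
# {'cpu_percent': ['CPU High', 'CPU Warning (old)']}
```

Two rules on the same metric are not always a mistake (for example, a Warning at 80% and a Critical at 95%), so treat the result as a review list rather than a delete list.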

Time-Saving Tips

Fastest to Respond

Get Slack notification (2 sec)

Click link (1 sec)

Acknowledge (5 sec)

Investigate (5-30 min)

Resolve (5 sec)
Total: < 1 minute to acknowledge

Fastest to Create Rule

New Rule → Pick template → Set threshold
→ Name it → Save → Enable
Total: ~1 minute

Fastest to Test Channel

Alert Config → Find channel → Click Test
→ Receive notification
Total: ~30 seconds

Key Locations

| Need | Path |
| --- | --- |
| Create rule | Alerts → Alert Rules |
| Configure notifications | Alerts → Alert Config |
| View alerts | SRE → Alerts |
| Help | ? button on page |

Common Mistakes

| Mistake | Impact | Fix |
| --- | --- | --- |
| Too many alerts | Alert fatigue | Increase thresholds |
| Vague names | Confusion | Use specific names |
| No testing | Alerts don't work | Always test |
| Never adjusting | Gets worse over time | Review monthly |
| Not documenting | Team confusion | Document each rule |

Best Threshold Examples

CPU Usage

Normal: 20-50%
Peak: 60-70%
Alert: 80% ← Not too sensitive
Critical: 95% ← Only if really bad

Memory Usage

Normal: 30-60%
Peak: 70-80%
Alert: 90% ← Getting concerning
Critical: 95% ← Critically high

Response Time

Normal: 100-300ms
Slow: 500-1000ms
Alert: 2000ms ← Warning level
Critical: 5000ms ← Very slow

Error Rate

Normal: 0-0.5%
Alert: 1% ← One in hundred
Critical: 5% ← One in twenty
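The Alert and Critical values listed above can be read as two cut-offs per metric. The sketch below classifies a reading against them. It assumes strict greater-than comparisons, matching the "> 80%" style used elsewhere in this reference; the metric keys and the `classify` function are hypothetical.

```python
# (alert_above, critical_above) per metric, copied from the examples above.
# The dictionary keys are assumed names for this sketch.
BANDS = {
    "cpu_percent": (80, 95),
    "memory_percent": (90, 95),
    "response_ms": (2000, 5000),
    "error_rate_percent": (1, 5),
}


def classify(metric: str, value: float) -> str:
    """Return 'Critical', 'Alert', or 'OK' for a reading of the given metric."""
    alert_above, critical_above = BANDS[metric]
    if value > critical_above:
        return "Critical"
    if value > alert_above:
        return "Alert"
    return "OK"


print(classify("cpu_percent", 85))          # Alert
print(classify("response_ms", 6000))        # Critical
print(classify("error_rate_percent", 0.3))  # OK
```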

Checklist: First Alert Setup

  • Created alert rule
  • Set realistic threshold
  • Chose appropriate severity
  • Added clear name
  • Set up notification channel
  • Tested notification
  • Linked rule to channel
  • Enabled the rule
  • Documented the rule
  • Tested end-to-end

Do's and Don'ts

Do ✅

  • ✅ Create alerts for important things
  • ✅ Use clear, specific names
  • ✅ Test before relying on an alert
  • ✅ Respond quickly to alerts
  • ✅ Review and adjust regularly

Don't ❌

  • ❌ Create too many rules
  • ❌ Use vague names
  • ❌ Skip testing
  • ❌ Ignore alerts
  • ❌ Let system grow stale

Quick Definitions

Alert Rule: Defines WHEN an alert should fire (the condition)

Notification Channel: Defines HOW you get notified (email, Slack, etc.)

Severity: How serious the alert is (Critical/Warning/Info)

Threshold: The value that triggers the alert (> 80%, < 10%, etc.)

Status: Current state of the alert (Firing/Acknowledged/Resolved)


Response Times

| Action | Time |
| --- | --- |
| Create rule | 2-3 min |
| Setup channel | 5-10 min |
| Acknowledge alert | 30 sec |
| Resolve alert | 30 sec |
| Test channel | 30 sec |

| Need | Contact |
| --- | --- |
| Questions | [email protected] |
| Help on page | ? button |
| Documentation | Alerts Overview |
| Best Practices | Best Practices |


Last Updated: January 2026

Need Help? [email protected]