Creating Alert Rules - Step-by-Step Guide | Nife Deploy
Alert rules are the foundation of your monitoring system. A rule defines WHEN an alert should fire.
What is an Alert Rule?
An alert rule is a condition you set up that automatically watches your system. When the condition becomes true, an alert fires.
Simple Example:
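A minimal Python sketch of the idea: a rule pairs a metric with a threshold, and the alert fires when the condition holds for the current reading. The `AlertRule` class, its fields and the `cpu_usage_percent` metric key are hypothetical illustrations, not Nife Deploy's actual rule format or API.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical model of an alert rule (not Nife Deploy's real format)."""
    name: str
    metric: str       # illustrative metric key, e.g. "cpu_usage_percent"
    threshold: float  # the rule fires when the metric goes above this value

    def should_fire(self, value: float) -> bool:
        # The alert fires as soon as the condition "value > threshold" is true.
        return value > self.threshold


rule = AlertRule(name="Web Server - CPU Usage High",
                 metric="cpu_usage_percent", threshold=80.0)
print(rule.should_fire(72.5))  # False: below the threshold, no alert
print(rule.should_fire(91.0))  # True: above the threshold, the alert fires
```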
Step-by-Step: Create Your First Alert Rule
Step 1: Navigate to Alert Rules
- Go to Alerts in the main navigation menu
- Click the Alert Rules tab
- Click the New Rule button
Step 2: Choose What to Monitor
Select what metric or condition you want to watch.
Infrastructure Metrics:
- CPU usage (%)
- Memory usage (%)
- Disk space remaining (%)
- Network traffic (bytes/sec)
Application Metrics:
- Error rate (%)
- Response time (ms)
- Request count (per second)
- Success rate (%)
Service Status:
- Service is up/down
- Endpoint responding (yes/no)
- Health check status
Step 3: Set the Threshold
Define the exact value that triggers the alert:
Examples:
- CPU usage > 80%
- Memory > 90%
- Response time > 5000 ms (5 seconds)
- Error rate > 5%
- Disk space < 10% remaining
- Failed requests ≥ 10
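Each threshold above is a comparison between the current reading and a fixed value. The sketch below, assuming readings arrive as plain numbers, shows how the operators map to a single check; `threshold_breached` and the `COMPARATORS` table are illustrative names, not part of Nife Deploy.

```python
import operator

# Map the comparison symbols used in the examples to Python functions.
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,  # written as "≥" in the examples
    "<=": operator.le,
}


def threshold_breached(value: float, op: str, threshold: float) -> bool:
    """Return True when `value <op> threshold` holds, i.e. the alert should fire."""
    try:
        compare = COMPARATORS[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None
    return compare(value, threshold)


# Each tuple mirrors one example threshold: (current value, operator, threshold).
checks = [
    (85.0, ">", 80.0),      # CPU usage > 80%
    (4200.0, ">", 5000.0),  # Response time > 5000 ms
    (7.5, "<", 10.0),       # Disk space < 10% remaining
    (10, ">=", 10),         # Failed requests ≥ 10
]
for value, op, threshold in checks:
    print(value, op, threshold, "->", threshold_breached(value, op, threshold))
```

The operator encodes direction: disk space remaining alerts when it falls below its threshold, while CPU, memory, response time and error rate alert when they rise above theirs.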
Tips for setting good thresholds:
- Start with a conservative value (for metrics that alert above a limit, a higher threshold means fewer alerts)
- You can always adjust it later
- Consider your system's normal patterns
- Think about which value actually requires someone to act
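One way to act on "consider your system's normal patterns" is to look at readings from a period you consider healthy and set the threshold a little above the high end of that range. The sketch below uses the 95th percentile plus a fixed margin; the function name, the 95th-percentile choice and the default margin of 10 points are assumptions for illustration, and the approach only suits metrics where higher is worse (CPU, memory, response time, error rate).

```python
import statistics


def suggest_threshold(history: list[float], margin: float = 10.0,
                      ceiling: float = 100.0) -> float:
    """Suggest a starting threshold from readings taken during normal operation.

    Takes the 95th percentile of `history`, adds `margin`, and caps the result
    at `ceiling` (100 suits percentage metrics such as CPU or memory usage).
    """
    if len(history) < 2:
        raise ValueError("need at least two readings to estimate a baseline")
    p95 = statistics.quantiles(history, n=100)[94]  # 95th of the 99 cut points
    return min(p95 + margin, ceiling)


# Hypothetical CPU readings (%) from a quiet week.
normal_cpu = [35, 42, 38, 51, 47, 60, 44, 39, 55, 58, 41, 49]
print(suggest_threshold(normal_cpu))
```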
Step 4: Name Your Rule
Give it a clear, specific name that describes the problem:
Good Names:
- "Production API - Response Time Over 5 Seconds"
- "Database Server - Memory Usage High"
- "Web Application - High Error Rate"
- "Backup Job - Failed"
Bad Names:
- "Alert 1"
- "CPU"
- "Problem"
Pro Tip: Name it so that someone new to your team immediately understands what it monitors.
Step 5: Choose Severity Level
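If your team wants to enforce this, a small check can catch names like the bad examples. The sketch below assumes the "System - Condition" pattern that all four good names share; `check_rule_name` is a hypothetical helper, not a Nife Deploy feature.

```python
def check_rule_name(name: str) -> list[str]:
    """Return problems found in a rule name; an empty list means it looks fine.

    Assumes the "<system> - <condition>" pattern used by the good names above.
    """
    problems = []
    system, sep, condition = name.partition(" - ")
    if not sep:
        problems.append('no " - " separating the system from the condition')
    elif not system.strip() or not condition.strip():
        problems.append("the system or the condition is empty")
    if len(name.split()) < 3:
        problems.append("too short for a new teammate to understand")
    return problems


for candidate in ["Database Server - Memory Usage High", "CPU"]:
    print(candidate, "->", check_rule_name(candidate) or "OK")
```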
Select how serious this problem is:
Critical 🔴
- Use when: The system is down or a critical function is broken
- Examples: App unreachable, database offline, data loss risk
- Response time: Immediate (within minutes)
Warning 🟠
- Use when: A problem is impacting users or approaching critical levels
- Examples: High response times, approaching resource limits, elevated error rate
- Response time: Soon (within 1 hour)
Info 🟡
- Use when: Informational only, no urgent action needed
- Examples: Deployment completed, scheduled maintenance finished
- Response time: As time allows (no rush)
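The "Use when" guidance above amounts to a short decision rule. The sketch below encodes it; the `Severity` enum and `choose_severity` function are hypothetical, and the three yes/no inputs simplify judgment calls you would normally make by hand.

```python
from enum import Enum


class Severity(Enum):
    # Each value restates the expected response time listed above.
    CRITICAL = "immediate (within minutes)"
    WARNING = "soon (within 1 hour)"
    INFO = "as time allows"


def choose_severity(system_down_or_broken: bool, users_impacted: bool,
                    approaching_limit: bool) -> Severity:
    """Pick a severity following the 'Use when' descriptions above."""
    if system_down_or_broken:
        return Severity.CRITICAL  # e.g. app unreachable, database offline
    if users_impacted or approaching_limit:
        return Severity.WARNING   # e.g. slow responses, resources running low
    return Severity.INFO          # e.g. deployment completed


print(choose_severity(False, True, False).name)  # WARNING
```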
Step 6: Add Description (Optional but Recommended)
Explain what this alert means and what to do about it:
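A useful description answers three questions: what the alert means, what usually causes it, and how to fix it (the same points listed under "Document Your Rules" later in this guide). The helper below is a hypothetical sketch for producing descriptions in a consistent layout; the headings and the sample text are illustrative only.

```python
def build_description(meaning: str, causes: list[str], fix_steps: list[str]) -> str:
    """Format a rule description: what it means, typical causes, steps to fix."""
    lines = [f"What it means: {meaning}", "Typical causes:"]
    lines += [f"  - {cause}" for cause in causes]
    lines.append("Steps to fix:")
    lines += [f"  {n}. {step}" for n, step in enumerate(fix_steps, start=1)]
    return "\n".join(lines)


print(build_description(
    "Free disk space on the server is below 10%.",
    ["Log files growing without rotation"],
    ["Remove or archive old logs", "Increase the disk size if needed"],
))
```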
Step 7: Save and Enable
- Click Save Rule
- The rule is created and appears in your rules list
- Toggle the Enabled switch to ON
- The rule is now monitoring
Done! Your first alert rule is now active.
Common Alert Rule Examples
Example 1: Website Down
Example 2: High CPU Usage
Example 3: Low Disk Space
Example 4: Slow API Response
Example 5: High Error Rate
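The five examples can be summarised as rule definitions that reuse the thresholds and severity guidance from the steps above. This is a hypothetical sketch: the `RuleSpec` fields, metric keys and rule names are invented for illustration, and in Nife Deploy you enter these values through the New Rule form rather than in code.

```python
import operator
from dataclasses import dataclass

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


@dataclass(frozen=True)
class RuleSpec:
    name: str
    metric: str       # hypothetical metric key
    op: str           # one of the keys in OPS
    threshold: float
    severity: str


EXAMPLE_RULES = [
    # service_up is assumed to report 1 when the site responds and 0 when it does not.
    RuleSpec("Website - Down", "service_up", "==", 0, "critical"),
    RuleSpec("Web Server - CPU Usage High", "cpu_percent", ">", 80, "warning"),
    RuleSpec("Web Server - Disk Space Low", "disk_free_percent", "<", 10, "warning"),
    RuleSpec("Production API - Response Time Over 5 Seconds",
             "response_time_ms", ">", 5000, "warning"),
    RuleSpec("Web Application - High Error Rate", "error_rate_percent", ">", 5, "warning"),
]


def firing_rules(rules: list[RuleSpec], readings: dict[str, float]) -> list[str]:
    """Names of rules whose condition holds; rules without a reading are skipped."""
    return [r.name for r in rules
            if r.metric in readings and OPS[r.op](readings[r.metric], r.threshold)]


current = {"service_up": 1, "cpu_percent": 92, "disk_free_percent": 35,
           "response_time_ms": 6200, "error_rate_percent": 1.2}
print(firing_rules(EXAMPLE_RULES, current))
```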
Managing Your Rules
View All Rules
Go to Alerts → Alert Rules to see all your rules. For each rule, the list shows:
- Rule name
- Whether it's enabled
- Current status
- Last triggered time
Enable/Disable a Rule
When to disable:
- Rule is creating too many false alerts
- You're doing maintenance
- Rule is outdated
How:
- Find the rule in the list
- Toggle the Enabled switch
- OFF = disabled, ON = enabled
Edit a Rule
Change a rule's condition, threshold, or name:
- Find the rule
- Click Edit or the rule name
- Change the settings
- Click Save
The updated rule applies immediately.
Delete a Rule
If you no longer need a rule:
- Find the rule
- Click Delete
- Confirm deletion
Note: Deleted rules cannot be recovered.
Best Practices for Alert Rules
1. Start Simple
- Begin with 2-3 critical rules
- Add more as you get comfortable
- Don't try to monitor everything at once
2. Use Clear Names
- Someone new should understand immediately
- Include the metric and what triggers it
- Use consistent naming across your rules
3. Set Realistic Thresholds
- Base them on your normal operations
- Not so sensitive that you get alert fatigue (for example, 100+ alerts per day)
- Not so loose that you miss real problems
4. Document Your Rules
- Explain what the rule monitors
- Note what typically causes it
- Include steps to fix the issue
5. Review Regularly
- Check which rules fire most often
- Identify false alarms
- Adjust thresholds based on patterns
- Remove rules that aren't useful
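If you keep a record of which alerts turned out to be real issues, a quick tally shows which rules are noisiest. The sketch below assumes a simple list of `(rule name, was it a real issue)` pairs; how you collect that history is up to you, and `review_rules` is a hypothetical name.

```python
from collections import Counter


def review_rules(events: list[tuple[str, bool]]) -> list[tuple[str, int, float]]:
    """Return (rule name, times fired, share of false alarms), most frequent first.

    Each event is a (rule_name, was_real_issue) pair from your alert history.
    """
    fired = Counter(name for name, _ in events)
    false_alarms = Counter(name for name, real in events if not real)
    return [(name, count, false_alarms[name] / count)
            for name, count in fired.most_common()]


# One hypothetical day matching the "20 alerts, only 2 real" scenario.
history = (
    [("Web Server - CPU Usage High", False)] * 18
    + [("Web Server - CPU Usage High", True)] * 2
    + [("Backup Job - Failed", True)]
)
for name, count, false_share in review_rules(history):
    print(f"{name}: fired {count} time(s), {false_share:.0%} false alarms")
```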
6. Test Before Relying on Rules
- Create a test rule with a very low threshold
- Verify it fires when it should
- Check that you receive notifications
- Delete the test rule when done
Adjusting Your Rules
Rule Triggers Too Often
Problem: You're getting 20 alerts per day but only 2 are real issues.
Solution - Increase Threshold:
Original:
Updated:
Now it only alerts when truly critical.
Rule Never Triggers
Problem: You never get the alert, even when the problem exists.
Solution - Lower Threshold:
Original:
Updated:
Now it catches the problem earlier.
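Before moving a threshold in either direction, it can help to replay recent readings against the old and new values and see how many alerts each would have produced. Since an alert fires when its condition becomes true, the sketch counts only the moments a reading crosses above the threshold, not every high reading. `count_alerts` and the sample readings are hypothetical, and the sketch does not claim Nife Deploy groups repeated breaches the same way.

```python
def count_alerts(readings: list[float], threshold: float) -> int:
    """Count how often the condition "reading > threshold" switches from false to true.

    A run of consecutive high readings counts as a single alert.
    """
    alerts = 0
    was_breached = False
    for value in readings:
        breached = value > threshold
        if breached and not was_breached:
            alerts += 1
        was_breached = breached
    return alerts


# Hypothetical CPU readings (%) sampled over a day.
cpu = [50, 85, 88, 60, 82, 95, 97, 70, 91]
for threshold in (80, 90, 99):
    print(f"threshold {threshold}%: {count_alerts(cpu, threshold)} alerts")
```

If a candidate threshold produces no alerts even over a period you know had problems, it is too loose; if it produces many alerts on quiet days, it is too tight.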
Rule No Longer Needed
Solution:
- Disable: Keeps it but turns it off (easy to re-enable)
- Delete: Removes it completely (good if truly not needed)
When to Create an Alert Rule
Create a rule for:
- ✅ Production issues (outages, errors)
- ✅ Performance degradation (slowness)
- ✅ Resource exhaustion (disk full, memory high)
- ✅ Security events
- ✅ Important business metrics
Don't create a rule for:
- ❌ Normal/expected variations
- ❌ Things you can't act on
- ❌ Low priority "nice to know" info
- ❌ Duplicate alerts for the same issue
Alert Rule Templates
Use these templates to create rules for common scenarios:
API/Web Application:
- Response Time High (> 5 seconds)
- Error Rate High (> 5%)
- Service Down (status code 503)
Database:
- CPU Usage High (> 80%)
- Memory Usage High (> 90%)
- Connection Pool Exhausted
Infrastructure:
- Disk Space Low (< 10%)
- Network Latency High (> 100ms)
- Server Unreachable
Scheduled Jobs:
- Job Failed (zero successful runs)
- Job Took Too Long (> expected time)
- Job Didn't Run
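The three scheduled-job templates check different things: whether the last run failed, whether it ran longer than expected, and whether it ran at all. The sketch below assumes you know when the job last started and finished, how often it should run and how long it normally takes; `JobRun` and `job_alerts` are hypothetical names, not Nife Deploy APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class JobRun:
    started: datetime
    finished: Optional[datetime] = None  # None while the job is still running
    succeeded: bool = False


def job_alerts(last_run: Optional[JobRun], now: datetime,
               interval: timedelta, max_duration: timedelta) -> list[str]:
    """Return the names of the scheduled-job templates that would fire right now."""
    # "Job Didn't Run": no run recorded, or the last one started over an interval ago.
    if last_run is None or now - last_run.started > interval:
        return ["Job Didn't Run"]
    alerts = []
    # "Job Took Too Long": measure to the finish time, or to now if still running.
    end = last_run.finished if last_run.finished is not None else now
    if end - last_run.started > max_duration:
        alerts.append("Job Took Too Long")
    # "Job Failed": the run finished without succeeding.
    if last_run.finished is not None and not last_run.succeeded:
        alerts.append("Job Failed")
    return alerts


now = datetime(2024, 1, 1, 12, 0)
run = JobRun(started=datetime(2024, 1, 1, 11, 30), finished=datetime(2024, 1, 1, 11, 50))
print(job_alerts(run, now, interval=timedelta(hours=1), max_duration=timedelta(minutes=10)))
```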
Next Steps
Now that you've created alert rules:
- Configure Notifications - Set up how you're notified
- Respond to Alerts - Handle active alerts
- Alert Management - More advanced topics
Getting Help
Questions about creating rules?
- Check the examples above
- Click the ? icon on the Alerts page
- Contact support: [email protected]
Rule isn't working?
- Make sure it's Enabled
- Check that the threshold is set correctly
- Verify that you're monitoring the right metric
- Test with a temporary rule first