Creating Alert Rules - Step-by-Step Guide | Nife Deploy

Alert rules are the foundation of your monitoring system. A rule defines WHEN an alert should fire.

What is an Alert Rule?#

An alert rule is a condition you set up that automatically watches your system. When the condition becomes true, an alert fires.

Simple Example:

Rule: "Alert me if CPU usage goes above 80%"
When: CPU usage rises above 80%
Then: Fire an alert and notify me
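
As a rough illustration, a rule like this amounts to comparing a metric reading against a threshold. The sketch below is not how Nife Deploy evaluates rules internally; the function name `cpu_rule_fires` and its parameters are hypothetical.

```python
def cpu_rule_fires(cpu_percent: float, threshold: float = 80.0) -> bool:
    """Return True when CPU usage is strictly above the threshold."""
    return cpu_percent > threshold


# This sketch treats "above 80%" as a strict comparison,
# so a reading of exactly 80.0 does not fire.
for reading in (42.0, 80.0, 91.5):
    print(reading, cpu_rule_fires(reading))
```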

Step-by-Step: Create Your First Alert Rule#

Step 1: Navigate to Alert Rules#

Go to Alerts in the main navigation menu
Click the Alert Rules tab
Click the New Rule button

Step 2: Choose What to Monitor#

Select what metric or condition you want to watch.

Infrastructure Metrics:

CPU usage (%)
Memory usage (%)
Disk space remaining (%)
Network traffic (bytes/sec)

Application Metrics:

Error rate (%)
Response time (ms)
Request count (per second)
Success rate (%)

Service Status:

Service is up/down
Endpoint responding (yes/no)
Health check status

Step 3: Set the Threshold#

Define the exact value that triggers the alert:

Examples:

CPU usage > 80%
Memory > 90%
Response time > 5000 ms (5 seconds)
Error rate > 5%
Disk space < 10% remaining
Failed requests ≥ 10
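
To make the comparison symbols concrete, here is a minimal sketch that maps each symbol to a Python comparison. The `COMPARATORS` table and `condition_met` function are illustrative names, not part of the product.

```python
import operator

# Comparison symbols used in the examples, mapped to Python functions.
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "≥": operator.ge,
    "≤": operator.le,
}


def condition_met(value: float, symbol: str, threshold: float) -> bool:
    """Check one reading against a threshold condition."""
    return COMPARATORS[symbol](value, threshold)


print(condition_met(92.0, ">", 90.0))  # memory at 92% against "> 90%"
print(condition_met(8.0, "<", 10.0))   # 8% disk remaining against "< 10%"
print(condition_met(10, "≥", 10))      # 10 failed requests against "≥ 10"
```

Note that ">" and "≥" differ only at the boundary: a response time of exactly 5000 ms does not satisfy "> 5000 ms".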

Tips for setting good thresholds:

Start with a conservative value (for a "greater than" rule, a higher threshold means fewer alerts)
You can always adjust it later
Consider your system's normal patterns
Think about what value actually needs action
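
One possible way to account for normal patterns is to derive a starting threshold from past readings. The sketch below assumes you have exported some historical values yourself; the 95th percentile and the 10% headroom are arbitrary illustrative choices, and `suggest_threshold` is a hypothetical helper.

```python
import statistics


def suggest_threshold(samples: list[float], headroom: float = 1.1) -> float:
    """Suggest a '>' threshold slightly above the 95th percentile of past readings."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[-1]
    return round(p95 * headroom, 1)


# Hypothetical CPU readings (%) collected during a normal week.
normal_cpu = [35, 41, 38, 52, 47, 60, 44, 39, 58, 63, 49, 55]
print(suggest_threshold(normal_cpu))
```

Treat the result as a starting point and adjust it once you see how often the rule actually fires.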

Step 4: Name Your Rule#

Give it a clear, specific name that describes the problem:

Good Names:

"Production API - Response Time Over 5 Seconds"
"Database Server - Memory Usage High"
"Web Application - High Error Rate"
"Backup Job - Failed"

Bad Names:

"Alert 1"
"CPU"
"Problem"

Pro Tip: Name it so someone new to your team understands immediately what it monitors.

Step 5: Choose Severity Level#

Select how serious this problem is:

Critical 🔴

Use when: The system is down or a critical function is broken
Examples: App unreachable, database offline, data loss risk
Expected response: Immediate (within minutes)

Warning 🟠

Use when: A problem is impacting users or approaching critical
Examples: High response times, approaching resource limits, elevated error rate
Expected response: Soon (within 1 hour)

Info 🟡

Use when: Informational only, no urgent action needed
Examples: Deployment completed, scheduled maintenance finished
Expected response: As time allows (no rush)
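
The three levels can be thought of as a small decision procedure. The following sketch encodes the guidance above; the `Severity` enum and the `pick_severity` helper are hypothetical and only approximate the judgment involved.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    INFO = "info"


# Expected response for each level, as described in this guide.
RESPONSE_EXPECTATION = {
    Severity.CRITICAL: "Immediate (within minutes)",
    Severity.WARNING: "Soon (within 1 hour)",
    Severity.INFO: "As time allows (no rush)",
}


def pick_severity(system_down: bool, users_impacted: bool) -> Severity:
    """Hypothetical helper: choose a level from two yes/no questions."""
    if system_down:
        return Severity.CRITICAL
    if users_impacted:
        return Severity.WARNING
    return Severity.INFO


level = pick_severity(system_down=False, users_impacted=True)
print(level.name, "->", RESPONSE_EXPECTATION[level])
```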

Step 6: Add Description (Optional but Recommended)#

Explain what this alert means and what to do about it:

"Fires when API response time exceeds 5 seconds. 
Usually caused by:
- Database slowness
- High traffic

To fix:
1. Check database performance
2. Review application logs
3. Scale if needed"

Step 7: Save and Enable#

Click Save Rule
The rule is created and appears in your rules list
Toggle the Enabled switch to ON
The rule is now monitoring your system

Done! Your first alert rule is now active.
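
To summarize what Steps 2 through 7 collect, the sketch below models a rule as a small data structure. The field names, the `AlertRule` class, and the default of `enabled=False` are assumptions made for illustration; they do not describe Nife Deploy's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str               # Step 4
    metric: str             # Step 2
    operator: str           # Step 3: ">" or "<"
    threshold: float        # Step 3
    severity: str           # Step 5
    description: str = ""   # Step 6 (optional)
    enabled: bool = False   # Step 7 (assumed off until toggled)

    def fires(self, value: float) -> bool:
        """A disabled rule never fires; otherwise compare against the threshold."""
        if not self.enabled:
            return False
        if self.operator == ">":
            return value > self.threshold
        if self.operator == "<":
            return value < self.threshold
        raise ValueError(f"unsupported operator: {self.operator}")


rule = AlertRule(
    name="Production API - Response Time Over 5 Seconds",
    metric="response_time_ms",
    operator=">",
    threshold=5000,
    severity="warning",
    description="Fires when API response time exceeds 5 seconds.",
)
print(rule.fires(6200))  # rule saved but not yet enabled
rule.enabled = True
print(rule.fires(6200))  # rule enabled
```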

Common Alert Rule Examples#

Example 1: Website Down#

What to Monitor: HTTP Status Code
Condition: = 503 or = 0 (0 typically means no response was received)
Severity: Critical
Name: "Website Is Down"
Description: "Service is not responding to requests"

Example 2: High CPU Usage#

What to Monitor: CPU Usage
Condition: > 80%
Severity: Warning
Name: "Server CPU Too High"
Description: "CPU usage is above 80% and performance may be degrading"

Example 3: Low Disk Space#

What to Monitor: Disk Space Free
Condition: < 10%
Severity: Critical
Name: "Disk Space Running Out"
Description: "Less than 10% disk space remaining"

Example 4: Slow API Response#

What to Monitor: API Response Time
Condition: > 5000 ms
Severity: Warning
Name: "API Response Time High"
Description: "API is responding slower than acceptable"

Example 5: High Error Rate#

What to Monitor: Error Rate
Condition: > 5%
Severity: Warning
Name: "Application Error Rate High"
Description: "More than 5% of requests are failing"
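
The five examples can be expressed as data and checked against a single snapshot of metrics. This is a sketch only: the metric keys and sample values below are hypothetical, and the lambdas simply restate each example's condition.

```python
# Each tuple: (rule name, metric key, check function, severity).
# Metric keys and sample values are hypothetical.
EXAMPLE_RULES = [
    ("Website Is Down", "http_status", lambda v: v in (503, 0), "Critical"),
    ("Server CPU Too High", "cpu_percent", lambda v: v > 80, "Warning"),
    ("Disk Space Running Out", "disk_free_percent", lambda v: v < 10, "Critical"),
    ("API Response Time High", "response_time_ms", lambda v: v > 5000, "Warning"),
    ("Application Error Rate High", "error_rate_percent", lambda v: v > 5, "Warning"),
]


def firing_rules(snapshot: dict) -> list[tuple[str, str]]:
    """Return (name, severity) for every example rule whose condition is true."""
    return [
        (name, severity)
        for name, key, check, severity in EXAMPLE_RULES
        if key in snapshot and check(snapshot[key])
    ]


snapshot = {
    "http_status": 200,
    "cpu_percent": 87.0,
    "disk_free_percent": 7.5,
    "response_time_ms": 1200,
    "error_rate_percent": 0.4,
}
for name, severity in firing_rules(snapshot):
    print(f"[{severity}] {name}")
```

With this snapshot, only the CPU and disk space rules have conditions that are true.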

Managing Your Rules#

View All Rules#

Go to Alerts → Alert Rules tab to see all your rules:

Rule name
Whether it's enabled
Current status
Last triggered time

Enable/Disable a Rule#

When to disable:

The rule is generating too many false alarms
You're doing maintenance
Rule is outdated

How:

Find the rule in the list
Toggle the Enabled switch
OFF = disabled, ON = enabled

Edit a Rule#

Change a rule's condition, threshold, or name:

Find the rule
Click Edit or the rule name
Change the settings
Click Save

The updated rule applies immediately.

Delete a Rule#

If you no longer need a rule:

Find the rule
Click Delete
Confirm deletion

Note: Deleted rules cannot be recovered.

Best Practices for Alert Rules#

1. Start Simple#

Begin with 2-3 critical rules
Add more as you get comfortable
Don't try to monitor everything at once

2. Use Clear Names#

Someone new should understand the rule immediately
Include the metric and what triggers it
Use consistent naming across your rules

3. Set Realistic Thresholds#

Based on your normal operations
Not so sensitive that you get alert fatigue (for example, 100+ alerts per day)
Not so loose you miss real problems

4. Document Your Rules#

Explain what the rule monitors
Note what typically causes it
Include steps to fix the issue

5. Review Regularly#

Check which rules fire most often
Identify false alarms
Adjust thresholds based on patterns
Remove rules that aren't useful
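
A regular review can be as simple as tallying how often each rule fired and how many of those alerts were false alarms. The sketch below assumes you keep your own record of alerts and whether each was a real problem; the `history` data and `review` function are hypothetical.

```python
from collections import Counter

# Hypothetical alert history: (rule name, was it a real problem?)
history = [
    ("Server CPU Too High", False),
    ("Server CPU Too High", False),
    ("Server CPU Too High", True),
    ("Disk Space Running Out", True),
    ("Server CPU Too High", False),
    ("API Response Time High", True),
]


def review(history):
    """Return {rule name: (total fires, false alarms)}, most frequent first."""
    fired = Counter(name for name, _ in history)
    false_alarms = Counter(name for name, real in history if not real)
    return {name: (count, false_alarms[name]) for name, count in fired.most_common()}


for name, (total, false_count) in review(history).items():
    print(f"{name}: fired {total}x, {false_count} false alarm(s)")
```

A rule that fires often and is mostly false alarms is a candidate for a threshold adjustment.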

6. Test Before Relying#

Create a test rule with a very low threshold
Verify it fires when it should
Check that you receive notifications
Delete the test rule when done

Adjusting Your Rules#

Rule Triggers Too Often#

Problem: You're getting 20 alerts per day but only 2 are real issues.

Solution - Increase Threshold:

Original:

CPU > 70%

Updated:

CPU > 85%

Now the rule alerts only when CPU usage is truly high.
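
Before changing a threshold, it can help to replay past readings against both values and compare how many alerts each would have produced. The readings below are made up for illustration, and `count_alerts` is a hypothetical helper.

```python
def count_alerts(readings: list[float], threshold: float) -> int:
    """Count how many readings would have fired a '>' rule."""
    return sum(1 for value in readings if value > threshold)


# Hypothetical CPU readings (%) from one day.
cpu_readings = [62, 71, 74, 68, 88, 73, 79, 91, 72, 66, 76, 83]

print("CPU > 70%:", count_alerts(cpu_readings, 70))
print("CPU > 85%:", count_alerts(cpu_readings, 85))
```

For this sample data, raising the threshold from 70% to 85% cuts the count from 9 readings to 2.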

Rule Never Triggers#

Problem: You never get the alert, even when the problem exists.

Solution - Lower Threshold:

Original:

CPU > 95%

Updated:

CPU > 80%

Now it catches the problem earlier.
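
One way to confirm that a threshold is out of reach is to compare it with the highest value seen during a known problem. This is a sketch with made-up readings; `peak_gap` is a hypothetical helper.

```python
def peak_gap(readings: list[float], threshold: float) -> float:
    """Return how far the highest reading fell short of the threshold.

    A positive result means no reading reached the threshold,
    so a '>' rule could not have fired.
    """
    return threshold - max(readings)


# Hypothetical CPU readings (%) recorded during a known slowdown.
incident_cpu = [78, 84, 89, 92, 87]

print(peak_gap(incident_cpu, 95))  # positive: the rule stayed silent
print(peak_gap(incident_cpu, 80))  # negative: the rule would have fired
```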

Rule No Longer Needed#

Solution:

Disable: Keeps it but turns it off (easy to re-enable)
Delete: Removes it completely (good if truly not needed)

When to Create an Alert Rule#

Create a rule for:

✅ Production issues (outages, errors)
✅ Performance degradation (slowness)
✅ Resource exhaustion (disk full, memory high)
✅ Security events
✅ Important business metrics

Don't create a rule for:

❌ Normal/expected variations
❌ Things you can't act on
❌ Low priority "nice to know" info
❌ Duplicate alerts for the same issue

Alert Rule Templates#

Use these templates to create rules for common scenarios:

API/Web Application:

Response Time High (> 5 seconds)
Error Rate High (> 5%)
Service Down (status code 503)

Database:

CPU Usage High (> 80%)
Memory Usage High (> 90%)
Connection Pool Exhausted

Infrastructure:

Disk Space Low (< 10%)
Network Latency High (> 100 ms)
Server Unreachable

Scheduled Jobs:

Job Failed (0 successful runs)
Job Took Too Long (> expected time)
Job Didn't Run
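
The three scheduled-job templates check different things: whether the job ran at all, whether it succeeded, and how long it took. The sketch below shows one way to tell them apart; the `job_alerts` function, its parameters, and the sample times are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def job_alerts(last_start: Optional[datetime], duration: Optional[timedelta],
               succeeded: bool, now: datetime,
               max_age: timedelta, max_duration: timedelta) -> list[str]:
    """Return which of the three scheduled-job templates would fire."""
    # No recent start means the job did not run; the other checks do not apply.
    if last_start is None or now - last_start > max_age:
        return ["Job Didn't Run"]
    alerts = []
    if not succeeded:
        alerts.append("Job Failed")
    if duration is not None and duration > max_duration:
        alerts.append("Job Took Too Long")
    return alerts


now = datetime(2024, 1, 2, 6, 0)
print(job_alerts(
    last_start=datetime(2024, 1, 2, 2, 0),
    duration=timedelta(minutes=95),
    succeeded=True,
    now=now,
    max_age=timedelta(hours=24),
    max_duration=timedelta(hours=1),
))
```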

Next Steps#

Now that you've created alert rules:

Configure Notifications - Set up how you're notified
Respond to Alerts - Handle active alerts
Alert Management - More advanced topics

Getting Help#

Questions about creating rules?

Check the examples above
Click the ? icon on the Alerts page
Contact support: [email protected]

Rule isn't working?

Make sure it's Enabled
Check the threshold is set correctly
Verify the metric you're monitoring
Test with a temporary rule first