Alert Management & Response | Incident Monitoring Guide | Nife Docs
Learn about the alerts on your dashboard and how to respond to them.
What are Alerts?#
Alerts are notifications about issues, warnings, or important events in your infrastructure. They help you:
- Detect problems - Before they impact users
- Respond quickly - To critical issues
- Monitor health - Of your entire system
- Track changes - Important events and updates
Alert Types#
Critical Alerts 🔴#
Immediate action required
- Service down or unavailable
- Data loss risk
- Security issue
- Resource exhaustion
Response time: Immediately (within minutes)
Warning Alerts 🟡#
Attention needed soon
- High resource usage
- Performance degradation
- Configuration issues
- Approaching limits
Response time: Within hours
Info Alerts ⚪#
Informational only
- Successful deployment
- Maintenance completed
- Configuration changes
- Routine information
Response time: None required (for reference only)
Alert Severity Levels#
| Severity | Icon | Color | Meaning | Action |
|---|---|---|---|---|
| Critical | 🔴 | Red | Urgent issue | Immediate |
| Warning | 🟡 | Orange | Needs attention | Soon |
| Info | ⚪ | Blue | FYI | Reference |
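If alerts are processed by your own scripts, the levels in the table can be mirrored as an ordered type so they can be compared and sorted. This is a minimal sketch; `Severity`, `EXPECTED_ACTION`, and `most_urgent` are hypothetical names for illustration, not part of any Nife API.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table; a higher value means more urgent."""
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Expected action for each level, copied from the "Action" column of the table.
EXPECTED_ACTION = {
    Severity.CRITICAL: "Immediate",
    Severity.WARNING: "Soon",
    Severity.INFO: "Reference",
}


def most_urgent(severities):
    """Return the most urgent severity in a non-empty iterable."""
    return max(severities)
```

Using an ordered enum means a mixed batch of alerts can be triaged with a plain `max()` or `sorted()` call.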
Reading Alert Messages#
Each alert shows:
- Alert title - What the issue is
- Severity - How urgent it is
- Timestamp - When it occurred
- Details - More information about the issue
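The four fields listed above map naturally onto a small record type. The sketch below assumes a hypothetical `Alert` class and `summarize` helper for use in your own tooling; the field names follow the list, but the layout is not a documented Nife format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Alert:
    title: str           # what the issue is
    severity: str        # "critical", "warning", or "info"
    timestamp: datetime  # when it occurred
    details: str = ""    # more information about the issue


def summarize(alert):
    """Format an alert as a single line for logs or chat messages."""
    return f"[{alert.severity.upper()}] {alert.timestamp:%Y-%m-%d %H:%M} {alert.title}"


example = Alert(
    title="High CPU usage: API server 85% utilization",
    severity="warning",
    timestamp=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),  # illustrative time
)
print(summarize(example))
# prints: [WARNING] 2024-01-15 09:30 High CPU usage: API server 85% utilization
```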
Example Alert Messages#
Critical Alert: "Application down: Payment Service unavailable for 5 minutes"
- Severity: Critical
- Action: Investigate immediately
- Next step: Check app status, restart if needed
Warning Alert: "High CPU usage: API server 85% utilization"
- Severity: Warning
- Action: Monitor or scale up
- Next step: Check performance, increase resources
Info Alert: "Deployment successful: New version of website deployed"
- Severity: Info
- Action: None required
- Next step: Monitor for issues
Viewing Alerts#
On Dashboard#
- Find the Active Alerts section
- See up to 5 recent alerts
- Click an alert for more details
Full Alert List#
- Click View All in the alerts section
- Or navigate to Monitoring → Alerts
- See complete alert history
- Filter and search alerts
Responding to Alerts#
Critical Alert Response#
- Read the alert - Understand the issue
- Assess impact - How does this affect users?
- Take action:
  - Restart service
  - Scale up resources
  - Rollback deployment
  - Contact support
- Verify fix - Confirm the issue is resolved
- Document - Note what happened and how you fixed it
Warning Alert Response#
- Investigate - Understand the cause
- Monitor - Watch the situation
- Take action if needed:
  - Optimize performance
  - Increase resources
  - Fix configuration
- Prevent recurrence - Plan a long-term solution
Info Alert Response#
- Review - Note the information
- Archive - Mark as read if needed
- No action usually required
Common Alert Scenarios#
Scenario: High CPU Usage Alert#
Alert: "CPU usage: 95% on app server"
Actions:
- Check what's using CPU
- Optimize code if possible
- Increase instance size
- Add more instances
- Monitor improvement
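To decide how urgently a CPU reading needs attention, it can help to classify it against explicit thresholds. The sketch below is illustrative only: `classify_cpu` is a hypothetical helper, and the default thresholds (80% for warning, 95% for critical) are assumptions, not Nife defaults.

```python
def classify_cpu(percent, warning_at=80.0, critical_at=95.0):
    """Map a CPU utilization percentage to 'ok', 'warning', or 'critical'.

    The thresholds are assumed example values; tune them to your workload.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"CPU percentage out of range: {percent}")
    if percent >= critical_at:
        return "critical"
    if percent >= warning_at:
        return "warning"
    return "ok"


print(classify_cpu(95))  # prints: critical
print(classify_cpu(85))  # prints: warning
```

Checking the same reading again after scaling or optimizing gives a simple way to confirm the improvement.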
Scenario: Deployment Failed Alert#
Alert: "Deployment failed: Image pull error"
Actions:
- Check Docker image registry
- Verify credentials
- Check image availability
- Retry deployment
- Investigate root cause
Scenario: Database Connection Alert#
Alert: "Database connections: 450/500 limit"
Actions:
- Check database query efficiency
- Add connection pooling
- Increase connection limit
- Optimize queries
- Monitor usage
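Connection pooling, one of the actions above, caps the number of open connections by reusing a fixed set instead of opening a new one per request. The sketch below is a generic illustration: `ConnectionPool` is a hypothetical class and `factory` is any callable that opens a connection. In practice, prefer the pool that ships with your database driver or framework.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Bounded pool: exactly `size` connections are created and then reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty after `timeout`.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)
```

With a pool of size 20 per application instance, 10 instances can never hold more than 200 connections in total, which keeps usage predictable against a fixed limit such as the 500 in the alert above.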
Scenario: Service Down Alert#
Alert: "Application unavailable: API Service"
Actions:
- Check service status immediately
- Review recent changes
- Check logs for errors
- Restart service if safe
- Rollback if necessary
- Contact support if needed
Alert Management#
Marking Alerts as Read#
- In the alerts section, click Mark all as read
- Or click an individual alert to mark it as read
- Read alerts stay visible but are marked as read
Viewing Alert History#
- Go to Monitoring → Alerts
- See all alerts (new and old)
- Filter by severity
- Filter by date range
- Search by keyword
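The same three filters can be applied offline if you keep your own copy of alert records. This sketch assumes each alert is a dict with `severity`, `timestamp`, and `title` keys; that layout and the `filter_alerts` helper are hypothetical, not a documented export format.

```python
from datetime import datetime


def filter_alerts(alerts, severity=None, start=None, end=None, keyword=None):
    """Return alerts matching every filter that is not None."""
    matches = []
    for alert in alerts:
        if severity is not None and alert["severity"] != severity:
            continue
        if start is not None and alert["timestamp"] < start:
            continue
        if end is not None and alert["timestamp"] > end:
            continue
        # Keyword search is case-insensitive and matches on the title only.
        if keyword is not None and keyword.lower() not in alert["title"].lower():
            continue
        matches.append(alert)
    return matches


history = [
    {"severity": "critical", "timestamp": datetime(2024, 1, 10), "title": "Application unavailable: API Service"},
    {"severity": "warning", "timestamp": datetime(2024, 1, 12), "title": "High CPU usage: API server 85% utilization"},
]
print(len(filter_alerts(history, severity="warning", keyword="cpu")))  # prints: 1
```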
Setting Alert Rules#
Create custom alerts for:
- Specific resources
- Threshold values
- Application errors
- Performance metrics
See Alert Configuration for details.
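Conceptually, a custom rule ties a resource and a metric to a threshold and a severity. The sketch below shows that idea only; `AlertRule`, `evaluate`, and the resource and metric names are hypothetical and do not reflect the actual rule format, which is described in Alert Configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    resource: str     # e.g. an application or server name
    metric: str       # e.g. "cpu_percent"
    threshold: float  # the rule fires when the reading is >= threshold
    severity: str     # "critical", "warning", or "info"


def evaluate(rules, readings):
    """Return the rules that fire.

    `readings` maps (resource, metric) to the latest value; rules with no
    matching reading are skipped rather than treated as failures.
    """
    fired = []
    for rule in rules:
        value = readings.get((rule.resource, rule.metric))
        if value is not None and value >= rule.threshold:
            fired.append(rule)
    return fired


rules = [
    AlertRule("api-server", "cpu_percent", 80.0, "warning"),
    AlertRule("api-server", "error_rate", 0.05, "critical"),
]
fired = evaluate(rules, {("api-server", "cpu_percent"): 85.0})
print([rule.metric for rule in fired])  # prints: ['cpu_percent']
```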
Best Practices for Alert Management#
✅ Act quickly on critical alerts - Don't delay
✅ Read the full message - Understand context
✅ Document responses - Keep records
✅ Set up notifications - Get alerted via email or Slack
✅ Review alert history - Identify patterns
✅ Adjust thresholds - Reduce false alarms
✅ Team communication - Notify team of issues
✅ Escalate if needed - Contact support for help
Preventing Alerts#
Proactive Monitoring#
- Regular checks - Review metrics daily
- Capacity planning - Don't run near limits
- Code optimization - Reduce resource usage
- Health checks - Ensure services are responding
- Load testing - Test before high-traffic events
Configuration#
- Set reasonable thresholds - Not too sensitive
- Right-size resources - Match actual needs
- Plan growth - Scale before hitting limits
- Automate scaling - Use auto-scaling rules
- Redundancy - Have backups for critical services
Alert Notifications#
Email Notifications#
- Receive critical alerts via email
- Immediate delivery for urgent issues
- Digest emails for less urgent alerts
Slack Notifications#
- Real-time alerts in Slack
- Integrate with your workflow
- Team visibility
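The built-in Slack integration is configured in the dashboard. If you also want to post alerts from your own scripts, Slack incoming webhooks accept a JSON body with a `text` field. The sketch below uses only the Python standard library; the function names are hypothetical and the webhook URL is a placeholder you would replace with one generated in your Slack workspace.

```python
import json
import urllib.request


def build_slack_request(webhook_url, title, severity):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = {"text": f"[{severity.upper()}] {title}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_to_slack(webhook_url, title, severity, timeout=10):
    """Send the alert; returns True when Slack answers with HTTP 200."""
    request = build_slack_request(webhook_url, title, severity)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200


# Placeholder URL: replace with your own webhook before calling send_to_slack.
WEBHOOK_URL = "https://hooks.slack.com/services/REPLACE/WITH/YOURS"
```

Keeping request construction separate from sending makes the message format easy to check without touching the network.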
Configure Notifications#
- Go to Settings → Notifications
- Choose notification method
- Select alert types to receive
- Set notification schedule
Troubleshooting#
Not Receiving Alerts?#
- Check notification settings
- Verify email address
- Check Slack workspace connection
- Look in spam/junk folder
- Contact support if still not working
Getting Too Many Alerts?#
- Adjust threshold values
- Remove false alarm rules
- Group related alerts
- Filter less important severities
- Set quiet hours if available
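Grouping related alerts often comes down to suppressing repeats of the same alert within a time window. This is a minimal sketch of that idea for custom tooling; `suppress_repeats`, the `(timestamp, key)` layout, and the 10-minute window are assumptions, not platform behavior.

```python
from datetime import datetime, timedelta


def suppress_repeats(alerts, window=timedelta(minutes=10)):
    """Drop alerts repeating the same key within `window` of the last one kept.

    `alerts` is an iterable of (timestamp, key) pairs sorted by timestamp.
    """
    last_kept = {}
    kept = []
    for timestamp, key in alerts:
        previous = last_kept.get(key)
        if previous is None or timestamp - previous >= window:
            kept.append((timestamp, key))
            last_kept[key] = timestamp
    return kept


start = datetime(2024, 1, 15, 9, 0)  # illustrative start time
incoming = [
    (start, "cpu"),
    (start + timedelta(minutes=1), "db"),
    (start + timedelta(minutes=5), "cpu"),   # suppressed: 5 minutes after the last "cpu"
    (start + timedelta(minutes=12), "cpu"),  # kept: 12 minutes after the last kept "cpu"
]
print(len(suppress_repeats(incoming)))  # prints: 3
```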
Alert Seems Wrong?#
- Verify the data it's based on
- Check system status independently
- Investigate recent changes
- Consider updating the threshold
- Report to support if it's a bug