Alert Management & Response | Incident Monitoring Guide | Nife Docs

Learn about the alerts on your dashboard and how to respond to them.

What are Alerts?#

Alerts are notifications about issues, warnings, or important events in your infrastructure. They help you:

  • Detect problems - Before they impact users
  • Respond quickly - To critical issues
  • Monitor health - Of your entire system
  • Track changes - Important events and updates

Alert Types#

Critical Alerts 🔴#

Immediate action required

  • Service down or unavailable
  • Data loss risk
  • Security issue
  • Resource exhaustion

Response time: Immediately (within minutes)

Warning Alerts 🟡#

Attention needed soon

  • High resource usage
  • Performance degradation
  • Configuration issues
  • Approaching limits

Response time: Within hours

Info Alerts ⚪#

Informational only

  • Successful deployment
  • Maintenance completed
  • Configuration changes
  • Routine information

Response time: None required; for reference only

Alert Severity Levels#

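| Severity | Icon | Color  | Meaning         | Action    |
|----------|------|--------|-----------------|-----------|
| Critical | 🔴   | Red    | Urgent issue    | Immediate |
| Warning  | 🟡   | Orange | Needs attention | Soon      |
| Info     | ⚪   | Blue   | FYI             | Reference |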

Reading Alert Messages#

Each alert shows:

  • Alert title - What the issue is
  • Severity - How urgent it is
  • Timestamp - When it occurred
  • Details - More information about the issue
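
As a rough illustration, the four fields above could be modeled like this. This is a minimal sketch, not Nife's actual data model: the `Alert` and `Severity` names, the `summary` helper, and the timestamp value are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Severity(Enum):
    """The three severity levels described in this guide."""
    CRITICAL = "critical"
    WARNING = "warning"
    INFO = "info"


@dataclass(frozen=True)
class Alert:
    """Hypothetical in-memory representation of one alert."""
    title: str           # what the issue is
    severity: Severity   # how urgent it is
    timestamp: datetime  # when it occurred
    details: str = ""    # more information about the issue

    def summary(self) -> str:
        """One-line rendering, e.g. for a log line or chat message."""
        when = self.timestamp.strftime("%Y-%m-%d %H:%M UTC")
        return f"[{self.severity.value.upper()}] {self.title} ({when})"


alert = Alert(
    title="Application down: Payment Service unavailable for 5 minutes",
    severity=Severity.CRITICAL,
    timestamp=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),  # made-up time
    details="Check app status, restart if needed",
)
print(alert.summary())
```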

Example Alert Messages#

Critical Alert: "Application down: Payment Service unavailable for 5 minutes"

  • Severity: Critical
  • Action: Investigate immediately
  • Next step: Check app status, restart if needed

Warning Alert: "High CPU usage: API server 85% utilization"

  • Severity: Warning
  • Action: Monitor or scale up
  • Next step: Check performance, increase resources

Info Alert: "Deployment successful: New version of website deployed"

  • Severity: Info
  • Action: None required
  • Next step: Monitor for issues

Viewing Alerts#

On Dashboard#

  1. Find the Active Alerts section
  2. See up to 5 recent alerts
  3. Click an alert for more details

Full Alert List#

  1. Click View All in the alerts section
  2. Or navigate to Monitoring → Alerts
  3. See complete alert history
  4. Filter and search alerts

Responding to Alerts#

Critical Alert Response#

  1. Read the alert - Understand the issue
  2. Assess impact - How does this affect users?
  3. Take action:
    • Restart service
    • Scale up resources
    • Rollback deployment
    • Contact support
  4. Verify fix - Confirm issue is resolved
  5. Document - Note what happened and how you fixed it

Warning Alert Response#

  1. Investigate - Understand the cause
  2. Monitor - Watch the situation
  3. Take action if needed:
    • Optimize performance
    • Increase resources
    • Fix configuration
  4. Prevent recurrence - Plan long-term solution

Info Alert Response#

  1. Review - Note the information
  2. Archive - Mark as read if needed
  3. No action usually required

Common Alert Scenarios#

Scenario: High CPU Usage Alert#

Alert: "CPU usage: 95% on app server"

Actions:

  1. Check what's using CPU
  2. Optimize code if possible
  3. Increase instance size
  4. Add more instances
  5. Monitor improvement

Scenario: Deployment Failed Alert#

Alert: "Deployment failed: Image pull error"

Actions:

  1. Check Docker image registry
  2. Verify credentials
  3. Check image availability
  4. Retry deployment
  5. Investigate root cause
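
Once the registry, credentials, and image have been checked, the retry in step 4 can be wrapped in a small backoff loop so that a transient pull error does not need manual re-triggering. The sketch below is generic Python, not a Nife API: `retry` and `fake_deploy` are hypothetical names, and `fake_deploy` stands in for whatever command or call actually starts your deployment.

```python
import time
from typing import Callable, List


def retry(action: Callable[[], bool], attempts: int = 3,
          base_delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call action until it returns True, doubling the wait after each failure."""
    for attempt in range(attempts):
        if action():
            return True
        if attempt < attempts - 1:  # no need to wait after the final attempt
            sleep(base_delay * 2 ** attempt)
    return False


# Hypothetical stand-in for a real deployment call: fails twice, then succeeds.
outcomes = iter([False, False, True])

def fake_deploy() -> bool:
    return next(outcomes)


delays: List[float] = []  # record waits instead of actually sleeping
succeeded = retry(fake_deploy, attempts=3, base_delay=1.0, sleep=delays.append)
print(succeeded, delays)
```

If every attempt fails, `retry` returns `False`, which is the cue to move on to step 5 rather than keep retrying.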

Scenario: Database Connection Alert#

Alert: "Database connections: 450/500 limit"

Actions:

  1. Check database query efficiency
  2. Add connection pooling
  3. Increase connection limit
  4. Optimize queries
  5. Monitor usage
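
To put the example alert in numbers: 450 of 500 connections is 90% of the limit, leaving 50 connections of headroom. A minimal sketch of that arithmetic follows (the `utilization` helper is a name chosen for this example):

```python
def utilization(current: int, limit: int) -> float:
    """Return usage as a percentage of the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * current / limit


used, limit = 450, 500  # values from the example alert
pct = utilization(used, limit)
headroom = limit - used
print(f"{pct:.0f}% used, {headroom} connections left")
```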

Scenario: Service Down Alert#

Alert: "Application unavailable: API Service"

Actions:

  1. Check service status immediately
  2. Review recent changes
  3. Check logs for errors
  4. Restart service if safe
  5. Rollback if necessary
  6. Contact support if needed

Alert Management#

Marking Alerts as Read#

  1. In the alerts section, click Mark all as read
  2. Or click an individual alert to mark it as read
  3. Read alerts stay visible but are marked as read

Viewing Alert History#

  1. Go to Monitoring → Alerts
  2. See all alerts (new and old)
  3. Filter by severity
  4. Filter by date range
  5. Search by keyword
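
If you export alert history and want to apply the same three filters offline, the logic might look like the sketch below. It assumes a simple hypothetical `Alert` record with `title`, `severity`, and `timestamp` fields; the sample timestamps are made up, and nothing here calls a Nife API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Alert:
    """Hypothetical exported alert record."""
    title: str
    severity: str  # "critical", "warning", or "info"
    timestamp: datetime


def filter_alerts(alerts: Iterable[Alert],
                  severity: Optional[str] = None,
                  start: Optional[datetime] = None,
                  end: Optional[datetime] = None,
                  keyword: Optional[str] = None) -> List[Alert]:
    """Keep alerts matching every filter that was supplied."""
    result = []
    for a in alerts:
        if severity is not None and a.severity != severity:
            continue
        if start is not None and a.timestamp < start:
            continue
        if end is not None and a.timestamp > end:
            continue
        if keyword is not None and keyword.lower() not in a.title.lower():
            continue
        result.append(a)
    return result


history = [
    Alert("Application unavailable: API Service", "critical",
          datetime(2024, 1, 10, 8, 0)),
    Alert("High CPU usage: API server 85% utilization", "warning",
          datetime(2024, 1, 12, 14, 30)),
    Alert("Deployment successful: New version of website deployed", "info",
          datetime(2024, 1, 15, 9, 0)),
]

# Alerts mentioning "api" on or after 11 January (case-insensitive match).
recent_api = filter_alerts(history, start=datetime(2024, 1, 11), keyword="api")
print([a.title for a in recent_api])
```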

Setting Alert Rules#

Create custom alerts for:

  • Specific resources
  • Threshold values
  • Application errors
  • Performance metrics
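
Conceptually, a threshold rule pairs a resource and a metric with the values at which it should fire. The sketch below shows that idea only; it is not the format used by the Alert Configuration page. The `ThresholdRule` class, the resource and metric names, and the 80/90 thresholds are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdRule:
    """Hypothetical rule: fire a warning or critical alert past a threshold."""
    resource: str
    metric: str
    warning_at: float
    critical_at: float

    def evaluate(self, value: float) -> Optional[str]:
        """Return the severity to raise for this reading, or None if healthy."""
        if value >= self.critical_at:
            return "critical"
        if value >= self.warning_at:
            return "warning"
        return None


# Illustrative thresholds; tune them to your own workload.
rule = ThresholdRule("api-server", "cpu_percent", warning_at=80.0, critical_at=90.0)

for reading in (40.0, 85.0, 95.0):
    print(reading, rule.evaluate(reading))
```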

See Alert Configuration for details.

Best Practices for Alert Management#

✓ Act quickly on critical alerts - Don't delay
✓ Read the full message - Understand context
✓ Document responses - Keep records
✓ Set up notifications - Get alerted via email or Slack
✓ Review alert history - Identify patterns
✓ Adjust thresholds - Reduce false alarms
✓ Team communication - Notify team of issues
✓ Escalate if needed - Contact support for help

Preventing Alerts#

Proactive Monitoring#

  1. Regular checks - Review metrics daily
  2. Capacity planning - Don't run near limits
  3. Code optimization - Reduce resource usage
  4. Health checks - Ensure services are responding
  5. Load testing - Test before high-traffic events

Configuration#

  1. Set reasonable thresholds - Not too sensitive
  2. Right-size resources - Match actual needs
  3. Plan growth - Scale before hitting limits
  4. Automate scaling - Use auto-scaling rules
  5. Redundancy - Have backups for critical services
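
One common way to make a threshold less sensitive is to require several consecutive readings above it before alerting, so that a single spike does not fire. The following is a minimal sketch of that technique under assumed values; whether your alert rules support a duration or sample-count setting depends on the platform, and `sustained_breach` is a name invented for this example.

```python
from typing import Iterable


def sustained_breach(samples: Iterable[float], threshold: float,
                     min_consecutive: int) -> bool:
    """True if at least min_consecutive readings in a row exceed threshold."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= min_consecutive:
            return True
    return False


# Made-up CPU readings (percent), one per sampling interval.
spiky = [40, 96, 42, 95, 41, 43]      # brief spikes only
sustained = [70, 91, 93, 95, 92, 60]  # stays high for several samples

print(sustained_breach(spiky, threshold=90, min_consecutive=3))
print(sustained_breach(sustained, threshold=90, min_consecutive=3))
```

With these readings the spiky series does not trigger, while the sustained one does.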

Alert Notifications#

Email Notifications#

  • Receive critical alerts via email
  • Immediate delivery for urgent issues
  • Digest emails for less urgent alerts

Slack Notifications#

  • Real-time alerts in Slack
  • Integrate with your workflow
  • Team visibility

Configure Notifications#

  1. Go to Settings → Notifications
  2. Choose notification method
  3. Select alert types to receive
  4. Set notification schedule
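
The three choices above (method, alert types, schedule) combine into a simple routing decision. The sketch below models that decision in plain Python to show how the settings interact; it is not how Nife stores or evaluates them. `ChannelConfig`, the quiet-hours window, and the rule that critical alerts bypass quiet hours are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ChannelConfig:
    """Hypothetical per-channel settings: which severities, and when."""
    severities: FrozenSet[str]
    quiet_hours: Optional[Tuple[time, time]] = None  # (start, end), may wrap midnight


def in_quiet_hours(now: time, window: Tuple[time, time]) -> bool:
    start, end = window
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight


def should_notify(cfg: ChannelConfig, severity: str, now: time) -> bool:
    if severity not in cfg.severities:
        return False
    if severity == "critical":
        return True  # assumption: critical alerts ignore quiet hours
    return cfg.quiet_hours is None or not in_quiet_hours(now, cfg.quiet_hours)


email = ChannelConfig(frozenset({"critical", "warning"}),
                      quiet_hours=(time(22, 0), time(7, 0)))
slack = ChannelConfig(frozenset({"critical", "warning", "info"}))

print(should_notify(email, "warning", time(23, 0)))   # inside quiet hours
print(should_notify(email, "critical", time(23, 0)))  # bypasses quiet hours
print(should_notify(slack, "info", time(3, 0)))       # no schedule set
```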

Troubleshooting#

Not Receiving Alerts?#

  1. Check notification settings
  2. Verify email address
  3. Check Slack workspace connection
  4. Look in spam/junk folder
  5. Contact support if still not working

Getting Too Many Alerts?#

  1. Adjust threshold values
  2. Remove false alarm rules
  3. Group related alerts
  4. Filter less important severities
  5. Set quiet hours if available
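
To see which rules generate the most noise before adjusting them, it can help to group alerts by type and count them. The sketch below assumes alert titles follow the "Type: detail" pattern used in this guide's examples and groups on the text before the first colon; the extra CPU readings are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def group_key(title: str) -> str:
    """Use the text before the first colon as the grouping key."""
    return title.split(":", 1)[0].strip()


def group_alerts(titles: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (group, count) pairs, most frequent first."""
    return Counter(group_key(t) for t in titles).most_common()


titles = [
    "High CPU usage: API server 85% utilization",
    "High CPU usage: API server 88% utilization",
    "High CPU usage: API server 91% utilization",
    "Deployment failed: Image pull error",
]

for group, count in group_alerts(titles):
    print(f"{count} x {group}")
```

A group that dominates the list is a good first candidate for a higher threshold or a consolidated rule.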

Alert Seems Wrong?#

  1. Verify the data it's based on
  2. Check system status independently
  3. Investigate recent changes
  4. Consider updating threshold
  5. Report to support if it's a bug
