Monitor VM Performance | Real-Time Metrics and Health Status Guide
Monitor your VM instances' performance, health status, and activity in real-time from the VMs Dashboard.
Overview#
The Monitoring section within the Instance Detail Panel provides comprehensive performance tracking and health monitoring for your VM instances.
Accessing Monitoring#
From VM Card#
Quick Access to Monitoring
- Locate the VM instance on the dashboard
- Scroll to the bottom action buttons
- Click the Monitoring Button (Activity icon)
- Monitoring dashboard opens
- Real-time metrics display immediately
Monitoring Dashboard#
Dashboard Components
- Real-time metric displays
- Historical trend graphs
- Performance alerts
- Activity log
- Health status indicators
- Recommendation suggestions
Real-Time Metrics#
Key Performance Indicators#
The monitoring dashboard displays live metrics updated every 1-5 minutes:
CPU Usage
- Current CPU utilization percentage (0-100%)
- Usage trend graph
- Sustained high usage indicates a bottleneck
- Normal range: 20-60% for typical workloads
Memory Usage
- RAM currently in use (GB and percentage)
- Available memory remaining
- Memory trend over time
- Usage above 85% may cause slowdowns
Disk Usage
- Storage space used vs. total
- Percentage utilization
- Free space available
- Alert if approaching 80%
Network Activity
- Data in (ingress) - Mbps
- Data out (egress) - Mbps
- Network trend graph
- Spike detection for anomalies
Understanding Metrics#
CPU Metrics
- Usage shows processor load
- Peaks indicate heavy computation
- Sustained high usage indicates a need for optimization
- Multi-core systems show per-core breakdown
Memory Metrics
- Usage shows RAM consumption
- Includes OS, applications, buffers
- A steadily rising trend may indicate a memory leak
- Available shows headroom for new processes
Disk Metrics
- Usage shows storage capacity consumption
- Includes OS files and application data
- Growing trend indicates data accumulation
- Alerting at 80-90% prevents out-of-disk errors
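As a rough illustration of the disk figures above, the following sketch computes the percentage of storage used from inside an instance and maps it to an alert level. The function names and the 80%/90% cut-offs are assumptions chosen to match the range mentioned above, not the dashboard's actual implementation.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of storage used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def disk_alert_level(percent_used: float,
                     warn: float = 80.0,
                     critical: float = 90.0) -> str:
    """Map a usage percentage to an alert level (thresholds are assumptions)."""
    if percent_used >= critical:
        return "critical"
    if percent_used >= warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    print(f"{pct:.1f}% used -> {disk_alert_level(pct)}")
```

The percentage computed this way can differ slightly from what tools such as `df` report, because some filesystems reserve blocks that are counted differently.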
Network Metrics
- Ingress: Data coming into instance
- Egress: Data leaving instance
- Spikes may indicate:
  - Large file transfers
  - Data download/upload
  - Network attack or compromise
  - High traffic periods
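The ingress and egress figures are shown in Mbps. If you only have raw byte counters (for example, from two readings taken some seconds apart), the conversion can be sketched as below. The function name and the idea of sampling byte counters are assumptions for illustration; the dashboard may derive its values differently.

```python
def throughput_mbps(bytes_start: int, bytes_end: int, seconds: float) -> float:
    """Convert a byte-counter delta over an interval into megabits per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    bits = (bytes_end - bytes_start) * 8  # 8 bits per byte
    return bits / seconds / 1_000_000     # 1 Mbps = 1,000,000 bits per second


if __name__ == "__main__":
    # 750 MB transferred in one minute
    print(throughput_mbps(0, 750_000_000, 60.0))
```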
Metric Graphs#
Viewing Historical Trends#
Time Range Selection
- 1 Hour: Recent behavior and current issues
- 24 Hours: Daily patterns and peak usage
- 7 Days: Weekly trends and recurring issues
- 30 Days: Long-term trends and capacity planning
- Custom Range: Specific date range analysis
Graph Features
- X-axis shows time
- Y-axis shows metric value
- Hover to see exact values
- Zoom in/out for detailed view
- Download graph as image
Analyzing Trends#
Identifying Patterns
- Regular spikes at specific times
- Steady growth over time
- Sudden changes or anomalies
- Correlation between metrics
Planning Decisions
- Capacity planning based on trends
- Identifying optimal performance windows
- Predicting when upgrades will be needed
- Detecting performance degradation
Performance Status#
Instance Health Status#
Status Indicators
| Status | Color | Meaning |
|---|---|---|
| Healthy | Green | All metrics normal, no issues |
| Warning | Yellow | One metric approaching threshold |
| Critical | Red | Metric exceeded critical threshold |
Health Components
- CPU usage level
- Memory utilization
- Disk space availability
- Network connectivity
- Service responsiveness
Status Summary#
Overall Health
- Quick visual indicator
- Summary of all components
- Recommendations if issues detected
- Actions to improve health
Performance Alerts#
Setting Up Alerts#
Available Alert Types
CPU Alerts
- Trigger when CPU exceeds threshold
- Default threshold: 80%
- Duration: Sustained for 5+ minutes
- Notification: Immediate
Memory Alerts
- Trigger when memory usage exceeds threshold
- Default threshold: 85%
- Duration: Sustained for 5+ minutes
- Notification: Immediate
Disk Alerts
- Trigger when disk usage exceeds threshold
- Default threshold: 80%
- Duration: Triggered immediately
- Notification: High priority
Network Alerts
- High traffic detected
- Packet loss detected
- Connection timeouts
- Unusual patterns
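The "sustained for 5+ minutes" condition used by the CPU and memory alerts can be sketched as follows. This is a minimal illustration, assuming samples arrive as `(timestamp_seconds, value)` pairs sorted by time; the function name and data shape are hypothetical and do not describe how the platform evaluates alerts internally.

```python
from typing import Sequence, Tuple


def sustained_breach(samples: Sequence[Tuple[float, float]],
                     threshold: float,
                     duration_s: float = 300.0) -> bool:
    """Return True if the metric has stayed above `threshold` for at least
    `duration_s` seconds, measured back from the most recent sample."""
    if not samples:
        return False
    latest_ts = samples[-1][0]
    breach_start = None
    # Walk backwards until the first sample that is not above the threshold.
    for ts, value in reversed(samples):
        if value <= threshold:
            break
        breach_start = ts
    if breach_start is None:
        return False
    return latest_ts - breach_start >= duration_s


if __name__ == "__main__":
    cpu = [(t, 90.0) for t in range(0, 301, 60)]  # one sample per minute
    print(sustained_breach(cpu, threshold=80.0))
```

A single sample at or below the threshold resets the window, which helps avoid alerts on short peaks.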
Creating Alert#
Step-by-Step
- Click "Create Alert" button in monitoring panel
- Select metric to monitor (CPU, Memory, Disk, Network)
- Set threshold value (percentage or GB)
- Choose duration (immediate or sustained)
- Select notification method:
  - Slack
  - SMS (premium)
  - Webhook
- Save alert
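If you keep alert definitions in version control alongside the settings entered in the panel, the choices from the steps above can be captured in a small validated structure. Everything in this sketch (the `AlertRule` class, its field names, and the JSON shape) is hypothetical and is not the platform's API; it only mirrors the options listed above.

```python
import json
from dataclasses import asdict, dataclass

ALLOWED_METRICS = {"cpu", "memory", "disk", "network"}
ALLOWED_CHANNELS = {"slack", "sms", "webhook"}
ALLOWED_UNITS = {"percent", "gb"}


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical record of one alert's settings."""
    metric: str
    threshold: float
    unit: str = "percent"   # "percent" or "gb"
    duration_s: int = 0     # 0 means immediate; >0 means sustained
    channel: str = "slack"

    def __post_init__(self) -> None:
        if self.metric not in ALLOWED_METRICS:
            raise ValueError(f"unknown metric: {self.metric}")
        if self.unit not in ALLOWED_UNITS:
            raise ValueError(f"unknown unit: {self.unit}")
        if self.channel not in ALLOWED_CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")
        if self.threshold <= 0:
            raise ValueError("threshold must be positive")
        if self.unit == "percent" and self.threshold > 100:
            raise ValueError("percentage threshold cannot exceed 100")
        if self.duration_s < 0:
            raise ValueError("duration_s cannot be negative")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    print(AlertRule(metric="cpu", threshold=80, duration_s=300).to_json())
```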
Managing Alerts#
Viewing Alerts
- List of active alerts
- Alert status and severity
- Last triggered time
- Alert history
Modifying Alerts
- Click alert in list
- Select "Edit"
- Change threshold or settings
- Click "Save Changes"
Disabling Alerts
- Click alert
- Select "Disable"
- Alert won't trigger
- Can be re-enabled later
Deleting Alerts
- Click alert
- Select "Delete"
- Confirm deletion
- Alert is permanently removed
Activity Log#
Recent Activity#
Activity Types
- Instance created/deleted
- Configuration changed
- Instance started/stopped
- Snapshots created
- Network changes
- Security updates
- User logins
- Failed operations
Activity Details
- Action performed
- Timestamp
- User who performed action
- Status (success/failure)
- Details of change
Viewing Activities#
Activity List
- Scroll to Activity section in detail panel
- Shows 5-10 most recent activities
- Click "View All" to see complete history
- Filter activities by type
- Search by user or action
Exporting Activity Log#
Export Options
- Click "Export" button
- Choose format:
  - CSV for spreadsheets
  - JSON for integration
- Select date range
- File downloads
Use Cases
- Audit trail for compliance
- Troubleshooting recent changes
- Understanding infrastructure changes
- Team activity tracking
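For audit or troubleshooting work, an exported CSV can be filtered with a few lines of code. The sketch below assumes the export has `timestamp`, `user`, `action`, and `status` columns, based on the activity details listed earlier; the real column names may differ, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, List

# Made-up sample rows; column names are an assumption about the export.
SAMPLE_CSV = """timestamp,user,action,status
2024-01-01T10:00:00Z,alice,instance_started,success
2024-01-01T10:05:00Z,bob,configuration_changed,failure
2024-01-01T10:09:00Z,alice,snapshot_created,success
"""


def load_activity(csv_text: str) -> List[Dict[str, str]]:
    """Parse exported CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def failed_operations(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rows whose status is 'failure' (case-insensitive)."""
    return [r for r in rows if r["status"].strip().lower() == "failure"]


def actions_per_user(rows: List[Dict[str, str]]) -> Counter:
    """Count how many logged actions each user performed."""
    return Counter(r["user"] for r in rows)


if __name__ == "__main__":
    rows = load_activity(SAMPLE_CSV)
    print(failed_operations(rows))
    print(actions_per_user(rows))
```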
Performance Optimization#
When to Optimize#
Signs Your Instance Needs Optimization
High CPU Usage
- Sustained >80%
- Affecting application performance
- Slowing response times
High Memory Usage
- Sustained >85%
- Causing slowdowns
- Application crashes
Low Disk Space
- >80% full
- Application warnings
- Risk of failures
High Network Usage
- Unexpected peaks
- Data transfer bottleneck
- Cost implications
Optimization Strategies#
CPU Optimization
- Identify CPU-consuming processes
- Optimize application code
- Reduce running services
- Scale to more instances
- Upgrade to higher CPU instance type
Memory Optimization
- Check for memory leaks
- Optimize application memory use
- Reduce cache sizes
- Restart services periodically
- Increase instance memory allocation
Disk Optimization
- Clean up temporary files
- Implement log rotation
- Archive old data
- Remove unused packages
- Expand disk capacity
Network Optimization
- Optimize application payload
- Compress data transfer
- Use regional endpoints
- Implement caching
- Reduce unnecessary traffic
Predictive Metrics#
Forecasting#
Trend Prediction
- Graph shows projected usage
- Based on historical patterns
- Helps predict when limits will be reached
- Plan upgrades proactively
Capacity Planning
- Growth rate analysis
- Time until capacity reached
- Recommended upgrade timing
- Cost impact of upgrades
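One simple way to estimate the time until capacity is reached is a straight-line fit over recent usage. The sketch below uses an ordinary least-squares line; this is an assumption made for illustration, and the forecasting method the dashboard actually uses is not documented here.

```python
from typing import Optional, Sequence, Tuple


def days_until_limit(samples: Sequence[Tuple[float, float]],
                     limit: float = 100.0) -> Optional[float]:
    """Estimate days from the last sample until usage reaches `limit`.

    `samples` holds (day, percent_used) pairs. Returns None when there is
    too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None  # all samples share the same day
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx  # growth in percentage points per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, day_at_limit - last_day)


if __name__ == "__main__":
    disk = [(0, 50.0), (1, 52.0), (2, 54.0)]
    print(days_until_limit(disk, limit=80.0))
```

A linear fit suits steady growth; workloads with seasonal or bursty patterns need a longer history and a more careful model.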
Benchmarking#
Comparing Performance#
Baseline Metrics
- Normal operation values
- Average usage patterns
- Peak usage times
- Minimum resources needed
Performance Comparison
- Current vs. historical
- Same time last week/month
- Before/after optimization
- Between instances
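A baseline and a before/after comparison can be computed from exported metric values with the standard library. This is a minimal sketch; the function names and the choice of average, peak, and minimum as the baseline summary are assumptions.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Summarize a series of metric readings as a simple baseline."""
    if not values:
        raise ValueError("values must not be empty")
    return {"average": mean(values), "peak": max(values), "minimum": min(values)}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    last_week = summarize([40.0, 50.0, 60.0])
    this_week = summarize([30.0, 40.0, 50.0])
    print(percent_change(last_week["average"], this_week["average"]))
```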
Export and Reporting#
Export Metrics#
Export Options
- Click "Export" button
- Choose format:
  - CSV for Excel/Sheets
  - JSON for integration
  - PDF for reports
- Select date range
- File downloads immediately
Using Exported Data
- Import into analysis tools
- Create custom reports
- Share with team members
- Archive for compliance
- Integration with monitoring systems
Report Generation#
Creating Reports
- Select date range
- Choose metrics to include
- Add custom notes
- Generate PDF report
- Share or print
Report Contents
- Metric summaries
- Trend graphs
- Alert history
- Recommendations
- Capacity analysis
Alerting Best Practices#
Setting Appropriate Thresholds#
CPU Alerts
- Production: 70% for critical apps
- Development: 85% for general use
- Start with default, adjust based on experience
Memory Alerts
- Production: 75% to allow headroom
- Development: 85% is acceptable
- Monitor trend, not just threshold
Disk Alerts
- Warning: 70% (watch closely)
- Critical: 80% (take action)
- Emergency: 90% (immediate action)
Network Alerts
- Baseline: Establish normal usage first
- Spike: 2-3x normal usage
- Sustained: High for 5+ minutes
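The baseline-then-spike approach for network alerts can be sketched as below: take the median of recent readings as "normal" and flag a reading that reaches a multiple of it. Using the median as the baseline and a default factor of 2 are assumptions made for illustration.

```python
from statistics import median
from typing import Sequence


def is_spike(history: Sequence[float], current: float, factor: float = 2.0) -> bool:
    """Return True if `current` is at least `factor` times the median of `history`."""
    if not history:
        return False  # no baseline established yet
    baseline = median(history)
    return baseline > 0 and current >= factor * baseline


if __name__ == "__main__":
    normal_mbps = [100.0, 110.0, 90.0, 105.0]
    print(is_spike(normal_mbps, 250.0))
```

The median is less sensitive than the mean to earlier spikes already present in the history.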
Alert Management#
Create Key Alerts
- Disk space critical
- Memory exhaustion
- CPU sustained high
- Service down
Configure Notifications
- Route to on-call team
- Escalate if not acknowledged
- Include remediation steps
Review and Adjust
- Monitor alert frequency
- Reduce false positives
- Adjust thresholds quarterly
- Update as workload changes
Troubleshooting with Metrics#
Diagnosis Using Metrics#
Instance Not Responding
- Check CPU - is it maxed out?
- Check Memory - running low?
- Check Network - any activity?
- Check Disk - space available?
- Review recent activity log
Slow Performance
- CPU usage high? Optimize code
- Available memory low? Increase RAM or restart services
- Disk I/O high? Check what's writing
- Network latency? Check connection
Unexpected Costs
- Check network egress usage
- Review instance type vs. utilization
- Check for unused instances
- Monitor storage growth
Recommended Thresholds#
| Metric | Warning | Critical | Action |
|---|---|---|---|
| CPU | 70% | 85% | Optimize or upgrade |
| Memory | 75% | 90% | Increase RAM or restart |
| Disk | 80% | 95% | Clean up or expand |
| Network In | 500 Mbps | 900 Mbps | Monitor or optimize |
| Network Out | 500 Mbps | 900 Mbps | Monitor or optimize |
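The table above can be applied programmatically, for example when processing exported metrics. The sketch below encodes the table's values and reports the worst level across readings. The names are hypothetical, and treating a value exactly at a threshold as having crossed it is an assumption.

```python
from typing import Dict

# (warning, critical) pairs from the table; network values are in Mbps,
# the others are percentages.
THRESHOLDS = {
    "cpu": (70.0, 85.0),
    "memory": (75.0, 90.0),
    "disk": (80.0, 95.0),
    "network_in": (500.0, 900.0),
    "network_out": (500.0, 900.0),
}

SEVERITY = ["healthy", "warning", "critical"]  # ordered from best to worst


def classify(metric: str, value: float) -> str:
    """Classify one reading against its warning and critical thresholds."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "healthy"


def overall_status(readings: Dict[str, float]) -> str:
    """Return the worst level found across all readings."""
    levels = [classify(metric, value) for metric, value in readings.items()]
    return max(levels, key=SEVERITY.index, default="healthy")


if __name__ == "__main__":
    print(overall_status({"cpu": 72.0, "memory": 60.0, "disk": 50.0}))
```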
Next Steps#
- Managing VMs - Perform operations on instances
- Console and SSH Access - Connect to instances
- Export and Reporting - Generate reports