Monitor VM Performance | Real-Time Metrics and Health Status Guide
Monitor your VM instances' performance, health status, and activity in real time from the VMs Dashboard.
Overview
The Monitoring section within the Instance Detail Panel provides comprehensive performance tracking and health monitoring for your VM instances.
Accessing Monitoring
From VM Card
Quick Access to Monitoring
- Locate the VM instance on the dashboard
- Scroll to the bottom action buttons
- Click the Monitoring Button (Activity icon)
- Monitoring dashboard opens
- Real-time metrics display immediately
Monitoring Dashboard
Dashboard Components
- Real-time metric displays
- Historical trend graphs
- Performance alerts
- Activity log
- Health status indicators
- Recommendation suggestions
Real-Time Metrics
Key Performance Indicators
The monitoring dashboard displays live metrics updated every 1-5 minutes:
CPU Usage
- Current CPU utilization percentage (0-100%)
- Usage trend graph
- Sustained high usage indicates a bottleneck
- Normal range: 20-60% for typical workloads
Memory Usage
- RAM currently in use (GB and percentage)
- Available memory remaining
- Memory trend over time
- Usage above 85% may cause slowdowns
Disk Usage
- Storage space used vs. total
- Percentage utilization
- Free space available
- Alert if approaching 80%
Network Activity
- Data in (ingress) - Mbps
- Data out (egress) - Mbps
- Network trend graph
- Spike detection for anomalies
Understanding Metrics
CPU Metrics
- Usage shows processor load
- Peaks indicate heavy computation
- Sustained high usage indicates a need for optimization
- Multi-core systems show per-core breakdown
Memory Metrics
- Usage shows RAM consumption
- Includes OS, applications, buffers
- Trend shows whether memory is leaking
- Available shows headroom for new processes
Disk Metrics
- Usage shows storage capacity consumption
- Includes OS files and application data
- Growing trend indicates data accumulation
- Alerting at 80-90% prevents out-of-disk errors
Network Metrics
- Ingress: Data coming into instance
- Egress: Data leaving instance
- Spikes may indicate:
  - Large file transfers
  - Data download/upload
  - Network attack or compromise
  - High traffic periods
Metric Graphs
Viewing Historical Trends
Time Range Selection
- 1 Hour: Recent behavior and current issues
- 24 Hours: Daily patterns and peak usage
- 7 Days: Weekly trends and recurring issues
- 30 Days: Long-term trends and capacity planning
- Custom Range: Specific date range analysis
Graph Features
- X-axis shows time
- Y-axis shows metric value
- Hover to see exact values
- Zoom in/out for detailed view
- Download graph as image
Analyzing Trends
Identifying Patterns
- Regular spikes at specific times
- Steady growth over time
- Sudden changes or anomalies
- Correlation between metrics
Planning Decisions
- Capacity planning based on trends
- Identifying optimal performance windows
- Predicting when upgrades will be needed
- Detecting performance degradation
Performance Status
Instance Health Status
Status Indicators
| Status | Color | Meaning |
|---|---|---|
| Healthy | Green | All metrics normal, no issues |
| Warning | Yellow | One metric approaching threshold |
| Critical | Red | Metric exceeded critical threshold |
Health Components
- CPU usage level
- Memory utilization
- Disk space availability
- Network connectivity
- Service responsiveness
Status Summary
Overall Health
- Quick visual indicator
- Summary of all components
- Recommendations if issues detected
- Actions to improve health
Performance Alerts
Setting Up Alerts
Available Alert Types
CPU Alerts
- Trigger when CPU exceeds threshold
- Default threshold: 80%
- Duration: Sustained for 5+ minutes
- Notification: Immediate
Memory Alerts
- Trigger when memory usage exceeds threshold
- Default threshold: 85%
- Duration: Sustained for 5+ minutes
- Notification: Immediate
Disk Alerts
- Trigger when disk usage exceeds threshold
- Default threshold: 80%
- Duration: Triggered immediately
- Notification: High priority
Network Alerts
- High traffic detected
- Packet loss detected
- Connection timeouts
- Unusual patterns
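The difference between "sustained for 5+ minutes" and "triggered immediately" can be illustrated with a small evaluator. This is a sketch of one possible interpretation, not the platform's actual alert logic: the `Sample` type and `sustained_breach` function are hypothetical, and it assumes one sample per minute with a breach defined as a value strictly above the threshold.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    minute: int    # minutes since the start of the observation window
    value: float   # metric value in percent

def sustained_breach(samples, threshold, min_minutes):
    """Return True if the value stays above threshold for min_minutes in a row.

    min_minutes=0 models an immediate trigger (any single breaching sample).
    """
    breach_start = None
    for s in sorted(samples, key=lambda s: s.minute):
        if s.value > threshold:
            if breach_start is None:
                breach_start = s.minute          # a new breach run begins
            if s.minute - breach_start >= min_minutes:
                return True
        else:
            breach_start = None                  # run broken, start over
    return False

cpu = [Sample(m, v) for m, v in enumerate([82, 84, 88, 91, 86, 83])]
print(sustained_breach(cpu, threshold=80, min_minutes=5))
```

With this model, a single dip below the threshold resets the timer, which is why a short spike does not fire a CPU or memory alert while a disk alert (`min_minutes=0`) fires on the first breaching sample.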
Creating Alert
Step-by-Step
- Click "Create Alert" button in monitoring panel
- Select metric to monitor (CPU, Memory, Disk, Network)
- Set threshold value (percentage or GB)
- Choose duration (immediate or sustained)
- Select notification method:
  - Slack
  - SMS (premium)
  - Webhook
- Save alert
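If you keep alert definitions in your own tooling alongside the dashboard, the choices in the steps above can be captured in a small validated record. The sketch below is purely illustrative: `AlertRule` and its field names are hypothetical and are not the product's API. Only the metric names, units, and notification methods come from the steps above.

```python
from dataclasses import dataclass

METRICS = {"cpu", "memory", "disk", "network"}
UNITS = {"percent", "gb"}
CHANNELS = {"slack", "sms", "webhook"}

@dataclass(frozen=True)
class AlertRule:
    """Hypothetical local record of one alert's settings."""
    metric: str
    threshold: float
    unit: str = "percent"
    sustained_minutes: int = 0   # 0 means trigger immediately
    channel: str = "slack"

    def __post_init__(self):
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric}")
        if self.unit not in UNITS:
            raise ValueError(f"unknown unit: {self.unit}")
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")
        if self.unit == "percent" and not 0 < self.threshold <= 100:
            raise ValueError("percentage threshold must be in (0, 100]")
        if self.sustained_minutes < 0:
            raise ValueError("sustained_minutes cannot be negative")

rule = AlertRule(metric="cpu", threshold=80, sustained_minutes=5, channel="webhook")
print(rule)
```

Validating at construction time catches mistakes such as a 150% threshold before the rule is entered in the monitoring panel.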
Managing Alerts
Viewing Alerts
- List of active alerts
- Alert status and severity
- Last triggered time
- Alert history
Modifying Alerts
- Click alert in list
- Select "Edit"
- Change threshold or settings
- Click "Save Changes"
Disabling Alerts
- Click alert
- Select "Disable"
- Alert won't trigger
- Can be re-enabled later
Deleting Alerts
- Click alert
- Select "Delete"
- Confirm deletion
- Alert is permanently removed
Activity Log
Recent Activity
Activity Types
- Instance created/deleted
- Configuration changed
- Instance started/stopped
- Snapshots created
- Network changes
- Security updates
- User logins
- Failed operations
Activity Details
- Action performed
- Timestamp
- User who performed action
- Status (success/failure)
- Details of change
Viewing Activities
Activity List
- Scroll to Activity section in detail panel
- Shows 5-10 most recent activities
- Click "View All" to see complete history
- Filter activities by type
- Search by user or action
Exporting Activity Log
Export Options
- Click "Export" button
- Choose format:
  - CSV for spreadsheets
  - JSON for integration
- Select date range
- File downloads
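Once exported, a CSV activity log can be filtered with standard tooling. The sketch below assumes column names (`timestamp`, `user`, `action`, `status`) that mirror the activity details listed earlier; the real export may use different headers, so check the first line of your file. The inline sample stands in for a downloaded file.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported file; the column names are assumptions.
SAMPLE_EXPORT = """timestamp,user,action,status
2024-05-01T09:00:00Z,alice,instance_started,success
2024-05-01T09:05:00Z,bob,snapshot_created,failure
2024-05-01T09:10:00Z,alice,configuration_changed,success
2024-05-01T09:15:00Z,bob,instance_stopped,failure
"""

def failed_operations(csv_text):
    """Return the rows whose status column is 'failure'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["status"] == "failure"]

def failures_by_user(csv_text):
    """Count failed operations per user."""
    return Counter(row["user"] for row in failed_operations(csv_text))

for row in failed_operations(SAMPLE_EXPORT):
    print(row["timestamp"], row["user"], row["action"])
print(failures_by_user(SAMPLE_EXPORT))
```

To run this against a real export, read the file contents with `open(path, newline="").read()` and pass the text to the same functions.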
Use Cases
- Audit trail for compliance
- Troubleshooting recent changes
- Understanding infrastructure changes
- Team activity tracking
Performance Optimization
When to Optimize
Signs Your Instance Needs Optimization
- High CPU Usage
  - Sustained >80%
  - Affecting application performance
  - Slowing response times
- High Memory Usage
  - Sustained >85%
  - Causing slowdowns
  - Application crashes
- Low Disk Space
  - >80% full
  - Application warnings
  - Risk of failures
- High Network Usage
  - Unexpected peaks
  - Data transfer bottleneck
  - Cost implications
Optimization Strategies
CPU Optimization
- Identify CPU-consuming processes
- Optimize application code
- Reduce running services
- Scale to more instances
- Upgrade to higher CPU instance type
Memory Optimization
- Check for memory leaks
- Optimize application memory use
- Reduce cache sizes
- Restart services periodically
- Increase instance memory allocation
Disk Optimization
- Clean up temporary files
- Implement log rotation
- Archive old data
- Remove unused packages
- Expand disk capacity
Network Optimization
- Optimize application payload
- Compress data transfer
- Use regional endpoints
- Implement caching
- Reduce unnecessary traffic
Predictive Metrics
Forecasting
Trend Prediction
- Graph shows projected usage
- Projection is based on historical patterns
- Helps predict when limits will be reached
- Lets you plan upgrades proactively
Capacity Planning
- Growth rate analysis
- Time until capacity reached
- Recommended upgrade timing
- Cost impact of upgrades
Benchmarking
Comparing Performance
Baseline Metrics
- Normal operation values
- Average usage patterns
- Peak usage times
- Minimum resources needed
Performance Comparison
- Current vs. historical
- Same time last week/month
- Before/after optimization
- Between instances
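A before/after or current-versus-historical comparison reduces to a percent change against a baseline average. The sketch below is one simple way to compute it from exported values; the function name and sample numbers are illustrative assumptions.

```python
from statistics import mean

def percent_change(baseline, current):
    """Percent change of the current mean relative to the baseline mean."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; percent change is undefined")
    return (mean(current) - base) / base * 100

# Illustrative CPU percentages: same hour last week vs. this week.
last_week = [40, 50, 60]
this_week = [55, 55, 55]
print(f"{percent_change(last_week, this_week):+.1f}% vs. baseline")
```

Comparing the same time window (for example, the same weekday and hour) keeps daily and weekly patterns from distorting the result.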
Export and Reporting
Export Metrics
Export Options
- Click "Export" button
- Choose format:
  - CSV for Excel/Sheets
  - JSON for integration
  - PDF for reports
- Select date range
- File downloads immediately
Using Exported Data
- Import into analysis tools
- Create custom reports
- Share with team members
- Archive for compliance
- Integration with monitoring systems
Report Generation
Creating Reports
- Select date range
- Choose metrics to include
- Add custom notes
- Generate PDF report
- Share or print
Report Contents
- Metric summaries
- Trend graphs
- Alert history
- Recommendations
- Capacity analysis
Alerting Best Practices
Setting Appropriate Thresholds
CPU Alerts
- Production: 70% for critical apps
- Development: 85% for general use
- Start with default, adjust based on experience
Memory Alerts
- Production: 75% to allow headroom
- Development: 85% is acceptable
- Monitor trend, not just threshold
Disk Alerts
- Critical: 80% (watch closely)
- Warning: 70% (take action)
- Emergency: 90% (immediate action)
Network Alerts
- Baseline: Establish normal usage first
- Spike: 2-3x normal usage
- Sustained: High for 5+ minutes
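The "2-3x normal usage" rule for network spikes can be expressed as a comparison against a baseline. This sketch assumes the median of recent readings is a reasonable stand-in for "normal"; the function name, the default factor of 2.0, and the sample values are illustrative, not platform settings.

```python
from statistics import median

def is_spike(history_mbps, current_mbps, factor=2.0):
    """Return True if the current reading is at least factor x the baseline.

    The baseline is the median of history, which is less sensitive to
    earlier spikes than the mean.
    """
    baseline = median(history_mbps)
    return baseline > 0 and current_mbps >= factor * baseline

recent = [100, 110, 90, 105]   # illustrative egress readings in Mbps
print(is_spike(recent, 250))   # well above twice the baseline
print(is_spike(recent, 150))   # elevated, but under the 2x factor
```

Raising `factor` toward 3.0 reduces false positives on bursty workloads at the cost of catching smaller anomalies later.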
Alert Management
- Create Key Alerts
  - Disk space critical
  - Memory exhaustion
  - CPU sustained high
  - Service down
- Configure Notifications
  - Route to on-call team
  - Escalate if not acknowledged
  - Include remediation steps
- Review and Adjust
  - Monitor alert frequency
  - Reduce false positives
  - Adjust thresholds quarterly
  - Update as workload changes
Troubleshooting with Metrics
Diagnosis Using Metrics
Instance Not Responding
- Check CPU - is it maxed out?
- Check Memory - running low?
- Check Network - any activity?
- Check Disk - space available?
- Review recent activity log
Slow Performance
- CPU usage high? Optimize code
- Memory low? Increase or restart
- Disk I/O high? Check what's writing
- Network latency? Check connection
Unexpected Costs
- Check network egress usage
- Review instance type vs. utilization
- Check for unused instances
- Monitor storage growth
Recommended Thresholds
| Metric | Warning | Critical | Action |
|---|---|---|---|
| CPU | 70% | 85% | Optimize or upgrade |
| Memory | 75% | 90% | Increase RAM or restart |
| Disk | 80% | 95% | Clean up or expand |
| Network In | 500 Mbps | 900 Mbps | Monitor or optimize |
| Network Out | 500 Mbps | 900 Mbps | Monitor or optimize |
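The table above maps directly onto a lookup that labels each reading and derives an overall status from the worst one. This is a minimal sketch under stated assumptions: the dictionary keys and function names are hypothetical, and a reading is treated as breaching when it is at or above the listed value.

```python
# (warning, critical) pairs taken from the table; CPU, memory, and disk are
# percentages, network values are in Mbps.
THRESHOLDS = {
    "cpu": (70, 85),
    "memory": (75, 90),
    "disk": (80, 95),
    "network_in": (500, 900),
    "network_out": (500, 900),
}

SEVERITY = ["Healthy", "Warning", "Critical"]  # ordered from best to worst

def classify(metric, value):
    """Label one reading as Healthy, Warning, or Critical."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return "Critical"
    if value >= warning:
        return "Warning"
    return "Healthy"

def overall_status(readings):
    """Return the worst status across all readings."""
    return max((classify(m, v) for m, v in readings.items()), key=SEVERITY.index)

readings = {"cpu": 72, "memory": 60, "disk": 81, "network_out": 120}
for metric, value in readings.items():
    print(f"{metric}: {value} -> {classify(metric, value)}")
print("overall:", overall_status(readings))
```

Taking the worst status matches the health table earlier in this guide, where a single metric approaching or exceeding its threshold changes the instance's indicator.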
Next Steps
- Managing VMs - Perform operations on instances
- Console and SSH Access - Connect to instances
- Export and Reporting - Generate reports