Monitor VM Performance | Real-Time Metrics and Health Status Guide

Monitor your VM instances' performance, health status, and activity in real time from the VMs Dashboard.

Overview#

The Monitoring section within the Instance Detail Panel provides comprehensive performance tracking and health monitoring for your VM instances.

Accessing Monitoring#

From VM Card#

Quick Access to Monitoring

  1. Locate the VM instance on the dashboard
  2. Scroll to the bottom action buttons
  3. Click the Monitoring Button (Activity icon)
  4. Monitoring dashboard opens
  5. Real-time metrics display immediately

Monitoring Dashboard#

Dashboard Components

  • Real-time metric displays
  • Historical trend graphs
  • Performance alerts
  • Activity log
  • Health status indicators
  • Recommendation suggestions

Real-Time Metrics#

Key Performance Indicators#

The monitoring dashboard displays live metrics updated every 1-5 minutes:

CPU Usage

  • Current CPU utilization percentage (0-100%)
  • Usage trend graph
  • Sustained high usage indicates bottleneck
  • Normal range: 20-60% for typical workloads

Memory Usage

  • RAM currently in use (GB and percentage)
  • Available memory remaining
  • Memory trend over time
  • High usage >85% may cause slowdowns

Disk Usage

  • Storage space used vs. total
  • Percentage utilization
  • Free space available
  • Alert if approaching 80%

Network Activity

  • Data in (ingress) - Mbps
  • Data out (egress) - Mbps
  • Network trend graph
  • Spike detection for anomalies

Understanding Metrics#

CPU Metrics

  • Usage shows processor load
  • Peaks indicate heavy computation
  • Sustained high usage indicates a need for optimization
  • Multi-core systems show per-core breakdown

Memory Metrics

  • Usage shows RAM consumption
  • Includes OS, applications, buffers
  • A steadily rising trend can indicate a memory leak
  • Available shows headroom for new processes

Disk Metrics

  • Usage shows storage capacity consumption
  • Includes OS files and application data
  • Growing trend indicates data accumulation
  • Alerting at 80-90% helps prevent out-of-disk errors
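
To cross-check the dashboard's disk figures from inside the instance, the following is a minimal Python sketch built on the standard library's shutil.disk_usage. The 80% level comes from this guide; the function name disk_usage_percent and the path "/" are illustrative assumptions, and the result may differ slightly from the dashboard depending on how the filesystem counts reserved space.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return used space on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.used / usage.total * 100


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    print(f"Disk usage: {pct:.1f}%")
    if pct >= 80:
        # 80% is the alert level suggested in this guide
        print("Approaching capacity: consider cleanup or expansion")
```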

Network Metrics

  • Ingress: Data coming into instance
  • Egress: Data leaving instance
  • Spikes may indicate:
    • Large file transfers
    • Data download/upload
    • Network attack or compromise
    • High traffic periods
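
Ingress and egress are reported in Mbps (megabits per second). If you only have raw byte counters, for example from the operating system's interface statistics, the conversion can be sketched as below. The function name throughput_mbps and the idea of sampling a counter twice are assumptions for illustration, not a description of how the dashboard computes its values.

```python
def throughput_mbps(bytes_start: int, bytes_end: int, interval_s: float) -> float:
    """Average throughput in megabits per second between two byte-counter readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if bytes_end < bytes_start:
        raise ValueError("counter went backwards (reset or wrap-around)")
    bits = (bytes_end - bytes_start) * 8   # bytes -> bits
    return bits / 1_000_000 / interval_s   # bits -> megabits, then per second


if __name__ == "__main__":
    # 75 MB transferred over 60 seconds averages 10 Mbps
    print(throughput_mbps(0, 75_000_000, 60))
```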

Metric Graphs#

Viewing Historical Trends#

Time Range Selection

  • 1 Hour: Recent behavior and current issues
  • 24 Hours: Daily patterns and peak usage
  • 7 Days: Weekly trends and recurring issues
  • 30 Days: Long-term trends and capacity planning
  • Custom Range: Specific date range analysis

Graph Features

  • X-axis shows time
  • Y-axis shows metric value
  • Hover to see exact values
  • Zoom in/out for detailed view
  • Download graph as image

Analyzing Trends#

Identifying Patterns

  • Regular spikes at specific times
  • Steady growth over time
  • Sudden changes or anomalies
  • Correlation between metrics
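
One way to quantify correlation between two metrics is the Pearson coefficient, computed over samples taken at the same timestamps. The sketch below is a plain standard-library implementation; the sample values are made up, and nothing here implies the dashboard uses this method.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series (-1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    cpu_pct = [22, 35, 61, 80, 44]          # illustrative samples
    egress_mbps = [110, 180, 300, 410, 220]  # illustrative samples
    print(f"CPU vs. egress correlation: {pearson(cpu_pct, egress_mbps):.2f}")
```

A value near 1 suggests the two metrics rise and fall together (for example, CPU load driven by traffic); a value near 0 suggests they are unrelated.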

Planning Decisions

  • Capacity planning based on trends
  • Identifying optimal performance windows
  • Predicting when upgrades will be needed
  • Detecting performance degradation

Performance Status#

Instance Health Status#

Status Indicators

Status    Color   Meaning
Healthy   Green   All metrics normal, no issues
Warning   Yellow  One metric approaching threshold
Critical  Red     Metric exceeded critical threshold

Health Components

  • CPU usage level
  • Memory utilization
  • Disk space availability
  • Network connectivity
  • Service responsiveness

Status Summary#

Overall Health

  • Quick visual indicator
  • Summary of all components
  • Recommendations if issues detected
  • Actions to improve health

Performance Alerts#

Setting Up Alerts#

Available Alert Types

CPU Alerts

  • Trigger when CPU exceeds threshold
  • Default threshold: 80%
  • Duration: Sustained for 5+ minutes
  • Notification: Immediate

Memory Alerts

  • Trigger when memory usage exceeds threshold
  • Default threshold: 85%
  • Duration: Sustained for 5+ minutes
  • Notification: Immediate

Disk Alerts

  • Trigger when disk usage exceeds threshold
  • Default threshold: 80%
  • Duration: Triggered immediately
  • Notification: High priority

Network Alerts

  • High traffic detected
  • Packet loss detected
  • Connection timeouts
  • Unusual patterns
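
The "sustained for 5+ minutes" condition on CPU and memory alerts can be sketched as follows: a breach only counts once the metric has stayed above the threshold continuously for the whole duration. This is an illustrative model, assuming samples arrive as (timestamp in seconds, value) pairs sorted by time; the function name sustained_breach is hypothetical and the platform's actual evaluation logic may differ.

```python
from typing import Iterable, Optional, Tuple


def sustained_breach(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    min_duration_s: float = 300,
) -> bool:
    """Return True if values stay above `threshold` for at least `min_duration_s`."""
    breach_start: Optional[float] = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp      # a new breach begins here
            if timestamp - breach_start >= min_duration_s:
                return True
        else:
            breach_start = None               # any dip resets the timer
    return False


if __name__ == "__main__":
    cpu = [(t, 90.0) for t in range(0, 360, 60)]  # 90% at 0s, 60s, ..., 300s
    print(sustained_breach(cpu, threshold=80))    # True
```

Resetting the timer on every dip is what keeps short spikes from triggering an alert.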

Creating Alert#

Step-by-Step

  1. Click "Create Alert" button in monitoring panel
  2. Select metric to monitor (CPU, Memory, Disk, Network)
  3. Set threshold value (percentage or GB)
  4. Choose duration (immediate or sustained)
  5. Select notification method:
    • Email
    • Slack
    • SMS (premium)
    • Webhook
  6. Save alert
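
If you keep alert definitions under version control alongside the dashboard, the form fields above map onto a small record. The sketch below is purely illustrative: this guide does not document an API, so AlertRule, its field names, and the validation rules are all assumptions.

```python
from dataclasses import dataclass

METRICS = {"cpu", "memory", "disk", "network"}
UNITS = {"percent", "gb"}
CHANNELS = {"email", "slack", "sms", "webhook"}


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical record mirroring the fields of the Create Alert form."""

    metric: str
    threshold: float
    unit: str = "percent"
    sustained_minutes: int = 0  # 0 means trigger immediately
    channel: str = "email"

    def __post_init__(self) -> None:
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric!r}")
        if self.unit not in UNITS:
            raise ValueError(f"unknown unit: {self.unit!r}")
        if self.threshold <= 0:
            raise ValueError("threshold must be positive")
        if self.unit == "percent" and self.threshold > 100:
            raise ValueError("percentage threshold cannot exceed 100")
        if self.sustained_minutes < 0:
            raise ValueError("sustained_minutes cannot be negative")
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel!r}")


if __name__ == "__main__":
    print(AlertRule("cpu", 80, sustained_minutes=5, channel="slack"))
```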

Managing Alerts#

Viewing Alerts

  • List of active alerts
  • Alert status and severity
  • Last triggered time
  • Alert history

Modifying Alerts

  1. Click alert in list
  2. Select "Edit"
  3. Change threshold or settings
  4. Click "Save Changes"

Disabling Alerts

  1. Click alert
  2. Select "Disable"
  3. Alert won't trigger
  4. Can be re-enabled later

Deleting Alerts

  1. Click alert
  2. Select "Delete"
  3. Confirm deletion
  4. Alert is permanently removed

Activity Log#

Recent Activity#

Activity Types

  • Instance created/deleted
  • Configuration changed
  • Instance started/stopped
  • Snapshots created
  • Network changes
  • Security updates
  • User logins
  • Failed operations

Activity Details

  • Action performed
  • Timestamp
  • User who performed action
  • Status (success/failure)
  • Details of change

Viewing Activities#

Activity List

  1. Scroll to Activity section in detail panel
  2. Shows 5-10 most recent activities
  3. Click "View All" to see complete history
  4. Filter activities by type
  5. Search by user or action

Exporting Activity Log#

Export Options

  1. Click "Export" button
  2. Choose format:
    • CSV for spreadsheets
    • JSON for integration
  3. Select date range
  4. File downloads

Use Cases

  • Audit trail for compliance
  • Troubleshooting recent changes
  • Understanding infrastructure changes
  • Team activity tracking
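
For troubleshooting, a JSON export can be filtered for failed operations with a few lines of Python. The export schema is not specified in this guide, so the sketch assumes a JSON array of objects whose keys ("action", "timestamp", "user", "status") follow the Activity Details fields listed earlier; adjust the key names to match your actual export.

```python
import json
from typing import Any, Dict, List


def failed_activities(json_text: str) -> List[Dict[str, Any]]:
    """Return the records whose assumed "status" field equals "failure"."""
    records = json.loads(json_text)
    return [r for r in records if r.get("status") == "failure"]


if __name__ == "__main__":
    # Made-up sample data in the assumed schema
    sample = json.dumps([
        {"action": "start", "timestamp": "2024-01-01T10:00:00Z",
         "user": "alice", "status": "success"},
        {"action": "snapshot", "timestamp": "2024-01-01T11:00:00Z",
         "user": "bob", "status": "failure"},
    ])
    for record in failed_activities(sample):
        print(record["timestamp"], record["user"], record["action"])
```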

Performance Optimization#

When to Optimize#

Signs Your Instance Needs Optimization

  1. High CPU Usage

    • Sustained >80%
    • Affecting application performance
    • Slowing response times
  2. High Memory Usage

    • Sustained >85%
    • Causing slowdowns
    • Application crashes
  3. Low Disk Space

    • More than 80% full
    • Application warnings
    • Risk of failures
  4. High Network Usage

    • Unexpected peaks
    • Data transfer bottleneck
    • Cost implications

Optimization Strategies#

CPU Optimization

  1. Identify CPU-consuming processes
  2. Optimize application code
  3. Reduce running services
  4. Scale to more instances
  5. Upgrade to higher CPU instance type

Memory Optimization

  1. Check for memory leaks
  2. Optimize application memory use
  3. Reduce cache sizes
  4. Restart services periodically
  5. Increase instance memory allocation

Disk Optimization

  1. Clean up temporary files
  2. Implement log rotation
  3. Archive old data
  4. Remove unused packages
  5. Expand disk capacity

Network Optimization

  1. Optimize application payload
  2. Compress data transfer
  3. Use regional endpoints
  4. Implement caching
  5. Reduce unnecessary traffic

Predictive Metrics#

Forecasting#

Trend Prediction

  • Graph shows projected usage
  • Based on historical patterns
  • Helps predict when limits will be reached
  • Plan upgrades proactively

Capacity Planning

  • Growth rate analysis
  • Time until capacity reached
  • Recommended upgrade timing
  • Cost impact of upgrades
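
"Time until capacity reached" can be approximated by fitting a straight line to recent usage and extrapolating. The sketch below uses an ordinary least-squares fit over one sample per day; it is a simple stand-in under that assumption, not the forecasting model the dashboard uses, and the name days_until_limit is hypothetical.

```python
from typing import Optional, Sequence


def days_until_limit(
    daily_usage_pct: Sequence[float], limit_pct: float = 100.0
) -> Optional[float]:
    """Estimate days from the last sample until the fitted trend reaches `limit_pct`.

    Returns None when usage is flat or shrinking (the limit is never reached).
    """
    n = len(daily_usage_pct)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2                       # mean of day indices 0..n-1
    mean_y = sum(daily_usage_pct) / n
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage_pct)
    ) / sum((x - mean_x) ** 2 for x in range(n))
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit_pct - intercept) / slope
    return max(0.0, day_at_limit - (n - 1))    # measured from the last sample


if __name__ == "__main__":
    disk = [50, 52, 54, 56, 58]                # grows 2 points per day
    print(days_until_limit(disk, limit_pct=80))  # 11.0
```

A linear fit suits steady growth; workloads with weekly cycles or sudden jumps need a longer window or a different model.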

Benchmarking#

Comparing Performance#

Baseline Metrics

  • Normal operation values
  • Average usage patterns
  • Peak usage times
  • Minimum resources needed

Performance Comparison

  • Current vs. historical
  • Same time last week/month
  • Before/after optimization
  • Between instances

Export and Reporting#

Export Metrics#

Export Options

  1. Click "Export" button
  2. Choose format:
    • CSV for Excel/Sheets
    • JSON for integration
    • PDF for reports
  3. Select date range
  4. File downloads immediately

Using Exported Data

  • Import into analysis tools
  • Create custom reports
  • Share with team members
  • Archive for compliance
  • Integration with monitoring systems
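
As a starting point for custom reports, a CSV export can be summarized with the standard library alone. The column names below ("timestamp", "cpu_percent") are assumptions, since this guide does not specify the export layout; substitute the headers found in your own file.

```python
import csv
import io
from statistics import mean
from typing import Dict


def summarize_column(csv_text: str, column: str) -> Dict[str, float]:
    """Return min, mean, and max of a numeric column in CSV text with a header row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows if row.get(column)]
    if not values:
        raise ValueError(f"no values found in column {column!r}")
    return {"min": min(values), "mean": mean(values), "max": max(values)}


if __name__ == "__main__":
    # Made-up sample in the assumed layout
    sample = (
        "timestamp,cpu_percent\n"
        "2024-01-01T00:00:00Z,20\n"
        "2024-01-01T00:05:00Z,40\n"
        "2024-01-01T00:10:00Z,60\n"
    )
    print(summarize_column(sample, "cpu_percent"))
```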

Report Generation#

Creating Reports

  1. Select date range
  2. Choose metrics to include
  3. Add custom notes
  4. Generate PDF report
  5. Share or print

Report Contents

  • Metric summaries
  • Trend graphs
  • Alert history
  • Recommendations
  • Capacity analysis

Alerting Best Practices#

Setting Appropriate Thresholds#

CPU Alerts

  • Production: 70% for critical apps
  • Development: 85% for general use
  • Start with default, adjust based on experience

Memory Alerts

  • Production: 75% to allow headroom
  • Development: 85% is acceptable
  • Monitor trend, not just threshold

Disk Alerts

  • Warning: 70% (watch closely)
  • Critical: 80% (take action)
  • Emergency: 90% (immediate action)

Network Alerts

  • Baseline: Establish normal usage first
  • Spike: 2-3x normal usage
  • Sustained: High for 5+ minutes
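
The baseline-then-spike approach above can be sketched as follows: take the median of recent traffic as "normal" and flag a reading at or above a multiple of it. Using the median as the baseline and the name is_spike are assumptions for illustration; the 2x default mirrors the lower end of the 2-3x range given here.

```python
from statistics import median
from typing import Sequence


def is_spike(
    history_mbps: Sequence[float], current_mbps: float, factor: float = 2.0
) -> bool:
    """Return True if `current_mbps` is at least `factor` times the median baseline."""
    if not history_mbps:
        raise ValueError("need history to establish a baseline")
    baseline = median(history_mbps)
    # A zero baseline gives no meaningful ratio, so it never counts as a spike here
    return baseline > 0 and current_mbps >= factor * baseline


if __name__ == "__main__":
    normal_traffic = [100, 110, 90, 105]      # illustrative Mbps samples
    print(is_spike(normal_traffic, 250))      # True: 250 >= 2 x 102.5
    print(is_spike(normal_traffic, 150))      # False
```

The median is less sensitive than the mean to earlier spikes in the history, which keeps one past burst from inflating the baseline.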

Alert Management#

  1. Create Key Alerts

    • Disk space critical
    • Memory exhaustion
    • CPU sustained high
    • Service down
  2. Configure Notifications

    • Route to on-call team
    • Escalate if not acknowledged
    • Include remediation steps
  3. Review and Adjust

    • Monitor alert frequency
    • Reduce false positives
    • Adjust thresholds quarterly
    • Update as workload changes

Troubleshooting with Metrics#

Diagnosis Using Metrics#

Instance Not Responding

  • Check CPU - is it maxed out?
  • Check Memory - running low?
  • Check Network - any activity?
  • Check Disk - space available?
  • Review recent activity log

Slow Performance

  • CPU usage high? Optimize code
  • Memory low? Increase or restart
  • Disk I/O high? Check what's writing
  • Network latency? Check connection

Unexpected Costs

  • Check network egress usage
  • Review instance type vs. utilization
  • Check for unused instances
  • Monitor storage growth

Recommended Thresholds#

Metric       Warning   Critical  Action
CPU          70%       85%       Optimize or upgrade
Memory       75%       90%       Increase RAM or restart
Disk         80%       95%       Clean up or expand
Network In   500 Mbps  900 Mbps  Monitor or optimize
Network Out  500 Mbps  900 Mbps  Monitor or optimize
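
The recommended thresholds can be applied programmatically to map readings onto the Healthy / Warning / Critical statuses. The sketch below encodes the table values directly; treating a reading equal to a threshold as a breach, and taking the worst metric as the overall status, are assumptions rather than documented dashboard behavior.

```python
from typing import Dict, Tuple

# (warning, critical) per metric, taken from the recommended thresholds;
# CPU, memory, and disk are percentages, network values are Mbps.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu": (70, 85),
    "memory": (75, 90),
    "disk": (80, 95),
    "network_in": (500, 900),
    "network_out": (500, 900),
}
SEVERITY = ["Healthy", "Warning", "Critical"]  # ordered from best to worst


def classify(metric: str, value: float) -> str:
    """Classify one reading against its warning and critical thresholds."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return "Critical"
    if value >= warning:
        return "Warning"
    return "Healthy"


def overall_status(readings: Dict[str, float]) -> str:
    """Overall status is the worst status among all supplied readings."""
    statuses = (classify(metric, value) for metric, value in readings.items())
    return max(statuses, key=SEVERITY.index, default="Healthy")


if __name__ == "__main__":
    print(classify("cpu", 72))                                        # Warning
    print(overall_status({"cpu": 50, "memory": 60, "disk": 96}))      # Critical
```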

Next Steps#