Monitor VM Performance | Real-Time Metrics and Health Status Guide

Monitor your VM instances' performance, health status, and activity in real time from the VMs Dashboard.

Overview#

The Monitoring section within the Instance Detail Panel provides comprehensive performance tracking and health monitoring for your VM instances.

Accessing Monitoring#

From VM Card#

Quick Access to Monitoring

Locate the VM instance on the dashboard
Scroll to the action buttons at the bottom of the VM card
Click the Monitoring button (Activity icon)
The Monitoring dashboard opens
Real-time metrics are displayed immediately

Monitoring Dashboard#

Dashboard Components

Real-time metric displays
Historical trend graphs
Performance alerts
Activity log
Health status indicators
Recommendation suggestions

Real-Time Metrics#

Key Performance Indicators#

The monitoring dashboard displays the following metrics, refreshed every 1-5 minutes:

CPU Usage

Current CPU utilization percentage (0-100%)
Usage trend graph
Sustained high usage indicates a bottleneck
Normal range: 20-60% for typical workloads

Memory Usage

RAM currently in use (GB and percentage)
Available memory remaining
Memory trend over time
Usage above 85% may cause slowdowns

Disk Usage

Storage space used vs. total
Percentage utilization
Free space available
Alert raised when usage approaches 80%

Network Activity

Data in (ingress) - Mbps
Data out (egress) - Mbps
Network trend graph
Spike detection for anomalies
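
If you want to compare the Mbps figures above with raw byte counts from another tool, the unit conversion can be sketched as follows. The function name and the idea of reading two byte-counter values are assumptions for illustration; the dashboard itself reports Mbps directly.

```python
def throughput_mbps(bytes_start: int, bytes_end: int, interval_seconds: float) -> float:
    """Convert a byte-counter delta over an interval into megabits per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_bytes = bytes_end - bytes_start
    if delta_bytes < 0:
        # A counter that moves backwards usually means it was reset or wrapped.
        raise ValueError("counter went backwards (reset or wrap)")
    # 8 bits per byte, 1,000,000 bits per megabit.
    return delta_bytes * 8 / interval_seconds / 1_000_000


# 75 MB received over a 60-second interval
print(throughput_mbps(0, 75_000_000, 60))  # 10.0
```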

Understanding Metrics#

CPU Metrics

Usage shows processor load
Peaks indicate heavy computation
Sustained high usage indicates a need for optimization
Multi-core systems show per-core breakdown

Memory Metrics

Usage shows RAM consumption
Includes OS, applications, buffers
Trend shows whether memory is leaking
Available shows headroom for new processes

Disk Metrics

Usage shows storage capacity consumption
Includes OS files and application data
Growing trend indicates data accumulation
Alerting at 80-90% helps prevent out-of-disk errors

Network Metrics

Ingress: Data coming into instance
Egress: Data leaving instance
Spikes may indicate:
- Large file transfers
- Data download/upload
- Network attack or compromise
- High traffic periods

Metric Graphs#

Viewing Historical Trends#

Time Range Selection

1 Hour: Recent behavior and current issues
24 Hours: Daily patterns and peak usage
7 Days: Weekly trends and recurring issues
30 Days: Long-term trends and capacity planning
Custom Range: Specific date range analysis

Graph Features

X-axis shows time
Y-axis shows metric value
Hover to see exact values
Zoom in/out for detailed view
Download graph as image

Analyzing Trends#

Identifying Patterns

Regular spikes at specific times
Steady growth over time
Sudden changes or anomalies
Correlation between metrics

Planning Decisions

Capacity planning based on trends
Identifying optimal performance windows
Predicting when upgrades are needed
Detecting performance degradation

Performance Status#

Instance Health Status#

Status Indicators

Status	Color	Meaning
Healthy	Green	All metrics normal, no issues
Warning	Yellow	One metric approaching threshold
Critical	Red	Metric exceeded critical threshold

Health Components

CPU usage level
Memory utilization
Disk space availability
Network connectivity
Service responsiveness

Status Summary#

Overall Health

Quick visual indicator
Summary of all components
Recommendations if issues detected
Actions to improve health

Performance Alerts#

Setting Up Alerts#

Available Alert Types

CPU Alerts

Trigger when CPU exceeds threshold
Default threshold: 80%
Duration: Sustained for 5+ minutes
Notification: Immediate

Memory Alerts

Trigger when memory usage exceeds threshold
Default threshold: 85%
Duration: Sustained for 5+ minutes
Notification: Immediate

Disk Alerts

Trigger when disk usage exceeds threshold
Default threshold: 80%
Duration: Triggered immediately
Notification: High priority

Network Alerts

High traffic detected
Packet loss detected
Connection timeouts
Unusual patterns
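
The "sustained for 5+ minutes" condition used by the CPU and memory alerts can be illustrated with a minimal sketch. It assumes one sample per minute and a hypothetical `Sample` record; how the platform evaluates alerts internally is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int    # minutes since the start of observation (assumed cadence)
    value: float   # metric value in percent


def sustained_breach(samples, threshold, min_minutes):
    """Return True if the value stayed above threshold for at least
    min_minutes across consecutive samples."""
    breach_start = None
    for s in sorted(samples, key=lambda s: s.minute):
        if s.value > threshold:
            if breach_start is None:
                breach_start = s.minute
            if s.minute - breach_start >= min_minutes:
                return True
        else:
            # Any dip below the threshold resets the timer.
            breach_start = None
    return False


cpu = [Sample(m, v) for m, v in enumerate([70, 82, 85, 90, 88, 84, 81])]
print(sustained_breach(cpu, threshold=80, min_minutes=5))  # True
```

A single dip below the threshold resets the timer, which is what keeps short peaks from triggering a sustained alert.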

Creating Alert#

Step-by-Step

Click "Create Alert" button in monitoring panel
Select metric to monitor (CPU, Memory, Disk, Network)
Set threshold value (percentage or GB)
Choose duration (immediate or sustained)
Select notification method:
- Email
- Slack
- SMS (premium)
- Webhook
Save alert
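
If you keep a record of your alert settings outside the dashboard, the choices made in the steps above can be captured in a small data structure. This is only a sketch: `AlertRule`, its field names, and the JSON layout are hypothetical and do not represent a platform API.

```python
import json
from dataclasses import asdict, dataclass

METRICS = {"cpu", "memory", "disk", "network"}
CHANNELS = {"email", "slack", "sms", "webhook"}
UNITS = {"percent", "gb"}


@dataclass
class AlertRule:
    metric: str
    threshold: float
    unit: str               # "percent" or "gb"
    sustained_minutes: int  # 0 means trigger immediately
    channel: str

    def validate(self) -> None:
        """Raise ValueError if the rule is not internally consistent."""
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric}")
        if self.unit not in UNITS:
            raise ValueError(f"unknown unit: {self.unit}")
        if self.threshold <= 0:
            raise ValueError("threshold must be positive")
        if self.unit == "percent" and self.threshold > 100:
            raise ValueError("percentage threshold cannot exceed 100")
        if self.sustained_minutes < 0:
            raise ValueError("sustained_minutes cannot be negative")
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


rule = AlertRule("cpu", 80, "percent", 5, "email")
print(rule.to_json())
```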

Managing Alerts#

Viewing Alerts

List of active alerts
Alert status and severity
Last triggered time
Alert history

Modifying Alerts

Click alert in list
Select "Edit"
Change threshold or settings
Click "Save Changes"

Disabling Alerts

Click alert
Select "Disable"
The alert stops triggering
It can be re-enabled later

Deleting Alerts

Click alert
Select "Delete"
Confirm deletion
Alert is permanently removed

Activity Log#

Recent Activity#

Activity Types

Instance created/deleted
Configuration changed
Instance started/stopped
Snapshots created
Network changes
Security updates
User logins
Failed operations

Activity Details

Action performed
Timestamp
User who performed action
Status (success/failure)
Details of change

Viewing Activities#

Activity List

Scroll to Activity section in detail panel
Shows 5-10 most recent activities
Click "View All" to see complete history
Filter activities by type
Search by user or action

Exporting Activity Log#

Export Options

Click "Export" button
Choose format:
- CSV for spreadsheets
- JSON for integration
Select date range
File downloads
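
Once exported, the CSV can be analyzed with standard tools. The sketch below counts failed operations per user. The column names (`action`, `timestamp`, `user`, `status`, `details`) are assumed from the Activity Details list and may not match the real export header, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample data; a real export may use different column names.
SAMPLE_EXPORT = """action,timestamp,user,status,details
Instance started,2024-05-01T09:00:00Z,alice,success,
Configuration changed,2024-05-01T09:05:00Z,bob,failure,Invalid disk size
Snapshot created,2024-05-01T10:00:00Z,alice,success,
Instance stopped,2024-05-01T11:30:00Z,bob,failure,Timeout
"""


def failures_by_user(csv_text: str) -> Counter:
    """Count rows whose status is 'failure', grouped by user."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row["user"]
        for row in reader
        if row["status"].strip().lower() == "failure"
    )


print(failures_by_user(SAMPLE_EXPORT))  # Counter({'bob': 2})
```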

Use Cases

Audit trail for compliance
Troubleshooting recent changes
Understanding infrastructure changes
Team activity tracking

Performance Optimization#

When to Optimize#

Signs Your Instance Needs Optimization

High CPU Usage
- Sustained >80%
- Affecting application performance
- Slowing response times
High Memory Usage
- Sustained >85%
- Causing slowdowns
- Application crashes
Low Disk Space
- More than 80% full
- Application warnings
- Risk of failures
High Network Usage
- Unexpected peaks
- Data transfer bottleneck
- Cost implications

Optimization Strategies#

CPU Optimization

Identify CPU-consuming processes
Optimize application code
Reduce running services
Scale to more instances
Upgrade to higher CPU instance type

Memory Optimization

Check for memory leaks
Optimize application memory use
Reduce cache sizes
Restart services periodically
Increase instance memory allocation

Disk Optimization

Clean up temporary files
Implement log rotation
Archive old data
Remove unused packages
Expand disk capacity

Network Optimization

Optimize application payload
Compress data transfer
Use regional endpoints
Implement caching
Reduce unnecessary traffic

Predictive Metrics#

Forecasting#

Trend Prediction

Graph shows projected usage
Based on historical patterns
Helps predict when limits will be reached
Plan upgrades proactively

Capacity Planning

Growth rate analysis
Time until capacity reached
Recommended upgrade timing
Cost impact of upgrades
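
To make "time until capacity reached" concrete, here is a minimal sketch that fits a straight line through daily usage readings and extrapolates to a limit. It assumes linear growth and one reading per day; the dashboard's own forecasting method is not described in this guide and may differ.

```python
def days_until_limit(daily_usage_percent, limit_percent=100.0):
    """Fit a least-squares line through daily readings and estimate how many
    days after the last reading the limit is reached.
    Returns None if usage is flat or shrinking."""
    n = len(daily_usage_percent)
    if n < 2:
        raise ValueError("need at least two readings")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage_percent) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(xs, daily_usage_percent))
    slope = sxy / sxx  # growth in percentage points per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit_percent - intercept) / slope
    # Measure from the most recent reading; never report a negative wait.
    return max(0.0, day_at_limit - (n - 1))


disk = [60, 62, 64, 66, 68]  # growing 2 points per day
print(days_until_limit(disk, limit_percent=80))  # 6.0
```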

Benchmarking#

Comparing Performance#

Baseline Metrics

Normal operation values
Average usage patterns
Peak usage times
Minimum resources needed

Performance Comparison

Current vs. historical
Same time last week/month
Before/after optimization
Between instances
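
A before/after comparison reduces to the percentage change between two averages. The sketch below uses invented sample values and a hypothetical helper name.

```python
from statistics import mean


def percent_change(baseline, current):
    """Percentage change of the current average relative to the baseline average."""
    base_avg = mean(baseline)
    if base_avg == 0:
        raise ValueError("baseline average is zero; change is undefined")
    return (mean(current) - base_avg) * 100 / base_avg


cpu_before = [70, 75, 80]  # average 75
cpu_after = [55, 60, 65]   # average 60
print(percent_change(cpu_before, cpu_after))  # -20.0
```

A negative result means usage dropped relative to the baseline, which is the desired direction after an optimization.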

Export and Reporting#

Export Metrics#

Export Options

Click "Export" button
Choose format:
- CSV for Excel/Sheets
- JSON for integration
- PDF for reports
Select date range
File downloads immediately

Using Exported Data

Import into analysis tools
Create custom reports
Share with team members
Archive for compliance
Integration with monitoring systems

Report Generation#

Creating Reports

Select date range
Choose metrics to include
Add custom notes
Generate PDF report
Share or print

Report Contents

Metric summaries
Trend graphs
Alert history
Recommendations
Capacity analysis

Alerting Best Practices#

Setting Appropriate Thresholds#

CPU Alerts

Production: 70% for critical apps
Development: 85% for general use
Start with default, adjust based on experience

Memory Alerts

Production: 75% to allow headroom
Development: 85% is acceptable
Monitor trend, not just threshold

Disk Alerts

Warning: 70% (watch closely)
Critical: 80% (take action)
Emergency: 90% (immediate action)

Network Alerts

Baseline: Establish normal usage first
Spike: 2-3x normal usage
Sustained: High for 5+ minutes
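
The baseline-then-spike approach can be sketched as follows: take the median of a trailing window as the baseline and flag any sample at or above a multiple of it. The window size, the use of the median, and the function name are assumptions for illustration, not the platform's detection logic.

```python
from statistics import median


def find_spikes(samples_mbps, baseline_window=6, factor=2.0):
    """Return indices of samples that are at least `factor` times the median
    of the preceding `baseline_window` samples."""
    spikes = []
    for i in range(baseline_window, len(samples_mbps)):
        baseline = median(samples_mbps[i - baseline_window:i])
        if baseline > 0 and samples_mbps[i] >= factor * baseline:
            spikes.append(i)
    return spikes


egress = [40, 42, 38, 41, 39, 40, 43, 120, 41, 40]
print(find_spikes(egress))  # [7]
```

Using the median rather than the mean keeps a single earlier spike from inflating the baseline for the samples that follow it.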

Alert Management#

Create Key Alerts
- Disk space critical
- Memory exhaustion
- CPU sustained high
- Service down
Configure Notifications
- Route to on-call team
- Escalate if not acknowledged
- Include remediation steps
Review and Adjust
- Monitor alert frequency
- Reduce false positives
- Adjust thresholds quarterly
- Update as workload changes

Troubleshooting with Metrics#

Diagnosis Using Metrics#

Instance Not Responding

Check CPU - is it maxed out?
Check Memory - running low?
Check Network - any activity?
Check Disk - space available?
Review recent activity log

Slow Performance

CPU usage high? Optimize code
Memory low? Increase memory or restart services
Disk I/O high? Check which processes are writing
Network latency? Check the connection

Unexpected Costs

Check network egress usage
Review instance type vs. utilization
Check for unused instances
Monitor storage growth

Recommended Thresholds#

Metric	Warning	Critical	Action
CPU	70%	85%	Optimize or upgrade
Memory	75%	90%	Increase RAM or restart
Disk	80%	95%	Clean up or expand
Network In	500 Mbps	900 Mbps	Monitor or optimize
Network Out	500 Mbps	900 Mbps	Monitor or optimize
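
The table above maps directly onto a small classifier. In this sketch the metric keys are hypothetical names, and treating a value exactly at a threshold as having reached it is an assumption; the overall status is taken as the worst status of any single metric, in line with the Healthy/Warning/Critical indicators described earlier.

```python
# (warning, critical) pairs copied from the Recommended Thresholds table.
THRESHOLDS = {
    "cpu_percent": (70, 85),
    "memory_percent": (75, 90),
    "disk_percent": (80, 95),
    "network_in_mbps": (500, 900),
    "network_out_mbps": (500, 900),
}
SEVERITY = ["healthy", "warning", "critical"]  # ordered from best to worst


def classify(metric: str, value: float) -> str:
    """Classify one reading against its warning and critical thresholds."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "healthy"


def overall_status(readings: dict) -> str:
    """Return the worst status across all supplied readings."""
    statuses = (classify(metric, value) for metric, value in readings.items())
    return max(statuses, key=SEVERITY.index, default="healthy")


readings = {"cpu_percent": 72, "memory_percent": 60, "disk_percent": 96}
print(classify("cpu_percent", 72))  # warning
print(overall_status(readings))     # critical
```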

Next Steps#

Managing VMs - Perform operations on instances
Console and SSH Access - Connect to instances
Export and Reporting - Generate reports