Monitor VM Performance | Real-Time Metrics and Health Status Guide

Comprehensive monitoring is essential for maintaining optimal VM performance and identifying issues before they impact your applications.

Monitoring Overview#

Nife provides real-time and historical monitoring capabilities for all your VM instances across AWS, GCP, and Azure.

Key Monitoring Features#

Real-time Metrics: Live performance data updated continuously
Historical Data: Track metrics over time for trend analysis
Performance Alerts: Get notified of performance issues
Resource Tracking: Monitor CPU, memory, disk, and network usage
Health Status: Quick view of instance health
Activity Logs: Review recent instance actions and changes

Accessing Monitoring#

From Instance Card#

1. Locate the instance in the VM Management list
2. Click the Monitoring button in the instance actions
3. The monitoring dashboard opens in a panel

From Detail Panel#

1. Open the instance detail panel
2. Click the Monitoring tab
3. The full monitoring dashboard is displayed

Monitoring Dashboard#

The monitoring dashboard shows:

Overview Cards

Current CPU usage percentage
Memory utilization (GB and percentage)
Network throughput (inbound/outbound)
Disk usage (GB and percentage)

Performance Graphs

CPU usage over time (last 24 hours, 7 days, 30 days)
Memory usage trending
Network I/O (bytes in/out)
Disk I/O operations

Health Status

Instance status (Running, Stopped, etc.)
Uptime duration
Last state change
Network reachability

Key Metrics Explained#

CPU Metrics#

CPU Usage Percentage

How much of the CPU is being utilized
Normal: 20-60% for typical workloads
High: >80% sustained indicates capacity issues
Action: Consider scaling or optimizing application

CPU Cores

Number of vCPUs allocated to instance
Check if application can utilize all cores
Consider upgrading if CPU-bound

Memory Metrics#

Memory Usage

How much RAM is currently in use
Shown in GB and as percentage
Normal: 40-70% of available memory
High: >85% may cause slowdowns or crashes

Memory Available

Free memory available for applications
Should have buffer (10-20% minimum)
Low memory can cause swapping and poor performance
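On a Linux guest, these figures can be cross-checked from inside the VM. The following is a minimal sketch, assuming the instance exposes `/proc/meminfo` (a Linux kernel interface, not part of Nife). The names `parse_meminfo`, `memory_summary`, and `MIN_FREE_PCT` are illustrative, and the sample text stands in for real file contents.

```python
"""Cross-check memory headroom from /proc/meminfo-style text (Linux only)."""

MIN_FREE_PCT = 10.0  # lower end of the 10-20% buffer suggested above

# Stand-in for the contents of /proc/meminfo; values are in kB.
SAMPLE = """MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    2000000 kB
"""


def parse_meminfo(text: str) -> dict:
    """Return a mapping of field name to its integer value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_summary(text: str) -> dict:
    """Compute used/available percentages and flag a thin memory buffer."""
    info = parse_meminfo(text)
    total_kb = info["MemTotal"]
    available_kb = info["MemAvailable"]
    available_pct = 100.0 * available_kb / total_kb
    return {
        "total_gib": total_kb / 1024**2,
        "available_gib": available_kb / 1024**2,
        "used_pct": 100.0 - available_pct,
        "available_pct": available_pct,
        "low_buffer": available_pct < MIN_FREE_PCT,
    }


if __name__ == "__main__":
    # On a real instance, pass open("/proc/meminfo").read() instead of SAMPLE.
    print(memory_summary(SAMPLE))
```

`MemAvailable` is used rather than `MemFree` because it includes memory the kernel can reclaim from caches, which is closer to what applications can actually obtain.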

Network Metrics#

Network In

Incoming data to the instance
Measured in Mbps (megabits per second)
Normal: Depends on application type
Spikes: May indicate traffic surge or attack

Network Out

Outgoing data from the instance
Measured in Mbps
Monitor for unexpected data transfers
High: May indicate data exfiltration or misconfiguration

Packet Loss

Percentage of network packets lost
Should be <0.1% in a healthy network
High: Indicates network issues
Action: Check network configuration and cloud provider status
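The units above can be related to raw counters with a little arithmetic. This sketch assumes you have two readings of a cumulative byte counter and a pair of sent/received packet counts from some other source. The function names are illustrative and nothing here calls a Nife API.

```python
"""Convert raw network counters into the units used on the dashboard."""


def throughput_mbps(bytes_start: int, bytes_end: int, seconds: float) -> float:
    """Megabits per second from a byte-counter delta over an interval."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    bits = (bytes_end - bytes_start) * 8  # 8 bits per byte
    return bits / seconds / 1_000_000  # 1 Mbps = 1,000,000 bits per second


def packet_loss_pct(sent: int, received: int) -> float:
    """Percentage of packets that were sent but never received."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    # 75 MB transferred in one minute is 10 Mbps.
    print(throughput_mbps(0, 75_000_000, 60))
    # 5 packets lost out of 10,000 is 0.05%, inside the <0.1% guideline.
    print(packet_loss_pct(10_000, 9_995))
```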

Disk Metrics#

Disk Usage

How much storage is in use
Shown in GB and percentage
Target: Keep <80% for optimal performance
>90%: Risk of out-of-disk errors
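The same check can be run from inside the instance with Python's standard library. This is a sketch only: `shutil.disk_usage` is a real standard-library call, while `disk_status`, `WARN_PCT`, and `RISK_PCT` are illustrative names that mirror the 80% and 90% guidelines above.

```python
"""Check local disk usage against the 80% / 90% guidelines."""
import shutil

WARN_PCT = 80.0  # keep usage below this for optimal performance
RISK_PCT = 90.0  # above this, out-of-disk errors become likely


def disk_used_pct(total_bytes: int, used_bytes: int) -> float:
    """Used space as a percentage of total capacity."""
    return 100.0 * used_bytes / total_bytes


def disk_status(pct: float) -> str:
    """Map a usage percentage onto ok / warning / risk."""
    if pct > RISK_PCT:
        return "risk"
    if pct >= WARN_PCT:
        return "warning"
    return "ok"


def check_path(path: str = "/") -> tuple:
    """Return (used_pct, status) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    pct = disk_used_pct(usage.total, usage.used)
    return pct, disk_status(pct)


if __name__ == "__main__":
    pct, status = check_path("/")
    print(f"{pct:.1f}% used: {status}")
```

The percentage may differ slightly from what `df` reports, because some filesystems reserve blocks that are counted differently.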

Disk I/O

Read/write operations per second (IOPS)
High: May indicate disk bottleneck
Sustained high: Consider upgrading disk

Disk Latency

Time taken for disk operations
Normal: <5ms for SSD
High: >20ms indicates performance issues
Action: Check for background processes

Performance Analysis#

Setting Time Ranges#

View metrics for different time periods:

1 Hour: Recent performance and current issues
24 Hours: Daily patterns and peak usage times
7 Days: Weekly trends and recurring issues
30 Days: Long-term trends and capacity planning
Custom Range: Specific date range analysis

Identifying Performance Issues#

High CPU Usage

Check which processes are consuming CPU
Review application logs for errors
Check for runaway processes
Check whether CPU spikes correlate with network I/O
Consider application optimization or scaling

High Memory Usage

Review running processes and services
Check for memory leaks in applications
Monitor for unnecessary background tasks
Consider increasing memory allocation
Check for oversized or unbounded caches

High Network Usage

Verify application is performing as expected
Check for data downloads/uploads
Monitor for malware or unauthorized access
Review firewall and security rules
Check bandwidth costs and limits

Low Disk Space

Identify large files and directories
Clean up logs and temporary files
Review application data growth
Consider disk expansion
Implement log rotation policies
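For the first step, identifying large files, a short script run on the instance is often enough. The sketch below walks a directory tree with the standard library. `largest_files` is an illustrative helper and the `/var/log` path in the demo is only an example starting point.

```python
"""List the largest files under a directory."""
import heapq
import os


def largest_files(root: str, limit: int = 10) -> list:
    """Return up to `limit` (size_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # skip symlinks so targets are not counted twice
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; ignore it
    return heapq.nlargest(limit, sizes)


if __name__ == "__main__":
    for size, path in largest_files("/var/log", limit=5):
        print(f"{size / 1024**2:8.1f} MiB  {path}")
```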

Health Monitoring#

Instance Health Status#

Running

Instance is active and operational
Applications can be deployed and accessed
Monitoring data is current
Can perform all operations

Stopped

Instance is powered down
No monitoring data available (shows last known state)
Cannot run applications
Compute resources (CPU and memory) are released

Paused

Instance is temporarily paused
Minimal resource usage
Monitoring paused
Quick resume available

Degraded

Instance is running but experiencing issues
Some services may be unavailable
Investigate alerts and logs
May require restart or troubleshooting

Health Checks#

Automatic health checks monitor:

Instance reachability via network
System disk status
Memory health
CPU functionality
Network connectivity

Status Indicators#

Green: All systems healthy
Yellow: Warning conditions detected
Red: Critical issue requires attention

Setting Up Alerts#

Alert Types#

CPU Alerts

Trigger when CPU exceeds threshold
Typical threshold: 80%
Duration: Sustained for 5+ minutes

Memory Alerts

Trigger when memory usage exceeds threshold
Typical threshold: 85%
Duration: Sustained for 5+ minutes

Disk Alerts

Trigger when disk usage exceeds threshold
Typical threshold: 80%
Action: Requires immediate attention

Network Alerts

High traffic alerts
Packet loss detection
Connection timeouts
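The combination of a threshold and a "sustained for 5+ minutes" duration can be sketched as follows. This illustrates the general idea only and is not how Nife evaluates alerts internally. `sustained_breach` and the sample data are hypothetical.

```python
"""Illustrate a threshold alert that only fires on a sustained breach."""
from datetime import datetime, timedelta


def sustained_breach(samples, threshold: float, duration: timedelta) -> bool:
    """Return True if values stay above threshold for at least `duration`.

    `samples` is a time-ordered list of (timestamp, value) pairs. Any sample
    at or below the threshold resets the run, so short spikes are ignored.
    """
    run_start = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts  # first sample of a new breach
            if ts - run_start >= duration:
                return True
        else:
            run_start = None  # breach interrupted; start over
    return False


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    cpu = [70, 85, 90, 88, 92, 95, 91]  # one reading per minute
    samples = [(t0 + timedelta(minutes=i), v) for i, v in enumerate(cpu)]
    print(sustained_breach(samples, 80.0, timedelta(minutes=5)))
```

Requiring a duration trades a few minutes of detection delay for far fewer false alarms from momentary spikes.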

Creating Alerts#

1. Navigate to the monitoring dashboard
2. Click the Set Alert button
3. Choose the metric to monitor
4. Set the threshold value
5. Set the duration (5 minutes, 15 minutes, 1 hour)
6. Choose a notification method (Email, Slack, etc.)
7. Save the alert

Alert Notifications#

Alerts can be sent via:

Email notifications
Slack messages
Webhook calls
SMS (premium)
PagerDuty integration

Exporting Monitoring Data#

Export Formats#

CSV Export

Timestamp
CPU usage
Memory usage
Network In/Out
Disk usage
Custom metrics

JSON Export

Full metric details
Metadata information
Custom fields
API-ready format

Exporting Data#

1. Open the monitoring dashboard
2. Select a time range
3. Click the Export button
4. Choose a format (CSV or JSON)
5. The file downloads to your computer
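Once exported, a CSV file can be analyzed with a few lines of Python. The exact column headers in the export are not specified here, so `timestamp`, `cpu_percent`, and `memory_percent` below are assumed names. Check the header row of your own file and adjust accordingly.

```python
"""Summarize one column of an exported metrics CSV."""
import csv
import io
import statistics

# Hypothetical export; real column names may differ.
SAMPLE_CSV = """timestamp,cpu_percent,memory_percent
2024-01-01T00:00:00Z,20,50
2024-01-01T00:05:00Z,40,55
2024-01-01T00:10:00Z,90,60
"""


def summarize_column(csv_file, column: str) -> dict:
    """Return count, min, max and mean for a numeric column.

    `csv_file` is any open text file object with a header row.
    Rows where the column is empty are skipped.
    """
    reader = csv.DictReader(csv_file)
    values = [float(row[column]) for row in reader if row.get(column)]
    if not values:
        raise ValueError(f"no values found in column {column!r}")
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
    }


if __name__ == "__main__":
    # For a downloaded file: with open("export.csv", newline="") as f: ...
    print(summarize_column(io.StringIO(SAMPLE_CSV), "cpu_percent"))
```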

Performance Optimization Tips#

CPU Optimization#

Identify CPU-bound Processes
- Use monitoring to identify high CPU processes
- Optimize application code
- Consider horizontal scaling
Reduce CPU Usage
- Disable unused services
- Optimize database queries
- Use caching strategies
- Implement rate limiting
Upgrade if Needed
- Consider instance type with more vCPUs
- Scale across multiple instances
- Use load balancing

Memory Optimization#

Monitor Memory Leaks
- Look for gradually increasing memory
- Restart services periodically
- Review application logs
Optimize Memory Usage
- Increase garbage collection frequency
- Reduce cache sizes
- Optimize data structures
- Limit concurrent connections
Expand Memory
- Upgrade instance type
- Consider read replicas for database loads
- Implement distributed caching
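"Gradually increasing memory" can be made concrete by fitting a straight line to memory samples and looking at the slope. The sketch below uses an ordinary least-squares fit. The function names and the 0.5 points-per-hour cutoff are illustrative assumptions, not recommended values.

```python
"""Estimate a memory growth trend from (hours_elapsed, memory_pct) samples."""


def slope_per_hour(samples) -> float:
    """Least-squares slope: percentage points of memory gained per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def looks_like_leak(samples, min_slope: float = 0.5) -> bool:
    """Flag steady growth at or above `min_slope` points per hour."""
    return slope_per_hour(samples) >= min_slope


def hours_until(samples, limit_pct: float):
    """Hours until the latest reading reaches limit_pct at the fitted rate.

    Returns None when memory is flat or shrinking.
    """
    slope = slope_per_hour(samples)
    if slope <= 0:
        return None
    return max(0.0, (limit_pct - samples[-1][1]) / slope)


if __name__ == "__main__":
    readings = [(0, 40.0), (1, 42.0), (2, 44.0), (3, 46.0)]
    print(slope_per_hour(readings), hours_until(readings, 90.0))
```

A rising trend alone does not prove a leak, since caches warming up look similar. It is a prompt to review the application over a longer window.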

Disk Optimization#

Manage Disk Space
- Implement log rotation
- Archive old data
- Remove temporary files
- Compress backups
Improve Disk I/O
- Use SSD storage
- Implement caching
- Optimize database indexing
- Separate read/write workloads
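For applications written in Python, log rotation can be handled in the application itself with the standard library's `RotatingFileHandler`. This is a minimal sketch: the helper name, file path, and size limits are illustrative, and system-level tools such as logrotate are an equally valid choice.

```python
"""Cap application log size with size-based rotation."""
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(path: str, max_bytes: int = 1_000_000,
                          backups: int = 3) -> logging.Logger:
    """Return a logger that rotates `path` once it reaches max_bytes.

    At most `backups` old files are kept (path.1, path.2, ...), so total
    disk use stays near max_bytes * (backups + 1).
    """
    logger = logging.getLogger(f"rotating:{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    if not logger.handlers:  # do not attach a second handler on repeat calls
        handler = RotatingFileHandler(path, maxBytes=max_bytes,
                                      backupCount=backups)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_rotating_logger("app.log", max_bytes=10_000, backups=2)
    for i in range(1000):
        log.info("request %d handled", i)
```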

Network Optimization#

Reduce Latency
- Use Content Delivery Network (CDN)
- Deploy closer to users
- Optimize payload sizes
- Reduce hops in architecture
Optimize Bandwidth
- Compress data transfer
- Use regional endpoints
- Implement request batching
- Monitor for data leaks
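As one example of compressing data transfer, repetitive JSON payloads usually shrink considerably under gzip. The sketch below uses only the standard library. The record shape is made up for illustration, and the actual savings depend on your data.

```python
"""Compress a JSON payload before sending it over the network."""
import gzip
import json


def compress_payload(records) -> bytes:
    """Serialize records to compact JSON and gzip the result."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decompress_payload(blob: bytes):
    """Reverse compress_payload."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    records = [{"host": "vm-1", "cpu": 42, "status": "running"}] * 500
    raw_size = len(json.dumps(records).encode("utf-8"))
    packed = compress_payload(records)
    print(f"{raw_size} bytes -> {len(packed)} bytes")
```

When sending compressed bodies over HTTP, the receiver must be told about the encoding (typically with a `Content-Encoding: gzip` header) and must support it.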

Troubleshooting with Metrics#

Common Issues and Solutions#

Instance Shows as Running but Not Accessible

Check network reachability metric
Verify security group rules
Check application status
Review error logs
Attempt restart

Sudden Performance Drop

Check if metrics show resource exhaustion
Look for spikes in CPU or memory
Review recent deployments or changes
Check for background processes
Monitor network for DDoS

Intermittent Slowness

Look for periodic spikes in metrics
Correlate with scheduled tasks
Check for backup operations
Review disk I/O patterns
Monitor network latency
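To correlate periodic spikes with scheduled tasks, it helps to see which hours of the day the spikes fall in. This sketch assumes you already have (timestamp, value) samples, for example from an export. `spike_hours` and the sample data are illustrative.

```python
"""Group metric spikes by hour of day to expose recurring patterns."""
from collections import Counter
from datetime import datetime


def spike_hours(samples, threshold: float) -> list:
    """Return (hour, spike_count) pairs, most frequent hour first.

    `samples` is an iterable of (datetime, value); a spike is any value
    above `threshold`.
    """
    counts = Counter(ts.hour for ts, value in samples if value > threshold)
    return counts.most_common()


if __name__ == "__main__":
    samples = [
        (datetime(2024, 1, 1, 2, 0), 95.0),
        (datetime(2024, 1, 1, 14, 0), 91.0),
        (datetime(2024, 1, 2, 2, 0), 97.0),
        (datetime(2024, 1, 2, 9, 0), 35.0),
        (datetime(2024, 1, 3, 2, 5), 93.0),
    ]
    print(spike_hours(samples, 90.0))
```

A cluster at one hour, such as 02:00 in this made-up data, is a hint to check cron jobs, backups, or other tasks scheduled around that time.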

High Costs Despite Low Usage

Check for reserved instance mismatches
Verify instance type allocation
Monitor network transfer costs
Check for data storage growth
Review pricing for current tier

Best Practices#

Regular Review: Check metrics weekly
Set Baselines: Know your normal usage patterns
Proactive Alerts: Set alerts before critical thresholds
Archive Data: Export historical data for long-term analysis
Document Issues: Keep records of problems and solutions
Plan Capacity: Use trends to predict future needs
Correlate Metrics: Look at multiple metrics together
Test Alerts: Verify alert notifications work
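A baseline can be as simple as the mean and standard deviation of a metric over a quiet reference period. The sketch below flags readings that fall far from that baseline. The three-sigma cutoff is a common rule of thumb rather than a Nife setting, and the function names are illustrative.

```python
"""Build a simple baseline and flag readings that deviate from it."""
import statistics


def baseline(values) -> tuple:
    """Return (mean, population standard deviation) of reference readings."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two readings")
    return statistics.fmean(values), statistics.pstdev(values)


def is_anomalous(value: float, mean: float, stdev: float,
                 k: float = 3.0) -> bool:
    """True when value is more than k standard deviations from the mean."""
    return abs(value - mean) > k * stdev


if __name__ == "__main__":
    mean, stdev = baseline([40, 42, 38, 41, 39])  # e.g. a week of CPU readings
    print(is_anomalous(41, mean, stdev), is_anomalous(70, mean, stdev))
```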

Recommended Thresholds#

Metric	Warning	Critical
CPU	70%	85%
Memory	75%	90%
Disk	80%	95%
Network Out	1000 Mbps	1500 Mbps
Packet Loss	0.5%	2%
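The table above translates directly into a lookup that scripts can apply to exported or locally collected readings. This is a sketch: the dictionary keys are illustrative names, and the values are copied from the table.

```python
"""Classify a reading against the recommended warning/critical thresholds."""

# metric name -> (warning, critical), taken from the table above
THRESHOLDS = {
    "cpu_pct": (70.0, 85.0),
    "memory_pct": (75.0, 90.0),
    "disk_pct": (80.0, 95.0),
    "network_out_mbps": (1000.0, 1500.0),
    "packet_loss_pct": (0.5, 2.0),
}


def classify(metric: str, value: float) -> str:
    """Return "ok", "warning" or "critical" for a single reading."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    readings = {"cpu_pct": 72.0, "memory_pct": 60.0, "disk_pct": 96.0}
    for metric, value in readings.items():
        print(metric, value, classify(metric, value))
```

Treat these numbers as starting points and adjust them once you know the normal range for your own workload.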

Next Steps#

Managing VM Instances - Manage instance operations
Troubleshooting - Common issues and solutions
Cloud Provider Setup - Configure providers