K8s Logs Quick Reference
A quick reference for common tasks and commands in Kubernetes log management.
Dashboard Navigation
| Task | Location |
|---|---|
| View logs | Clusters → Select Cluster → Logs Tab |
| Search logs | Logs Tab → Enter filters → Click "Search Logs" |
| Configure collection | Collection Config Tab → Create/Edit configs |
| Set up archival | Archive Tab → Create Policy |
| Monitor metrics | Metrics Tab → View statistics |
| Check system health | Settings Tab → System Health section |
| View S3 archives | S3 Storage Tab → Search archives |
| Manage clusters | Clusters Tab → View/edit configs |
Common Search Filters
| Filter | Purpose | Example |
|---|---|---|
| Namespace | Limit to specific K8s namespace | "production" |
| Log Level | Filter by severity | "error" |
| Date Range | Specify time period | Last 24 hours |
| Search Query | Full-text search in messages | "connection timeout" |
| Pod Name | Filter by specific pod | "api-server-1" |
LogQL Quick Reference
Basic Queries
```logql
# All logs from a namespace
{namespace="production"}
# Lines containing "error" from a pod
{pod="api-server"} |= "error"
# Namespaces starting with "prod" (regex)
{namespace=~"prod.*"}
# Exclude lines containing a string
{namespace="prod"} != "debug"
```
Operators
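| Operator | Meaning | Example |
|---|---|---|
| `\|=` | Line contains string | `\|= "error"` |
| `!=` | Line does not contain string | `!= "debug"` |
| `\|~` | Line matches regex | `\|~ "err.*[0-9]"` |
| `!~` | Line does not match regex | `!~ "warn"` |
| `>` | Greater than (numeric) | `\| status > 500` |
| `<` | Less than (numeric) | `\| duration_ms < 100` |
| `>=`, `<=` | Greater/less than or equal | `\| size >= 1000` |

The comparison operators (`>`, `<`, `>=`, `<=`) are label filters. They apply to labels extracted by a parser such as `json` or `logfmt`, not to the raw log line. For example: `{app="api"} | json | status >= 500`.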
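The following minimal Python sketch applies the four line-filter operators to a list of strings, to make their behavior concrete. It only emulates the semantics for illustration. It assumes Python's `re` module behaves like Loki's RE2 engine for simple patterns. This is not how Loki evaluates queries.

```python
import re


def line_filter(lines, op, value):
    """Emulate one LogQL line filter (|=, !=, |~, !~) over a list of strings."""
    if op == "|=":
        return [line for line in lines if value in line]
    if op == "!=":
        return [line for line in lines if value not in line]
    if op == "|~":
        # Like LogQL line filters, re.search matches anywhere in the line.
        return [line for line in lines if re.search(value, line)]
    if op == "!~":
        return [line for line in lines if not re.search(value, line)]
    raise ValueError(f"unsupported line filter operator: {op}")


lines = ["error 500 on /api", "debug cache hit", "warn slow query", "error: timeout"]
print(line_filter(lines, "|=", "error"))       # ['error 500 on /api', 'error: timeout']
print(line_filter(lines, "|~", "err.*[0-9]"))  # ['error 500 on /api']
```

Feeding the output of one call into the next reproduces the AND behavior of chained filters such as `|= "error" |= "database"`.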
Label Filters
```logql
# Exact match
{app="api"}
# Label is present (non-empty value)
{namespace=~".+"}
# Regex match (label regexes must match the whole value)
{pod=~"api.*"}
# Multiple labels (all must match)
{namespace="prod", app="web"}
```
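When selectors are generated from code, quoting every value avoids the missing-quotes error covered under LogQL Query Errors. The following is a minimal sketch. `build_selector` is a hypothetical helper that emits exact-match (`=`) matchers only. It assumes JSON string escaping is acceptable to LogQL's Go-style double-quoted strings for ordinary values.

```python
import json


def build_selector(labels: dict[str, str]) -> str:
    """Build a LogQL stream selector from exact-match (=) label matchers."""
    if not labels:
        raise ValueError("a stream selector needs at least one matcher")
    # json.dumps double-quotes each value and escapes embedded quotes and backslashes.
    matchers = [f"{name}={json.dumps(value)}" for name, value in labels.items()]
    return "{" + ", ".join(matchers) + "}"


print(build_selector({"namespace": "prod", "app": "web"}))  # {namespace="prod", app="web"}
```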
Advanced Queries
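```logql
# AND: chained line filters must all match
{namespace="prod"} |= "error" |= "database"
# OR: use one regex with alternation (chained filters would mean AND)
{namespace="prod"} |~ "error|fatal|critical"
# Extract JSON fields and filter on them
{app="api"} | json | status >= 500 | line_format "{{.message}}"
# Extract a label with a regex (a backtick string avoids escaping \w)
{namespace="prod"} | regexp `user=(?P<user>\w+)`
# Extract fields with the pattern parser
{namespace="prod"} | pattern "<_> <method> <path> <_> <status>"
# Count occurrences per status over 5-minute windows
sum by (status) (count_over_time({namespace="prod"} | pattern "<_> <method> <path> <_> <status>" [5m]))
```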
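The `json | status >= 500 | line_format` query can be emulated in plain Python to show what each stage contributes. This sketch assumes each line is a flat JSON object with `status` and `message` keys, as the query implies. Loki's real `json` parser also flattens nested keys by joining them with `_`; that step is omitted here. The function name is hypothetical.

```python
import json


def server_error_messages(lines):
    """Emulate `| json | status >= 500 | line_format "{{.message}}"` on raw lines."""
    messages = []
    for line in lines:
        try:
            fields = json.loads(line)                   # | json
        except json.JSONDecodeError:
            continue                                    # this sketch skips non-JSON lines
        if not isinstance(fields, dict):
            continue
        try:
            status = float(fields.get("status"))        # | status >= 500 (numeric comparison)
        except (TypeError, ValueError):
            continue                                    # missing or non-numeric status
        if status >= 500:
            messages.append(str(fields.get("message", "")))  # | line_format "{{.message}}"
    return messages


lines = [
    '{"status": 200, "message": "ok"}',
    '{"status": 503, "message": "upstream unavailable"}',
    "plain text line",
]
print(server_error_messages(lines))  # ['upstream unavailable']
```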
Collection Configuration Defaults
| Setting | Default | Recommended Range |
|---|---|---|
| Collection Interval | 300s (5 min) | 60-600 seconds |
| Max Logs Per Fetch | 1000 | 500-5000 |
| Retention Days | 30 | 7-365 |
| Log Levels | Info, Warn, Error | Depends on needs |
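The following short sketch flags settings that fall outside the recommended ranges. `collection_interval` and `log_levels` follow the Field Mapping table. `max_logs_per_fetch` and `retention_days` are hypothetical field names used only for illustration.

```python
from dataclasses import dataclass, field

# Recommended (min, max) ranges from the table above, inclusive.
RECOMMENDED_RANGES = {
    "collection_interval": (60, 600),   # seconds
    "max_logs_per_fetch": (500, 5000),  # hypothetical field name
    "retention_days": (7, 365),         # hypothetical field name
}


@dataclass
class CollectionConfig:
    """Hypothetical collection config holding the documented defaults."""
    collection_interval: int = 300
    max_logs_per_fetch: int = 1000
    retention_days: int = 30
    log_levels: list[str] = field(default_factory=lambda: ["info", "warn", "error"])


def out_of_range(config: CollectionConfig) -> list[str]:
    """Return one warning per numeric setting outside its recommended range."""
    warnings = []
    for name, (low, high) in RECOMMENDED_RANGES.items():
        value = getattr(config, name)
        if not low <= value <= high:
            warnings.append(f"{name}={value} is outside {low}-{high}")
    return warnings


print(out_of_range(CollectionConfig()))                        # []
print(out_of_range(CollectionConfig(collection_interval=30)))  # ['collection_interval=30 is outside 60-600']
```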
Archive Policy Presets
| Environment | Retention | Compression | Destination |
|---|---|---|---|
| Development | 7 days | Enabled | S3 |
| Staging | 30 days | Enabled | S3 |
| Production | 90 days | Enabled | S3 Standard-IA |
| Compliance (long-term) | 365+ days | Enabled | S3 Glacier Deep Archive |
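The presets translate directly into the archive-policy request body from the cURL example under Common API Calls. This sketch reuses that example's `name`, `retention_days`, `compression_type` and `is_active` fields and its `gzip` value. The Compliance preset uses the 365-day minimum. The destination is omitted because that example has no field for it.

```python
import json

# Retention per preset; Compliance uses its 365-day minimum.
PRESET_RETENTION_DAYS = {
    "Development": 7,
    "Staging": 30,
    "Production": 90,
    "Compliance": 365,
}


def policy_payload(preset: str) -> str:
    """Build a JSON archive-policy body for a preset (all presets enable compression)."""
    body = {
        "name": f"{preset} Archive",
        "retention_days": PRESET_RETENTION_DAYS[preset],  # KeyError for unknown presets
        "compression_type": "gzip",  # value taken from the cURL example
        "is_active": True,
    }
    return json.dumps(body)


print(policy_payload("Production"))
# {"name": "Production Archive", "retention_days": 90, "compression_type": "gzip", "is_active": true}
```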
Common Troubleshooting
No Logs Appearing
Checklist:
- Is collection enabled for the namespace?
- Does the cluster show "healthy" status?
- Are there logs in the selected date range?
- Is at least one pod in the namespace generating logs?
Fix:
- Go to Collection Config Tab
- Verify the configuration is Enabled
- Check Settings Tab → System Health
- Widen the date range and search again
LogQL Query Errors
Problem: Syntax error at position X
Solutions:
```logql
# ✗ Wrong: label value is not quoted
{namespace=production}
# ✓ Correct: label values must be quoted
{namespace="production"}
# ✗ Wrong: "and" is not a selector operator
{namespace="prod" and app="api"}
# ✓ Correct: separate label matchers with commas
{namespace="prod", app="api"}
```
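Both mistakes can be caught with a rough pattern check before a query is sent. This is a simplified sketch, not Loki's parser. `looks_like_valid_selector` is a hypothetical helper that accepts only a bare stream selector whose comma-separated matchers use `=`, `!=`, `=~` or `!~` with a double-quoted value. It does not check Loki's other rules, such as requiring at least one matcher that cannot match an empty value.

```python
import re

# One matcher: label name, operator, then a double-quoted value (escaped quotes allowed).
MATCHER = r'[A-Za-z_][A-Za-z0-9_]*\s*(?:=~|!~|!=|=)\s*"(?:[^"\\]|\\.)*"'
# A selector: one or more matchers, comma-separated, inside braces.
SELECTOR = re.compile(r"\{\s*" + MATCHER + r"(?:\s*,\s*" + MATCHER + r")*\s*\}")


def looks_like_valid_selector(text: str) -> bool:
    """Rough check that `text` is a bare stream selector with quoted label values."""
    return SELECTOR.fullmatch(text.strip()) is not None


for query in ['{namespace=production}', '{namespace="production"}',
              '{namespace="prod" and app="api"}', '{namespace="prod", app="api"}']:
    print(looks_like_valid_selector(query), query)
```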
High Storage Usage
Solutions:
- Reduce retention days in Archive policy
- Increase collection interval (collect less frequently)
- Reduce max logs per collection
- Exclude high-volume namespaces
Performance Tips
Searching
Fast:
- Specific namespace
- Narrow date range (24 hours)
- Log level filter
Slow:
- All namespaces
- 6+ month range
- No filters
Collection
Efficient:
- Interval: 300-600 seconds
- Max logs: 1000-2000
- Specific namespaces only
Inefficient:
- Interval: 60 seconds
- Max logs: 5000+
- All namespaces including system
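One way to compare the two profiles is the maximum number of logs a single collection config can pull per day: fetches per day times the per-fetch cap. This sketch assumes each fetch returns at most Max Logs Per Fetch entries. The results are therefore approximate upper bounds, not measured volumes.

```python
SECONDS_PER_DAY = 86_400


def max_logs_per_day(interval_s: int, max_logs_per_fetch: int) -> int:
    """Approximate upper bound on logs one collection config pulls per day."""
    fetches_per_day = SECONDS_PER_DAY // interval_s
    return fetches_per_day * max_logs_per_fetch


print(max_logs_per_day(300, 1000))  # 288000  (efficient profile, default settings)
print(max_logs_per_day(60, 5000))   # 7200000 (inefficient profile)
```

Under these assumptions, the inefficient profile can pull up to 25 times as many logs per day as the efficient one.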
API Rate Limits
| Endpoint Type | Requests | Period |
|---|---|---|
| Standard calls | 100 | Per minute |
| Search/Archive | 10 | Per minute |
| Heavy operations | 5 | Per minute |
If you are rate limited, retry with exponential backoff, doubling the wait each time (1s, 2s, 4s, 8s, ...).
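The following minimal retry helper follows that schedule. `RateLimited` is a hypothetical exception standing in for however your client detects a rate-limited response. The cap of 5 attempts is an illustrative choice, not a documented limit.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a client raises when the API rejects a call as rate limited."""


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited after 1s, 2s, 4s, 8s, ... delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise                          # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)   # 1, 2, 4, 8, ... seconds


# Example: a call that is rate limited three times before succeeding.
calls = {"count": 0}


def flaky_search():
    calls["count"] += 1
    if calls["count"] <= 3:
        raise RateLimited()
    return "ok"


waits = []
print(with_backoff(flaky_search, sleep=waits.append), waits)  # ok [1.0, 2.0, 4.0]
```

Passing `sleep` as a parameter keeps the helper testable. Production clients often also add random jitter to each delay so that many clients do not retry in lockstep.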
Storage Calculation
Database Storage
- Avg log size: 512 bytes
- Monthly logs: 1M logs/day × 30 days = 30M logs
- Space needed: 30M × 512 bytes = ~15 GB/month
- Cost: ~$4.50/month (at $0.30/GB)

S3 Storage (After Compression)
- Compressed size: ~70% reduction (512 → 150 bytes)
- Space needed: 30M × 150 bytes = ~4.5 GB/month
- Cost: ~$0.10/month (at $0.023/GB, S3 Standard)
- Cost: ~$0.02/month (at $0.004/GB, S3 Glacier)
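The same arithmetic can be written as a function, so it can be rerun for other volumes. The prices are the example rates above, not current AWS pricing. Decimal gigabytes (10^9 bytes) are assumed.

```python
def monthly_storage(logs_per_day: int, bytes_per_log: int,
                    usd_per_gb_month: float, days: int = 30) -> tuple[float, float]:
    """Return (gigabytes, dollars) for `days` of logs, using decimal GB (10**9 bytes)."""
    gigabytes = logs_per_day * days * bytes_per_log / 1e9
    return gigabytes, gigabytes * usd_per_gb_month


db_gb, db_cost = monthly_storage(1_000_000, 512, 0.30)    # uncompressed, database
s3_gb, s3_cost = monthly_storage(1_000_000, 150, 0.023)   # compressed, S3 Standard
_, glacier_cost = monthly_storage(1_000_000, 150, 0.004)  # compressed, S3 Glacier
print(f"Database: {db_gb:.2f} GB, ${db_cost:.2f}/month")  # Database: 15.36 GB, $4.61/month
print(f"S3: {s3_gb:.2f} GB, ${s3_cost:.2f}/month")        # S3: 4.50 GB, $0.10/month
print(f"Glacier: ${glacier_cost:.3f}/month")              # Glacier: $0.018/month
```

The exact database figures are 15.36 GB and $4.61. The ~$4.50 above comes from rounding to 15 GB before multiplying.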
Common API Calls
Get Logs (cURL)
```bash
curl -X POST https://api.nife.io/v1/k8s/logs/search \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_id": "prod-cluster",
    "namespace": "production",
    "level": "error",
    "limit": 100
  }'
```
Create Archive Policy (cURL)
```bash
curl -X POST https://api.nife.io/v1/k8s/logs/archive/policies \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prod Archive",
    "retention_days": 90,
    "compression_type": "gzip",
    "is_active": true
  }'
```
List Collection Configs (cURL)
```bash
curl -X GET "https://api.nife.io/v1/k8s/logs/config?cluster_id=prod-cluster" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
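The search call can also be built with Python's standard library. This sketch copies the endpoint, headers and body fields from the cURL search example and keeps `YOUR_TOKEN` as a placeholder. It only constructs the request; the response format is not covered here.

```python
import json
import urllib.request

SEARCH_URL = "https://api.nife.io/v1/k8s/logs/search"


def build_search_request(token: str, cluster_id: str, namespace: str,
                         level: str = "error", limit: int = 100) -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL search example."""
    body = {"cluster_id": cluster_id, "namespace": namespace, "level": level, "limit": limit}
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("YOUR_TOKEN", "prod-cluster", "production")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then read and decode the response body.
```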
Field Mapping (UI → API)
| UI Field | API Parameter |
|---|---|
| Cluster ID | cluster_id |
| Namespace | namespace |
| Pod Name | pod_name |
| Container | container_name |
| Log Level | log_level |
| Start Time | start_time |
| End Time | end_time |
| Log Levels | log_levels |
| Collection Interval | collection_interval |
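When scripting against the API, the table can be kept as a lookup so that UI labels translate mechanically. `to_api_params` is a hypothetical helper. The dictionary contains only the rows listed above.

```python
# UI label -> API parameter, copied from the table above.
FIELD_MAP = {
    "Cluster ID": "cluster_id",
    "Namespace": "namespace",
    "Pod Name": "pod_name",
    "Container": "container_name",
    "Log Level": "log_level",
    "Start Time": "start_time",
    "End Time": "end_time",
    "Log Levels": "log_levels",
    "Collection Interval": "collection_interval",
}


def to_api_params(ui_fields: dict) -> dict:
    """Rename UI field labels to API parameter names; unknown labels raise KeyError."""
    return {FIELD_MAP[label]: value for label, value in ui_fields.items()}


print(to_api_params({"Namespace": "production", "Pod Name": "api-server-1"}))
# {'namespace': 'production', 'pod_name': 'api-server-1'}
```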
Retention vs Archive Timeline
```text
Day 1-30 (Default)
├─ Real-time in database
├─ Full query capability
└─ Highest cost
Day 31-90 (Archive to S3)
├─ Accessible but slower
├─ Limited query capability
└─ Low cost
Day 91+ (Glacier)
├─ Cold storage
├─ Hours to retrieve
└─ Minimal cost
```
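The timeline reduces to a lookup on log age. This sketch assumes the default boundaries shown (30 and 90 days). The tier names returned are illustrative.

```python
def storage_tier(age_days: int) -> str:
    """Map a log's age in days to its tier under the default timeline."""
    if age_days < 1:
        raise ValueError("age_days is counted from day 1")
    if age_days <= 30:
        return "database"   # days 1-30: real-time, full query capability
    if age_days <= 90:
        return "s3"         # days 31-90: archived, slower, limited queries
    return "glacier"        # day 91+: cold storage, hours to retrieve


print([storage_tier(d) for d in (1, 30, 31, 90, 91)])
# ['database', 'database', 's3', 's3', 'glacier']
```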
Collection Options Comparison
| Option | When to Use | Cost | Data Freshness |
|---|---|---|---|
| Interval: 60s | High-priority production | High | Near real-time (up to ~1 min delay) |
| Interval: 300s | Standard production | Medium | Up to ~5 min delay |
| Interval: 600s | Staging/Dev | Low | Up to ~10 min delay |
| On-Demand | Debugging | Variable | On request |
Status Indicators
| Indicator | Meaning | Action |
|---|---|---|
| 🟢 Healthy | All systems operational | None needed |
| 🟡 Degraded | Some components slow | Monitor closely |
| 🔴 Unhealthy | System not operational | Check immediately |
Useful Metrics to Monitor
Production:
- Logs per hour > 50,000?
- Error logs trending up?
- Storage growth > 1GB/day?
Staging:
- Collection jobs completing?
- Archive jobs succeeding?
- Any failed operations?
Quick Wins for Cost Reduction
- Archive older logs - Saves ~90% (moves to S3)
- Enable compression - Saves ~70% of storage
- Increase collection interval - Reduces volume (300s → 600s)
- Exclude system namespaces - Reduces noise
- Set shorter retention - Reduces storage
Before Contacting Support
- Verified cluster connectivity: `kubectl cluster-info`
- Checked system health status
- Confirmed collection configs are enabled
- Verified date range has logs
- Confirmed LogQL syntax
- Checked rate limits not exceeded
- Exported logs for analysis
Help & Support
- Docs: https://docs.nife.io
- Status: https://status.nife.io