# K8s Logs Quick Reference & Cheatsheet

A quick reference guide for common tasks and commands in Kubernetes Logs management.

## Dashboard Navigation

| Task | Location |
|---|---|
| View logs | Clusters → Select Cluster → Logs Tab |
| Search logs | Logs Tab → Enter filters → Click "Search Logs" |
| Configure collection | Collection Config Tab → Create/Edit configs |
| Set up archival | Archive Tab → Create Policy |
| Monitor metrics | Metrics Tab → View statistics |
| Check system health | Settings Tab → System Health section |
| View S3 archives | S3 Storage Tab → Search archives |
| Manage clusters | Clusters Tab → View/edit configs |

## Common Search Filters

| Filter | Purpose | Example |
|---|---|---|
| Namespace | Limit to a specific K8s namespace | `"production"` |
| Log Level | Filter by severity | `"error"` |
| Date Range | Specify a time period | Last 24 hours |
| Search Query | Full-text search in messages | `"connection timeout"` |
| Pod Name | Filter by a specific pod | `"api-server-1"` |

## LogQL Quick Reference

### Basic Queries

```logql
# All logs from a namespace
{namespace="production"}

# Error logs from a pod
{pod="api-server"} |= "error"

# Multiple namespaces (regex)
{namespace=~"prod.*"}

# Exclude a pattern
{namespace="prod"} != "debug"
```

### Operators

| Operator | Meaning | Example |
|---|---|---|
| `\|=` | Line contains string | `\|= "error"` |
| `!=` | Line does not contain string | `!= "debug"` |
| `\|~` | Line matches regex | `\|~ "err.*[0-9]"` |
| `!~` | Line does not match regex | `!~ "warn"` |
| `>` | Greater than (numeric) | `> 500` |
| `<` | Less than (numeric) | `< 100` |
| `>=` / `<=` | Greater/less than or equal | `>= 1000` |

### Label Filters

```logql
# Exact match
{app="api"}

# Label exists (non-empty)
{namespace!=""}

# Regex match
{pod=~"api.*"}

# Multiple labels (AND)
{namespace="prod", app="web"}
```

### Advanced Queries

```logql
# Combine filters with AND (chain filter stages)
{namespace="prod"} |= "error" |= "database"

# Combine with OR (use a regex alternation)
{namespace="prod"} |= "error" |~ "fatal|critical"

# Extract JSON fields and filter on them
{app="api"} | json | status >= 500 | line_format "{{.message}}"

# Extract labels with a regex (backticks avoid double escaping)
{namespace="prod"} | regexp `user=(?P<user>\w+)`

# Extract fields with a pattern parser
{namespace="prod"} | pattern "<_> <method> <path> <_> <status>"
```

## Collection Configuration Defaults

| Setting | Default | Recommended Range |
|---|---|---|
| Collection Interval | 300s (5 min) | 60-600 seconds |
| Max Logs Per Fetch | 1000 | 500-5000 |
| Retention Days | 30 | 7-365 |
| Log Levels | Info, Warn, Error | Depends on needs |
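
As a rough illustration, a collection config built from these defaults might look like the JSON below. The field names `cluster_id`, `namespace`, `collection_interval`, and `log_levels` follow the Field Mapping table later in this guide; `max_logs_per_fetch`, `retention_days`, and `is_active` are assumed names, not a guaranteed schema:

```json
{
  "cluster_id": "prod-cluster",
  "namespace": "production",
  "collection_interval": 300,
  "max_logs_per_fetch": 1000,
  "retention_days": 30,
  "log_levels": ["info", "warn", "error"],
  "is_active": true
}
```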

## Archive Policy Presets

### Development Environment

- Retention: 7 days
- Compression: Enabled
- Destination: S3

### Staging Environment

- Retention: 30 days
- Compression: Enabled
- Destination: S3

### Production Environment

- Retention: 90 days
- Compression: Enabled
- Destination: S3 Standard-IA

### Compliance (Long-term)

- Retention: 365+ days
- Compression: Enabled
- Destination: S3 Glacier Deep Archive

## Common Troubleshooting

### No Logs Appearing

Checklist:

- Is collection enabled for the namespace?
- Does the cluster show "healthy" status?
- Are there logs in the selected date range?
- Is at least one pod in the namespace generating logs?

Fix:

1. Go to the Collection Config Tab
2. Verify the configuration is Enabled
3. Check Settings Tab → System Health
4. Expand the date range and try again

### LogQL Query Errors

Problem: Syntax error at position X

Solutions:

```logql
# ✗ Wrong: missing quotes
{namespace=production}

# ✓ Correct: quotes required
{namespace="production"}

# ✗ Wrong: invalid operator
{namespace="prod" and app="api"}

# ✓ Correct: comma-separated label syntax
{namespace="prod", app="api"}
```

### High Storage Usage

Solutions:

1. Reduce retention days in the archive policy
2. Increase the collection interval (collect less frequently)
3. Reduce max logs per collection
4. Exclude high-volume namespaces

## Performance Tips

### Searching

Fast:

- Specific namespace
- Narrow date range (e.g. 24 hours)
- Log level filter

Slow:

- All namespaces
- 6+ month range
- No filters

### Collection

Efficient:

- Interval: 300-600 seconds
- Max logs: 1000-2000
- Specific namespaces only

Inefficient:

- Interval: 60 seconds
- Max logs: 5000+
- All namespaces, including system namespaces

## API Rate Limits

| Endpoint Type | Limit | Period |
|---|---|---|
| Standard calls | 100 | per minute |
| Search/Archive | 10 | per minute |
| Heavy operations | 5 | per minute |

If rate limited, retry with exponential backoff (1s, 2s, 4s, 8s, ...).
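
The retry loop described above can be sketched in Python. The `fetch` callable is a placeholder for whatever API call you make; a real client would also honor any `Retry-After` header the server returns:

```python
import time

def with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call fetch(), retrying on rate-limit errors with 1s, 2s, 4s, ... delays."""
    for attempt in range(max_retries):
        status, body = fetch()
        if status != 429:                  # 429 = Too Many Requests
            return status, body
        time.sleep(base_delay * 2 ** attempt)
    return fetch()                         # final attempt, no further retries

# The delays grow geometrically: 1s, 2s, 4s, 8s, 16s
delays = [1.0 * 2 ** n for n in range(5)]
```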

## Storage Calculation

### Database Storage

- Avg log size: 512 bytes
- Monthly logs: 1M logs/day × 30 days = 30M logs
- Space needed: 30M × 512 bytes = ~15 GB/month
- Cost: ~$4.50/month (at $0.30/GB)

### S3 Storage (After Compression)

- Compressed size: ~70% reduction (512 → 150 bytes)
- Space needed: 30M × 150 bytes = ~4.5 GB/month
- Cost: ~$0.10/month (at $0.023/GB, S3 Standard)
- Cost: ~$0.02/month (at $0.004/GB, S3 Glacier)
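
The arithmetic above generalizes to a small helper. The per-GB prices are the illustrative figures from this section, not live AWS pricing, and the database figure comes out slightly higher than the rounded ~$4.50 because 30M × 512 bytes is 15.36 GB, not a flat 15:

```python
def monthly_storage_gb(logs_per_day, avg_bytes, days=30):
    """Projected monthly storage in decimal gigabytes."""
    return logs_per_day * days * avg_bytes / 1e9

def monthly_cost(gb, price_per_gb):
    return gb * price_per_gb

db_gb = monthly_storage_gb(1_000_000, 512)    # ~15.36 GB uncompressed
s3_gb = monthly_storage_gb(1_000_000, 150)    # ~4.5 GB after ~70% compression
db_cost = monthly_cost(db_gb, 0.30)           # database, at $0.30/GB
s3_cost = monthly_cost(s3_gb, 0.023)          # S3 Standard, at $0.023/GB
glacier_cost = monthly_cost(s3_gb, 0.004)     # S3 Glacier, at $0.004/GB
```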

## Common API Calls

### Get Logs (cURL)

```shell
curl -X POST https://api.nife.io/v1/k8s/logs/search \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_id": "prod-cluster",
    "namespace": "production",
    "level": "error",
    "limit": 100
  }'
```

### Create Archive Policy (cURL)

```shell
curl -X POST https://api.nife.io/v1/k8s/logs/archive/policies \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prod Archive",
    "retention_days": 90,
    "compression_type": "gzip",
    "is_active": true
  }'
```

### List Collections (cURL)

```shell
curl -X GET "https://api.nife.io/v1/k8s/logs/config?cluster_id=prod-cluster" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

## Field Mapping (UI → API)

| UI Field | API Parameter |
|---|---|
| Cluster ID | `cluster_id` |
| Namespace | `namespace` |
| Pod Name | `pod_name` |
| Container | `container_name` |
| Log Level | `log_level` |
| Start Time | `start_time` |
| End Time | `end_time` |
| Log Levels | `log_levels` |
| Collection Interval | `collection_interval` |
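
When building API payloads in a script, the mapping above can be applied mechanically. This helper and its `UI_TO_API` table simply restate the table and are not an official client:

```python
# UI label -> API parameter, restating the Field Mapping table above.
UI_TO_API = {
    "Cluster ID": "cluster_id",
    "Namespace": "namespace",
    "Pod Name": "pod_name",
    "Container": "container_name",
    "Log Level": "log_level",
    "Start Time": "start_time",
    "End Time": "end_time",
    "Log Levels": "log_levels",
    "Collection Interval": "collection_interval",
}

def to_api_payload(ui_fields):
    """Translate a dict keyed by UI labels into API parameter names."""
    return {UI_TO_API[label]: value for label, value in ui_fields.items()}
```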

## Retention vs Archive Timeline

```text
Day 1-30 (Default)
├─ Real-time in database
├─ Full query capability
└─ Highest cost

Day 31-90 (Archive to S3)
├─ Accessible but slower
├─ Limited query capability
└─ Low cost

Day 91+ (Glacier)
├─ Cold storage
├─ Hours to retrieve
└─ Minimal cost
```
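
For scripting against this lifecycle, the storage tier for a log of a given age can be looked up directly. The day boundaries below mirror the default 30/90-day policy shown above and would shift with your actual retention and archive settings:

```python
def storage_tier(age_days):
    """Map a log's age in days to its tier under the default 30/90-day policy."""
    if age_days <= 30:
        return "database"    # real-time, full query capability
    if age_days <= 90:
        return "s3"          # accessible but slower, limited queries
    return "glacier"         # cold storage, hours to retrieve
```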

## Collection Options Comparison

| Option | When to Use | Cost | Speed |
|---|---|---|---|
| Interval: 60s | High-priority production | High | Real-time |
| Interval: 300s | Standard production | Medium | Near real-time |
| Interval: 600s | Staging/Dev | Low | Slightly delayed |
| On-Demand | Debugging | Variable | On request |

## Status Indicators

| Indicator | Meaning | Action |
|---|---|---|
| 🟢 Healthy | All systems operational | None needed |
| 🟡 Degraded | Some components slow | Monitor closely |
| 🔴 Unhealthy | System not operational | Check immediately |

## Useful Metrics to Monitor

Production:

- Logs per hour > 50,000?
- Error logs trending up?
- Storage growth > 1 GB/day?

Staging:

- Collection jobs completing?
- Archive jobs succeeding?
- Any failed operations?
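
A monitoring script could evaluate the production thresholds above in one pass. The threshold values here just echo the checklist and should be tuned to your environment:

```python
def check_production_metrics(logs_per_hour, storage_growth_gb_per_day, error_trend_up):
    """Return a warning string for each production threshold that is exceeded."""
    warnings = []
    if logs_per_hour > 50_000:
        warnings.append("log volume above 50k/hour")
    if storage_growth_gb_per_day > 1.0:
        warnings.append("storage growing faster than 1 GB/day")
    if error_trend_up:
        warnings.append("error logs trending up")
    return warnings
```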

## Quick Wins for Cost Reduction

1. Archive older logs - saves ~90% (moves data to S3)
2. Enable compression - saves ~70% of storage
3. Increase the collection interval - reduces volume (300s → 600s)
4. Exclude system namespaces - reduces noise
5. Set shorter retention - reduces storage

## Before Contacting Support

- Verified cluster connectivity: `kubectl cluster-info`
- Checked system health status
- Confirmed collection configs are enabled
- Verified the date range has logs
- Confirmed LogQL syntax
- Checked that rate limits are not exceeded
- Exported logs for analysis
