
K8s Logs Quick Reference

A quick reference guide for common tasks and commands in Kubernetes Logs management.

Dashboard Navigation

| Task | Location |
|---|---|
| View logs | Clusters → Select Cluster → Logs Tab |
| Search logs | Logs Tab → Enter filters → Click "Search Logs" |
| Configure collection | Collection Config Tab → Create/Edit configs |
| Set up archival | Archive Tab → Create Policy |
| Monitor metrics | Metrics Tab → View statistics |
| Check system health | Settings Tab → System Health section |
| View S3 archives | S3 Storage Tab → Search archives |
| Manage clusters | Clusters Tab → View/edit configs |

Common Search Filters

| Filter | Purpose | Example |
|---|---|---|
| Namespace | Limit to a specific K8s namespace | `"production"` |
| Log Level | Filter by severity | `"error"` |
| Date Range | Specify time period | Last 24 hours |
| Search Query | Full-text search in messages | `"connection timeout"` |
| Pod Name | Filter by a specific pod | `"api-server-1"` |

LogQL Quick Reference

Basic Queries

```logql
# All logs from namespace
{namespace="production"}

# Error logs from pod
{pod="api-server"} |= "error"

# Multiple namespaces
{namespace=~"prod.*"}

# Exclude pattern
{namespace="prod"} != "debug"
```

Operators

| Operator | Meaning | Example |
|---|---|---|
| `\|=` | Line contains string | `\|= "error"` |
| `!=` | Line does not contain string | `!= "debug"` |
| `\|~` | Line matches regex | `\|~ "err.*[0-9]"` |
| `!~` | Line does not match regex | `!~ "warn"` |
| `>` | Greater than (numeric) | `> 500` |
| `<` | Less than (numeric) | `< 100` |
| `>=` `<=` | Greater/less than or equal | `>= 1000` |

Label Filters

```logql
# Exact match
{app="api"}

# Label exists (non-empty)
{namespace!=""}

# Regex match
{pod=~"api.*"}

# Multiple labels
{namespace="prod", app="web"}
```
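Label selectors are easy to build programmatically. A minimal Python sketch (the `build_selector` helper is hypothetical, not part of any client library) that always quotes values, since unquoted label values are a LogQL syntax error:

```python
def build_selector(labels: dict[str, str], regex_keys: set[str] = frozenset()) -> str:
    """Build a LogQL stream selector from a label dict.

    Keys listed in regex_keys use the =~ (regex) matcher; every value
    is double-quoted, matching LogQL's required syntax.
    """
    parts = []
    for key, value in labels.items():
        op = "=~" if key in regex_keys else "="
        parts.append(f'{key}{op}"{value}"')
    return "{" + ", ".join(parts) + "}"

print(build_selector({"namespace": "prod", "app": "web"}))
# {namespace="prod", app="web"}
print(build_selector({"pod": "api.*"}, regex_keys={"pod"}))
# {pod=~"api.*"}
```

Dict insertion order is preserved (Python 3.7+), so the labels appear in the order you pass them.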

Advanced Queries

```logql
# Combine filters with AND
{namespace="prod"} |= "error" |= "database"

# Combine with OR (regex alternation)
{namespace="prod"} |= "error" |~ "fatal|critical"

# Extract JSON fields
{app="api"} | json | status >= 500 | line_format "{{.message}}"

# Parse labels with regexp (backticks avoid double-escaping)
{namespace="prod"} | regexp `user=(?P<user>\w+)`

# Parse structured fields with pattern
{namespace="prod"} | pattern "<_> <method> <path> <_> <status>"

# Count occurrences over time
count_over_time({namespace="prod"} |= "error" [5m])
```

Collection Configuration Defaults

| Setting | Default | Recommended Range |
|---|---|---|
| Collection Interval | 300s (5 min) | 60–600 seconds |
| Max Logs Per Fetch | 1000 | 500–5000 |
| Retention Days | 30 | 7–365 |
| Log Levels | Info, Warn, Error | Depends on needs |
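The recommended ranges above can be enforced before saving a config. A small sketch (the `validate_config` helper and snake_case field names are illustrative, not part of the product API):

```python
# Recommended ranges from the defaults table; purely illustrative names.
RANGES = {
    "collection_interval": (60, 600),   # seconds
    "max_logs_per_fetch": (500, 5000),
    "retention_days": (7, 365),
}

def validate_config(config: dict) -> list[str]:
    """Return warnings for any value outside its recommended range."""
    warnings = []
    for field, (lo, hi) in RANGES.items():
        value = config.get(field)
        if value is not None and not lo <= value <= hi:
            warnings.append(f"{field}={value} outside recommended {lo}-{hi}")
    return warnings

print(validate_config({"collection_interval": 300, "max_logs_per_fetch": 1000,
                       "retention_days": 30}))
# []  (the defaults are all in range)
print(validate_config({"collection_interval": 10}))
# ['collection_interval=10 outside recommended 60-600']
```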

Archive Policy Presets

Development Environment

Retention: 7 days
Compression: Enabled
Destination: S3

Staging Environment

Retention: 30 days
Compression: Enabled
Destination: S3

Production Environment

Retention: 90 days
Compression: Enabled
Destination: S3 Standard-IA

Compliance (Long-term)

Retention: 365+ days
Compression: Enabled
Destination: S3 Glacier Deep Archive

Common Troubleshooting

No Logs Appearing

Checklist:

  • Is collection enabled for the namespace?
  • Does the cluster show "healthy" status?
  • Are there logs in the selected date range?
  • Is at least one pod in the namespace generating logs?

Fix:

  1. Go to Collection Config Tab
  2. Verify the configuration is Enabled
  3. Check Settings Tab > System Health
  4. Expand date range and try again

LogQL Query Errors

Problem: Syntax error at position X

Solutions:

```logql
# ✗ Wrong - missing quotes
{namespace=production}

# ✓ Correct - quotes required
{namespace="production"}

# ✗ Wrong - invalid operator
{namespace="prod" and app="api"}

# ✓ Correct - use comma-separated label syntax
{namespace="prod", app="api"}
```
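The missing-quotes mistake is easy to catch before submitting a query. A rough pre-flight lint in Python (a heuristic regex, not a full LogQL parser, and the function name is made up for illustration):

```python
import re

# Matches a label matcher whose value does not start with a double quote.
UNQUOTED = re.compile(r'\{[^}]*?\b\w+\s*(=~|!~|!=|=)\s*[^"\s,}]')

def has_unquoted_value(query: str) -> bool:
    """True if a stream selector appears to contain an unquoted label value."""
    return bool(UNQUOTED.search(query))

print(has_unquoted_value('{namespace=production}'))    # True
print(has_unquoted_value('{namespace="production"}'))  # False
```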

High Storage Usage

Solutions:

  1. Reduce retention days in Archive policy
  2. Increase collection interval (collect less frequently)
  3. Reduce max logs per collection
  4. Exclude high-volume namespaces

Performance Tips

Searching

Fast:
- Specific namespace
- Narrow date range (24 hours)
- Log level filter

Slow:
- All namespaces
- 6+ month range
- No filters

Collection

Efficient:
- Interval: 300-600 seconds
- Max logs: 1000-2000
- Specific namespaces only

Inefficient:
- Interval: 60 seconds
- Max logs: 5000+
- All namespaces including system

API Rate Limits

| Endpoint Type | Limit | Period |
|---|---|---|
| Standard calls | 100 | per minute |
| Search/Archive | 10 | per minute |
| Heavy operations | 5 | per minute |

If rate limited: Exponential backoff (1s, 2s, 4s, 8s...)
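The backoff pattern can be sketched in a few lines of Python (the `RateLimitError` exception is a placeholder for whatever your HTTP client raises on HTTP 429):

```python
import time

class RateLimitError(Exception):
    """Placeholder for your HTTP client's 429 error."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a zero-argument callable with exponential backoff (1s, 2s, 4s, 8s...)."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise                      # give up after the final attempt
            time.sleep(base_delay * 2 ** attempt)
```

Usage: `with_backoff(lambda: client.search_logs(...))`, where `client` is whatever wrapper you use around the API.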

Storage Calculation

Database Storage

Avg log size: 512 bytes
Monthly logs: 1M logs/day × 30 days = 30M logs
Space needed: 30M × 512 bytes = ~15 GB/month
Cost: ~$4.50/month (at $0.30/GB)

S3 Storage (After Compression)

Compressed size: ~70% reduction (512 → 150 bytes)
Space needed: 30M × 150 bytes = ~4.5 GB/month
Cost: ~$0.10/month (at $0.023/GB S3 Standard)
Cost: ~$0.02/month (at $0.004/GB S3 Glacier)
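The arithmetic above, worked as code (decimal GB, i.e. 10^9 bytes; the unrounded database figures come out at 15.4 GB and $4.61, slightly above the rounded ~15 GB / ~$4.50 estimate):

```python
def monthly_storage(logs_per_day: int, avg_bytes: int, price_per_gb: float):
    """Return (GB per month, $ per month) for a 30-day month."""
    gb = logs_per_day * 30 * avg_bytes / 1e9
    return gb, gb * price_per_gb

# Database: 1M logs/day at 512 bytes, $0.30/GB
gb, cost = monthly_storage(1_000_000, 512, 0.30)
print(f"{gb:.1f} GB, ${cost:.2f}/month")   # 15.4 GB, $4.61/month

# S3 after ~70% compression: 150 bytes/log, $0.023/GB
gb, cost = monthly_storage(1_000_000, 150, 0.023)
print(f"{gb:.1f} GB, ${cost:.2f}/month")   # 4.5 GB, $0.10/month
```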

Common API Calls

Get Logs (cURL)

```bash
curl -X POST https://api.nife.io/v1/k8s/logs/search \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_id": "prod-cluster",
    "namespace": "production",
    "level": "error",
    "limit": 100
  }'
```

Create Archive Policy (cURL)

```bash
curl -X POST https://api.nife.io/v1/k8s/logs/archive/policies \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prod Archive",
    "retention_days": 90,
    "compression_type": "gzip",
    "is_active": true
  }'
```

List Collections (cURL)

```bash
curl -X GET "https://api.nife.io/v1/k8s/logs/config?cluster_id=prod-cluster" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
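The same search call can be prepared from Python. A sketch that only builds the request, so you can inspect it before sending with any HTTP client (the `build_search_request` helper is hypothetical; the URL and payload fields mirror the cURL example above):

```python
import json

API_BASE = "https://api.nife.io/v1/k8s/logs"

def build_search_request(token: str, **filters):
    """Return (url, headers, body) for the log-search endpoint.

    Send with your client of choice, e.g.:
        requests.post(url, headers=headers, data=body)
    """
    url = f"{API_BASE}/search"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(filters)

url, headers, body = build_search_request(
    "YOUR_TOKEN", cluster_id="prod-cluster", namespace="production",
    level="error", limit=100,
)
print(url)  # https://api.nife.io/v1/k8s/logs/search
```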

Field Mapping (UI → API)

| UI Field | API Parameter |
|---|---|
| Cluster ID | `cluster_id` |
| Namespace | `namespace` |
| Pod Name | `pod_name` |
| Container | `container_name` |
| Log Level | `log_level` |
| Start Time | `start_time` |
| End Time | `end_time` |
| Log Levels | `log_levels` |
| Collection Interval | `collection_interval` |

Retention vs Archive Timeline

```text
Day 1-30 (Default)
├─ Real-time in database
├─ Full query capability
└─ Highest cost

Day 31-90 (Archive to S3)
├─ Accessible but slower
├─ Limited query capability
└─ Low cost

Day 91+ (Glacier)
├─ Cold storage
├─ Hours to retrieve
└─ Minimal cost
```

Collection Options Comparison

| Option | When to Use | Cost | Speed |
|---|---|---|---|
| Interval: 60s | High-priority production | High | Real-time |
| Interval: 300s | Standard production | Medium | Near real-time |
| Interval: 600s | Staging/Dev | Low | Slightly delayed |
| On-Demand | Debugging | Variable | On request |

Status Indicators

| Indicator | Meaning | Action |
|---|---|---|
| 🟢 Healthy | All systems operational | None needed |
| 🟡 Degraded | Some components slow | Monitor closely |
| 🔴 Unhealthy | System not operational | Check immediately |

Useful Metrics to Monitor

Production:
- Logs per hour > 50,000?
- Error logs trending up?
- Storage growth > 1GB/day?

Staging:
- Collection jobs completing?
- Archive jobs succeeding?
- Any failed operations?
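The production thresholds above can be checked in a simple alerting hook. A sketch (the function name and inputs are illustrative; wire it to whatever metrics source you actually have):

```python
def metric_warnings(logs_per_hour: int, storage_growth_gb_per_day: float,
                    error_trend_up: bool) -> list[str]:
    """Flag the production thresholds from the checklist above."""
    warnings = []
    if logs_per_hour > 50_000:
        warnings.append("log volume above 50k/hour")
    if error_trend_up:
        warnings.append("error logs trending up")
    if storage_growth_gb_per_day > 1.0:
        warnings.append("storage growing faster than 1 GB/day")
    return warnings

print(metric_warnings(60_000, 0.4, False))
# ['log volume above 50k/hour']
```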

Quick Wins for Cost Reduction

  1. Archive older logs - Saves ~90% (moves to S3)
  2. Enable compression - Saves ~70% of storage
  3. Increase collection interval - Reduces volume (300s → 600s)
  4. Exclude system namespaces - Reduces noise
  5. Set shorter retention - Reduces storage

Before Contacting Support

  • Verified cluster connectivity: kubectl cluster-info
  • Checked system health status
  • Confirmed collection configs are enabled
  • Verified date range has logs
  • Confirmed LogQL syntax
  • Checked rate limits not exceeded
  • Exported logs for analysis

Help & Support