# K8s Logs Quick Reference & Cheatsheet

A quick reference guide for common tasks and commands in Kubernetes Logs management.

## Dashboard Navigation

| Task | Location |
|---|---|
| View logs | Clusters → Select Cluster → Logs Tab |
| Search logs | Logs Tab → Enter filters → Click "Search Logs" |
| Configure collection | Collection Config Tab → Create/Edit configs |
| Set up archival | Archive Tab → Create Policy |
| Monitor metrics | Metrics Tab → View statistics |
| Check system health | Settings Tab → System Health section |
| View S3 archives | S3 Storage Tab → Search archives |
| Manage clusters | Clusters Tab → View/edit configs |

## Common Search Filters

| Filter | Purpose | Example |
|---|---|---|
| Namespace | Limit to a specific K8s namespace | `"production"` |
| Log Level | Filter by severity | `"error"` |
| Date Range | Specify a time period | Last 24 hours |
| Search Query | Full-text search in messages | `"connection timeout"` |
| Pod Name | Filter by a specific pod | `"api-server-1"` |

## LogQL Quick Reference

### Basic Queries

```logql
# All logs from a namespace
{namespace="production"}

# Error logs from a pod
{pod="api-server"} |= "error"

# Multiple namespaces (regex)
{namespace=~"prod.*"}

# Exclude a pattern
{namespace="prod"} != "debug"
```

### Operators

| Operator | Meaning | Example |
|---|---|---|
| `\|=` | Line contains string | `\|= "error"` |
| `!=` | Line does not contain string | `!= "debug"` |
| `\|~` | Line matches regex | `\|~ "err.*[0-9]"` |
| `!~` | Line does not match regex | `!~ "warn"` |
| `>` | Greater than (numeric) | `> 500` |
| `<` | Less than (numeric) | `< 100` |
| `>=` / `<=` | Greater/less than or equal | `>= 1000` |

### Label Filters

```logql
# Exact match
{app="api"}

# Label exists (non-empty)
{namespace!=""}

# Regex match
{pod=~"api.*"}

# Multiple labels (AND)
{namespace="prod", app="web"}
```

### Advanced Queries

```logql
# Combine filters with AND (chain filter stages)
{namespace="prod"} |= "error" |= "database"

# Combine with OR (use a regex alternation)
{namespace="prod"} |= "error" |~ "fatal|critical"

# Extract JSON fields and filter on them
{app="api"} | json | status >= 500 | line_format "{{.message}}"

# Extract labels with a regex (backticks avoid double escaping)
{namespace="prod"} | regexp `user=(?P<user>\w+)`

# Extract fields with a pattern parser
{namespace="prod"} | pattern "<_> <method> <path> <_> <status>"
```

## Collection Configuration Defaults

| Setting | Default | Recommended Range |
|---|---|---|
| Collection Interval | 300s (5 min) | 60-600 seconds |
| Max Logs Per Fetch | 1000 | 500-5000 |
| Retention Days | 30 | 7-365 |
| Log Levels | Info, Warn, Error | Depends on needs |
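
As a rough illustration, a collection config built from these defaults might look like the JSON below. The field names `cluster_id`, `namespace`, `collection_interval`, and `log_levels` follow the Field Mapping table later in this guide; `max_logs_per_fetch`, `retention_days`, and `is_active` are assumed names, not a guaranteed schema:

```json
{
  "cluster_id": "prod-cluster",
  "namespace": "production",
  "collection_interval": 300,
  "max_logs_per_fetch": 1000,
  "retention_days": 30,
  "log_levels": ["info", "warn", "error"],
  "is_active": true
}
```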

## Archive Policy Presets

### Development Environment

- Retention: 7 days
- Compression: Enabled
- Destination: S3

### Staging Environment

- Retention: 30 days
- Compression: Enabled
- Destination: S3

### Production Environment

- Retention: 90 days
- Compression: Enabled
- Destination: S3 Standard-IA

### Compliance (Long-term)

- Retention: 365+ days
- Compression: Enabled
- Destination: S3 Glacier Deep Archive

## Common Troubleshooting

### No Logs Appearing

Checklist:

- Is collection enabled for the namespace?
- Does the cluster show "healthy" status?
- Are there logs in the selected date range?
- Is at least one pod in the namespace generating logs?

Fix:

1. Go to the Collection Config Tab
2. Verify the configuration is Enabled
3. Check Settings Tab → System Health
4. Expand the date range and try again

### LogQL Query Errors

Problem: Syntax error at position X

Solutions:

```logql
# ✗ Wrong: missing quotes
{namespace=production}

# ✓ Correct: quotes required
{namespace="production"}

# ✗ Wrong: invalid operator
{namespace="prod" and app="api"}

# ✓ Correct: comma-separated label syntax
{namespace="prod", app="api"}
```

### High Storage Usage

Solutions:

1. Reduce retention days in the archive policy
2. Increase the collection interval (collect less frequently)
3. Reduce max logs per collection
4. Exclude high-volume namespaces

## Performance Tips

### Searching

Fast:

- Specific namespace
- Narrow date range (e.g. 24 hours)
- Log level filter

Slow:

- All namespaces
- 6+ month range
- No filters

### Collection

Efficient:

- Interval: 300-600 seconds
- Max logs: 1000-2000
- Specific namespaces only

Inefficient:

- Interval: 60 seconds
- Max logs: 5000+
- All namespaces, including system namespaces

## API Rate Limits

| Endpoint Type | Limit | Period |
|---|---|---|
| Standard calls | 100 | per minute |
| Search/Archive | 10 | per minute |
| Heavy operations | 5 | per minute |

If rate limited, retry with exponential backoff (1s, 2s, 4s, 8s, ...).
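
The retry loop described above can be sketched in Python. The `fetch` callable is a placeholder for whatever API call you make; a real client would also honor any `Retry-After` header the server returns:

```python
import time

def with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call fetch(), retrying on rate-limit errors with 1s, 2s, 4s, ... delays."""
    for attempt in range(max_retries):
        status, body = fetch()
        if status != 429:                  # 429 = Too Many Requests
            return status, body
        time.sleep(base_delay * 2 ** attempt)
    return fetch()                         # final attempt, no further retries

# The delays grow geometrically: 1s, 2s, 4s, 8s, 16s
delays = [1.0 * 2 ** n for n in range(5)]
```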

## Storage Calculation

### Database Storage

- Avg log size: 512 bytes
- Monthly logs: 1M logs/day × 30 days = 30M logs
- Space needed: 30M × 512 bytes = ~15 GB/month
- Cost: ~$4.50/month (at $0.30/GB)

### S3 Storage (After Compression)

- Compressed size: ~70% reduction (512 → 150 bytes)
- Space needed: 30M × 150 bytes = ~4.5 GB/month
- Cost: ~$0.10/month (at $0.023/GB, S3 Standard)
- Cost: ~$0.02/month (at $0.004/GB, S3 Glacier)
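
The arithmetic above generalizes to a small helper. The per-GB prices are the illustrative figures from this section, not live AWS pricing, and the database figure comes out slightly higher than the rounded ~$4.50 because 30M × 512 bytes is 15.36 GB, not a flat 15:

```python
def monthly_storage_gb(logs_per_day, avg_bytes, days=30):
    """Projected monthly storage in decimal gigabytes."""
    return logs_per_day * days * avg_bytes / 1e9

def monthly_cost(gb, price_per_gb):
    return gb * price_per_gb

db_gb = monthly_storage_gb(1_000_000, 512)    # ~15.36 GB uncompressed
s3_gb = monthly_storage_gb(1_000_000, 150)    # ~4.5 GB after ~70% compression
db_cost = monthly_cost(db_gb, 0.30)           # database, at $0.30/GB
s3_cost = monthly_cost(s3_gb, 0.023)          # S3 Standard, at $0.023/GB
glacier_cost = monthly_cost(s3_gb, 0.004)     # S3 Glacier, at $0.004/GB
```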

## Common API Calls

### Get Logs (cURL)

```shell
curl -X POST https://api.nife.io/v1/k8s/logs/search \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_id": "prod-cluster",
    "namespace": "production",
    "level": "error",
    "limit": 100
  }'
```

### Create Archive Policy (cURL)

```shell
curl -X POST https://api.nife.io/v1/k8s/logs/archive/policies \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prod Archive",
    "retention_days": 90,
    "compression_type": "gzip",
    "is_active": true
  }'
```

### List Collections (cURL)

```shell
curl -X GET "https://api.nife.io/v1/k8s/logs/config?cluster_id=prod-cluster" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

## Field Mapping (UI → API)

| UI Field | API Parameter |
|---|---|
| Cluster ID | `cluster_id` |
| Namespace | `namespace` |
| Pod Name | `pod_name` |
| Container | `container_name` |
| Log Level | `log_level` |
| Start Time | `start_time` |
| End Time | `end_time` |
| Log Levels | `log_levels` |
| Collection Interval | `collection_interval` |
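
When building API payloads in a script, the mapping above can be applied mechanically. This helper and its `UI_TO_API` table simply restate the table and are not an official client:

```python
# UI label -> API parameter, restating the Field Mapping table above.
UI_TO_API = {
    "Cluster ID": "cluster_id",
    "Namespace": "namespace",
    "Pod Name": "pod_name",
    "Container": "container_name",
    "Log Level": "log_level",
    "Start Time": "start_time",
    "End Time": "end_time",
    "Log Levels": "log_levels",
    "Collection Interval": "collection_interval",
}

def to_api_payload(ui_fields):
    """Translate a dict keyed by UI labels into API parameter names."""
    return {UI_TO_API[label]: value for label, value in ui_fields.items()}
```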

## Retention vs Archive Timeline

```text
Day 1-30 (Default)
├─ Real-time in database
├─ Full query capability
└─ Highest cost

Day 31-90 (Archive to S3)
├─ Accessible but slower
├─ Limited query capability
└─ Low cost

Day 91+ (Glacier)
├─ Cold storage
├─ Hours to retrieve
└─ Minimal cost
```
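
For scripting against this lifecycle, the storage tier for a log of a given age can be looked up directly. The day boundaries below mirror the default 30/90-day policy shown above and would shift with your actual retention and archive settings:

```python
def storage_tier(age_days):
    """Map a log's age in days to its tier under the default 30/90-day policy."""
    if age_days <= 30:
        return "database"    # real-time, full query capability
    if age_days <= 90:
        return "s3"          # accessible but slower, limited queries
    return "glacier"         # cold storage, hours to retrieve
```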

## Collection Options Comparison

| Option | When to Use | Cost | Speed |
|---|---|---|---|
| Interval: 60s | High-priority production | High | Real-time |
| Interval: 300s | Standard production | Medium | Near real-time |
| Interval: 600s | Staging/Dev | Low | Slightly delayed |
| On-Demand | Debugging | Variable | On request |

## Status Indicators

| Indicator | Meaning | Action |
|---|---|---|
| 🟢 Healthy | All systems operational | None needed |
| 🟡 Degraded | Some components slow | Monitor closely |
| 🔴 Unhealthy | System not operational | Check immediately |

## Useful Metrics to Monitor

Production:

- Logs per hour > 50,000?
- Error logs trending up?
- Storage growth > 1 GB/day?

Staging:

- Collection jobs completing?
- Archive jobs succeeding?
- Any failed operations?
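
A monitoring script could evaluate the production thresholds above in one pass. The threshold values here just echo the checklist and should be tuned to your environment:

```python
def check_production_metrics(logs_per_hour, storage_growth_gb_per_day, error_trend_up):
    """Return a warning string for each production threshold that is exceeded."""
    warnings = []
    if logs_per_hour > 50_000:
        warnings.append("log volume above 50k/hour")
    if storage_growth_gb_per_day > 1.0:
        warnings.append("storage growing faster than 1 GB/day")
    if error_trend_up:
        warnings.append("error logs trending up")
    return warnings
```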

## Quick Wins for Cost Reduction

1. Archive older logs - saves ~90% (moves data to S3)
2. Enable compression - saves ~70% of storage
3. Increase the collection interval - reduces volume (300s → 600s)
4. Exclude system namespaces - reduces noise
5. Set shorter retention - reduces storage

## Before Contacting Support

- Verified cluster connectivity: `kubectl cluster-info`
- Checked system health status
- Confirmed collection configs are enabled
- Verified the date range has logs
- Confirmed LogQL syntax
- Checked that rate limits are not exceeded
- Exported logs for analysis
