Scaling Applications

Scale your applications to handle growing demand and improve availability.

Scaling Overview

What is Scaling?

Scaling adjusts your application's capacity to handle traffic and load.

Types of Scaling

  • Horizontal: Add more replicas/instances
  • Vertical: Increase CPU/memory per instance
  • Regional: Deploy to additional regions
  • Auto: Automatic based on load

Horizontal Scaling (Replicas)

Understanding Replicas

What are Replicas?

  • Multiple copies of your application
  • Each runs independently
  • Load balancer distributes traffic
  • Provides redundancy
  • Improves availability

Benefits

  • Handle more traffic
  • Distribute load
  • Tolerate failures
  • Rolling updates
  • No downtime deployments

Scaling Replicas

Increase Replica Count

  1. Open application details
  2. Find replica configuration
  3. Increase replica count
  4. New instances start
  5. Load balancer includes them

Scaling Process

  • Takes several minutes
  • Instances start one by one
  • Health checks run
  • Traffic gradually added
  • Monitoring updates

Decrease Replica Count

  1. Open application details
  2. Reduce replica count
  3. Instances gracefully shut down
  4. In-flight requests complete
  5. Load shifts to the remaining replicas

Warning: Decreasing the replica count reduces capacity and may impact availability during traffic spikes.
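
Steps 3 and 4 of the scale-down procedure depend on each instance finishing its in-flight requests before it exits. The sketch below shows one way an instance could track that, in Python. The `DrainableReplica` class and its method names are hypothetical and are not part of any platform API.

```python
import threading


class DrainableReplica:
    """Hypothetical helper: refuse new requests while draining, then wait for idle."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = 0
        self._draining = False
        self._idle = threading.Event()
        self._idle.set()  # no requests in flight yet

    def try_begin_request(self) -> bool:
        """Register a new request; returns False once draining has started."""
        with self._lock:
            if self._draining:
                return False
            self._in_flight += 1
            self._idle.clear()
            return True

    def end_request(self) -> None:
        """Mark one request as finished."""
        with self._lock:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.set()

    def drain(self, timeout: float) -> bool:
        """Stop accepting requests and wait up to `timeout` seconds for idle."""
        with self._lock:
            self._draining = True
        return self._idle.wait(timeout)
```

A caller would invoke `drain()` when the shutdown signal arrives and only exit once it returns `True` or the timeout expires.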

Replica Best Practices

  1. Minimum 2 Replicas: For high availability
  2. Match Traffic: Scale for expected load
  3. Monitor Metrics: Watch CPU/memory
  4. Test Scaling: Try before production
  5. Plan Gradual: Scale slowly
  6. Monitor Cost: More replicas = more cost

Vertical Scaling (Resources)

Understanding Resource Scaling

CPU Allocation

  • Virtual CPUs per instance
  • Affects performance
  • Higher cost
  • Better processing power

Memory Allocation

  • RAM per instance
  • Determines capacity
  • Affects cost
  • Performance impact

Scaling Resources

Increase CPU

  1. Open application settings
  2. Select higher CPU tier
  3. Restart application
  4. New CPU allocated
  5. Performance improves

Increase Memory

  1. Open application settings
  2. Select higher memory
  3. Restart application
  4. New memory available
  5. Can handle more data

Scaling Impact

  • Requires restart
  • Brief downtime
  • New instances created
  • Old instances terminated
  • May take 5-10 minutes

Resource Best Practices

  1. Monitor Metrics: Watch usage
  2. Right-size: Don't over-allocate
  3. Test First: In staging
  4. Plan Gradual: Increase slowly
  5. Watch Costs: Higher = more expense
  6. Review Regularly: Optimize allocation

Regional Scaling

Multi-Region Deployment

Why Deploy to Multiple Regions?

  • Reduced latency for users
  • Geographic redundancy
  • High availability
  • Disaster recovery
  • Compliance requirements
  • Performance improvement

Available Regions

  • US East (N. Virginia)
  • US West (Oregon)
  • EU West (Ireland)
  • EU Central (Frankfurt)
  • Asia Pacific (Multiple)
  • Other regions

Deploying to Additional Regions

Add Region

  1. Open application
  2. Find region configuration
  3. Select new region
  4. Deploy application
  5. Monitor deployment

Deployment Process

  • Takes 5-15 minutes
  • Containers pulled
  • Health checks run
  • Traffic gradually routed
  • Monitoring configured

Remove Region

  1. Open application
  2. Find region settings
  3. Select region to remove
  4. Confirm removal
  5. Traffic rerouted

Regional Load Balancing

Traffic Distribution

  • Load balancer directs traffic
  • Based on location
  • Latency optimization
  • Failover handling
  • Geographic routing

Failover

  • If a region fails, traffic is automatically rerouted to a healthy region
  • Minimal disruption
  • Transparent to users
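
The routing and failover behavior described above can be sketched as a single selection rule: send each user to the lowest-latency region that is currently healthy. This is an illustration in Python, not the platform's actual routing logic. The `pick_region` function and the latency figures are hypothetical.

```python
def pick_region(latency_ms: dict[str, float], healthy: set[str]) -> str:
    """Return the healthy region with the lowest latency for this user."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies measured from one user in Europe.
latencies = {"US East": 95.0, "EU West": 20.0, "EU Central": 28.0}

normal = pick_region(latencies, {"US East", "EU West", "EU Central"})
# normal == "EU West"

# Failover: EU West is unhealthy, so the next-closest healthy region is used.
failover = pick_region(latencies, {"US East", "EU Central"})
# failover == "EU Central"
```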

Auto-Scaling

Understanding Auto-Scaling

What is Auto-Scaling?

  • Automatically adjust replicas
  • Based on metrics
  • Scale up under load
  • Scale down when quiet
  • Cost efficient

Metrics Monitored

  • CPU utilization
  • Memory usage
  • Request rate
  • Custom metrics

Configuring Auto-Scaling

Set Auto-Scaling Limits

  1. Open application settings
  2. Find auto-scaling configuration
  3. Set minimum replicas
  4. Set maximum replicas
  5. Configure metrics/thresholds

Configuration Parameters

  • Min replicas: Minimum always running
  • Max replicas: Maximum allowed
  • Scale-up threshold: When to add
  • Scale-down threshold: When to remove
  • Cooldown period: Wait between scales
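
The parameters above can be captured in a small validated structure. This is a minimal sketch in Python. The `AutoScalingConfig` class, its field names, and the sample values are hypothetical and do not correspond to any platform API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoScalingConfig:
    """Hypothetical container for the auto-scaling parameters."""
    min_replicas: int            # minimum always running
    max_replicas: int            # maximum allowed
    scale_up_threshold: float    # metric percentage above which a replica is added
    scale_down_threshold: float  # metric percentage below which a replica is removed
    cooldown_seconds: int        # minimum wait between two scaling actions

    def __post_init__(self) -> None:
        if self.min_replicas < 1:
            raise ValueError("min_replicas must be at least 1")
        if self.max_replicas < self.min_replicas:
            raise ValueError("max_replicas must be >= min_replicas")
        if not 0 <= self.scale_down_threshold < self.scale_up_threshold <= 100:
            raise ValueError("thresholds must satisfy 0 <= down < up <= 100")
        if self.cooldown_seconds < 0:
            raise ValueError("cooldown_seconds must not be negative")


config = AutoScalingConfig(
    min_replicas=2,
    max_replicas=10,
    scale_up_threshold=70.0,
    scale_down_threshold=30.0,
    cooldown_seconds=300,
)
```

Keeping the scale-down threshold strictly below the scale-up threshold leaves a band in which no action is taken, which helps avoid rapid add/remove cycles.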

Auto-Scaling Behavior

Scale Up

  • When metric exceeds threshold
  • Creates new replica
  • Adds to load balancer
  • Handles increased load
  • Cost increases

Scale Down

  • When metric drops below threshold
  • Removes replica
  • Scales down gradually
  • Reduces cost
  • Maintains minimum

Example Thresholds

  • CPU > 70%: Add replica
  • CPU < 30%: Remove replica
  • Wait 5 minutes between scales
  • Keep minimum 2 replicas
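
The example thresholds can be read as a single decision rule. The sketch below, in Python, applies them once to a current replica count. The function name `desired_replicas` and the `max_replicas=10` default are assumptions made for illustration; the other defaults come from the list above.

```python
def desired_replicas(
    current: int,
    cpu_percent: float,
    seconds_since_last_scale: float,
    min_replicas: int = 2,
    max_replicas: int = 10,  # assumed upper limit, not stated in the example
    scale_up_at: float = 70.0,
    scale_down_at: float = 30.0,
    cooldown_seconds: float = 300.0,
) -> int:
    """Return the replica count after applying the example thresholds once."""
    # Respect the cooldown: no change until 5 minutes have passed.
    if seconds_since_last_scale < cooldown_seconds:
        return current
    if cpu_percent > scale_up_at:
        return min(current + 1, max_replicas)
    if cpu_percent < scale_down_at:
        return max(current - 1, min_replicas)
    return current


print(desired_replicas(3, 85.0, 600))  # 4: CPU above 70%, cooldown elapsed
print(desired_replicas(3, 85.0, 120))  # 3: still inside the 5-minute cooldown
print(desired_replicas(2, 10.0, 600))  # 2: CPU below 30%, but minimum is 2
```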

Monitoring Scaling

Metrics to Watch

During Scaling

  • Deployment status
  • Instance startup
  • Health check results
  • Traffic distribution
  • Performance metrics

After Scaling

  • Response times
  • Error rates
  • Resource utilization
  • Cost impact
  • User experience

Scaling Alerts

Set Alerts For

  • Scale-up events
  • Scale-down events
  • Failed scaling
  • Unhealthy instances
  • Resource limits

Cost Implications

Understanding Scaling Costs

Replica Costs

  • Each replica = compute cost
  • Per month billing
  • More replicas = more cost
  • But handles more traffic

Resource Costs

  • Higher CPU = higher cost
  • More memory = higher cost
  • Different regions may cost differently
  • Premium instances cost more

Optimization

  • Right-size resources
  • Use auto-scaling
  • Scale down off-peak
  • Choose efficient regions
  • Monitor spending

Load Balancing

How Load Balancing Works

Traffic Distribution

  • Requests distributed across replicas
  • Even distribution
  • Health checks confirm replicas are healthy
  • Failed replicas removed
  • Transparent to users

Load Balancer Types

  • Round-robin
  • Least connections
  • Resource based
  • Geographic based
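
The first two strategies are simple enough to sketch in a few lines of Python. The function names and replica names below are hypothetical; a real load balancer tracks connection counts itself.

```python
from itertools import cycle


def round_robin(replicas):
    """Return an endless iterator that visits replicas in a fixed rotating order."""
    return cycle(replicas)


def least_connections(active):
    """Pick the replica with the fewest active connections.

    `active` maps a replica name to its current connection count.
    """
    return min(active, key=active.get)


rotation = round_robin(["replica-a", "replica-b", "replica-c"])
first_four = [next(rotation) for _ in range(4)]
# first_four == ["replica-a", "replica-b", "replica-c", "replica-a"]

chosen = least_connections({"replica-a": 12, "replica-b": 3, "replica-c": 7})
# chosen == "replica-b"
```

Round-robin ignores how busy each replica is, so least connections tends to suit workloads where request durations vary widely.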

Session Affinity

Sticky Sessions

  • User stays on same replica
  • Session persistence
  • For stateful apps
  • Configure if needed
  • May impact load distribution
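
One way to keep a user on the same replica is to hash a session identifier onto the replica list. The Python sketch below illustrates that idea only. The function name and identifiers are hypothetical, and cookie-based affinity is another common approach.

```python
import hashlib


def replica_for_session(session_id: str, replicas: list[str]) -> str:
    """Map a session ID to one replica; stable while the replica list is unchanged."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["replica-a", "replica-b", "replica-c"]
first = replica_for_session("user-42", replicas)
second = replica_for_session("user-42", replicas)
# first == second: the same session always lands on the same replica
```

With this simple modulo scheme, adding or removing a replica remaps many sessions, and a few heavy sessions can concentrate load on one replica. Both are instances of the load-distribution impact noted above.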

Deployment During Scaling

Rolling Updates

Zero-Downtime Updates

  • Scale up to extra replicas
  • Update old instances
  • Gradually shift traffic
  • Remove old replicas
  • No service interruption

Process

  1. Create new replicas with new version
  2. Health checks
  3. Gradually route traffic
  4. Remove old replicas
  5. Deployment complete
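
The process above can be simulated with a short Python sketch. It swaps replicas one at a time and halts if a new replica fails its health check. The function name, the replica names, and the one-at-a-time batch size are assumptions for illustration, not the platform's actual rollout behavior.

```python
def rolling_update(old, new, is_healthy):
    """Swap old replicas for new ones one at a time.

    Returns (serving, completed). The rollout halts at the first new replica
    that fails its health check, leaving the current mix serving traffic.
    """
    serving = list(old)
    for old_replica, new_replica in zip(old, new):
        if not is_healthy(new_replica):      # step 2: health check
            return serving, False
        serving.append(new_replica)          # steps 1 and 3: start it, route traffic
        serving.remove(old_replica)          # step 4: retire one old replica
    return serving, True


serving, completed = rolling_update(
    ["v1-a", "v1-b"], ["v2-a", "v2-b"], lambda replica: True
)
# serving == ["v2-a", "v2-b"], completed is True
```

Because each new replica is added before an old one is removed, the number of serving replicas never drops below the original count during the rollout.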

Scaling with Updates

When Scaling and Updating

  1. Update image
  2. Scale to new version
  3. New replicas created
  4. Old replicas shut down
  5. Minimal disruption

Best Practices

  1. Start Conservative: Begin with 2 replicas
  2. Monitor Metrics: Watch performance
  3. Auto-Scale: For variable load
  4. Test First: In staging environment
  5. Plan Ahead: Know peak loads
  6. Cost Aware: Monitor spending
  7. Redundancy: Always have backup
  8. Document: Record scaling decisions

Scaling Strategies

Strategy by Application Type

High Traffic APIs

  • Start: 3-5 replicas
  • Scale: Auto-scale on CPU
  • Max: 10-20 replicas
  • Regional: Global deployment

Microservices

  • Per service: 2-3 minimum
  • Auto-scale each service
  • Different max for each
  • Regional: As needed

Static Sites

  • No scaling needed
  • CDN handles load
  • Single replica sufficient
  • Global edge locations

Databases

  • Read replicas for scaling
  • Sharding for large data
  • Regional replication
  • No replica-based horizontal scaling for writes

Troubleshooting Scaling

Issues and Solutions

Scaling Stuck

  1. Check resource limits
  2. Verify permissions
  3. Review logs
  4. Try manual scaling
  5. Contact support

Not Scaling When Expected

  1. Check thresholds
  2. Verify metrics
  3. Check cooldown period
  4. Ensure auto-scaling enabled
  5. Monitor for time lag

Performance Not Improving

  1. Verify new replicas healthy
  2. Check load distribution
  3. Monitor actual metrics
  4. May need vertical scaling
  5. Consider regional scaling

Scaling Strategies Overview

Three Core Scaling Approaches:

  1. Horizontal Scaling (Replicas)
     • Add more instances of your application
     • Load balanced across replicas
     • Simple to implement
     • Improves availability
     • Best for stateless services
  2. Vertical Scaling (Resources)
     • Increase CPU and memory per instance
     • Better performance per instance
     • Simpler management
     • Higher cost per instance
     • Best for compute-intensive workloads
  3. Regional Scaling (Geographic Distribution)
     • Deploy to multiple regions
     • Reduces latency for users
     • Improves disaster recovery
     • Higher cost due to replication
     • Best for global applications

Scaling Comparison Matrix

Approach      Complexity  Implementation Time  Cost Impact  Latency   Availability  Auto-Scale Support
Horizontal    Low         Minutes              Medium       None      Improves      Yes
Vertical      Low         Minutes              Medium       None      Same          Limited
Regional      Medium      5-15 min             High         Improves  Improves      No
Auto-Scaling  Medium      Minutes              Variable     None      Improves      Yes

Scaling Decision Framework

Ask These Questions:

  1. What is your current capacity?
  2. What is expected peak load?
  3. What is acceptable latency?
  4. What is your budget?
  5. Do you need global coverage?
  6. Is your app stateless?
  7. Can you tolerate brief downtime?

Real-World Scaling Scenarios

Scenario 1: Growing SaaS Application

  • Current: 2 replicas in US East
  • Peak load: 10x baseline
  • Solution: Auto-scaling to 10-20 replicas
  • Regional: Add EU and Asia for global users
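
The 10-20 replica range in this scenario can be checked with simple arithmetic. The Python sketch below assumes load spreads evenly and capacity grows linearly with replica count. The function name and the CPU utilization figures are hypothetical inputs, not values from the scenario.

```python
import math


def replicas_for_peak(
    current_replicas: int,
    load_multiplier: int,
    baseline_cpu_percent: int,
    target_cpu_percent: int,
) -> int:
    """Estimate replicas needed at peak to stay at the target CPU utilization."""
    needed = (
        current_replicas * load_multiplier * baseline_cpu_percent / target_cpu_percent
    )
    return math.ceil(needed)


# 2 replicas at 35% CPU today, 10x peak, aiming for 70% CPU at peak.
print(replicas_for_peak(2, 10, 35, 70))  # 10

# 2 replicas already at 70% CPU today, same 10x peak and target.
print(replicas_for_peak(2, 10, 70, 70))  # 20
```

Under these assumptions, the lower end of the range corresponds to replicas that have spare headroom at baseline, and the upper end to replicas that are already at the target utilization.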

Scenario 2: High-Performance Computing

  • Current: Small instances
  • Bottleneck: CPU-bound processing
  • Solution: Vertical scale to larger instances
  • No need for additional replicas

Scenario 3: Global Content Platform

  • Current: Single region
  • Solution: Multi-region deployment
  • Add regional caching
  • CDN for static content

Scaling Best Practices Checklist

  • [ ] Monitor current metrics before scaling
  • [ ] Set realistic minimum and maximum limits
  • [ ] Test scaling in staging first
  • [ ] Plan for cost implications
  • [ ] Document scaling decisions
  • [ ] Set up monitoring and alerts
  • [ ] Have rollback plan ready
  • [ ] Notify team of scaling operations
  • [ ] Review performance after scaling
  • [ ] Adjust strategy based on results

Summary

Successful application scaling depends on:

  • Understanding your growth trajectory
  • Choosing appropriate scaling strategy
  • Monitoring and adjusting continuously
  • Balancing performance and costs
  • Planning for high availability
  • Testing before production changes

Next Steps