Introduction

Flask is a lightweight and flexible microframework, making it an excellent choice for building microservices. However, Flask alone is not optimized for handling multiple requests concurrently. This is where Gunicorn, a powerful WSGI HTTP server, comes in.

In this guide, we will explore how to scale Flask applications using Gunicorn, covering worker models, performance tuning, and best deployment practices.

Why Use Gunicorn with Flask?

Flask’s built-in development server is single-threaded and not suitable for production. Gunicorn provides:

  • Multiple worker processes for handling concurrent requests
  • Load balancing between worker processes
  • Asynchronous execution support with gevent or eventlet
  • Graceful restarts and process management

Setting Up Flask with Gunicorn

Installing Dependencies

Ensure you have Flask and Gunicorn installed:

pip install flask gunicorn

Creating a Simple Flask Microservice

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health_check():
return jsonify({"status": "ok"}), 200

@app.route('/data')
def get_data():
return jsonify({"message": "Microservice response"}), 200

if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000)

Running Flask with Gunicorn

Basic Command

Run the Flask app using Gunicorn with default settings:

gunicorn -w 4 -b 0.0.0.0:5000 app:app
  • -w 4: Uses 4 worker processes
  • -b 0.0.0.0:5000: Binds to all network interfaces on port 5000
  • app:app: Refers to the app instance in app.py

Choosing the Right Worker Model

Gunicorn supports multiple worker models:

  1. Sync Workers: Default mode, handles requests one at a time.
  2. Thread Workers (gthread): Good for I/O-bound tasks.
  3. Async Workers (gevent or eventlet): Best for high concurrency.

Using Gevent for Async Execution

gunicorn -w 4 -k gevent -b 0.0.0.0:5000 app:app

This improves performance for I/O-heavy workloads like database queries or external API calls.

Scaling with Load Balancers

In a production environment, running multiple instances behind a reverse proxy (NGINX) or Kubernetes enhances scalability.

Nginx Configuration

server {
listen 80;
location / {
proxy_pass http://127.0.0.1:5000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}

NGINX load balances requests between Gunicorn workers, improving fault tolerance.

Performance Tuning

Optimizing Worker Count

A good rule of thumb:

workers = 2 * CPU_CORES + 1

For an 8-core server, set:

gunicorn -w 17 -b 0.0.0.0:5000 app:app

Enabling Auto-Restart

To automatically restart workers after a set number of requests:

gunicorn --workers 4 --max-requests 1000 --max-requests-jitter 50 app:app

This prevents memory leaks by restarting workers periodically.

Deploying Flask + Gunicorn with Docker

Dockerfile

FROM python:3.9

WORKDIR /app
COPY . .

RUN pip install --no-cache-dir -r requirements.txt

CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "app:app"]

Building and Running the Container

docker build -t flask-microservice .
docker run -p 5000:5000 flask-microservice

Conclusion

Building scalable microservices with Flask and Gunicorn requires:

✅ Using multiple workers for concurrency
✅ Selecting the right worker model for performance
✅ Leveraging NGINX or Kubernetes for scalability
✅ Optimizing Gunicorn configurations for stability
✅ Deploying with Docker for portability

With these best practices, your Flask microservices can handle thousands of concurrent requests efficiently. 🚀