Building Scalable Microservices with Flask and Gunicorn
Learn how to create high-performance microservices using Flask and Gunicorn
Introduction
Flask is a lightweight and flexible microframework, making it an excellent choice for building microservices. However, Flask alone is not optimized for handling multiple requests concurrently. This is where Gunicorn, a powerful WSGI HTTP server, comes in.
In this guide, we will explore how to scale Flask applications using Gunicorn, covering worker models, performance tuning, and best deployment practices.
Why Use Gunicorn with Flask?
Flask’s built-in development server is single-threaded and not suitable for production. Gunicorn provides:
- Multiple worker processes for handling concurrent requests
- Load balancing between worker processes
- Asynchronous execution support with gevent or eventlet
- Graceful restarts and process management
Setting Up Flask with Gunicorn
Installing Dependencies
Ensure you have Flask and Gunicorn installed:
pip install flask gunicorn
Creating a Simple Flask Microservice
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health_check():
    return jsonify({"status": "ok"}), 200

@app.route('/data')
def get_data():
    return jsonify({"message": "Microservice response"}), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
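Before putting Gunicorn in front of the app, you can sanity-check the endpoints with Flask's built-in test client, which calls the routes directly without starting an HTTP server. A quick sketch (the service above is recreated inline so the snippet stands alone):

```python
# Self-contained sanity check using Flask's test client (no server needed).
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health_check():
    return jsonify({"status": "ok"}), 200

# The test client issues requests straight to the WSGI app.
client = app.test_client()
resp = client.get('/health')
print(resp.status_code, resp.get_json())
```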
Running Flask with Gunicorn
Basic Command
Run the Flask app using Gunicorn with default settings:
gunicorn -w 4 -b 0.0.0.0:5000 app:app
- -w 4: Uses 4 worker processes
- -b 0.0.0.0:5000: Binds to all network interfaces on port 5000
- app:app: Refers to the app instance in app.py
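The same flags can also live in a configuration file, which Gunicorn loads with the -c / --config option. A minimal sketch of a gunicorn.conf.py equivalent to the command above:

```python
# gunicorn.conf.py -- equivalent to `gunicorn -w 4 -b 0.0.0.0:5000 app:app`
bind = "0.0.0.0:5000"  # interface and port to listen on
workers = 4            # number of worker processes
```

Run it with `gunicorn -c gunicorn.conf.py app:app`. Keeping settings in a file makes them easier to version-control than a long command line.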
Choosing the Right Worker Model
Gunicorn supports multiple worker models:
- Sync Workers: Default mode, handles requests one at a time.
- Thread Workers (gthread): Good for I/O-bound tasks.
- Async Workers (gevent or eventlet): Best for high concurrency.
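For example, thread workers are enabled by setting the worker class and a thread count per worker. A sketch of the relevant gunicorn.conf.py settings (the exact numbers are illustrative and depend on your workload):

```python
# gunicorn.conf.py -- thread workers for I/O-bound services
worker_class = "gthread"  # use Gunicorn's threaded worker
workers = 4               # worker processes
threads = 8               # threads per worker: 4 * 8 = 32 concurrent requests
```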
Using Gevent for Async Execution
gunicorn -w 4 -k gevent -b 0.0.0.0:5000 app:app
This improves performance for I/O-heavy workloads like database queries or external API calls.
Scaling with Load Balancers
In a production environment, running multiple instances behind a reverse proxy (NGINX) or Kubernetes enhances scalability.
Nginx Configuration
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
With this configuration, NGINX forwards requests to Gunicorn, which distributes them across its workers. Pointing an NGINX upstream block at several Gunicorn instances lets NGINX load balance between them as well, improving fault tolerance.
Performance Tuning
Optimizing Worker Count
A good rule of thumb:
workers = 2 * CPU_CORES + 1
For an 8-core server, set:
gunicorn -w 17 -b 0.0.0.0:5000 app:app
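The formula above can be computed directly from the machine's core count. A small sketch using only the standard library:

```python
import multiprocessing

def recommended_workers() -> int:
    """Gunicorn's commonly cited rule of thumb: (2 x CPU cores) + 1."""
    return 2 * multiprocessing.cpu_count() + 1

print(recommended_workers())  # e.g. 17 on an 8-core server
```

This value is a starting point, not a hard rule: memory-heavy apps may need fewer workers, and async workers change the math entirely.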
Enabling Auto-Restart
To automatically restart workers after a set number of requests:
gunicorn --workers 4 --max-requests 1000 --max-requests-jitter 50 app:app
This mitigates slow memory leaks by recycling workers periodically; the jitter staggers the restarts so all workers don't recycle at the same moment.
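The same options in config-file form, for reference (a sketch; tune the counts to your traffic):

```python
# gunicorn.conf.py -- recycle workers to contain slow memory leaks
workers = 4
max_requests = 1000       # restart a worker after this many requests
max_requests_jitter = 50  # randomize restarts so workers don't all recycle at once
```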
Deploying Flask + Gunicorn with Docker
Dockerfile
FROM python:3.9
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "app:app"]
Building and Running the Container
docker build -t flask-microservice .
docker run -p 5000:5000 flask-microservice
Conclusion
Building scalable microservices with Flask and Gunicorn requires:
✅ Using multiple workers for concurrency
✅ Selecting the right worker model for performance
✅ Leveraging NGINX or Kubernetes for scalability
✅ Optimizing Gunicorn configurations for stability
✅ Deploying with Docker for portability
With these best practices, your Flask microservices can handle thousands of concurrent requests efficiently. 🚀