High-Performance File Upload and Management in Flask
Optimize file upload, storage, and retrieval in Flask for high-performance applications
Introduction
Handling large file uploads efficiently in Flask can be challenging: a naive implementation risks high memory consumption, slow responses, and even server crashes.
In this guide, we’ll explore high-performance techniques for file uploads, storage, and retrieval in Flask, including:
- Streaming file uploads to avoid memory overload
- Asynchronous file handling for better scalability
- Chunked file uploads for large files
- Integrating with cloud storage solutions
By the end, you’ll be able to optimize file handling in Flask APIs for speed and efficiency. 🚀
Setting Up Flask for File Uploads
First, install Flask and required dependencies:
pip install flask werkzeug
Then, set up a basic Flask app:
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
import os
app = Flask(__name__)
UPLOAD_FOLDER = "uploads"
app.config["UPLOAD_FOLDER"] = UPLOAD_FOLDER
if not os.path.exists(UPLOAD_FOLDER):
    os.makedirs(UPLOAD_FOLDER)
This initializes Flask and creates an uploads directory if it doesn’t exist.
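It is also worth capping request size up front. Flask's built-in MAX_CONTENT_LENGTH setting rejects oversized bodies with a 413 error before your handler even runs; the 100 MB limit below is just an example value:
app.config["MAX_CONTENT_LENGTH"] = 100 * 1024 * 1024  # reject bodies over 100 MB with a 413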
Efficient File Upload Handling
1. Basic File Upload
A simple file upload endpoint:
@app.route('/upload', methods=['POST'])
def upload_file():
    if "file" not in request.files:
        return jsonify({"message": "No file uploaded"}), 400
    file = request.files["file"]
    if file.filename == "":
        return jsonify({"message": "No selected file"}), 400
    filename = secure_filename(file.filename)
    file.save(os.path.join(app.config["UPLOAD_FOLDER"], filename))
    return jsonify({"message": f"File {filename} uploaded successfully"}), 200
However, by the time request.files is populated, Werkzeug has already buffered the entire upload (in memory for small files, in a temporary file for larger ones), and file.save() then copies it again, so large uploads are effectively written twice.
2. Streaming File Uploads (Avoid Memory Overload)
To keep memory usage flat while writing the file out, copy the upload stream in fixed-size chunks:
@app.route('/stream-upload', methods=['POST'])
def stream_upload():
    file = request.files.get("file")
    if not file:
        return jsonify({"message": "No file uploaded"}), 400
    filename = secure_filename(file.filename)
    file_path = os.path.join(app.config["UPLOAD_FOLDER"], filename)
    with open(file_path, "wb") as f:
        # Read fixed-size chunks; iterating the stream directly would split
        # on newlines, which is meaningless for binary data.
        while chunk := file.stream.read(8192):
            f.write(chunk)
    return jsonify({"message": f"File {filename} uploaded successfully"}), 200
This writes the file in chunks, preventing excessive memory usage.
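You can exercise the endpoint with any multipart-capable client; for example, with curl against a local dev server (the URL and filename are placeholders):
curl -F "file=@large-video.mp4" http://localhost:5000/stream-upload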
3. Chunked File Upload (Handling Large Files)
For very large files, split the file into smaller parts and upload each part in its own request:
Client-Side Upload (Split File into Chunks)
def split_file(file_path, chunk_size=5 * 1024 * 1024):  # 5 MB per chunk
    with open(file_path, "rb") as f:
        chunk_number = 0
        while chunk := f.read(chunk_size):
            with open(f"{file_path}.part{chunk_number}", "wb") as chunk_file:
                chunk_file.write(chunk)
            chunk_number += 1
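split_file only writes the .part files; the client still needs to send them. Below is a minimal sketch using the requests library; the URL, the field names, and deriving file_id from the file name are assumptions chosen to match the endpoint that follows:
import os
import requests

def upload_chunks(file_path, url="http://localhost:5000/chunk-upload"):
    file_id = os.path.basename(file_path)  # assumed ID scheme for this sketch
    chunk_number = 0
    # POST each .part file produced by split_file as its own request
    while os.path.exists(f"{file_path}.part{chunk_number}"):
        with open(f"{file_path}.part{chunk_number}", "rb") as chunk_file:
            response = requests.post(
                url,
                files={"chunk": chunk_file},
                data={"index": chunk_number, "file_id": file_id},
            )
        response.raise_for_status()  # retry logic for failed chunks would go here
        chunk_number += 1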
Server-Side API to Accept Chunks
@app.route('/chunk-upload', methods=['POST'])
def chunk_upload():
    chunk = request.files.get("chunk")
    chunk_index = request.form.get("index")
    file_id = request.form.get("file_id")
    if not chunk or chunk_index is None or not file_id:
        return jsonify({"message": "Missing parameters"}), 400
    # Sanitize client-supplied values so they cannot escape the upload folder
    if not chunk_index.isdigit():
        return jsonify({"message": "Invalid chunk index"}), 400
    file_id = secure_filename(file_id)
    chunk_path = os.path.join(app.config["UPLOAD_FOLDER"], f"{file_id}.part{chunk_index}")
    chunk.save(chunk_path)
    return jsonify({"message": f"Chunk {chunk_index} received"}), 200
Merging Chunks on the Server
def merge_chunks(file_id, total_chunks):
    file_path = os.path.join(app.config["UPLOAD_FOLDER"], file_id)
    with open(file_path, "wb") as final_file:
        for i in range(int(total_chunks)):
            chunk_path = f"{file_path}.part{i}"
            with open(chunk_path, "rb") as chunk_file:
                final_file.write(chunk_file.read())
            os.remove(chunk_path)  # delete each chunk once it has been merged
    return file_path
Because a failed transfer only costs one chunk rather than the whole file, this approach limits the impact of network failures and makes resumable uploads possible.
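The server also needs to know when all chunks have arrived. A minimal completion endpoint is sketched below; the route name and the total_chunks form field are assumptions, not part of the code above:
@app.route('/complete-upload', methods=['POST'])
def complete_upload():
    file_id = request.form.get("file_id")
    total_chunks = request.form.get("total_chunks")
    if not file_id or not total_chunks:
        return jsonify({"message": "Missing parameters"}), 400
    # Reassemble the chunks saved by /chunk-upload
    file_path = merge_chunks(secure_filename(file_id), total_chunks)
    return jsonify({"message": f"File assembled at {file_path}"}), 200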
Asynchronous File Upload Processing
Processing an uploaded file inside the request handler blocks a Flask worker for the duration. Instead, save the file quickly and hand post-processing to Celery:
pip install celery redis
Configure Celery with Redis:
from celery import Celery

def make_celery(app):
    celery = Celery(
        app.import_name,
        backend="redis://localhost:6379/0",
        broker="redis://localhost:6379/0",
    )
    celery.conf.update(app.config)
    return celery

celery = make_celery(app)
Define an asynchronous file processing task:
@celery.task
def process_uploaded_file(file_path):
    # Simulate slow file processing
    import time
    time.sleep(5)
    return f"Processed file {file_path}"
Trigger async processing after upload:
@app.route('/async-upload', methods=['POST'])
def async_upload():
    file = request.files.get("file")
    if not file:
        return jsonify({"message": "No file uploaded"}), 400
    filename = secure_filename(file.filename)
    file_path = os.path.join(app.config["UPLOAD_FOLDER"], filename)
    file.save(file_path)
    task = process_uploaded_file.delay(file_path)
    return jsonify({"message": f"File {filename} uploaded", "task_id": task.id}), 200
This prevents API slowdowns by delegating file processing to Celery.
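Note that .delay() only queues the task; nothing executes until a worker process is running. Assuming the module above is saved as app.py:
celery -A app.celery worker --loglevel=info
Since the endpoint returns a task_id, clients can poll for the outcome. A minimal status route (the URL scheme is an assumption):
@app.route('/task-status/<task_id>', methods=['GET'])
def task_status(task_id):
    # AsyncResult reads the task's state from the Redis result backend
    result = process_uploaded_file.AsyncResult(task_id)
    return jsonify({
        "state": result.state,
        "result": result.result if result.successful() else None,
    }), 200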
Integrating Cloud Storage (AWS S3)
For scalable file storage, integrate AWS S3:
pip install boto3
Configure AWS credentials. Hardcoded keys are shown here only for brevity; in production, prefer environment variables, a shared credentials file, or an IAM role, all of which boto3 picks up automatically:
import boto3

# For illustration only; never commit real keys to source control.
s3 = boto3.client("s3", aws_access_key_id="YOUR_KEY", aws_secret_access_key="YOUR_SECRET")
@app.route('/upload-s3', methods=['POST'])
def upload_s3():
    file = request.files.get("file")
    if not file:
        return jsonify({"message": "No file uploaded"}), 400
    filename = secure_filename(file.filename)
    s3.upload_fileobj(file, "your-bucket-name", filename)
    return jsonify({"message": f"File {filename} uploaded to S3"}), 200
This offloads storage and retrieval to the cloud.
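For retrieval, rather than proxying every download through Flask, a common pattern is to hand clients a short-lived presigned URL. A sketch using boto3's generate_presigned_url (the bucket name and one-hour expiry are placeholder choices):
@app.route('/download-s3/<filename>', methods=['GET'])
def download_s3(filename):
    # Generate a URL granting temporary read access to the object
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "your-bucket-name", "Key": secure_filename(filename)},
        ExpiresIn=3600,  # valid for one hour
    )
    return jsonify({"download_url": url}), 200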
Conclusion
Optimizing file uploads in Flask involves:
- Streaming uploads to reduce memory usage
- Chunked uploads for handling large files
- Asynchronous processing to improve performance
- Cloud storage integration for scalability
With these techniques, your Flask app can handle high-performance file uploads efficiently! 🚀