n8n is great (we all know it), but here's where things get interesting and where most people hit a wall. When you install n8n the standard way, you get a single instance that's honestly quite capable for many scenarios. There's a fundamental bottleneck, though, that becomes painfully apparent when you start scaling up. A standard n8n installation runs everything in one instance, and once you hit around 10 simultaneous workflow executions, sometimes even fewer, performance starts to degrade noticeably. You'll see slower processing times, instability under heavy load, and that frustrating feeling of watching your automation system become the very bottleneck you were trying to eliminate.

iii. The Auto-Scaling Solution

This is where our custom auto-scaling build comes in, and it's genuinely a game-changer. What we're building here isn't just a slightly improved n8n—it's a completely reimagined architecture designed specifically for users who need to run hundreds of simultaneous workflows. We're talking about a high-performance, resilient, distributed system that can handle enterprise-level scaling requirements without breaking a sweat. The beauty of this solution is how it dynamically adjusts its resources to meet demand, whether you're dealing with sudden bursts of activity or sustained high-volume processing.

The architecture we're implementing extends n8n by leveraging some seriously robust, production-ready technologies. We use Docker containers to ensure consistent and isolated service deployment—each component runs in its own little universe. Redis acts as our high-speed queue management and caching layer, handling job distribution with remarkable efficiency. PostgreSQL provides the persistent, reliable data storage we need for a production system. And orchestrating all of this is a custom Python autoscaler script that intelligently manages the workload, watching the queue and making smart decisions about when to scale up or down.

b. Overview of the Community Build

i. Acknowledging the Creator and Source

I want to give credit where it's due here. This powerful auto-scaling architecture wasn't created by some faceless corporation—it was developed and is actively maintained by a community member known as GitHub user conor-is-my-name and Reddit user u/conor_is_my_name. You can find the complete source code, configuration files, and documentation at:

https://github.com/conor-is-my-name/n8n-autoscaling

If you're interested in consulting work, the creator is based in San Francisco and prefers a retainer-based arrangement. All kudos goes to him!

ii. Key Features and Benefits

Once you complete this installation process, you'll have something truly special—a fully operational, enterprise-grade n8n environment. Let me walk you through what you're actually getting:

Dynamic Autoscaling is the star of the show. Workers automatically scale up and down based on the actual job queue depth in Redis. This means you always have exactly the processing power you need at any given moment. When a massive burst of activity hits, the system responds immediately without any manual intervention.

High Scalability means we're talking about handling hundreds of simultaneous workflow executions. The performance limits are primarily determined by your underlying server hardware—we've tested this successfully on an 8-core, 16GB RAM VPS handling massive loads.

The Queue Mode Architecture is pre-configured for n8n's queue mode, which distributes workloads across three distinct container types: a main UI/API container that handles the interface and coordination, a dedicated webhook container that ingests incoming triggers, and multiple worker containers that actually execute the workflows. This separation ensures maximum efficiency and responsiveness.

Automated HTTPS with Caddy manages your public access through a modern, secure web server. What's brilliant about Caddy is that it automatically provisions and renews TLS (SSL) certificates for your domains—both n8n.<your-domain> and webhook.<your-domain>—ensuring all traffic between your users and the server is properly encrypted.

You can enable Optional UI Protection through Cloudflare Access (part of their Zero Trust suite), which adds an extra authentication layer in front of the UI while keeping production webhooks fully functional and accessible.

Robust Data Persistence comes from using production-grade PostgreSQL for reliable data storage and Redis for high-speed queue management. This ensures your data remains safe and accessible even through container restarts or system updates.

The Integrated Tooling in the custom n8n Docker image is particularly impressive. We've pre-installed essential tools for advanced workflows:

  • Chromium + Puppeteer for browser automation, web scraping, and PDF generation
  • FFmpeg for complex media processing tasks
  • GraphicsMagick for image manipulation
  • Git + OpenSSH client for repository interactions and remote server operations

The Intelligent Autoscaler is a dedicated Python service that continuously monitors the Redis queue and programmatically scales the n8n-worker service up and down. It issues docker compose commands when the queue length crosses your defined thresholds, making smart decisions about resource allocation.

Built-in Monitoring includes a simple Redis queue monitor that provides real-time visibility into the current queue length by printing it to its logs. This helps tremendously with diagnostics and tuning.

Easy Updates are possible because the entire stack is managed via Docker Compose. Updates and maintenance become straightforward operations using standard Docker commands.

iii. Security by Default

Security wasn't an afterthought here—it was baked in from the beginning. Public traffic routes through Cloudflare's network, giving you DNS-level protection including DDoS mitigation and the ability to hide your server's origin IP address from casual observers when you enable the proxy feature. On your server, Caddy acts as a secure reverse proxy, terminating TLS connections and routing traffic internally. This layered approach ensures your communication is encrypted while benefiting from Cloudflare's robust infrastructure.

iv. Target Audience and Cost

What I love about this build is its accessibility. It works equally well for beginners running simple personal automations and for experts or businesses requiring enterprise-level scalability. The solution itself is completely free to use. Your only costs are purchasing a domain name—typically around $10 per year from registrars like Namecheap, GoDaddy, or Hostinger—and optionally renting a Virtual Private Server (VPS) for hosting, which you can get for a modest monthly fee.

c. Document Scope and Methodology

i. A Definitive, All-in-One Resource

This guide represents an exhaustive, all-in-one resource for deploying and managing the n8n auto-scaling architecture. I've synthesized information from every available source—the official GitHub repository files including README.md, docker-compose.yml, Dockerfile, autoscaler.py, and more, extensive community discussions on Reddit, and a deep analysis of the underlying source code. By consolidating all user queries, reported issues, resolutions, and technical insights into this single document, we're providing a clear, comprehensive path to successful deployment.

2. Architectural Deep Dive: System Internals Explained

a. High-Level System Diagram and Data Flow

Before we dive into installation, it's crucial to understand exactly what you're building. This isn't your standard n8n installation—it's a distributed system engineered for high performance and resilience. Each component plays a specific, vital role in the overall architecture.

i. Visual Architecture Diagram

Let me show you how data flows through the system, from an external user or service on the internet, through the Cloudflare DNS layer, and into the interconnected services within your Docker environment, all managed by the Caddy reverse proxy:

graph TD
    subgraph "Public Internet"
        A[User/Client] -->|HTTPS| B(Cloudflare DNS Proxy);
        C[External Service] -->|HTTPS Webhook| B;
    end

    subgraph "Your Server / VPS"
        B -->|Encrypted Traffic| D[Caddy Reverse Proxy Service];

        subgraph "Internal Routing"
            D -- "n8n.yourdomain.com" --> E[n8n Main Service:5678];
            D -- "webhook.yourdomain.com" --> F[n8n Webhook Service:5678];
        end
        
        subgraph "Execution & Scaling Core"
            E -->|Queues Jobs| G[Redis Service];
            F -->|Queues Jobs| G;
            G -->|Jobs Pulled by| H(("n8n Worker Service(s)"));
        end
        
        subgraph "Autoscaling Logic"
            I[Autoscaler Service] -->|Monitors Queue Length| G;
            I -->|Issues 'docker compose' Commands| J[Docker Socket];
            J -->|Scales Up/Down| H;
        end

        subgraph "Data Persistence"
            E -->|Stores Workflows/Credentials| K[PostgreSQL Service];
            H -->|Reads/Writes Execution Data| K;
        end

        subgraph "Monitoring"
          L[Redis Monitor Service] -->|Reads Queue Length| G;
        end
    end

ii. Text-Based Graph Representation

Here's a simpler view that illustrates the core interactions between the main application components:

graph TD
    A[n8n Main] -->|Queues jobs| B[Redis]
    B -->|Monitors queue| C[Autoscaler]
    C -->|Scales| D[n8n Workers]
    B -->|Monitors queue| E[Redis Monitor]
    F[PostgreSQL] -->|Stores data| A
    A -->|Webhooks| G[n8n Webhook]

b. Detailed Component Breakdown

i. Ingress and Security Layer

(1) Caddy (caddy)

Think of Caddy as your secure gateway to the internet. It's a powerful, modern reverse proxy that runs inside a Docker container on your server. Its primary job is receiving incoming traffic from the internet on ports 80 and 443. What makes Caddy special is how it automatically handles all aspects of TLS encryption—it obtains and renews SSL certificates from Let's Encrypt for your domains without you having to lift a finger. When a request comes in (say, for n8n.yourdomain.com), Caddy knows exactly where to route it within your Docker network. This replaces the need for more complex reverse proxies or tunnel services, making your life significantly easier.

ii. n8n Application Services

(1) n8n Main (n8n service)

This is the core n8n application—think of it as the "main" or "master" node of your system. It serves the user interface you interact with, manages your workflows and credentials, and handles API requests. In our queue-based architecture (we're using EXECUTIONS_MODE=queue), when a workflow gets triggered—whether manually or via a webhook—this service doesn't actually execute the workflow itself. Instead, it places the execution task onto the Redis queue for a worker to pick up. This separation is key to our scalability. The service operates on internal port 5678 for the main application and 5679 for the task broker, writing user data to a persistent volume at /n8n/main.

(2) n8n Webhook (n8n-webhook service)

This specialized n8n process is dedicated solely to ingesting incoming webhooks. Why separate this out? Well, by having a dedicated process for webhook ingestion, your main UI remains fast and responsive even when you're getting hammered with incoming webhook calls. Like the main service, it quickly receives webhook data and places it onto the Redis queue for processing. It's started with the specific launch command /webhook and writes its data to its own persistent volume.

(3) n8n Worker (n8n-worker service)

These are your workhorses—the components that actually do the heavy lifting. They're headless n8n instances with one job: pull tasks from the Redis queue and execute them. This is where your workflow logic actually runs. The system can run multiple worker containers in parallel, which is the secret sauce for handling high concurrency. Critically, the n8n-worker service is what the autoscaler scales up and down in response to workload changes. It starts with a minimum number of replicas and includes a graceful shutdown timeout to allow long-running tasks to complete before termination.

iii. State and Queue Management

(1) Redis (redis service)

Redis is essentially the central nervous system of our execution pipeline. It acts as a high-speed message broker using the BullMQ library's queueing logic. When the main n8n service receives a job, it pushes it to a list in Redis—specifically, a key named bull:jobs:wait. Workers constantly watch this list and pull the next available job to execute. Redis's in-memory speed is absolutely essential for the low-latency job distribution we need.
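
If you ever want to peek at this queue yourself rather than through the monitor container, a few lines of Python with the redis client will do it. This is a minimal sketch, not part of the build: it assumes Redis is reachable from wherever you run it (inside the compose network, or with the port published to the host) and that the queue uses the default bull:jobs prefix described above:

import os
import redis

# Connect to the Redis service; host/password values are assumptions about your setup.
r = redis.Redis(
    host=os.getenv("REDIS_HOST", "localhost"),
    port=int(os.getenv("REDIS_PORT", "6379")),
    password=os.getenv("REDIS_PASSWORD") or None,
)

# BullMQ keeps waiting jobs in a list; the key name differs between library versions.
for key in ("bull:jobs:wait", "bull:jobs:waiting"):
    print(key, "->", r.llen(key), "jobs waiting")

This is the same check the redis-monitor container and the redis-cli LLEN commands shown later in this guide perform.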

(2) PostgreSQL (postgres service)

PostgreSQL serves as the system's long-term memory. It stores all your critical data—workflows, credentials, execution history and logs, user settings. Using a robust database like PostgreSQL (we're using version 17) instead of the default SQLite is non-negotiable for any scalable, production-grade deployment. The build configures PostgreSQL to use secure SCRAM (Salted Challenge Response Authentication Mechanism) for authentication, because security matters.

iv. Scaling and Monitoring Logic

(1) Autoscaler (n8n-autoscaler service)

This is the custom-built "brain" of our scaling mechanism. It's a Python script (autoscaler.py) that runs in its own container, following a continuous control loop that's actually quite elegant:

First, it periodically connects to the Redis service. Then it checks the length of the job queue—and here's where it gets smart. To ensure compatibility with different versions of the BullMQ library, it intelligently checks for jobs under both bull:jobs:wait and bull:jobs:waiting keys.

Next, it compares this queue length to the SCALE_UP_QUEUE_THRESHOLD and SCALE_DOWN_QUEUE_THRESHOLD you've defined in your configuration. It queries the Docker daemon via the mounted Docker socket (/var/run/docker.sock) to determine how many n8n-worker containers are currently running.

If the queue is long enough and we're below the configured maximum workers, it issues a docker compose ... --scale n8n-worker=N+1 command to start a new worker. Conversely, if the queue is short enough and we're above the minimum, it scales down. It respects a COOLDOWN_PERIOD_SECONDS to prevent rapid, unnecessary scaling actions—what we call "flapping" in the industry.

(2) Redis Monitor (redis-monitor service)

This simple utility container has one job: print the current Redis queue length to its logs at regular intervals. It's purely for your observational and debugging purposes. You can easily see the live status of the job queue by running docker logs redis-monitor.

v. Networking and Volumes

(1) Internal Network (n8n-network)

The docker-compose.yml file defines an internal bridge network named n8n-network. All services in the stack attach to this network, allowing them to communicate using their service names as hostnames. For example, the main n8n service can reach the database simply by referencing postgres.

(2) External Network (shark)

The setup includes creating an "external" Docker network named shark. This allows you to easily connect other Docker Compose projects to this n8n stack in the future without complex network configurations. Services like PostgreSQL attach to it, allowing potential access from other containers you might deploy later.

(3) Persistent Volumes

The docker-compose.yml defines several named volumes—postgres_data, redis_data, n8n_main, and n8n_webhook. These ensure your critical data persists even if containers are removed or recreated, separating the application's state from its runtime environment.

3. Prerequisites: Gathering Your Resources

Let me be clear: failure to meet these prerequisites is the most common source of installation problems. Verify each one carefully before you begin—it'll save you hours of troubleshooting later.

a. Hardware and Server Requirements

i. Server or Local Machine

(1) Recommended VPS Specifications

For best performance, I recommend using a Virtual Private Server (VPS) or cloud VM. Here's what you'll need:

At minimum, a machine with at least 2 CPU cores and 4GB of RAM is a good starting point for basic use. But if you're serious about scale, and I assume you are if you're reading this, go for the recommended hardware: a more powerful server with 8 cores and 16GB of RAM to handle high-scale operations with hundreds of concurrent executions.

As for VPS providers, reputable options include Netcup (the author uses a Root VPS RS 2000 for around €5/month), Hetzner, DigitalOcean, Vultr, Linode, and Google Cloud Platform (GCP).

(2) Local Machine and Storage

You can absolutely run this entire stack on a powerful local computer running Windows, macOS, or Linux. Regardless of where you run it, ensure you have at least 10GB of free storage available for the Docker images and persistent data volumes.

ii. ARM Architecture Warning

(1) Compatibility Issues for Raspberry Pi and Apple Silicon

CRITICAL: This is important—the provided Dockerfile is built for the x86/amd64 CPU architecture common in most servers and desktop PCs. If you're using an ARM-based machine like a Raspberry Pi (including the Pi 5) or an Apple M-series Mac (M1/M2/M3), you're going to encounter significant issues. Package dependencies, particularly for the pre-installed chromium browser required by Puppeteer, may not be available or require different installation procedures. You'll likely need to find ARM-compatible versions of these packages and manually modify the Dockerfile to successfully build the n8n image.

b. Software and Tools

i. Core Dependencies

(1) Operating System

A Linux distribution like Ubuntu is my recommendation for server deployments. That said, the stack works perfectly well with Windows and macOS, primarily for local development.

(2) Docker and Docker Compose

You absolutely must have Docker version 20+ and Docker Compose v2+ installed. Modern Docker installations include Compose as a plugin, invoked with docker compose (two words, no hyphen).

On a fresh Linux server, you can install Docker quickly using the official convenience script:

curl -fsSL https://get.docker.com -o get-docker.sh && sh get-docker.sh

On Windows or macOS, install Docker Desktop from https://www.docker.com/products/docker-desktop.

Always verify your installation with:

docker --version
docker compose version

(3) Git

Git is required to clone the project repository from GitHub. On Ubuntu, installation is straightforward:

sudo apt install git

ii. Recommended Tooling

(1) Visual Studio Code with Remote - SSH

If you're a beginner or working on a remote VPS, I highly recommend using Visual Studio Code with the Remote - SSH extension. This setup lets you connect directly to your server and edit configuration files visually, as if they were on your local machine. It's a game-changer for productivity.

c. Accounts and Services

i. Essential Accounts

(1) Domain Name

You must own a domain name (like yourdomain.com). This setup won't work with IP addresses alone because it relies on DNS and requires valid hostnames for Caddy to automatically issue SSL certificates. You can purchase a domain from any registrar—Namecheap, GoDaddy, or Hostinger—for approximately $10 per year.

(2) Cloudflare Account

A free Cloudflare account (available at https://dash.cloudflare.com/signup) is required. You'll need to add your domain to Cloudflare and point your domain's nameservers to Cloudflare's servers. This allows Cloudflare to manage your DNS records, which you'll then point to your server's IP address.

d. Pre-Start Security Considerations

i. Password and Key Generation

Before you begin configuration, prepare to generate several strong, unique passwords and secret keys. The N8N_ENCRYPTION_KEY, in particular, must be a random string of exactly 32 characters. You can use an online tool like https://passwordsgenerator.net for this purpose.
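
If you'd rather not paste secrets into a website, a couple of lines of Python will generate them locally. This is just a convenience sketch using the standard library; secrets.token_hex(16) produces the 32-character hex string the encryption key expects:

import secrets

# 16 random bytes rendered as 32 hex characters - suitable for N8N_ENCRYPTION_KEY
print("N8N_ENCRYPTION_KEY:", secrets.token_hex(16))

# Longer URL-safe tokens work well for the JWT secret and auth token
print("N8N_USER_MANAGEMENT_JWT_SECRET:", secrets.token_urlsafe(32))
print("N8N_RUNNERS_AUTH_TOKEN:", secrets.token_urlsafe(32))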

ii. Secure Ingress Strategy

This architecture's security model is built around exposing your server to the internet only on standard web ports—80 for HTTP and 443 for HTTPS. These ports are managed by the Caddy reverse proxy. For this to work, you must open these two ports in your server's or cloud provider's firewall. All other services, including the database, run on a private internal Docker network and aren't exposed to the public internet. This significantly reduces your server's attack surface.

4. End-to-End Installation and Configuration Guide

Follow these steps precisely. Don't skip any—each one is there for a reason.

a. Step 1: Environment Preparation

i. Connect to Server and Install Prerequisites

First, connect to your server using SSH or open a terminal on your local machine. Ensure that docker, docker compose, and git are installed and functioning correctly by running their respective --version commands.

CRITICAL: You must open ports 80 and 443 on your server's firewall. Caddy needs these ports to obtain SSL certificates and serve your n8n instance over HTTPS. If you're using ufw on Ubuntu, run these commands (allow SSH first so enabling the firewall doesn't lock you out of your server):

sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status

If you're using a cloud provider like AWS, GCP, or Hetzner, you must also configure the firewall/security group rules in their web console to allow inbound traffic on TCP ports 80 and 443.

ii. Clone Repository and Create Network

(1) Getting the Source Code

Navigate to a suitable directory where you want to store the project—/opt/ or your home directory work well—and clone the repository:

git clone https://github.com/conor-is-my-name/n8n-autoscaling

Next, change into the newly created project directory:

cd n8n-autoscaling

(2) Creating the External Docker Network

Create the external shark network. This network allows for easy interoperability with other Docker projects you might deploy in the future:

docker network create shark

b. Step 2: Cloudflare DNS Configuration

i. Pointing Your Domains to Your Server

Log in to your Cloudflare account. If you haven't already, add your domain to Cloudflare and follow the instructions to update its nameservers at your registrar. This change can take anywhere from 5-30 minutes to several hours to propagate—patience is key here.

From the Cloudflare dashboard for your domain, navigate to the DNS section. You need to create two records that point to your server's public IP address.

First, find your server's public IP address. You can usually find this in your VPS provider's dashboard or by running curl ifconfig.me on your server.

Create the DNS record for the n8n UI:

  • Click Add record
  • Type: A (if you have an IPv4 address) or AAAA (for IPv6)
  • Name: n8n (Cloudflare will automatically append your domain)
  • Content/IPv4 address: Enter your server's public IP address
  • Proxy status: Ensure the cloud icon is orange—this enables Cloudflare's proxy, adding security and hiding your server's IP
  • Click Save

Create the DNS record for the webhooks:

  • Click Add record again
  • Type: A or AAAA
  • Name: webhook
  • Content/IPv4 address: Enter the same server IP address
  • Proxy status: Ensure the cloud icon is orange
  • Click Save

After a few minutes, these DNS records should be active.

c. Step 3: Environment Variable Configuration (.env file)

This is the most critical configuration step. All secrets and settings for the entire stack are managed in a single .env file.

i. Creating the Configuration File

Inside the n8n-autoscaling directory, copy the example file to create your own configuration file:

cp .env.example .env

Now, open the .env file for editing with your preferred text editor:

nano .env

ii. Comprehensive Variable Breakdown

You must configure every relevant variable in this file. Let me walk you through each section:

(1) Autoscaling Parameters

COMPOSE_PROJECT_NAME=n8n-autoscaling - Do not change this. The autoscaler script relies on this exact name to find the correct containers to scale.

GENERIC_TIMEZONE=America/New_York - Change this to your local timezone, like Europe/Berlin.

MIN_REPLICAS=1 - The minimum number of worker containers that will always be running.

MAX_REPLICAS=5 - The maximum number of workers the autoscaler can create. Adjust this based on your server's CPU and RAM.

SCALE_UP_QUEUE_THRESHOLD=5 - The number of jobs in the queue that will trigger creation of a new worker.

SCALE_DOWN_QUEUE_THRESHOLD=1 - If the queue length drops below this value, a worker will be removed.

POLLING_INTERVAL_SECONDS=10 - How often (in seconds) the autoscaler checks the queue length.

COOLDOWN_PERIOD_SECONDS=10 - The minimum time to wait between scaling actions. The default of 10 is aggressive; a safer value for production might be 180 (3 minutes) to prevent flapping.

N8N_QUEUE_BULL_GRACEFULSHUTDOWNTIMEOUT=300 - How long (in seconds) a worker will wait for its active jobs to finish before shutting down. This should be longer than your longest-running workflow.

N8N_GRACEFUL_SHUTDOWN_TIMEOUT=300 - A general graceful shutdown timeout. Keep it consistent with the Bull setting.

(2) Redis and Postgres Configuration

REDIS_PASSWORD= - I highly recommend setting a strong, random password here for your Redis instance.

POSTGRES_HOST=postgres - Do not change.

POSTGRES_DB=n8n - Do not change unless you have a specific reason.

POSTGRES_USER=postgres - Do not change unless you have a specific reason.

POSTGRES_PASSWORD=YOURPASSWORD - CRITICAL: Change this to a very strong, unique password for your database.

(3) Core n8n Configuration

N8N_HOST=n8n.yourdomain.com - Replace yourdomain.com with your actual domain. This is the URL for the n8n UI.

N8N_WEBHOOK=webhook.yourdomain.com - Replace yourdomain.com with your domain. This is the base URL for production webhooks.

N8N_WEBHOOK_URL=https://webhook.yourdomain.com - The full URL for webhooks.

WEBHOOK_URL=https://webhook.yourdomain.com - The webhook base URL variable that n8n itself reads; set it to the same value as N8N_WEBHOOK_URL.

N8N_EDITOR_BASE_URL=https://n8n.yourdomain.com - The full URL for the editor UI.

N8N_ENCRYPTION_KEY=YOURKEY - CRITICAL: This key encrypts your credentials. If you lose it, your credentials are irrecoverable. It must be exactly 32 characters long. Generate a secure random string. On Linux/macOS, you can use:

openssl rand -hex 16

N8N_USER_MANAGEMENT_JWT_SECRET=YOURKEY - CRITICAL: Change this to another strong, unique secret for user session management.

N8N_RUNNERS_AUTH_TOKEN=YOURPASSWORD - CRITICAL: Change this to another strong, unique secret for internal service communication.

After setting all variables, save and close the .env file.
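
Before moving on, it can help to sanity-check the file. Here's a small, optional Python sketch (not part of the repository) that reads .env and flags the values people most often get wrong, such as placeholder passwords and an encryption key that isn't 32 characters:

# check_env.py - run from the n8n-autoscaling directory: python3 check_env.py
values = {}
with open(".env") as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()

enc_key = values.get("N8N_ENCRYPTION_KEY", "")
if len(enc_key) != 32:
    print(f"N8N_ENCRYPTION_KEY is {len(enc_key)} characters; it should be exactly 32.")

for name in ("POSTGRES_PASSWORD", "N8N_USER_MANAGEMENT_JWT_SECRET", "N8N_RUNNERS_AUTH_TOKEN"):
    if values.get(name, "") in ("", "YOURPASSWORD", "YOURKEY"):
        print(f"{name} still looks like a placeholder - set a strong, unique value.")

for name in ("N8N_HOST", "N8N_WEBHOOK"):
    if "yourdomain" in values.get(name, ""):
        print(f"{name} still points at yourdomain.com - replace it with your real domain.")

print("Check complete.")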

d. Step 4: Caddy Reverse Proxy Configuration

Now we'll configure Caddy to route traffic to the correct n8n services.

i. Modifying docker-compose.yml for Caddy

The default docker-compose.yml from the repository includes services for traefik and cloudflared. You must remove these and add a service for caddy.

Open the docker-compose.yml file for editing.

Delete the entire service definitions for traefik and cloudflared.

Add the following service definition for caddy to the services: section of the file:

  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config
    networks:
      - n8n-network

At the bottom of the docker-compose.yml file, under the volumes: section, add the new volumes for Caddy:

volumes:
  postgres_data:
  redis_data:
  n8n_main:
  n8n_webhook:
  caddy_data:
  caddy_config:

Save and close the docker-compose.yml file.

ii. Creating the Caddyfile

In the same n8n-autoscaling directory, create a new file named Caddyfile:

touch Caddyfile

Open this new Caddyfile and paste the following content. Remember to replace n8n.yourdomain.com and webhook.yourdomain.com with your actual domains:

n8n.yourdomain.com {
    reverse_proxy n8n:5678
}

webhook.yourdomain.com {
    reverse_proxy n8n-webhook:5678
}

This configuration tells Caddy exactly what to do: when a request comes in for n8n.yourdomain.com, forward it to the n8n service on port 5678. When a request comes in for webhook.yourdomain.com, forward it to the n8n-webhook service on port 5678. Caddy handles all the HTTPS certificate logic automatically—it's brilliant.

e. Step 5: Launching the Application Stack

i. Running Docker Compose

You're now ready to start the entire stack. From your terminal, inside the n8n-autoscaling directory, run:

docker compose up -d

(1) Explanation and Patience

This command tells Docker Compose to build, create, and start all the services defined in the docker-compose.yml file in detached (-d) mode, meaning they'll run in the background.

BE PATIENT: The first time you run this command, it will take significant time—potentially 10-20 minutes or more. Docker needs to download all the base images (Node.js, PostgreSQL, Redis, Caddy, etc.), build the custom n8n image by running every step in the Dockerfile (which includes installing numerous system packages and Node.js dependencies), and then start all the services in the correct order. Go make yourself a coffee—you've earned it.

5. Post-Installation Verification and Initial Use

a. System Health Check

i. Verifying Running Containers

After a few minutes, check the status of all the containers:

docker compose ps

You should see all your containers listed (including caddy), all with a STATUS of Up ... or healthy. If any containers are restarting or have exited, check their logs to diagnose the issue.

ii. Checking Service Logs

To see the real-time output of any specific service, use the docker compose logs command. It's good practice to check the main services to ensure they started without errors:

# Check the main n8n service
docker compose logs n8n

# Check the autoscaler service
docker compose logs n8n-autoscaler

# Check the Caddy reverse proxy for SSL errors
docker compose logs caddy

# Check the databases
docker compose logs postgres
docker compose logs redis

b. Accessing the n8n UI and First-Time Setup

i. Navigating to the UI

Open your web browser and navigate to the URL you configured for the UI—something like https://n8n.yourdomain.com. You should see a secure connection (padlock icon) in your browser's address bar.

ii. Creating the Owner Account

(1) Initial Setup Process

The first time you access the UI, n8n will guide you through creating an owner account. Complete this process to gain access to your new n8n instance.

(2) Troubleshooting the "Owner Already Setup" Error

Here's a common issue that trips people up: you might encounter an error message saying "Instance owner already setup" immediately after creating the account, which prevents login. This is almost always caused by aggressive caching on Cloudflare's side.

To fix this, go to your Cloudflare dashboard for your domain, navigate to Caching -> Configuration, and click Purge Everything. You might also want to create a page rule or cache rule for your n8n UI hostname (n8n.yourdomain.com/*) to bypass the cache entirely and prevent this from happening again. After clearing the cache, refresh the n8n page in your browser.

c. Verifying Queue Mode and Autoscaling Functionality

To confirm the entire system is working as intended, let's simulate a load and observe the autoscaler in action.

i. Simulating Workload to Test Scaling

(1) Creating a Test Workflow

In the n8n UI, create a new workflow with two nodes:

First, add a Webhook Trigger Node—this will give you a "Production URL" that looks like https://webhook.yourdomain.com/webhook/your-id.

Second, add a Function Node to simulate a long-running task. Paste the following code inside it:

// This code simulates 15 seconds of work by busy-waiting,
// deliberately keeping a worker occupied for the whole duration
const ms = 15000;
const start = Date.now();
while (Date.now() - start < ms) { /* busy wait */ }
// n8n expects returned items wrapped in a json property
return [{ json: { status: 'done', tookMs: ms } }];

Save and activate the workflow.

(2) Triggering Concurrent Executions

From your local terminal, run the following command, replacing the URL with your workflow's production webhook URL. This sends 50 requests to your workflow simultaneously, instantly creating a large job queue:

for i in $(seq 1 50); do
  curl -s -X POST "https://webhook.yourdomain.com/webhook/your-id" -d '{}' >/dev/null &
done
wait
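
If you're on Windows or simply prefer Python, the same burst can be generated with the standard library's thread pool. This is a sketch, assuming you replace the placeholder URL below with your workflow's real Production URL:

import concurrent.futures
import urllib.request

# Placeholder: your workflow's Production webhook URL
URL = "https://webhook.yourdomain.com/webhook/your-id"

def trigger(i: int) -> int:
    req = urllib.request.Request(URL, data=b"{}", method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status

# Fire 50 requests concurrently to build up the Redis queue
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(trigger, range(50)))

print("Responses:", results.count(200), "of", len(results), "returned HTTP 200")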

ii. Observing Scaling Decisions

(1) Monitoring Autoscaler Logs

While the requests are being sent, watch the autoscaler's logs in real-time:

docker compose logs -f n8n-autoscaler

You should see log lines indicating the queue length rapidly increasing, followed by messages like "Condition met for SCALE UP" as it decides to add more worker containers.

(2) Checking Worker Replicas

In another terminal, check the number of running worker containers. The replica count should increase from your MIN_REPLICAS towards your MAX_REPLICAS:

docker compose ps n8n-worker

(3) Manually Checking the Redis Queue

You can also query the Redis queue length directly to see the number of jobs waiting to be processed:

# For BullMQ v3
docker compose exec redis redis-cli LLEN bull:jobs:wait

# For BullMQ v4+
docker compose exec redis redis-cli LLEN bull:jobs:waiting

As the workers process the jobs, you'll see this number decrease. Once it drops below your SCALE_DOWN_QUEUE_THRESHOLD, the autoscaler will begin scaling the workers back down to MIN_REPLICAS.

6. Advanced Configuration and Security Hardening

a. Securing the n8n UI with Cloudflare Zero Trust

You can add an extra layer of security that requires users to authenticate with an identity provider (like Google or a one-time password) before they can even reach the n8n login page. This is handled at the Cloudflare edge and works seamlessly with your Caddy setup.

i. The Goal and The Problem

(1) Adding SSO-Style Protection

The goal is to use Cloudflare Access (part of the Zero Trust suite) to protect your n8n UI from unauthorized access.

(2) The Test Webhook Conflict

However, here's the catch: a naive implementation that protects the entire n8n.yourdomain.com hostname will cause problems. The "Test workflow" button inside the n8n editor sends requests to a test webhook URL that also uses the n8n.yourdomain.com hostname. If the entire host is protected, Cloudflare Access will block these internal test calls, causing them to fail.

ii. The Path-Based Solution

The solution is elegant: create a policy that protects only the specific URL paths associated with the user interface, leaving other paths open for testing.

(1) Creating the Access Application

In the Cloudflare Zero Trust dashboard, go to Access -> Applications.

Click "Add an application" and choose "Self-hosted".

Configure the application:

  • Application name: A descriptive name like n8n UI
  • Subdomain: n8n
  • Domain: Select your domain

(2) Configuring Policies for Specific Paths

Next, create a policy to define who's allowed access. Create a rule that allows your email address or an email domain.

Crucially, instead of applying this policy to the entire application, configure it to apply only to specific paths. Edit the policy and add rules to protect only the following paths:

  • /signin
  • /home

For stricter security, you can use a regular expression to protect all sensitive UI routes, for example URL Path: ^/(signin|home|setup|workflow|execution|credentials|settings).*

This configuration ensures that the sensitive UI pages are protected by Cloudflare Access, while the base URL and the /webhook-test/ path used by the editor remain accessible, allowing test webhooks to function correctly. Production webhooks on webhook.yourdomain.com are entirely unaffected.
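
If you want to confirm the expression behaves the way you expect before relying on it, you can test it locally. Cloudflare's matcher isn't Python, but the pattern is simple enough that the behavior carries over. A quick sketch with a few representative paths:

import re

# The path pattern suggested above for Cloudflare Access
pattern = re.compile(r"^/(signin|home|setup|workflow|execution|credentials|settings).*")

samples = [
    "/signin",            # should be protected
    "/home/workflows",    # should be protected
    "/workflow/123",      # should be protected
    "/webhook-test/abc",  # must stay open for the editor's test webhooks
    "/rest/login",        # not matched by this pattern
]

for path in samples:
    verdict = "protected" if pattern.match(path) else "open"
    print(f"{path:25} -> {verdict}")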

b. Comprehensive Security Hardening Checklist

i. Final Security Review

To ensure your deployment is as secure as possible, review these best practices:

(1) Applying Best Practices

Secrets: Use long, unique, and randomly generated secrets for N8N_ENCRYPTION_KEY, POSTGRES_PASSWORD, and all other JWT/auth tokens in your .env file.

Cloudflare: Beyond DNS management, consider enabling Cloudflare's WAF (Web Application Firewall) rules to block common threats and rate-limit webhook endpoints if appropriate for your use case. Ensure the "Proxy status" (orange cloud) is enabled for your DNS records. Ensure caching is disabled for the n8n UI hostname to prevent login issues.

PostgreSQL: Ensure your POSTGRES_PASSWORD is strong. The stack already correctly configures secure SCRAM-SHA-256 authentication. The PostgreSQL port isn't exposed to the public internet by default in this configuration—it's only accessible from within the private Docker network.

Firewall: Ensure your server's firewall only allows inbound traffic on ports 22 (for SSH), 80, and 443. All other ports should be blocked from the public internet.

Backups: Automate regular backups of your PostgreSQL database and store the backups in a secure, off-host location. Periodically test your restore process.

Logging: For long-term production use, consider configuring a Docker logging driver to forward container logs to a centralized logging service for persistent storage and analysis.

7. Customization, Tuning, and Advanced Usage

a. Adding Custom Community Nodes and Packages

The provided n8n image is extensible, allowing you to add custom n8n nodes or other Node.js packages required by your workflows. This repository doesn't enable automatic installation via the N8N_COMMUNITY_PACKAGES environment variable—packages must be baked into the image.

i. Modifying the Dockerfile

(1) Baking Packages into the Image

The most reliable method is to bake the packages directly into your custom n8n Docker image.

Open the main Dockerfile in the project's root directory.

Locate the line that reads RUN npm install -g n8n puppeteer.

Append the package names you wish to install to this line. For example, to add a custom node package like n8n-nodes-replicate and the puppeteer-extra library for stealth web scraping, change the line to:

RUN npm install -g n8n puppeteer n8n-nodes-replicate puppeteer-extra puppeteer-extra-plugin-stealth

ii. Updating Environment Variables

(1) Allowing External Modules

If a package is intended to be used within a Code Node (using require()), you must explicitly grant n8n permission to execute it.

Open your .env file.

Find the NODE_FUNCTION_ALLOW_EXTERNAL variable.

Append the name of the new package(s) to the comma-separated list. For the example above, it would look like:

NODE_FUNCTION_ALLOW_EXTERNAL=ajv,ajv-formats,puppeteer,n8n-nodes-replicate,puppeteer-extra,puppeteer-extra-plugin-stealth,ffmpeg,git,graphicsmagick,openssh-client

iii. Rebuilding the n8n Image

(1) Applying the Changes

Your changes won't take effect until you rebuild the n8n image and restart the services:

# Rebuild the images, ensuring no cache is used to force the new npm install
docker compose build --no-cache

# Restart the stack with the newly built images
docker compose up -d

b. Performance Tuning and Scalable Workflow Architecture

i. The Anti-Pattern: Avoiding Large In-Node Loops

Here's a common mistake I see all the time: when processing large datasets (thousands of rows from a database), people use n8n's built-in "Loop Over Items" node. For a scalable architecture, this is an anti-pattern. A single, long-running loop will monopolize one worker container for its entire duration—possibly for hours. This prevents that worker from processing any other jobs and completely defeats the purpose of our parallel, multi-worker setup.

ii. The Professional Strategy: Event-Driven Fan-Out

A much more scalable and resilient approach is to re-architect your logic into an event-driven, "fan-out" pattern using two separate workflows.

(1) Architecting with a Dispatcher and Processor

Workflow A (The "Dispatcher"):

  • Trigger: This workflow can be triggered manually, on a schedule, or by an initial event
  • Logic: Its sole purpose is to fetch the large list of items to process (query 10,000 user records from a database, for example)
  • Action: Instead of looping, it iterates through the list and, for each individual item, makes an HTTP Request to the webhook trigger of Workflow B, passing the item's data in the request body

Workflow B (The "Processor"):

  • Trigger: This workflow is triggered by a Webhook node
  • Logic: It receives a single item from Workflow A and performs the required long-running action on just that one item

(2) Benefits of This Pattern

This architecture is vastly superior for several reasons:

True Parallelism: When the Dispatcher sends 10,000 webhook calls, they become 10,000 independent jobs in the Redis queue. The autoscaler will see the massive queue length, instantly scale up to your MAX_REPLICAS, and all your workers will begin processing these 10,000 jobs in parallel. A multi-hour task can be completed in minutes.

Increased Resilience: If the processing for a single item fails in Workflow B, it only affects that one execution. The other 9,999 jobs will continue to process successfully. In a loop, a single failure can halt the entire batch.

Horizontal Scalability: This pattern scales almost perfectly with the number of workers. If you need more throughput, you simply increase MAX_REPLICAS and provide a server with more CPU cores.
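
The Dispatcher doesn't even have to be an n8n workflow; any process that can POST to the Processor's webhook works. Here's a minimal Python sketch of the fan-out idea, using a hypothetical item source and a placeholder webhook URL. In a real Dispatcher workflow you'd use the HTTP Request node to do exactly the same thing:

import json
import urllib.request

# Placeholder: the Production URL of Workflow B's Webhook node
PROCESSOR_URL = "https://webhook.yourdomain.com/webhook/processor-id"

def fetch_items():
    # Stand-in for "query 10,000 user records from a database"
    return [{"user_id": i} for i in range(10)]

for item in fetch_items():
    # One request per item: each becomes an independent job in the Redis queue
    req = urllib.request.Request(
        PROCESSOR_URL,
        data=json.dumps(item).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        print("queued", item["user_id"], "->", resp.status)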

c. Migrating from an Existing n8n Instance

i. The Manual Migration Process

(1) Confirming No Automated Path

There is no automated migration path or in-place upgrade process to move from a standard n8n installation to this auto-scaling architecture. This should be treated as a completely fresh installation.

(2) Step-by-Step Manual Migration

To migrate your data, you must perform a manual export and import process:

Set up the new n8n-autoscaling instance completely and ensure it's working.

Open your old n8n instance and the new one in separate browser tabs.

Credentials: Manually recreate all of your credentials (API keys, database connections, etc.) in the new instance's Credentials section. For security reasons, credentials cannot be easily exported.

Workflows: For each workflow, open it in the old instance, select all nodes, and copy them (as JSON). You can also use the "View JSON" button or equivalent export option. Paste the copied JSON into a new, blank workflow in the new instance.
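
As an alternative to copying JSON by hand, recent n8n versions expose a Public API that can list workflows in bulk. This is optional, and the sketch below assumes you've created an API key on the old instance (Settings -> n8n API) and swapped in its real URL; it simply dumps each workflow to a local JSON file you can then paste or import into the new instance:

import json
import urllib.request

# Assumptions: your old instance's URL and an API key created in its settings
OLD_INSTANCE = "https://old-n8n.example.com"
API_KEY = "your-api-key"

req = urllib.request.Request(
    f"{OLD_INSTANCE}/api/v1/workflows",
    headers={"X-N8N-API-KEY": API_KEY},
)
with urllib.request.urlopen(req, timeout=30) as resp:
    # The API returns {"data": [...], "nextCursor": ...}; pagination is omitted here
    workflows = json.load(resp)["data"]

for wf in workflows:
    filename = f"workflow-{wf['id']}.json"
    with open(filename, "w") as f:
        json.dump(wf, f, indent=2)
    print("exported", wf["name"], "->", filename)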

d. Platform-Specific Deployment Guidance

i. ARM Architecture (Raspberry Pi, Apple Silicon)

As mentioned in the prerequisites, deploying on ARM requires manual intervention. When building the Docker image on an ARM host, you must use docker buildx to specify the target platform:

docker buildx build --platform linux/arm64 -t your-custom-tag --load .

You may also need to modify the Dockerfile to install an ARM-compatible version of the chromium browser. The original author has referenced a separate repository specifically for this purpose: https://github.com/conor-is-my-name/arm-puppeteer-chrome.

ii. Cloud VM Deployment (GCP, Hetzner, etc.)

Deploying this stack on a cloud VM from providers like Google Cloud Platform, Hetzner, or AWS is straightforward and follows the same installation steps. The key difference from other deployment methods is that you must open inbound firewall ports 80 and 443 in your cloud provider's firewall console (often called "Security Groups" or "Firewall Rules"). Caddy requires this access to obtain SSL certificates and serve traffic to the public internet. Ensure your firewall rules allow traffic from any source (0.0.0.0/0) on TCP ports 80 and 443.

8. Operations and Maintenance

a. Updating the n8n-autoscaling Stack

Keeping your deployment up-to-date with the latest features and security patches from the source repository is a straightforward process.

i. Standard Update Procedure

(1) Pull, Down, Rebuild, Up

The standard update procedure involves four commands. Execute them from within the n8n-autoscaling directory on your server. This process will cause a brief period of downtime while the services are being rebuilt and restarted.

Pull the latest code changes from the GitHub repository:

git pull

Stop and remove the current running containers:

docker compose down

Rebuild the Docker images. The --no-cache flag is crucial—it ensures that the latest base images are pulled and all installation steps in the Dockerfile are re-run, incorporating any new dependencies or changes:

docker compose build --no-cache

Start the new services in detached mode:

docker compose up -d

ii. Tips for Reducing Downtime

(1) Advanced Upgrade Strategies

For environments where downtime must be minimized, consider these strategies:

Temporarily Increase Minimum Workers: Before starting the upgrade, edit your .env file and temporarily increase the MIN_REPLICAS value. This helps ensure more workers stay online to process the queue while other services are being updated.

Phased Service Upgrades: If an update only affects the workers (adding a new Node.js dependency, for example), you can choose to rebuild and restart only the worker service. This keeps the main UI and webhook ingestion services online.

b. Backup and Restore Procedures

The PostgreSQL database is the authoritative source for all your workflows and credentials, making regular backups essential.

i. Backing Up the PostgreSQL Database

(1) Database Dump Command

You can create a compressed backup of your n8n database with a single command. This command executes pg_dump inside the running postgres container and copies the output to a file on your host machine:

# This creates a compressed dump file named backup-YYYY-MM-DD.dump in your current directory
docker compose exec -T postgres pg_dump -U postgres -d n8n -F c -f /var/lib/postgresql/data/backup.dump
docker compose cp postgres:/var/lib/postgresql/data/backup.dump ./backup-$(date +%F).dump

You should automate this command to run regularly and store the backup files in a secure, off-site location.
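
If you'd rather drive this from a small script than a raw cron line, here's a sketch that wraps the same two commands, names each file by date, and prunes dumps older than two weeks. It assumes it runs from the n8n-autoscaling directory on the host:

import datetime
import pathlib
import subprocess

BACKUP_DIR = pathlib.Path("backups")
BACKUP_DIR.mkdir(exist_ok=True)

today = datetime.date.today().isoformat()
dump_in_container = "/var/lib/postgresql/data/backup.dump"
local_file = BACKUP_DIR / f"backup-{today}.dump"

# Same pg_dump + copy sequence as above, driven from Python
subprocess.run(["docker", "compose", "exec", "-T", "postgres",
                "pg_dump", "-U", "postgres", "-d", "n8n", "-F", "c",
                "-f", dump_in_container], check=True)
subprocess.run(["docker", "compose", "cp",
                f"postgres:{dump_in_container}", str(local_file)], check=True)
print("wrote", local_file)

# Prune dumps older than 14 days
cutoff = datetime.date.today() - datetime.timedelta(days=14)
for old in BACKUP_DIR.glob("backup-*.dump"):
    stamp = old.stem.replace("backup-", "")
    if datetime.date.fromisoformat(stamp) < cutoff:
        old.unlink()
        print("removed old backup", old)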

ii. Backing Up n8n Data Volumes

If you store binary files or other important assets using the n8n data folder, you should also back up the persistent Docker volumes.

(1) Volume Archive Command

These commands launch a temporary Alpine container to access the named volumes and create a compressed tarball of their contents:

# Back up the main n8n data volume
docker run --rm -v n8n-autoscaling_n8n_main:/src -v $(pwd):/dest alpine tar -czf /dest/n8n_main.tar.gz -C /src .

# Back up the webhook data volume
docker run --rm -v n8n-autoscaling_n8n_webhook:/src -v $(pwd):/dest alpine tar -czf /dest/n8n_webhook.tar.gz -C /src .

(Note: The volume prefix n8n-autoscaling_ is derived from your COMPOSE_PROJECT_NAME. If you changed it, adjust the command accordingly.)

iii. Restoring the Database

(1) Database Restore Command

To restore your database from a backup file, first copy the .dump file into the running postgres container, then execute pg_restore:

# First, copy the backup file into the container
docker compose cp ./backup-file.dump postgres:/var/lib/postgresql/data/backup.dump

# Then, execute the restore command
docker compose exec -T postgres pg_restore -U postgres -d n8n --clean --if-exists /var/lib/postgresql/data/backup.dump

c. Monitoring the Live System

Regularly monitoring the health and performance of your stack is key to maintaining a reliable service.

i. Key Monitoring Commands

(1) Consolidated Command List

Here's a consolidated list of commands for observing the system's state:

View Autoscaler Logs (Most Important for Scaling): To see real-time scaling decisions:

docker compose logs -f n8n-autoscaler

View Queue Monitor Logs: For a simple, periodic printout of the queue length:

docker compose logs -f redis-monitor

Check Overall Service Health: To see the status of all running containers:

docker compose ps

View Logs for a Specific Service: (like n8n, n8n-worker, postgres):

docker compose logs -f n8n

Check Number of Running Workers:

docker compose ps n8n-worker

Check Redis Health: The PING command should return PONG:

docker compose exec redis redis-cli PING

List All n8n-related Keys in Redis:

docker compose exec redis redis-cli KEYS "bull:*"

Check Current Queue Length Directly:

# Check the primary waiting list (BullMQ v3)
docker compose exec redis redis-cli LLEN bull:jobs:wait

# Check the secondary waiting list (BullMQ v4+)
docker compose exec redis redis-cli LLEN bull:jobs:waiting

9. In-Depth Component and Code Analysis

a. docker-compose.yml Explained

This file is the blueprint that defines all the services, networks, and volumes for the entire application stack.

i. Core Concepts

(1) Anchors, Networks, Volumes, and Healthchecks

YAML Anchors: The file uses a YAML anchor (x-n8n: &service-n8n) to define a common template for all n8n-based services (n8n, n8n-webhook, n8n-worker). This reduces configuration duplication and makes the file easier to maintain.

Networks: It defines the internal n8n-network for private communication between services and declares the shark network as external, allowing it to be shared with other Docker projects.

Volumes: It defines named volumes like postgres_data and redis_data to ensure critical data persists even if the containers are removed or recreated.

Healthchecks: The redis and postgres services include healthcheck directives. These tell Docker Compose to periodically check if the databases are responsive. Services that depend on them will wait for the healthcheck to pass before starting, ensuring a correct startup order.

ii. Service-Specific Configurations

(1) Command Overrides and Socket Mounting

Caddy Service: The added caddy service is configured to expose ports 80 and 443 to the host server, allowing it to receive public internet traffic. It mounts the Caddyfile from your host into the container so it knows how to route traffic, and uses named volumes (caddy_data, caddy_config) to persist SSL certificates and other configuration.

Command Overrides: The n8n-webhook and n8n-worker services inherit the base configuration from the YAML anchor but override the command. Instead of the default n8n start, they execute sh /webhook and sh /worker, respectively. These small scripts simply run the specialized n8n webhook and n8n worker commands, starting n8n in headless modes dedicated to their specific tasks.

Docker Socket Mounting: The n8n-autoscaler service configuration includes a critical volume mount: /var/run/docker.sock:/var/run/docker.sock. This gives the autoscaler container direct, privileged access to the host machine's Docker daemon. This access is necessary for it to query the status of other containers and issue commands to scale the worker service.

b. Main Dockerfile Explained

This file contains the instructions for building the custom n8n image that includes all necessary dependencies.

i. Build Stages and Dependencies

(1) System Packages and Global NPM Installs

The Dockerfile starts from an official node:20 base image. It then runs a long apt-get install command to install all the system-level binary dependencies required by various n8n nodes. This includes libraries for rendering and browser automation (chromium, required by Puppeteer), video/audio processing (ffmpeg), image manipulation (graphicsmagick), JSON processing (jq), and version control (git, openssh-client). Finally, it uses npm install -g to install the n8n and puppeteer libraries globally within the image, making them available to all workflows.

ii. Environment Variables and Scripts

(1) Key ENV Directives and Helper Scripts

The Dockerfile sets several important environment variables. The most significant is NODE_FUNCTION_ALLOW_EXTERNAL, a security whitelist of external modules (like puppeteer) that Code and Function nodes are allowed to load with require(). It also contains a clever printf command that creates two small, executable helper scripts, /worker and /webhook, which are used by the docker-compose.yml file to start the containers in the correct specialized mode.

c. autoscaler/autoscaler.py Explained

This Python script is the heart of the scaling logic.

i. Core Logic and Functions

(1) get_queue_length()

This function connects to Redis and counts the number of jobs waiting to be processed. It's designed to be resilient and future-proof by intelligently checking for multiple Redis key patterns that different versions of the BullMQ library use for the waiting queue. It first tries bull:jobs:wait, then bull:jobs:waiting, and finally a legacy pattern, ensuring it works across n8n updates.

(2) get_current_replicas()

This function accurately counts the number of currently running worker containers. It communicates with the Docker daemon and lists all containers, but it uses specific labels to filter the results. It looks for containers that have both the project label (com.docker.compose.project=n8n-autoscaling) and the service label (com.docker.compose.service=n8n-worker). This precise filtering ensures it only counts the workers for this specific project, even if other Docker containers are running on the same host.

(3) scale_service()

This function executes the scaling action. Instead of using a Docker library within Python, it constructs and executes a standard command-line call using Python's subprocess module. The command it runs is docker compose ... up -d --no-deps --scale n8n-worker=<replicas>, delegating the scaling action directly to the Docker Compose tool. This is a reliable approach that ensures containers are created and destroyed correctly according to the compose file's definition.

ii. Main Loop and Cooldown

(1) The Control Loop

The script's logic is contained within an infinite main() loop. On each iteration, it orchestrates the process: it gets the current queue length and replica count, compares them against the configured thresholds, and calls the scale_service function if a change is needed. Crucially, it respects the COOLDOWN_PERIOD_SECONDS by pausing after any scaling action, which prevents "flapping"—rapidly scaling up and down in response to small fluctuations in workload. It scales by one instance at a time to avoid erratic behavior.
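
To make the loop concrete, here's a condensed Python sketch of the same decision logic. It's a simplification of what autoscaler.py does, not a drop-in replacement: names and thresholds mirror the environment variables described earlier, and the Docker interaction is reduced to plain CLI calls, whereas the real script queries the Docker socket directly when counting replicas.

import os
import subprocess
import time
import redis

r = redis.Redis(host=os.getenv("REDIS_HOST", "redis"),
                password=os.getenv("REDIS_PASSWORD") or None)

MIN_REPLICAS = int(os.getenv("MIN_REPLICAS", "1"))
MAX_REPLICAS = int(os.getenv("MAX_REPLICAS", "5"))
SCALE_UP = int(os.getenv("SCALE_UP_QUEUE_THRESHOLD", "5"))
SCALE_DOWN = int(os.getenv("SCALE_DOWN_QUEUE_THRESHOLD", "1"))
POLL = int(os.getenv("POLLING_INTERVAL_SECONDS", "10"))
COOLDOWN = int(os.getenv("COOLDOWN_PERIOD_SECONDS", "10"))

def queue_length() -> int:
    # Check both key names used by different BullMQ versions
    return max(r.llen("bull:jobs:wait"), r.llen("bull:jobs:waiting"))

def current_replicas() -> int:
    # Count containers carrying this project's worker service labels
    out = subprocess.run(
        ["docker", "ps", "-q",
         "--filter", "label=com.docker.compose.project=n8n-autoscaling",
         "--filter", "label=com.docker.compose.service=n8n-worker"],
        capture_output=True, text=True, check=True).stdout
    return len(out.split())

def scale_to(replicas: int) -> None:
    # Delegate the actual scaling to Docker Compose, as the real script does
    subprocess.run(["docker", "compose", "up", "-d", "--no-deps",
                    "--scale", f"n8n-worker={replicas}", "n8n-worker"], check=True)

while True:
    jobs, workers = queue_length(), current_replicas()
    if jobs >= SCALE_UP and workers < MAX_REPLICAS:
        scale_to(workers + 1)   # one step up, then cool down
        time.sleep(COOLDOWN)
    elif jobs <= SCALE_DOWN and workers > MIN_REPLICAS:
        scale_to(workers - 1)   # one step down, then cool down
        time.sleep(COOLDOWN)
    time.sleep(POLL)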

10. Comprehensive Troubleshooting Runbook

a. UI and Access Issues

i. Symptom: "Instance owner already setup" Loop

Cause: This is almost always caused by Cloudflare caching the initial setup page.

Fix: Go to your Cloudflare dashboard for the domain. Under Caching -> Configuration, click Purge Everything. To prevent it from happening again, create a Page Rule or Cache Rule for the hostname n8n.<your-domain>/* and set the caching level to Bypass. Clear your browser cache.

ii. Symptom: Cloudflare Access Breaks Test Webhooks

Cause: The Cloudflare Access policy is protecting the entire n8n.<your-domain> hostname, which intercepts the internal calls made by the "Test workflow" button.

Fix: Edit your Cloudflare Access application policy. Instead of protecting the entire hostname, configure it to protect only the specific UI paths: /signin and /home. This secures the user-facing pages while leaving the API and test webhook paths open. Production webhooks on webhook.<your-domain> are unaffected as they're on a different hostname.

b. Connectivity and Networking Issues

i. Symptom: Blank Pages, 502 Errors, or SSL Errors

Cause: There's a problem with the Caddy reverse proxy configuration or DNS.

Fix:

Check Caddy Logs: Run docker compose logs caddy. Look for any errors related to obtaining SSL certificates—things like "rate limited" or "DNS problem". This can happen if your DNS records aren't yet propagated.

Check DNS: Use a tool like whatsmydns.net to confirm that n8n.yourdomain.com and webhook.yourdomain.com are both pointing to your server's IP address.

Check Firewall: Double-check that ports 80 and 443 are open on your server's firewall and in your cloud provider's security group settings. Caddy cannot work without them.

Check Caddyfile: Verify that the domain names in your Caddyfile exactly match the hostnames you configured in the .env file and Cloudflare DNS, and that the service names (n8n, n8n-webhook) are correct.

ii. Symptom: Port Binding Conflict ("address already in use")

Cause: When running docker compose up, you get an error because another process (like Apache or Nginx) or a leftover container is already using port 80 or 443 on the host machine.

Fix: Find and stop the conflicting process. On Linux, you can use these commands to find the process ID:

# Find process using port 80
sudo lsof -iTCP:80 -sTCP:LISTEN
# Find process using port 443
sudo lsof -iTCP:443 -sTCP:LISTEN

Stop the identified process or service—for example, sudo systemctl stop apache2.

c. Autoscaling Malfunctions

i. Symptom: Autoscaler Does Not Change Worker Count

Checklist for Diagnosis:

Check Logs: The first step is always docker compose logs -f n8n-autoscaler. Look for any error messages.

Project Name: Is the COMPOSE_PROJECT_NAME in your .env file (n8n-autoscaling) exactly matching the project name Docker is using?

Docker Socket: Is the autoscaler container running? Does it have access to /var/run/docker.sock?

Queue Length: Is the queue length actually increasing above your SCALE_UP_QUEUE_THRESHOLD? Verify with docker compose exec redis redis-cli LLEN bull:jobs:wait.

Service Name: Is the N8N_WORKER_SERVICE_NAME in your .env file set exactly to n8n-worker?

ii. Symptom: Redis Connection Errors in Logs

Cause: The autoscaler, monitor, or n8n services cannot connect to the Redis container.

Fix: Verify that the Redis container is healthy with docker compose ps. Then, test connectivity directly with docker compose exec redis redis-cli PING. It should return PONG. Ensure REDIS_HOST=redis is set correctly in your .env file.

d. Platform and Build Issues

i. Symptom: Puppeteer or Chromium Fails on ARM

Cause: The x86/amd64 version of Chromium was installed in the Docker image, which is incompatible with the ARM host CPU.

Fix: You must build the image specifically for the ARM platform using docker buildx build --platform linux/arm64 .... You may also need to modify the Dockerfile to install an ARM-specific version of the chromium package.

ii. Symptom: Error: Command "start" not found in n8n logs

Cause: The global npm install -g n8n command failed during the Docker build, or the system PATH is misconfigured inside the container.

Fix: This usually indicates a problem with the Docker build process. Ensure the Dockerfile hasn't been modified in a way that breaks the npm install or ENV PATH lines. Rebuild the image from scratch to ensure all layers are correct: docker compose build --no-cache n8n and then restart with docker compose up -d.

11. Appendices

a. Command-Line Cheat Sheet

i. Quick Reference for Common Operations

(1) Start, Stop, Logs, and Scaling

Bring up all services (detached mode):

docker compose up -d

Stop and remove all services:

docker compose down

View logs for all services (live):

docker compose logs -f

View logs for a single service (live):

docker compose logs -f n8n-autoscaler

Manually scale workers (autoscaler will later reconcile this):

docker compose up -d --no-deps --scale n8n-worker=3 n8n-worker

Check worker replica count:

docker compose ps n8n-worker

Check Redis queue length:

docker compose exec redis redis-cli LLEN bull:jobs:wait
docker compose exec redis redis-cli LLEN bull:jobs:waiting

Check health of all containers:

docker compose ps

Update the repository and rebuild all images:

git pull
docker compose down
docker compose build --no-cache
docker compose up -d

b. Minimal Validated .env Template

i. Copy-Pasteable Configuration Template

(1) Ready-to-Use Environment File

Use this template as a starting point. Replace every placeholder <...> with your real, unique values:

## Autoscaling
COMPOSE_PROJECT_NAME=n8n-autoscaling
COMPOSE_FILE_PATH=/app/docker-compose.yml
GENERIC_TIMEZONE=America/Los_Angeles
MIN_REPLICAS=1
MAX_REPLICAS=5
SCALE_UP_QUEUE_THRESHOLD=5
SCALE_DOWN_QUEUE_THRESHOLD=1
POLLING_INTERVAL_SECONDS=10
COOLDOWN_PERIOD_SECONDS=10
POLL_INTERVAL_SECONDS=5
N8N_QUEUE_BULL_GRACEFULSHUTDOWNTIMEOUT=300
N8N_GRACEFUL_SHUTDOWN_TIMEOUT=300

## Redis
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_PASSWORD=
QUEUE_NAME_PREFIX=bull
QUEUE_NAME=jobs
QUEUE_BULL_REDIS_HOST=redis
QUEUE_HEALTH_CHECK_ACTIVE=true

## Postgres
POSTGRES_HOST=postgres
POSTGRES_DB=n8n
POSTGRES_USER=postgres
POSTGRES_PASSWORD=<strong-db-password>
PGDATA=/var/lib/postgresql/data/pgdata
DB_TYPE=postgresdb

## N8N
N8N_HOST=n8n.<your-domain>
N8N_WEBHOOK=webhook.<your-domain>
N8N_WEBHOOK_URL=https://webhook.<your-domain>
WEBHOOK_URL=https://webhook.<your-domain>
N8N_EDITOR_BASE_URL=https://n8n.<your-domain>
N8N_PROTOCOL=https
N8N_PORT=5678
N8N_DIAGNOSTICS_ENABLED=false
N8N_USER_FOLDER=/n8n/main
N8N_SECURE_COOKIE=false
N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=false
N8N_ENCRYPTION_KEY=<32-char-random-secret>
N8N_USER_MANAGEMENT_JWT_SECRET=<strong-jwt-secret>
N8N_WORKER_SERVICE_NAME=n8n-worker
EXECUTIONS_MODE=queue
OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true
N8N_TASK_BROKER_URL=http://n8n:5679
N8N_COMMAND_RESPONSE_URL=http://n8n:5679
N8N_TASK_BROKER_PORT=5679
N8N_RUNNERS_AUTH_TOKEN=<strong-token>
NODE_FUNCTION_ALLOW_EXTERNAL=ajv,ajv-formats,puppeteer,ffmpeg,git,graphicsmagick,openssh-client
PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

c. Final Validation and URL Formats

i. Production Webhook URL Format

(1) Example URL Structure

Once your system is running, the webhook URLs you use for external integrations—the "Production URL" in a webhook node—will follow this format: https://webhook.<your-domain>/webhook/<workflow-webhook-id>

It's critical to use this URL for all production workflows. The "Test URL" should only be used for debugging within the editor. If you implement Cloudflare Access, ensure that no access policies are applied to the webhook.<your-domain> hostname to guarantee reliable webhook ingestion.

ii. Final System Validation Checklist

(1) Go-Live Sanity Check

Before moving your instance into production, perform this final validation checklist:

  • [ ] You can access and sign in to the n8n UI at https://n8n.<your-domain>
  • [ ] A test request sent to a production webhook URL at https://webhook.<your-domain> is successfully received and executed
  • [ ] The command docker compose ps shows all services as running or healthy
  • [ ] Running concurrent requests causes the Redis queue length (LLEN bull:jobs:wait or :waiting) to increase
  • [ ] The n8n-autoscaler logs show correct queue length readings and make appropriate scale-up/scale-down decisions
  • [ ] Your server's firewall correctly blocks all ports except 22 (SSH), 80, and 443

If you can check all these boxes, you have a secure, scalable, and fully operational n8n environment ready for production workloads. By following this comprehensive guide, you've deployed an n8n environment that grows and shrinks its worker capacity to match your queue in real time, providing an efficient and robust automation platform. I recommend monitoring the official GitHub repository for future updates and improvements. You've built something remarkable here—enjoy the power of truly scalable automation!