# Running 30+ Crons on One 8GB VPS, No OOM

When you're building a data-intensive platform like IntellDirectories, scheduled tasks are the bedrock of data freshness and integrity. We run dozens of daily background jobs: data imports, index rebuilds, cache purges, notification deliveries, SEO sitemap generation, and more.

For years, like many startups, we just threw them into `crontab`:

* `0 0 * * * /usr/local/bin/python3 /app/jobs/daily_data_sync.py`
* `30 0 * * * /usr/local/bin/python3 /app/jobs/rebuild_search_index.py`
* `0 1 * * * /usr/local/bin/python3 /app/jobs/purge_old_cache.py`

You get the picture. Simple, effective, until it isn't. As IntellDirectories grew, so did our cron count.
Suddenly, a small 8GB VPS was struggling. At 00:30 UTC, when several resource-hungry jobs were scheduled to kick off simultaneously, we'd see RAM usage spike from 2GB to 7GB, sometimes running out of memory (OOM) and getting critical processes killed. It was a ticking time bomb, and a frustrating one to debug. We needed a better way to orchestrate these tasks without scaling up our infrastructure prematurely.

This isn't about throwing Kubernetes at every problem. This is about practical, cost-effective engineering for a growing SaaS business.

## The Problem with Simultaneous Cron Execution

The fundamental issue with multiple `crontab` entries firing around the same time is resource contention.
Each Python script, for example, might initialize its own database connections, load libraries, and allocate memory. If you have 5-10 such scripts, each potentially consuming 100-200MB of RAM during its peak, you're looking at roughly 0.5-2GB of RAM usage just for cron jobs, *on top* of your application server, database, and other services.

Our old setup was exactly this: 30+ distinct `crontab` entries.
Some ran hourly, most daily, a few weekly. The daily ones, in particular, often clustered in the early hours of the morning.

A typical sequence might be:

1. `data_import_job.py` (300MB peak RAM, 15 min runtime)
2. `index_rebuild.py` (500MB peak RAM, 30 min runtime)
3. `seo_sitemap_gen.py` (200MB peak RAM, 10 min runtime)

If these were all set to run at, say, 00:30, our 8GB VPS would suddenly need 1GB of RAM *just for these three tasks* in addition to everything else.
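
The arithmetic behind that figure can be sketched in a few lines. This is only a back-of-the-envelope model: it assumes the worst case where every job hits its peak at the same moment, and it contrasts that with running the same jobs one at a time, where only the hungriest job sets the peak.

```python
# Peak RAM figures (MB) taken from the three example jobs listed above.
jobs = {
    "data_import_job.py": 300,
    "index_rebuild.py": 500,
    "seo_sitemap_gen.py": 200,
}

# Worst case when all jobs start together: their peaks can overlap, so they add up.
concurrent_peak_mb = sum(jobs.values())

# Worst case when jobs run one at a time: only the largest single job matters.
sequential_peak_mb = max(jobs.values())

print(f"concurrent: {concurrent_peak_mb} MB, sequential: {sequential_peak_mb} MB")
# prints "concurrent: 1000 MB, sequential: 500 MB"
```

The gap between the sum and the maximum widens with every job added to the same time slot.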

Add a few more, and we're looking at a memory crisis. The solution isn't always to buy more RAM; it's to use what you have more intelligently.

## The Cron Orchestrator Pattern: Sequential Execution

Our solution was to embrace a cron orchestrator pattern. Instead of 30+ distinct `crontab` entries, we now have just *two* critical cron entries:

1. `schedule-daily`: Runs once a day at 00:30 UTC. Its sole job is to enqueue all daily tasks into a Redis queue.
2. `process-cron-queue`: Runs every minute. This script acquires a lock, pops the next job off the Redis queue, executes it, and logs the result.

This dramatically changes the resource profile.
Instead of N processes potentially running concurrently, we now have (at most) one `process-cron-queue` script running at any given time, executing jobs sequentially.
The RAM spike from 30 processes is gone, replaced by the memory footprint of a single orchestrator process plus the memory needed for *one* specific job at a time.

### How It Works: A Simplified Walkthrough

**Step 1: The Scheduler (`schedule-daily`)**

This is a simple script, typically a Python or Node.js script, that defines our daily cron manifest.
It looks something like this:

```python
# /app/cron_orchestrator/schedule_daily.py
import redis
import json

r = redis.Redis(host='localhost', port=6379, db=0)

def enqueue_job(job_name, script_path, args=None):
    job_payload = {
        'name': job_name,
        'script': script_path,
        'args': args if args is not None else [],
        'status': 'pending'
    }
    r.lpush('cron_queue', json.dumps(job_payload))
    print(f"Enqueued job: {job_name}")

if __name__ == '__main__':
    # Clear the queue from any previous incomplete jobs (optional, for safety)
    # r.delete('cron_queue')

    # Enqueue daily jobs in desired order
    enqueue_job('Data Sync', '/app/jobs/daily_data_sync.py')
    enqueue_job('Search Index Rebuild', '/app/jobs/rebuild_search_index.py')
    enqueue_job('Sitemap Generation', '/app/jobs/seo_sitemap_gen.py')
    enqueue_job('Cache Purge', '/app/jobs/purge_old_cache.py', args=['--full'])
    # ... many more jobs
    print("All daily jobs enqueued.")
```

The `crontab` entry for this script is simple:

`30 0 * * * /usr/local/bin/python3 /app/cron_orchestrator/schedule_daily.py >> /var/log/cron_scheduler.log 2>&1`

**Step 2: The Processor (`process-cron-queue`)**

This is the workhorse.
Cron starts it once a minute. Each run first tries to acquire a lock (held in Redis itself); if it succeeds, it pops the next job off the queue, executes it, and then releases the lock.
If a job fails, it can be marked for retry or moved to a dead-letter queue.

```python
# /app/cron_orchestrator/process_cron_queue.py
import redis
import json
import subprocess
import time

r = redis.Redis(host='localhost', port=6379, db=0)

LOCK_KEY = 'cron_processor_lock'
LOCK_TIMEOUT = 3600  # 1 hour, adjust based on max job runtime

def acquire_lock():
    # Try to acquire a lock. Set if not exists (NX) with an expiry (EX)
    return r.set(LOCK_KEY, 'locked', nx=True, ex=LOCK_TIMEOUT)

def release_lock():
    r.delete(LOCK_KEY)

if __name__ == '__main__':
    if acquire_lock():
        try:
            job_payload_str = r.rpop('cron_queue')  # Get job from the right (FIFO)
            if job_payload_str:
                job = json.loads(job_payload_str)
                print(f"Processing job: {job['name']}")
                try:
                    cmd = ['/usr/local/bin/python3', job['script']] + job['args']
                    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
                    print(f"Job {job['name']} completed successfully.\nSTDOUT: {result.stdout}\nSTDERR: {result.stderr}")
                    # Optionally, log job completion to a separate Redis list or DB
                except subprocess.CalledProcessError as e:
                    print(f"Job {job['name']} failed!\nSTDOUT: {e.stdout}\nSTDERR: {e.stderr}")
                    # Re-enqueue or move to dead-letter queue
                except Exception as e:
                    print(f"An unexpected error occurred for {job['name']}: {e}")
            else:
                print("No jobs in queue.")
        finally:
            release_lock()
    else:
        print("Processor already running or lock not acquired.")
```

The `crontab` entry for the processor:

`* * * * * /usr/local/bin/python3 /app/cron_orchestrator/process_cron_queue.py >> /var/log/cron_processor.log 2>&1`

This `process-cron-queue` runs every minute.
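
The processor's `except` branch leaves failure handling as a comment ("Re-enqueue or move to dead-letter queue"). One way that step might look is sketched below. It is not our production code: the `attempts` field, the `cron_dead_letter` key, and the retry budget of 3 are all hypothetical names and values chosen for illustration. The function takes the Redis client as a parameter and only relies on `lpush`.

```python
import json

QUEUE_KEY = "cron_queue"              # same list the scheduler pushes to
DEAD_LETTER_KEY = "cron_dead_letter"  # hypothetical key name
MAX_ATTEMPTS = 3                      # hypothetical retry budget


def handle_failure(r, job, max_attempts=MAX_ATTEMPTS):
    """Requeue a failed job, or park it once its retries are used up.

    `r` is expected to be a redis.Redis client (only lpush is called).
    Returns the key the job payload was pushed to.
    """
    attempts = job.get("attempts", 0) + 1
    updated = {**job, "attempts": attempts}
    if attempts < max_attempts:
        # LPUSH puts the job at the back of the line, since the
        # processor consumes from the other end with RPOP.
        updated["status"] = "pending"
        target = QUEUE_KEY
    else:
        updated["status"] = "failed"
        target = DEAD_LETTER_KEY
    r.lpush(target, json.dumps(updated))
    return target
```

In the processor, this would be called from the `except subprocess.CalledProcessError` branch as `handle_failure(r, job)`. Retried jobs go to the back of the queue, so one flaky job does not hold up the rest of the night's work.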
If an invocation acquires the lock, it processes *one* job. If a job takes longer than a minute, the next minute's `process-cron-queue` invocation will find the lock already held and simply exit, leaving the queue untouched until the running job finishes. This ensures sequential execution, as long as no job outlives `LOCK_TIMEOUT`, and prevents resource contention.

## The Payoff: RAM, Cost, and Sanity Savings

The impact on our 8GB VPS was immediate and profound.
Instead of peak RAM hitting 7GB+ and triggering OOM, our peak usage now rarely exceeds 4GB during cron execution windows. This is because only one resource-intensive job runs at a time. The `process-cron-queue` script itself is lightweight, typically under 50MB of RAM.

**Specifics:**

* **RAM Savings:** We reduced peak RAM usage during cron windows by an estimated 3-4GB. This means our 8GB VPS (costing ~$40/month) is now comfortably handling our load, where before we were teetering on the edge of needing a 16GB VPS (costing ~$80/month or more). That's a direct operational saving of at least $480/year.
* **Cost Savings:** No need to upgrade our VPS for cron-related memory spikes. We avoid the complexity and cost of distributed cron solutions like Airflow or custom Kubernetes operators for this specific problem, which would be overkill for our current scale and add significant operational overhead.
* **Reliability:** No more OOM kills. Jobs complete reliably, one after another. If a job fails, we get a clear log entry, and it doesn't block other jobs indefinitely (unless designed to).
* **Operational Simplicity:** Debugging is easier. If a job fails, it's isolated. We can easily see what's in the queue or what's currently running by inspecting Redis.
* **Contrarian Point:** In an era where every problem seems to warrant a distributed system, this approach demonstrates that for many common operational tasks, a single, well-orchestrated machine is often the most efficient and robust solution. Over-engineering with microservices or complex orchestrators for simple cron jobs can introduce more points of failure and higher operational costs than the problem it solves.

This pattern has been critical for IntellDirectories' data consistency and operational stability.
It allows us to grow our scheduled tasks without growing our infrastructure linearly, keeping our costs lean and our systems robust. If you're building a data-heavy application and finding your cron jobs are becoming a bottleneck on a single server, consider this simple yet powerful orchestration pattern.
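
As a footnote to the point about inspecting Redis: a small helper along these lines could report what is waiting and whether a processor run is active. It is a sketch that assumes the `cron_queue` and `cron_processor_lock` keys from the scripts above; the function name `queue_snapshot` and the shape of its return value are made up for illustration. Note that with the processor as written, the lock only tells you *that* something is running, not *which* job, because the payload has already been popped off the list.

```python
import json

QUEUE_KEY = "cron_queue"          # list used by the scheduler and processor
LOCK_KEY = "cron_processor_lock"  # lock key used by the processor


def queue_snapshot(r):
    """Summarize pending jobs and the state of the processor lock.

    `r` is expected to be a redis.Redis client; only lrange and ttl are called.
    """
    raw = r.lrange(QUEUE_KEY, 0, -1)
    # Jobs are LPUSHed and RPOPped, so the tail of the list runs next.
    pending = [json.loads(item)["name"] for item in reversed(raw)]
    ttl = r.ttl(LOCK_KEY)  # Redis reports -2 when the key does not exist
    return {
        "pending_in_run_order": pending,
        "processor_active": ttl != -2,
        "lock_ttl_seconds": ttl,
    }
```

Called as `queue_snapshot(redis.Redis(host='localhost', port=6379, db=0))`, it returns a plain dict that is easy to print or feed into a health check.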