Server Alerts Going Crazy — AI Fixed It in 8 Minutes, No Code

We're reporting from a real experiment in our lab, not theory, and letting the evidence speak for itself.

Pain point: Systems tend to break when nobody is watching. By the time anyone notices, the damage is already done.

Stakes: A silent outage is every team's nightmare, because the damage grows with every minute it goes unnoticed.

What we did in the lab: Woke up to alerts flooding 3 channels: server overload, 5 broken workflows, 20 containers fighting for resources. AI diagnosed, analyzed, and fixed everything in 8 minutes without writing a single line of code.

Best for: Business owners / Team leads / Managers who manage servers but aren't developers | Read time: 8 min | Level: Beginner to Intermediate | Category: Behind-the-Scenes

"6 AM. Phone buzzing non-stop. Notifications flooding 3 channels — Bot Admin, Bot Dev, Bot Management, all going off at once."

Normally this would cause a mini heart attack. But not today — because AI had it covered.

March 8, 2026, 6 AM — phone screen lit up with a barrage of messages from OpenClaw's system Bot, firing alerts one after another.

"Workflow Error Alert: Node 'Check n8n' hasn't been executed"

"n8n system error — API unreachable"

Total active workflows: zero. The message basically said the server might be down.

So what would most people do? Call a developer? Wait for the IT team to clock in? Sit there refreshing the page over and over?

Nope. Opened Cursor and told AI to investigate.

20 containers on server

5 broken workflows

330 alerts cleared

8 min total fix time

What's Actually Running on This Server?

This server runs multiple systems simultaneously — OpenClaw (automated workflow system), Godseye (AI Trading), Content Thailand website, EA-AI system, BOI mainweb, and more. That's 20 Docker containers on a 2-Core CPU server with 15GB RAM.

Sounds like a lot? It absolutely is. But it had been running smoothly all along — until today.

What the Bot reported across 3 channels simultaneously:

Bot Admin — "Container Health Check failed"
Bot Dev — "Workflow #29, #45 error repeating every 6 hours"
Bot Management — "n8n API unresponsive, check immediately"

Told AI to Investigate — What Did It Find?

The instruction to Cursor AI was simple: "Check the actual server status — is there a real problem or is this just fallout from an n8n restart?"

AI SSH'd into the server and pulled CPU, RAM, Disk, and Docker container data within 10 seconds.

The results — not great:

Metric	Normal Range	Actual Value	Status
CPU Load	Below 2.0	6.38	3x over limit
RAM	Under 80%	9.7GB / 15GB	65% (in range, but swap is nearly full)
Swap	Under 50%	3.7GB / 3.8GB	97% — nearly maxed out!
Disk	Under 80%	162GB / 194GB	88%

Swap at 97% means the server was barely responding: it was constantly reading from and writing to disk in place of RAM, which is far slower. Imagine opening 20 programs on your computer at once when you don't have enough RAM.
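
To make the "Normal Range" column concrete, here is a minimal sketch of the comparison, using the figures from the table above. The `Reading` class and `over_limit` function are illustrative names, not part of the tooling used in the lab.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    name: str
    value: float  # measured value
    limit: float  # upper bound taken from the "Normal Range" column


def over_limit(readings):
    """Return the names of readings whose value exceeds its limit."""
    return [r.name for r in readings if r.value > r.limit]


# Values come from the table above; RAM and swap percentages are
# computed from the used/total figures shown there.
readings = [
    Reading("cpu_load", 6.38, 2.0),
    Reading("ram_pct", 100 * 9.7 / 15, 80.0),
    Reading("swap_pct", 100 * 3.7 / 3.8, 50.0),
    Reading("disk_pct", 88.0, 80.0),
]

print(over_limit(readings))  # ['cpu_load', 'swap_pct', 'disk_pct']
```

RAM alone passes the check, which is why the swap figure is the one that explains the slowdown.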

6 AM — server alerts flooding in, AI analyzed and fixed everything in 8 minutes

Take a breath — before the deep dive

How Did AI Track Down the Troublemaker Containers?

AI didn't just say "server is overloaded" and call it a day — it analyzed every single container and broke down the findings clearly.

Container	Problem	Impact
godseye-timescaledb	CPU 52% + RAM 85%	Resource hog #1
lark-mcp	unhealthy + RAM 71%	Eating 1GB RAM but 0% CPU
contentthailand-nginx	Restart loop	Wasting CPU every second
duplicati	CPU 26%	Backup running
egp-solver	unhealthy	Not eating resources but broken

Key insight from AI: "contentthailand-nginx-1 is stuck in a restart loop because it can't find upstream web:3000 — it keeps restarting itself every few seconds, burning CPU for nothing." — This is something you'd never catch just eyeballing the Docker dashboard.
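
A restart loop like this can also be spotted programmatically. The sketch below assumes container data shaped like `docker inspect` output, which reports a top-level `RestartCount` and a `State.Restarting` flag. The sample records and the `min_restarts` threshold are hypothetical, not captured from the server.

```python
def find_restart_loops(containers, min_restarts=3):
    """Flag containers that look stuck in a restart loop.

    `containers` is a list of dicts shaped like `docker inspect` output.
    """
    flagged = []
    for c in containers:
        restarting = c.get("State", {}).get("Restarting", False)
        if restarting or c.get("RestartCount", 0) >= min_restarts:
            # docker inspect prefixes container names with "/"
            flagged.append(c.get("Name", "<unknown>").lstrip("/"))
    return flagged


# Hypothetical sample data for illustration only.
sample = [
    {"Name": "/contentthailand-nginx-1", "RestartCount": 41,
     "State": {"Restarting": True}},
    {"Name": "/duplicati", "RestartCount": 0,
     "State": {"Restarting": False}},
]

print(find_restart_loops(sample))  # ['contentthailand-nginx-1']
```

In practice the input could come from parsing the JSON that `docker inspect` prints for the running containers.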

Why Wasn't Shutting Down Containers an Option?

AI's first suggestion was "shut down unnecessary containers." It displayed all 20 containers sorted into 5 groups and asked "which ones can be turned off?"

The answer: "All 5 groups are in use. Can't shut any of them down."

The remaining option? Upgrade the server — which means spending money.

Option A: One Beefy Server

8 Core CPU, 32GB RAM, 500GB SSD
Cost ~$110-170/month
Simple, but if it goes down = everything goes down

Option B: Split Into 2 Servers ✅

Server 1: OpenClaw + n8n (4 Core, 16GB)
Server 2: Godseye + other websites (4 Core, 16GB)
Cost ~$110-170/month (combined)
More stable — one goes down, the other keeps running

But AI didn't stop there — it said "While waiting for the new server, here are 3 things that can be done right now — no cost, no service downtime."

This is where AI really shines — it doesn't just say "buy a new server" and move on. It finds immediate workarounds that can be done on the spot.

No code written — AI generated the commands, just copy-paste

8 Minutes. 4 Problems. All Fixed.

AI diagnosed and resolved what would take a human team hours

What Did AI Fix in Just 8 Minutes?

Minute 0-1 ⏱️

AI SSH'd into server, pulled full diagnostics

Checked CPU, RAM, Disk, Docker stats for all 20 containers simultaneously — results in 10 seconds.

Minute 1-2 ⏱️

Analyzed the problem + proposed solutions

AI grouped 20 containers into 5 categories, pinpointed the troublemakers, and proposed 3 immediate fixes.

Minute 2-4 ⏱️

Applied 3 fixes simultaneously

1) Stopped the nginx restart loop
2) Paused duplicati backup temporarily
3) Throttled timescaledb's CPU limit in steps: 0.5 → 0.3 → 0.15 cores
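
The article does not show the exact commands AI ran, so the following is only a sketch of what the three fixes could look like with standard Docker CLI subcommands (`stop`, `pause`, `update --cpus`). The container names are taken from the tables above and may differ on the real server. Using `docker pause` for the backup is an assumption; stopping the container would be another way to do it.

```python
import shlex


def immediate_fix_commands(nginx="contentthailand-nginx-1",
                           backup="duplicati",
                           database="godseye-timescaledb",
                           cpus=0.15):
    """Build (but do not run) Docker CLI commands for the three fixes."""
    return [
        ["docker", "stop", nginx],      # end the restart loop
        ["docker", "pause", backup],    # freeze the backup for now
        ["docker", "update", "--cpus", str(cpus), database],  # lower the CPU cap
    ]


for cmd in immediate_fix_commands():
    print(shlex.join(cmd))
```

Building the commands as lists keeps them easy to review before anything is executed. A paused container can be resumed later with `docker unpause`.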

Minute 4-5 ⏱️

Capped lark-mcp RAM usage

Reduced the memory limit from 1.5GB to 768MB, since the container was holding about 1GB of RAM while using 0% CPU. This freed up RAM for other containers.
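
As a rough illustration, a memory cap like this can be expressed with `docker update --memory`. The helper name is hypothetical, and setting `--memory-swap` to the same value (so the container gets no swap) is an assumption that seems reasonable with swap at 97%, not something the article confirms.

```python
def memory_cap_command(container, limit_mb):
    """Build a `docker update` command that caps a container's memory.

    --memory-swap is set equal to --memory so the container cannot
    spill past the cap into swap.
    """
    if limit_mb <= 0:
        raise ValueError("limit_mb must be positive")
    size = f"{limit_mb}m"
    return ["docker", "update", "--memory", size, "--memory-swap", size, container]


print(" ".join(memory_cap_command("lark-mcp", 768)))
# docker update --memory 768m --memory-swap 768m lark-mcp
```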

Minute 5-7 ⏱️

Fixed 5 buggy workflows

Workflow #13 SSL Check, #14 API Health, #15 Web Health, #16 Third Party, #17 Morning Leave: all had the same bug. There was no guard for empty values, so an empty result raised an error, and each error triggered another round of alert spam.
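
The workflows' actual code is not shown, so this is only a sketch of the guard pattern described above, written as plain Python. The `failed_checks` function and the `ok` field are hypothetical.

```python
def failed_checks(results):
    """Return the checks that failed, tolerating empty or missing input.

    Without the guard, a None input would raise TypeError in the list
    comprehension below, and that error is what would surface as an alert.
    """
    if not results:  # guard: nothing to evaluate, so nothing to alert on
        return []
    return [r for r in results if r.get("ok") is False]


print(failed_checks(None))  # []
print(failed_checks([]))    # []
print(failed_checks([{"name": "api", "ok": False}, {"name": "web", "ok": True}]))
# [{'name': 'api', 'ok': False}]
```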

Minute 7-8 ⏱️

Cleared 330 old alerts + verified results

Deleted old alert messages flooding the Bot Dev channel across 3 batches — 330 messages total. Then verified server load had actually dropped.

What Were the Results Before vs After?

Metric	Before Fix	After Fix	Change
CPU Load	6.38	3.5	Down 45%
Free RAM	622MB	1.5GB	Up 140%
timescaledb CPU	52%	15%	Down 71%
lark-mcp RAM	1GB	513MB	Down 49%
Broken Workflows	5	0	100% fixed
Pending Alerts	330 messages	0	All cleared
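
The "Change" column can be checked with a few lines of arithmetic. This sketch assumes 1GB = 1024MB for lark-mcp and reads 1.5GB of free RAM as 1500MB. Both unit conversions are assumptions.

```python
def pct_change(before, after):
    """Signed percentage change from before to after."""
    return (after - before) / before * 100


rows = {
    "CPU load": (6.38, 3.5),
    "timescaledb CPU %": (52, 15),
    "lark-mcp RAM (MB)": (1024, 513),  # assumes 1GB = 1024MB
    "Free RAM (MB)": (622, 1500),      # assumes 1.5GB = 1500MB
}

for name, (before, after) in rows.items():
    print(f"{name}: {pct_change(before, after):+.1f}%")
# CPU load: -45.1%
# timescaledb CPU %: -71.2%
# lark-mcp RAM (MB): -49.9%
# Free RAM (MB): +141.2%
```

The results line up with the rounded figures in the table. Reading 1.5GB as 1536MB would put the free RAM gain closer to 147%.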

Server back to normal — every container still running, nothing shut down. Just smarter resource management.

Lesson: AI excels at infrastructure because it sees the full picture instantly

Why Does AI Outperform Devs at This Specific Job?

Not saying AI is better than developers at everything. But when it comes to server diagnostics + troubleshooting under time pressure — AI wins hands down.

Without AI

Call a dev at 6 AM — probably still asleep
Dev SSH's in, starts looking around = 10-15 min
Analyze 20 containers = 30+ min
Fix things one by one, trial and error = 1-2 hours
Total: 2-3 hours (if the dev even answers)

With AI ✅

Tell AI to check = instant
AI pulls all diagnostics = 10 seconds
Analyze 20 containers = 30 seconds
Propose fixes + execute = 5 minutes
Total: 8 minutes from eyes open to done

What AI does that humans struggle with: It can scan for repeated patterns across every workflow in one pass. The first fix only covered 2 workflows that showed up in alerts — but AI flagged "hold on, there are 3 more with the exact same bug" and fixed all 5. A human would typically wait for those other 3 to break before discovering the issue.
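
The idea of scanning every workflow for the same missing guard can be sketched as a simple pattern search. Everything here is hypothetical: the workflow names, the code snippets, and the regular expression stand in for whatever the real workflows contain.

```python
import re


def workflows_missing_guard(code_by_workflow, guard_pattern):
    """Return the names of workflows whose code does not match the guard regex."""
    guard = re.compile(guard_pattern)
    return sorted(name for name, code in code_by_workflow.items()
                  if not guard.search(code))


# Hypothetical snippets standing in for each workflow's code step.
code_by_workflow = {
    "wf-a": "return items.map(i => i.json.days_left)",
    "wf-b": "if (!items.length) return []; return items",
    "wf-c": "return items.filter(i => i.json.status !== 200)",
}

print(workflows_missing_guard(code_by_workflow, r"if\s*\(\s*!items\.length\s*\)"))
# ['wf-a', 'wf-c']
```

A text search like this only finds candidates. Each flagged workflow still needs a human or AI review before it is changed.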

After it was all done, the message to the team was:

"This is the future — the playbook for a super team's post-sales support."

One person plus AI = the output of a 3-4 person team. Problem solved before the rest of the team even woke up.

Want to Do This?

If you manage a server or have systems running in production — AI can help monitor and fix issues right now. No dev skills required.

Set Up a Monitoring Bot ⏱️ 30 min

Use n8n (free, self-hosted) to create a workflow that checks server health every 6 hours. Send alerts via LINE/Slack/Lark when something goes wrong.
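
n8n would handle the scheduling and the alert delivery. As a rough sketch of the health data such a workflow might collect, here is a small standard-library script that a command step could call. It assumes a Linux host with `/proc/meminfo`, and all function names are illustrative.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swap_used_pct(info):
    """Percentage of swap in use, or 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100 * (total - info.get("SwapFree", 0)) / total


def disk_used_pct(path="/"):
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def collect():
    """Gather a small health snapshot (Linux only)."""
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    return {
        "load_1m": os.getloadavg()[0],
        "swap_pct": round(swap_used_pct(info), 1),
        "disk_pct": round(disk_used_pct(), 1),
    }
```

Calling `collect()` returns a dictionary that can be compared against thresholds such as load below 2.0, swap under 50%, and disk under 80%.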

Give AI SSH Access to Your Server ⏱️ 10 min

Set up SSH keys so Cursor / Claude Code can access your server. This lets AI investigate and fix issues on the spot.

Tell AI What's Wrong in Plain English ⏱️ 1 min

No need to memorize commands — just describe what happened and let AI handle the rest.

Ready-to-use prompt — just paste into Cursor

Got a bot alert saying the server has issues {{paste screenshot or alert message}}

Check the actual server status:
- CPU, RAM, Disk usage
- All Docker containers (status + resource usage)
- If there's a problem, suggest a fix with reasoning
- If the fix looks safe, go ahead and apply it

Server: ssh {{user}}@{{ip}}

Prompt for fixing repeated workflow bugs

Scan all n8n workflows for the same pattern as the bug just fixed
- Check which workflows have the same issue
- Fix all of them, not just the one that errored
- Summarize what was changed, before/after

Token-saving tip: Paste a screenshot of the alert directly instead of typing out the description — AI can read images. Saves both time and tokens.

Why Does This Matter?

No dev background. Never written a Docker command manually. Didn't even know what timescaledb was before this.

But a server crisis with 20 containers got resolved in 8 minutes, because AI already knew all of that and the person at the keyboard didn't have to.

This is what's changed — you don't have to "know" everything. You just have to "ask" the right way.

And Cursor makes asking as easy as typing a chat message.

Today the server is running smoothly. The team didn't have to wake up to fix anything. Clients didn't even know something went wrong.

8 minutes. From crisis to normal.

Want to Start Using AI for Server Management?

Check out the complete Cursor guide — let AI build everything, no coding required.

Start from zero. No dev experience needed.

What Are the Most Common Questions?

Can AI actually SSH into a server? Is that safe?

Cursor AI uses the SSH keys already set up on your machine. It doesn't store passwords in the cloud — it runs locally on your computer, just like you typing commands yourself. Every command AI wants to run gets shown first. You hit approve before anything executes.

How much does Cursor + AI cost?

Cursor Pro is $20/month. That gets you unlimited Claude Sonnet plus Opus for complex tasks. Compare that to hiring a dev for a one-time server fix at $50-150 — it pays for itself in the first month.

What if AI makes a mistake? Could it break the server?

Every command AI proposes needs your approval before it runs. If something looks sketchy, just ask AI "what would this command actually do?" — always review before approving. Nothing runs without explicit permission.

Can a 2-Core server really handle 20 containers?

Yes, but only with proper resource limits — set CPU and memory limits for each container via Docker. Without limits, containers fight for resources until something crashes. Long-term, splitting servers by workload is the way to go.

Can ChatGPT do this instead of Cursor?

ChatGPT can tell you how to fix things, but it can't actually do it — you'd have to copy commands and paste them yourself. Cursor has a built-in terminal with SSH, so it executes commands directly. One window, ask and done.

Last updated: March 8, 2026 | Written by: Hi Logic Labs | Tools used: Cursor + Claude Opus

#AI #ServerManagement #Cursor #Docker #n8n #VibeCoding #AIForBusiness #BehindTheScenes #Hi Logic Labs #NonDeveloper #AIOperations

Takeaways and Principles

Real results you can put to use, not just ideas. Our principle is to build systems that are repeatable and don't depend on anyone's memory.

Want to see a system like this working on your own tasks? Check out ViberQC and sign up for the trial waitlist at hilogiclabs.com
