Unexpected Instance Restarts and Monitoring Gap

Michal_Szymanowski · September 16, 2025, 11:26am

Hello, I migrated my backend to Koyeb two days ago, and today I noticed some suspicious behavior: the instances suddenly started restarting for no apparent reason. I can’t see any errors or unusual CPU load in the logs. Additionally, the CPU metrics show a gap even though the instances were running at that time. Could you please take a look and help me figure out what might be causing this? I’ve attached screenshots.

Lukasz_Oles · September 16, 2025, 11:37am

Hello @Michal_Szymanowski, it seems your post was marked as spam for some reason, and I’ve only just noticed.
Is this still an issue?

Michal_Szymanowski · September 16, 2025, 1:05pm

Hi @Lukasz_Oles

Thanks for reaching out!

So the situation is a bit mixed. After I posted this, Bastien reached out to me in a private chat and explained that those initial restarts in my screenshots were actually caused by a platform incident (https://status.koyeb.com/cmfb1f2iw000xi0nw0gst3v7m). That part makes sense and was resolved.

But here’s the thing: I’m still seeing instability, and it’s really puzzling. Everything ran perfectly for 5 days straight with zero issues, and then today I got hit with multiple health check failures and restarts again. The backend load today hasn’t changed at all; in fact, yesterday it was even higher and there were no problems whatsoever.

My app is a high-availability API that uses headless Chrome through Playwright. The health checks are super simple: they just launch Chrome and close it. I recently migrated from traditional containers, where this setup didn’t have these kinds of issues, so I’m still getting used to the microVM architecture and wondering whether headless Chrome needs some specific tuning for this environment.
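For reference, a launch-and-close health check like the one described above can be bounded with a timeout, so that a hung Chrome launch fails fast instead of stalling the check until the platform restarts the instance. This is a minimal sketch, assuming Playwright's async Python API; the names `chrome_health_check` and `ClosableBrowser` and the 10-second default are hypothetical, not part of the original setup. The launcher is passed in as a function so the check can be exercised without a real browser.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class ClosableBrowser(Protocol):
    """Anything with an async close(), e.g. a Playwright Browser."""

    async def close(self) -> None: ...


async def chrome_health_check(
    launch: Callable[[], Awaitable[ClosableBrowser]],
    timeout_s: float = 10.0,
) -> bool:
    """Return True if a browser launches and closes within timeout_s seconds."""

    async def launch_and_close() -> None:
        browser = await launch()
        # Close immediately; a failing close is treated as unhealthy too.
        await browser.close()

    try:
        # One deadline covers both launch and close.
        await asyncio.wait_for(launch_and_close(), timeout=timeout_s)
        return True
    except Exception:
        # Covers launch/close errors and timeouts (asyncio.TimeoutError).
        return False


async def main() -> None:
    # Hypothetical wiring with Playwright's async API; requires
    # `pip install playwright` and `playwright install chromium`.
    # Run with asyncio.run(main()).
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        ok = await chrome_health_check(lambda: p.chromium.launch(headless=True))
        print("healthy" if ok else "unhealthy")
```

Returning a boolean rather than raising keeps the health endpoint itself simple: it can map `False` to a non-200 response, and the timeout makes a slow launch show up as a failed check rather than a stuck one.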

I’ve tried scaling up to bigger instances, thinking it might be a resource issue, but that didn’t help. In the metrics, CPU and memory usage look fine, so it doesn’t seem like a leak or resource exhaustion. I’m currently running in Frankfurt.

I’m really curious if this is just how Chrome headless behaves on microVMs and if there are any specific optimizations I should consider. Has anyone else run into similar issues with Playwright workloads? Any tips or best practices from the team?

I’m experimenting with different regions and Playwright configs, but honestly the inconsistent pattern makes me think I need to better understand how to optimize for this architecture.
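When experimenting with Playwright configs, one commonly used starting point for headless Chromium in containers and VMs is to pass flags that avoid relying on a small `/dev/shm` and skip GPU setup. The sketch below only builds the keyword arguments for Playwright's `chromium.launch()`; whether these flags help on Koyeb's microVMs is an assumption to test, not something confirmed in this thread. The helper `chromium_launch_options` and the constant `DEFAULT_CHROMIUM_ARGS` are hypothetical names.

```python
from typing import Any, Dict, List, Optional

# Chromium flags commonly used for headless runs in containers. Whether they
# help on Koyeb microVMs is an assumption to test, not a known fix.
DEFAULT_CHROMIUM_ARGS = [
    # Create shared-memory files in a temp directory instead of /dev/shm,
    # which can be too small in some container and VM environments.
    "--disable-dev-shm-usage",
    # Skip GPU initialisation on a server with no GPU.
    "--disable-gpu",
]


def chromium_launch_options(
    launch_timeout_ms: int = 30_000,
    extra_args: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for Playwright's chromium.launch()."""
    args = list(DEFAULT_CHROMIUM_ARGS)
    for arg in extra_args or []:
        if arg not in args:  # avoid passing the same flag twice
            args.append(arg)
    return {
        "headless": True,
        "args": args,
        # Playwright's launch timeout is expressed in milliseconds.
        "timeout": launch_timeout_ms,
    }
```

Inside an `async_playwright()` context this would be used as `browser = await p.chromium.launch(**chromium_launch_options())`. Keeping the options in one function makes it easier to compare flag sets across regions or instance sizes without touching the rest of the code.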

Any insights would be super helpful!
