Hello, I migrated my backend to Koyeb two days ago. Today, I noticed some suspicious behavior. Could you please help me figure out what might be causing it? The instances suddenly started restarting for no apparent reason. I can’t see any errors in the logs or any unusual CPU load. Additionally, the metrics show a gap in the CPU graph, even though the instances were running at that time. Could you please take a look at this? I’ve attached screenshots.
Hello @Michal_Szymanowski, it seems that for some reason your post was marked as spam, and I’ve only just noticed it.
Is this still an issue?
Hi @Lukasz_Oles
Thanks for reaching out!
So the situation is a bit mixed. After I posted this, Bastien reached out to me in a private chat and explained that those initial restarts in my screenshots were actually caused by a platform incident (https://status.koyeb.com/cmfb1f2iw000xi0nw0gst3v7m). That part makes sense and was resolved.
But here’s the thing: I’m still seeing instability, and it’s really puzzling. Everything ran perfectly for 5 days straight with zero issues, and then today I got hit with multiple health check failures and restarts again. The backend load hasn’t changed at all today. If anything, it was higher yesterday, and there were no problems whatsoever.
My app is a high-availability API that uses headless Chrome through Playwright. The health checks are super simple: they just launch Chrome and close it. I recently migrated from traditional containers, where this setup didn’t have these kinds of issues, so I’m still getting used to the microVM architecture and wondering if headless Chrome might need some specific tuning for this environment.
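A check of this kind could be sketched roughly as follows. This is only an illustration in Python, not the actual production code: the names `playwright_probe` and `timed_probe`, the 10-second timeout, and the shape of the returned dictionary are all assumptions. The idea is to wrap the launch-and-close step in a timeout and record how long it took, so that a failed health check can be told apart from a slow Chrome startup.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


async def playwright_probe() -> None:
    """Launch headless Chromium and close it again (the 'simple' check)."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.async_api import async_playwright

    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        await browser.close()


async def timed_probe(
    probe: Callable[[], Awaitable[None]],
    timeout_s: float = 10.0,  # assumed value, tune to the platform's check timeout
) -> dict:
    """Run a probe under a timeout and report success, duration, and error."""
    started = time.monotonic()
    error: Optional[str] = None
    try:
        await asyncio.wait_for(probe(), timeout=timeout_s)
    except asyncio.TimeoutError:
        error = f"timed out after {timeout_s}s"
    except Exception as exc:  # any launch or close failure counts as unhealthy
        error = repr(exc)
    return {
        "ok": error is None,
        "elapsed_s": round(time.monotonic() - started, 3),
        "error": error,
    }

# Hypothetical usage inside a health endpoint:
#     result = await timed_probe(playwright_probe)
```

Logging `elapsed_s` on every check would show whether the restarts line up with Chrome taking unusually long to start, rather than with an outright failure.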
I’ve tried scaling up to bigger instances, thinking it might be a resource issue, but that didn’t help. Looking at the metrics, CPU and memory usage look fine, so it doesn’t seem like a leak or resource exhaustion. I’m currently running in Frankfurt.
I’m really curious if this is just how Chrome headless behaves on microVMs and if there are any specific optimizations I should consider. Has anyone else run into similar issues with Playwright workloads? Any tips or best practices from the team?
I’m experimenting with different regions and Playwright configs, but honestly the inconsistent pattern makes me think I need to better understand how to optimize for this architecture.
Any insights would be super helpful!

