Health check failing for application startup

Hey, I’ve been using Koyeb for the past 3 months to deploy my web service.

Over the past few weeks, I’ve gradually upgraded the instance size for my deployments as the load on the machine has increased.

My latest deployment has been failing consistently since yesterday. It tells me the TCP health check on port 8000 failed.

However, I’ve already made sure of the following:

  1. All required environment variables are provided.
  2. The app works perfectly fine on my local machine.
  3. I was on the medium instance size, which was more than enough to run my current application.

I then read in a previous discussion here that the issue was the instance size. However, when I inspected my metrics, I found I’m not even using 25% of my instance’s memory. I still upgraded to the bigger instance size, though, and it’s still failing the health check.

I even tried switching to an HTTP health check, but that didn’t work either.

I also tried increasing the grace period for the health check to return. That didn’t work either.

I’m not sure what to try next.

Can anyone help?

For reference, this is the message that I am seeing:

INFO: Started server process [1]
INFO: Waiting for application startup.
TCP health check failed on port 8000. Retrying…

Hello @Ahmad_Elsaeed,

Are you sure that it listens on port 8000?

When started locally, can you check on which port uvicorn is listening?

Run something like:

docker exec <container_name_or_id> ss -tnl | grep uv

Yes, it runs on port 8000. It has been for the past 3 months, and nothing changed.

I can try to run that command though.

Yes, please do it.
You can also compare the output with the currently running deployment.
Updated command is:

ss -ltnp

I forgot the p flag.

I ran that command, and yes, it is listening on port 8000.

Hey @Lukasz_Oles, I appreciate your help :). So I ran a profiler on the app’s startup process on my local machine, and peak memory usage is around 447 MB. Nothing my instance shouldn’t be able to handle, right?
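For anyone who wants a quick ballpark check like this, something along these lines works (a rough sketch only: the import path is a placeholder, and it assumes startup-event-style handlers):

import asyncio
import resource
import sys

from myapp.main import app  # placeholder: your own FastAPI app module

async def main() -> None:
    # Run the FastAPI startup handlers (assumes on_event("startup") style)
    await app.router.startup()
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is kilobytes on Linux, bytes on macOS
    mb = peak / 1024 if sys.platform.startswith("linux") else peak / (1024 ** 2)
    print(f"peak RSS after startup: {mb:.0f} MB")

asyncio.run(main())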

I initially had an instance with 2 GB of RAM, and I’ve now upgraded to 4 GB. Still the same issue.

For reference, my app is a Python FastAPI app.

Edit: I need someone from the team to help me with this as it is a time-sensitive issue. We’re trying to roll out a new update of our service and we haven’t been able to because of this…

Edit 2: I suspect something odd is going on here. I’m monitoring the metrics while deploying the new instance and watching CPU usage. On the medium instance I was getting up to 75% CPU usage. I upgraded to the large (double the size of the medium) and it was still at 62%. I then upgraded to the XL because, according to the troubleshooting docs, the percentage shouldn’t exceed 50%, but the XL is showing 99% CPU usage. What could it be?

What does it do during startup? Is it trying to connect to something?
What happens before it starts listening on port 8000?
Maybe add some logs to the startup process to see where it hangs?

It looks like something is blocking the app from listening on port 8000, and that’s why the health check is failing.
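For example, something along these lines would show how far startup gets before it stalls (a rough sketch; the logger name and messages are just examples):

import logging
import time

from fastapi import FastAPI

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("startup")

app = FastAPI()

@app.on_event("startup")
async def timed_startup() -> None:
    t0 = time.monotonic()
    log.info("startup: begin")
    # ...call each existing startup step here, logging after each one...
    log.info("startup: done in %.1fs", time.monotonic() - t0)

uvicorn binds the port only after the startup handlers finish, so the last message that appears before the hang points at the guilty step.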

On startup, it creates 5 API endpoints and 9 FastAPI schedulers.

It doesn’t try to connect to anything external.

I added logs between the start of the app’s startup and its completion. It looks like the app runs 6 of the schedulers before startup completes.

That could be where the TCP health check fails, since the schedulers don’t complete before the health check runs. However, those schedulers haven’t changed since my last successful deployment. I’m not sure what the blocker is; I’ve even added more grace period for the TCP health check, and it still hasn’t succeeded.
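Roughly, the pattern looks like this (a simplified sketch; I’m assuming fastapi-utils-style repeat_every tasks here, and the names and intervals are made up):

from fastapi import FastAPI
from fastapi_utils.tasks import repeat_every  # assumed scheduler library

app = FastAPI()

@app.on_event("startup")
@repeat_every(seconds=300)  # default wait_first=False: the first run is scheduled immediately
async def refresh_cache() -> None:  # illustrative name, one of the 9 schedulers
    ...  # heavy work here can run before startup completes

# ...the other schedulers are registered the same way...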

I’ll now deploy with the logs and see whether they complete.

cc: @Lukasz_Oles

Hey @Lukasz_Oles, I appreciate your help. The issue is fixed.

Sharing the solution in case someone wants to refer to it later. Basically, my app is a Python FastAPI app, and it was running the schedulers between the start of the app’s startup and its completion.

In each scheduler’s decorator, I just added a parameter called “wait_first” that waits one interval before the first run, giving the app enough room to complete startup before the schedulers kick in.

That fixed my issue without upgrading the size of the instance.
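Concretely, the change was one parameter per scheduler (same sketch and illustrative names as above, assuming fastapi-utils’ repeat_every):

@app.on_event("startup")
@repeat_every(seconds=300, wait_first=True)  # sleep one interval before the first run
async def refresh_cache() -> None:
    ...  # scheduled work, unchanged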


@Ahmad_Elsaeed Thank you for sharing the solution!