Response times just spiked - memory usage went blank

Eileen_Noonan · September 1, 2023, 10:23pm

I just had about 10 minutes of extremely slow response times - upwards of 5 minutes, though the app didn’t technically crash. During this time, there’s actually a gap in the memory usage in the logs.

I’m curious what could have caused this. I’m using a micro instance on the free tier and I am probably pretty close to maxing that out in terms of memory. I had also run an operation to generate and store a few MB worth of vector embeddings (though the actual inference happens on HF). I was also in the console running du -sh . just to see how big things were.

Are there often fluctuations and outages like this? Would the solution just be to upgrade to a larger instance? Thank you-

~E

Sanna_Jammeh · September 3, 2023, 3:55am

Also seeing this issue.

Eileen_Noonan · September 3, 2023, 4:55am

I think I may have hit my memory limits from doing various operations. I just deployed a change that uses disk storage instead of memory storage and it cut my usage drastically. (I'm using lancedb instead of docarray. The lancedb data is tiny and in git, so I assume it’s not ephemeral.)

The metrics were originally showing I used about 90% of my memory at rest. Now it’s more like 50%. Hopefully that prevents further interruptions but we’ll see.
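To illustrate the general idea of keeping embeddings on disk rather than in process memory, here is a minimal sketch. It is not the lancedb setup described above: it uses Python's built-in sqlite3 module as a stand-in, and the table name, function names, and float32 encoding are all assumptions made for the example.

```python
import sqlite3
from array import array


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a disk-backed embedding store at `path` (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "id TEXT PRIMARY KEY, vec BLOB NOT NULL)"
    )
    return conn


def put(conn: sqlite3.Connection, key: str, vector: list) -> None:
    """Store one vector as packed float32 bytes; nothing stays cached in Python."""
    blob = array("f", vector).tobytes()
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO embeddings (id, vec) VALUES (?, ?)",
            (key, blob),
        )


def get(conn: sqlite3.Connection, key: str):
    """Load a single vector from disk on demand, or None if the key is unknown."""
    row = conn.execute(
        "SELECT vec FROM embeddings WHERE id = ?", (key,)
    ).fetchone()
    if row is None:
        return None
    vec = array("f")
    vec.frombytes(row[0])
    return vec.tolist()
```

With a store like this, only the vectors being read are held in memory at any moment, which is the property that matters on a small instance. A real vector database adds similarity search on top, which this sketch does not attempt.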

yann · September 4, 2023, 8:55pm

That does sound like a silent out-of-memory event. Out-of-memory conditions tend to be hard to troubleshoot because they often leave nothing in the logs.

We have a few ideas for better detecting and warning about out-of-memory conditions. I created a feedback ticket and we will try to move on this shortly.

One thing to keep in mind is that there is no swap on the machines, so if you’re at 90% of RAM usage, the next step is clearly an out-of-memory event.
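For anyone who wants to check headroom from inside the app, a rough sketch follows. It assumes a Linux environment where /proc/meminfo is readable and reflects the instance's own memory (in some container setups it shows the host's figures instead). The function names and the 85% warning threshold are hypothetical choices for the example, not platform values.

```python
from pathlib import Path

WARN_THRESHOLD = 85.0  # hypothetical warning level, in percent of RAM used


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_percent(fields: dict) -> float:
    """Percentage of RAM in use, derived from MemTotal and MemAvailable."""
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def swap_is_absent(fields: dict) -> bool:
    """True when no swap is configured, so RAM is the only buffer."""
    return fields.get("SwapTotal", 0) == 0


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        pct = used_percent(info)
        print(f"RAM used: {pct:.1f}%  swap absent: {swap_is_absent(info)}")
        if pct >= WARN_THRESHOLD and swap_is_absent(info):
            print("Warning: little headroom left before an out-of-memory kill")
```

Logging this figure periodically would at least leave a trace of rising memory usage before the process goes quiet.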

Let us know how it goes!

Eileen_Noonan · September 4, 2023, 9:08pm

Thanks Yann. Luckily I did find a better solution for vector storage and retrieval and it cut my memory usage way down! This is just a prototype so I’m really trying to stay on the free tier.
