Currently, scale-to-zero only works for publicly accessible services. This means any malicious user or bot can send requests to the service at random and keep it up all the time, even if authentication fails. And once a malicious user finds your endpoint, you can never use that endpoint again without the service being up all the time. This essentially makes scale-to-zero services too risky to use in terms of cost.
This thread talks about using automated pause/unpause: "Why can't workers scale to 0? Should only have to pay for compute that is being used". That might be OK for queued jobs, but it won’t work for realtime applications because of the extended spin-up time.
If we could just put an API gateway in front of the service, this would solve the problem. But as of now we can’t, because private services can’t be scaled to zero.
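To make the gateway idea concrete, here is a minimal sketch in Python of what I have in mind. It is not tied to any platform API: `Gateway`, `api_key`, and `forward` are hypothetical names, and `forward` just stands in for whatever call would wake the private, scaled-to-zero service. The point is only that unauthenticated requests get rejected at the cheap front door and never reach (or wake) the backend.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, Mapping, Tuple


@dataclass
class Gateway:
    """Hypothetical always-on front door for a private scale-to-zero service."""
    api_key: str                  # assumed shared secret, for illustration only
    forward: Callable[[], str]    # stand-in for the call that wakes the backend
    rejected: int = 0             # how many requests were stopped at the gateway

    def handle(self, headers: Mapping[str, str]) -> Tuple[int, str]:
        supplied = headers.get("Authorization", "")
        expected = f"Bearer {self.api_key}"
        # Constant-time comparison so the key is not leaked through timing.
        if not hmac.compare_digest(supplied.encode(), expected.encode()):
            self.rejected += 1
            return 401, "unauthorized"  # backend is never touched
        # Only authenticated traffic reaches (and wakes) the backend.
        return 200, self.forward()


if __name__ == "__main__":
    wakeups = []

    def wake_backend() -> str:
        wakeups.append(1)  # record each time the backend would spin up
        return "hello from backend"

    gw = Gateway(api_key="example-key", forward=wake_backend)
    print(gw.handle({}))                                        # bot, no credentials
    print(gw.handle({"Authorization": "Bearer example-key"}))  # legitimate client
    print("backend wake-ups:", len(wakeups))
```

With something like this in front, a bot hammering the public endpoint would only cost the price of the gateway itself, and the service behind it could still scale to zero.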