gRPC (server) streaming "terminated by RST_STREAM with error code: INTERNAL_ERROR"

I’ve deployed 2 variants (Go|Rust) of a gRPC server that implements the gRPC Health Checking Protocol.

The Health service implements a unary Check and a server-streaming Watch method.

The unary Check works without issue (other than the issue with reflection per link).

ENDPOINT="healthcheck-dazwilkin.koyeb.app:443"          # Golang
ENDPOINT="healthcheck-rust-dazwilkin.koyeb.app:443" # Rust
SERVICE="grpc.health.v1.Health"
METHOD="Check"

grpcurl \
--proto health.proto \
${ENDPOINT} \
${SERVICE}/${METHOD}
{
  "status": "SERVING"
}

However, the server-streaming Watch method fails (on both servers). Neither service is able to stream more than ~8 messages before being terminated. The implementations both sleep (15 seconds) between messages (8*15 = 120 seconds = 2 minutes!?):

METHOD="Watch"

grpcurl \
--proto health.proto \
${ENDPOINT} \
${SERVICE}/${METHOD}

Golang:

{
  "status": "SERVING"
}
{
  "status": "NOT_SERVING"
}
{
  "status": "SERVING"
}
{
  
}
{
  "status": "SERVING"
}
{
  "status": "NOT_SERVING"
}
ERROR:
  Code: Internal
  Message: stream terminated by RST_STREAM with error code: INTERNAL_ERROR

NOTE: The Golang implementation randomizes the service status.

Rust:

{
  "status": "SERVING"
}
{
  "status": "SERVING"
}
{
  "status": "SERVING"
}
{
  "status": "SERVING"
}
{
  "status": "SERVING"
}
{
  "status": "SERVING"
}
{
  "status": "SERVING"
}
{
  "status": "SERVING"
}
ERROR:
  Code: Internal
  Message: stream terminated by RST_STREAM with error code: INTERNAL_ERROR

The Golang service logs no (obvious) errors, but the Rust server panics at server.rs:90 (shown below) on a send that I would not expect to fail:

thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: SendError(Ok(HealthCheckResponse { status: Serving }))', src/server.rs:90:18

tokio::spawn(async move {
    loop {
        println!("[watch] Sending");
        tx.send(Ok(HealthCheckResponse {
            status: ServingStatus::Serving as i32,
        }))
        .await
        .unwrap(); // line #90
        println!("[watch] Sent");
        println!("[watch] Sleeping");
        sleep(Duration::from_secs(15)).await;
    }
});
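
A `SendError` on a channel send generally means the receiving half has gone away, which would be consistent with the response stream being dropped once the HTTP/2 stream is reset. The sketch below is a minimal illustration of treating that error as a signal to stop the loop instead of calling `unwrap()`. It uses `std::sync::mpsc` and threads as a stand-in for the tokio channel so that it runs without extra crates; `watch_loop` and `simulate_disconnect_after` are hypothetical names, not part of the server above.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Sends a status every `interval` until the receiver is dropped.
/// Returns how many sends succeeded.
fn watch_loop(tx: mpsc::Sender<&'static str>, interval: Duration) -> usize {
    let mut sent = 0;
    loop {
        // `send` returns Err once the receiving side has gone away,
        // so stop quietly instead of panicking on `unwrap()`.
        if tx.send("SERVING").is_err() {
            break;
        }
        sent += 1;
        thread::sleep(interval);
    }
    sent
}

/// Simulates a client that reads `n` messages and then disconnects.
fn simulate_disconnect_after(n: usize) -> usize {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || watch_loop(tx, Duration::from_millis(5)));
    for _ in 0..n {
        rx.recv().expect("producer should still be running");
    }
    // Dropping the receiver plays the role of the stream being reset.
    drop(rx);
    producer.join().expect("producer must not panic")
}
```

With the tokio channel the same idea would presumably be `if tx.send(...).await.is_err() { break; }`, which lets the spawned task end cleanly when the client or proxy closes the stream.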

Both servers are Nano instances but are well within instance capacity (CPU: ~0%; Memory: ~0.5MB Rust, ~3.0MB Golang).


Hey @Daz_Wilkin,

Thank you for your feedback!

This 120s timeout is our default timeout for HTTP requests. It’s strange that the stream is closed with an INTERNAL_ERROR code; during my test I saw a NO_ERROR code.
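
For context on the two codes: as far as I understand the gRPC over HTTP/2 protocol document, a client maps the HTTP/2 error code carried by RST_STREAM to a gRPC status, and both NO_ERROR and INTERNAL_ERROR end up as Internal, so only the message text tells them apart. A small sketch of that mapping follows; the `GrpcCode` enum and `rst_stream_to_grpc` function are illustrative names and not any library's API.

```rust
/// Subset of gRPC status codes that RST_STREAM error codes map to.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum GrpcCode {
    Internal,
    Unavailable,
    Cancelled,
    ResourceExhausted,
    PermissionDenied,
}

/// Maps an HTTP/2 error code (RFC 9113) to its name and the gRPC status
/// a client is expected to report. Returns None for codes not listed here.
fn rst_stream_to_grpc(code: u32) -> Option<(&'static str, GrpcCode)> {
    let entry = match code {
        0x0 => ("NO_ERROR", GrpcCode::Internal),
        0x1 => ("PROTOCOL_ERROR", GrpcCode::Internal),
        0x2 => ("INTERNAL_ERROR", GrpcCode::Internal),
        0x3 => ("FLOW_CONTROL_ERROR", GrpcCode::Internal),
        0x4 => ("SETTINGS_TIMEOUT", GrpcCode::Internal),
        0x6 => ("FRAME_SIZE_ERROR", GrpcCode::Internal),
        0x7 => ("REFUSED_STREAM", GrpcCode::Unavailable),
        0x8 => ("CANCEL", GrpcCode::Cancelled),
        0x9 => ("COMPRESSION_ERROR", GrpcCode::Internal),
        0xa => ("CONNECT_ERROR", GrpcCode::Internal),
        0xb => ("ENHANCE_YOUR_CALM", GrpcCode::ResourceExhausted),
        0xc => ("INADEQUATE_SECURITY", GrpcCode::PermissionDenied),
        _ => return None,
    };
    Some(entry)
}
```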

I increased the max stream duration for gRPC streams to 12h. This value is still temporary, and we might adjust it depending on platform load and feedback.

We plan to support customizable timeouts for HTTP, WebSockets and streams. You can upvote the feature request here: https://feedback.koyeb.com/admin/feedback/feature-requests/p/custom-per-service-http-timeout?boards=feature-requests

We believe it’s important to have limits to make sure gRPC clients properly implement reconnects.
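
As a rough illustration of what a client-side reconnect could look like, here is a minimal sketch of a retry loop with capped exponential backoff. Everything in it is an assumption for illustration: `watch_with_reconnect` is a hypothetical helper, and the `watch_once` closure stands in for opening one Watch stream and consuming it until it ends or is reset.

```rust
use std::thread;
use std::time::Duration;

/// Calls `watch_once` until it returns Ok or `max_attempts` is used up,
/// doubling the delay between attempts (capped at `max_delay`).
/// Returns the attempt number that succeeded, or the last error.
fn watch_with_reconnect<F>(
    mut watch_once: F,
    max_attempts: u32,
    base_delay: Duration,
    max_delay: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut delay = base_delay;
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=max_attempts {
        match watch_once() {
            Ok(()) => return Ok(attempt),
            Err(e) => {
                last_err = e;
                // Only wait if another attempt will follow.
                if attempt < max_attempts {
                    thread::sleep(delay);
                    delay = (delay * 2).min(max_delay);
                }
            }
        }
    }
    Err(last_err)
}
```

A real client would probably also reset the delay after a stream that stayed up for a while, and add some jitter so that many clients do not reconnect at the same moment.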

Let us know what you think,

Bastien


Aha! That explains it, and it is no longer failing (after 2 minutes). Thank you!

gRPC streaming (client|server) is a differentiating feature (similarly w/ Websockets) and I would expect many|most gRPC services would use some form of streaming.

If not already, please consider documenting (gRPC and) this (understandable but non-obvious) constraint.

I think it’s reasonable for you to require a customizable time for this; setting it would be a signal to devs that it exists and would provide them with the ability to encode their expectation of how long their streams should exist.

How did you test the service? I’m interested to understand the different error codes.

Thank you!
