Issues running a model

Hello there,

I am trying to deploy the Qwen-2-72B-4bit model. The main issue I’m encountering is that after successfully deploying with Docker, my requests are failing with error code 524.

Additionally, since this is a vision model, I need guidance on how to properly include images in the requests. I want to ensure the image handling isn’t causing the error code.

Thank you in advance.

Hi @Andrea_Mikula,

We have example apps for VL models deployed with vLLM, such as Deploy Qwen 2.5 VL 72B Instruct One-Click App - Koyeb.

In it, we suggest a way to generate completions with the model as follows:

import os

from openai import OpenAI

# Point the client at your Koyeb deployment's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "fake"),
    base_url="https://<YOUR_DOMAIN_PREFIX>.koyeb.app/v1",
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            # For vision models, "content" is a list mixing image and text parts.
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://images.unsplash.com/photo-1506744038136-46273834b3fb"
                    },
                },
                {"type": "text", "text": "Describe the image."},
            ],
        },
    ],
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    max_tokens=50,
)

print(chat_completion.to_json(indent=4))

Is this what you are looking for?
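If you need to send a local image rather than a public URL, the OpenAI-compatible API also accepts base64-encoded `data:` URLs in the `image_url` field. A minimal sketch (the helper name `to_data_url` and the file path are our own, not part of any SDK):

```python
import base64


def to_data_url(path: str, mime: str = "image/jpeg") -> str:
    """Read a local image and encode it as a data: URL suitable for
    the image_url field of a vision chat message."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# The resulting string drops into the same message structure as above:
#   {"type": "image_url", "image_url": {"url": to_data_url("photo.jpg")}}
```

This keeps the request self-contained, at the cost of a larger payload for big images.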
