I’ve been reading the latest blog article about RAG chat, and it was very inspiring!
I have just one question that isn’t answered in the article itself.
Why did the author choose Replicate for chat predictions when OpenAI also offers an API for chat completions?
This way, the developer of the app needs both an OpenAI API key and a Replicate API key!
Why did the author choose Replicate for chat predictions when OpenAI also offers an API for chat completions?
Using Replicate gives you the ability to run different models with minimal code changes. Say you want to run Google’s Gemma instead of an OpenAI model: if you had gone with the OpenAI SDK in the first place, you’d have to rewrite the whole thing.
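To illustrate the decoupling, here is a minimal sketch assuming the official `replicate` Python client. The `generate` function and the `MODELS` dict are hypothetical helpers, and the model slugs are illustrative; swapping models is just a change to the id string:

```python
# Sketch: the UI calls generate() and never hard-codes a provider-specific SDK.
# Model slugs below are illustrative Replicate identifiers.
MODELS = {
    "llama": "meta/meta-llama-3-8b-instruct",
    "gemma": "google/gemma-7b-it",
}

def generate(prompt: str, model: str = "llama") -> str:
    """Run a chat prediction on Replicate; only the model id changes per model."""
    import replicate  # deferred import; needs REPLICATE_API_TOKEN in the environment
    # replicate.run streams back chunks of text for language models.
    output = replicate.run(MODELS[model], input={"prompt": prompt})
    return "".join(output)
```

Switching from Llama to Gemma is then `generate(prompt, model="gemma")` rather than a rewrite against a different SDK.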
Alright, that makes sense: Replicate helps decouple the UI from a specific generation model.
If the OpenAI generation model suits my use case, then I could just use that instead of Replicate.
It’s just that to use Replicate I’d have to investigate and understand yet another pricing structure besides the OpenAI dependency one already has for vectorizing/indexing the content. That’s why I doubt in the first place whether or not to use Replicate for the generation part of the chat UI.
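For contrast, the OpenAI-only path looks something like this (a sketch assuming the official `openai` Python client; the model name is illustrative and `generate` is a hypothetical helper). Note how the call is tied to OpenAI’s chat-completions shape:

```python
def generate(prompt: str) -> str:
    """Chat completion bound to OpenAI's API; switching providers means rewriting this."""
    from openai import OpenAI  # deferred import; needs OPENAI_API_KEY in the environment

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

If OpenAI already covers both embedding and generation, this single dependency (and single pricing page) may be the simpler choice.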
If the OpenAI generation model suits my use case, then I could just use that instead of Replicate.
Totally. I prefer Replicate when the use case doesn’t require the OpenAI API as a dependency.
It’s just that to use Replicate I’d have to investigate and understand yet another pricing structure
I think the pricing structure varies per model API that you use. Say you want to use Meta’s Llama or Google’s Gemma 7B because of their added capabilities; then you’d want to use Replicate irrespective of the pricing structure.