I’ve been reading the latest blog article about RAG chat, and it was very inspiring!
I have just one question that isn’t answered in the article itself.
Why did the author choose Replicate for chat predictions when OpenAI also offers an API for chat completions?
This way, the developer of the app needs both an OpenAI API key and a Replicate API key!
Why did the author choose Replicate for chat predictions when OpenAI also offers an API for chat completions?
Using Replicate gives you the ability to run different models with minimal code changes. Say you want to run Google’s Gemma instead of an OpenAI model: if you had gone with the OpenAI SDK in the first place, you’d have to rewrite the whole thing.
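To illustrate the decoupling, here is a minimal sketch assuming the official `replicate` Python client. The `generate` function and the `MODELS` dict are hypothetical helpers, and the model slugs are illustrative; swapping models is just a change to the id string:

```python
# Sketch: the UI calls generate() and never hard-codes a provider-specific SDK.
# Model slugs below are illustrative Replicate identifiers.
MODELS = {
    "llama": "meta/meta-llama-3-8b-instruct",
    "gemma": "google/gemma-7b-it",
}

def generate(prompt: str, model: str = "llama") -> str:
    """Run a chat prediction on Replicate; only the model id changes per model."""
    import replicate  # deferred import; needs REPLICATE_API_TOKEN in the environment
    # replicate.run streams back chunks of text for language models.
    output = replicate.run(MODELS[model], input={"prompt": prompt})
    return "".join(output)
```

Switching from Llama to Gemma is then `generate(prompt, model="gemma")` rather than a rewrite against a different SDK.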
Alright, that makes sense: Replicate helps decouple the UI from a specific generation model.
If the OpenAI generation model suits my use case, then I could just use that instead of Replicate.
It’s just that to use Replicate I’d have to investigate and understand yet another pricing structure besides the OpenAI dependency one already has for vectorizing/indexing the content. That’s why I doubt in the first place whether or not to use Replicate for the generation part of the chat UI.
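For contrast, the OpenAI-only path looks something like this (a sketch assuming the official `openai` Python client; the model name is illustrative and `generate` is a hypothetical helper). Note how the call is tied to OpenAI’s chat-completions shape:

```python
def generate(prompt: str) -> str:
    """Chat completion bound to OpenAI's API; switching providers means rewriting this."""
    from openai import OpenAI  # deferred import; needs OPENAI_API_KEY in the environment

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

If OpenAI already covers both embedding and generation, this single dependency (and single pricing page) may be the simpler choice.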
If the OpenAI generation model suits my use case, then I could just use that instead of Replicate.
Totally. I prefer Replicate when the use case doesn’t require the OpenAI API as a dependency.
It’s just that to use Replicate I’d have to investigate and understand yet another pricing structure
I think the pricing structure varies per model API that you use. Say you want to use Meta’s Llama or Google’s Gemma 7B because of their added capabilities; then you’d want to use Replicate irrespective of the pricing structure.