Hi Ryan, I just came across this post and really liked it. I'll subscribe and read the other interesting-looking articles, as I've been diving into LLMs and RAG recently.
By the way, unsure if this is a mistake or not, but in the code section that follows the "Hello LangChain" title, there's a hard-coded API key.
I think saying "New York is 278% the size of Houston" is correct, no?
I could say I have a bag of 10 apples and Ryan has a bag of 5 apples, so my bag is 100% bigger than Ryan's bag. (100% of 5 apples is 5 apples. 5 apples + 5 apples is 10.)
Your computation of embedding cost is wrong. You say:
"Suppose you want to build a chatbot to chat with your corporate’s doc and you have 10,000 files, with an average text length of 20,000 tokens. In this scenario, you would end up spending: 10,000 x 20,000 x 0.0004 = $80,000 solely on embeddings"
but the cost of ada v2 is $0.0004 per 1K tokens, not per token! The correct calculation is:
10,000 x (20,000/1000) x 0.0004 = $80
Ada v2 embedding is very cheap!
To embed the collected works of Shakespeare would cost you:
1,200,000 / 1,000 x 0.0004 = $0.48!
The cost now is even lower (see https://openai.com/pricing): ada v2 is $0.0001 / 1K tokens, so your case with 10,000 files would now cost $20, and the collected works of Shakespeare $0.12!
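To make the per-1K-token billing explicit, here is a minimal Python sanity check of the arithmetic above (the helper name and the ~1.2M-token estimate for Shakespeare are my own assumptions, not from the post):

```python
def embedding_cost(num_docs, avg_tokens_per_doc, price_per_1k_tokens):
    """Total embedding cost in dollars; OpenAI bills per 1,000 tokens."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1000 * price_per_1k_tokens

# 10,000 files x 20,000 tokens at the old ada v2 price ($0.0004 / 1K tokens)
print(embedding_cost(10_000, 20_000, 0.0004))            # 80.0, not 80,000

# Collected works of Shakespeare (~1.2M tokens) at the newer $0.0001 / 1K price
print(round(embedding_cost(1, 1_200_000, 0.0001), 2))    # 0.12
```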
Yes, you are correct, sir.
It should be 10 million documents (which is above the average for a legal firm or mid-sized hospital) instead of 10k documents. I've updated the number.
Thanks for pointing it out.
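The updated figure can be sanity-checked in a few lines of Python (variable names are just illustrative):

```python
# 10 million documents averaging 20,000 tokens each,
# at ada v2's original price of $0.0004 per 1K tokens.
docs = 10_000_000
tokens_per_doc = 20_000
price_per_1k = 0.0004

cost = docs * tokens_per_doc / 1000 * price_per_1k
print(f"${cost:,.0f}")  # $80,000
```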