What happens when you take a working chatbot that's already serving thousands of customers a day in four different languages, and try to deliver an even better experience using Large Language Models? Good question.
It's well known that evaluating and comparing LLMs is hard. Benchmark datasets can be hard to come by, and metrics such as BLEU are imperfect. But these are largely academic concerns: how are industry data teams tackling these issues when incorporating LLMs into production projects?
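To make the "imperfect metrics" point concrete, here's a small illustrative sketch (my own toy example, not from the talk) using NLTK's sentence-level BLEU. A chatbot reply that paraphrases the reference answer perfectly well can still score close to zero, because BLEU only rewards overlapping n-grams:

```python
# Toy example: BLEU penalises a valid paraphrase of the reference answer.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["your", "order", "will", "arrive", "tomorrow"]
paraphrase = ["expect", "delivery", "by", "tomorrow"]       # same meaning, different wording
verbatim = ["your", "order", "will", "arrive", "tomorrow"]  # exact copy of the reference

smooth = SmoothingFunction().method1  # smoothing avoids zero scores on short sentences

print(sentence_bleu([reference], paraphrase, smoothing_function=smooth))  # near 0
print(sentence_bleu([reference], verbatim, smoothing_function=smooth))    # 1.0
```

This is exactly the kind of gap between metric and meaning that makes production evaluation of LLM outputs tricky.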
In my work as a Conversational AI Engineer, I'm doing exactly that. And that's how I ended up centre-stage at a recent data science conference, giving the (optimistically titled) talk, "No baseline? No benchmarks? No biggie!" Today's post is a recap of it, featuring:
- The challenges of evaluating an evolving, LLM-powered PoC against a working chatbot
- How we're using different types of testing at different stages of the PoC-to-production process
- Practical pros and cons of different test types
Katherine Munro
2024-08-26 20:39:06
Source link: https://towardsdatascience.com/lessons-from-agile-experimental-chatbot-development-73ea515ba762?source=rss—-7f60cf5620c9—4