
Chatbots before LLMs

Designing Bot frameworks

In the era before LLMs, building a conversational agent meant literally coming up with sample phrases so that the engine could do Named Entity Recognition. Doing this ourselves was limiting by definition, since each person has a limited range of expression. In that era there were also tools that generated sentences if you gave them the base activity as input. (Now that we have experienced LLMs, this sounds funny on multiple counts, but it used to work.) (The era before production-grade NLP was even weirder; I have documented that in an earlier post.)

This is also the era when a formal framework for chatbot interaction did not exist, so we ended up building our own. We had to debate the fitness of different NLP libraries (Stanford vs. OpenNLP etc.), their accuracy, and the code framework capability around interactions/invocation. In one case I was so frustrated with NLP that I designed a framework that let users issue commands instead of chatting, with autocomplete/type-ahead added to it. Like typing elaborate commands on Unix, but with the typing experience of Google Search. We were able to do this with a good keyword-parser in a functional paradigm. And not to forget a brilliant developer with me, Nehal. But the momentum for a formal chat-style bot was huge, and frameworks arrived soon.
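The command-plus-type-ahead idea can be sketched roughly like this (all command names and handlers here are hypothetical, not the original system): a registry of keyword commands drives both dispatch and prefix-based autocomplete suggestions.

```python
# Hypothetical sketch of command-style interaction with type-ahead:
# commands are keywords mapped to handler functions, and a prefix
# match over the registry drives the autocomplete suggestions.

COMMANDS = {
    "balance": lambda args: f"Balance for account {args[0]}",
    "block-card": lambda args: f"Card {args[0]} blocked",
    "branch-locator": lambda args: f"Branches near {args[0]}",
}

def suggest(prefix):
    """Return type-ahead suggestions for a partial command."""
    return sorted(c for c in COMMANDS if c.startswith(prefix))

def run(line):
    """Parse 'command arg1 arg2 ...' and dispatch to its handler."""
    keyword, *args = line.split()
    handler = COMMANDS.get(keyword)
    if handler is None:
        return "Unknown command. Did you mean: " + ", ".join(suggest(keyword[:2])) + "?"
    return handler(args)
```

The point of the design was that a constrained command vocabulary sidesteps NLP accuracy problems entirely, while type-ahead keeps the typing experience friendly.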

Designing with chatbot Frameworks

Again, a debate unfolded about chat agent framework selection. Most of the frameworks had similar capability around the core NLP and service invocation, but they had marked differences in the "flow" aspects, voice vs. text capabilities, and so on. This made a huge difference when the conversations we were supporting had multiple end states or conditionalities. When we built our first chatbot in 2014/15, the Alexa one, the field of conversation design was not yet acknowledged (Alexa wasn't available in India then; we got ours from the US). It was a few years later, especially when the commercial use cases came along, that the user experience of the chat interaction became mainstream and part of project work (interaction design). Bot discovery and interaction design remain useful and important even in the agentic era.

Getting the chatbots working

Once the engine of the framework we had selected and trained determined the action, I would write and wire handlers to do the processing. It quickly evolved into a chain-of-command pattern cum workflow of sorts, giving rise to all kinds of integration issues: message transformation, errors, retries, auth, and so on. Some of the frameworks had built-in capability to chain conversational flows (and pass variables/values around). Most of them had some take on retries and on how long a conversation could be, but that had to be discovered rather than read from documentation.
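A minimal sketch of that chain-of-command handler pipeline, assuming hypothetical handler names: each handler transforms the message or raises, and the chain wraps every step with retries.

```python
# Hypothetical sketch: a handler chain where each step transforms the
# message dict, with a retry wrapper for transient failures.

import time

class HandlerError(Exception):
    pass

def with_retries(handler, attempts=3, delay=0.0):
    """Wrap a handler so transient failures are retried a few times."""
    def wrapped(message):
        for attempt in range(1, attempts + 1):
            try:
                return handler(message)
            except HandlerError:
                if attempt == attempts:
                    raise
                time.sleep(delay)
    return wrapped

def run_chain(message, handlers):
    """Pass the message through each handler in order."""
    for handler in handlers:
        message = with_retries(handler)(message)
    return message

# Example handlers: message transformation, then auth.
def normalize(msg):
    return {**msg, "text": msg["text"].strip().lower()}

def authorize(msg):
    if not msg.get("user"):
        raise HandlerError("no user in context")
    return msg

result = run_chain({"user": "u1", "text": "  Check Balance "}, [normalize, authorize])
```

The real pipelines also carried message transformation and auth context between external services, which is where most of the integration pain lived.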

And then there was the user. At times, he would be technically disconnected from the chatbot, so we had to maintain the whole conversation state in the database. Sometimes a follow-up step in the processing needed more inputs from the user, so I had to create local and global state so the whole interaction could be recorded.
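The local/global split can be sketched as two scopes keyed by conversation id, persisted so a reconnecting user can resume (the schema and names here are my own illustration, not the original system):

```python
# Hypothetical sketch: persist conversation state in a database so a
# disconnected user can resume. One "global" record per conversation,
# plus "local" per-step records, keyed by (conversation id, scope).

import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE conv_state (
    conv_id TEXT, scope TEXT, state TEXT,
    PRIMARY KEY (conv_id, scope))""")

def save_state(conv_id, scope, state):
    db.execute("INSERT OR REPLACE INTO conv_state VALUES (?, ?, ?)",
               (conv_id, scope, json.dumps(state)))

def load_state(conv_id, scope):
    row = db.execute("SELECT state FROM conv_state WHERE conv_id=? AND scope=?",
                     (conv_id, scope)).fetchone()
    return json.loads(row[0]) if row else {}

# Global state survives the whole interaction; local state is per step.
save_state("c1", "global", {"intent": "claim_submission", "user": "u42"})
save_state("c1", "step:upload", {"awaiting": "receipt_image"})
```

On reconnect, the bot loads the global record to recover the intent and the step record to know exactly which input it was still waiting for.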

In some use cases we had to ask the user to upload receipts, which were processed by a vision model. And guess what: the image upload and processing could take longer, and more varied, time than my chatbot would remain active. So we kept a keepalive going, sweet nothings like "status… updating…" sent to the user, to fool the whole system. Moreover, the framework we chose had no native support for this sort of outside call, so everything had to be bundled together.
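The keepalive trick is simple to sketch (names and timings are hypothetical): run the slow call, and from a background thread keep pushing status messages until it finishes.

```python
# Hypothetical sketch of the keepalive trick: while a slow vision call
# runs, a background thread keeps sending "status…" messages so the
# chat session doesn't time out.

import threading
import time

def process_receipt(image):
    time.sleep(0.2)              # stand-in for the slow vision-model call
    return {"total": "42.50"}

def with_keepalive(task, send, interval=0.05):
    done = threading.Event()

    def heartbeat():
        while not done.wait(interval):
            send("status… updating…")   # sweet nothings to the user

    t = threading.Thread(target=heartbeat)
    t.start()
    try:
        return task()
    finally:
        done.set()
        t.join()

sent = []
result = with_keepalive(lambda: process_receipt(b"..."), sent.append)
```

Since the framework had no hook for long external calls, this sort of wrapper had to be bundled into the handler itself rather than configured on the platform.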

In another use case, we served the user content based on a help document. This had to be done with a search index in case the document repo was too large. In the case of structured FAQs, one of the engines, NIA, had built-in capability to map queries to document keywords (by density).
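The keyword-density idea can be illustrated with a toy scorer (the scoring here is my simplification, not NIA's actual algorithm): count query-term occurrences relative to document length and pick the best match.

```python
# Illustrative sketch of keyword-density matching: score each FAQ
# document by how often the query terms appear, normalized by length.

from collections import Counter

docs = {
    "reset-password": "reset your password from the login page password link",
    "update-address": "update your address in the profile settings address tab",
}

def score(query, text):
    words = text.lower().split()
    counts = Counter(words)
    # density: matched query-term occurrences relative to document length
    return sum(counts[t] for t in query.lower().split()) / len(words)

def best_doc(query):
    return max(docs, key=lambda d: score(query, docs[d]))
```

For a large document repo this naive scan is exactly what the search index replaced; the FAQ engine only had to do it over a small structured set.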

As a side note, many engines had some capability to detect obscenity, and it sufficed. PII, interestingly, panned out in black and white in many cases (due to the domain and use-case mix at that time).

Voice, video and human agent handovers

The voice-based chatbots had a different set of additional issues. The engines from AWS and Google had a built-in ability to prompt for the question again if the pronunciation wasn't clear. At times this ended up in multiple retries/passes at the same handler, which had to be taken care of (since not all services were idempotent). At times the user would totally rephrase the ask, which would throw our design off guard.
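One way to take care of non-idempotent services under engine retries is an idempotency key per conversation turn; a sketch under assumed names (the store and ids are hypothetical):

```python
# Hypothetical sketch: guard a non-idempotent service behind an
# idempotency key, so a voice-engine retry that replays the same
# turn doesn't trigger the side effect twice.

processed = {}   # stand-in for a persistent idempotency store

def handle(conv_id, turn_id, action):
    """Run `action` once per (conversation, turn); replays return the cached result."""
    key = (conv_id, turn_id)
    if key in processed:
        return processed[key]        # retry: return earlier result, no side effect
    result = action()
    processed[key] = result
    return result

calls = []
def transfer_funds():
    calls.append(1)                  # side effect we must not repeat
    return "transfer-ok"

first = handle("c1", "t1", transfer_funds)
again = handle("c1", "t1", transfer_funds)   # engine retried the same turn
```

The rephrased-ask problem is harder: a retry carries the same turn id, but a rephrase arrives as a fresh utterance, so deduplication alone doesn't catch it.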

Another fun aspect: as Alexa evolved into a voice device with a screen, we suddenly had to take care of the visual aspect of the interaction. Multilingual support was out of the box, so life was cool.

Video bots were a different thing to handle. Our stuff didn't make it to production, but the idea was to emulate a human face with expressions (confidentiality, etc., etc.). It was pretty impressive for that time.

In one of the cases, we had to hand over the interaction to a human agent based on predefined scenarios. It was a straightforward integration to another system, with some adjustments to timeouts. But when the requirement evolved into passing the whole conversation to the human agent, we realized that we didn't have a handle to the chat interaction from the framework! So I logged the conversations and passed them on. That eventually led us to design another product around chat interaction analysis/insights, and it went on to compete with the (then) Chatbase product.
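The workaround amounts to keeping your own transcript and forwarding it at handover; a sketch with hypothetical names:

```python
# Hypothetical sketch of the workaround: since the framework exposed
# no handle to the chat transcript, log every turn ourselves and pass
# the accumulated log to the human-agent system at handover.

transcripts = {}

def log_turn(conv_id, speaker, text):
    transcripts.setdefault(conv_id, []).append((speaker, text))

def handover(conv_id, send_to_agent):
    """Forward the whole logged conversation to the agent system."""
    history = transcripts.get(conv_id, [])
    send_to_agent({"conversation": history})
    return len(history)

log_turn("c1", "user", "my claim is stuck")
log_turn("c1", "bot", "let me check that for you")
received = []
turns = handover("c1", received.append)
```

A side effect of owning the transcript store is that it becomes an analytics asset in its own right, which is roughly how the later analysis/insights product came about.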

Some notes

When we select a new technology or framework, it's best to adapt to its way of doing things. However, when the field is new and evolving, the capability mismatch can be huge. In many cases, when it came to calling/orchestrating services, my experience with traditional banking development helped me handle the issues with ambiguity, state and performance better than the chatbot-native generation of freshers who looked to the state of the art for solutions. It also mattered because most of the recommended remedies for these problems were to use some sort of ESB or to wire in RPA somehow. I found them out of sync with the spirit of chat interaction. (Now that we have LLMs to reason, plan and orchestrate, I feel validated, with some sort of emotional closure.)

Later, when ChatGPT happened and we moved on to RAG and agentic tools, it felt like I was remaking the Spider-Man movie franchise for the third time. The story from there on, in the next post.
