Sunday 8 April 2018

The art of creating questions

Especially if the main purpose of your bot is to answer questions, and no matter whether you go down the rule-based or the machine-learning path, or both, you will face the problem of how to generate a dataset of the questions your bot is supposed to answer.

In the ideal case you would already have a dataset of conversations dumped from live chat software. However, you might find yourself having to bootstrap your bot with a limited set of questions. This is the so-called cold-start problem, which is often why chatbots suck.

In fact, you need many good variants of each question, variants that capture the domain knowledge of the industry: for example, ‘internet does not work’ means the same as ‘my browser is not loading’. Today these variants are mostly generated manually, which limits coverage of user questions and can be prohibitively expensive. New ways of auto-generating variants at scale are becoming available, though it will take some time for that kind of tooling to pick up domain context.
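As a small illustration of why the variants have to be collected explicitly, the sketch below (the two toy questions and the choice of a TF-IDF baseline are my own assumptions, not anything prescribed here) scores the lexical similarity of two phrasings of the same problem. A plain word-overlap model sees almost nothing in common between them:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two phrasings of the same underlying problem (toy examples)
a = "internet does not work"
b = "my browser is not loading"

# A plain word-overlap (TF-IDF) baseline rates them as nearly unrelated,
# which is exactly why variant phrasings must be captured in the dataset.
vectors = TfidfVectorizer().fit_transform([a, b])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"lexical similarity: {score:.2f}")  # low score: only 'not' is shared
```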

It goes without saying that, first and foremost, you have to build a pipeline to clean up the data, to be used both at training time and at prediction time, so that the database of questions you build stays clean.
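A minimal sketch of such a shared clean-up step, assuming a plain-Python normaliser (the exact rules are hypothetical and would depend on your domain and language):

```python
import re
import unicodedata

def clean_text(text: str) -> str:
    """Normalise a question so training and prediction see the same form."""
    text = unicodedata.normalize("NFKC", text)   # unify unicode forms
    text = text.lower()                          # case-fold
    text = re.sub(r"[^\w\s]", " ", text)         # drop punctuation
    text = re.sub(r"\s+", " ", text)             # collapse whitespace
    return text.strip()

# The same function is applied when building the question database
# and when handling a live user message.
print(clean_text("My browser ISN'T loading!!"))  # -> "my browser isn t loading"
```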

After that, creating an initial database of questions is probably going to be a manual effort. A small, focused group of experts who know the business domain and have good analytical skills is probably your best bet for building it.

The process might look like this: domain experts gather documents related to each type of question the bot is supposed to be able to answer. They also search forums and question sites (such as Quora or Reddit) to find out how users actually phrase the questions that each answer covers. Ideally, they would also be able to extrapolate rules.
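One possible shape for the resulting question database is sketched below; the intent label, the answer text, and the field names are all hypothetical, just to show how an answer can be grouped with the variant phrasings the experts collect:

```python
# A hypothetical layout for the hand-built question database: each entry
# groups one answer with the variant phrasings collected by domain experts.
question_db = [
    {
        "intent": "connectivity_problem",                       # assumed label
        "answer": "Please restart your router and check the cable.",
        "variants": [
            "internet does not work",
            "my browser is not loading",
            "no connection since this morning",
        ],
        "sources": ["support docs", "forum threads"],            # where variants came from
    },
]

def variants_for(intent: str):
    """Look up the collected phrasings for a given intent."""
    return [v for entry in question_db if entry["intent"] == intent
            for v in entry["variants"]]

print(variants_for("connectivity_problem"))
```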

You will also need to read up on the domain literature and/or create an ontology.
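In case it helps, here is a toy sketch of what such an ontology could look like: domain concepts mapped to the surface terms users actually type, so that rules and variant generation share one vocabulary. All concepts and terms below are made-up examples:

```python
# Toy ontology: concept -> surface terms users might type (all hypothetical).
ontology = {
    "connectivity": {"internet", "wifi", "connection", "network"},
    "browser": {"browser", "chrome", "firefox", "page"},
    "outage": {"down", "not working", "not loading", "broken"},
}

def concepts_in(question: str):
    """Return the ontology concepts whose terms appear in the question."""
    q = question.lower()
    return {concept for concept, terms in ontology.items()
            if any(term in q for term in terms)}

print(concepts_in("my browser is not loading"))  # {'browser', 'outage'}
```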

