Building Intelligent Systems - Some Thoughts
A customer service representative receives a call from a disgruntled customer: a service requested as part of an order has passed its committed date and still has not been delivered. Anyone who has been a customer or a service provider is bound to have heard such stories. A study conducted by TARP Research back in 1999 found that, on average, one unhappy customer will tell ten people about the experience. In turn, these ten people will each tell five more. Just imagine how fast an experience can travel these days, given the internet and the rise in tele-density.
In practice, there are always orders that should have completed but did not, and which therefore need human intervention to complete. In such cases, the company needs at least two people to look at the order after the unhappy customer calls. Apart from the risk of losing the customer, time is lost in reacting to the call.
Earlier in my career, I managed a stack of such order fulfilment systems. As a matter of habit, I used to look at the percentage of order fallouts each day and try to find patterns. I then used this experience to address issues proactively and help the orders complete on time. While all this was happening, I asked my team to keep daily logs of the issues that prevented orders from completing.
The issues, though different each time, showed a trend. So an automated report was generated every night, and the incomplete orders were manually completed by the product support group. Taking it a little further, the application support team wrote scripts to identify such orders and complete them. This significantly reduced the calls to the customer service centres.
All this was done in less than six months, alongside major enhancements and patches to the system and while dealing with other hardware and database issues.
All this pain could have been avoided had the systems been made a little intelligent during the design itself.
Here are some of my suggestions on how it can be achieved:
1. Order life-cycle: Older legacy systems tended to be simpler, sturdier, self-contained and of limited importance to business success. Their modern counterparts are often very complex and interact with large stacks of heterogeneous systems that are equally business-critical. We must therefore understand each finite state in order processing and provide a ‘check pointing’ mechanism that links the various life-cycle stages of the order and enables us to support all of these systems in concert. It is very difficult to orchestrate an order when the application support groups and the business support groups work in silos, and it becomes all the more challenging in a multi-vendor support scenario.
2. Intelligent and Predictive Interfaces: Whether an order is being processed or has failed, each of the interfaces should interact with the other systems in the stack. If orders are being processed as required, the order processing agent should automatically forecast and reserve computing resources in the cascading systems. If there is a system resource crunch, or the wrong resource has been allocated, it should trigger a fix-it job, call out a support engineer or raise a trouble ticket.
3. Utility function based services: The goal-based and non-goal-based states of an order should be used to estimate how likely the order is to complete. This utility function should help the interfaces decide on resource allocation. If a non-goal-based state is observed in any of the sub-systems in the stack, the order should be passed to the exception management system for stage 3* monitoring.
4. Exception Management Systems: As mentioned in points 2 and 3, when any of the sub-systems in the stack encounters a problem, the intelligent agents and the utility function should identify the order state, and the order should then be passed on to the Exception Management System. This is quite similar to an exception block in code, except that the exception management system can use its inputs to compute and take appropriate action intelligently. A very simplified diagram below depicts the behaviour.
5. Learning system: One of the must-have features of an intelligent system is that it should be a learning system. This is a little hard to achieve, but a system can be built with simple learning logic. If many orders fail for a certain reason, then the exception management system should understand the various actions taken earlier by the support engineers to fix the job and apply the same logic (lessons/learnings) to the failed order.
6. Throttle/Release Control: In most order fulfilment systems the SLAs are defined as the number of orders completed in a day, but the systems are sized on the average number of orders that will be handled during peak load. In practice it is very difficult to size the stack, as the owners of each sub-system in the stack could be different, or a sub-system could be at a different life-cycle stage. This could make one of the sub-systems prone to failure or, in an asynchronous system, cause it to build a backlog when subjected to peak load. In such a case it is very important that order processing on the other systems is throttled. I have used this mechanism extensively and developed a formula to compute the control parameters.
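To make a few of these points concrete, here is a minimal sketch in Python of how checkpointing (point 1), an exception manager that learns from past fixes (points 4 and 5) and a throttle (point 6) might look. Everything in it is an assumption made for illustration: the stage names, the Order and ExceptionManager classes and the release_quota rule are hypothetical. In particular, release_quota is only a placeholder rule, not the formula mentioned in point 6, which is not reproduced in this article.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical life-cycle stages; a real stack would define its own.
    RECEIVED = 1
    VALIDATED = 2
    PROVISIONED = 3
    COMPLETED = 4


@dataclass
class Order:
    order_id: str
    stage: Stage = Stage.RECEIVED
    checkpoints: list = field(default_factory=list)  # stages already passed

    def advance(self, next_stage: Stage) -> None:
        # Checkpoint: an order may only move to the next stage in sequence.
        if next_stage.value != self.stage.value + 1:
            raise ValueError(
                f"{self.order_id}: cannot go {self.stage.name} -> {next_stage.name}"
            )
        self.checkpoints.append(self.stage)
        self.stage = next_stage


class ExceptionManager:
    """Remembers which fix resolved which failure reason (very simple learning)."""

    def __init__(self) -> None:
        self._fixes = defaultdict(Counter)

    def record_fix(self, reason: str, fix: str) -> None:
        # Called after a support engineer resolves a failed order.
        self._fixes[reason][fix] += 1

    def suggest_fix(self, reason: str):
        # Return the fix used most often for this reason, or None if the
        # reason is new and a trouble ticket should be raised instead.
        fixes = self._fixes.get(reason)
        if not fixes:
            return None
        return fixes.most_common(1)[0][0]


def release_quota(downstream_capacity: int, downstream_backlog: int) -> int:
    # Placeholder throttle rule (an assumption): release only as many orders
    # as the downstream system can absorb once its backlog is accounted for.
    return max(0, downstream_capacity - downstream_backlog)
```

An order that skips a stage raises an error at the checkpoint, a failure reason seen before yields the most frequently applied fix, and an unseen reason returns None so that a ticket can be raised. A real system would persist the checkpoints and the fix history rather than keep them in memory.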
I am fully aware that there are practical challenges and many other important considerations when designing a system. Going forward, however, designers and budget owners will have to start thinking about building intelligence into the system, and the suggestions above are some of the things that I feel can help in designing intelligent systems.
In the absence of data, it can be argued that this will increase the design and development cost, but personally I feel that multipurpose bots can be built and configured for each of the systems. Also, considering the revenue lost to disgruntled customers and the reduction in the number of support engineers, the TCO of the system or the stack would be greatly reduced.
