Like a Flight Simulator, But for Business Decisions
No first-time pilot would fly a plane without logging hours in a flight simulator. A simulator allows pilots to practice reacting to rare events like wind shear and engine failure. Rehearsing their decision-making against unexpected events leaves them better prepared in the air. The simulator can also provide precise, quantitative estimates based on their reactions, such as whether they have enough altitude or fuel to return to the nearest airport.
Unfortunately, individuals and business leaders have no equivalent way to practice decision-making. They make decisions without rehearsal and unaided by the true probabilities and consequences of their actions. Whether one is negotiating to buy a home or trying to sustain investment performance in volatile markets, inexperience and emotion can steer people in the wrong direction.
Our team of artificial intelligence specialists combined forces with scientists from Yale University's Human Nature Lab to address this gap. We ran online experiments using a gaming interface that can be configured to simulate most decisions. Players made decisions as the game simulated changing conditions. Hundreds of live participants were rewarded with payouts as their decision-making improved. Our algorithms learned: 1) how behaviors change under different conditions, 2) how those reactions affect performance, and 3) which interventions improve decisions.
How Decision Simulations Work
The decision simulator operates like a game in which players experience changing events and make decisions in response. Different games focus on specific decision scenarios, such as managing personal finances in the face of surprise expenses or making portfolio allocations in volatile markets. Each game has four components: 1) an introduction, 2) goal setting, 3) the simulation, and 4) feedback. In our experiments we pay out monetary rewards based on the players' scores, which makes the experiment more realistic because players are trying to maximize actual gains.

We also apply the scientific method to all our apps: half the participants are randomly assigned to a control group, while the other half plays with and against the RL Agent. The control group experiences the same events but receives only standard automated decision guidance (e.g., a credit card company's routine late-payment reminders). The treatment group plays a game controlled by the Agent. This method lets us see whether the AI actually outperforms the status quo.

The Agent can be configured as either a guide or a counterparty. As a guide, the Agent learns to take actions that achieve a reward function beneficial to the player (e.g., changing messages to encourage earlier debt payments). As a counterparty, the Agent tries to beat the player (e.g., negotiating price reductions, where the Agent wins on lower prices and the player wins on higher prices). In the process, the AI can teach players to hone their strategy.
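To make the guide and counterparty configurations concrete, here is a minimal sketch of how their reward functions could differ. The class and field names are hypothetical illustrations, not OnCorps' actual interface.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    """Hypothetical summary of one simulation round."""
    player_paid_early: bool   # did the player act on the Agent's guidance?
    negotiated_price: float   # price agreed in a negotiation round
    list_price: float         # the counterparty Agent's opening price

def guide_reward(outcome: Outcome) -> float:
    """Guide mode: the Agent is rewarded when the player benefits,
    e.g. when a nudge leads to an early debt payment."""
    return 1.0 if outcome.player_paid_early else 0.0

def counterparty_reward(outcome: Outcome) -> float:
    """Counterparty mode: the Agent is rewarded for beating the player,
    e.g. closing the negotiation at a lower price."""
    return outcome.list_price - outcome.negotiated_price
```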
Enterprise leaders can use this approach by running the RL Agent with a panel of research participants to test behavioral assumptions. This is much more practical than survey research as it models actual behaviors and reactions versus simply asking participants for their preferences.
As the table below suggests, there are several forms of Decision Simulation. In the first, algorithms compete against each other (adversarial algorithms), guided by historical data patterns. The second, and most common, form develops a Decision Simulation game and crowdsources participants to play it. Finally, a Decision Simulation can be set up as an employee- or customer-facing tool that guides their choices with an AI.
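As a rough sketch, these three forms could be expressed as a simple configuration; the enum and field names below are illustrative assumptions rather than the platform's actual interface.

```python
from dataclasses import dataclass
from enum import Enum, auto

class SimulationMode(Enum):
    ADVERSARIAL_ALGORITHMS = auto()  # algorithms compete with each other, guided by historical data
    CROWDSOURCED_GAME = auto()       # recruited participants play the game (the most common form)
    GUIDANCE_TOOL = auto()           # deployed to employees or customers to guide their choices

@dataclass
class SimulationConfig:
    mode: SimulationMode
    scenario: str                    # e.g. "personal_finance" or "portfolio_allocation"
    monetary_rewards: bool = True    # pay out rewards based on player scores

config = SimulationConfig(mode=SimulationMode.CROWDSOURCED_GAME, scenario="personal_finance")
```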
Reinforcement Learning
Many of OnCorps' behavioral algorithms are based on reinforcement learning (RL). As illustrated in the graph below, reinforcement learning uses an autonomous agent guided by policies (i.e., rules for what to do and at what moments).
These policies are updated by an algorithm that observes behaviors from the environment (e.g., a portfolio allocation scenario). The RL Agent chooses different actions (e.g., decision guidance messages) and is rewarded when players adhere to its guidance. As more participants interact with the Agent, and as various actions yield better rewards, the algorithm updates the Agent's policies. Put simply, the Agent learns to improve by trying things and seeing what works and what doesn't.
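As a minimal sketch of this loop, the snippet below uses a simple epsilon-greedy bandit to choose among candidate guidance messages, with adherence as the reward. The message names and adherence rates are hypothetical, and a production agent would be considerably more sophisticated.

```python
import random

messages = ["reminder_plain", "reminder_urgent", "reminder_social_proof"]  # candidate actions
value = {m: 0.0 for m in messages}   # estimated reward (adherence rate) per message
count = {m: 0 for m in messages}
epsilon = 0.1                        # exploration rate

def choose_message() -> str:
    """Policy: mostly exploit the best-performing message, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(messages)
    return max(messages, key=lambda m: value[m])

def update_policy(message: str, adhered: bool) -> None:
    """Reward is 1 if the participant adhered to the guidance, else 0;
    a running average updates the Agent's value estimates."""
    reward = 1.0 if adhered else 0.0
    count[message] += 1
    value[message] += (reward - value[message]) / count[message]

# Simulated interaction loop: each participant sees a message and either adheres or not.
for _ in range(1000):
    m = choose_message()
    adhered = random.random() < {"reminder_plain": 0.3,
                                 "reminder_urgent": 0.4,
                                 "reminder_social_proof": 0.5}[m]  # hypothetical true rates
    update_policy(m, adhered)
```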
Algorithm Performance and Results
We have seen one of the fastest adoption rates of any AI product we have developed. One reason is that the tool's gaming and experimental nature lets firms test their nudges and campaigns without disrupting their customers.
As the illustration below shows, the platform also significantly increases the frequency and granularity of the trials firms can conduct. This is a distinctive benefit of RL: the agent effectively generates hundreds, and sometimes thousands, of permutations to determine which best achieves its reward function.
Most firms, in contrast, test campaigns with traditional survey and focus group methods, which limits their ability to make micro changes and adjustments. The logistic regression curve above suggests one apparent reason for our results: the Agent simply runs more iterations, at a level of granularity that would be cost-prohibitive with conventional methods.
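To illustrate how that kind of trial data can be analyzed, the sketch below fits a logistic regression of payment outcomes against the levers the Agent varies. The features, coefficients, and data are hypothetical stand-ins, not results from the platform.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical trial log: one row per nudge the Agent sent.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.integers(0, 3, n),        # message variant chosen by the Agent
    rng.integers(8, 21, n),       # hour of day the nudge was sent
    rng.integers(1, 15, n),       # days since the previous nudge
])

# Simulated outcome: did the participant pay after the nudge?
p = 1 / (1 + np.exp(-(-1.5 + 0.4 * X[:, 0] - 0.02 * X[:, 1] + 0.05 * X[:, 2])))
y = rng.random(n) < p

# Fit a logistic regression to estimate how each lever shifts payment probability.
model = LogisticRegression().fit(X, y)
print(model.coef_, model.intercept_)
```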
Case Study Performance
We worked with a major global consultancy to apply this RL method to a bank's credit collections operations. The goal was to reduce time-to-pay for delinquent customers. The approach had two major components: first, using an RL Agent to determine which combination of messages, timing, and frequency works best; and second, determining whether we could identify differing behavioral mindsets and tailor messages to them.
To test these assumptions, we worked with members of Yale's Human Nature Lab to devise a personal finance game. The game gave each participant a checking and credit balance. It simulated the passage of time, paydays, and surprise expenses (e.g., medical bills). It then tested various messages to prompt players to pay their bills.
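A stripped-down sketch of such a game loop is shown below; the balances, probabilities, and message effects are hypothetical values chosen for illustration only.

```python
import random

def run_game(rounds: int = 12, choose_message=None) -> int:
    """Simulate a participant's months: paydays, surprise expenses,
    and a nudge each round prompting them to pay the credit balance.
    Returns the number of rounds in which the participant paid on time."""
    checking, credit = 1500.0, 600.0
    on_time_payments = 0
    for month in range(rounds):
        checking += 2000.0                        # payday
        if random.random() < 0.3:                 # surprise expense, e.g. a medical bill
            checking -= random.uniform(100, 800)
        message = choose_message() if choose_message else "standard_reminder"
        # Hypothetical behavior model: stronger nudges raise the chance of paying.
        pay_prob = {"standard_reminder": 0.5, "urgent_reminder": 0.6}.get(message, 0.5)
        if checking >= credit and random.random() < pay_prob:
            checking -= credit
            on_time_payments += 1
        credit = 600.0                            # new statement balance each month
    return on_time_payments

print(run_game())
```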
We randomly separated the game players into a control group and a group interacting with our RL Agent. The control group's nudging was set up to exactly match the bank's notices and frequency, allowing us to see whether the AI could beat the status quo. The RL Agent experimented with different messages and frequencies and achieved a 16 percent reduction in time-to-pay. Moreover, by leveraging the Agent to experiment with different messages, we achieved a 4.5 percent increase in click-through rates.