When Artificial Intelligence Fails: The Perils of Accuracy

Often, AI projects are focused entirely on “system accuracy”: in other words, “how often is the model correct?” This might be measured on a classification task (e.g., how often does my AI system correctly label pictures of birds as birds?) or a prediction task (e.g., my navigation system said this route would take 15 minutes; how close was it?). Despite its apparent utility, this metric is often the wrong one to focus on, at least in isolation, and can lead to very problematic behaviors and results.
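For the classification case, that headline number usually boils down to a single ratio. A minimal sketch in Python, with made-up labels purely for illustration:

```python
# A minimal sketch of the "accuracy" metric for a classifier; labels invented.
y_true = ["bird", "bird", "cat", "bird", "cat"]   # ground-truth labels
y_pred = ["bird", "cat", "cat", "bird", "cat"]    # the model's predictions

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.0%}")  # 80%: correct on 4 of 5 examples
```

Everything that follows is about why this one ratio, taken alone, is a poor foundation for an AI project.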
For example, let’s take a hypothetical business trying to improve its navigation directions. Executives tell their data science department that they need a system that gets somebody from point A to point B as quickly as possible. The data science team instructs their AI model to “find the shortest path from A to B.” The AI returns the most “accurate” answer: it tells the data science team to tunnel beneath the buildings. Of course, such a project would bankrupt the company. The problem was not specified properly, nor were the success criteria defined appropriately.
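The failure mode here is objective misspecification, and it is easy to reproduce in miniature. In the toy sketch below (all coordinates invented), the literal objective, “minimize distance from A to B,” is optimized by the tunnel, while the objective the business actually meant is constrained to existing streets:

```python
import math

# Toy illustration of a mis-specified objective; all coordinates are invented.
A, B = (0.0, 0.0), (3.0, 4.0)

def length(path):
    """Total Euclidean length of a piecewise-linear path."""
    return sum(math.hypot(q[0] - p[0], q[1] - p[1])
               for p, q in zip(path, path[1:]))

tunnel = [A, B]                         # unconstrained optimum: dig straight through
streets = [A, (3.0, 0.0), (3.0, 4.0)]   # a feasible route along existing roads

print(length(tunnel))   # 5.0, the "most accurate" answer to the stated objective
print(length(streets))  # 7.0, the answer the business actually needed
```

The optimizer did exactly what it was asked to do; the specification, not the model, was wrong.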
Perhaps this illustration is hyperbolic, although the development of the Hyperloop offers a counter-argument. However, we often find that firms that understand the totality of their business requirements, and recognize when it is appropriate to accept lower “accuracy,” have far more success when embarking on an AI project. The first step toward success is selecting the appropriate AI model or approach.
One of the most crucial decisions in any AI project is which type of approach or model to use. Often these decisions are made in a vacuum by data scientists or technical experts. A standard way of making such decisions is to examine the data set, determine an objective (often some type of optimization, classification or forecasting), and then approach the problem in one of two ways:
- Explore the problem with a deep understanding of the math behind different AI approaches and the full context of the business requirements, and select the approach that best suits the objective of the project; or
- Try different models and approaches until one is found that “fits” the data best, or yields the highest accuracy, and then use that.
We have previously discussed this process and our data-centric approach, but in the intervening time we have continued to see the second pattern proliferate in nearly every project we have been brought into. Most often, model selection is based on a single criterion: prediction accuracy or model fit. Without a concrete understanding of the risks the business faces and the economic model driving the AI project, the answer that comes out can be an underground road system or a hyperloop. These ideas sound great on paper, but will never make it into production (I’m not speculating on the practicality of a hyperloop here, just using it as a punching bag).
Accuracy is often completely divorced from the real-world requirements of the business challenge a company is trying to solve. This is not to diminish the importance of model fit and prediction accuracy, but “accuracy” is just one of many features an application requires, and it is often not the most important.
Even setting the feature set aside, accuracy is not the best measurement of model efficacy, despite how often it is quoted as one. A highly accurate model that makes a small percentage of extremely large errors is not necessarily preferable to a less accurate model that makes a higher percentage of minor errors. Errors create risk and liability, so attention often needs to focus on the parts of the system that don’t work. Understanding the relative costs of, and exposure to, such errors from a business point of view is critical to deploying successful AI applications.
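A small simulation makes the point. In this hedged sketch (all distributions, thresholds and penalty figures are invented), model A has the better headline error, yet once a business cost is attached to each mistake, the “less accurate” model B is the one you would want in production:

```python
import random
random.seed(0)

# Hypothetical ETA errors, in minutes, for 1,000 trips (all numbers invented).
# Model A: usually very tight, but roughly 5% of trips go wildly wrong.
# Model B: uniformly mediocre, never catastrophic.
model_a = [random.gauss(0, 2) if random.random() > 0.05 else random.gauss(0, 60)
           for _ in range(1000)]
model_b = [random.gauss(0, 8) for _ in range(1000)]

def cost(error, threshold=30, penalty=500.0):
    # Assumed cost model: linear in the size of the error, plus a fixed
    # penalty (missed flight, SLA breach, ...) once it exceeds the threshold.
    return abs(error) + (penalty if abs(error) > threshold else 0.0)

for name, errors in [("A", model_a), ("B", model_b)]:
    mae = sum(abs(e) for e in errors) / len(errors)
    avg_cost = sum(cost(e) for e in errors) / len(errors)
    print(f"model {name}: mean abs error = {mae:5.2f} min, "
          f"mean business cost = {avg_cost:7.2f}")
```

Model A wins on mean absolute error but loses on expected cost, because the damage done by its rare catastrophic misses dwarfs the savings from its tight typical error.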
For example, in the real world, companies have regulatory obligations and compliance requirements. They have budgets, existing legacy clients and a (hopefully) successful sales model. Data scientists who tell executives to go develop their own hyperloop will be laughed out of the room. Even worse, data scientists who build a sophisticated black box that “wows” executives might expose the company to significant risk and liability.
A complete black box that offers forecasts or predictions with no transparency can be a source of multiple problems, including bias inherited from the training data and poor predictions (often on edge cases) caused by insufficient training data. Even if such models appear to work well at first, they can be fragile and inflexible to change. The inability to explain how an AI model arrives at a forecast or prediction is a non-starter in a regulated environment, which can rule out a deep learning model from the outset. Deep learning might provide the most accurate model, but if it cannot support fundamental business requirements, the AI application will be a failure.
To continue to pick on deep learning: what if a company wants to build an application that is deployable at a client site, and that must be trained on-premises due to data privacy concerns? Training deep learning models requires substantial computing infrastructure at scale. Will clients need massive computing clusters in order to use the application? This requirement could shrink the market for the application until it no longer makes commercial sense.
Are you completely considering your business requirements as you attempt to select an analytical approach to solve a problem? Are you laying out all of your requirements to ensure that the approach will solve the challenges you will face in production, at scale? Or are you fixated on a complex, black-box model that claims it can make forecasts with 95% accuracy, and dismissive of a simpler approach that might only provide 90% accuracy? If you are, you should first answer these questions:
- Is system accuracy the right metric to measure the value of this application?
- Have you integrated some notion of risk and cost? Will you be able to distinguish a false positive from a false negative, and the costs/risks of each? (A sketch follows this list.)
- Have you discussed the legal, compliance and regulatory implications of an automated system? Do you feel comfortable that your design can satisfy those requirements?
- What features above and beyond accuracy or confidence are required? Are you willing to trade off accuracy to ensure you can support these features? Can you have a successful production deployment without them?
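To make the risk-and-cost question concrete, here is a hedged sketch; the per-error costs and confusion-matrix counts are entirely invented, and in practice they would come from your own business and compliance analysis:

```python
# Hedged sketch: folding asymmetric error costs into model selection.
# The costs and confusion-matrix counts below are invented for illustration.
COST_FP = 10.0    # e.g., the cost of a needless manual review (assumed)
COST_FN = 500.0   # e.g., the cost of a missed fraud case (assumed)

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def expected_cost(tp, fp, fn, tn):
    """Average per-decision cost implied by the confusion-matrix counts."""
    return (fp * COST_FP + fn * COST_FN) / (tp + fp + fn + tn)

# Model X: higher accuracy, but it misses more of the expensive cases.
# Model Y: lower accuracy, but its errors skew toward cheap false positives.
model_x = dict(tp=80, fp=20, fn=40, tn=860)
model_y = dict(tp=110, fp=80, fn=10, tn=800)

for name, m in [("X", model_x), ("Y", model_y)]:
    print(f"model {name}: accuracy = {accuracy(**m):.1%}, "
          f"expected cost per case = {expected_cost(**m):.2f}")
```

Model X is 94% accurate and model Y only 91%, yet Y’s expected cost per decision is less than a third of X’s, because Y’s errors skew toward the cheap kind. Which model belongs in production is a business question, not a leaderboard question.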
AI-based products or applications cannot be developed in a vacuum — a close partnership between business, technology and data analysis is fundamental to success in this space. Failure can mean wasted time and resources, or can be more insidious — poorly performing systems that drive away clients or open up the company to different business or regulatory risks. It can even lead to projects like a hyperloop, instead of more efficient trains and cars. On the other hand, success in this space can often provide new revenue, a better customer experience, and substantial competitive differentiation.