
Product Classification for a Global Tax Software Company


Business Challenge

A global tax software company approached Urvin AI with a business challenge. New laws and regulations around the world would shift the responsibility for collecting sales tax from product sellers to e-commerce platforms. One of the largest e-commerce platforms in the world had asked the company whether there was an automated, scalable way to accomplish this. They wanted to use the text of the product description to classify the product, but were struggling with the messiness of the data and the lack of any consistent standards for user-generated listings.

The tax software company decided to pursue two goals: address the problem for its e-commerce platform client, and build a product that solves the problem generically for all e-commerce platforms.

The company enlisted the help of an academic group at a major university, who quickly strung together some open source libraries to build a neural network-based solution. The solution showed promise, with reasonable levels of accuracy on a very limited set of data, but nobody involved had ever deployed AI at scale in production, so the company approached Urvin AI for help.

Urvin quickly identified many business features that a production-level solution would require:

  • For legal and compliance/audit reasons, the AI model had to be explainable;
  • The small dataset the prototype had been tested on was not representative of the scale of the problem. The solution would need to be able to train on 500 million records with 1,800 classes;
  • For data privacy and control reasons, many clients would want to train the system on-premise, rather than in the cloud;
  • The tax software company would want to collect models from across its clients, and use those to build better models for its entire client base;
  • The tax software company wanted to be able to integrate human feedback into the model to improve its results;
  • Models had to be extensible with more data over time; and finally
  • The solution had to work with 90%+ accuracy.

Urvin’s Approach

Urvin’s AI team analyzed the challenge and the proposed solution. We quickly determined that the scale of the training data and the number of possible categories meant that a neural network or deep learning solution would require a huge amount of compute and RAM. Training time and hardware costs would be very high, the system would need to be retrained as data changed, feedback would be difficult to incorporate, clients would require large clusters to train on-premise, and the entire model would be a black box that could not be explained for regulatory and compliance purposes.

Instead, Urvin proposed a far simpler approach built on a more standard statistical classifier, and designed a custom solution to meet the requirements of the tax software company. Although Urvin was brought on to build an MVP-level solution, the software it delivered was scalable and high-performance, and it met the client's accuracy requirements along with its non-performance requirements. Instead of leveraging open source libraries, Urvin wrote everything from scratch, building a unique custom offering that was able to:

  • Handle large amounts of “dirty”, out-of-spec data;
  • Train on large datasets very quickly;
  • Incorporate novel data structures to support model extensibility and model transfer without any need for retraining the system;
  • Provide clear insight and reasoning for all classification decisions, fully auditable for legal and regulatory purposes; and
  • Work with 90%+ accuracy for the most important / highest volume categories.
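
The case study does not disclose the classifier or data structures Urvin actually built, so the following is only an illustrative sketch of how a count-based statistical classifier could, in principle, have the properties listed above. It assumes a multinomial naive Bayes model with add-one smoothing; the class `CountModel`, its methods, and the sample listings are all hypothetical. Because the model is nothing more than token counts per class, new data or human corrections are added by incrementing counts, two separately trained models are combined by summing counts, and every decision can be broken down into per-token contributions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphanumeric runs; a crude normalizer for messy listings."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CountModel:
    """Multinomial naive Bayes stored as raw counts (hypothetical design)."""

    def __init__(self):
        self.class_docs = Counter()               # label -> number of examples
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()

    def update(self, text, label):
        """Add one labeled example; also usable for human corrections."""
        tokens = tokenize(text)
        self.class_docs[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def merge(self, other):
        """Fold another model's counts into this one, with no retraining."""
        self.class_docs.update(other.class_docs)
        for label, counts in other.token_counts.items():
            self.token_counts[label].update(counts)
        self.vocab |= other.vocab

    def explain(self, text, label):
        """Return the log prior and each token's log-likelihood for a known label."""
        total_docs = sum(self.class_docs.values())
        counts = self.token_counts[label]
        denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
        prior = math.log(self.class_docs[label] / total_docs)
        contribs = [(t, math.log((counts[t] + 1) / denom)) for t in tokenize(text)]
        return prior, contribs

    def classify(self, text):
        """Score every known label and return the best one with all scores."""
        scores = {}
        for label in self.class_docs:
            prior, contribs = self.explain(text, label)
            scores[label] = prior + sum(value for _, value in contribs)
        return max(scores, key=scores.get), scores


# Two models trained separately (for example, at two different clients).
a = CountModel()
a.update("Men's cotton T-SHIRT, blue!!", "apparel")
a.update("USB-C charging cable 2m", "electronics")

b = CountModel()
b.update("wireless bluetooth headphones", "electronics")
b.update("wool winter jacket", "apparel")

a.merge(b)  # combined model, built by summing counts
label, scores = a.classify("blue wool t-shirt")
prior, contribs = a.explain("blue wool t-shirt", label)
```

Here `contribs` lists each token of the listing alongside its log-likelihood under the chosen class, which is the kind of per-decision breakdown an auditor could inspect. A neural network offers no comparably direct way to merge models or to attribute a decision to individual input tokens.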

The Result

Urvin quickly built a proof-of-concept. That codebase was not throwaway code: it was a high-quality foundation for the application, which ultimately resulted in a production deployment with one of the most demanding e-commerce platforms in the world. While the entire project took 9 months, the foundation of the platform was built in 4 months and was ready to deploy to production in 6 months. Both firms were extremely impressed and satisfied with the results.
