While machine learning has proven to be an indispensable tool for products and services with clearly defined objectives and labeled datasets (e.g., personalized playlists and image captions), organizations in other industries have struggled to extract business value from conventional supervised machine learning.
Meanwhile, companies in sectors ranging from financial services to healthcare have an abundance of messy, unlabeled data that, in theory, should be fertile ground for generating automated insights with unsupervised learning and for contributing directly to the business’s bottom line.
However, through conversations with AI/ML leaders at major banks, fintech startups and pharmaceutical companies, we’ve learned that the full potential of unsupervised machine learning is blocked by:
Slow training and inference of available implementations of unsupervised and semi-supervised algorithms
Inability to efficiently search for optimal model hyperparameters, as a direct consequence of slow training
Engineering overhead in building supporting infrastructure to allow quick iteration and easy deployment
Lack of metrics and tools to compare models across unsupervised and semi-supervised types
Fragmented communication between data teams and business stakeholders, causing silos and imprecision
According to CNBC, in 2020, credit card fraud alone led to nearly $11 billion worth of losses in the United States. In a KPMG survey of the global banking industry, “over half of respondents recover less than 25 percent of fraud losses; demonstrating that fraud prevention is key.”
Case Study: Financial Services Fraud
One particular area that encapsulates these issues is financial services fraud.
In an effort to detect new patterns of fraud as soon as they emerge, banks and other financial services companies employ teams of data scientists and software engineers who build machine learning models to augment the work of investigators and business analysts.
However, developing these capabilities carries massive overhead costs that manifest in all of the issues listed above. Engineers have to stitch together a myriad of open-source and paid tools to build infrastructure before any machine learning becomes feasible in production.
On top of the engineering overhead, fraud detection relies on unlabeled data, which makes tried-and-true supervised techniques impractical. Most fraud is a cat-and-mouse game: companies play catch-up as adversaries adapt to fraud policies and find creative ways to circumvent them. As a result, it is almost impossible to assemble a fully labeled dataset of new fraud patterns as they emerge, so data scientists increasingly turn to unsupervised and semi-supervised learning. However, these methods lack the abundance of open-source and paid tools available for conventional supervised learning.
Last but not least, data science teams need to communicate their methods and findings effectively to less technical audiences, such as business analysts. Tools in this space are fragmented and not designed for nuanced data visualization, so reporting usually boils down to static PowerPoint charts and Excel sheets.
Introducing the All Vision Platform
All Vision is a platform that provides fast, scalable and experiment-friendly unsupervised learning tools. With All Vision, you can connect a variety of data sources, clean and prepare datasets, and run a gallery of unsupervised models ranging from clustering (e.g., k-means and DBSCAN) to anomaly detection (e.g., LOF and HDBSCAN). We reimplemented every algorithm from scratch, achieving speed-ups of up to 10x in training and inference compared with open-source implementations. Manual hyperparameter tuning is available for every algorithm and remains feasible on datasets as large as 10GB.
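To make the anomaly-detection end of that gallery more concrete, here is a minimal from-scratch sketch of a simplified Local Outlier Factor (LOF) score in plain Python. It is not All Vision's implementation; the function names `k_nearest` and `lof_scores` and the toy data are hypothetical. The sketch assumes distinct points and uses exactly `k` neighbors per point rather than the tie-inclusive neighborhood of the original LOF definition. Note that no labels are needed: each point is scored by how its local density compares with that of its neighbors.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def k_nearest(points: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    # sort is stable, so equidistant points keep their original index order
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def lof_scores(points: List[Point], k: int = 2) -> List[float]:
    """Simplified Local Outlier Factor (hypothetical helper, exactly k neighbors).

    Scores near 1.0 mean a point is about as dense as its neighbors;
    scores well above 1.0 suggest an outlier.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")

    neighbors = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = [math.dist(points[i], points[neighbors[i][-1]]) for i in range(n)]

    def reach_dist(i: int, j: int) -> float:
        # reachability distance of point i from its neighbor j
        return max(k_dist[j], math.dist(points[i], points[j]))

    # local reachability density = inverse of the mean reachability distance;
    # the small floor avoids division by zero if duplicate points slip in
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))

    # LOF = mean density of the neighbors divided by the point's own density
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    # hypothetical toy data: four nearby "transactions" and one far-away one
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    for point, score in zip(data, lof_scores(data, k=2)):
        print(point, round(score, 2))
```

On the toy data, the four clustered points score 1.0 and the distant point scores roughly 13.1, flagging it as an outlier. The brute-force neighbor search costs O(n² log n), which is fine for illustration; anything aimed at gigabyte-scale datasets would need a spatial index or approximate nearest-neighbor search instead.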
Once you’ve found the best model for the job, we offer one-click deployment to All Vision’s cloud, your private cloud, or on-prem.
For other stakeholders, we automatically generate charts of the data and of model performance metrics throughout development and production. Whenever a stakeholder needs an update, All Vision produces a customizable dashboard of these charts and metrics that can be accessed within All Vision or easily integrated with existing BI tools and workflows. No more copy-pasting screenshots into PowerPoint decks.
API Platform for Any Use Case
We like to describe what we are building as an API Platform. Inspired by the API economy pioneered by the likes of Stripe and Plaid, we’ve identified and validated a need for machine learning tools that are as simple as an API call.
This is because most companies already have data infrastructure and production environments that make it exceedingly difficult to integrate all-in-one SaaS platforms for machine learning. Instead, they are looking to add value with a few lines of code.
That said, we also recognize that many organizations would prefer an all-in-one platform akin to what Tableau offers in the BI tools space. So we built an intuitive no-code platform on top of our APIs that lets anyone apply unsupervised learning with a few clicks. Together with our API-as-a-service, this no-code offering makes up a powerful and accessible API Platform product.