What No One Tells You About AI Supervision

Why training AI on your own data is so important

by Karthik Balakrishnan, Wysdom.AI

Some playtime with a “state of the art” part-of-speech tagger led me to a startling realization: while everyone is busy making hay while the sun shines and selling their freshly baked AI tools and frameworks, few, if any, realize that AI is anything but DIY IKEA furniture! Before I go any further, let me explain what happened.

Language, in its infinite beauty, is incredibly nuanced. In the sentence “my car has four seats”, the word “seats” is a noun. However, in the sentence “my car seats four”, the word “seats” is a verb. Both sentences mean the same thing, yet the scholastic tagger reckoned that “seats” is a noun in both!

And guess what: part-of-speech tagging uses machine learning to turn a free-form sentence into a structured artefact, with each word tagged with the grammatical role it plays. It is the most critical step in the Natural Language Understanding chain: if you get it wrong, everything downstream breaks!

Example: Part-of-Speech-Tagging Result

The grammar police in me quickly realized the risk of canned, black box AI tools:

  1. You have little control over how these tools are trained

    ● While most commercial cognitive services have done wonders for accelerating AI adoption in everyday applications, they seldom, if ever, allow you to change the model’s training parameters.

    ● Those familiar with machine learning will quickly realize that training with a Naive Bayes algorithm versus a Maximum Entropy algorithm can produce very different models and results.

  2. You have little control over what these tools are trained on

    ● Some cognitive services allow you to train their models on your data.

    ● However, some don’t! A major risk of using pretrained models is that you have no idea what data the models were trained on.

Let’s take a Halloween example. Say 1 in 100 humans is a zombie and your machine learning model has to tell zombies from humans. If you train the model on 99 humans and 1 zombie, guess what: your model will, with very high likelihood, classify a zombie as a human, and woe shall befall us. It can look 99% accurate simply by calling everyone human, so you might as well have just guessed blindly!
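
To make the zombie arithmetic concrete, here is a minimal sketch in Python. It assumes the most extreme case, a hypothetical "model" that only learns the most common label, which is not how any particular product works but shows what heavily skewed data pushes a classifier toward. The names `majority_class_model`, `accuracy` and `zombie_recall` are made up for this illustration.

```python
# Hypothetical population from the example above: 99 humans and 1 zombie.
labels = ["human"] * 99 + ["zombie"]


def majority_class_model(train_labels):
    """Return a 'model' that always predicts the most common training label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return lambda _sample: most_common


predict = majority_class_model(labels)
predictions = [predict(None) for _ in labels]

# Accuracy: share of all individuals labelled correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on zombies: share of actual zombies the model catches.
zombies_caught = sum(
    p == "zombie" and y == "zombie" for p, y in zip(predictions, labels)
)
zombie_recall = zombies_caught / labels.count("zombie")

print(f"accuracy: {accuracy:.2f}")            # accuracy: 0.99
print(f"zombie recall: {zombie_recall:.2f}")  # zombie recall: 0.00
```

The headline accuracy looks excellent while the one thing you care about, catching the zombie, never happens. That is why a single accuracy figure says little about a model trained on skewed data.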

My suspicion is that the part-of-speech tagger was trained on skewed data, i.e. a majority of samples had “seats” in a noun context, which biased the model’s output.
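
That suspicion can be illustrated with a toy tagger. The sketch below is an assumption-laden illustration, not the tagger I tested: it uses a simple most-frequent-tag baseline, a tiny hand-made training set in which “seats” is a noun three times out of four, and a simplified, hypothetical tag set. The functions `train` and `tag` are invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made training data: "seats" is mostly seen as a noun.
# The tag names are a simplified illustration, not a real tag set.
training_data = [
    [("my", "PRON"), ("car", "NOUN"), ("has", "VERB"), ("four", "NUM"), ("seats", "NOUN")],
    [("the", "DET"), ("seats", "NOUN"), ("are", "VERB"), ("comfortable", "ADJ")],
    [("two", "NUM"), ("seats", "NOUN"), ("are", "VERB"), ("empty", "ADJ")],
    [("the", "DET"), ("hall", "NOUN"), ("seats", "VERB"), ("four", "NUM")],
]


def train(tagged_sentences):
    """Learn, for every word, the tag it carried most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_name in sentence:
            counts[word][tag_name] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, sentence, default="NOUN"):
    """Tag each word with its most frequent training tag, ignoring context."""
    return [(word, model.get(word, default)) for word in sentence.lower().split()]


model = train(training_data)
print(tag(model, "my car has four seats"))
print(tag(model, "my car seats four"))
```

Because this baseline ignores context and “seats” was a noun in three of its four training occurrences, it tags “seats” as a noun in both sentences. Real taggers do use context, but if the training data is lopsided enough, the same bias can still leak into their output.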

If your data is skewed, your model itself could be rendered useless.

Which brings me back to my earlier point. Operating AI isn’t plug and chug. Should you do that, you’ll be chugging someone else’s drink, and beware, it could be spiked! AI supervision is about owning your data and having the right resources to constantly train your models on the right data. Identifying what’s right takes experience and expertise, and is sometimes more art than science.

At Wysdom.AI, we believe in a cognitive platform approach: a triad of Wysdom’s cutting-edge cognitive products, surgically augmented DIY cognitive services, and the all-important AI supervision. A platform approach is vital for AI to truly bring value to your business processes and, in turn, have a positive impact on your customers.

Through our years in the industry, we’ve seen enterprises large and small struggle and succeed. Success is often the result of meticulous planning and adopting a platform approach to AI.

So before bolting out of the gate to deploy DIY cognitive services, always remember:

Failure to train is training to fail.