Artificial Intelligence

AI Learning Types - Unsupervised Learning

Unsupervised learning can find interesting patterns in your data

Unsupervised learning involves training AI systems on unlabelled data, without any predefined input-output pairs. The AI system learns to identify patterns, correlations, or structures within the data by itself.

The difference between Supervised and Unsupervised learning

Supervised learning is a powerful technique, but it comes with a significant drawback - it needs a lot of labelled data. For instance, if you want to train an AI system to identify spoons, you might need to provide it with 1,000 or even 10,000 pictures of spoons. That's a massive amount of spoon photos to input into the system!

However, if you consider how humans learn, the contrast is striking. If you've ever taught a child what a spoon is, you probably didn't show them 10,000 unique spoons. Yet, they can still recognise a spoon when they see one. So, in this respect, AI systems currently require a lot more data to learn than humans or most animals.

This is why many AI researchers are excited about the potential of unsupervised learning. They believe it could be a way for AI to learn more effectively and more naturally, like humans or animals, but with much less labelled data in the future.

Unsupervised learning is useful for tasks such as anomaly detection, clustering, and dimensionality reduction. It's often easier to understand these concepts with an example.

Clustering example

Imagine you're running a bookshop in a bustling part of town. Your inventory includes a diverse range of books, from bargain paperbacks to high-end, collector's editions. To gain a better understanding of your customers' purchasing patterns, you start to track their purchases. Specifically, you note the number of books each customer buys and the average price per book they pay.

After analysing this data, you spot two distinctive patterns, or "clusters".

The first cluster consists of customers who buy many low-cost paperbacks. If your shop is in a busy commuting area, this group might represent daily commuters who enjoy catching up on reading during their travel, but don't want to spend too much on each book.

The second cluster includes customers who purchase fewer books but opt for the more expensive, collector's editions. This group could represent book enthusiasts or collectors who don't buy as many books but are willing to spend more on each purchase for a special edition.

These clusters, identified by an unsupervised learning algorithm, provide valuable insights for market segmentation. By understanding these customer behaviours, you can tailor your marketing strategy. For instance, you could target special offers on paperbacks at daily commuters, while promoting collector's editions to book enthusiasts. This strategy would cater to each group's specific buying preferences, enhancing your shop's appeal and potentially boosting sales.
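To make the bookshop example concrete, here is a minimal sketch of how such clusters could be found with k-means, one common clustering algorithm. The article does not say which algorithm is used, so k-means is an assumption, and the customer figures (books bought, average price per book) are invented purely for illustration.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with plain k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters

# Hypothetical data: (books bought, average price per book) for each customer.
customers = [
    (12, 6.0), (15, 5.5), (10, 7.0), (14, 6.5),   # many cheap paperbacks
    (2, 45.0), (1, 60.0), (3, 50.0), (2, 55.0),   # few expensive editions
]

centroids, clusters = kmeans(customers, k=2)
for centre, members in zip(centroids, clusters):
    print("cluster centre:", centre, "customers:", members)
```

Note that nobody tells the algorithm which customers are commuters and which are collectors; it only sees the two numbers per customer. In practice you would usually rescale the two features first so that price does not dominate the distance calculation.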

Moorfields Eye Scan data

Following on from the Moorfields eye scan example, a separate study discovered some unexpected patterns from retinal data.  

  • A model was fed 85,000 retinal images without being given any other information.  
  • Clinicians are currently unaware of any distinct retinal feature differences between males and females, but the algorithm organised the images into two clusters that corresponded to the patient's sex, with 97% accuracy.
  • The AI had found something that researchers are now trying to understand. Perhaps the neural network picked up patterns that are too subtle for the human eye. We just don't know. (Scientific paper in Nature)

Fraud Detection

Unsupervised learning excels at fraud detection, using anomaly detection and cluster analysis. It doesn't just wait for known fraud patterns; it actively hunts for new and emerging ones.

Anomaly detection finds transactions that stand out from the norm. Fraudulent patterns keep evolving, making them hard to spot with traditional rules.

Using algorithms like Isolation Forest or One-Class SVM, the system learns what 'normal' transaction behaviour looks like. Once it has a sense of the regular patterns, it becomes adept at flagging transactions that deviate significantly from the norm. These anomalies, or outliers, are potential frauds.
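The idea behind Isolation Forest can be sketched in a few lines: unusual values are easier to separate from the rest with random cuts, so they end up isolated after fewer splits. The version below is a deliberately simplified illustration, not a production implementation. It works on a single feature (the transaction amount), skips the subsampling used in the full algorithm, and the amounts and the 0.7 threshold are hypothetical.

```python
import math
import random

def build_tree(values, depth, max_depth, rng):
    """Recursively split values at random cut points until they are isolated."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}
    cut = rng.uniform(min(values), max(values))
    return {
        "cut": cut,
        "left": build_tree([v for v in values if v < cut], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= cut], depth + 1, max_depth, rng),
    }

def average_path(n):
    """Expected path length for n points; used to normalise the scores."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def path_length(tree, value, depth=0):
    """Count how many splits it takes to reach the leaf holding this value."""
    if "cut" not in tree:
        return depth + average_path(tree["size"])
    branch = "left" if value < tree["cut"] else "right"
    return path_length(tree[branch], value, depth + 1)

def anomaly_scores(values, n_trees=100, seed=0):
    """Score each value between 0 and 1; higher means easier to isolate.

    Assumes at least two values are supplied.
    """
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(values)))
    trees = [build_tree(values, 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for v in values:
        mean_depth = sum(path_length(t, v) for t in trees) / n_trees
        scores.append(2 ** (-mean_depth / average_path(len(values))))
    return scores

# Hypothetical transaction amounts: fifteen everyday purchases and one outlier.
amounts = [23.5, 18.0, 31.2, 27.9, 22.4, 19.8, 25.1, 29.3,
           21.7, 26.6, 24.0, 20.5, 28.4, 30.1, 22.9, 950.0]

scores = anomaly_scores(amounts)
flagged = [a for a, s in zip(amounts, scores) if s > 0.7]  # illustrative threshold
print("flagged for review:", flagged)
```

No transaction was ever labelled as fraud here. The large amount is flagged simply because it takes very few random splits to separate it from everything else.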

Cluster Analysis uses algorithms to group similar transactions together.

Discoveries could include groups of transactions that don't fit any known customer behaviour. For example, a sudden cluster of high-value transactions in a foreign country late at night might be suspicious, as might a set of online transactions that all come from the same IP address but use different card details.
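The shared IP address case is simple enough to sketch directly. The snippet below is a hedged illustration rather than a real clustering algorithm: it groups transactions by an exact key instead of learning similarity from the data. The field names, card identifiers, IP addresses and the `min_cards` cut-off are all hypothetical.

```python
from collections import defaultdict

def shared_ip_groups(transactions, min_cards=3):
    """Return IP addresses that were used with at least min_cards different cards."""
    cards_by_ip = defaultdict(set)
    for tx in transactions:
        cards_by_ip[tx["ip"]].add(tx["card"])
    return {
        ip: sorted(cards)
        for ip, cards in cards_by_ip.items()
        if len(cards) >= min_cards
    }

# Hypothetical online transactions (IP addresses are from documentation ranges).
transactions = [
    {"ip": "203.0.113.7", "card": "card_A", "amount": 120.0},
    {"ip": "203.0.113.7", "card": "card_B", "amount": 95.0},
    {"ip": "203.0.113.7", "card": "card_C", "amount": 310.0},
    {"ip": "198.51.100.4", "card": "card_D", "amount": 18.5},
    {"ip": "198.51.100.4", "card": "card_D", "amount": 22.0},
    {"ip": "198.51.100.9", "card": "card_E", "amount": 40.0},
]

suspicious = shared_ip_groups(transactions)
print("IPs worth a closer look:", suspicious)
```

A real system would group on many features at once (amount, time, location, device), which is where a clustering algorithm earns its keep.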

Fraudsters are always evolving, but with unsupervised learning, a bank's system can keep learning too. As new data comes in, the model is updated, helping it remain effective even as fraudulent tactics change.

This works best when AI is paired with human review. Once potential frauds are flagged, they're not immediately labelled as fraudulent. Instead, they're often reviewed by human experts. This collaboration reduces false alarms and helps refine the system further.

May 9, 2023
