
Training AI Systems - Learning Types

Learning types are the fundamental ways in which an AI model learns from data: supervised learning, unsupervised learning and reinforcement learning.


AI models can be built with traditional machine learning algorithms (statistical or mathematical models that learn patterns from data) or with neural networks, and either kind can be combined with higher-level strategies that enhance the learning process.

This article covers the learning types used with traditional machine learning algorithms. Separate articles cover the differences between Machine Learning and Deep Learning AI algorithms, and Advanced Training of AI Systems - Learning Strategies and Paradigms. The latter covers the higher-level strategies that an AI model can use to enhance its learning process, often regardless of the learning type being used: Transfer Learning, Multi-task Learning, Active Learning and Semi-Supervised Learning.

Different Approaches to Training AI Systems

There are three main approaches to training AI systems with traditional machine learning algorithms. Each has its own unique advantages and challenges.

Supervised Learning: In supervised learning, AI systems are trained on labelled datasets, which contain input-output pairs. The AI system learns to predict the output from the input by minimising the difference between its predictions and the actual outputs, usually measured by a loss function. Supervised learning is useful for tasks such as image classification, speech recognition, and many natural language processing tasks.
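As a rough illustration of "minimising the difference between predictions and actual outputs", here is a minimal sketch that fits a straight line to a handful of labelled pairs using gradient descent on the mean squared error. The dataset, the function name train_linear and the hyperparameters are all made up for this example; real projects would normally use a library rather than hand-written updates.

```python
# Labelled dataset (illustrative): inputs paired with outputs from y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]


def train_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between each prediction and its label.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move the parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


weight, bias = train_linear(inputs, targets)
# Predict the output for an input that was not in the training data.
prediction = weight * 5.0 + bias
print(f"w={weight:.3f}, b={bias:.3f}, prediction for x=5: {prediction:.3f}")
```

The learned weight and bias should end up close to the 2 and 1 used to generate the labels, which is the sense in which the model has "learned" the input-output relationship.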

Unsupervised Learning: Unsupervised learning involves training AI systems on unlabelled data, without any predefined input-output pairs. The AI system learns to identify patterns, correlations, or structures within the data by itself. Unsupervised learning is useful for tasks such as anomaly detection, clustering, and dimensionality reduction.
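Clustering is one of the easiest unsupervised tasks to show in code. The sketch below is a simplified one-dimensional version of the k-means algorithm: it groups unlabelled numbers around two centres without ever being told which group a point belongs to. The data, the starting centroids and the function name k_means_1d are assumptions chosen for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group 1-D points around centroids (a simplified k-means sketch)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabelled data (illustrative): no outputs are provided, only raw values.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print("centroids:", centroids)
print("clusters:", clusters)
```

The algorithm discovers the two groups (values near 1 and values near 9.5) purely from the structure of the data.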

Reinforcement Learning: In reinforcement learning, AI systems learn by interacting with their environment and receiving feedback in the form of rewards or penalties. The AI system aims to maximise the cumulative reward over time by learning which action to take in each situation. Reinforcement learning is suitable for tasks such as game AI, robotics, and autonomous vehicles.
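To make the reward loop concrete, the sketch below uses tabular Q-learning, one well-known reinforcement learning method, on a made-up environment: a corridor of five positions where the agent starts at the left end and receives a reward of 1 only when it reaches the right end. The environment, the names step and train, and the hyperparameters are all assumptions for illustration.

```python
import random

N_STATES = 5           # positions 0..4 in a corridor (hypothetical environment)
GOAL = N_STATES - 1    # reaching the right-hand end ends the episode
ACTIONS = [-1, +1]     # move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values with tabular Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random with probability epsilon (or on ties),
            # otherwise take the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for each non-terminal position: -1 is left, +1 is right.
policy = [ACTIONS[0 if q[0] > q[1] else 1] for q in q_table[:GOAL]]
print("learned policy:", policy)
```

Nobody tells the agent that moving right is correct. It works this out from the rewards alone, and the discount factor gamma makes rewards that arrive sooner worth more than rewards that arrive later.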

Why is Training AI Systems Important?

Training AI systems is crucial for several reasons:

  1. Improving Accuracy: Proper training enables AI systems to make better decisions, increasing the accuracy of their predictions, classifications, and overall performance.
  2. Generalisation: Training helps AI systems to generalise their knowledge and apply it to new, unseen situations. This adaptability is essential for AI systems to handle a wide range of scenarios and challenges.
  3. Reducing Bias: Careful training on representative data, together with evaluation of the results, can reduce biases in an AI system's decision-making, leading to more objective and fair results.
  4. Enhancing Efficiency: Training AI systems can optimise their performance, reducing the computational resources and time required to accomplish tasks.

Training ChatGPT

Supervised & Reinforcement Learning - ChatGPT

ChatGPT, like other language models developed by OpenAI, is built on a model that was first pre-trained on a diverse range of internet text to predict the next piece of text. This is often described as self-supervised learning. It works like supervised learning, where a model is trained on a labelled dataset of inputs and correct outputs, except that the "label" for each input is simply the text that follows it, so no human annotation is needed. This allows the model to learn the patterns and relationships within the data.

Additionally, OpenAI used "reinforcement learning from human feedback" (RLHF) to further refine ChatGPT's behaviour. This process involves several steps:

  • Supervised fine-tuning: The model is first fine-tuned on a dataset of human-generated prompt-response pairs to better align its outputs with what humans consider high-quality responses.
  • Reward modelling: Human labellers compare and rank several model-generated responses to the same prompt. These rankings are used to train a reward model that predicts how highly humans would rate a response.
  • Proximal Policy Optimisation (PPO): Finally, the model is fine-tuned with reinforcement learning using a method called Proximal Policy Optimisation (PPO). The model generates responses to prompts, the reward model scores them, and the model is updated to maximise the predicted reward while limiting how far each update moves it from its previous behaviour.
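The reward modelling and PPO steps above can be sketched in miniature. The code below is a toy under stated assumptions, not OpenAI's implementation: responses are represented by two made-up numeric features instead of text, the reward model is a simple weighted sum trained on pairwise preferences with a logistic loss (one common formulation), and ppo_clipped_objective shows only PPO's standard clipped objective for a single sample. All names and numbers are hypothetical.

```python
import math

# Hypothetical preference data: (features of the preferred response,
# features of the rejected response). The two features are invented
# stand-ins, for example [helpfulness signal, rudeness signal].
preference_pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]


def reward(weights, features):
    """Toy reward model: a weighted sum of the response features."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(pairs, learning_rate=0.5, epochs=200):
    """Fit weights so preferred responses score higher than rejected ones."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Loss is -log(sigmoid(margin)); its gradient shrinks as the
            # model becomes more confident in the human preference.
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(len(weights)):
                weights[i] += learning_rate * scale * (chosen[i] - rejected[i])
    return weights


def ppo_clipped_objective(ratio, advantage, clip_range=0.2):
    """PPO's clipped objective for one sample.

    ratio is new_policy_probability / old_policy_probability for the action
    taken; clipping removes the incentive to move far from the old policy.
    """
    clipped = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    return min(ratio * advantage, clipped * advantage)


reward_weights = train_reward_model(preference_pairs)
for chosen, rejected in preference_pairs:
    print(reward(reward_weights, chosen) > reward(reward_weights, rejected))
```

In a real system the reward model is itself a large neural network that reads the prompt and response, and the advantage passed to the PPO objective is derived from its scores.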

This combination of supervised learning and reinforcement learning from human feedback helps the model generate more natural, human-like text. It allows the model to better adapt to the intricacies of human language and preferences, resulting in outputs that are more aligned with what users find helpful, informative, and engaging.


Training AI systems is a complex and essential process that enables these systems to make accurate predictions, generalise knowledge, reduce biases, and enhance efficiency. By drawing inspiration from how animals learn through rewards and intrinsic motivations, AI developers can create effective training methodologies for supervised, unsupervised, and reinforcement learning. As AI systems continue to advance and play an increasingly significant role in various domains, the importance of refining training techniques and understanding their implications cannot be overstated. By learning from the training approaches used for models like ChatGPT, we can pave the way for more sophisticated and beneficial AI applications in the future.

April 24, 2023
