Ever tried to get a computer to do what you want? Suddenly ChatGPT understands what you are saying (mostly). So what does that GPT stand for?
GPT, or Generative Pretrained Transformer, is a neural network that Generates responses, has been Pretrained on a large amount of data, and is based on a Transformer neural network architecture. Transformer neural networks use a mechanism called attention to help computers understand language.
All of that may sound pretty complicated, but it is not as bad as it sounds. Let’s take a look.
Table of Contents
- What Is GPT (Generative Pretrained Transformer)?
- So What Exactly Is a Neural Network and How Does It Work?
- Why Should You Care About All That Stuff?
What Is GPT (Generative Pretrained Transformer)?
Let’s consider each of the letters separately, then jump into the AI stuff (it’s really cool).
- Generative – In machine learning, generative models are a type of program that can create new information, such as new images, new text, or new audio.
- For ChatGPT, the new text will be similar to the text that the neural network was trained on.
- ChatGPT is not the only generative AI; others can make images, increase resolution, and more.
- The key to a generative neural network is the text that it is trained on and the architecture of the neural network.
- Pretrained – GPT was, and continues to be, trained on a huge mountain of text, BookCorpus among other datasets. Because it already has a huge amount of data preloaded and analyzed into a neural network with 175 billion parameters (currently), it can answer most questions about human knowledge.
- The current training data only goes through 2021.
- ChatGPT is not infallible and does make mistakes.
- Transformers – This is a type of deep-learning neural network that was invented in 2017 and detailed in the paper Attention Is All You Need. It changed the game because it was generative and used attention to mimic how humans keep track of what is important.
- ChatGPT’s Transformer uses attention to understand which parts of a text are important.
- Different parts of a text depend on other parts.
- Example: “George smiled because he was happy.” Here “he” refers to George, so “he” depends on “George”.
- The way a computer applies attention amounts to a lot of equations.
- The result, though, is that the computer is able to “understand” and reply much better than before.
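To make that concrete, here is a minimal sketch in Python (with made-up numbers, not GPT’s real internals) of how attention scores can tie “he” back to “George”:

```python
import numpy as np

# A toy illustration: each word gets a small vector, and attention
# scores say how much one word "looks at" every other word.
words = ["George", "is", "happy", "he", "smiled"]

# Hand-picked vectors; "George" and "he" point the same way on purpose,
# so the score between them comes out high.
vectors = np.array([
    [1.0, 0.0, 0.0],   # George
    [0.0, 1.0, 0.0],   # is
    [0.0, 0.0, 1.0],   # happy
    [0.9, 0.1, 0.0],   # he  (deliberately similar to George)
    [0.0, 0.5, 0.5],   # smiled
])

# Scaled dot-product attention for the word "he": score every word,
# then softmax so the scores add up to 1.
query = vectors[3]
scores = vectors @ query / np.sqrt(3)
weights = np.exp(scores) / np.exp(scores).sum()

for word, weight in zip(words, weights):
    print(f"{word:>7}: {weight:.2f}")
# "George" ends up with the largest weight: "he" depends on "George".
```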
So What Exactly Is a Neural Network and How Does It Work?
Transformers are based on a neural network structure. To understand that better it will help if we take a look at what a neural network is.
A neural network is a type of machine learning algorithm that is modeled after the structure and function of the human brain. It is composed of interconnected nodes, also known as artificial neurons, that process and transmit information.
That is a mouthful, but what it means is that, just like the picture below, there are nodes that pass information from one to another. The nodes in a neural network are not physical like they are in our brains, but the code is modeled on them.
Neural networks can have multiple “layers”. You can imagine it like a camera lens system, where the light is altered a bit by each lens it passes through until it gets to the detector.
In a neural network, information passes through each layer in turn. The information is analyzed at each layer, just like the light is altered by each lens in the system.
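If you like code more than lenses, here is a tiny sketch of that idea: a few numbers passing through two made-up layers (the weights are random, purely for illustration):

```python
import numpy as np

# A minimal sketch of data passing through layers, like light passing
# through a stack of lenses. Each layer just multiplies by weights and
# then "bends" the result with a simple function.
def layer(x, weights):
    return np.maximum(0, x @ weights)   # ReLU: keep positives, zero out the rest

rng = np.random.default_rng(0)
x = rng.random(4)                # the "light" going in: 4 input numbers

w1 = rng.random((4, 5))          # first lens: turns 4 numbers into 5
w2 = rng.random((5, 3))          # second lens: turns 5 numbers into 3

hidden = layer(x, w1)            # altered once
output = layer(hidden, w2)       # altered again
print(output)                    # what comes out the far end
```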
So How Do You Create the Neural Network To Begin With?
Training!
Data Goes Into the Neural Network —> Neural Network Gives an Output —> Measure How Far Off the Answer Was (Error) —> Use an Algorithm to “Back Propagate” the Error to Make It Smaller —> Repeat
In Supervised Learning (One Way to Train a Neural Network)
- The input data is passed through the network, and the output is compared to the corresponding true value.
- The error is calculated by comparing the predicted output and true value.
- The error is then propagated back through the network, and the weights are adjusted in a way that reduces the error.
- This process is repeated multiple times with different data, and the weights are adjusted each time.
The optimization process uses a bunch of math that would not make this any clearer, but if you are interested…
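For the curious, here is about the smallest possible version of that loop: a “network” with a single weight learning the rule y = 2x by repeatedly shrinking its error (a toy sketch, not a real training setup):

```python
# One weight, a handful of examples, and gradient descent doing the
# "back propagate the error" step. Real networks have billions of
# weights, but the rhythm of the loop is the same.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (input, true value) pairs: y = 2x
w = 0.0                                        # start the weight at a bad guess
lr = 0.05                                      # how big each correction step is

for step in range(100):
    for x, y_true in data:
        y_pred = w * x                 # 1. pass the input through the "network"
        error = y_pred - y_true        # 2. measure how far off we are
        w -= lr * error * x            # 3. nudge the weight to shrink the error

print(w)  # ends up very close to 2.0 -- the "network" learned the rule
```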
As the neural network is trained by sending data through its layers (the lenses in our example), the layers start to change. The first layer starts to sort out the easiest-to-observe features. For vision, that would be detecting edges; for text, it is understanding the meaning of each word in the context of the sentence.
The changes in the neural layers are implemented by giving the neural network feedback.
Let’s Make a Cat Detecting Neural Network
For instance, you can teach a neural network to determine if an animal is a cat.
First, you might start the training by making sure it is able to detect the edges of the animal’s face.
Next we could train it to look for eyes.
Then we could train it to look at the pupil of the eye.
Then we could teach it the difference between round pupils and vertical slit pupils.
Then we could teach it that cats have vertical slit pupils.
Then we could ask if this is a picture of a cat.
Eyes are only one feature of a cat; to really make sure the picture was a cat, we would need to look at more features. So we would need to train those as well.
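As a cartoon of that staged idea, here is a toy “detector” where each hand-written check stands in for a learned layer. Every function and feature name here is made up for illustration; a real network learns these checks itself rather than having them written by hand:

```python
# A pretend "image" is just a dictionary of features, and each check
# below stands in for one learned layer of the staged cat detector.
def has_face_edges(image):
    return image.get("face_outline", False)      # layer 1: find the face edges

def find_pupils(image):
    return image.get("pupils", [])               # layers 2-3: eyes, then pupils

def looks_like_cat(image):
    if not has_face_edges(image):
        return False
    pupils = find_pupils(image)
    if not pupils:
        return False
    # layer 4: cats have vertical slit pupils
    return all(p == "vertical slit" for p in pupils)

cat = {"face_outline": True, "pupils": ["vertical slit", "vertical slit"]}
dog = {"face_outline": True, "pupils": ["round", "round"]}
print(looks_like_cat(cat), looks_like_cat(dog))  # True False
```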
That does not make for a very useful AI (unless you really like cats).
So What is ChatGPT?
ChatGPT is a more general neural network, though it is limited to outputting text, just like the neural network I described is limited to detecting cats.
It is a little more complicated than the neural network that I described, because of Transformers.
So What Is A Transformer?
The machine learning community calls Transformers a neural network architecture.
I find that a little complicated. I would say instead that
- Transformers are a way to send data to a neural network and let it know what’s important in the data (attention).
- The same is true of the output: the neural network focuses on which parts of the output are important (attention) and how they are connected to the entire chat.
So let’s take a look at an example.
Let’s start a fresh chat.
Its fleece was?
I’m sorry, I’m not sure what you are trying to say. Could you please provide more context or clarify your question?
Mary had a little Lamb, Its fleece was white as snow. Its fleece was?
Its fleece was white as snow.
Without any more information, ChatGPT could not answer the question. Most of us would guess that the answer was “Its fleece was white as snow,” because we have heard the poem many times. We already have a “meme”, if you like, in our minds about fleece.
ChatGPT does not have the same context that we do. It does not automatically connect fleece + Mary poem until we give it the poem.
We connect them through what the AI community calls “attention”: it lets us know that the answer depends on that poem.
When we give ChatGPT the poem and the question, it also connects the answer to the rest of the text that we gave it.
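You can try this experiment from code as well. A minimal sketch, assuming the pre-1.0 openai Python package, the gpt-3.5-turbo model, and an API key already configured (names and details vary by version):

```python
import openai  # assumes openai < 1.0 and openai.api_key already set

# The same question asked two ways. Without the poem there is nothing
# for attention to latch onto; with the poem in the message, the model
# can connect "fleece" back to Mary's lamb.
bare = [{"role": "user", "content": "Its fleece was?"}]
with_context = [{"role": "user",
                 "content": "Mary had a little lamb, its fleece was white as snow. "
                            "Its fleece was?"}]

for messages in (bare, with_context):
    reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    print(reply.choices[0].message.content)
```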
How Does GPT Give Attention?
The essence is that the text you input is first analyzed (encoded) to produce what is known as a context vector. (Strictly speaking, this encoder-decoder layout comes from the original Transformer; GPT itself uses only the decoder half, but the idea of attention is the same.) The context vector has information in it that:
- Indicates what part of the text is important
- Indicates how those important parts are related to each other
- Other abstract information about the text input
- This is how the neural network creates “Attention”
The context vector AND the text input are then sent to the decoder.
The decoder takes the context vector and the text input and uses the context vector to help understand the text input. The decoder then creates a text output.
The text output is created using, once again, attention, to make sure that it not only matches the text input but also the rest of the information in the chat.
There is a correction mechanism that allows the output to be tested to make sure it matches the input. If it is not close enough, the decoder neural network is “tuned”, and the decoder tries again until the output is close enough to answer what the input was asking.
Wow! It is a lot simpler just to look at the figure!
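And if you would rather trace it as code, here is a cartoon of that encoder → decoder hand-off, with random numbers standing in for real learned weights; only the shape of the data flow is the point:

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(token_ids):
    # Turn the input tokens into a context vector that summarizes
    # what is important and how the parts relate. Here: random
    # stand-in embeddings averaged into one 8-number summary.
    embeddings = rng.random((len(token_ids), 8))
    return embeddings.mean(axis=0)

def decode(context, token_ids):
    # Score every word in a pretend 50-word vocabulary against the
    # context vector and pick the most likely next word. (A real
    # decoder also attends to token_ids and its own earlier outputs.)
    vocab = rng.random((50, 8))
    scores = vocab @ context
    return int(scores.argmax())

tokens = [17, 4, 29]                  # a pretend tokenized input
context = encode(tokens)              # encoder -> context vector
next_token = decode(context, tokens)  # decoder -> next output token
print(context.shape, next_token)
```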
So What’s GPT Again?
Generative – Makes Stuff
Pretrained – Already read a lot of stuff
Transformer – A special type of neural network that uses attention to help it understand language, pictures, etc.
Why Should You Care About All That Stuff?
The fact is that most people don’t really know how a computer works; they just use it. The same will be true of AI programs like ChatGPT. Knowing what your computer can and can’t do, though, is important. Understanding AI is going to be like understanding computers in the 1980s: the people who did went a long way, and the people who didn’t may have needed to find a new job.
Take a look at this graph from the Visual Capitalist. Is it exactly right? I couldn’t say for certain, but the trend is clear: the lowest-paying, most repetitive jobs will probably be lost.
If you can read the writing on the wall, it makes a big difference to where you go next. It also opens up huge opportunities for entrepreneurs. I wish you the best, and keep learning.