The Baby's Journey

As we saw in the last part, AI transitioned from its teenage phase (Machine Learning) to adulthood (Deep Learning) with neural networks, allowing it to recognize patterns and make decisions.

But adulthood came with new challenges—users didn’t just want AI to classify things or detect spam. They wanted AI to create something new, whether it was text, images, music, or even code.

This marked the birth of Generative AI, a new phase where AI wasn’t just an adult following rules—it became a creative mind that could generate content.

What is Generative AI and How Does It Work?

Ha! So, what exactly is Generative AI (GenAI), and how does it work? Let's break it down in a simple way.

At its core, GenAI is powered by Transformers, and you can think of a Transformer model as having two key components:

The Parameters File – This holds what the AI has learned during training: typically billions of numerical weights (parameters) distilled from the training data, not the data itself.
The Generator File – This acts as the brain: the code that runs those parameters to make sense of a prompt and generate new content from it.

Much like how humans learn—we accumulate knowledge over time, and our brain helps us generate new ideas based on what we know—Transformers work the same way.
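To make the two-part picture concrete, here is a minimal sketch in Python. It is a toy, not how a real Transformer is stored or run: the `PARAMETERS` table and its numbers are made up for illustration, and the `generate` function is a hypothetical stand-in for the generator code. The idea it shows is simply that "knowledge" lives in numbers, and a small piece of code turns those numbers into output.

```python
# "Parameters file": in a real model this is billions of learned numbers.
# Here it is a tiny hand-written table (hypothetical values) that scores,
# for each word, how likely each possible next word is.
PARAMETERS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}


def generate(parameters, start, max_words=5):
    """'Generator file': code that repeatedly picks the likeliest next word."""
    words = [start]
    # Stop when we hit the length limit or a word the table knows nothing about.
    while len(words) < max_words and words[-1] in parameters:
        options = parameters[words[-1]]
        words.append(max(options, key=options.get))
    return " ".join(words)


print(generate(PARAMETERS, "the"))  # the cat sat down
print(generate(PARAMETERS, "dog"))  # dog ran away
```

Notice that the generator code never changes. Swap in a different parameters table and the same code produces different text, which is why the parameters are where the learning actually lives.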

So, how are these parameters actually formed? It all comes down to two crucial stages: pre-training and fine-tuning.

The Two Stages of Learning: Pre-training & Fine-tuning

Just like humans, AI also learns in two major stages:

1️⃣ Pre-training – Learning on its own
2️⃣ Fine-tuning – Getting corrections from a mentor

Let’s break this down with an analogy.

Stage 1: Pre-training (Self-learning like a Curious Child)

Imagine a child who is learning on their own. They read books, watch videos, and observe the world around them. Over time, they gather a lot of knowledge, but there’s a catch—

Some of their learnings might be incorrect because they have no way to verify if what they absorbed is right or wrong.
For example, a child might think that all flying creatures are birds—but what about bats?
In their mind, since bats have wings, they must be birds.

This is exactly how pre-training works in AI.

During this stage, AI models learn from massive amounts of data (books, articles, code, conversations, etc.).
They detect patterns and form their own understanding of the world.
However, just like the child, they don’t always get things right—there can be biases, misinformation, or incorrect conclusions.
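Here is a rough sketch of that idea in Python. It assumes a drastically simplified setup: instead of a neural network, the "model" just counts which word tends to follow which, and the function names `pretrain` and `predict_next` and the tiny corpus are invented for this example. Real pre-training adjusts billions of weights with gradient descent, but the spirit is similar: learn patterns from raw text with nobody labeling what is right or wrong.

```python
from collections import Counter, defaultdict


def pretrain(corpus):
    """Count which word follows which: a toy stand-in for next-word prediction."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Unlabeled "reading material": nobody tells the model what is true.
corpus = [
    "sparrows have wings",
    "eagles have wings",
    "bats have wings",
    "sparrows are birds",
    "eagles are birds",
]

model = pretrain(corpus)
print(predict_next(model, "have"))  # wings
print(predict_next(model, "are"))   # birds
```

The toy model has only ever seen "are" followed by "birds", so asked to continue "bats are", it would confidently say "birds". That is the curious child's mistake: a pattern that was common in the data, applied where it does not hold.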

So, what’s next? Just like how humans need guidance to refine their knowledge, AI also requires fine-tuning.

Stage 2: Fine-tuning (Getting Mentored for Accuracy)

Now, let’s say the child meets a teacher who corrects their understanding:

The teacher explains, "Not all flying creatures are birds! Bats are mammals."
Now, the child updates their knowledge and won’t make the same mistake again.

This is exactly what happens during fine-tuning in AI:

Experts refine the AI by training it further on corrected examples and rewarding good behavior.
This is done using curated, specialized datasets or feedback from human reviewers (as in reinforcement learning from human feedback, RLHF).
It helps AI become more accurate, reliable, and context-aware.

For example:

A pre-trained AI chatbot may answer questions incorrectly or lack specific industry knowledge.
But after fine-tuning with domain-specific data (like finance, law, or medicine), it becomes more precise and useful.
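The teacher-and-child story can be sketched in a few lines of Python. This is a toy under loud assumptions: the "model" is just a table of word counts, the helper names `update` and `complete` are made up for this example, and the `weight` argument is a stand-in for how strongly the curated examples count. Real fine-tuning nudges neural network weights with gradient descent, but the before-and-after effect is the same in spirit.

```python
from collections import Counter, defaultdict


def update(counts, sentences, weight=1):
    """Add next-word counts from `sentences`, using the previous two words as context."""
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b, nxt in zip(words, words[1:], words[2:]):
            counts[(a, b)][nxt] += weight
    return counts


def complete(counts, a, b):
    """Return the most likely word after the context (a, b), or None if unseen."""
    followers = counts.get((a, b))
    return followers.most_common(1)[0][0] if followers else None


# Stage 1: "pre-training" on unverified text, where the mistake is common.
pretraining_text = ["bats are birds", "bats are birds", "bats are mammals"]
model = update(defaultdict(Counter), pretraining_text)
before = complete(model, "bats", "are")
print(before)  # birds

# Stage 2: "fine-tuning" on a small curated set from the teacher,
# counted more heavily than the unverified text.
curated = ["bats are mammals"]
update(model, curated, weight=5)
after = complete(model, "bats", "are")
print(after)  # mammals
```

The curated set is tiny compared to the original text, yet it is enough to flip the answer. Fine-tuning works the same way: it does not start from scratch, it corrects a model that already knows a lot.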

Why Do Both Stages Matter?

Without pre-training, AI won’t have a foundational understanding.
Without fine-tuning, AI will make too many mistakes and won’t be trustworthy.

It’s like a student who studies hard but needs a teacher to guide them.

And with this, AI becomes smarter, just like how we evolve through learning and mentorship.

Wait… That Sounds Like Us Too! 😄

Doesn’t this seem a lot like how humans function?

We learn from books, experiences, and conversations, but unless we actively update our knowledge by reading new articles or learning new concepts, we remain stuck with what we knew before.

Similarly, a pre-trained model only knows what was in its training data up to a cutoff date. To stay relevant, it needs continuous updates, which is where periodic fine-tuning and retraining on newer data come into play.

Now that we've understood how Generative AI comes to life, it's time to explore the types of GenAI models—because these models define what GenAI is truly capable of.

At its core, a model is the trained system behind a GenAI product, and its design determines what kind of input it can understand and what kind of output it can generate.

There are two main types:
🔹 Single-modal models – These are trained to handle one type of input and output. For example, a text-to-text model that only accepts text and produces text.
🔸 Multi-modal models – These are capable of processing multiple types of inputs (like text, images, audio) and generating outputs across various formats. For example, you give it an image and ask for a caption, or provide a prompt and receive an image or voice output.

So, the type of model used in GenAI decides its range of capabilities and the richness of the user experience.

Single-Modal vs. Multi-Modal Models: Understanding the Difference

AI models come in different types, but the major distinction is between single-modal and multi-modal models.

📌 Single-Modal Model: One Input, One Output

A single-modal model is designed for a specific task: it takes one type of input and produces one type of output.

🔹 Example:

A spam detection model only takes emails as input and classifies them as spam or not spam.
A language translation model takes English text as input and translates it into French text.

While these models are effective in their narrow domain, they can’t handle multiple types of inputs or generate diverse outputs.

📌 Multi-Modal Model: Multiple Inputs, Multiple Outputs

A multi-modal model (also called multimodal AI) is designed to handle multiple types of inputs and generate multiple types of outputs. It understands and processes information across different formats like text, images, audio, and video.

🔹 Example:

A modern AI assistant (like GPT-4o) can take a question in text, voice, or image format and respond with text, speech, or even an image.
Self-driving cars use multi-modal AI:
- Cameras (image input) + LIDAR (3D depth sensing) + GPS (location input)
- AI processes all of them together to make driving decisions.
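As a rough illustration of "processing all of them together", here is a small Python sketch of the self-driving example. Everything in it is hypothetical: the `SensorInputs` fields, the `decide` function, and the thresholds are invented for this post, and the hand-written rules stand in for what a real system would learn from data. The point is only that the decision depends on several kinds of input at once, and no single sensor is enough.

```python
from dataclasses import dataclass


@dataclass
class SensorInputs:
    """Hypothetical, heavily simplified readings from three modalities."""
    camera_sees_obstacle: bool   # image input
    lidar_distance_m: float      # 3D depth sensing
    gps_speed_limit_kmh: float   # location input


def decide(inputs: SensorInputs, current_speed_kmh: float) -> str:
    """Combine all three inputs into one driving decision (toy rules)."""
    # The camera says *what* is ahead, LIDAR says *how far* away it is.
    if inputs.camera_sees_obstacle and inputs.lidar_distance_m < 20.0:
        return "brake"
    # GPS tells us which road we are on, and so which speed limit applies.
    if current_speed_kmh > inputs.gps_speed_limit_kmh:
        return "slow down"
    return "maintain speed"


print(decide(SensorInputs(True, 12.0, 50.0), current_speed_kmh=40.0))   # brake
print(decide(SensorInputs(True, 80.0, 50.0), current_speed_kmh=60.0))   # slow down
print(decide(SensorInputs(False, 80.0, 50.0), current_speed_kmh=45.0))  # maintain speed
```

In the second call the camera does see an obstacle, but LIDAR reports it is still 80 m away, so the car only slows down for the speed limit. A camera-only model could not tell the difference.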

Why Are Multi-Modal Models Powerful?

1️⃣ Better Understanding – They combine information from multiple sources, making them more context-aware.
2️⃣ More Human-Like – Just like how we use sight, sound, and touch to understand things, multi-modal AI integrates multiple senses.
3️⃣ Versatile Applications – They power everything from AI tutors that read and explain handwritten notes to AI artists that generate music based on mood descriptions.

AI at this stage is wise, efficient, and highly skilled, but it still has limitations: it excels in specific tasks but lacks general intelligence.

But what's next?

🤖 Artificial General Intelligence (AGI)

AGI is not just a better AI; it is a completely different level of intelligence, and one that does not exist yet.

🔹 Current AI (Narrow AI/ANI)

Trained for specific tasks (e.g., chatbots, image generators, self-driving cars).
Relies on patterns in its training data; it cannot truly reason or think independently.
Lacks adaptability: it struggles to generalize beyond what it has learned.

🔹 AGI (Artificial General Intelligence)

Learns, reasons, and adapts like a human.
Can solve completely new problems without needing to be trained for each one.
Understands context deeply and can make independent decisions.

🚀 How Do We Get There?

The journey to AGI requires advancements in:
1️⃣ Better Memory & Adaptability – AI needs to remember and apply past knowledge dynamically.
2️⃣ Reasoning & Logic – Instead of just predicting patterns, AI must logically deduce answers like a human.
3️⃣ Self-Learning & Autonomy – AI must be able to teach itself new things without human intervention.
4️⃣ Emotional & Social Intelligence – True AGI will need to understand human emotions, intentions, and ethics.

The Baby's Journey - III
