Decoding AI Buzzwords in a Simple Way

LLM (Large Language Model)

ChatGPT, Claude, and Gemini are LLMs. An LLM is an AI model that can understand text and generate human-like responses. It is trained on a massive amount of data: billions of sentences, articles, documents, and so on.

Tokens

Tokens are small pieces of text that AI uses to read and understand language. AI does not read full sentences like humans. Instead, it breaks everything into tiny units called tokens. A token can be a whole word, part of a word, punctuation, a space, even an emoji.

Examples:

“AI is changing the world.” Possible tokens: "AI", " is", " changing", " the", " world", "."

“Unbelievable performance today!” Possible tokens: "Un", "believable", " performance", " today", "!"
The word count here is 3, but the token count is 5. Tokens aren’t always full words; long or uncommon words often get split into multiple tokens. So when estimating tokens for a paragraph, don’t use the word count as is. A common rule of thumb for English text is about 4 tokens for every 3 words, so add roughly a third to the word count.
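To make the splitting concrete, here is a minimal sketch of a toy tokenizer. It assumes a tiny hand-written vocabulary (VOCAB is made up for this example) and uses greedy longest-match, which is a simplification: real LLM tokenizers learn their vocabulary from data and have tens of thousands of entries.

```python
def tokenize(text, vocab):
    """Split text into tokens by always taking the longest vocabulary match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matches: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary, chosen only to reproduce the example above.
VOCAB = {"Un", "believable", " performance", " today", "!"}

sentence = "Unbelievable performance today!"
tokens = tokenize(sentence, VOCAB)
print(len(sentence.split()), "words ->", len(tokens), "tokens:", tokens)
```

Because "Unbelievable" is not in the vocabulary as a whole word, it is covered by two smaller pieces, which is why the token count ends up higher than the word count.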

Context

Context is how much previous text or conversation the LLM can take into account at one time. You might have seen messages like “The conversation is too long. Let's start a new one.” or “Context length exceeded.” in ChatGPT. They appear because of the context length. Every AI model has a fixed limit called a context window (for example 16k, 128k, or 200k tokens).

The context window holds your current message, all previous chat messages, system instructions, and any documents you’ve given the model. The LLM uses all of this to personalise its responses.

More context = more consistent answers.
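Here is a rough sketch of how a chat app might fit a conversation into a fixed context window. The function names are made up for illustration, and count_tokens is a crude stand-in that counts words; a real app would use the model's own tokenizer and might summarise old messages instead of dropping them.

```python
def count_tokens(text):
    # Crude stand-in: one token per whitespace-separated word.
    return len(text.split())


def fit_to_window(system_prompt, history, new_message, max_tokens):
    """Keep the newest history messages that still fit in the window."""
    budget = max_tokens - count_tokens(system_prompt) - count_tokens(new_message)
    if budget < 0:
        raise ValueError("Context length exceeded")

    kept = []
    # Walk backwards so the most recent messages are kept first.
    for message in reversed(history):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost

    kept.reverse()  # restore chronological order
    return [system_prompt, *kept, new_message]


history = ["one two three", "four five", "six"]
window = fit_to_window("be brief", history, "seven eight", max_tokens=7)
print(window)
```

With a limit of 7 "tokens", the oldest message no longer fits and is left out, which is exactly how a model appears to forget the start of a long conversation.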

GPT-5 has a context window of about 400k tokens. Claude Sonnet 4 has about 200k to 1M tokens. Gemini 2.5 has about 30k to 1M tokens.

RAG (Retrieval-Augmented Generation)

As we know, every model has a limited context window. But to get more personalised answers, we need to feed more data to the LLM.

Suppose you have a 1000-page manual for a piece of software. You want the answer to a question that might require going through 2-3 pages, and you want a short, crisp, human-like answer. So you decide to give all the data to the LLM, and the answer you get is “Context length exceeded” 😂. This happens because an LLM can process only a limited number of tokens at a time, so you cannot feed it all your data and expect a result. But if you know that the answer probably lies in Chapter 3, which is only 10-11 pages long, you can pass just that chapter, and the LLM can handle that many tokens.

RAG does basically the same thing. We feed all the data to the RAG system, which indexes it by semantic meaning (what the text actually means). Whenever a question comes in, it retrieves the specific pieces of data that are likely to contain the answer. We then pass those pieces to the LLM, which can answer our query much better.
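The retrieve-then-ask flow can be sketched in a few lines. This is only an illustration: the manual text and function names are invented, and the retriever scores chunks by shared words as a stand-in for the semantic search a real RAG system would use. The final prompt is what would be sent to the LLM.

```python
import re


def words(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks sharing the most words with the question."""
    q = words(question)
    ranked = sorted(chunks, key=lambda chunk: len(q & words(chunk)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, chunks, top_k=1):
    """Put only the retrieved chunks into the prompt, not the whole manual."""
    context = "\n".join(retrieve(question, chunks, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical manual, one chunk per chapter.
MANUAL = [
    "Chapter 1: Install the software by running the setup wizard.",
    "Chapter 2: Create a user account from the admin panel.",
    "Chapter 3: Export reports as PDF from the File menu.",
]

question = "How do I export reports as PDF?"
prompt = build_prompt(question, MANUAL)
print(prompt)
```

Only Chapter 3 ends up in the prompt, so the LLM sees a few lines instead of the whole manual.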

Vector Embeddings

When you visit a bank, the gatekeeper usually asks what you need and then sends you to a specific counter. There you realise that everyone in the queue has the same concern or work to do. Similarly, when you give data to a RAG system, a magic function assigns each piece of data an address/location in the form of an embedding, which is generally an array of floats. The next time the user asks something, the system generates an embedding for the query too, checks which data lies closest to that location, and returns it. That data can then be passed to the LLM to give it more context.

RAG systems generally use vector databases, which store these arrays of floats and can quickly find the ones closest to a query. Vector embeddings are a way of converting words, sentences, or even images into numbers that represent their meaning. These numbers help AI measure similarity. For example, embeddings help AI know that “car” is closer in meaning to “vehicle” than to “banana.”
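"Closer" is usually measured with cosine similarity, which compares the direction of two vectors. Here is a small sketch; the three-number vectors are hand-picked for illustration and are not the output of any real model, whose embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return a value near 1.0 for similar directions, lower for different ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings, chosen so that related words point the same way.
EMBEDDINGS = {
    "car": [0.9, 0.8, 0.1],
    "vehicle": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}


def nearest(word, candidates):
    """Pick the candidate whose embedding is most similar to the word's."""
    return max(
        candidates,
        key=lambda c: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[c]),
    )


print(nearest("car", ["vehicle", "banana"]))
```

A vector database does the same comparison, just over millions of stored vectors and with indexes that avoid checking every one.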

Embedding Model

Remember the magic function we discussed under vector embeddings? That is the embedding model. It is a kind of AI model that converts text (or images, audio, code) into numbers: specifically, a long list of numbers called a vector.
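The sketch below shows only the shape of that function: text goes in, a fixed-length list of floats comes out. It is a toy that counts words from a made-up vocabulary, so it captures no real meaning; an actual embedding model is a trained neural network that produces dense vectors where similar meanings land close together.

```python
import re

# Hypothetical vocabulary: one vector position per word.
VOCABULARY = ["car", "vehicle", "road", "banana", "fruit"]


def embed(text):
    """Toy embedding: count how often each vocabulary word appears."""
    found = re.findall(r"[a-z]+", text.lower())
    return [float(found.count(term)) for term in VOCABULARY]


print(embed("A car is a vehicle on the road"))
print(embed("Banana is a fruit"))
```

Whatever the input length, the output always has the same number of entries, which is what lets the vectors be stored and compared.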

Tools in AI

An LLM on its own is just a conversational AI. It can respond only from the data it was trained on, and it can't perform tasks like browsing the web or sending an email. Tools give the LLM the ability to perform such tasks.

For example, consider a weather tool. An LLM doesn’t have access to the latest data, so we have to feed it that data to get accurate results. While developing an app, we typically design some tools for this. Here, a weather tool calls an API to get the current weather in Mumbai. When the user asks “What is the colour of the sky?”, the LLM answers “blue” directly, because that is general knowledge. But when we ask “What’s the weather in Mumbai today?”, it doesn’t know, because it has no access to the latest data. So it calls the weather tool to get the weather and then returns a human-like response.
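A minimal sketch of the weather example follows. Everything here is a stand-in: get_weather returns hard-coded data instead of calling a real API, and fake_llm replaces the model with a simple keyword check. In a real app the model reads the tool descriptions, decides whether to call one, and then phrases the final answer itself from the tool's result.

```python
def get_weather(city):
    # Hypothetical stand-in for a real weather API call.
    fake_api = {"Mumbai": "humid, 31 degrees Celsius"}
    return fake_api.get(city, "unknown")


# The app registers each tool with a description the model can read.
TOOLS = {
    "get_weather": {
        "description": "Get the current weather for a city.",
        "run": get_weather,
    },
}


def fake_llm(prompt):
    """Stand-in for the model: answer directly or request a tool."""
    if "weather" in prompt.lower():
        return {"tool": "get_weather", "arguments": {"city": "Mumbai"}}
    return {"answer": "The sky is blue."}


def chat(prompt):
    decision = fake_llm(prompt)
    if "tool" in decision:
        # The app, not the model, actually runs the tool.
        result = TOOLS[decision["tool"]]["run"](**decision["arguments"])
        city = decision["arguments"]["city"]
        return f"The weather in {city} is {result}."
    return decision["answer"]


print(chat("What is the colour of the sky?"))
print(chat("What's the weather in Mumbai today?"))
```

The key point is the division of labour: the model only asks for a tool by name with arguments, and the application code executes it.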

AI Agent

An AI agent is an AI that can think, plan, and take actions on its own to complete a task using the tools it has. Any AI that can decide by itself which tool to call is an AI agent.

Example: Email & Calendar Agent
The user gives the prompt “Schedule a meeting with Pranit for Thursday at 6 PM and send him an invitation.”

The agent calls the calendar tool to check availability. It calls the calendar tool again to create an event. Then it calls the email tool to send an email to Pranit. The AI decided for itself which actions to take, and then it performed them. That is the difference between AI and an AI agent.
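The loop behind this example can be sketched as below. All names are hypothetical, the tools just return strings, and fake_planner is a scripted stand-in for the LLM: a real agent would send the goal and the history of results to the model and let it pick the next tool.

```python
def check_availability(day, time):
    return True  # stand-in: pretend the slot is free


def create_event(title, day, time):
    return f"Created '{title}' on {day} at {time}"


def send_email(to, body):
    return f"Email sent to {to}"


TOOLS = {
    "check_availability": check_availability,
    "create_event": create_event,
    "send_email": send_email,
}


def fake_planner(goal, history):
    """Stand-in for the LLM: choose the next action from what is done so far."""
    done = [step["tool"] for step in history]
    if "check_availability" not in done:
        return {"tool": "check_availability",
                "arguments": {"day": "Thursday", "time": "6 PM"}}
    if history[0]["result"] is not True:
        return None  # slot is taken, stop
    if "create_event" not in done:
        return {"tool": "create_event",
                "arguments": {"title": "Meeting with Pranit",
                              "day": "Thursday", "time": "6 PM"}}
    if "send_email" not in done:
        return {"tool": "send_email",
                "arguments": {"to": "Pranit",
                              "body": "Invitation: Thursday at 6 PM"}}
    return None  # goal reached


def run_agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):
        action = fake_planner(goal, history)
        if action is None:
            break
        result = TOOLS[action["tool"]](**action["arguments"])
        history.append({"tool": action["tool"], "result": result})
    return history


steps = run_agent("Schedule a meeting with Pranit for Thursday at 6 PM "
                  "and send him an invitation.")
for step in steps:
    print(step["tool"], "->", step["result"])
```

The max_steps limit is a common safety measure so that a confused planner cannot loop forever.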

MCP (Model Context Protocol)

Before MCP, every AI model had a different way of connecting to tools, which was complicated and messy. Then MCP was introduced as a standard protocol for communication between apps and LLMs. It is similar to REST APIs, but for LLMs. Just as we write API endpoints in REST to work with data, in an MCP server we write actions (MCP calls them tools), but with more context for the LLM. Each action comes with a description, guidance on when to use it, and whether it changes data or is a pure (read-only) action.

Example: Note Taking App

You create an MCP server that exposes actions like createNote, getNotes, and updateNote.
Now ChatGPT or Claude can interact with your app through MCP. Even in Cursor, you can register your MCP server and it will get data from there.
When you give a prompt like “Create a note saying I finished the meeting at 5 PM.”, the AI calls createNote through MCP. The note appears in your app automatically.
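To show what travels over the wire, here is a simplified in-process sketch. MCP messages are JSON-RPC 2.0, and tools are discovered and invoked with the tools/list and tools/call methods; the sketch mimics only that message shape. It is not the official SDK, and it leaves out the transport, the initialization handshake, and error handling for bad arguments. The note storage and handler names are assumptions for illustration.

```python
import json

NOTES = []  # stand-in for the app's real storage


def create_note(text):
    NOTES.append(text)
    return f"Created note #{len(NOTES)}"


def get_notes():
    return "\n".join(NOTES)


# Each tool carries a description and hints that help the LLM decide.
TOOLS = {
    "createNote": {
        "handler": create_note,
        "description": "Create a new note. Use when the user wants to save something.",
        "inputSchema": {"type": "object",
                        "properties": {"text": {"type": "string"}},
                        "required": ["text"]},
        "annotations": {"readOnlyHint": False},
    },
    "getNotes": {
        "handler": get_notes,
        "description": "Return all saved notes. Does not change any data.",
        "inputSchema": {"type": "object", "properties": {}},
        "annotations": {"readOnlyHint": True},
    },
}


def handle(request_json):
    """Answer one JSON-RPC request with one JSON-RPC response."""
    request = json.loads(request_json)
    method = request["method"]
    params = request.get("params", {})
    if method == "tools/list":
        tools = [{"name": name,
                  **{k: v for k, v in spec.items() if k != "handler"}}
                 for name, spec in TOOLS.items()]
        result = {"tools": tools}
    elif method == "tools/call":
        handler = TOOLS[params["name"]]["handler"]
        text = handler(**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": request.get("id"),
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


# What the AI client would send after reading the user's prompt.
call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "createNote",
                   "arguments": {"text": "I finished the meeting at 5 PM."}}}
response = json.loads(handle(json.dumps(call)))
print(response["result"]["content"][0]["text"])
```

In practice you would normally use an MCP SDK for your language rather than handling these messages by hand; the sketch is only meant to show why the descriptions and hints matter to the model.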
