Back to blog

ChatGPT Memory Explained: How It Really Works

LLMs don't truly remember; most 'memory' is context plus optional personalisation. Learn why long chats degrade, what ChatGPT can save, and a simple workflow for more reliable outputs.

15 Feb 2026
  • LLM
  • ChatGPT
  • AI
  • Prompting
  • Chatbots
  • Productivity

If you've ever thought "ChatGPT is getting confused" or "it forgot what I said earlier," you're not imagining it. Most LLMs simulate memory using the text you give them each turn (the "context"), and that has real limits.

This post explains how LLM "memory" works (with ChatGPT as the concrete example), why long chats degrade, and a simple workflow you can use to get more consistent results when you're doing real work.


TL;DR (key takeaways)

  • LLMs don't remember like humans; they respond based on the text included in the current context.
  • In most chat apps, each new message is sent along with some conversation history, which can make long threads drift and cost more tokens.
  • LLMs have a context window limit (tokens). When you exceed it, older parts may be dropped.
  • ChatGPT can also personalise using Saved Memories and (if enabled) chat history references from previous chats.
  • Best habit: keep chats topic-focused; when a thread gets long, ask for a short summary/plan and restart in a new chat.

A quick mental model

Think of each response as coming from a bundle of text assembled at request time:

(Your prompt)
+ (some recent chat turns / conversation history)
+ (optional: personalisation snippets like saved memories / chat-history references)
---------------------------------------------------------
= context sent to the model -> response
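That bundle can be sketched in code. This is a hypothetical illustration of how a chat app might assemble the request, using the common `role`/`content` message shape; the field names and the personalisation text are assumptions, not ChatGPT's actual internals:

```python
# Sketch: assembling the context for one turn. The "system" personalisation
# message and its wording are hypothetical examples.

saved_memories = ["You often use C#"]  # optional personalisation snippets

history = [  # recent chat turns
    {"role": "user", "content": "What is the capital of Spain?"},
    {"role": "assistant", "content": "Madrid"},
]

new_prompt = {"role": "user", "content": "What about Portugal?"}

context = (
    [{"role": "system",
      "content": "Known about the user: " + "; ".join(saved_memories)}]
    + history
    + [new_prompt]
)

# `context` is everything the model sees on this turn.
print(len(context))  # 4 messages: 1 system + 2 history + 1 new prompt
```

The key point: the model receives this whole list every turn; nothing persists inside the model between requests.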

Memory type #1: Context (what's in the chat right now)

LLMs only know what you send in the current request. The "memory" you experience in a chat usually comes from the app including parts of the conversation history in the next request.

The Spain/Portugal example

user: What is the capital of Spain?
assistant: Madrid
user: What about Portugal?
assistant: Lisbon

"What about Portugal?" only makes sense because the app includes prior context for the model to infer you mean "capital of Portugal."

When you send the first message, the model sees something like:

user: What is the capital of Spain?

When you send the second message, the app typically sends all messages in the thread so far (until limits force it to drop older parts), so the model sees something like:

user: What is the capital of Spain?
assistant: Madrid
user: What about Portugal?

If the app didn't include the earlier messages, the conversation would look like this instead:

user: What about Portugal?
assistant: Sorry, I don't understand. What about Portugal?
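The resending mechanic can be shown with a minimal, self-contained chat loop. Here `fake_model` is a stand-in for a real chat-completion API call, so the sketch runs without any network access:

```python
# Sketch: why chat apps resend history. The model is stateless, so each
# call must include everything it should "remember".
# `fake_model` is a stand-in for a real chat-completion API call.

def fake_model(messages):
    # A real model would read all of `messages`; here we just show how
    # much context arrives alongside each question.
    return f"(answer based on {len(messages)} messages)"

history = []

def ask(user_text):
    history.append({"role": "user", "content": user_text})
    reply = fake_model(history)  # the full history goes out on every turn
    history.append({"role": "assistant", "content": reply})
    return reply

ask("What is the capital of Spain?")
reply = ask("What about Portugal?")
print(reply)  # (answer based on 3 messages)
```

On the second turn, the model sees the new question plus the two earlier messages, which is exactly what makes "What about Portugal?" interpretable.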

Why long chats get messy

As a thread grows, you are effectively resending more and more history on every turn. Two things tend to happen:

  • Drift: multiple topics, old constraints, and half-finished ideas compete for attention.
  • Cost: more text in the context means more tokens per turn.

Depending on the product and plan, higher token usage can also push you into rate limits or daily/weekly caps sooner.

The fix is simple: treat a chat like a mini-project with one topic. When you change topics, start a new chat.

Context windows (tokens) and "forgetting"

LLMs also have a limit on how much text they can process at once (a context window, measured in tokens). If the conversation plus your new request exceeds that window, older parts may be dropped to make room.

As of the time of writing, ChatGPT's GPT-5.2 has a 128k-token context window, which works out to roughly 512,000 characters of English text.

Rule of thumb (English): 1 token is about three-quarters of a word, or roughly 4 characters.

That is the most common reason people experience "it forgot what I said earlier" in long threads.
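A rough sketch of what "dropping older parts" looks like, using the 4-characters-per-token rule of thumb from above. Production code would count tokens with the model's actual tokenizer, not characters; this heuristic is only an approximation:

```python
# Sketch: drop the oldest turns when the estimated token count exceeds a
# budget. Uses the rough "1 token ~ 4 characters" heuristic; a real app
# would use the model's tokenizer.

def estimate_tokens(text):
    return max(1, len(text) // 4)

def trim_to_budget(messages, budget_tokens):
    kept = list(messages)
    # Drop from the front (oldest first) until the history fits.
    while kept and sum(estimate_tokens(m["content"]) for m in kept) > budget_tokens:
        kept.pop(0)
    return kept

history = [
    {"role": "user", "content": "A" * 400},       # ~100 tokens
    {"role": "assistant", "content": "B" * 400},  # ~100 tokens
    {"role": "user", "content": "C" * 400},       # ~100 tokens
]

trimmed = trim_to_budget(history, budget_tokens=250)
print(len(trimmed))  # 2: the oldest message was dropped to fit
```

Whatever gets dropped is simply invisible to the model, which is why it "forgets" the start of a long thread.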

Token efficiency: why summaries work

A good summary acts like compression: instead of resending a long transcript, you resend a short spec that captures only what matters.

When a thread is getting long, ask for a compact handoff and restart:

Summarise our conversation so far in 8-12 bullet points.
Include: the goal, key decisions, constraints, assumptions, open questions,
and the next 3 recommended steps.
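The compression effect is easy to see with the same character-based estimate. The transcript and summary below are illustrative stand-ins, not real data:

```python
# Illustrative: a long transcript vs. a compact handoff summary, compared
# with the rough "1 token ~ 4 characters" heuristic.

def estimate_tokens(text):
    return max(1, len(text) // 4)

transcript = "user: ...\nassistant: ...\n" * 200  # stand-in for a long thread

summary = (
    "Goal: launch landing page. Decisions: one-page, Stripe checkout.\n"
    "Constraints: ship this week. Open questions: pricing tiers.\n"
    "Next steps: draft copy, wire up checkout, review."
)

print(estimate_tokens(transcript), "tokens vs", estimate_tokens(summary))
```

Every future turn pays the context cost again, so replacing the transcript with the summary saves tokens on each message, not just once.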

Memory type #2 (ChatGPT): Saved Memories + chat history references

The second kind of "memory" is when ChatGPT seems to remember you across chats. In modern ChatGPT there are two related mechanisms:

Note: these features have evolved over time, so what you see in your ChatGPT settings may differ from older explanations.

  1. Saved Memories: small pieces of information ChatGPT stores to make future responses more helpful.

  2. Chat history references (if enabled): ChatGPT may pull relevant details from prior conversations even if they weren't saved as a discrete "memory."

A simplified example

Suppose ChatGPT has learned (via saved memory and/or chat history) that you write a lot of C#.

If you ask:

"Can you give an example of some code to send an email?"

ChatGPT may choose C# by default because it can quietly add something like this into the request context:

(optional personalisation)
- You often use C#

Why this matters

Personalisation can reduce repetitive setup and improve relevance. But it can also bias outputs in ways you don't want (outdated details, wrong assumptions, or "sticky" preferences).

Maintenance habit:

  • Review saved memories occasionally; delete anything outdated or irrelevant.
  • If responses feel oddly steered, consider whether memories/chat history references are pushing the model.
  • For sensitive or "clean-room" work, use Temporary Chat or disable memory/history.

A practical workflow for better results

You don't need fancy prompting. A few operational habits do most of the work.

1) Use small, focused chats

Treat each chat as a single topic:

  • One chat: landing page offer + copy
  • One chat: technical design for an integration

Focused chats reduce irrelevant context, reduce drift, and make it easier to restart cleanly.

2) When a thread is long, extract a summary and restart

Use the summary prompt from earlier, then start a new thread with:

  • the goal,
  • key decisions and constraints,
  • what "done" looks like,
  • the output format.

This is the highest-leverage way to keep quality high on multi-step work.

3) For complex work, split "plan" and "implement"

If you need lots of steps, stakeholders, edge cases, or artifacts, do two passes:

  • Plan chat: clarify requirements, list constraints, and produce a step-by-step plan.
  • Implement chat: paste the plan and generate the actual deliverables.

This reduces drift because the implementation thread starts with a compact spec instead of a sprawling transcript.


Temporary Chats and "do not use memory" (privacy and control)

Temporary chats are useful when you want a clean slate:

  • You don't want saved memories or chat history references influencing responses.
  • You don't want the current chat saved into memory.

Two caveats worth knowing:

  • Temporary chats will still follow your custom instructions if you have them enabled.
  • Some data may still be retained for limited periods for safety/abuse monitoring, and third-party tools (connectors/actions) can have their own retention rules.

You can also start with a prompt like:

"Do not use or rely on any saved / long-term memory."

It's a helpful signal, but if you need certainty, Temporary Chat (or disabling memory/history in settings) is more reliable.


What differs across tools (future-proofing)

The principles above apply to most LLM products, but details vary by vendor and app:

  • Naming: memory, personalisation, chat history, saved info.
  • Defaults: some store more by default; others store nothing unless you opt in.
  • Controls: per-chat toggles vs global settings.
  • Enterprise: admin controls, retention policies, logging, and compliance.

If you're putting LLMs into business workflows (support triage, internal knowledge bots, CRM helpers), these differences affect privacy, repeatability, and long-term maintenance.


Wrap-up

If you remember one thing: LLMs are powerful, but they only work with the context they're given. Keep chats scoped, restart with a summary when needed, and be intentional about personalisation settings.


If you want help applying this inside your business

If you want AI to do real work inside your existing tools (support, CRM, internal docs) and you care about repeatability, privacy, and "fails loudly" automation, start with a free 45-60 minute AI opportunity & automation review. We'll map what's happening today, and you'll leave with a clear first build to start on.

If that sounds useful, you can reach me here: Contact