I've been working on solving this with quite a bit of success, and I'll be sharing more on it soon. It involves two systems: the first is the LLM itself, and the second acts like a 'curator' of thoughts, you could say.
It dynamically swaps portions of the context in and out. This system is also not based on explicit definitions; it relies on LLMs 'filling in the gaps'. The system helps the LLM break problems down into small tasks, which eventually aggregate into the full task.
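To make the 'curator' idea a bit more concrete, here is a minimal sketch of one way a second system could swap context in and out per message. It is not the actual implementation: the `Snippet` and `Curator` names, the word-overlap scoring, and the `budget` parameter are all assumptions made up for illustration, and a real curator would presumably use an LLM or embeddings rather than keyword matching.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    """A chunk of text the curator may place into the LLM context (hypothetical type)."""
    name: str
    text: str


def relevance(query: str, snippet: Snippet) -> int:
    """Toy scoring: count the words shared by the query and the snippet."""
    query_words = set(query.lower().split())
    snippet_words = set(snippet.text.lower().split())
    return len(query_words & snippet_words)


@dataclass
class Curator:
    """Keeps at most `budget` snippets in context, re-chosen for every message."""
    store: list[Snippet]
    budget: int = 2  # assumed limit on how many snippets fit in context
    active: list[Snippet] = field(default_factory=list)

    def curate(self, user_message: str) -> list[Snippet]:
        # Rank every stored snippet against the latest message.
        ranked = sorted(
            self.store,
            key=lambda s: relevance(user_message, s),
            reverse=True,
        )
        # Swap in the best matches; snippets with no overlap stay out.
        self.active = [
            s for s in ranked[: self.budget] if relevance(user_message, s) > 0
        ]
        return self.active

    def build_context(self, user_message: str) -> str:
        # The curated snippets come first, followed by the user's message.
        parts = [s.text for s in self.curate(user_message)]
        parts.append(f"User: {user_message}")
        return "\n".join(parts)
```

Calling `build_context` on each new message rebuilds the prompt from scratch, so a snippet that was relevant two messages ago simply drops out once it stops matching.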
This is a great idea. What you are doing is RAG over the chat.
In the future, such a distinction in memory hierarchies will be clearer:
- Primary memory in the training data
- Secondary memory in context
- Tertiary memory in RAG
Sounds like an exciting idea.
May I suggest - put what you have out there in the world, even if it’s barely more than a couple of prompts. If people see it and improve on it, and it’s a good idea, it’ll get picked up & worked on by others - might even take on a life of its own!
Have a look here; it's an early preview:
https://x.com/zacksiri/status/1922500206127349958
You can see it going from an introduction, to asking me for my name, to answering questions about a topic. There is another example in the thread as well.
Behind the scenes, the system prompt is being modified dynamically based on the user's request.
All the information about movies is also being loaded into context dynamically. I'm also working on a technique to unload stuff from context when the subject matter of a given thread has changed dramatically. Imagine having a long conversation thread with a friend: along the way you 'context switch' multiple times, and you probably don't even remember what you said to them 4 years ago.
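As a rough illustration of that load / unload idea (a toy sketch, not the code behind the demo), the snippet below rebuilds a system prompt from a base prompt plus whichever topic blocks still match the latest message. The `DynamicContext` class, the `knowledge` dictionary, the Jaccard word-overlap measure, and the `threshold` value are all assumptions chosen to keep the example small.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap between 0.0 (disjoint) and 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class DynamicContext:
    """Holds a base system prompt plus topic blocks that come and go."""

    def __init__(self, base_prompt: str, knowledge: dict[str, str],
                 threshold: float = 0.1):
        self.base_prompt = base_prompt
        self.knowledge = knowledge    # topic name -> reference text (hypothetical)
        self.threshold = threshold    # assumed cutoff; would need tuning
        self.loaded: dict[str, str] = {}

    def update(self, user_message: str) -> None:
        words = set(user_message.lower().split())
        for topic, text in self.knowledge.items():
            score = jaccard(words, set(text.lower().split()))
            if score >= self.threshold:
                # Load (or keep) a block that matches the current subject.
                self.loaded[topic] = text
            else:
                # Unload a block once the subject has moved away from it.
                self.loaded.pop(topic, None)

    def system_prompt(self) -> str:
        # Base prompt first, then every block that is currently loaded.
        return "\n\n".join([self.base_prompt, *self.loaded.values()])
```

One obvious weakness of this sketch is that a short follow-up such as "tell me more" matches nothing and unloads everything; comparing against the last few turns instead of a single message would be a more forgiving way to decide that the subject has really changed.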
There is a concept of 'main thread' and 'sub threads' involved as well that I'm exploring.
I will be releasing the code base in the coming months. I need to take this demo further than just a few prompt replies.