MIT researchers have taken a smart step forward with a new framework called SEAL (Self-Adapting Language Models). The system enables large language models to generate their own training data, reducing their dependence on external, human-written content and moving the technology a step closer to genuine self-improvement.
At its core, SEAL operates in two phases. First, the model generates what the researchers call 'self-edits': natural language instructions that spell out what new training data to produce and which optimisation settings to use. Next, it applies those instructions by updating its own weights through supervised fine-tuning, with reinforcement learning rewarding the self-edits that lead to better downstream performance.
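To make the two phases concrete, here is a minimal Python sketch of one SEAL-style step. The helper names (`generate_self_edit`, `finetune_on`, `evaluate`) and the shape of a self-edit are illustrative assumptions for this article, not the authors' actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SelfEdit:
    # Training examples the model writes for itself (phase 1 output).
    synthetic_data: List[str]
    # Optimisation settings it chooses, e.g. learning rate and epochs (illustrative).
    hyperparams: Dict[str, float] = field(default_factory=dict)


def generate_self_edit(model, context: str) -> SelfEdit:
    """Phase 1: the model proposes new data and how to train on it (stub)."""
    return SelfEdit(
        synthetic_data=[f"Implication restated from passage: {context}"],
        hyperparams={"lr": 1e-4, "epochs": 3.0},
    )


def finetune_on(model, edit: SelfEdit):
    """Phase 2: apply the self-edit as a supervised weight update (stub)."""
    return model  # in practice: gradient steps on edit.synthetic_data


def evaluate(model, task) -> float:
    """Downstream task score, used as the reinforcement-learning reward (stub)."""
    return 0.0


def seal_step(model, context: str, task) -> float:
    edit = generate_self_edit(model, context)   # phase 1: propose a self-edit
    updated = finetune_on(model, edit)          # phase 2: update the weights
    return evaluate(updated, task)              # reward signal for the proposal
```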
The framework's engine, the ReST^EM algorithm, acts like a smart filter: it keeps only those self-edits that actually improve performance and trains the model to produce more like them. On top of that, SEAL employs Low-Rank Adapters (LoRA) so that weight updates stay quick and cost-effective, without the hassle of a full retraining run, as sketched below.
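A hedged sketch of how that filtering step might look, reusing the placeholder helpers from the previous snippet: sample several candidate self-edits, keep only those whose update beats the current baseline, and then reinforce the model on the survivors. The `num_candidates` parameter and the `reinforce_on` helper are assumptions for illustration, not the paper's interface.

```python
def reinforce_on(model, edits: list) -> None:
    """Placeholder: supervised fine-tuning on the kept self-edits, so the
    model learns to propose more edits like them in the next round."""
    pass


def restem_round(model, context: str, task, num_candidates: int = 4) -> list:
    """One ReST^EM-style round: keep only self-edits that raise the task score."""
    baseline = evaluate(model, task)
    kept = []
    for _ in range(num_candidates):
        edit = generate_self_edit(model, context)
        score = evaluate(finetune_on(model, edit), task)
        if score > baseline:   # binary reward: an edit survives only if it helps
            kept.append(edit)
    reinforce_on(model, kept)
    return kept
```

For the cheap weight updates, a LoRA setup along these lines, using the Hugging Face PEFT library, keeps only small adapter matrices trainable while the base weights stay frozen. The specific rank and target modules here are assumptions, not settings confirmed by the paper.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

lora_cfg = LoraConfig(
    r=16,                                 # adapter rank: small, so updates are cheap
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_cfg)  # only the adapters receive gradients
```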
Two experiments back up SEAL's promise. In a text comprehension task, Qwen2.5-7B lifted its accuracy from 33.5% to 47%, and the training data it generated for itself proved more useful than synthetic data produced by the much larger GPT-4.1. In a few-shot reasoning test with Llama 3.2-1B, the success rate jumped from a mere 20% with untrained self-edits to 72.5% once reinforcement learning was applied.
Of course, there are still hurdles to clear. 'Catastrophic forgetting', where learning new tasks degrades performance on earlier ones, remains a concern. The process is also resource-intensive: each self-edit evaluation takes 30 to 45 seconds.
Overall, SEAL offers a refreshing way to break past the limits imposed by traditional, human-written training data. By enabling ongoing learning and adaptation, it gives AI systems the potential to keep evolving and even generate valuable insights in areas that haven’t received much attention before.