RxT-Beta-Micro-Supervised 290M vs Stateless LLM Reference 275M
Compare the experimental Reactive Transformer (RxT) with a stateless LLM reference, both trained on the same limited real-world data.
Both models were pre-trained on 10B tokens from English Wikipedia and FineWeb-Edu, then fine-tuned on 1.1M single interactions and 30k filtered multi-turn conversations.
This is a very small amount of pre-training data compared to the 1T-2T tokens used for production small LLMs. The experiment is designed to show that RxT learns faster and achieves better results, even after very short training.
Next-token prediction accuracy on the multi-turn conversation validation set:
- RxT 88%
- LLM 60%
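For reference, next-token prediction accuracy is simply the fraction of non-padding positions where the model's most likely token matches the actual next token. The snippet below is a minimal sketch of how such a metric can be computed for a causal LM that returns logits of shape (batch, seq_len, vocab_size); it is illustrative only and not the exact evaluation code used for these models.

```python
import torch

@torch.no_grad()
def next_token_accuracy(logits: torch.Tensor, labels: torch.Tensor, pad_id: int = 0) -> float:
    """Token-level next-token prediction accuracy, ignoring padding.

    Assumes `logits` has shape (batch, seq_len, vocab_size) and `labels`
    has shape (batch, seq_len); `pad_id` is an assumed padding token id.
    """
    # Shift so the prediction at position t is compared with the token at t+1,
    # matching the standard causal language modeling objective.
    preds = logits[:, :-1, :].argmax(dim=-1)
    targets = labels[:, 1:]
    mask = targets != pad_id              # exclude padding positions from the count
    correct = (preds == targets) & mask
    return correct.sum().item() / mask.sum().item()
```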
Limitations
The supervised version of the model is still at an intermediate stage and will be further improved in the Reinforcement Learning stages (the demo will be updated continuously), so the model may generate inaccurate answers and its memory retention is weak.