This Week In AI: (January 27th - February 2nd 2025)

This week in AI saw a handful of developments from OpenAI, including two new model launches and an allegation against DeepSeek. The UK announced new laws surrounding AI-generated content, and Bletchley Park unveiled a new exhibit! In research, reinforcement learning is making strides: HKU and UC Berkeley find that RL outperforms supervised fine-tuning in foundation models, Meta FAIR develops a new model-free RL algorithm that requires little to no hyperparameter tuning, and UC San Diego introduces diffusion tokenizers.
OpenAI's o3-mini
On January 31st 2025, OpenAI unveiled o3-mini, the newest addition to its reasoning series of models. It is designed to deliver advanced capabilities in science, technology, engineering, and mathematics (STEM) subjects while offering lower cost and reduced latency compared to its o1 series of models.
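For those who want to try it, here is a minimal sketch of querying o3-mini through the OpenAI Python SDK. It assumes an OPENAI_API_KEY is set in your environment; the reasoning_effort parameter reflects the launch's low/medium/high effort options, but check the current SDK docs, as this is illustrative rather than canonical:

```python
# Minimal sketch: querying o3-mini via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="medium",  # o3-mini exposes low/medium/high effort levels
    messages=[
        {"role": "user", "content": "Prove that the sum of two even numbers is even."}
    ],
)

print(response.choices[0].message.content)
```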

This marks the first time OpenAI has made one of its advanced reasoning models available to free-tier users, albeit with strict usage limits. It also looks like a strategic move: DeepSeek's R1 model, released last week, rivals ChatGPT's performance.
Interestingly, for this article, I wanted to see what ChatGPT o3-mini had to say when asked "what do you do that is new compared to the o1-mini model?". Well… the o3-mini model might be amazing at maths, but for some strange reason, it does not seem to be aware of its own existence, as previous models have been.

UK Introduces Legislation For AI-Generated Sexual Abuse Content
In a world-leading move, the U.K. has announced forthcoming legislation to criminalise the creation, possession, and distribution of AI-generated child sexual abuse material and non-consensual sexually explicit deepfake images. Under the proposed laws, set to be included in the upcoming Crime and Policing Bill, offenders who generate such content will face up to five years in prison. Additionally, possessing manuals that instruct users on how to use AI for abusive purposes will carry a three-year prison sentence.
The U.K. has long taken a proactive, world-leading stance on the challenges posed by emerging technologies, especially where public safety is concerned. In 2024, AI-generated child abuse images surged nearly five-fold. Experts have welcomed the new laws but are calling for further regulation of AI misuse.
Did DeepSeek Use ChatGPT To Train R1?
Following last week's launch of DeepSeek's R1 model, which rivals OpenAI's ChatGPT, OpenAI is now investigating DeepSeek for allegedly using ChatGPT to train R1. The main concern centres on a technique called "distillation", where a new model is trained to replicate the behaviour of a larger, more advanced model using its outputs. While distillation is common practice in AI development and research, OpenAI's terms of service prohibit using its outputs to develop competing models.
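To make "distillation" concrete, here is a minimal, generic sketch of the technique as classically described (Hinton et al., 2015), not anything specific to DeepSeek or OpenAI: a small student model learns to match a teacher's softened output distribution. The teacher, student, and batch names are hypothetical placeholders.

```python
# Generic knowledge-distillation loss: the student is trained to match
# the teacher's temperature-softened output distribution.
# `student_logits` and `teacher_logits` come from hypothetical models.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then minimise the KL divergence
    # from the teacher's distribution to the student's.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature**2

# Usage inside a training loop (teacher frozen, inference only):
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits)
```

Note that in the API setting at issue here, a competitor would only see sampled text rather than logits, so distillation effectively reduces to supervised fine-tuning on the teacher's generated outputs.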
Microsoft, a major partner of OpenAI, detected unauthorised data extraction potentially linked to DeepSeek. DeepSeek has not publicly responded to these allegations, but I am sure we will cover further developments in this newsletter!
OpenAI Launches ChatGPT-Gov
On January 28th 2025, OpenAI launched ChatGPT-Gov, a specialised version of its popular AI chatbot tailored for use by U.S. government agencies. It aims to give U.S. officials secure and efficient access to OpenAI's advanced models, allowing agencies to securely automate routine administrative processes, provide faster responses to public inquiries, and improve citizen engagement.
Bletchley Park Unveils "The Age of AI" Exhibition
Bletchley Park, known for its pivotal codebreaking role in the U.K. during World War 2, launched a new exhibition this week titled "The Age of AI". The exhibit takes visitors through the evolution of artificial intelligence, tracing its roots from the groundbreaking work of wartime codebreakers to its current applications and future possibilities.

Visitors will explore the contributions of pioneers such as Alan Turing (famous for cracking the Enigma code whilst working at Bletchley Park during WW2), Donald Michie, and Irving John Good.
SFT vs RL in Foundation Models
A new study from HKU and UC Berkeley finds that supervised fine-tuning (SFT) of foundation models post-training leads to memorisation-like behaviour, whilst reinforcement learning substantially enhances generalisation across textual and visual tasks. In the empirical evaluation, RL-trained models generalised better by learning principles that drive the best outcome, whilst SFT-trained models memorised the training data and repeated those actions during evaluation. Despite RL's clear advantages in this study, the authors acknowledge that SFT currently remains useful for stabilising the model's outputs.
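To see why the two regimes behave so differently, here is a rough, illustrative contrast of the objectives (my sketch, not the paper's code): SFT minimises cross-entropy against fixed demonstrations, while RL reinforces whatever the model's own samples earn reward for. The model and reward_fn names are hypothetical placeholders.

```python
# Illustrative contrast of post-training objectives (not the paper's code).
# `model` is a hypothetical policy returning per-token logits;
# `reward_fn` scores a sampled completion.
import torch
import torch.nn.functional as F

def sft_loss(model, prompt, demonstration_tokens):
    # Supervised fine-tuning: imitate a fixed demonstration token by token,
    # which encourages memorising the training distribution.
    logits = model(prompt, demonstration_tokens)
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           demonstration_tokens.view(-1))

def rl_loss(model, prompt, reward_fn):
    # REINFORCE-style RL: sample from the model and reinforce whatever
    # earns reward, so behaviour is outcome-driven rather than copied.
    tokens, log_probs = model.sample(prompt)  # hypothetical sampling API
    reward = reward_fn(prompt, tokens)
    return -(reward * log_probs.sum())
```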

Diffusion Tokenizer
New research out of UC San Diego introduces DiTo (Diffusion Tokenizer), a self-supervised approach for learning compact visual representations crucial for image generation. Unlike prior work, DiTo simplifies the process by using a single diffusion L2 loss, making it more efficient and scalable.
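To give a flavour of that single loss, here is a loose sketch of a diffusion-loss tokenizer objective in the spirit of DiTo, not the authors' code: the encoder compresses an image to compact latents, and training simply asks a diffusion decoder to denoise corrupted images conditioned on those latents. The encoder and denoiser modules are hypothetical stand-ins, and the simple interpolation corruption is illustrative (real noise schedules differ).

```python
# Illustrative diffusion-loss tokenizer objective (not DiTo's actual code).
# `encoder` maps images to compact latents; `denoiser` predicts the added
# noise conditioned on those latents.
import torch
import torch.nn.functional as F

def diffusion_tokenizer_loss(encoder, denoiser, images):
    latents = encoder(images)                             # compact visual tokens
    t = torch.rand(images.size(0), device=images.device)  # random noise levels
    noise = torch.randn_like(images)
    # Simple linear-interpolation corruption; real schedules differ.
    t_ = t.view(-1, 1, 1, 1)
    noised = (1 - t_) * images + t_ * noise
    predicted_noise = denoiser(noised, t, latents)
    # The single L2 loss that trains encoder and decoder end to end.
    return F.mse_loss(predicted_noise, noise)
```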

The empirical evaluation shows DiTo achieving superior performance compared to state-of-the-art tokenizers in image reconstruction. Interestingly, DiTo excels at retaining intricate details like text and symbols, areas where prior work has struggled.
Advancing General-Purpose Model-Free RL
This week, Meta FAIR introduced MR.Q, a model-free reinforcement learning (RL) algorithm that enhances sample efficiency and generalisation. Unlike other RL methods that require extensive hyperparameter tuning for different tasks, MR.Q achieves competitive performance across 118 diverse environments using a single set of hyperparameters. As someone in RL research, I find this pretty huge!

MR.Q essentially bridges the gap between model-based and model-free RL by learning approximately linear value function representations without the computational burden of simulated rollouts, as loosely sketched below. I might have to do a paper spotlight on this one!
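For intuition, here is a loose illustration (my sketch, not Meta FAIR's implementation) of what a value function that is approximately linear in a learned state-action embedding might look like:

```python
# Loose sketch of the core MR.Q idea: the value is (approximately) a linear
# function of a learned state-action embedding. Hypothetical modules only.
import torch
import torch.nn as nn

class LinearValueAgent(nn.Module):
    def __init__(self, obs_dim, act_dim, embed_dim=256):
        super().__init__()
        # A non-linear encoder produces the embedding...
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim),
        )
        # ...and the value head is a single linear layer on top of it.
        self.value_head = nn.Linear(embed_dim, 1)

    def q_value(self, obs, act):
        z = self.encoder(torch.cat([obs, act], dim=-1))
        return self.value_head(z)  # value is linear in the embedding z

# The embedding itself would be shaped by auxiliary reward/dynamics
# prediction losses (the model-based ingredient), while action selection
# stays model-free: no simulated rollouts are ever performed.
```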