LLM-embedded malware and ransomware represent a new cyber threat category where adversaries generate malicious logic dynamically at runtime using AI models, making attacks more adaptive and harder to detect than traditional fixed-code payloads. This rapidly evolving technique enables polymorphic, cross-platform threats, demonstrated in tools like MalTerminal and PromptLock, and is already being adopted by state actors and criminal groups for scalable espionage and ransomware operations.
Overview
Large language model (LLM)-embedded malware is advancing from concept to practice, with operators wiring models directly into loaders so malicious logic is generated at runtime rather than shipped as a fixed payload. These findings matter because every execution can produce different code or commands, which undermines static detection and delays containment. MalTerminal represents one of the earliest documented cases: a Python-based tool that uses GPT-4 via cloud APIs to generate either ransomware or a reverse shell on demand. PromptLock demonstrates how a local model can be abused: it calls an Ollama-exposed endpoint to generate Lua scripts directly on the victim machine. State-aligned experimentation has also been reported: PROMPTSTEAL, linked to APT28, embeds prompts and hundreds of API tokens to dynamically generate system commands during espionage operations. Early proofs such as HYAS's BlackMamba further highlight polymorphic payload generation that mutates on each run to bypass endpoint controls. Because models and APIs are platform-agnostic, the same techniques can be used against Windows, Linux, and macOS. Although large-scale campaigns have not yet been observed, the trajectory from experiments to targeted operations is clear.
Key Findings:
- LLM-embedded malware is emerging as a new category of threat, with malicious logic generated at runtime rather than shipped as fixed code, which complicates detection and response.
- MalTerminal uses GPT-4 through cloud APIs to produce ransomware or a reverse shell on demand, while PromptLock abuses a local Ollama model to generate Lua payloads for reconnaissance, theft, and encryption.
- State-backed experimentation has surfaced in LAMEHUG/PROMPTSTEAL, tied to APT28, which embeds prompts and hundreds of API tokens to dynamically generate system commands for espionage.
- Early proofs, such as BlackMamba, demonstrate polymorphic code generation that mutates on each execution, further evading endpoint controls.
- Prompts-as-code and embedded API keys are the core enablers, making dynamic payloads possible but also leaving artifacts that defenders can monitor.
- Immediate Action: Inventory and constrain local LLM endpoints, enforce strong secrets management to prevent exposed API keys, and monitor for unexplained LLM API usage or runtime code generation events.
1.0 Threat Overview
1.1 Historical Context
The idea of embedding artificial intelligence into malware has circulated for years, but practical cases only began to surface between 2023 and 2025, as large language models (LLMs) became more powerful and widely available. The first known proof-of-concept was BlackMamba in 2023, which used AI to generate polymorphic keylogger code that changed on every execution.[1] While experimental, it proved that AI could be leveraged to bypass signature-based detection by ensuring no two payloads looked alike. By mid-2025, this experimentation escalated to state-linked adoption. LAMEHUG/PROMPTSTEAL, attributed to the Russian APT28 group, embedded more than 280 HuggingFace API keys alongside prompt structures that instructed models at runtime to generate system shell commands for espionage.[2] This marked the first time a state actor had operationalized LLM integration inside real-world malware, using runtime AI generation to avoid fixed indicators and extend campaign resilience.
Soon after, ransomware-focused cases emerged that showed how LLM integration could power broader classes of malware. MalTerminal appeared as one of the earliest practical frameworks: a Python executable that queried GPT-4 via cloud APIs to generate ransomware or a reverse shell on demand.[3] By outsourcing payload creation to the model itself, MalTerminal introduced runtime variability that forced defenders to look for artifacts like embedded API keys and prompt structures instead of static signatures. PromptLock followed as another proof-of-concept, written in Go and designed to work with a locally hosted model accessed through the Ollama API.[4] Instead of relying on the cloud, it generated Lua scripts directly on the victim’s device to perform reconnaissance, data theft, and encryption, proving that adversaries could achieve runtime variability without external dependencies. Together, these cases trace a clear progression: from academic proofs such as BlackMamba, to espionage-focused adoption with PROMPTSTEAL, to ransomware prototypes in MalTerminal and PromptLock. The trajectory shows how quickly AI has shifted from a peripheral tool for attackers to an embedded execution engine within malware, laying the groundwork for scalable, adaptive AI-driven threats.
1.2 Technique Breakdown
LLM-embedded malware differs from traditional threats because it does not always carry a fixed malicious payload inside the binary. Instead, it relies on prompts and access to a large language model, either cloud-based or local, to generate the malicious logic when needed. This means the “instructions” for the attack are created on demand, and the same malware can behave differently each time it runs. Defenders are forced to shift their focus away from static signatures and toward hunting for artifacts such as API keys, prompt structures, and suspicious model interactions.
Key Techniques Observed:
- Runtime payload generation via cloud APIs: MalTerminal embeds prompts and API keys so GPT-4 produces ransomware or reverse-shell logic on demand rather than shipping it in the binary.
- Local model abuse: PromptLock calls a locally hosted model through the Ollama API to generate Lua scripts on the victim device for reconnaissance, data theft, and encryption, removing any external dependency.
- Prompts-as-code with embedded credentials: PROMPTSTEAL ships prompt structures and hundreds of API tokens so system commands can be generated during live operations.
- Polymorphic regeneration: BlackMamba rebuilds its payload on every execution so that no two samples share a static signature.
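As a minimal illustration of the artifact hunting described above, the sketch below scans files for embedded LLM API key patterns and prompt-like strings. The key prefixes, keyword list, and match lengths are assumptions chosen for demonstration, not confirmed indicators from the cases in this report.

```python
import re
import sys
from pathlib import Path

# Assumed, illustrative indicators only: common key prefixes used by hosted
# LLM services ("sk-" style OpenAI keys, "hf_" Hugging Face tokens) and
# phrases that often appear in embedded prompt templates.
KEY_PATTERNS = [
    re.compile(rb"sk-[A-Za-z0-9_-]{20,}"),
    re.compile(rb"hf_[A-Za-z0-9]{20,}"),
]
PROMPT_KEYWORDS = [b"you are a", b"respond only with", b"generate a script"]


def scan_file(path: Path) -> list[str]:
    """Return human-readable findings for a single file."""
    data = path.read_bytes()
    findings = []
    for pattern in KEY_PATTERNS:
        for match in pattern.finditer(data):
            prefix = match.group(0)[:12].decode(errors="replace")
            findings.append(f"possible embedded API key: {prefix}...")
    lowered = data.lower()
    for keyword in PROMPT_KEYWORDS:
        if keyword in lowered:
            findings.append(f"prompt-like string: {keyword.decode()}")
    return findings


if __name__ == "__main__":
    for target in map(Path, sys.argv[1:]):
        files = [target] if target.is_file() else [p for p in target.rglob("*") if p.is_file()]
        for file in files:
            for finding in scan_file(file):
                print(f"{file}: {finding}")
```

A hit is only a lead for analysis: legitimate AI-enabled applications also embed keys and prompts, so findings should be triaged against an inventory of approved software.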
2.0 Recommendations for Mitigation
LLM-embedded malware introduces risks that bypass traditional detection, meaning leadership cannot rely solely on SOC monitoring or endpoint tools to prevent impact. The following five recommendations focus on strategic and technical safeguards that executives and IT leadership can implement directly. These steps address how models, APIs, and enterprise data flows are controlled, specifically the areas most likely to be abused if this threat progresses from a proof-of-concept to widespread use.
2.1 Control AI Model Access
- Require all business units to route LLM/API traffic through a secure enterprise gateway that enforces authentication, logging, and rate limiting (a minimal policy-check sketch follows this list).
- Block direct outbound calls from endpoints to public AI APIs to prevent hidden malware prompts from running undetected.
- Limit which apps and teams can access AI models, secure local AI runtimes with access controls, and restrict unmonitored use of external AI APIs to reduce the attack surface.
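A minimal sketch of the gateway-side policy check described in the first bullet, assuming a hypothetical allowlist of approved endpoints and calling applications; in practice this logic would live in a forward proxy or API gateway configuration rather than application code.

```python
import logging
from urllib.parse import urlparse

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-gateway")

# Assumed policy data: approved model endpoints and the internal apps
# allowed to call them. In a real deployment this lives in gateway config.
APPROVED_ENDPOINTS = {"api.openai.com", "internal-llm.corp.example"}
APPROVED_CALLERS = {"support-chatbot", "code-review-service"}


def is_request_allowed(caller_id: str, url: str) -> bool:
    """Allow only approved callers to reach approved model endpoints."""
    host = urlparse(url).hostname or ""
    allowed = caller_id in APPROVED_CALLERS and host in APPROVED_ENDPOINTS
    # Every decision is logged so the SOC can spot unexplained LLM API usage.
    log.info("caller=%s host=%s allowed=%s", caller_id, host, allowed)
    return allowed


# Example: an unapproved caller is denied and the attempt is logged.
print(is_request_allowed("support-chatbot", "https://api.openai.com/v1/chat/completions"))  # True
print(is_request_allowed("unknown-agent", "https://api.openai.com/v1/chat/completions"))    # False
```

Blocking direct outbound calls at the firewall and forcing all traffic through a gateway with this kind of check is what turns hidden malware prompts into visible, deniable events.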
2.2 Secure API Keys and Secrets
- Mandate enterprise-wide storage of API keys in a centralized key vault with automated rotation every 90 days (see the rotation-check sketch after this list).
- Forbid developers and vendors from embedding API keys inside applications or local configs.
- Enforce executive-level reviews of key usage to ensure only authorized services have access.
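To illustrate the 90-day rotation mandate, the sketch below flags overdue keys in a hypothetical inventory exported from the central vault; the record format and field names are assumptions, not a specific vault product's API.

```python
import json
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)

# Assumed export format from the central key vault: one record per key
# with the consuming service and the creation timestamp.
INVENTORY = json.loads("""
[
  {"key_id": "openai-prod-01", "service": "support-chatbot", "created_at": "2025-05-01T00:00:00+00:00"},
  {"key_id": "hf-batch-02",    "service": "ml-pipeline",     "created_at": "2025-09-15T00:00:00+00:00"}
]
""")


def overdue_keys(inventory, now=None):
    """Return key records that have exceeded the rotation window."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for record in inventory:
        created = datetime.fromisoformat(record["created_at"])
        if now - created > MAX_KEY_AGE:
            stale.append(record)
    return stale


for record in overdue_keys(INVENTORY):
    print(f"rotate {record['key_id']} used by {record['service']}")
```

Regular rotation limits how long a stolen or embedded key remains useful to malware such as PROMPTSTEAL, which depends on valid tokens to reach its model.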
2.3 Restrict Local AI Runtime Installations
- Block installation of local inference runtimes (e.g., Ollama, vLLM) on employee endpoints unless pre-approved.
- If local models are business-critical, isolate them on dedicated servers with no access to sensitive corporate data.
- Require IT teams to periodically audit systems for unauthorized AI runtimes (see the audit sketch below).
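A minimal audit sketch for the last bullet, assuming Ollama's default local API port (11434) and its /api/tags model-listing endpoint; other runtimes such as vLLM would need their own ports and checks added per environment.

```python
import json
import socket
from urllib.request import urlopen
from urllib.error import URLError

OLLAMA_PORT = 11434  # Ollama's default local API port (assumed unchanged).


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP listener answers on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def list_local_ollama_models(host: str = "127.0.0.1") -> list[str]:
    """Query the local Ollama API for installed models, if it is running."""
    if not port_open(host, OLLAMA_PORT):
        return []
    try:
        with urlopen(f"http://{host}:{OLLAMA_PORT}/api/tags", timeout=2) as resp:
            payload = json.load(resp)
        return [model.get("name", "?") for model in payload.get("models", [])]
    except (URLError, ValueError):
        return []


if __name__ == "__main__":
    models = list_local_ollama_models()
    if models:
        print("Unapproved local LLM runtime detected; installed models:", models)
    else:
        print("No Ollama listener found on the default port.")
```

Run across the fleet, a check like this distinguishes approved, isolated model servers from the unmonitored local endpoints that PromptLock-style malware abuses.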
2.4 Segment AI-Enabled Applications
- Place any AI-integrated applications in separate network zones, away from finance, HR, and executive systems.
- Forbid AI-driven tools from accessing shared drives containing regulated or sensitive data unless reviewed and approved.
- Require SaaS vendors to disclose whether their products embed AI models or API calls before adoption.
2.5 Harden Enterprise Data Paths Against AI-Generated Payloads
- Enforce strict allow/deny lists for file types and scripts that can move between user endpoints and core systems (illustrated in the sketch after this list).
- Require content disarm and reconstruction for files entering through email, chat, or collaboration platforms to neutralize dynamically generated payloads.
- Implement executive-level mandates for network segmentation to prevent AI-generated ransomware from triggering on a single endpoint and subsequently propagating laterally across business-critical assets.
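The sketch below illustrates the allow/deny approach from the first bullet; the extension lists are illustrative assumptions to be set by policy, and the check complements rather than replaces content disarm and reconstruction.

```python
from pathlib import Path

# Assumed policy lists: extensions permitted to cross from user endpoints
# into core systems, and script/executable types that are always blocked.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".png", ".jpg"}
BLOCKED_EXTENSIONS = {".ps1", ".vbs", ".js", ".lua", ".py", ".exe", ".dll"}


def transfer_decision(filename: str) -> str:
    """Return 'allow', 'block', or 'quarantine' for a single file name."""
    suffix = Path(filename).suffix.lower()
    if suffix in BLOCKED_EXTENSIONS:
        return "block"        # scripts and executables never cross the boundary
    if suffix in ALLOWED_EXTENSIONS:
        return "allow"        # still subject to content disarm and reconstruction
    return "quarantine"       # unknown types are held for review


for name in ("q3-report.xlsx", "invoice.pdf.lua", "notes.txt"):
    print(name, "->", transfer_decision(name))
```

Because AI-generated payloads can take the form of ordinary-looking scripts (such as the Lua files PromptLock produces), deny-by-default handling of script types is the simplest way to keep them out of core systems.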
3.0 Preconditions for Exploitation
For LLM-embedded malware to function, certain conditions must be in place that enable the malware to generate and execute its malicious instructions at runtime. These preconditions highlight both technical dependencies and operational gaps that adversaries exploit. Unlike traditional malware, which arrives with fixed payloads, these threats depend on access to AI models, valid prompts, and execution environments that allow dynamically generated code to run. Understanding these requirements is critical for anticipating where defenses can break down. Key preconditions include:
- Reachable model access: either outbound connectivity to a cloud LLM API or a local inference runtime (such as an Ollama-exposed endpoint) on or near the victim host.
- Valid credentials: embedded API keys or tokens that let the malware query the model without operator interaction.
- Prompts-as-code: prompt structures shipped with the loader that instruct the model on what commands or scripts to produce.
- A permissive execution environment: the ability to run dynamically generated scripts or system commands (for example, the Lua payloads seen in PromptLock) without being blocked.
4.0 Threat Actor Utilization
While many observed cases of LLM-embedded malware remain experimental, state-aligned groups and criminal operators are already exploring its potential. The cases below summarize how real-world incidents and research discoveries illustrate adversarial use of AI models inside malware.
- BlackMamba (HYAS proof-of-concept): polymorphic keylogger that regenerates its code on every execution to evade signature-based detection.
- LAMEHUG/PROMPTSTEAL (attributed to APT28): espionage tooling that embeds prompts and hundreds of HuggingFace API tokens to generate system commands at runtime.
- MalTerminal (research discovery): Python-based tool that queries GPT-4 via cloud APIs to produce ransomware or a reverse shell on demand.
- PromptLock (proof-of-concept): Go-based ransomware that uses a locally hosted model through the Ollama API to generate Lua scripts for reconnaissance, data theft, and encryption.
Takeaway: These examples demonstrate adversaries experimenting with both nation-state espionage and criminal ransomware tooling, employing techniques that span cloud APIs, local inference engines, and polymorphic payload generation. The trend suggests a growing interest in embedding AI directly into malware as a core capability, rather than a peripheral support tool.
5.0 Risk and Impact
The rise of LLM-embedded malware represents a significant strategic shift in cyber threats. By generating code dynamically at runtime, these tools undermine static detection, making each execution unique and thereby delaying identification and response. For organizations, this means attackers can more easily bypass endpoint defenses and tailor payloads in real time to the environment they compromise. The confirmed use of PROMPTSTEAL by APT28 shows that state-aligned espionage actors are already experimenting with this capability, while proofs like PromptLock and MalTerminal highlight how ransomware and remote access could evolve into highly adaptive, AI-driven campaigns. The impact extends beyond individual infections—dynamic, model-driven malware could scale across Windows, Linux, and macOS, eroding defenders’ visibility and increasing the risk of stealthy, persistent compromises.
6.0 Hunter Insights
LLM-embedded malware is poised to redefine the cyber threat landscape over the next 12–24 months, moving from experimental demonstration to targeted attacks. The ability to generate malicious code dynamically at runtime undermines signature and static behavioral detection, allowing each execution to be polymorphic and tailored to its environment. This approach facilitates rapid evasion of endpoint controls, enables cross-platform attack campaigns, and allows adversaries to bypass traditional SOC visibility. The current trend, with nation-state actors (like APT28 via PROMPTSTEAL) already experimenting, strongly suggests that highly adaptive, AI-driven ransomware and espionage operations will be both scalable and persistent, extending across Windows, Linux, and macOS.
Future campaigns are likely to further blur the lines between traditional malware and adaptive AI-enabled tooling, embedding prompts and API keys as core enablers. As more threat actors refine their ability to leverage LLMs, both in the cloud and on premises, defenders must shift from detecting static code to holistically monitoring LLM activity, API key usage, and anomalous runtime code generation across enterprise environments. Without interventions such as strong controls on model and API access and robust secrets management, organizations risk facing a new generation of malware that operates below the radar, eroding both containment and response capabilities.
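As a closing illustration of that monitoring shift, the sketch below correlates two hypothetical telemetry feeds to flag processes that contact an LLM API endpoint and then spawn a script interpreter shortly afterward; the field names, endpoint list, interpreter list, and five-minute window are all assumptions, not any vendor's telemetry schema.

```python
from datetime import datetime, timedelta

# Assumed watchlists: hosted LLM API endpoints and common script interpreters.
# Local runtimes (e.g., an Ollama listener) would need port-aware matching.
LLM_HOSTS = {"api.openai.com", "api-inference.huggingface.co"}
SCRIPT_INTERPRETERS = {"powershell.exe", "wscript.exe", "python.exe", "lua.exe", "bash"}
WINDOW = timedelta(minutes=5)

# Hypothetical, pre-normalized telemetry: outbound connections and child
# process launches keyed by the originating process ID.
network_events = [
    {"pid": 4242, "host": "api.openai.com", "time": datetime(2025, 10, 1, 9, 0)},
]
process_events = [
    {"parent_pid": 4242, "image": "powershell.exe", "time": datetime(2025, 10, 1, 9, 2)},
]


def suspicious_pids(net_events, proc_events):
    """Flag PIDs that call an LLM endpoint and then launch a script interpreter."""
    flagged = set()
    for net in net_events:
        if net["host"] not in LLM_HOSTS:
            continue
        for proc in proc_events:
            same_process = proc["parent_pid"] == net["pid"]
            soon_after = timedelta(0) <= proc["time"] - net["time"] <= WINDOW
            if same_process and soon_after and proc["image"] in SCRIPT_INTERPRETERS:
                flagged.add(net["pid"])
    return flagged


print("review processes:", suspicious_pids(network_events, process_events))
```

The heuristic will also match legitimate AI-assisted tooling, so it is best treated as a hunting query that feeds analyst review rather than an automated blocking rule.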