LLM-embedded malware and ransomware represent a new cyber threat category where adversaries generate malicious logic dynamically at runtime using AI models, making attacks more adaptive and harder to detect than traditional fixed-code payloads. This rapidly evolving technique enables polymorphic, cross-platform threats, demonstrated in tools like MalTerminal and PromptLock, and is already being adopted by state actors and criminal groups for scalable espionage and ransomware operations.

CYBER INSIGHTS | SEPT 26, 2025

Overview

Large language model (LLM)-embedded malware is advancing from concept to practice, with operators wiring models directly into loaders so malicious logic is generated at runtime rather than shipped as a fixed payload. These findings matter because every execution can produce different code or commands, which undermines static detection and delays containment. MalTerminal represents one of the earliest documented cases: a Python-based tool that uses GPT-4 via cloud APIs to generate either ransomware or a reverse shell dynamically. PromptLock demonstrates how a local model can be abused; it leverages an Ollama-exposed endpoint to generate Lua scripts directly on the victim machine. State-aligned experimentation has also been reported: PROMPTSTEAL, linked to APT28, embeds prompts and hundreds of API tokens to dynamically generate system commands during espionage operations. Early proofs, such as HYAS’s BlackMamba, further highlight polymorphic payload generation that mutates on each run to bypass endpoint controls. Because models and APIs are platform-agnostic, the same techniques can be used to target Windows, Linux, and macOS. Although large-scale campaigns have not yet been observed, the trajectory from experiments to targeted operations is clear.

Key Findings:

  • LLM-embedded malware is emerging as a new category of threat, with malicious logic generated at runtime rather than shipped as fixed code, which complicates detection and response.
  • MalTerminal uses GPT-4 through cloud APIs to dynamically produce ransomware or remote shells on demand, while PromptLock abuses a local Ollama model to generate Lua payloads for reconnaissance, theft, and encryption.
  • State-backed experimentation has surfaced in LAMEHUG/PROMPTSTEAL, tied to APT28, which embeds prompts and hundreds of API tokens to dynamically generate commands for espionage.
  • Early proofs, such as BlackMamba, demonstrate polymorphic code generation that mutates on each execution, further evading endpoint controls.
  • Prompts-as-code and embedded API keys are the core enablers, making dynamic payloads possible but also leaving artifacts that defenders can monitor.
  • Immediate Action: Inventory and constrain local LLM endpoints, enforce strong secrets management to prevent API key exposure, and monitor for unexplained LLM API usage or runtime code generation events.

1.0 Threat Overview

1.1 Historical Context

The idea of embedding artificial intelligence into malware has circulated for years, but practical cases only began to surface between 2023 and 2025, as large language models (LLMs) became more powerful and widely available. The first known proof of concept, BlackMamba, appeared in 2023 and used AI to generate polymorphic keylogger code that changed on every execution.[1] While experimental, it proved that AI could be leveraged to bypass signature-based detection by ensuring no two payloads looked alike. By mid-2025, this experimentation escalated to state-linked adoption. LAMEHUG/PROMPTSTEAL, attributed to the Russian APT28 group, embedded more than 280 HuggingFace API keys alongside prompt structures that dynamically instructed models to generate system shell commands for espionage.[2] This marked the first time a state actor had operationalized LLM integration inside real-world malware, using runtime AI generation to avoid fixed indicators and extend campaign resilience.

Soon after, ransomware-focused cases emerged that showed how LLM integration could power broader classes of malware. MalTerminal appeared as one of the earliest practical frameworks: a Python executable that queried GPT-4 via cloud APIs to generate ransomware or a reverse shell on demand.[3] By outsourcing payload creation to the model itself, MalTerminal introduced runtime variability that forced defenders to look for artifacts like embedded API keys and prompt structures instead of static signatures. PromptLock followed as another proof-of-concept, written in Go and designed to work with a locally hosted model accessed through the Ollama API.[4] Instead of relying on the cloud, it generated Lua scripts directly on the victim’s device to perform reconnaissance, data theft, and encryption, proving that adversaries could achieve runtime variability without external dependencies. Together, these cases demonstrate a clear progression: from academic proofs, such as BlackMamba, to espionage-focused adoption with PROMPTSTEAL, to ransomware prototypes in MalTerminal and PromptLock. The trajectory shows how quickly AI has shifted from a peripheral tool for attackers to an embedded execution engine within malware, laying the groundwork for scalable, adaptive AI-driven threats.

1.2 Technique Breakdown

LLM-embedded malware differs from traditional threats because it does not always carry a fixed malicious payload inside the binary. Instead, it relies on prompts and access to a large language model, either cloud-based or local, to generate the malicious logic when needed. This means the “instructions” for the attack are created on demand, and the same malware can behave differently each time it runs. Defenders are forced to shift their focus away from static signatures and toward hunting for artifacts such as API keys, prompt structures, and suspicious model interactions.

Key Techniques Observed:

Cloud-based Runtime Generation (MalTerminal)
  • Description: Malware makes API calls to a remote LLM (e.g., GPT-4) to generate ransomware or remote-access payloads in real time, offloading malicious logic to the cloud and reducing what defenders can see on disk.
  • Attack mechanism: Remote API calls to cloud-based LLMs for real-time payload generation, leaving minimal local forensic artifacts.
  • Defense impact: Minimal disk footprint complicates static analysis and signature-based detection.

Local Model Abuse (PromptLock)
  • Description: Malware hosts or accesses a local model (e.g., via the Ollama API) to generate scripts directly on the victim's device. In PromptLock, Lua scripts were produced to perform reconnaissance, file theft, and encryption.
  • Attack mechanism: Local LLM abuse via the Ollama API to generate Lua scripts for reconnaissance and file operations.
  • Defense impact: Removes the dependency on cloud APIs while retaining dynamic payload generation.

API Key Embedding (PROMPTSTEAL)
  • Description: Adversaries embed stolen or leaked API keys, sometimes hundreds at once, inside malware to maintain reliable access to LLM services, gaining resilience against key blacklisting while leaving a clear hunting artifact.
  • Attack mechanism: Mass embedding of stolen API keys to ensure persistent LLM access even as individual keys are revoked.
  • Defense opportunity: Embedded API keys provide clear hunting artifacts and attribution opportunities for defenders.

Prompts-as-Code
  • Description: Instead of prewritten commands, the malware carries prompts that instruct the model to generate system commands, scripts, or shellcode. Prompts can assign the model a role ("act as a system administrator"), dictate output ("return only commands"), or attempt to bypass guardrails.
  • Attack mechanism: Prompt engineering to generate malicious code, sidestep AI safety measures, and assume privileged roles.
  • Defense impact: Traditional static analysis is ineffective against prompt-driven code generation, but the prompt strings themselves are detectable artifacts (a hunting sketch follows below).
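Both of the artifact types described above, embedded keys and prompts-as-code strings, remain visible to defenders who scan samples or memory dumps. The following is a minimal hunting sketch in Python, assuming access to a suspect file on disk; the key regexes and prompt markers are illustrative starting points, not a complete signature set.

```python
import re
import sys
from pathlib import Path

# Illustrative patterns only; real key formats vary and evolve.
KEY_PATTERNS = {
    "openai_key": re.compile(rb"sk-[A-Za-z0-9_-]{20,}"),
    "huggingface_token": re.compile(rb"hf_[A-Za-z0-9]{20,}"),
}

# Prompt fragments of the kind reported in prompts-as-code tooling.
PROMPT_MARKERS = [
    b"act as a system administrator",
    b"return only commands",
    b"reverse shell",
]

def scan_file(path: Path) -> list[str]:
    """Return human-readable hits for embedded keys or prompt-like strings."""
    data = path.read_bytes()
    hits = []
    for name, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(data):
            snippet = match.group()[:12].decode(errors="replace")
            hits.append(f"{path}: possible {name}: {snippet}...")
    lowered = data.lower()
    for marker in PROMPT_MARKERS:
        if marker in lowered:
            hits.append(f"{path}: prompt-like string: {marker.decode()}")
    return hits

if __name__ == "__main__":
    for target in sys.argv[1:]:
        for hit in scan_file(Path(target)):
            print(hit)
```

Hits from a scan like this are triage leads rather than verdicts, since legitimate AI-enabled software also ships with keys and prompt templates.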
Polymorphic Payloads (BlackMamba)
  • Description: Proof-of-concept malware demonstrated how AI can generate fresh code on every execution, making each sample unique and creating a constantly moving target for defenders.
  • Attack mechanism: AI-driven code mutation at runtime, producing a unique sample per execution cycle.
  • Defense challenge: Defeats signature-based detection and complicates behavioral analysis.

Cross-Platform Payloads
  • Description: LLM-generated logic can be written in lightweight scripting languages such as Lua or Python, making it easy to adapt across Windows, Linux, and macOS with minimal modification.
  • Attack mechanism: Generation of lightweight scripting-language payloads that deploy across platforms with little change.
  • Defense impact: A single codebase can threaten multiple operating systems simultaneously, expanding the attack surface.

2.0 Recommendations for Mitigation

LLM-embedded malware introduces risks that bypass traditional detection, meaning leadership cannot rely solely on SOC monitoring or endpoint tools to prevent impact. The following five recommendations focus on strategic and technical safeguards that executives and IT leadership can implement directly. These steps address how models, APIs, and enterprise data flows are controlled, specifically the areas most likely to be abused if this threat progresses from a proof-of-concept to widespread use.

2.1 Control AI Model Access

  • Require all business units to route LLM/API traffic through a secure enterprise gateway that enforces authentication, logging, and rate limiting.
  • Block direct outbound calls from endpoints to public AI APIs so that hidden malware prompts cannot reach a model undetected (a minimal log check is sketched after this list).
  • Limit which apps and teams can access AI models, secure local AI runtimes with access controls, and restrict unmonitored use of external AI APIs to reduce the attack surface.
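The following is a minimal sketch of the egress check referenced above, assuming proxy or firewall logs exported as CSV with src_host and dest_host columns; the gateway hostname and domain list are placeholders to be replaced with an organization's own inventory.

```python
import csv

# Placeholder values: substitute the organization's approved gateway hosts
# and the AI API domains it actually sanctions.
APPROVED_GATEWAYS = {"ai-gateway.corp.example"}
LLM_DOMAINS = ("api.openai.com", "api.anthropic.com", "huggingface.co")

def flag_direct_llm_egress(log_path: str) -> list[dict]:
    """Flag rows where an endpoint other than the approved gateway calls an LLM API directly."""
    findings = []
    with open(log_path, newline="") as handle:
        for row in csv.DictReader(handle):  # expects 'src_host' and 'dest_host' columns
            dest = row.get("dest_host", "")
            src = row.get("src_host", "")
            if any(dest.endswith(domain) for domain in LLM_DOMAINS) and src not in APPROVED_GATEWAYS:
                findings.append(row)
    return findings

if __name__ == "__main__":
    for row in flag_direct_llm_egress("proxy_log.csv"):
        print(f"direct LLM call: {row.get('src_host')} -> {row.get('dest_host')}")
```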

2.2 Secure API Keys and Secrets

  • Mandate enterprise-wide storage of API keys in a centralized key vault with automated rotation every 90 days.
  • Forbid developers and vendors from embedding API keys inside applications or local configs; keys should be resolved at runtime from the vault (see the sketch after this list).
  • Enforce executive-level reviews of key usage to ensure only authorized services have access.
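The following is a minimal sketch of the pattern behind these bullets: the application resolves its key at runtime from the environment, which a vault agent or secrets manager populates at deploy time, rather than carrying the key in code or config. The variable name LLM_API_KEY is an assumption, not a specific product's convention.

```python
import os

class MissingSecretError(RuntimeError):
    """Raised when a required secret has not been provisioned to the process."""

def get_llm_api_key() -> str:
    # The key is injected at deploy time (e.g., by a vault agent or CI secret store)
    # and rotated there on schedule; it is never committed to source control or local config.
    key = os.environ.get("LLM_API_KEY")
    if not key:
        raise MissingSecretError("LLM_API_KEY not provisioned; check the secrets manager")
    return key

if __name__ == "__main__":
    try:
        key = get_llm_api_key()
        print(f"key loaded (length {len(key)}); rotation is handled by the vault policy")
    except MissingSecretError as err:
        print(f"refusing to start: {err}")
```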

2.3 Restrict Local AI Runtime Installations

  • Block installation of local inference runtimes (e.g., Ollama, vLLM) on employee endpoints unless pre-approved.
  • If local models are business-critical, isolate them on dedicated servers with no access to sensitive corporate data.
  • Require IT teams to periodically audit endpoints for unauthorized AI runtimes (a minimal probe is sketched after this list).
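The following is a minimal audit probe, assuming the common default ports of popular local inference runtimes (Ollama typically listens on 11434; vLLM's OpenAI-compatible server commonly uses 8000). An open port only indicates that something is listening and should trigger follow-up, not a verdict.

```python
import socket

# Common default ports for local inference runtimes (adjust per environment).
CANDIDATE_PORTS = {
    11434: "Ollama (default)",
    8000: "vLLM / OpenAI-compatible server (common default)",
}

def probe_local_runtimes(host: str = "127.0.0.1", timeout: float = 0.5) -> list[str]:
    """Return descriptions of candidate inference ports that accept connections."""
    findings = []
    for port, label in CANDIDATE_PORTS.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            if sock.connect_ex((host, port)) == 0:
                findings.append(f"{host}:{port} open ({label})")
    return findings

if __name__ == "__main__":
    hits = probe_local_runtimes()
    print("\n".join(hits) if hits else "no candidate inference ports open")
```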

2.4 Segment AI-Enabled Applications

  • Place any AI-integrated applications in separate network zones, away from finance, HR, and executive systems.
  • Forbid AI-driven tools from accessing shared drives containing regulated or sensitive data unless reviewed and approved.
  • Require SaaS vendors to disclose whether their products embed AI models or API calls before adoption.

2.5 Harden Enterprise Data Paths Against AI-Generated Payloads

  • Enforce strict allow/deny lists for the file types and scripts that can move between user endpoints and core systems (a minimal classifier is sketched after this list).
  • Require content disarm and reconstruction for files entering through email, chat, or collaboration platforms to neutralize dynamically generated payloads.
  • Implement executive-level mandates for network segmentation to prevent AI-generated ransomware from triggering on a single endpoint and subsequently propagating laterally across business-critical assets.
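The following is a minimal sketch of the allow/deny idea in the first bullet, applied to inbound file names before they reach core systems; the extension sets are illustrative, would be tuned to organizational policy, and should be paired with content inspection rather than used alone.

```python
from pathlib import Path

# Illustrative policy: extensions commonly associated with dynamically generated scripts.
DENIED_EXTENSIONS = {".lua", ".ps1", ".vbs", ".js", ".hta", ".bat", ".cmd"}
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".png", ".jpg"}

def classify_inbound_file(name: str) -> str:
    """Classify an inbound filename as 'allow', 'deny', or 'review'."""
    suffix = Path(name).suffix.lower()
    if suffix in DENIED_EXTENSIONS:
        return "deny"
    if suffix in ALLOWED_EXTENSIONS:
        return "allow"
    return "review"  # unknown types go to manual or sandbox review

if __name__ == "__main__":
    for sample in ("report.pdf", "invoice.lua", "update.ps1", "notes.md"):
        print(sample, "->", classify_inbound_file(sample))
```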

3.0 Preconditions for Exploitation

For LLM-embedded malware to function, certain conditions must be in place that enable the malware to generate and execute its malicious instructions at runtime. These preconditions highlight both technical dependencies and operational gaps that adversaries exploit. Unlike traditional malware, which arrives with fixed payloads, these threats depend on access to AI models, valid prompts, and execution environments that allow dynamically generated code to run. Understanding these requirements is critical for anticipating where defenses can break down.

AI-Powered Malware Prerequisites:

Model or API Access (Critical)
  • Requirement: The malware must connect to a local or cloud-based LLM (e.g., OpenAI GPT-4 via API, Ollama-hosted models, other local inference engines) to generate malicious code; without that connection it cannot produce its payload.
  • Technical dependency: Functional API endpoints or local inference services reachable from the malware process.
  • Failure impact: Complete operational failure; the malware cannot generate or execute payloads without LLM access.

Embedded Prompts and Keys (Critical)
  • Requirement: Hardcoded prompts and API keys must be present inside the malware to instruct the LLM. These are the "blueprints" for the attack and a key artifact defenders can monitor.
  • Technical dependency: Embedded authentication credentials and structured prompts for LLM instruction.
  • Defense opportunity: Embedded keys and prompts provide clear hunting artifacts, attribution leads, and material for building detection signatures.

Executable Runtime Environment (Infrastructure)
  • Requirement: The victim system must be able to execute the code the model generates (e.g., Lua, Python, PowerShell, Bash, JavaScript), turning the LLM's output into an active payload.
  • Technical dependency: Available interpreters and runtime environments for dynamically generated code.
  • Mitigations: Application whitelisting, script execution policies, and runtime environment restrictions (a minimal inventory sketch follows below).
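The following is a minimal sketch for checking which of the interpreters named above are reachable on a host's PATH; the interpreter list is illustrative, and a missing PATH entry does not prove the runtime is absent.

```python
import shutil

# Interpreters that LLM-generated payloads have targeted or could plausibly target.
CANDIDATE_INTERPRETERS = ["lua", "python", "python3", "pwsh", "powershell", "bash", "node"]

def inventory_interpreters():
    """Map each candidate interpreter to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in CANDIDATE_INTERPRETERS}

if __name__ == "__main__":
    for name, path in inventory_interpreters().items():
        print(f"{name:12} {path if path else 'not found on PATH'}")
```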
Network or Local API Reachability (Connectivity)
  • Requirement: Cloud-based cases such as MalTerminal need outbound network connectivity to contact the LLM service; local approaches such as PromptLock rely on access to a locally hosted inference API.
  • Technical dependency: Outbound HTTPS access to cloud APIs or local network access to inference services.
  • Mitigations: Egress filtering, API endpoint blocking, and network segmentation can disrupt operations.

Lack of Monitoring or Controls (Defensive Gap)
  • Requirement: Environments that do not audit model usage, restrict API calls, or monitor anomalous script generation give attackers freedom to operate without triggering alarms.
  • Gaps exploited: Absence of API monitoring, script-generation detection, and behavioral analysis capabilities.
  • Recommended controls: API usage monitoring, anomalous-script detection, behavioral analysis, and execution logging; comprehensive monitoring becomes more critical as AI-powered malware becomes more prevalent.

4.0 Threat Actor Utilization

While many observed cases of LLM-embedded malware remain experimental, state-aligned groups and criminal operators are already exploring its potential. The cases below summarize how real-world tooling and research discoveries illustrate adversarial use of AI models inside malware.

Takeaway: These examples demonstrate adversaries experimenting with both nation-state espionage and criminal ransomware tooling, employing techniques that span cloud APIs, local inference engines, and polymorphic payload generation. The trend suggests a growing interest in embedding AI directly into malware as a core capability, rather than a peripheral support tool.

AI-Powered Malware Attribution Analysis:

PROMPTSTEAL — attributed to APT28 / Fancy Bear
  • Technique applied: Embedded HuggingFace API keys and prompts to dynamically generate shell commands for espionage operations.
  • Operational impact: Enabled flexible data collection and command execution while avoiding static payload detection.

BlackMamba — research (HYAS Labs)
  • Technique applied: Polymorphic keylogger that prompted an LLM at runtime to generate new payload code on each run.
  • Research impact: Demonstrated runtime variability to bypass endpoint defenses; proof of concept developed by HYAS Labs to show AI-powered polymorphic malware capabilities.

MalTerminal — unattributed
  • Technique applied: Used OpenAI GPT-4 via API to produce ransomware or reverse-shell code at runtime.
  • Operational impact: Outsourced payload creation to the cloud, making detection harder and enabling versatile attack paths.

PromptLock — research (ESET researchers)
  • Technique applied: Leveraged a locally hosted LLM through Ollama to generate Lua scripts for reconnaissance, exfiltration, and encryption.
  • Research impact: Showed that on-device inference can power ransomware without external infrastructure; documented by ESET researchers as a demonstration of local AI model abuse.

5.0 Risk and Impact

The rise of LLM-embedded malware represents a significant strategic shift in cyber threats. By generating code dynamically at runtime, these tools undermine static detection, making each execution unique and thereby delaying identification and response. For organizations, this means attackers can more easily bypass endpoint defenses and tailor payloads in real time to the environment they compromise. The confirmed use of PROMPTSTEAL by APT28 shows that state-aligned espionage actors are already experimenting with this capability, while proofs like PromptLock and MalTerminal highlight how ransomware and remote access could evolve into highly adaptive, AI-driven campaigns. The impact extends beyond individual infections—dynamic, model-driven malware could scale across Windows, Linux, and macOS, eroding defenders’ visibility and increasing the risk of stealthy, persistent compromises.


6.0 Hunter Insights

LLM-embedded malware is poised to redefine the cyber threat landscape over the next 12–24 months, moving from experimental demonstration to targeted attacks. The ability to generate malicious code dynamically at runtime undermines signature and static behavioral detection, allowing each execution to be polymorphic and tailored to its environment. This approach facilitates rapid evasion of endpoint controls, enables cross-platform attack campaigns, and allows adversaries to bypass traditional SOC visibility. The current trend, with nation-state actors (like APT28 via PROMPTSTEAL) already experimenting, strongly suggests that highly adaptive, AI-driven ransomware and espionage operations will be both scalable and persistent, extending across Windows, Linux, and macOS.

Future campaigns are likely to further blur the lines between traditional malware and adaptive AI-enabled tooling, embedding prompts and API keys as core enablers. As more threat actors refine their ability to leverage LLMs, both in the cloud and on-premises, defenders must shift from detecting static code to holistically monitoring LLM activity, API key usage, and anomalous runtime code generation across enterprise environments. Without interventions such as strong controls on model/API access and robust secrets management, organizations risk facing a new generation of malware that operates below the radar, eroding both containment and response capabilities.

💡 Hunter Strategy encourages our readers to look for updates in our daily Trending Topics and on Twitter.