Trojans in AI models
Quote: Hidden logic, data poisoning, and other targeted attack methods via AI systems.
 
[Image: trojans-in-AI-models-featured.jpg]

Over the coming decades, security risks associated with AI systems will be a major focus of researchers’ efforts. One of the least explored risks today is the possibility of trojanizing an AI model: embedding hidden functionality or intentional errors into a machine learning system that appears to work correctly at first glance. There are various methods of creating such a Trojan horse, differing in complexity and scope, and all of them need to be defended against.

Malicious code in the model

Certain ML model storage formats can contain executable code. For example, arbitrary code can be executed while loading a file in the pickle format, Python’s standard format for data serialization (converting data into a form that is convenient for storing and transferring). In particular, this format is used by the deep learning library PyTorch. In another popular machine learning library, TensorFlow, models in the .keras and HDF5 formats support a “lambda layer”, which also executes arbitrary Python code. Such code can easily conceal malicious functionality.
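To make the pickle risk concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a production scanner: the HarmlessPayload class, the suspicious_opcodes helper, and the opcode list are hypothetical names chosen for this example. It shows that a pickle stream can name any importable callable to be invoked at load time (here the harmless built-in int), and that the standard pickletools module can list a stream's opcodes without executing it.

```python
import pickle
import pickletools

# Assumption: a coarse list of opcodes that import objects or call them
# while a pickle is being loaded. Chosen for illustration only.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


class HarmlessPayload:
    """Hypothetical stand-in for a trojanized object."""

    def __reduce__(self):
        # Tells pickle: "on load, call int('42')".
        # An attacker would name a dangerous callable here instead.
        return (int, ("42",))


def suspicious_opcodes(data):
    """Return the suspicious opcode names in a pickle stream.

    pickletools.genops only parses the stream; nothing is executed.
    """
    return {op.name for op, _arg, _pos in pickletools.genops(data)} & SUSPICIOUS_OPCODES


plain = pickle.dumps({"weights": [0.1, 0.2]}, protocol=4)
crafted = pickle.dumps(HarmlessPayload(), protocol=4)

print(sorted(suspicious_opcodes(plain)))    # []
print(sorted(suspicious_opcodes(crafted)))  # ['REDUCE', 'STACK_GLOBAL']
print(pickle.loads(crafted))                # 42, produced by a call made during loading
```

Real PyTorch checkpoints legitimately use some of these opcodes to rebuild tensors, so a practical check would more likely rely on an allow-list of permitted imports than on a blanket opcode ban.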

TensorFlow’s documentation includes a warning that a TensorFlow model can read and write files, send and receive network data, and even launch child processes. In other words, it’s essentially a full-fledged program.
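For the lambda layer case, one cautious option is to inspect a model file before loading it. The sketch below assumes that a .keras file is a zip archive containing a config.json that describes the layers, with each layer carrying a "class_name" field; treat that layout as an assumption to verify against the Keras version in use. The find_lambda_layers and make_demo_archive helpers are hypothetical, and the archive is a synthetic stand-in rather than a real model.

```python
import io
import json
import zipfile


def find_lambda_layers(keras_archive):
    """Return names of Lambda layers listed in config.json.

    Only the archive's JSON is read; the model itself is never loaded.
    """
    with zipfile.ZipFile(keras_archive) as zf:
        config = json.loads(zf.read("config.json"))

    found = []

    def walk(node):
        # Recursively visit every dict and list in the config.
        if isinstance(node, dict):
            if node.get("class_name") == "Lambda":
                cfg = node.get("config")
                name = cfg.get("name") if isinstance(cfg, dict) else None
                found.append(name or "<unnamed>")
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(config)
    return found


def make_demo_archive(layers):
    """Build an in-memory zip that mimics the assumed .keras layout."""
    buf = io.BytesIO()
    config = {"class_name": "Sequential", "config": {"layers": layers}}
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("config.json", json.dumps(config))
    buf.seek(0)
    return buf


dense = {"class_name": "Dense", "config": {"name": "dense_1", "units": 4}}
lam = {"class_name": "Lambda", "config": {"name": "lambda_1"}}

print(find_lambda_layers(make_demo_archive([dense])))       # []
print(find_lambda_layers(make_demo_archive([dense, lam])))  # ['lambda_1']
```

A hit does not prove the model is malicious, since lambda layers have legitimate uses; it only marks the file as one that will run embedded Python code if loaded.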

Malicious code can activate as soon as an ML model is loaded. In February 2024, approximately 100 models with malicious functionality were discovered on Hugging Face, the popular repository of public models. Of these, 20% created a reverse shell on the infected device, and 10% launched additional software.

Training dataset poisoning

Models can be trojanized at the training stage by manipulating the initial datasets. This process, called data poisoning, can be either targeted or untargeted. Targeted poisoning trains a model to work incorrectly in specific cases (for example, always claiming that Yuri Gagarin was the first person on the Moon). Untargeted poisoning aims to degrade the model’s overall quality.

Targeted attacks are difficult to detect in a trained model because they are triggered only by very specific input data. However, poisoning the training data of a large model is costly, as it requires altering a significant volume of data without being detected.

In practice, there are known cases of manipulating models that continue to learn while in operation. The most striking example is the poisoning of Microsoft’s Tay chatbot, which was taught to express racist and extremist views in less than a day. A more practical example is the attempts to poison Gmail’s spam classifier, in which attackers mark tens of thousands of spam emails as legitimate to let more spam through to user inboxes.

The same goal can be achieved by altering training labels in annotated datasets or by injecting poisoned data into the fine-tuning process of a pre-trained model.
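The label-flipping effect described above can be sketched with a toy spam filter. This is a deliberately tiny illustration, not how Gmail or any real classifier works: the train and spam_score functions, the smoothed log-odds scoring, and the example messages are all assumptions made for this sketch. It shows how repeatedly labeling one spam message as legitimate can flip the filter's verdict on that message.

```python
import math
from collections import Counter


def train(examples):
    """Count tokens per label. examples: list of (text, 'spam' | 'ham')."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def spam_score(counts, text):
    """Sum of Laplace-smoothed log-odds; a positive score means 'looks like spam'."""
    vocab_size = len(set(counts["spam"]) | set(counts["ham"]))
    n_spam = sum(counts["spam"].values())
    n_ham = sum(counts["ham"].values())
    score = 0.0
    for tok in text.lower().split():
        p_spam = (counts["spam"][tok] + 1) / (n_spam + vocab_size)
        p_ham = (counts["ham"][tok] + 1) / (n_ham + vocab_size)
        score += math.log(p_spam / p_ham)
    return score


clean = [
    ("win free prize now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

# Poisoning: the attacker's own spam, repeatedly reported as legitimate.
poison = [("win free prize now", "ham")] * 10

message = "win free prize now"
clean_score = spam_score(train(clean), message)
poisoned_score = spam_score(train(clean + poison), message)

print(clean_score > 0)     # True: the clean filter flags the message
print(poisoned_score < 0)  # True: the poisoned filter lets it through
```

In this toy setup ten flipped labels are enough because the dataset has only four clean examples; against a large model the attacker needs proportionally more poisoned data, which is the cost mentioned earlier.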
