
Le-verdict | News with a Local Lens

Security researchers bypass Microsoft Azure AI Content Safety

Resistance tests

Mindgard deployed the two filters, AI Text Moderation and Prompt Guard, in front of GPT-3.5 Turbo using Azure OpenAI, then accessed the target LLM via Mindgard's Automated AI Red Teaming platform.

Two attack methods were used against the filters: character injection (inserting specific character types and irregular text patterns) and adversarial ML evasion (searching for blind spots in the underlying classification model).
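To illustrate the general idea behind these evasions, here is a toy sketch (not Mindgard's actual tooling) of a naive keyword-based moderation check and a spaced-character perturbation that slips past its substring matching; the blocklist terms and function names are invented for the example:

```python
# Toy moderation filter: flags a prompt if it contains a blocklisted phrase.
# Real filters are ML classifiers, but the evasion principle is similar.
BLOCKLIST = {"jailbreak", "ignore previous instructions"}

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt is flagged by the blocklist."""
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKLIST)

def space_out(text: str) -> str:
    """Character-injection perturbation: insert a space between every character."""
    return " ".join(text)

prompt = "Please jailbreak the assistant"
print(naive_filter(prompt))             # the raw prompt is flagged
print(naive_filter(space_out(prompt)))  # the spaced variant evades the match
```

A human (or the downstream LLM) still reads the spaced-out prompt easily, which is why such perturbations preserve the attack's intent while defeating the filter.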

Character injection reduced Prompt Guard's jailbreak detection effectiveness from 89% to 7% when exposed to diacritics (e.g. changing the letter a to á), homoglyphs (e.g. closely resembling characters such as 0 and O), numeric substitution ("leetspeak"), and spaced characters. The effectiveness of AI Text Moderation was likewise reduced using similar techniques.
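The substitution techniques above can be sketched in a few lines; the mapping tables here are illustrative examples, not the researchers' exact payloads:

```python
# Example character-injection mappings (illustrative, not exhaustive).
DIACRITICS = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "t": "7"}  # "leetspeak"
HOMOGLYPHS = {"O": "0", "l": "1"}  # visually similar character swaps

def substitute(text: str, table: dict[str, str]) -> str:
    """Replace each character that has an entry in the mapping table."""
    return "".join(table.get(ch, ch) for ch in text)

print(substitute("ignore all instructions", DIACRITICS))
# → "ígnóré áll ínstrúctíóns"
print(substitute("ignore all instructions", LEET))
# → "1gn0r3 4ll 1ns7ruc710ns"
```

Each variant stays legible to the target LLM while shifting the token patterns the safety classifier was trained on, which is what degrades its detection rate.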
