Policies are lists of shieldr_rule objects plus
thresholds. You can start with a built-in policy and append
domain-specific rules.
For the source model behind the built-in policies, see
vignette("policy-design", package = "llmshieldr").
Every rule has the same shape:
- id: unique rule identifier. The recommended convention
  is llmXX.category.name, such as llm02.ticket_id.
- pattern: regex pattern, or NULL
- fn: R predicate function, or NULL
- owasp: OWASP LLM category such as llm02
- severity: low, medium, high, or critical
- action: allow, redact, or block
- description: human-readable explanation
Exactly one of pattern or fn must be supplied. Regex rules produce
match spans that can be redacted. Function rules are useful when the
condition is easier to express in R.
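The constraints on a rule's fields can be sketched as a small base R check. This is only an illustration: `check_rule_spec()` is a hypothetical helper, not part of llmshieldr, and the package's own validation may differ in details and messages.

```r
# Hypothetical validator mirroring the field constraints described above.
# It is NOT llmshieldr's implementation.
check_rule_spec <- function(id, pattern = NULL, fn = NULL, severity, action) {
  # Exactly one of `pattern` or `fn` must be supplied.
  if (is.null(pattern) == is.null(fn)) {
    stop("supply exactly one of `pattern` or `fn`")
  }
  if (!is.null(pattern) && !(is.character(pattern) && length(pattern) == 1L)) {
    stop("`pattern` must be a single regex string")
  }
  if (!is.null(fn) && !is.function(fn)) {
    stop("`fn` must be a function")
  }
  if (!severity %in% c("low", "medium", "high", "critical")) {
    stop("unknown severity: ", severity)
  }
  if (!action %in% c("allow", "redact", "block")) {
    stop("unknown action: ", action)
  }
  invisible(TRUE)
}

# A regex rule passes; a rule with neither `pattern` nor `fn` would stop().
check_rule_spec(
  id = "llm02.ticket_id",
  pattern = "\\bTICKET-[0-9]{6}\\b",
  severity = "medium",
  action = "redact"
)
```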
Function rules may return:
- TRUE or FALSE
- a finding list
Finding lists can include rule_id, owasp, severity, action,
description, match, start, end, and source. Include start and end
when you want custom function findings to participate in redaction.
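To see why start and end matter, here is a minimal sketch of how span-based redaction could work. It assumes 1-based, inclusive character offsets (the convention `regexpr()` produces) and non-overlapping spans. `redact_spans()` is a hypothetical helper written for this illustration; the package's own redaction logic is not shown here.

```r
# Hypothetical helper: replace each [start, end] span with a placeholder.
# Assumes 1-based inclusive offsets and non-overlapping spans.
redact_spans <- function(text, findings, placeholder = "[REDACTED]") {
  starts <- vapply(findings, function(f) as.integer(f$start), integer(1))
  # Work from the rightmost span to the leftmost so earlier offsets stay valid.
  for (i in order(starts, decreasing = TRUE)) {
    f <- findings[[i]]
    text <- paste0(
      substr(text, 1L, f$start - 1L),
      placeholder,
      substr(text, f$end + 1L, nchar(text))
    )
  }
  text
}

text <- "Summarize TICKET-123456 and TICKET-654321."
m <- gregexpr("TICKET-[0-9]{6}", text)[[1]]
lens <- attr(m, "match.length")

# Build one finding per match, carrying only the span fields.
findings <- lapply(seq_along(m), function(i) {
  list(start = m[i], end = m[i] + lens[i] - 1L)
})

redact_spans(text, findings)
#> [1] "Summarize [REDACTED] and [REDACTED]."
```

A finding without start and end gives such a routine nothing to replace, which is why a plain TRUE result can flag text but cannot mark where to redact it.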
Severity maps to risk score contributions:
| Severity | Contribution |
|---|---|
| low | 0.1 |
| medium | 0.3 |
| high | 0.6 |
| critical | 1.0 |
Findings are deduplicated, overlapping spans from the same evidence
are scored once, distinct findings are summed, and the total is capped
at 1.0. Synthetic context findings are capped separately. A
policy’s thresholds then decide the final action. Defaults are
redact_at = 0.4 and block_at = 0.75.
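The arithmetic can be sketched in a few lines of base R. Both function names are hypothetical, and the sketch skips deduplication and the separate cap for synthetic context findings. It also makes two assumptions that the text does not confirm: thresholds are compared inclusively, and the final action is the stricter of the threshold result and the actions requested by the matched rules. The second assumption is inferred from the ticket example in this document, where a score of 0.300 still yields redact.

```r
# Hypothetical helpers; not llmshieldr's internal scoring code.
severity_weight <- c(low = 0.1, medium = 0.3, high = 0.6, critical = 1.0)

sketch_risk_score <- function(severities) {
  # Sum the contribution of each distinct finding and cap the total at 1.0.
  min(1.0, sum(severity_weight[severities]))
}

sketch_final_action <- function(score, rule_actions,
                                redact_at = 0.4, block_at = 0.75) {
  levels <- c("allow", "redact", "block")
  # Action implied by the thresholds alone (assumes inclusive comparison).
  by_score <- if (score >= block_at) {
    "block"
  } else if (score >= redact_at) {
    "redact"
  } else {
    "allow"
  }
  # Assumption: take the strictest of the threshold action and rule actions.
  levels[max(match(c(by_score, rule_actions), levels))]
}

sketch_risk_score("medium")           # one medium finding
#> [1] 0.3
sketch_risk_score(c("high", "high"))  # 0.6 + 0.6, capped
#> [1] 1
sketch_final_action(1, c("redact", "redact"))
#> [1] "block"
```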
Regex rules are the simplest way to redact or block recognizable text.
guardrails <- add_rule(
  guardrails,
  id = "llm02.ticket_id",
  pattern = "\\bTICKET-[0-9]{6}\\b",
  owasp = "llm02",
  severity = "medium",
  action = "redact",
  description = "Internal support ticket identifier."
)
scan_prompt("Summarize TICKET-123456 for the support team.", guardrails)
#> llmshieldr report
#> action: redact
#> risk_score: 0.300
#> findings: 1
Function rules let you express checks that are easier to write in R
than in a single regular expression.
contains_student_address <- function(text) {
  grepl("\\bstudent\\b", text, ignore.case = TRUE) &&
    grepl("\\bhome address\\b", text, ignore.case = TRUE)
}
education <- policy("education_safe")
education <- add_rule(
  education,
  id = "llm02.student.address",
  fn = contains_student_address,
  owasp = "llm02",
  severity = "high",
  action = "redact",
  description = "Student home address reference."
)
scan_prompt("The student home address appears in the form.", education)
#> llmshieldr report
#> action: block
#> risk_score: 1.000
#> findings: 2
Function rules can also return span-aware findings:
ticket_span_rule <- function(text) {
  hit <- regexpr("\\bTICKET-[0-9]{6}\\b", text, perl = TRUE)
  if (identical(as.integer(hit[[1]]), -1L)) {
    return(FALSE)
  }
  start <- as.integer(hit[[1]])
  end <- start + as.integer(attr(hit, "match.length")) - 1L
  list(
    rule_id = "llm02.ticket_id.fn",
    owasp = "llm02",
    severity = "medium",
    action = "redact",
    description = "Internal support ticket identifier.",
    match = substr(text, start, end),
    start = start,
    end = end
  )
}
Healthcare and life sciences often add identifiers beyond generic PII.
pharma <- policy("pharma_gxp")
pharma <- add_rule(
  pharma,
  id = "llm02.site_id",
  pattern = "\\bSITE-[0-9]{3}\\b",
  owasp = "llm02",
  severity = "medium",
  action = "redact",
  description = "Clinical trial site identifier."
)
Finance workflows often tighten language around recommendations and promises.
Use list_rules() to inspect a policy before
deployment.
list_rules(guardrails)
#>                                id owasp severity action has_pattern has_fn
#> 1           llm01.injection.basic llm01 critical  block        TRUE  FALSE
#> 2        llm01.injection.indirect llm01 critical  block        TRUE  FALSE
#> 3                llm01.nlp.intent llm01     high  block       FALSE   TRUE
#> 4                 llm02.pii.email llm02   medium redact        TRUE  FALSE
#> 5                 llm02.pii.phone llm02   medium redact        TRUE  FALSE
#> 6                   llm02.pii.ssn llm02     high redact        TRUE  FALSE
#> 7             llm02.phi.condition llm02     high redact        TRUE  FALSE
#> 8            llm02.secret.api_key llm02     high redact        TRUE  FALSE
#> 9             llm02.secret.bearer llm02     high redact        TRUE  FALSE
#> 10               llm02.secret.aws llm02     high redact        TRUE  FALSE
#> 11          llm02.secret.password llm02     high redact        TRUE  FALSE
#> 12 llm02.secret.connection_string llm02     high redact        TRUE  FALSE
#> 13 llm07.system_prompt.extraction llm07 critical  block        TRUE  FALSE
#> 14          llm06.agency.language llm06 critical  block        TRUE  FALSE
#> 15                llm02.ticket_id llm02   medium redact        TRUE  FALSE
The resulting table includes has_pattern and has_fn, which make it
easy to audit whether a policy is mostly regex-based, function-based,
or mixed.
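As a rough illustration of that audit, the two logical columns can be counted directly. The sketch assumes `list_rules()` returns an ordinary data frame with the columns printed above. To stay self-contained it uses a small hand-built stand-in (`rules`) rather than a real policy.

```r
# Stand-in for a list_rules() result; rows are copied from the printed table.
# Assumption: the real return value is a data frame with these columns.
rules <- data.frame(
  id = c("llm01.injection.basic", "llm01.nlp.intent", "llm02.ticket_id"),
  has_pattern = c(TRUE, FALSE, TRUE),
  has_fn = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

n_regex <- sum(rules$has_pattern)  # rules backed by a regex pattern
n_fn <- sum(rules$has_fn)          # rules backed by an R function
share_regex <- n_regex / nrow(rules)

# Function-based rules are often the ones worth reviewing by hand.
fn_rule_ids <- rules$id[rules$has_fn]
```

Here `n_regex` is 2 and `n_fn` is 1, so this stand-in policy is mostly regex-based.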
Custom rule ids that do not follow the llmXX. naming
convention still work, but shieldr_rule() warns because
OWASP risk summaries are clearest when rule ids carry the category
prefix.
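A short sketch shows why the prefix helps: when ids follow the convention, the OWASP category can be read straight from the id and tabulated. The regular expression below is an assumption about what "follows the convention" means; the exact check inside shieldr_rule() is not documented here. The id `ticket_id` is a made-up non-conforming example.

```r
ids <- c(
  "llm01.injection.basic",
  "llm02.pii.email",
  "llm02.ticket_id",
  "ticket_id"  # hypothetical id without the category prefix
)

# TRUE where the id starts with "llm", two digits, and a dot (assumed check).
follows_convention <- grepl("^llm[0-9]{2}\\.", ids)

# Category prefix for conforming ids, NA for the rest.
category <- ifelse(follows_convention, sub("\\..*$", "", ids), NA_character_)

# Per-category counts; non-conforming ids fall into the NA bucket.
table(category, useNA = "ifany")
```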
For every new rule, keep at least:
The packaged evaluation corpus at
inst/extdata/security_eval_cases.csv is a small starting
point for these cases. Add application-specific corpora outside the
package when examples contain real or sensitive data.
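One lightweight way to keep cases next to a rule is a small table of texts with the expected outcome, checked against the rule's pattern. The sketch below is self-contained and uses made-up cases. The column names `text` and `expect_match` are hypothetical and are not the schema of the packaged CSV.

```r
# Hypothetical cases for the ticket-id pattern used earlier in this document.
cases <- data.frame(
  text = c(
    "Summarize TICKET-123456 for the support team.",
    "Summarize the ticket backlog for the support team.",
    "Reference TICKET-12345 is incomplete."
  ),
  expect_match = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

pattern <- "\\bTICKET-[0-9]{6}\\b"

# Compare the regex result with the expectation for every case.
cases$matched <- grepl(pattern, cases$text, perl = TRUE)
failures <- cases[cases$matched != cases$expect_match, ]

# An empty `failures` data frame means the pattern behaves as expected.
nrow(failures)
#> [1] 0
```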