Safely anonymizes Personally Identifiable Information ('PII') in a dataset by interactively prompting the user to keep, drop, or scramble each column.
Usage
mask_identity(envir = parent.frame())
Details
The function provides a guided workflow for data anonymization:
Scans the calling environment for available data frames and prompts the user to select one.
Iterates through every column in the selected data frame, displaying its name and type.
For each column, the user chooses one of three actions:
Keep: Leaves the column unchanged.
Scramble: For numeric columns, shuffles the values across rows, which preserves the column's overall distribution while breaking the link between each value and the individual it came from. For text and factor columns, replaces the values with sequential placeholders (e.g., "Masked_0001").
Drop: Removes the column entirely from the dataset.
Saves the resulting anonymized data frame back to envir with a _masked suffix.
Optionally generates a dput() output of the first 20 rows for easy, safe sharing.
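The steps above can be sketched non-interactively in base R. This is an illustration only, not the package's implementation: scramble_column(), the patients data frame, and the actions vector (which stands in for the interactive prompts) are hypothetical names. It also assumes that identical text values map to the same placeholder, which the description above does not specify.

```r
# Hypothetical helper, not part of the package: scramble a single column.
scramble_column <- function(x) {
  if (is.numeric(x)) {
    # Shuffle: the same values in a random row order.
    x[sample.int(length(x))]
  } else {
    # Assumption: each distinct value gets one sequential placeholder.
    values <- as.character(x)
    out <- sprintf("Masked_%04d", match(values, unique(values)))
    out[is.na(values)] <- NA_character_
    out
  }
}

# Toy data, invented for this sketch.
patients <- data.frame(
  name = c("Ana", "Ben", "Cy", "Ana"),
  age  = c(34, 51, 27, 34),
  ssn  = c("111", "222", "333", "111"),
  stringsAsFactors = FALSE
)

# Stand-in for the per-column choices made at the prompts.
actions <- c(name = "scramble", age = "scramble", ssn = "drop")

masked <- patients
for (col in names(actions)) {
  action <- actions[[col]]
  if (action == "drop") {
    masked[[col]] <- NULL
  } else if (action == "scramble") {
    masked[[col]] <- scramble_column(masked[[col]])
  }
  # "keep" leaves the column unchanged.
}

# Store the result under the original name plus a "_masked" suffix.
assign("patients_masked", masked, envir = environment())

# Text representation of the first 20 rows, suitable for pasting elsewhere.
dput(head(patients_masked, 20))
```

Shuffling keeps each numeric column's set of values intact, so per-column summaries such as the mean are unchanged, but relationships between columns are not preserved.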
Warning
This function modifies files on disk or the global environment. Please ensure you have a backup or are using version control (e.g., Git) before execution.
Examples
if (interactive()) {
mask_identity()
}
