Code structure and customization

Code structure

The code is organized as a Maven project into two source folders with various Java packages and resources in the src/main project directory. Generated code is written to the src/main folder and is regenerated each time the DBmasker service is used. Be aware that customizations might be overwritten each time the code is regenerated.

The system creates Java packages in the src/main/java folder based on the Root package property defined as a parameter to the DBmasker service. The default package is example.anonymizer.

Code packages and files

Code packages, files and directories



Source folder for customized java files

<root package>.conversions

Package containing custom conversion classes. Defined for randomized columns or for masked columns using a column input source. Converts an input string into another string format, or converts a string into another data type for randomized columns. (See the AnonymizerHotel example.)

<root package>.distributions

Package containing custom distribution classes. Defined for dependent tables when creating data for tables. Used for determining the distribution of foreign keys between the parent and child tables. (See the AnonymizerHotel example.)

<root package>.transformations

Package containing custom transformations. Defined for masked columns. Used for transforming a column value before it is written to the database. (See the AnonymizerHotel example.)


Source folder containing resource text files used as input files to mask columns.


Source folder for generated files. All files within this folder path are regenerated every time the project is built. DO NOT MODIFY. The generated source is described in the AnonymizerAPI javadoc.

<root package>

Start file containing the Java main method for the JAR file.

<root package>

Connection logic.

<root package>

Task execution tree root

<root package>.<task name>

Application code for performing anonymizations, creations, deletions, and erasures. A separate package is created for each task and sub-task. Within each package are various Java files for performing the various functions.


Contains all interfaces, reading/writing, and Java logging. All interfaces and abstract classes are explained in the javadoc.


(Internal) Contains multiple classes for each supported column data type


Built in conversions used for converting string input to various other data types. (String2Date, String2DateTime, String2Decimal, String2Integer, String2Time)


(Internal) Contains abstract classes for various functions


The four supplied SAR writers for XML and JSON


Built-in distributions that determine how foreign keys are distributed between parent and child tables.


This will add rows such that any missing combinations of foreign key values are present.

Parent values will be assigned evenly among the new rows, but with an additional ability to set a random deviation.

Foreign key columns will be randomly assigned from available values. This is the Default distribution.


(Internal) Contains mask classes for various functions


(Internal) Contains noise classes for various data types


Built-in transformations.

Replaces the last digit of a credit card number with a checksum calculated using the Luhn algorithm.

Translates various characters such as space, hyphen, or underscore.


Connection parameters for the database you will connect to.


Maven config file.


Conversions are defined for randomized columns or for masked columns using a column input source or text field input source. When used for a column input source, the conversion is used for manipulating the string format and converting the column from one string format to another. When used for a text field, the conversion is used to manipulate the text file entry. When used for a randomized column, it converts the string (The column is always read as a string) into another data type.

All custom conversions must be created in the src/main/java/<java package>.conversions package.
public class ParseDigits implements IConversion {
    public static final String LABEL = "ParseDigits - simply remove all non-digits";
    public Object convert(String txt) {
        if (txt == null)
            return "0";
        char[] chars = txt.toCharArray();
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            if (Character.isDigit(c)) // keep digits only
                sb.append(c);
        }
        return sb.length()==0 ? "0" : sb.toString();
    }
}
  • The custom class must implement the IConversion interface and override the convert method.

  • The convert method always assumes a string input and returns a string.

  • The conversion is defined in the user defined classes section and used as defined by the convert terminal.

Using text files with multiple columns of data

If the input source text file contains multiple columns of data, a conversion may be created to select the correct delimited value. For example, a text file containing a City, State, and Zip Code delimited by tabs can have an associated conversion class to select the first entry, as illustrated below. All columns can use the same input text file with similar conversion methods, and as long as sequence and repeatable random are used, the same line will be picked from the text file.
public class TabDelim1 implements IConversion {
    public static final String LABEL = "TabDelim1 - Pick column 1 delimited by tab";
    public Object convert(String input) throws Exception {
        return input.split("\\t")[0];
    }
}


Transformations are defined for Masked columns and are used for transforming column values before being written to the database. All custom transformations must be created in the src/main/java/<java package>.transformations package.

/**
 * Generalization example that demonstrates using the current value set in the calculation.
 * It divides the current values into 4 equal-size buckets where the value represents the average.
 */
public class QuartileGeneralization implements ITransformation, IPreScan {
    public static final String LABEL = "QuartileGeneralization - Create 4 groups and preserve the average";
    int[] max = new int[]{0,0,0,0};
    int[] tot = new int[]{0,0,0,0};
    double each;
    public String transform(String input) {
        int in = Integer.parseInt(input);
        for (int i = 0; i < 4; i++) {
            if (in <= max[i]) // bucket test (reconstructed)
                return String.valueOf(tot[i]/each);
        }
        return String.valueOf(tot[3]/each);
    }
    public void scan(int col, List<String[]> rows) {
        List<Integer> list = new ArrayList<>();
        for (String[] row : rows) {
            list.add(Integer.parseInt(row[col]));
        }
        Collections.sort(list);
        each = list.size()/4.0;
        for (int i = 0; i < list.size(); i++) {
            int bucket = (int)(i/each);
            Integer x = list.get(i);
            max[bucket] = Math.max(max[bucket], x); // track bucket upper bound
            tot[bucket] += x;                       // accumulate bucket total
        }
    }
}
  • The custom class must implement the ITransformation interface and override the transform method

  • Can also implement IPreScan and override the scan method. The scan method provides you with the entire set of records, which you can inspect and use in the transform method.

  • The transformation is defined in the user defined classes section and used as defined by the transform terminal.
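The built-in credit card transformation described earlier recomputes the Luhn check digit. The calculation itself can be sketched standalone; this is an illustration independent of the DBmasker API, and the class and method names here are hypothetical, not the product's.

```java
public class LuhnDemo {
    // Compute the Luhn check digit for a digit string (without its check digit).
    static int checkDigit(String digits) {
        int sum = 0;
        boolean dbl = true; // double every second digit, starting from the right
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (dbl) {
                d *= 2;
                if (d > 9) d -= 9; // same as summing the two digits of d
            }
            dbl = !dbl;
            sum += d;
        }
        return (10 - sum % 10) % 10;
    }

    // Replace the last digit of a card number with the recomputed checksum.
    static String fixCheckDigit(String cardNumber) {
        String body = cardNumber.substring(0, cardNumber.length() - 1);
        return body + checkDigit(body);
    }

    public static void main(String[] args) {
        System.out.println(fixCheckDigit("4539148803436460")); // prints 4539148803436467
    }
}
```

A masked card number produced this way still passes the standard Luhn validation used by most payment systems.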


Distributions are defined for dependent tables when creating data for tables. It is used for determining the distribution of foreign keys between the parent and child tables. All custom distributions must be created in the src/main/java/<java package>.distributions package.
/**
 * Simple distribution that ensures a minimum number of occurrences of each parent
 * when creating new records.
 */
public class MinPerParent implements IDistribution {
    public static final String LABEL =
        "MinPerParent - Ensures a minimum number of occurrences of each parent value";
    public static final String PARENT_LABEL = "Minimum #rows / parent";
    int[] min;
    public int calculateNewRows(CreateParent[] parents, int numExistRows, List<String[]> existing) {
        min = new int[parents.length];
        for (int i = 0; i < parents.length; i++) {
            min[i] = 2; // minimum #rows per parent; the original reads this from the configured parameter
        }
        int x1 = 0;
        for (int i = 0; i < parents.length; i++) {
            CreateParent ct = parents[i];
            int dmin = 0;
            for (int j : ct.count) {
                dmin += Math.max(min[i] - j, 0);
            }
            x1 = Math.max(x1, dmin);
        }
        return x1;
    }
    public void distribute(List<String> columns, CreateParent[] parents, List<String[]> rows) {
        for (int i = 0; i < parents.length; i++) {
            CreateParent parent = parents[i];
            int irow = 0;
            int[] a = parent.count;
            // First make sure every parent value reaches its minimum count
            for (int icol = 0; icol < a.length; icol++) {
                for (int j = a[icol]; j < min[i]; j++) {
                    if (irow >= rows.size())
                        return;
                    IDistribution.assignRow(parent, columns, rows.get(irow++), icol);
                }
            }
            // Assign any remaining rows round-robin across the parent values
            int icol = 0;
            int size = parent.parentRows.size();
            while (size > 0 && irow < rows.size()) {
                IDistribution.assignRow(parent, columns, rows.get(irow++), icol++);
                if (icol >= size)
                    icol = 0;
            }
        }
    }
}
  • The custom class must implement the IDistribution interface and override the distribute and calculateNewRows methods

  • The distribution is defined in the user defined classes section and used as defined by the distribute terminal.
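The round-robin phase used by a distribution such as the one above can be illustrated with a standalone sketch; this is plain Java, independent of the DBmasker API, and the names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinDemo {
    // Deal parent keys out round-robin so each parent receives an
    // (almost) equal share of the new child rows.
    static List<String> assignEvenly(List<String> parentKeys, int childRows) {
        List<String> assigned = new ArrayList<>();
        for (int i = 0; i < childRows; i++) {
            assigned.add(parentKeys.get(i % parentKeys.size()));
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Three parents shared among seven new rows
        System.out.println(assignEvenly(List.of("A", "B", "C"), 7));
        // prints [A, B, C, A, B, C, A]
    }
}
```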

Custom code strategies

Be aware that customizations might be overwritten each time the service regenerates the code. All custom code will also reside in the src/main/java or src/main/resources folders.

Using the application JAR from other Java programs

The JAR file can easily be included on the classpath and run from other Java programs. The build process creates a source JAR which should be attached to the development environment. Handling of database connection can be done using the Anonymizer connection mechanism or entirely from the calling Java program. Each task can be run separately and in any order and they may also be subclassed. All interfaces and abstract classes are explained in the javadoc.

Entry points:

  • Anonymizer extends AbstractAnonymizer which implements IAnonymizer

    • runAll, runTasks, getTaskRoot, setEraseParam, setAutoCommit, emptyDB

  • ConfigUtil

    • getConfig - returns the Properties object that can be assigned from the API instead of editing the file.

  • Log - a logging facade using java.util.logging

    • setLevel, logger.addHandler - must attach loghandler to deal with log messages

  • IContext - keeps additional configuration for running tasks

    • setRunType, setRunParams, setAutoCommit, setRepeatableRandom


ConfigUtil.getConfig() returns a Java Properties object which can be manipulated like a Map.
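For example, a plain java.util.Properties object behaves the same way. In the sketch below it stands in for the object returned by ConfigUtil.getConfig(), and the key name is illustrative only, not an actual config key.

```java
import java.util.Properties;

public class ConfigDemo {
    // Build a Properties object the way generated configuration is manipulated.
    static Properties sampleConfig() {
        Properties config = new Properties();
        // In a generated project this object would come from ConfigUtil.getConfig();
        // the key below is illustrative only.
        config.setProperty("some.property", "value");
        return config;
    }

    public static void main(String[] args) {
        System.out.println(sampleConfig().getProperty("some.property")); // prints value
    }
}
```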

Configuring values
// setting the file encryption key.


IContext is a placeholder for run-time data and connection information. A ContextFactory provides methods to create three types of context. You may alternatively implement your own.


ContextFactory methods

  • createAnonymizeContext(Connection connection)

  • createEraseContext(Connection connection, String[] params)

  • createSarContext(Connection connection, String[] params, ISarWriter sarwriter)

You may additionally need to set the values for auto-commit and repeatable random.

Additional context


setRunType(RunType run)


Type of processing; set by the factory method.

setRunParams(String[] parameter)

Array of string parameters; set by the factory method.

setRepeatableRandom(boolean repeatable)


True - start the number generation with the same seed and keep consistent results each time it is run.

setAutoCommit(boolean autocommit)



Static variables

Customize how indentation of run status is written to logs.


Log handler may be configured to suit your needs.

Configure Log handler
// To configure the Anonymizer's Log handler instead of Java Logging's default ConsoleHandler.
// To add your own handler
Log.addHandler(new MyLogHandler());
// To ignore Info and Warning type messages


Anonymizer provides a practical entry point, but it is not required in order to run tasks.

Running tasks with or without Anonymizer
// Instantiating Anonymizer will give access to IAnonymizer methods such as getTasks().
// The ContextFactory object gives factory methods based on the connection.
try {
    IAnonymizer main = new Anonymizer();
    IContext context = ContextFactory.createAnonymizeContext(Connect.createDefaultConnection());
    main.getTaskRoot().run(context); // Runs all tasks
    main.runTasks(context, Arrays.asList("Task1,Task2".split(","))); // To run specific tasks in a list
} catch (Throwable e) {
    e.printStackTrace();
}

// Having your own connection, you may instead access tasks directly, without the Anonymizer class.
// The factory methods in AbstractContext can be used if you have a connection.
IContext context = ContextFactory.createAnonymizeContext(connection);
new TaskRoot().run(context);

// Running an Erase task
String[] params = new String[]{"1000234"};
IContext context = ContextFactory.createEraseContext(connection, params);
new MyEraseTask().run(context);

Running SAR exports

Although it might be sufficient to provide the XML for a Subject Access Request, it is more appropriate to export the data and run it through a report generator to produce PDF or HTML output. A SAR response would most likely have to contain many other elements in addition to the information contained in the database.

SAR tasks are more appropriate to run via the JAR APIs.

Running SAR export
// Running SAR task with Json output
// Set parameter value for the task
String[] params = new String[]{"1"};
try (
    FileOutputStream fw = new FileOutputStream("MySar.json");
    ISarWriter sw = new SimpleJsonSarWriter(fw);
    Connection connection = Connect.createDefaultConnection();
) {
    IContext context = ContextFactory.createSarContext(connection, params, sw);
    new MySarTask().run(context); // Runs the task named "MySarTask"
} catch (Throwable e) {
    e.printStackTrace();
}

A SarWriter must implement the ISarWriter interface, but it may be easier to extend the AbstractSarWriter, which handles the hierarchy of data.

There are four generic writers provided for XML and JSON and these may be convenient to subclass for your own needs.

public class SimpleXmlSarWriter extends AbstractSarWriter {
    public SimpleXmlSarWriter() {
        super(" ");
    }
    public String writeColumn(String column, String label, String comment, String value) {
        return " "+id(column)+"='"+escape(value)+"'\n";
    }
    public String writeTable(String table, String label, String comment, String columns, String children) {
        if (children == null || children.isEmpty()) // no children -> self-closing element (condition reconstructed)
            return "<"+id(table)+"\n"+columns+indent+"/>\n";
        return "<"+id(table)+"\n"+columns+" >\n"+children+"</"+id(table)+">\n";
    }
    public String writeRoot(String inner) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"+inner;
    }
    public String escape(char c) {
        return escapeXml(c);
    }
}

Running AnonymizerHotel tasks with embedded jar

The class code sample below may be run with the generated anonymizerhotel-0.0.1.jar file.

Sample using API
package example.anonymizer.apisample;
import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.SQLException;
import example.anonymizer.Connect;
import example.anonymizer.anonymize.Anonymize_CUSTOMER;
import example.anonymizer.erase.Erase_CUSTOMER;
import example.anonymizer.sar.SAR_CUSTOMER;
import no.esito.anonymizer.ContextFactory;
import no.esito.anonymizer.IContext;
import no.esito.anonymizer.Log;
import no.esito.anonymizer.sarwriter.JsonSarWriter;
public class UsingAPI {

    public static void main(String[] args) {
        try {
            // Set console-loghandler instead of Java's logger
            // Connect to the default database specified in
            Connection conn = Connect.createDefaultConnection();
            // Run the samples
            runSampleAnonymizeTask(conn);
            runSampleEraseTask(conn);
            runSampleSarTask(conn);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }

    private static void runSampleAnonymizeTask(Connection conn) throws Throwable {
        // Create a run-time context
        IContext context = ContextFactory.createAnonymizeContext(conn);
        context.setRepeatableRandom(true); // If you want same results consistently
        // Run the Anonymize_CUSTOMER task
        new Anonymize_CUSTOMER().run(context);
    }

    private static void runSampleEraseTask(Connection conn) throws Throwable {
        // Create a run-time context and supply the parameter, where customerno = %PARAMETER%
        IContext context = ContextFactory.createEraseContext(conn, new String[] {"1000234"});
        // Run the Erase_CUSTOMER task
        new Erase_CUSTOMER().run(context);
    }

    private static void runSampleSarTask(Connection conn) throws Throwable {
        try (
            FileOutputStream out = new FileOutputStream("sar.json");
            JsonSarWriter writer = new JsonSarWriter(out);
        ) {
            // Create a run-time context and supply the parameter, where customerno = %PARAMETER%
            IContext context = ContextFactory.createSarContext(conn, new String[] {"1000235"}, writer);
            // Run the SAR_CUSTOMER task
            new SAR_CUSTOMER().run(context);
        }
    }
}