Quick tour through the descriptor file format
<setup>
The Benerator configuration file (the descriptor file) is XML-based, and an XML schema is provided for it. The document root is a setup element:
<?xml version="1.0" encoding="utf-8"?>
<setup xmlns="https://www.benerator.de/schema/3.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.benerator.de/schema/3.0.0 https://www.benerator.de/schema/rapiddweller-benerator-ce-3.0.0.xsd">
<!-- content here -->
</setup>
Benerator descriptor files are supposed to be named benerator.xml or end with the suffix .ben.xml.
benerator properties
Several global benerator properties allow for customization of its behavior:
name | description | default setting |
---|---|---|
defaultEncoding | the default file encoding to use for reading and writing text files | the system's file encoding |
defaultLineSeparator | the line separator to use by default | the system's line separator |
defaultTimeZone | The time zone to use | The system's time zone |
defaultLocale | The locale to use if none has been specified explicitly | The system's language code, e.g. 'de' |
defaultDataset | The dataset to use if none has been specified explicitly | The system's country's two-letter ISO code, e.g. 'US' |
defaultPageSize | the number of entities to create in one 'run', typically a transaction | 1 |
defaultScript | The default script engine to use for evaluating script expressions | ben (rapiddwellerScript) |
defaultNull | tells whether nullable attributes should be generated as null by default | true |
defaultSeparator | the default column separator to use for csv files | , |
defaultErrorHandler | the default error handling mechanism to use | fatal |
validate | Boolean flag to turn off validation (e.g. of XML validity and type definition consistency). | true |
maxCount | limits the maximum cardinality of all entity and association generations. If set to 0, cardinalities will not be limited. | -1 |
defaultOneToOne | When set to true, Benerator assumes each relation is one-to-one. | false |
acceptUnknownSimpleTypes | When set to true, Benerator accepts unknown simple data types from its DescriptorProviders, relying on the user to choose the correct data type when generating. | false |
defaultSourceScripted | When set to true, Benerator resolves script expressions that are contained in imported data files. When set to false, all data is imported 'as is' | false |
You can configure them in the <setup> element, e.g.
<setup xmlns=...
defaultEncoding="utf-8"
defaultPageSize="1000">
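Combined with the namespace declarations shown at the beginning of this chapter, a complete root element that carries some of these properties might look like the sketch below. The values for defaultDataset and defaultLocale are illustrative choices, not recommendations.

```xml
<?xml version="1.0" encoding="utf-8"?>
<setup xmlns="https://www.benerator.de/schema/3.0.0"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="https://www.benerator.de/schema/3.0.0 https://www.benerator.de/schema/rapiddweller-benerator-ce-3.0.0.xsd"
       defaultEncoding="utf-8"
       defaultPageSize="1000"
       defaultDataset="US"
       defaultLocale="en">
    <!-- illustrative values: UTF-8 files, 1000 entities per page, US dataset, English locale -->
    <!-- content here -->
</setup>
```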
<include>
Inclusion of properties files
An alternative way to specify the Benerator properties from the previous chapter is to specify them in a properties file, e.g.
context.defaultEncoding=UTF-8
context.defaultPageSize=1000
and include the properties file in the benerator descriptor file:
<include uri="my.properties"/>
This way you can easily use different settings in different environments (see 'Staging').
Properties file entries that do not begin with 'benerator' are simply put into the generation context and can be used to configure generation behavior.
Sub-Invocation of descriptor files
Besides properties files, Benerator descriptor files can be included too, e.g.
<include uri="subgeneration.ben.xml"/>
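As a sketch, a main descriptor could first include a properties file and then delegate to a sub-descriptor. The file names are the ones used above. The xsi:schemaLocation declaration is left out for brevity.

```xml
<setup xmlns="https://www.benerator.de/schema/3.0.0">
    <!-- load Benerator properties and custom context entries first -->
    <include uri="my.properties"/>
    <!-- then execute another descriptor file as a sub-generation -->
    <include uri="subgeneration.ben.xml"/>
</setup>
```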
Global settings
Benerator supports global settings. They can be evaluated using script expressions, e.g. {user_count}. This way, different types of settings can be evaluated:
- system environment variables
- Java virtual machine parameters
- context variables
A setting is explicitly defined using a setting element:
<setting name="threshold" value="5"/>
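A setting defined this way can then be referenced from script expressions elsewhere in the descriptor. The following sketch assumes that the count attribute of <generate> accepts a script expression in curly braces. The setting name user_count and the 'User' type are made up for illustration.

```xml
<!-- hypothetical setting, referenced below via a script expression -->
<setting name="user_count" value="100"/>
<generate type="User" count="{user_count}" consumer="ConsoleExporter">
    <attribute name="active" type="boolean" constant="true"/>
</generate>
```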
<import>
Benerator has lots of plugin interfaces but is agnostic of most implementors. So you need to explicitly import what you need.
The following packages are imported by default (providing, for example, the ConsoleExporter):
Package | Content |
---|---|
com.rapiddweller.benerator.consumer | General-purpose consumer classes |
com.rapiddweller.benerator.primitive | Generators for primitive data types |
com.rapiddweller.benerator.primitive.datetime | Generators for date, time and timestamp data |
com.rapiddweller.benerator.distribution.sequence | Distributions of 'Sequence' type |
com.rapiddweller.benerator.distribution.function | Distributions of 'Function' type |
com.rapiddweller.benerator.distribution.cumulative | Distributions of type 'CumulativeDistributionFunction' |
com.rapiddweller.benerator.sample | Generator components that use sample sets or seeds |
com.rapiddweller.model.consumer | ConsoleExporter and LoggingConsumer |
com.rapiddweller.common.converter | Converter components from rd-lib-common |
com.rapiddweller.common.format | Format components from rd-lib-common |
com.rapiddweller.common.validator | Validator components from rd-lib-common |
com.rapiddweller.platform.fixedwidth | Fixed column width file importer and exporter |
com.rapiddweller.platform.csv | CSV file importer and exporter |
com.rapiddweller.platform.dbunit | DbUnit file importer and exporter |
com.rapiddweller.platform.xls | Excel(TM) Sheet importer and exporter |
Benerator extensions can be bundled as domains (logical extensions) or platforms (technical extensions). You can import several bundles at once as comma-separated lists:
<import domains="address, net"/>
<import domains="organization"/>
<import platforms="csv, db"/>
Imports must be the first elements used in a descriptor file.
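To illustrate the ordering rule, here is a sketch of a descriptor whose <import> elements precede everything else. The choice of bundles is arbitrary and only serves as an example.

```xml
<setup xmlns="https://www.benerator.de/schema/3.0.0">
    <!-- imports come first -->
    <import domains="address, net"/>
    <import platforms="csv"/>
    <!-- settings and generation statements follow the imports -->
    <setting name="threshold" value="5"/>
    <generate type="Person" count="10" consumer="ConsoleExporter"/>
</setup>
```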
When using a Benerator plugin or another library, you need to make sure that Benerator finds its binary. There are three alternatives:
- Put the associated jar file(s) into the lib folder of your Benerator installation. This way they are available to all data generation projects on your machine. If you work in a team where everyone is familiar with Benerator and the toolset is not based on Maven, this is generally the preferred approach.
- Create a subfolder named lib under the data generation project folder and put the jar file(s) there. When distributing the project to be executed on machines with plain Benerator installations, distribute the full folder content, including the lib subfolder.
- When using Maven to run Benerator, simply declare the necessary Maven dependencies, and Maven will acquire all needed libraries dynamically. Read more about this in 'Maven Benerator Plugin'.
<generate>
<generate> elements are used to generate data from scratch. There are lots of configuration options. The minimal configuration specifies the type of data to be generated. For now, all generated data are 'entities' (composite data).
<generate type="Person" count="10" consumer="ConsoleExporter"/>
This will make Benerator generate 10 'Person' entities and send them to a ConsoleExporter that prints the persons to the console. But what is a Person? Benerator figures this out by itself if it knows, e.g., a database with a 'PERSON' table, an XML schema with a 'Person' element, or any other 'DescriptorProvider'. Benerator will then generate database-valid or XML-Schema-valid data automatically. More about this later.
If you want to generate unlimited amounts of data or let the called components decide how much data to generate, use count="unbounded".
Let us start without DescriptorProviders, manually putting together what we need.
Entities consist of members, e.g. <attribute>s, <id>s or <reference>s. I will concentrate on attributes in the following sections and explain ids and references later.
"constant"
The simplest way to define data generation is using the same value for all generated data:
<generate type="Person" count="10" consumer="ConsoleExporter">
<attribute name="active" type="boolean" constant="true"/>
</generate>
This defines that all Person entities are generated with an 'active' attribute of type 'boolean' that is set to 'true'.
"values"
Attributes may be set randomly from a list of comma-separated values:
<generate type="Person" count="10" consumer="ConsoleExporter">
<attribute name="firstName" type="string" values="'Alice','Bob','Charly'"/>
<attribute name="rank" type="int" values="1,2,3"/>
</generate>
This defines that Person entities have a 'firstName' attribute that is 'Alice', 'Bob' or 'Charly' and a rank of 1, 2 or 3. Note that string literals must be 'quoted', while number or Boolean literals need not be.
"pattern": Generation by Regular Expression
String attribute generation can be configured using the "pattern" attribute with a regular expression, for example:
<generate type="Person" count="10" consumer="ConsoleExporter">
<attribute name="salutation" type="string" pattern="(Mr|Mrs)"/>
<attribute name="postalCode" type="string" pattern="[1-9][0-9]{4}"/>
</generate>
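These techniques can be mixed freely within one <generate> element. The following sketch simply combines the attributes from the previous examples into a single entity definition.

```xml
<generate type="Person" count="10" consumer="ConsoleExporter">
    <!-- same value for every entity -->
    <attribute name="active" type="boolean" constant="true"/>
    <!-- random choice from a literal list; string literals are quoted -->
    <attribute name="firstName" type="string" values="'Alice','Bob','Charly'"/>
    <attribute name="rank" type="int" values="1,2,3"/>
    <!-- values matching a regular expression -->
    <attribute name="salutation" type="string" pattern="(Mr|Mrs)"/>
    <attribute name="postalCode" type="string" pattern="[1-9][0-9]{4}"/>
</generate>
```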
You can find a detailed description of Benerator's regular expression support in Regular Expression Support.
<iterate>
The <iterate> element is used to iterate through pre-existing data, e.g. in a data file or database. The general form is
<iterate type="Person" source="persons.csv"/>
which iterates through all Persons defined in a CSV-file called 'persons.csv'.
Note
By default, iteration goes once from the beginning to the end. Consider using the parameter cyclic="true" for iterating repeatedly, and check this manual for applying distributions or filtering the data to iterate through. Learn more in Relational Databases->Determining attribute values.
"offset"
In any kind of data generation or iteration, an offset can be applied to skip the leading entries of a data source, e.g.
<iterate type="Person" source="persons.csv" offset="10"/>
leaves out the first ten entries of the persons.csv file.
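Offset and cyclic iteration can be combined on the same <iterate> element. In the sketch below, count is assumed to cap the total number of processed entries (without such a limit, a cyclic iteration presumably would not end on its own), and the consumer attribute is assumed to work as it does for <generate>. How the offset is applied when the iteration wraps around is not covered here, so check the reference before relying on it.

```xml
<!-- skip the first 10 entries, start over when the file is exhausted,
     and stop after 100 entries (count as a limit is an assumption) -->
<iterate type="Person" source="persons.csv" offset="10" cyclic="true" count="100"
         consumer="ConsoleExporter"/>
```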
<echo>
The meaning of the <echo> element is similar to the echo command in batch files: it simply writes information to the console to inform the user what is happening, e.g.
<echo>Running...</echo>
For Mac OS X users there is a nice extra feature: when using type='speech', Benerator uses Mac OS X's speech facility to speak the text. When executed on other operating systems, the text is only printed to the console:
<echo type="speech">Generation Finished</echo>
Echo statements may contain a script expression to be evaluated. The usual markup like {ftl:...} can be omitted when the lang attribute is used:
<echo lang="ftl">Running on ${context.version}</echo>
<beep/>
The <beep/> element makes Benerator emit a short beep.
<comment>
The <comment> element also prints output, but to a logger instead of the console. Thus you have the option of configuring whether to ignore the output or where to send it.
<comment>Here we reach the critical part...</comment>
Using XML comments <!-- --> instead of comment descriptors would make it harder for you to comment out larger portions of a file for testing and debugging, since XML comments cannot be nested.
<execute type="shell">
The <execute> element serves to execute different kinds of code. One option is the execution of shell commands:
<execute type="shell">start-database.sh</execute>
The program output is printed to the console.
Note
Some Windows shell commands are only available in the command-line interpreter. In order to invoke them, you need to call cmd /C, e.g.
<execute type="shell">cmd /C type myGeneratedFile.csv</execute>
You can use <execute> for invoking scripts too (SQL, rapiddwellerScript, JavaScript, FreeMarker and more), but that will be explained later.
<wait>
The <wait> element makes Benerator wait for a fixed or a random amount of time.
A fixed amount of time is useful, e.g. for waiting until a system is initialized:
<wait duration="20000"/>
The duration is the time in milliseconds.
Random periods of wait time are useful when using Benerator to simulate client activity on a system. For this, you can nest <wait> elements in <generate> elements. More about this later.
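As a sketch of such a nested wait, the example below pauses for a random time while entities are generated. The min and max attributes (in milliseconds) are assumptions about Benerator's random wait support and are not described in this section, so verify them against the schema before use. The 'Order' type and its attribute are hypothetical.

```xml
<generate type="Order" count="100" consumer="ConsoleExporter">
    <attribute name="quantity" type="int" values="1,2,3"/>
    <!-- assumed attributes: random pause between 1 and 3 seconds per entity -->
    <wait min="1000" max="3000"/>
</generate>
```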
<error>
You can make Benerator signal an error with a message and code:
<error code="-3">An error has occurred</error>
Note
If Benerator is not configured to do otherwise, it prints out the error message, cancels execution, finishes the process and returns the exit code to the operating system. If no exit code is specified, Benerator uses -1.
<if>
Evaluates a script expression and executes sub-elements depending on the result.
Either a decision to execute something or not:
<if test="com.rapiddweller.common.SystemInfo.isWindows()">
<echo>Running under Windows</echo>
</if>
or a decision between alternatives:
<if test="com.rapiddweller.common.SystemInfo.isWindows()">
<then>
<execute type="shell">cmd /C type export.csv</execute>
</then>
<else>
<execute type="shell">cat export.csv</execute>
</else>
</if>
A typical application of the <if> element is to check whether a required configuration is defined and, if not, to fall back to a default...:
<if test="!context.contains('stage')">
<echo>No stage defined, falling back to 'dev'</echo>
<setting name="stage" value="dev"/>
</if>
...or to report an error:
<if test="!context.contains('stage')">
<error>No stage has been set</error>
</if>
<while>
The <while> element executes sub-elements as long as a boolean 'test' expression resolves to true:
<generate type="test" count="10" consumer="ConsoleExporter">
<id name="identifier" type="long"/>
<while test="this.identifier == 5">
<echo>add 10 to this.identifier</echo>
<execute type="ben">this.identifier = this.identifier + 10</execute>
<wait duration="1000"/>
</while>
</generate>
<id> - Generating unique identifiers
For marking an entity member as an identifier, it is declared with an <id> element, e.g.
<id name="identifier" type="long"/>
There are several special id generators available. If you do not specify one explicitly, Benerator takes the IncrementalIdGenerator.
For explicitly choosing or initializing an id generator, use the generator attribute, e.g.:
<id name="identifier" type="long" generator="new IncrementalIdGenerator(100)"/>
for using an IncrementalIdGenerator that starts with the value 100.
See Common ID Generators for a complete ID generator reference and Using Relational Databases for database-related id generators.
Instead of using a generator, you can also use other <attribute>-like features, e.g. scripts:
<id name="id" type="long" script="parent.id"/>
Naming Conventions
For automatic support of special file content, the following naming conventions apply:
File Name | File Type |
---|---|
*.ben.xml | benerator descriptor file |
*.dbunit.xml | DbUnit data file |
*.csv | CSV file with data of simple type |
*.ent.csv | CSV file with entity data |
*.wgt.csv | CSV file with weighted data of simple type |
*.fcw | Fixed column width files with entity data |
*.set.properties | Dataset nesting definition |