28 February 2013

Data Factory

When I was writing PLSQL unit tests with Ruby, I came across a need to stage test data into tables. I didn't really want to involve an ORM like Active Record, but working with RAW insert statements quickly became a chore.

One reason is that some of my tables had a lot of columns - many of the columns were not nullable, but for a given test it didn't really matter what values were in most of the columns. For instance if a test needed to prove records with a status of 'pending' were picked up by a stored procedure, I just needed to ensure I had a few records with a status of 'pending', and all the other columns could be set to anything.

After wasting too much time with long insert statements and missing not null fields, I created the Data Factory gem to remove the need to write insert statements at all.

What is Data Factory

DataFactory is a simple Ruby gem that generates data random test data and inserts it into database tables.

DataFactory reads the table definition from the database, and generates random values for all not null columns. It inserts this data into the table, while providing the option of specifying non-random defaults to meet integrity constraints etc.

Usage

DataFactory is a simple gem, so a few examples explore a lot of the functionality. Note that these examples use Simple Oracle JDBC as the database access layer.

For a more complete manual, have a look at the documentation on Rubygems.

For these examples to run, create a table on the database as follows:

create table employees (emp_id     integer,
                        dept_id    integer,
                        first_name varchar2(50),
                        last_name  varchar2(50),
                        email      varchar2(50),
                        ssn        varchar2(10) not null);

Define a DataFactory Class

To use DataFactory, create a class for each table you want to interface with, and make it a sub-class of DataFactory::Base:

class Employee < DataFactory::Base

  set_table_name "employees"

  set_column_default :last_name, "Smith"
  set_column_default :email,   begin    
                                 "#{rand(10000)}@#{rand(10000)}.com"
                               end
end

In the class definition, use the settablename method to map the class to a particular table on the database.

Optionally, you can specify default values for columns in the table with the setcolumndefault method, which takes the table name followed by a value for the column, or a block that generates the value each time it is called, as with the email example.

Creating a Row

The first requirement is to connect to the database, and hand an instance of the database interface to DataFactory:

interface = SimpleOracleJDBC::Interface.create('sodonnel',
                                               'sodonnel',
                                               'local11gr2.world',
                                               'localhost',
                                               '1521')

DataFactory::Base.set_database_interface(interface)

Then a row can be created using the create! method, for example:

f = Employee.create!("emp_id" => 1001)

The create! call will take the column defaults defined in the Employee class, and merge in any column values passed into the create! method. Then it will generate a value for any other non-nullable columns in the table, and insert the row into the database.

An Employee instance is returned, containing all the generated values.

There is also a create method that works just like create! but does not issue a commit.

Finally there is a build method that creates an instance of the class with default and generated values, but does not insert it into the database at all.

Accessing The Column Values

When an instance of a DataFactory class is created, you can access the generated values for the columns with the column_values method, which returns a hash. The keys of the hash are the uppercase column names and the values contain the generated data:

f.column_values.keys.each do |k|
  puts "#{k} :: #{f.column_values[k]}"
end

# EMP_ID :: 1001
# DEPT_ID ::
# FIRST_NAME ::
# LAST_NAME :: Smith
# EMAIL :: 4506@5941.com
# SSN :: Gb3

Notice how columns that are nullable, have not got a default value and were not passed a value are generated with null values.

blog comments powered by Disqus