Friday, April 13, 2012

Creating a delimited text file (test fixture) for Cucumber tests using Ruby.

This post is intended for anyone who needs to create a delimited text file in Ruby.  I go into a lot of specifics from my real-world example, but you can skip the Cucumber parts if they don't interest you.  I will briefly discuss other delimiters such as tabs and commas (for CSV files).

A test fixture in software automation is data you have to have in place before you can run your test.  The Cucumber Book suggests using FactoryGirl, and I may very well move this code into that structure in the future when I get around to understanding it.  In the short term, I needed to generate a text file called a Ledes file.  I needed to create these files with data filled in from my Cucumber tests.  After I create the file, I upload it and manipulate it using the web pages on our system.

A Ledes file is used to transfer information between systems in the Legal world.  The standard was developed in 1998, and uses pipe | delimited text files with closed bracket line terminators [].  It's very simple.  A Ledes file is just headers and a bunch of line items.  Below is the file my code created for this example.

Ledes File


LEDES1998B[]
INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID|LAW_FIRM_MATTER_ID|INVOICE_TOTAL|BILLING_START_DATE|BILLING_END_DATE|INVOICE_DESCRIPTION|LINE_ITEM_NUMBER|EXP/FEE/INV_ADJ_TYPE|LINE_ITEM_NUMBER_OF_UNITS|LINE_ITEM_ADJUSTMENT_AMOUNT|LINE_ITEM_TOTAL|LINE_ITEM_DATE|LINE_ITEM_TASK_CODE|LINE_ITEM_EXPENSE_CODE|LINE_ITEM_ACTIVITY_CODE|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION|LAW_FIRM_ID|LINE_ITEM_UNIT_COST|TIMEKEEPER_NAME|TIMEKEEPER_CLASSIFICATION|CLIENT_MATTER_ID[]
20120412|1334244177||||20120412|20120412|Automated Test Invoice||E|1|||20120412||E115|||Automated Test Line Item|12-3456789|33.42|||ABC.1D2345E[]
20120412|1334244177||||20120412|20120412|Automated Test Invoice||E|1|||20120412||E101|||Automated Test Line Item|12-3456789|127.94|||ABC.1D2345E[]
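
To make the layout concrete, here is a minimal sketch of reading a file like this back into Ruby hashes.  The method name parse_ledes is hypothetical (it is not part of the code in this post), and the sketch assumes that no field value contains a pipe or the [] terminator.

```ruby
# Hypothetical helper: turn LEDES 1998B text into an array of hashes.
# Assumes no field value contains "|" or "[]".
def parse_ledes(text)
  # Every record ends with "[]"; drop the empty remainder after the last one.
  records = text.split("[]").map(&:strip).reject(&:empty?)
  format_line, header_line, *item_lines = records
  raise ArgumentError, "not a LEDES1998B file" unless format_line == "LEDES1998B"

  headers = header_line.split("|")
  item_lines.map do |item|
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    headers.zip(item.split("|", -1)).to_h
  end
end

# Shortened sample with three columns instead of the full header list.
sample = "LEDES1998B[]\nINVOICE_DATE|INVOICE_NUMBER|CLIENT_ID[]\n20120412|1334244177|[]\n"
items = parse_ledes(sample)
puts items.first['INVOICE_NUMBER']
```

Blank values come back as empty strings, which is why the pipes still matter even when a column is unused.  The same structure is visible by eye in the generated file shown earlier.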


The first line is just a header for the file.  The second line lists all the headers for each line item.  The third and fourth lines list line items.  You will notice that I left many of the values blank.  This is because I did not need them for my testing.  It was still important to have the pipes there however.  As the Cucumber book suggests, we're going to develop outside in.  So here is my test:

features/create_invoice.feature


  Scenario: Upload Ledes File
    Given that I am logged in as a Firm User
    When I create an invoice with the following values:
      | LAW_FIRM_ID         | 12-3456789  |
      | CLIENT_MATTER_ID[]  | ABC.1D2345E |
    And I upload the invoice with the following line items:
      | EXP/FEE/INV_ADJ_TYPE  | LINE_ITEM_UNIT_COST | LINE_ITEM_EXPENSE_CODE  |
      | E                     |               33.42 | E115                    |
      | E                     |              127.94 | E101                    |
    Then the invoice should be created


As you can see, it's a very simple test.  First I make sure I am logged in as the correct kind of user.  The When/And steps are actually both needed to create an invoice.  The first table stores information that doesn't change between rows.  The second table holds the information that is different for each line item.

You will notice that in the first table, CLIENT_MATTER_ID is followed by brackets [].  This is because I'm directly passing the values through to my object that will create the file.  This means that if I make a typo in the first column of that table, I will get errors in my file.  Similarly, if I have typos in the first row of my second table, I will have errors in my file.  This was a conscious decision on my part when creating a Ubiquitous Language.  (Here is a more in-depth discussion on Ubiquitous Language and Agile Development.)

Everyone in my company is used to looking at and editing Ledes files.  When I use the exact headers in the steps, it is familiar to them.  They are used to seeing the line terminator with CLIENT_MATTER_ID, so I kept it in.  You may have noticed that the ID itself is also followed by the same line terminator in the file.  I left it out of my table, however, because everyone knows that it is not part of the ID, and it would be confusing if it were there and annoying to update the value in tests.

So now let's take a look at the step definitions.  I will skip the login code, as it is not germane to this discussion.  Let's look at the implementation of my When step:

features/step_definitions/create_invoice.rb - When Step


When /^I create an invoice with the following values:$/ do |table|
  @file_values = table.rows_hash
  @invoice_number = @file_values['INVOICE_NUMBER']
end


All this step does is pass information to the next step.  It's a bit of a kluge.  I'm creating two instance variables, and storing the data from the table in them.  First I'm turning my table into a hash using rows_hash, which gives me a key/value pair for each row.  This method only works if you have exactly two columns in your table.
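
A minimal sketch of the idea behind rows_hash, using plain arrays in place of a real Cucumber table.  The method rows_to_hash is a hypothetical stand-in, not Cucumber's implementation.

```ruby
# Hypothetical stand-in for rows_hash: the first column becomes the keys,
# the second column becomes the values.
def rows_to_hash(rows)
  unless rows.all? { |row| row.length == 2 }
    raise ArgumentError, "expected exactly two columns per row"
  end
  rows.to_h
end

file_values = rows_to_hash([
  ['LAW_FIRM_ID',        '12-3456789'],
  ['CLIENT_MATTER_ID[]', 'ABC.1D2345E']
])

# A key that was never in the table comes back as nil.
puts file_values['INVOICE_NUMBER'].nil?
```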

Then I'm assigning the @invoice_number instance variable with whatever value is stored with the INVOICE_NUMBER key.  If I don't pass that value (which I didn't in this case), then @invoice_number will have a value of nil.  Why am I doing this?  Let's look at the definition of the And step:

features/step_definitions/create_invoice.rb - And Step


And /^I upload the invoice with the following line items:$/ do |table|
  # Fall back to a timestamp-based invoice number if the scenario gave none.
  if @invoice_number.nil?
    @invoice_number = Time.now.to_i.to_s
    @file_values['INVOICE_NUMBER'] = @invoice_number
  end
  filename = Dir.pwd.to_s + "/features/test_data/Automated Ledes File #{@invoice_number}.ledes"

  values = table.hashes

  my_file = LedesFile.new(filename)
  values.each do |line|
    # Add the invoice-level values to every line item.
    line.merge!(@file_values)
    temp_line = LedesLineItem.new(line)
    my_file.write(temp_line)
  end
end


Meaty goodness.  The first thing I'm doing is checking to see if @invoice_number got a value in the last step.  If not, I'm going to make one up using the current timestamp.  Time.now.to_i gives me the time as a whole number of seconds, and to_s turns that number into a string of digits.  If I don't turn it into a string, I will have conflicts later on.  Then I'm going to pass my new invoice number into the @file_values hash, because I need it to be there too.
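
The fallback can be sketched on its own as a small method.  The name invoice_number_or_default and the optional now parameter are hypothetical, added here only so the behaviour is easy to try out.

```ruby
# Hypothetical helper: use the given invoice number, or derive one from a
# timestamp (whole seconds since the Unix epoch) as a string of digits.
def invoice_number_or_default(given, now = Time.now)
  given.nil? ? now.to_i.to_s : given
end

puts invoice_number_or_default(nil)      # digits based on the current time
puts invoice_number_or_default("12345")  # a supplied value is kept as-is
```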

Then I'm going to create a file name.  Actually, it's a file path and file name.  Using Dir.pwd.to_s, I'm going to get my current directory and start my path off with that.  Whenever I run Cucumber, my directory changes to wherever the features/ folder resides.  I'm then creating the filename with the invoice number.  As you can see, if I pass in an invoice number from the scenario and a file for that number already exists, the file will get overwritten.  This is acceptable because you can only have one invoice with a given invoice number in the system at a time.
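
As an alternative to string concatenation, the same path could be assembled with File.join.  This is only a sketch; the method name ledes_path and its base_dir parameter are hypothetical.

```ruby
# Hypothetical helper: build the fixture path from a base directory.
# File.join inserts the separators between the parts.
def ledes_path(invoice_number, base_dir = Dir.pwd)
  File.join(base_dir, "features", "test_data",
            "Automated Ledes File #{invoice_number}.ledes")
end

puts ledes_path(Time.now.to_i.to_s)
```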

After I set filename, I'm going to create an array called values using the hashes method.  This turns the table into an array of hashes, one per row, with the first row supplying the keys for all the subsequent rows.  Then I create a new LedesFile object.  I use a block to iterate through the hashes in values (creative n'est-ce pas?).  With each line, I merge in the values from @file_values, because they need to be in each line.  I'm using the merge!() method, which is called a bang method.  The bang means it modifies line in place rather than returning a new hash.  Where both hashes have the same key, the value from @file_values overwrites the value in line.
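
Here is a small sketch of both ideas using plain Ruby data instead of a Cucumber table.  The method table_to_hashes is a hypothetical stand-in for what hashes returns, not Cucumber's own code.

```ruby
# Hypothetical stand-in for table.hashes: the first row supplies the keys,
# and every later row becomes one hash.
def table_to_hashes(raw)
  headers, *rows = raw
  rows.map { |row| headers.zip(row).to_h }
end

raw = [
  ['EXP/FEE/INV_ADJ_TYPE', 'LINE_ITEM_UNIT_COST', 'LINE_ITEM_EXPENSE_CODE'],
  ['E', '33.42',  'E115'],
  ['E', '127.94', 'E101']
]
file_values = { 'LAW_FIRM_ID' => '12-3456789' }

lines = table_to_hashes(raw)
copy = lines.first.merge(file_values)  # new hash; lines.first is unchanged
lines.first.merge!(file_values)        # lines.first itself now has LAW_FIRM_ID
puts lines.first['LAW_FIRM_ID']
```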

Then we create a temp_line LedesLineItem object to hold our line item, and write it to the file.

So let's dig a little deeper and see how LedesLineItem and LedesFile work.  First, let us take a look at features/support/ledes_line_item.rb:

features/support/ledes_line_item.rb


require 'date'

class LedesLineItem

  attr_accessor :line

  def initialize(values)
    units = 1
    today = Date.today.strftime("%Y%m%d")
    defaults = {
        'INVOICE_DATE' => today,
        'BILLING_START_DATE' => today,
        'BILLING_END_DATE' => today,
        'INVOICE_DESCRIPTION' => "Automated Test Invoice",
        'LINE_ITEM_NUMBER_OF_UNITS' => units,
        'LINE_ITEM_DATE' => today,
        'LINE_ITEM_DESCRIPTION' => "Automated Test Line Item",
    }

    # Keys that are never set read as "" so unused columns stay blank.
    @line = Hash.new( "" )
    @line.merge!(defaults)
    # Values passed in from the step override the defaults.
    @line.merge!(values)

  end

end

The member line is a hash that can be accessed externally.  I am simply populating it with some default values, and then anything passed through from our step overwrites those defaults via merge!.  So for example if I want to change the INVOICE_DATE, I could just pass that in from my Cucumber step.
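
The defaults-then-override pattern can be sketched in isolation.  The method build_line is hypothetical, and the empty-string default for unknown keys is an assumption made for this sketch so that unused columns read as blank.

```ruby
# Hypothetical helper showing defaults followed by overrides.
def build_line(defaults, overrides)
  line = Hash.new("")     # unknown keys read as "" rather than nil
  line.merge!(defaults)   # start from the defaults
  line.merge!(overrides)  # anything passed in replaces a default
  line
end

line = build_line(
  { 'INVOICE_DATE' => '20120412', 'INVOICE_DESCRIPTION' => 'Automated Test Invoice' },
  { 'INVOICE_DATE' => '20120101' }
)
puts line['INVOICE_DATE']
```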

And here's the LedesFile object that puts the lines together:

features/support/ledes_file.rb


require_relative "ledes_line_item.rb"

class LedesFile

  attr_accessor :filename

  def initialize(filename)
    @filename = filename
    File.open(@filename, 'w') do |ledes_file|
      # File header
      ledes_file.puts "LEDES1998B[]"
      # Column headers.  I'm told these headers do not change, so I'm hard-coding them here.  I'm also told that the
      # order doesn't matter.  So I put them in and then read them out to make sure the lines get put in the right
      # order.  This means I don't care how the hash is created or passed in.
      ledes_file.puts "INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID|LAW_FIRM_MATTER_ID|INVOICE_TOTAL|BILLING_START_DATE|BILLING_END_DATE|INVOICE_DESCRIPTION|LINE_ITEM_NUMBER|EXP/FEE/INV_ADJ_TYPE|LINE_ITEM_NUMBER_OF_UNITS|LINE_ITEM_ADJUSTMENT_AMOUNT|LINE_ITEM_TOTAL|LINE_ITEM_DATE|LINE_ITEM_TASK_CODE|LINE_ITEM_EXPENSE_CODE|LINE_ITEM_ACTIVITY_CODE|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION|LAW_FIRM_ID|LINE_ITEM_UNIT_COST|TIMEKEEPER_NAME|TIMEKEEPER_CLASSIFICATION|CLIENT_MATTER_ID[]"
    end
  end

  def write(line)
    File.open(@filename, 'a+') do |ledes_file|
      file = ledes_file.readlines # Read the headers in so we can place values in the appropriate order.
      headers = file[1].split("|")
      headers.each_with_index { |key, i| headers[i] = key.strip }
      headers.each do |key|
        ledes_file.print line.line[key]
        if key == 'CLIENT_MATTER_ID[]'
          ledes_file.print "[]\n"
        else
          ledes_file.print "|"
        end
      end
    end
  end

end

When a LedesFile object is created, a file is actually created on the system.  It is populated with a file header and then the column headers.  When a line is written, I open the file back up in append mode, read out the headers, and then use them to make sure that each value from the passed-in line goes into the correct column.  Each value is followed by a pipe, except the last one: when I reach CLIENT_MATTER_ID[], I know we are at the end of the line and I write the brackets and a newline instead.

I never explicitly close the file, as that is handled when the block exits.  If I were worried about performance, I would make sure I could handle an array of lines so I only have to open the file for writing once.  Also, note the require_relative line.  It loads ledes_line_item.rb from the same directory as ledes_file.rb, so the LedesLineItem class can be found whenever it is referenced.
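
A sketch of that single-open variation follows.  The method write_ledes is hypothetical, and unlike the class above it assumes the headers are passed in as an array of plain names (without the trailing []) and each line is a plain hash.

```ruby
# Hypothetical helper: write the header rows and every line item with a
# single File.open, instead of reopening the file for each line.
def write_ledes(filename, headers, lines)
  File.open(filename, 'w') do |ledes_file|
    ledes_file.puts "LEDES1998B[]"
    ledes_file.puts headers.join("|") + "[]"
    lines.each do |line|
      # Missing keys give nil, which join renders as an empty field.
      ledes_file.puts headers.map { |key| line[key] }.join("|") + "[]"
    end
  end
end
```

It would be called once per invoice, for example write_ledes(filename, headers, values), after the invoice-level values have been merged into each line.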

CSV and Other Delimiters

If I wanted to use a different delimiter, I would edit the column-header string written in initialize so that it used that delimiter.  Then in write, I would split the header line on that character instead of the pipe (for tabs use "\t", in double quotes).  The comparison against 'CLIENT_MATTER_ID[]' looks for my last header, so you would need to change it to look for a different value, and if you just need a new line at the end of each record, remove the brackets from the "[]\n" printed just after it.  Finally, in the else branch, you would change the pipe to your delimiter.  You could of course put the delimiter in a variable, but I leave that as an exercise for the reader.
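
One way that exercise might go is sketched below.  The method delimited_line and its keyword arguments are hypothetical, and the sketch does no quoting, so it assumes that no value contains the delimiter.

```ruby
# Hypothetical helper: build one record with a configurable delimiter and
# terminator.  No quoting or escaping is done.
def delimited_line(values, headers, delimiter: "|", terminator: "[]")
  headers.map { |key| values[key] }.join(delimiter) + terminator + "\n"
end

headers = ['INVOICE_DATE', 'INVOICE_NUMBER']
values  = { 'INVOICE_DATE' => '20120412', 'INVOICE_NUMBER' => '1334244177' }

print delimited_line(values, headers)                                   # pipe-delimited, [] terminator
print delimited_line(values, headers, delimiter: "\t", terminator: "")  # tab-delimited
print delimited_line(values, headers, delimiter: ",", terminator: "")   # comma-delimited
```

Note that "\t" has to be in double quotes; in single quotes Ruby treats it as a backslash followed by a t.  For real CSV data, where values may contain commas or quotes, Ruby's CSV library handles the quoting.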

Conclusion

In the end, there were a number of things I needed to learn, such as how to write to files, and the difference between single- and double-quoted strings in Ruby.  Once I got those ironed out, this was a very easy process.  I write this for anyone coming after me who finds it more of a struggle.
