Validating CSV Files in Rails

Validating CSV imports in Rails

Last Updated on September 3, 2020 by Christopher G Mendla

I needed to validate CSV files that were being imported into a Rails app. The validation was simple: Look for blank entries, ensure columns used for associations (IDs) were integers and check that certain other entries were numeric.

The first thing I checked was to make sure that the header of the CSV file matched what we expected. The simplest way to do this was to set an ‘expected header’ and then check the header on the CSV.

Here is a stripped down example for a hypothetical employee list. In the controller, I would  have :

expected_header = ["EmployeeNo", "First", "Last", "Bonus"]
 csv_error = true if !helpers.check_header(expected_header,csv_file)
We are using Rails 5 so we can use helpers. to call a method in a helper for the controller.
In the helper, we put
def check_header(expected_header,csv_file)
  header = CSV.open(csv_file, 'r') { |csv| csv.first }
  valid_csv = true 
  for i in (0..header.size-1)
    if header[i].downcase != expected_header[i].downcase
    valid_csv = false
  end
end

if !valid_csv
  $csv_error = "Header:
  #{header}
  Expected Header:
  #{expected_header} "
end
return valid_csv
end
A couple of notes on the text below
  • We have already set csv_file in the controller.
  • We are grabbing the first line of the csv which should be the header.
  • We walk through each header.  .downcase is used because there were some differences in case from different sources.
  • we compare the expected header with the header and return valid_csv if it matches
The next lines in the controller are as follows. We are checking to see if we got an error checking the header. If not, then we walk through the data.
if !csv_error
  CSV.foreach(csv_file, :headers => true) do |row|
  csv_error = true if !helpers.check_employee_csv(row)
end
The method called in the helper is something like
def check_lists_csv(row)
  ok = true
  ok = false unless data_is_present(row, 0, "EmployeeID)
  ok = false unless data_is_integer(row, 0, "EmployeeID)
  ok = false unless data_is_present(row, 1, "First")
  ok = false unless data_is_present(row, 2, "Last")
  ok = false unless data_is_present(row, 3, "Bonus)
  ok = false unless_data_is_numeric(row, 3, "Bonus")
  return ok
end
Notes on the code above:
  • row is passed from the controller and contains the line of data from the csv
  • the second parameter is the index of the column. The column numbers begin with 0
  • the third parameter is the name of the column for the error message.
Here are the methods that check for blank, integer and numeric in the helper
 
def data_is_numeric(row, column_number, column_name)
  data_numeric = true
  if row[column_number.to_i].to_s.match(/A[+-]?d+?(.d+)?Z/) == nil
    $csv_error = $csv_error + "Error in row #{$INPUT_LINE_NUMBER}          >> #{row} >>  #{column_name} is not numeric
    "
   data_numeric = true
end
return data_numeric
end
def data_is_integer(row, column_number, column_name)
data_integer = true
  if row[column_number.to_i].to_s.match(/^(0|[1-9][0-9]*)$/) ==   nil
  $csv_error = $csv_error + "Error in row #{$INPUT_LINE_NUMBER} >>   #{row} >>  #{column_name} is not an integer
  "
  data_integer = true
end
return data_integer
end
def data_is_present(row, column_number, column_name)
  data_present = true
  if row[column_number.to_i].blank?
    $csv_error = $csv_error + "Error in row #{$INPUT_LINE_NUMBER} >> #{row} >>  #{column_name} is blank
   "
  data_present = false
end
return data_present
end

If the csv validations pass, then we process the csv

if !csv_error
  CSV.foreach(csv_file, :headers => true) do |row|
     update_focus(row)
  end
  file = "Employee"
else
  redirect_to file_error_path and return
  $csv_error = $csv_error + "Your CSV was NOT uploaded  Please fix the error(s) and retry"
end
NOTE – I know that there are problems with global variables but they work for the app.
 
The file error view is simply

There was an error in the file upload
The file you were attempting to upload was not replaced

<%=  $csv_error.html_safe %>

This works and requires about 3 seconds to process about 40k of records.

My first attempt was taking about 8 minutes to process 40k of records. The problem was that I was trying to address variations in case with something like

ok = false unless data_is_present(row, "EmployeeID") || data_is_present(row, "Employeeid")

We had CSVs with both versions of the header. For some reason, that was taking a lot of time to process. By using the index number of the header, it avoided whatever was causing the slow processing.

Note – This code is from a couple of years back and there are deviations from best practices such as redundantly returning the value for a method.

Leave a Reply

Your email address will not be published. Required fields are marked *