December 6

1 comments

Validating CSV imports in Rails

By Christopher Mendla

December 6, 2017


Last Updated on September 9, 2023 by Christopher G Mendla

I needed to validate CSV files that were being imported into a Rails app. The validation was simple: Look for blank entries, ensure columns used for associations (IDs) were integers and check that certain other entries were numeric.

The first thing I checked was to make sure that the header of the CSV file matched what we expected. The simplest way to do this was to set an ‘expected header’ and then check the header on the CSV.

An example

Here is a stripped down example for a hypothetical employee list. In the controller, I would  have :

expected_header = ["EmployeeNo", "First", "Last", "Bonus"]
 csv_error = true if !helpers.check_header(expected_header,csv_file)

A helper

We are using Rails 5 so we can use helpers. to call a method in a helper for the controller.
In the helper, we put
def check_header(expected_header,csv_file)
  header = CSV.open(csv_file, 'r') { |csv| csv.first }
  valid_csv = true 
  for i in (0..header.size-1)
    if header[i].downcase != expected_header[i].downcase
    valid_csv = false
  end
end

if !valid_csv
  $csv_error = "Header:
  #{header}
  Expected Header:
  #{expected_header} "
end
return valid_csv
end

Notes

A couple of notes on the text below
  • We have already set csv_file in the controller.
  • We are grabbing the first line of the csv which should be the header.
  • We walk through each header.  .downcase is used because there were some differences in case from different sources.
  • we compare the expected header with the header and return valid_csv if it matches
The next lines in the controller are as follows. We are checking to see if we got an error checking the header. If not, then we walk through the data.
if !csv_error
  CSV.foreach(csv_file, :headers => true) do |row|
  csv_error = true if !helpers.check_employee_csv(row)
end
The method called in the helper is something like
def check_lists_csv(row)
  ok = true
  ok = false unless data_is_present(row, 0, "EmployeeID)
  ok = false unless data_is_integer(row, 0, "EmployeeID)
  ok = false unless data_is_present(row, 1, "First")
  ok = false unless data_is_present(row, 2, "Last")
  ok = false unless data_is_present(row, 3, "Bonus)
  ok = false unless_data_is_numeric(row, 3, "Bonus")
  return ok
end
Notes on the code above:
  • row is passed from the controller and contains the line of data from the csv
  • the second parameter is the index of the column. The column numbers begin with 0
  • the third parameter is the name of the column for the error message.
Here are the methods that check for blank, integer and numeric in the helper
 
def data_is_numeric(row, column_number, column_name)
  data_numeric = true
  if row[column_number.to_i].to_s.match(/A[+-]?d+?(.d+)?Z/) == nil
    $csv_error = $csv_error + "Error in row #{$INPUT_LINE_NUMBER}          >> #{row} >>  #{column_name} is not numeric
    "
   data_numeric = true
end
return data_numeric
end
def data_is_integer(row, column_number, column_name)
data_integer = true
  if row[column_number.to_i].to_s.match(/^(0|[1-9][0-9]*)$/) ==   nil
  $csv_error = $csv_error + "Error in row #{$INPUT_LINE_NUMBER} >>   #{row} >>  #{column_name} is not an integer
  "
  data_integer = true
end
return data_integer
end
def data_is_present(row, column_number, column_name)
  data_present = true
  if row[column_number.to_i].blank?
    $csv_error = $csv_error + "Error in row #{$INPUT_LINE_NUMBER} >> #{row} >>  #{column_name} is blank
   "
  data_present = false
end
return data_present
end

If the csv validations pass, then we process the csv

if !csv_error
  CSV.foreach(csv_file, :headers => true) do |row|
     update_focus(row)
  end
  file = "Employee"
else
  redirect_to file_error_path and return
  $csv_error = $csv_error + "Your CSV was NOT uploaded  Please fix the error(s) and retry"
end
NOTE – I know that there are problems with global variables but they work for the app.
 
The file error view is simply

There was an error in the file upload
The file you were attempting to upload was not replaced

<%=  $csv_error.html_safe %>

Results

This works and requires about 3 seconds to process about 40k of records.

My first attempt was taking about 8 minutes to process 40k of records. The problem was that I was trying to address variations in case with something like

ok = false unless data_is_present(row, "EmployeeID") || data_is_present(row, "Employeeid")

We had CSVs with both versions of the header. For some reason, that was taking a lot of time to process. By using the index number of the header, it avoided whatever was causing the slow processing.

Note – This code is from a couple of years back and there are deviations from best practices such as redundantly returning the value for a method.

Christopher Mendla

About the author

Leave a Reply

Your email address will not be published. Required fields are marked

  1. Olá amigo, estou tentando validar um CSV, e acabei achando seu post e gostaria de saber como posso migrar esse seu código para Ruby puro. É um desafio que tenho que realizar mas como sou novo no Ruby e também na área de de programação principalmente back-end, estou perdido. Se houver a possibilidade me socorrer ficarei muito grato.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}