Skipping blank lines in Ruby CSV parsing

Jul 13, 2015
ruby, tech
How to ignore blank lines when parsing a CSV file with Ruby's CSV parser.

I recently had an import job fail because it took too long. When I looked at the file, I saw that only 74 of its 1,044,618 lines were useful (my guess is MS Excel having a little fun with us).

Most of the lines were simply rows of commas:

Row,Of,Headers
some,valid,data
,,
,,
,,
,,
,,

The CSV library has an option named skip_blanks, but the documentation says: "Note that this setting will not skip rows that contain column separators, even if the rows contain no actual data". So that's not actually helpful in this case.
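To see that limitation in practice, here is a minimal sketch that parses an in-memory string rather than a file. The sample data and the variable names (`data`, `rows`) are made up for illustration; only `CSV.parse` and the `headers` and `skip_blanks` options come from the library.

```ruby
require 'csv'

# Hypothetical sample: a header row, one data row, two comma-only rows,
# and one truly empty line between them.
data = "Row,Of,Headers\nsome,valid,data\n,,\n\n,,\n"

rows = CSV.parse(data, headers: true, skip_blanks: true)

# skip_blanks drops the empty line, but the comma-only rows are still
# parsed and come back as rows whose fields are all nil.
rows.each { |row| puts row.fields.inspect }
puts "rows kept: #{rows.size}"
```

Only the truly empty line is dropped here; the rows of bare commas still reach the block, which is what made the import slow.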

What is needed instead is the skip_lines option, with a regular expression that matches any line consisting only of column separators and optional whitespace: /^(?:,\s*)+$/.
The resulting code looks like this:

require 'csv'
CSV.foreach('/tmp/tmp.csv',
            headers: true,
            skip_blanks: true,
            skip_lines: /^(?:,\s*)+$/) do |row|
  puts row.inspect
end

#<CSV::Row "Row":"some" "Of":"valid" "Headers":"data">
#=> nil
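If you want to convince yourself of what the pattern does before pointing it at a million-line file, it can be checked on its own. This is a rough sketch: the sample lines and the names `separators_only`, `samples`, `results` and `kept` are assumptions for illustration, and the data is an in-memory string rather than /tmp/tmp.csv.

```ruby
require 'csv'

# The pattern from the post, stored in a local variable for reuse.
separators_only = /^(?:,\s*)+$/

# Hypothetical sample lines: separators only, separators with whitespace,
# real data, a row with one non-empty field, and a truly empty line.
samples = [",,", ", , ", "some,valid,data", ",,x", ""]
results = samples.to_h { |line| [line, separators_only.match?(line)] }
results.each { |line, matched| puts "#{line.inspect} matched: #{matched}" }

# The same options as in the CSV.foreach call, applied to a string.
data = "Row,Of,Headers\nsome,valid,data\n,,\n,,\n"
kept = CSV.parse(data, headers: true, skip_blanks: true,
                 skip_lines: separators_only)
puts "rows kept: #{kept.size}"
```

Note that the pattern requires at least one comma, so a completely empty line does not match it; that case is still left to skip_blanks, which is why both options are passed together.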