Skipping blank lines in Ruby CSV parsing
I recently had an import job fail because it took too long. When I looked at the file, I saw that it contained 74 useful lines out of a total of 1,044,618 (my guess is MS Excel having a little fun with us).
Most of the lines were simply rows of commas:
    Row,Of,Headers
    some,valid,data
    ,,
    ,,
    ,,
    ,,
    ,,
The CSV library has an option named skip_blanks, but the documentation says: "Note that this setting will not skip rows that contain column separators, even if the rows contain no actual data", so that’s not actually helpful in this case.
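A quick way to see this limitation is the minimal sketch below. It parses an in-memory string rather than a file, and the sample data is made up to mirror the file described above, with one truly blank line added for contrast.

```ruby
require 'csv'

# Made-up sample: header, one data row, one truly blank line, two comma-only rows.
data = "Row,Of,Headers\nsome,valid,data\n\n,,\n,,\n"

# skip_blanks drops the empty line, but not the rows made only of separators.
rows = CSV.parse(data, headers: true, skip_blanks: true).map(&:to_h)

rows.each { |row| p row }
puts "rows returned: #{rows.length}"
```

The comma-only rows still come back, each as a row whose fields are all nil, so the import would still walk through every one of them.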
What is needed is skip_lines with a regular expression that matches any line consisting only of column separators.
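To get a feel for what such a pattern does and does not match, here is a small sketch that tries the regular expression used in this post against a few made-up sample lines. The variable names are only for illustration.

```ruby
# One or more commas, each optionally followed by whitespace, and nothing else.
pattern = /^(?:,\s*)+$/

# Made-up sample lines to probe the pattern.
samples = [",,", ", , ,", "some,valid,data", ",,x", ""]

results = samples.to_h { |line| [line, pattern.match?(line)] }
results.each { |line, matched| puts "#{line.inspect} => #{matched}" }
```

Only the first two samples match. The empty string does not, because the pattern requires at least one comma, so truly blank lines are still a job for skip_blanks.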
The resulting code looks like this:
    require 'csv'

    CSV.foreach('/tmp/tmp.csv',
                headers: true,
                skip_blanks: true,
                skip_lines: /^(?:,\s*)+$/) do |row|
      puts row.inspect
    end
    #<CSV::Row "Row":"some" "Of":"valid" "Headers":"data">
    #=> nil
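If the file uses a different column separator, the same idea should carry over by building the pattern from the separator. The sketch below assumes a semicolon-separated file. The helper separators_only is hypothetical (it is not part of the CSV library), and the data is an in-memory string made up for the example.

```ruby
require 'csv'

# Hypothetical helper, not part of the CSV library: builds a pattern matching
# lines that contain nothing but the given column separator and whitespace.
def separators_only(col_sep)
  /^(?:#{Regexp.escape(col_sep)}\s*)+$/
end

# Made-up semicolon-separated sample with two separator-only rows.
data = "Row;Of;Headers\nsome;valid;data\n;;\n;;\n"

kept = CSV.parse(data,
                 headers: true,
                 col_sep: ';',
                 skip_blanks: true,
                 skip_lines: separators_only(';')).map(&:to_h)

p kept
```

Regexp.escape matters here because separators such as | or . have a special meaning inside a regular expression.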