Saturday, October 27, 2012

Split CSV into multiple files

Receiving files in a format that is...less than helpful for your project needs is almost an everyday occurrence. On this occurrence I received roughly 30,000 files of hourly solar irradiance data. The files had multiple years all crammed together i.e. file 1 had hourly data from years 2002 - 2008.  In my project I needed to separate the individual years of hourly data - Bash to the rescue.

#Get a list of all CSV files
FILES=$(find -name *.csv)
#loop through the files
for file in $FILES
      #Get the basename of the current file
      #get the path of the current file
      #Split file into yearly data files.
      split -l $file 8760

      #Rename files to appropriate years. Split automatically names your output xaa, xab..and so
      # forth depending on how many files result from your split. If you know how many files
      #will result from your split, its easy to rename appropriately.
      mv xaa  $path"/"$base"_2002.csv"
      mv xab  $path"/"$base"_2003.csv"
      mv xac  $path"/"$base"_2004.csv"
      mv xad  $path"/"$base"_2005.csv"
      mv xae  $path"/"$base"_2006.csv"
      mv xaf  $path"/"$base"_2007.csv"
      mv xag  $path"/"$base"_2008.csv"
  echo wrote $file

Add columns of data to an existing CSV - a case for Bash over Python

I do a lot of scripting in Python, I love it, but sometimes things are just more easy in Bash. Take for instance appending new columns of data into a CSV file. To go about this in Python you would have to do something like the following: Stackoverflow Example. Back yet? Ok, and you would still have to add some logic to open the other file and read through its lines, etc...BUT in Bash, this command is so, so simple:

#paste is the utility and "-d ," specifies the delimiter type
paste -d , file_1.csv file_2.csv > combined_file.csv