Saturday, October 27, 2012

Split CSV into multiple files

Receiving files in a format that is...less than helpful for your project needs is almost an everyday occurrence. On this occurrence I received roughly 30,000 files of hourly solar irradiance data. The files had multiple years all crammed together i.e. file 1 had hourly data from years 2002 - 2008.  In my project I needed to separate the individual years of hourly data - Bash to the rescue.

#Get a list of all CSV files
FILES=$(find -name *.csv)
#loop through the files
for file in $FILES
      #Get the basename of the current file
      #get the path of the current file
      #Split file into yearly data files.
      split -l $file 8760

      #Rename files to appropriate years. Split automatically names your output xaa, xab..and so
      # forth depending on how many files result from your split. If you know how many files
      #will result from your split, its easy to rename appropriately.
      mv xaa  $path"/"$base"_2002.csv"
      mv xab  $path"/"$base"_2003.csv"
      mv xac  $path"/"$base"_2004.csv"
      mv xad  $path"/"$base"_2005.csv"
      mv xae  $path"/"$base"_2006.csv"
      mv xaf  $path"/"$base"_2007.csv"
      mv xag  $path"/"$base"_2008.csv"
  echo wrote $file

No comments:

Post a Comment