PowerShell : Merge Delimited Files on a Common Key

PowerShell

Many times I need to merge two separate datasets that have a common key column – for example:

dataset 1:

ID, Name, Weight, Height

dataset 2:

ID, Address1, Address2, Phone1, Phone2

I can’t just concat the data together, instead I need the merge to happen around the ID key, so that each ID only has a single row in the result.  So, of course I wrote a powershell script to do this for me.  Take a look at it below.  I called it “Merge-TsvFiles”, but it takes a delimiter as one of its paramters, so you could use this for CSV or whatever.

If you have any thoughts as to how this can be further improved code-wise or speed-wise, please comment!

PowerShell
function Merge-TsvFiles {
  param(
    [string] $file1 = $(throw "file1 required."),
    [string] $file2 = $(throw "file2 required."),
    [string] $delimiter = "`t",
    [string] $key = "ProfileID"
  )

  $data1 = import-csv -path $file1 -delimiter $delimiter
  $data2 = import-csv -path $file2 -delimiter $delimiter

  #write-host "$data1.Length total data rows, and $data2.Length total merge data rows."
  $i = 0

  #grab the new column names
  $newcols = $data2 | gm -MemberType NoteProperty

  Foreach ($datarow in $data1)
  {
    $i++
    $data2 | where-object {$_.$($key) -eq $data1.$($key)} | %{
      Foreach ($col in $newcols) {
        if ($col.Name -ne $key) {
          Add-Member -inputObject $datarow -name $col.Name -value $_.$($col.Name) -MemberType NoteProperty;
        }
      }
    }
  Write-Progress -activity "Merging data" -status "Merged $i of $($data1.Length) rows..." -percentComplete (($i / $data1.length) * 100) 
}

write-host "Writing output..."
$data1 | export-csv "output.txt" -delimiter "`t"
}
PowerShell

Leave a Reply

Your email address will not be published. Required fields are marked *