Many times I need to merge two separate datasets that have a common key column – for example:
dataset 1:
ID, Name, Weight, Height
dataset 2:
ID, Address1, Address2, Phone1, Phone2
I can’t just concat the data together, instead I need the merge to happen around the ID key, so that each ID only has a single row in the result. So, of course I wrote a powershell script to do this for me. Take a look at it below. I called it “Merge-TsvFiles”, but it takes a delimiter as one of its paramters, so you could use this for CSV or whatever.
If you have any thoughts as to how this can be further improved code-wise or speed-wise, please comment!
function Merge-TsvFiles {
param(
[string] $file1 = $(throw "file1 required."),
[string] $file2 = $(throw "file2 required."),
[string] $delimiter = "`t",
[string] $key = "ProfileID"
)
$data1 = import-csv -path $file1 -delimiter $delimiter
$data2 = import-csv -path $file2 -delimiter $delimiter
#write-host "$data1.Length total data rows, and $data2.Length total merge data rows."
$i = 0
#grab the new column names
$newcols = $data2 | gm -MemberType NoteProperty
Foreach ($datarow in $data1)
{
$i++
$data2 | where-object {$_.$($key) -eq $data1.$($key)} | %{
Foreach ($col in $newcols) {
if ($col.Name -ne $key) {
Add-Member -inputObject $datarow -name $col.Name -value $_.$($col.Name) -MemberType NoteProperty;
}
}
}
Write-Progress -activity "Merging data" -status "Merged $i of $($data1.Length) rows..." -percentComplete (($i / $data1.length) * 100)
}
write-host "Writing output..."
$data1 | export-csv "output.txt" -delimiter "`t"
}
PowerShell