Tuesday, May 25, 2010

The joy of awk: transforming group vCards

A systems analyst's life involves more data munging than I'd like to admit. Despite all of the advances in computing, sometimes the old tools are the best tools. Such is the case with awk and its even more abstruse cousin, sed.

Awk really isn't all that bad. The way to envision it is as a giant while-loop over the lines of the input, with branching if-blocks that use regular expressions (or plain comparisons) to match parts of the current line and then do stuff like print parts of the line or calculate things. Its syntax and grammar are friendlier to casual use than those of perl, bash, or sed.
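To make the while-loop picture concrete, here is a minimal sketch. The sample input and the count variable are made up for illustration; they are not part of the vCard script.

```shell
# Hypothetical sample input: three colon-separated lines.
out=$(printf 'FN:Ada Lovelace\nTEL:555-0100\nFN:Alan Turing\n' |
  awk -F: '
    # The implicit loop reads one line at a time and tries each rule in order.
    $1 == "FN" { count++; print $2 }     # runs only when the first field is FN
    END        { print count " names" }  # runs once, after the last line
  ')
echo "$out"
```

This prints the two names followed by "2 names". The TEL line matches no rule, so awk silently skips it.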

Anyway, here's an awk program I spent about 60 minutes writing that transforms group vCards into CSV files. I needed it because Microsoft Outlook reads only the first entry from a group vCard, and Address Book on the Macintosh doesn't write anything except group vCards. Maybe it will be of use to you.

vcard2csv.awk

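# Print one CSV row (name, organization, email) per vCard.
BEGIN {
    FS = ":"        # vCard lines are PROPERTY:VALUE
    OFS = ","       # join output fields with commas
    newcard = 0     # 1 while inside a BEGIN:VCARD ... END:VCARD block
    name = ""
    orgname = ""
    email = ""
}

# Start of a card.
$1 == "BEGIN" {
    newcard = 1
    next
}

# Inside a card: collect the fields we care about.
newcard == 1 {
    if ($1 == "FN") {
        name = $2
    }
    if ($1 == "ORG") {
        # ORG is "company;department"; keep only the company.
        split($2, cleanedorg, ";")
        orgname = cleanedorg[1]
    }
    # Only this exact property string (the preferred work address) is matched.
    if ($1 == "EMAIL;type=INTERNET;type=WORK;type=pref") {
        email = $2
    }
}

# End of a card: emit the row and reset for the next card.
$1 == "END" {
    print name, orgname, email
    name = ""
    orgname = ""
    email = ""
    newcard = 0
}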


You can run it by typing awk -f vcard2csv.awk < export.vcf > export.csv in a terminal. You might need to convert the line endings in the vcf to Unix (LF) first.
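Here is a rough sketch of why the line endings matter, assuming the export uses Windows-style CRLF. The sample card and file names are made up. With a colon as the field separator, the carriage return stays glued to the end of each value, so it would end up in the middle of every CSV row.

```shell
# Hypothetical one-card export with Windows-style CRLF line endings.
dir=$(mktemp -d)
printf 'BEGIN:VCARD\r\nFN:Ada Lovelace\r\nEND:VCARD\r\n' > "$dir/sample.vcf"

# Delete every carriage return so that only LF line endings remain.
tr -d '\r' < "$dir/sample.vcf" > "$dir/sample_unix.vcf"

# Measure the FN value in each file; the unconverted one carries a hidden CR.
before=$(awk -F: '$1 == "FN" { print length($2) }' "$dir/sample.vcf")
after=$(awk -F: '$1 == "FN" { print length($2) }' "$dir/sample_unix.vcf")
echo "with CR: $before, without CR: $after"
```

With a real export, the same idea in one line would be tr -d '\r' < export.vcf | awk -f vcard2csv.awk > export.csv. A file that uses bare CR line endings would need tr '\r' '\n' instead.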