Came back from my holidays to find a report that a drop-down list of countries on our web site did not include Germany. A quick check showed that it also did not include Ireland. Oops.
On investigation, it had to do with a conflict between two tables.
One table was using
ISO 3166 country codes;
the other was using
FIPS 10-4.
Most of the codes are the same, but many are not. For example, Germany is
GM in FIPS-speak and DE in ISO-speak.
Even worse, AU is Austria in FIPS-speak but Australia in
ISO-speak. The nice folks at daml.org have made available
an XML file with the conversions.
I decided arbitrarily to use ISO codes throughout, so I needed to update the table with the FIPS codes. It's a moderately trivial task, but there is one wrinkle. If I just did a one-step update, I could make Austria annex Australia as follows:
- FIPS
AS→ ISOAU - FIPS
AU→ ISOAT— oops, also converted ISOAU!
To avoid this problem, I first convert to lower-case ISO values and convert back to upper case only after all the codes have been converted.
Here's the Perl code. I make no claims for elegance. I often find myself writing this kind of code where I want to automate some task that could in principle be done manually. As long as it's functional and correct, it does the job for me.
# /usr/bin/perl -w
#
# Generates SQL to replace FIPS 10-4 country codes with their ISO 3166
# equivalents. Takes as input an RDF file as distributed at
# http://www.daml.org/2001/12/factbook/p-sameIndividualAs
use strict;
my %conversions;
my $fips_code;
while (<>) {
# This RE is perhaps unduly dependent on the format of the RDF file.
# Update as necessary for your particular input file.
if (/countries\/(fips|iso)#(..)'/) {
if ($1 eq 'fips') {
$fips_code = $2;
} else {
if ($fips_code ne $2) {
$conversions{$fips_code} = $2;
}
}
}
}
# first step: converts to new code in lower case
#
# By using this approach we are sure we never convert a code that
# has already been converted. This might otherwise occur with
# codes that are used in both FIPS and ISO with different meanings.
# 'AU' is an example, meaning Austria in FIPS-speak but Australia
# in ISO-speak.
for $fips_code (sort keys %conversions) {
print "update country_table set country_code = '";
print lc $conversions{$fips_code};
print "'\n";
print " where country_code = '$fips_code';\n";
}
# second step: convert back to upper case
for my $iso_code (sort values %conversions) {
print "update country_table set country_code = '$iso_code'\n";
print " where country_code = '";
print lc ${iso_code};
print "';\n";
}