Came back from my holidays to find a report that a drop-down list of countries on our web site did not include Germany. A quick check showed that it also did not include Ireland. Oops.
On investigation, it had to do with a conflict between two tables.
One table was using
ISO 3166 country codes;
the other was using
FIPS 10-4.
Most of the codes are the same, but many are not. For example, Germany is
GM
in FIPS-speak and DE
in ISO-speak.
Even worse, AU
is Austria in FIPS-speak but Australia in
ISO-speak. The nice folks at daml.org have made available
an XML file with the conversions.
I decided arbitrarily to use ISO codes throughout, so I needed to update the table with the FIPS codes. It's a moderately trivial task, but there is one wrinkle. If I just did a one-step update, I could make Austria annex Australia as follows:
- FIPS
AS
→ ISOAU
- FIPS
AU
→ ISOAT
— oops, also converted ISOAU
!
To avoid this problem, I first convert to lower-case ISO values and convert back to upper case only after all the codes have been converted.
Here's the Perl code. I make no claims for elegance. I often find myself writing this kind of code where I want to automate some task that could in principle be done manually. As long as it's functional and correct, it does the job for me.
# /usr/bin/perl -w # # Generates SQL to replace FIPS 10-4 country codes with their ISO 3166 # equivalents. Takes as input an RDF file as distributed at # http://www.daml.org/2001/12/factbook/p-sameIndividualAs use strict; my %conversions; my $fips_code; while (<>) { # This RE is perhaps unduly dependent on the format of the RDF file. # Update as necessary for your particular input file. if (/countries\/(fips|iso)#(..)'/) { if ($1 eq 'fips') { $fips_code = $2; } else { if ($fips_code ne $2) { $conversions{$fips_code} = $2; } } } } # first step: converts to new code in lower case # # By using this approach we are sure we never convert a code that # has already been converted. This might otherwise occur with # codes that are used in both FIPS and ISO with different meanings. # 'AU' is an example, meaning Austria in FIPS-speak but Australia # in ISO-speak. for $fips_code (sort keys %conversions) { print "update country_table set country_code = '"; print lc $conversions{$fips_code}; print "'\n"; print " where country_code = '$fips_code';\n"; } # second step: convert back to upper case for my $iso_code (sort values %conversions) { print "update country_table set country_code = '$iso_code'\n"; print " where country_code = '"; print lc ${iso_code}; print "';\n"; }