Fixing a Country Code Muddle

Came back from my holidays to find a report that a drop-down list of countries on our web site did not include Germany. A quick check showed that it also did not include Ireland. Oops.

On investigation, it had to do with a conflict between two tables. One table was using ISO 3166 country codes; the other was using FIPS 10-4. Most of the codes are the same, but many are not. For example, Germany is GM in FIPS-speak and DE in ISO-speak. Even worse, AU is Austria in FIPS-speak but Australia in ISO-speak. The nice folks at daml.org have made available an XML file with the conversions.

I decided arbitrarily to use ISO codes throughout, so I needed to update the table with the FIPS codes. It's a moderately trivial task, but there is one wrinkle. If I just did a one-step update, I could make Austria annex Australia as follows:

  1. FIPS AS → ISO AU
  2. FIPS AU → ISO AT — oops, also converted ISO AU!

To avoid this problem, I first convert to lower-case ISO values and convert back to upper case only after all the codes have been converted.

Here's the Perl code. I make no claims for elegance. I often find myself writing this kind of code where I want to automate some task that could in principle be done manually. As long as it's functional and correct, it does the job for me.


# /usr/bin/perl -w
#
# Generates SQL to replace FIPS 10-4 country codes with their ISO 3166
# equivalents.  Takes as input an RDF file as distributed at
# http://www.daml.org/2001/12/factbook/p-sameIndividualAs
use strict;

my %conversions;
my $fips_code;

while (<>) {
	# This RE is perhaps unduly dependent on the format of the RDF file.
	# Update as necessary for your particular input file.
	if (/countries\/(fips|iso)#(..)'/) {
		if ($1 eq 'fips') {
			$fips_code = $2;
		} else {
			if ($fips_code ne $2) {
				$conversions{$fips_code} = $2;
			}
		}
	}
}

# first step: converts to new code in lower case
#
# By using this approach we are sure we never convert a code that
# has already been converted.  This might otherwise occur with
# codes that are used in both FIPS and ISO with different meanings.
# 'AU' is an example, meaning Austria in FIPS-speak but Australia
# in ISO-speak.
for $fips_code (sort keys %conversions) {
	print "update country_table set country_code = '";
	print lc $conversions{$fips_code};
	print "'\n";
	print " where country_code = '$fips_code';\n";
}


# second step: convert back to upper case
for my $iso_code (sort values %conversions) {
	print "update country_table set country_code = '$iso_code'\n";
	print " where country_code = '";
	print lc ${iso_code};
	print "';\n";
}

About this Entry

This page contains a single entry by Christian published on December 29, 2003 11:16 AM.

The Joys of a Digital Camera was the previous entry in this blog.

Back to Workaday Life is the next entry in this blog.

Find recent content on the main index or look in the archive to find all content.