On the Theme of Unexpected Behaviour...


I was processing a file with UTF-8 text and finding that my output was coming out in ISO-8859-1 (a.k.a. latin1). I verified this first using od -c | less and then by running recode latin1..utf8. Either Perl, XML::Parser or my code was silently converting the text.
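To make the difference between the two encodings concrete, here is a small sketch (not part of my original debugging) that prints the bytes of a single accented character in each encoding. The choice of é (U+00E9) is just an example; any character above 127 would do.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character: e-acute, code point U+00E9 (chosen only for illustration).
my $char = "\x{e9}";

# Encode the same character both ways and show the resulting bytes as hex.
my $latin1_hex = unpack("H*", encode("iso-8859-1", $char));
my $utf8_hex   = unpack("H*", encode("UTF-8", $char));

print "latin1: $latin1_hex\n";   # a single byte
print "utf8:   $utf8_hex\n";     # two bytes
```

In od -c output the latin1 form shows up as one octal byte (351) and the UTF-8 form as two (303 251), which is how the wrong encoding gives itself away.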

It turned out that Perl was responsible this time. Or at least, the fix was at the Perl level. Again, the answer is in the friendly man pages, specifically in perluniintro. There are three alternatives:

  1. open FH, ">:utf8", "file";
  2. binmode(STDOUT, ":utf8");
  3. use open ':utf8'; # open as a pragma
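As a minimal sketch of the first alternative, the script below writes the same string to two files, once with no layer and once with the :utf8 layer. The file names are made up for the example, and it assumes no default layers are already in force (no use open pragma, no -C switch or PERL_UNICODE setting).

```perl
use strict;
use warnings;

# Hypothetical file names, used only for this illustration.
my $plain = "out-plain.txt";
my $utf8  = "out-utf8.txt";

# A character string containing e-acute (U+00E9).
my $text = "caf\x{e9}\n";

# No layer: code points below 256 go out as single bytes, i.e. latin1.
open(my $fh1, ">", $plain) or die "Cannot open $plain: $!";
print $fh1 $text;
close($fh1);

# With the :utf8 layer, the same string is encoded as UTF-8 on output.
open(my $fh2, ">:utf8", $utf8) or die "Cannot open $utf8: $!";
print $fh2 $text;
close($fh2);

# Compare the sizes on disk.
printf "%s: %d bytes\n", $plain, (-s $plain);
printf "%s: %d bytes\n", $utf8,  (-s $utf8);
```

On a Unix-like system the second file should come out one byte larger, because é takes two bytes in UTF-8 and one in latin1.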

That makes not one but two recent cases where the solution was in the man pages. It helps that these are new-style man pages, with lots of tasty example code.
