HTML, Unicode, and Perl

Author: Jeremy C. Reed
Date:  
Subject: HTML, Unicode, and Perl
On Mon, 30 Jun 2003, Matt Alexander wrote:

> Matt Alexander said:
> > Does anyone know how I would take an HTML unicode character and convert it
> > to the actual unicode character in a text file using Perl? For example,
> > let's say I have L&#243;pez. I'd like the &#243; to be converted to the
> > character with the o and the accent over it and saved to a plain text
> > file.


#!/usr/bin/perl
while ($line = <>) {
    $line =~ s/(&#)([0-9]+)(;)/ chr($2) /eg;
    print $line;
}
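
The loop above handles decimal references only, and without an output layer Perl writes code points below 256 as single bytes (0xF3 for the o with the accent). A minimal sketch of a variant that also accepts hexadecimal references such as &#xF3; and writes UTF-8 follows. The sub name decode_numeric_refs is made up for illustration and is not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original message): replaces decimal
# (&#243;) and hexadecimal (&#xF3;) numeric character references with
# the characters they name. One pass, so decoded text is never re-scanned.
sub decode_numeric_refs {
    my ($text) = @_;
    $text =~ s/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/eg;
    return $text;
}

# Encode output as UTF-8 so code points above 255 do not trigger
# "Wide character" warnings.
binmode(STDOUT, ':encoding(UTF-8)');
print decode_numeric_refs('L&#243;pez and L&#xF3;pez'), "\n";
```

Named entities such as &oacute; are left alone by this sketch; handling those needs a lookup table or a module.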


(I use the reverse of that to encode text for web pages.)
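
The reverse direction is not shown in the message. One way it might look, as a sketch only: every character outside 7-bit ASCII becomes a decimal numeric reference. The sub name encode_numeric_refs is hypothetical, and the input is assumed to be a string of decoded characters rather than raw UTF-8 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the author's script): replaces
# every character outside 7-bit ASCII with a decimal numeric character
# reference such as &#243;. Expects decoded characters, not UTF-8 bytes.
sub encode_numeric_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/eg;
    return $text;
}

print encode_numeric_refs("L\x{F3}pez"), "\n";   # prints L&#243;pez
```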

> I figured it out. Browsers use decimal for unicode. So I would convert
> 243 in decimal to F3 in hex and then I can print the character:
>
> perl -e 'print "\x{F3}\n"'
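
The decimal-to-hex step described in the quote can be checked with sprintf, and chr() accepts the decimal number directly, so the conversion is optional. A small sketch, with variable names chosen only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $decimal = 243;                      # the number from &#243;
my $hex     = sprintf('%X', $decimal);  # "F3"

# chr() takes the code point as a number, so the hex step is optional.
my $same = chr($decimal) eq "\x{F3}" ? 'yes' : 'no';
print "decimal $decimal = hex $hex, same character: $same\n";
```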


$ echo 'L&#243;pez' | ~/scripts/iso-to-ascii
López



Jeremy C. Reed
http://bsd.reedmedia.net/