HTML, Unicode, and Perl


Author: Jeremy C. Reed
Date:
Subject: HTML, Unicode, and Perl

On Mon, 30 Jun 2003, Matt Alexander wrote:

> Matt Alexander said:
> > Does anyone know how I would take an HTML unicode character and convert it
> > to the actual unicode character in a text file using Perl? For example,
> > let's say I have L&#243;pez. I'd like the &#243; to be converted to the
> > character with the o and the accent over it and saved to a plain text
> > file.

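#!/usr/bin/perl
use strict;
use warnings;
# Replace each decimal character reference such as &#243; with the character itself.
while (my $line = <>) {
    $line =~ s/&#([0-9]+);/chr($1)/eg;
    print $line;
}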

(I use the opposite conversion to encode text for web pages.)
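A minimal sketch of that opposite direction, assuming the text has already been decoded into Perl characters rather than raw bytes. `encode_numeric_refs` is a hypothetical helper name, not the script mentioned above. It only rewrites characters outside ASCII, so `&`, `<` and `>` still need the usual HTML escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every non-ASCII character with a
# decimal numeric character reference, e.g. "\x{F3}" becomes "&#243;".
sub encode_numeric_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# "\x{F3}" is the o with the acute accent (code point 243).
print encode_numeric_refs("L\x{F3}pez"), "\n";   # prints L&#243;pez
```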

> I figured it out. Browsers use decimal for unicode. So I would convert
> 243 in decimal to F3 in hex and then I can print the character:
>
> perl -e 'print "\x{F3}\n"'

$ echo 'L&#243;pez' | ~/scripts/iso-to-ascii
López
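On the quoted note about converting 243 to F3: `chr` takes the code point as a plain number, so no hex step is needed. HTML also allows hexadecimal references such as &#xF3; for the same character. The sketch below decodes both forms in a single pass and writes UTF-8 explicitly, which avoids "Wide character in print" warnings for code points above 255. `decode_numeric_refs` is a hypothetical helper name, and named entities such as &oacute; are deliberately left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: decode decimal (&#243;) and hexadecimal (&#xF3;)
# references in one pass, so "&#38;#xF3;" yields "&#xF3;" and is not
# decoded twice.
sub decode_numeric_refs {
    my ($text) = @_;
    $text =~ s/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/defined $1 ? chr(hex $1) : chr($2)/ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
printf "%d decimal is 0x%X hex\n", 243, 243;   # prints "243 decimal is 0xF3 hex"
print decode_numeric_refs("L&#243;pez L&#xF3;pez"), "\n";
```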

Jeremy C. Reed
http://bsd.reedmedia.net/
