Matt Alexander wrote:
>Does anyone know how I would take an HTML unicode character and convert it
>to the actual unicode character in a text file using Perl? For example,
>let's say I have L&#243;pez. I'd like the &#243; to be converted to the
>character with the o and the accent over it and saved to a plain text
>file.
>Thanks,
>~M
There are a number of Perl modules that deal with Unicode. I haven't used
it, but a quick glance at the CPAN module index
(http://www.cpan.org/modules/01modules.index.html) turned up
Unicode-Lite-0.12 by AMICHAUER:
http://www.cpan.org/authors/id/A/AM/AMICHAUER/Unicode-Lite-0.12.tar.gz
Its README showed a function that appeared to do what you want.
Using this module you can likely read in the entire file line by line,
call this function to convert the characters
and write out to a new file.
Here's a snippet of the README file:
FUNCTIONS
    convertor SRC_CP DST_CP [FLGS] [CHAR]
        Creates convertor function and returns reference to her, for further
        fast direct call.

        The param FLGS operates replacing by SBCS->SBCS converting if any
        char from SRC_CP is absent at DST_CP. The order of search of
        substitution:

        UL_7BT - to equivalent 7bit char or sequence of 7bit chars
        UL_SEQ - to equivalent char or sequence of chars
        UL_EQV - to equivalent char
        UL_ENT - to entity
        UL_CHR - to [CHAR].
        UL_ALL - UL_SEQ or UL_EQV and UL_ENT or UL_CHR
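
Since I haven't actually tried Unicode-Lite, here is a rough sketch of the
read / convert / write loop I described, using only core Perl instead of that
module. It assumes the input uses numeric character references such as
&#243; or &#xF3;, and that the rest of the file is ASCII or UTF-8. The names
decode_numeric_refs and convert_file are made up for this example. Named
entities like &oacute; are left alone; the HTML::Entities module from CPAN
would be the usual choice if you need those too.

```perl
use strict;
use warnings;

# Hypothetical helper: turn decimal (&#243;) and hexadecimal (&#xF3;)
# character references into the characters they stand for.
# Named entities such as &oacute; are not handled here.
sub decode_numeric_refs {
    my ($text) = @_;
    # One pass only, so text produced by a replacement is not decoded again.
    $text =~ s/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/ge;
    return $text;
}

# Hypothetical helper: read $in_path line by line, decode each line,
# and write the result to $out_path encoded as UTF-8.
sub convert_file {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<:encoding(UTF-8)', $in_path)
        or die "Cannot read $in_path: $!";
    open(my $out, '>:encoding(UTF-8)', $out_path)
        or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} decode_numeric_refs($line);
    }
    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return;
}
```

You would call it as convert_file('names.html', 'names.txt'), where both
file names are just placeholders. The output layer is what makes the saved
file UTF-8; pick a different encoding there if your text editor expects one.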
JD
--
JD Austin
Database Administrator
Maricopa Community Colleges
john.austin@domail.maricopa.edu
480.731.8759