Recipe 1.5 Using Named Unicode Characters

1.5.1 Problem

You want to use Unicode names for fancy characters in your code without worrying about their code points.

1.5.2 Solution

Place a use charnames at the top of your file, then freely insert "\N{CHARSPEC}" escapes into your string literals.

1.5.3 Discussion

The use charnames pragma lets you use symbolic names for Unicode characters. These are compile-time constants that you access with the \N{CHARSPEC} double-quoted string sequence. Several subpragmas are supported. The :full subpragma grants access to the full range of character names, but you have to write them out in full, exactly as they occur in the Unicode character database, including the loud, all-capitals notation. The :short subpragma gives convenient shortcuts. Any import without a colon tag is taken to be a script name, giving case-sensitive shortcuts for those scripts.

    use charnames ':full';
    print "\N{GREEK CAPITAL LETTER DELTA} is called delta.\n";

    Δ is called delta.

    use charnames ':short';
    print "\N{greek:Delta} is an upper-case delta.\n";

    Δ is an upper-case delta.

    use charnames qw(cyrillic greek);
    print "\N{Sigma} and \N{sigma} are Greek sigmas.\n";
    print "\N{Be} and \N{be} are Cyrillic bes.\n";

    Σ and σ are Greek sigmas.
    Б and б are Cyrillic bes.

Two functions, charnames::viacode and charnames::vianame, can translate between numeric code points and the long names. The Unicode documents use the notation U+XXXX to indicate the Unicode character whose code point is XXXX, so we'll use that here in our output.
    use charnames qw(:full);
    for $code (0xC4, 0x394) {
        printf "Character U+%04X (%s) is named %s\n",
            $code, chr($code), charnames::viacode($code);
    }

    Character U+00C4 (Ä) is named LATIN CAPITAL LETTER A WITH DIAERESIS
    Character U+0394 (Δ) is named GREEK CAPITAL LETTER DELTA

    use charnames qw(:full);
    $name = "MUSIC SHARP SIGN";
    $code = charnames::vianame($name);
    printf "%s is character U+%04X (%s)\n",
        $name, $code, chr($code);

    MUSIC SHARP SIGN is character U+266F (♯)

Here's how to find the path to Perl's copy of the Unicode character database:

    % perl -MConfig -le 'print "$Config{privlib}/unicore/NamesList.txt"'
    /usr/local/lib/perl5/5.8.1/unicore/NamesList.txt
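
Searching the names list by program can be quicker than paging through it. The following is only a minimal sketch, not part of the original recipe: find_names is a hypothetical helper, and it assumes that character entries in the file begin with a hexadecimal code point followed by a tab and the name, while header and annotation lines begin with "@" or a tab. To stay self-contained it reads a small in-memory sample laid out the same way rather than the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the recipe): scan a NamesList-style
# filehandle and return [code point, name] pairs whose name matches $pattern.
sub find_names {
    my ($fh, $pattern) = @_;
    my @hits;
    while (my $line = <$fh>) {
        # Assumed layout: "HEX<TAB>NAME" for character entries; lines that
        # start with "@" (headers) or a tab (annotations) are skipped.
        next unless $line =~ /^([0-9A-F]{4,6})\t([^\t\r\n]+)/;
        my ($hex, $name) = ($1, $2);
        push @hits, [ hex($hex), $name ] if $name =~ $pattern;
    }
    return @hits;
}

# In-memory sample standing in for the real file (an assumption made so the
# example runs anywhere).
my $sample = <<"END";
\@\@\t0370\tGreek and Coptic\t03FF
0394\tGREEK CAPITAL LETTER DELTA
\tx (increment - 2206)
03A3\tGREEK CAPITAL LETTER SIGMA
266F\tMUSIC SHARP SIGN
END

open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my @greek = find_names($fh, qr/GREEK/);
close $fh;

# Report matches in the U+XXXX notation used elsewhere in this recipe.
printf "U+%04X %s\n", @$_ for @greek;
```

To search the installed copy instead, one could open "$Config{privlib}/unicore/NamesList.txt" (after use Config) and pass that filehandle to the helper, bearing in mind that some installations may not ship the file.
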
Read this file to learn the character names available to you.

1.5.4 See Also

The charnames(3) manpage and Chapter 31 of Programming Perl; the Unicode Character Database at http://www.unicode.org/