Recipe 1.4 Converting Between Characters and Values
1.4.1 Problem
You want to print the number represented by a given character, or you
want to print a character given a number.
1.4.2 Solution
Use ord to convert a character to a number, or use chr to convert a
number to its corresponding character:
$num = ord($char);
$char = chr($num);
The %c format used in printf and sprintf also converts a number to a
character:
$char = sprintf("%c", $num); # slower than chr($num)
printf("Number %d is character %c\n", $num, $num);
Number 101 is character e
A C* template used with pack and unpack can quickly convert many 8-bit
bytes; similarly, use U* for Unicode characters:
@bytes = unpack("C*", $string);
$string = pack("C*", @bytes);
$unistr = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
@unichars = unpack("U*", $unistr);
1.4.3 Discussion
Unlike low-level, typeless languages such as assembler, Perl doesn't
treat characters and numbers interchangeably; it treats strings and
numbers interchangeably. That means you can't just assign characters and
numbers back and forth. Perl provides Pascal's chr and ord to convert
between a character and its corresponding ordinal value:
$value = ord("e"); # now 101
$character = chr(101); # now "e"
If you already have a character, it's really represented as a string of
length one, so just print it out directly using print or the %s format
in printf and sprintf. The %c format forces printf or sprintf to convert
a number into a character; it's not used for printing a character that's
already in character format (that is, a string):
printf("Number %d is character %c\n", 101, 101);
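The difference between %s and %c can be seen side by side in a minimal sketch. The variable names here are illustrative only and are not part of the recipe:

```perl
use strict;
use warnings;

my $char = "e";
my $num  = ord($char);                  # 101

# %s prints a string as-is; %c converts a number to its character.
my $as_string = sprintf("%s", $char);   # the one-character string itself
my $as_char   = sprintf("%c", $num);    # the character built from the number

# Handing the number to %s just prints its digits.
my $digits    = sprintf("%s", $num);

print "$as_string $as_char $digits\n";  # prints "e e 101"
```

In short, choose the format by what you are holding: %s for a character you already have, %c for a number you want turned into one.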
The pack, unpack, chr, and ord functions are all faster than sprintf.
Here are pack and unpack in action:
@ascii_character_numbers = unpack("C*", "sample");
print "@ascii_character_numbers\n";
115 97 109 112 108 101
$word = pack("C*", @ascii_character_numbers);
$word = pack("C*", 115, 97, 109, 112, 108, 101); # same
print "$word\n";
sample
Here's how to convert from HAL to IBM:
$hal = "HAL";
@byte = unpack("C*", $hal);
foreach $val (@byte) {
    $val++;            # add one to each byte value
}
$ibm = pack("C*", @byte);
print "$ibm\n"; # prints "IBM"
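The same shift can also be written as one expression by feeding unpack's list through map and straight back into pack. This is only a compact sketch of the loop above, and it assumes every byte value is below 255 so that adding one still fits in a byte:

```perl
use strict;
use warnings;

my $hal = "HAL";

# unpack yields the byte values, map adds one to each,
# and pack turns the shifted values back into a string.
my $ibm = pack("C*", map { $_ + 1 } unpack("C*", $hal));

print "$ibm\n";    # prints "IBM"
```

The explicit loop is easier to extend when each value needs more than a one-line transformation; the map form is convenient when it does not.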
On single-byte character data, such as plain old ASCII or any of the
various ISO 8859 charsets, the ord function returns numbers from 0 to
255. These correspond to C's unsigned char data type.
However, Perl understands more than that: it also has integrated
support for Unicode, the universal character encoding. If you pass chr,
sprintf "%c", or pack "U*" numeric values greater than 255, the result
will be a Unicode string.
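As a rough illustration, the sketch below builds the same character three ways from a value above 255 and checks that it is still a single character whose ord is the original number. The code point 0x263A (WHITE SMILING FACE) is an arbitrary choice for the example, and only numbers are printed so that no output encoding layer is needed:

```perl
use strict;
use warnings;

my $code = 0x263A;                      # arbitrary code point above 255

my $from_chr     = chr($code);
my $from_sprintf = sprintf("%c", $code);
my $from_pack    = pack("U*", $code);

# Each result is one character long, not a run of bytes.
print length($from_chr), "\n";          # prints 1

# ord recovers the original value.
print ord($from_chr), "\n";             # prints 9786

# All three routes produce the same string.
my $all_same = ($from_chr eq $from_sprintf && $from_chr eq $from_pack);
print $all_same ? "same\n" : "different\n";   # prints "same"
```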
Here are similar operations with Unicode:
@unicode_points = unpack("U*", "fac\x{0327}ade");
print "@unicode_points\n";
102 97 99 807 97 100 101
$word = pack("U*", @unicode_points);
print "$word\n";
façade
If all you're doing is printing out the characters' values, you probably
don't even need to use unpack. Perl's printf and sprintf functions
understand a v modifier that works like this:
printf "%vd\n", "fac\x{0327}ade";
102.97.99.807.97.100.101
printf "%vx\n", "fac\x{0327}ade";
66.61.63.327.61.64.65
The numeric value of each character (that is, its "code point" in
Unicode parlance) in the string is emitted with a dot separator.
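Because sprintf accepts the same modifier, the dotted form can be captured in a variable and, if needed, split back into code points. The following is a small sketch with illustrative variable names; the `*` flag used in the second call replaces the dot with a separator of your choosing:

```perl
use strict;
use warnings;

my $string = "fac\x{0327}ade";

# Capture the dotted list of code points instead of printing it.
my $dotted = sprintf("%vd", $string);
print "$dotted\n";                       # prints 102.97.99.807.97.100.101

# The * flag takes the join string from the argument list.
my $spaced = sprintf("%*vd", " ", $string);
print "$spaced\n";                       # prints 102 97 99 807 97 100 101

# Splitting on the dots and repacking recovers the original string.
my @points  = split /\./, $dotted;
my $rebuilt = pack("U*", @points);
print $rebuilt eq $string ? "round trip ok\n" : "mismatch\n";
```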
1.4.4 See Also
The chr, ord, printf, sprintf, pack, and unpack functions in perlfunc(1)
and Chapter 29 of Programming Perl