Recipe 1.4 Converting Between Characters and Values

1.4.1 Problem

You want to print the number represented by a given character, or you want to print a character given a number.

1.4.2 Solution

Use ord to convert a character to a number, or use chr to convert a number to its corresponding character:

$num  = ord($char);
$char = chr($num);

The %c format used in printf and sprintf also converts a number to a character:

$char = sprintf("%c", $num);                # slower than chr($num)
printf("Number %d is character %c\n", $num, $num);
# prints "Number 101 is character e" when $num is 101

A C* template used with pack and unpack can quickly convert many 8-bit bytes; similarly, use U* for Unicode characters.

@bytes = unpack("C*", $string);
$string = pack("C*", @bytes);

$unistr = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
@unichars = unpack("U*", $unistr);

1.4.3 Discussion

Unlike low-level, typeless languages such as assembler, Perl doesn't treat characters and numbers interchangeably; it treats strings and numbers interchangeably. That means you can't just assign characters and numbers back and forth. Perl provides Pascal's chr and ord to convert between a character and its corresponding ordinal value:

$value     = ord("e");    # now 101
$character = chr(101);    # now "e"

If you already have a character, it's really represented as a string of length one, so just print it out directly using print or the %s format in printf and sprintf. The %c format forces printf or sprintf to convert a number into a character; it's not used for printing a character that's already in character format (that is, a string).

printf("Number %d is character %c\n", 101, 101);
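
As a minimal sketch of that distinction (the variable names are illustrative and not part of the recipe), %s prints a value that is already a character, while %c expects a number:

```perl
use strict;
use warnings;

my $char = "e";      # already a one-character string
my $num  = 101;      # ordinal value of "e" in ASCII

# %s prints an existing string as-is; %c converts a number to its character.
my $from_string = sprintf("%s", $char);
my $from_number = sprintf("%c", $num);

print "from string: $from_string\n";    # prints "from string: e"
print "from number: $from_number\n";    # prints "from number: e"
```

Both calls produce the same character, but only because each format was handed the kind of value it expects.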

The pack, unpack, chr, and ord functions are all faster than sprintf. Here are pack and unpack in action:

@ascii_character_numbers = unpack("C*", "sample");
print "@ascii_character_numbers\n";    # prints "115 97 109 112 108 101"

$word = pack("C*", @ascii_character_numbers);
$word = pack("C*", 115, 97, 109, 112, 108, 101);   # same
print "$word\n";    # prints "sample"

Here's how to convert from HAL to IBM:

$hal = "HAL";
@byte = unpack("C*", $hal);
foreach $val (@byte) {
    $val++;                 # add one to each byte value
}
$ibm = pack("C*", @byte);
print "$ibm\n";             # prints "IBM"
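
The same transformation can also be written with chr and ord alone. The following is only a sketch: shift_chars is a hypothetical helper name, not something the recipe defines, and it assumes every shifted value is still a valid, non-negative character number.

```perl
use strict;
use warnings;

# Hypothetical helper: shift every character's ordinal value by $offset.
sub shift_chars {
    my ($string, $offset) = @_;
    # split // breaks the string into single characters;
    # ord/chr convert each one to a number and back.
    return join "", map { chr(ord($_) + $offset) } split //, $string;
}

my $ibm = shift_chars("HAL", 1);
print "$ibm\n";    # prints "IBM"
```

Because it works a character at a time rather than a byte at a time, this form is not limited to values below 256.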

On single-byte character data, such as plain old ASCII or any of the various ISO 8859 charsets, the ord function returns numbers from 0 to 255. These correspond to C's unsigned char data type.

However, Perl understands more than that: it also has integrated support for Unicode, the universal character encoding. If you pass chr, sprintf "%c", or pack "U*" numeric values greater than 255, the result will be a Unicode string.
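
A small sketch of that behavior, with variable names chosen only for illustration: a value above 255 still yields a string of length one, and ord recovers the original number.

```perl
use strict;
use warnings;

my $byte_char = chr(233);       # fits in a single byte (e with acute in ISO 8859-1)
my $wide_char = chr(0x24B6);    # CIRCLED LATIN CAPITAL LETTER A, well above 255

# Both are strings of length one, and ord recovers the original number.
printf "%d %d\n", length($byte_char), ord($byte_char);   # prints "1 233"
printf "%d %d\n", length($wide_char), ord($wide_char);   # prints "1 9398"

# Printing a character above 255 needs an output encoding layer;
# without one, Perl warns "Wide character in print".
binmode(STDOUT, ":encoding(UTF-8)");
print "$wide_char\n";
```

The binmode call is an addition for the sake of a clean run; the recipe itself does not discuss output layers.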

Here are similar operations with Unicode:

@unicode_points = unpack("U*", "fac\x{0327}ade");
print "@unicode_points\n";    # prints "102 97 99 807 97 100 101"

$word = pack("U*", @unicode_points);
print "$word\n";    # prints "façade"

If all you're doing is printing out the characters' values, you probably don't even need to use unpack. Perl's printf and sprintf functions understand a v modifier that works like this:

printf "%vd\n", "fac\x{0327}ade";    # prints "102.97.99.807.97.100.101"

printf "%vx\n", "fac\x{0327}ade";    # prints "66.61.63.327.61.64.65"

The numeric value of each character (that is, its "code point" in Unicode parlance) in the string is emitted with a dot separator.
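
To make the v modifier less magical, here is a sketch comparing it with the equivalent ord-and-join spelled out by hand. It also shows the * form of the flag, which takes the join string from the argument list instead of using a dot. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $string = "fac\x{0327}ade";

# The v flag behaves like taking ord of each character and joining with dots.
my $with_flag = sprintf("%vd", $string);
my $by_hand   = join ".", map { ord($_) } split //, $string;
print "$with_flag\n";    # prints "102.97.99.807.97.100.101"
print "same\n" if $with_flag eq $by_hand;

# A * before the v takes the join string from the argument list.
my $colons = sprintf("%*vX", ":", $string);
print "$colons\n";       # prints "66:61:63:327:61:64:65"
```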

1.4.4 See Also

The chr, ord, printf, sprintf, pack, and unpack functions in perlfunc(1) and Chapter 29 of Programming Perl
