Recipe 1.14 Properly Capitalizing a Title or Headline
1.14.1 Problem
You have a string representing a headline, the title of a book, or some
other work that needs proper capitalization.
1.14.2 Solution
Use a variant of this tc() titlecasing function:
our %nocap;    # file scope, so tc() can also see it under "use strict"
INIT {
    for (qw(
            a an the
            and but or
            as at by for from in into of off on onto per to with
        ))
    {
        $nocap{$_}++;
    }
}

sub tc {
    local $_ = shift;

    # put into lowercase if on stop list, else titlecase
    # (look up lc $1 so that "THE" or "Of" is found on the list too)
    s/(\pL[\pL']*)/$nocap{lc $1} ? lc($1) : ucfirst(lc($1))/ge;

    # under /x, the literal spaces in the patterns below are ignored
    s/^(\pL[\pL']*) /\u\L$1/x;  # first word guaranteed to cap
    s/ (\pL[\pL']*)$/\u\L$1/x;  # last word guaranteed to cap

    # treat parenthesized portion as a complete title
    s/\( (\pL[\pL']*) /(\u\L$1/x;
    s/(\pL[\pL']*) \) /\u\L$1)/x;

    # capitalize first word following colon or semicolon
    s/ ( [:;] \s+ ) (\pL[\pL']* ) /$1\u\L$2/x;

    return $_;
}
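If you want to check the parenthesis and semicolon rules in isolation, here is a standalone sketch. It is a copy of tc() for experimentation, with two assumptions of ours: the stop list is built in a file-scoped lexical %nocap instead of an INIT block, and the stop-list lookup uses lc $1 so all-capitals input is handled.

```perl
use strict;
use warnings;

# Articles, coordinating conjunctions, and short prepositions stay lowercase.
my %nocap = map { $_ => 1 } qw(
    a an the
    and but or
    as at by for from in into of off on onto per to with
);

sub tc {
    local $_ = shift;
    # Lowercase stop-list words (looked up in lowercase), titlecase the rest.
    s/(\pL[\pL']*)/$nocap{lc $1} ? lc($1) : ucfirst(lc($1))/ge;
    s/^(\pL[\pL']*)/\u\L$1/;    # first word is always capitalized
    s/(\pL[\pL']*)$/\u\L$1/;    # last word is always capitalized
    # A parenthesized portion is treated as a title of its own.
    s/\((\pL[\pL']*)/(\u\L$1/;
    s/(\pL[\pL']*)\)/\u\L$1)/;
    # Capitalize the first word after a colon or semicolon.
    s/([:;]\s+)(\pL[\pL']*)/$1\u\L$2/;
    return $_;
}

for my $title (
    "the fellowship (of the ring)",
    "war and peace; a novel",
    "THE LORD OF THE RINGS",
) {
    print tc($title), "\n";
}
```

This prints "The Fellowship (Of the Ring)", "War and Peace; A Novel", and "The Lord of the Rings". Note that the "Of" after the parenthesis is raised because the parenthesized text is treated as a title in its own right.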
1.14.3 Discussion
The rules for correctly capitalizing a headline or title in English
are more complex than simply capitalizing the first letter of each
word. If that's all you need to do, something like this should
suffice:
s/(\w+\S*\w*)/\u\L$1/g;
Most style guides tell you that the first and last words in the title
should always be capitalized, along with every other word that's not
an article, the particle "to" in an infinitive construct, a
coordinating conjunction, or a preposition.
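Here's a demo, this time exercising the property that distinguishes
titlecase from uppercase. The \x{01F3} in the first two strings is a
single character, LATIN SMALL LETTER DZ (ǳ), one of a handful of
Unicode digraphs with three case forms: uppercase Ǳ (U+01F1), lowercase
ǳ (U+01F3), and titlecase ǲ (U+01F2). uc yields the uppercase form,
while ucfirst and \u yield the titlecase form. Assume the tc function
is as defined in the Solution.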
# with apologies (or kudos) to Steven Brust, PJF,
# and to JRRT, as always.
binmode(STDOUT, ":encoding(UTF-8)");    # \x{01F3} is a wide character

my @data = (
    "the enchantress of \x{01F3}ur mountain",
    "meeting the enchantress of \x{01F3}ur mountain",
    "the lord of the rings: the fellowship of the ring",
);

my $mask = "%-20s: %s\n";

sub tc_lame {
    local $_ = shift;
    s/(\w+\S*\w*)/\u\L$1/g;
    return $_;
}

for my $datum (@data) {
    printf $mask, "ALL CAPITALS",     uc($datum);
    printf $mask, "no capitals",      lc($datum);
    printf $mask, "simple titlecase", tc_lame($datum);
    printf $mask, "better titlecase", tc($datum);
    print "\n";
}
ALL CAPITALS        : THE ENCHANTRESS OF ǱUR MOUNTAIN
no capitals         : the enchantress of ǳur mountain
simple titlecase    : The Enchantress Of ǲur Mountain
better titlecase    : The Enchantress of ǲur Mountain

ALL CAPITALS        : MEETING THE ENCHANTRESS OF ǱUR MOUNTAIN
no capitals         : meeting the enchantress of ǳur mountain
simple titlecase    : Meeting The Enchantress Of ǲur Mountain
better titlecase    : Meeting the Enchantress of ǲur Mountain

ALL CAPITALS        : THE LORD OF THE RINGS: THE FELLOWSHIP OF THE RING
no capitals         : the lord of the rings: the fellowship of the ring
simple titlecase    : The Lord Of The Rings: The Fellowship Of The Ring
better titlecase    : The Lord of the Rings: The Fellowship of the Ring
One thing to consider is that some style guides prefer capitalizing
only prepositions that are longer than three, four, or sometimes five
letters. O'Reilly & Associates, for example, keeps prepositions
of four or fewer letters in lowercase. If you'd like a longer list of
prepositions to work from, here's one you can adapt to your needs:
@all_prepositions = qw{
    about above absent across after against along amid amidst
    among amongst around as at athwart before behind below
    beneath beside besides between betwixt beyond but by circa
    down during ere except for from in into near of off on onto
    out over past per since than through till to toward towards
    under until unto up upon versus via with within without
};
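One way to put that list to work under the O'Reilly rule is to keep only the short prepositions on the stop list. The sketch below assumes a hypothetical $max_len setting (4 here, to match the house style mentioned above) and a trimmed tc() that applies just the stop-list and first/last-word rules; neither comes from the recipe itself.

```perl
use strict;
use warnings;

# The longer preposition list from the text.
my @all_prepositions = qw{
    about above absent across after against along amid amidst
    among amongst around as at athwart before behind below
    beneath beside besides between betwixt beyond but by circa
    down during ere except for from in into near of off on onto
    out over past per since than through till to toward towards
    under until unto up upon versus via with within without
};

# Hypothetical setting: prepositions of at most this many letters stay lowercase.
my $max_len = 4;

# Stop list = articles + coordinating conjunctions + short prepositions.
my %nocap = map { $_ => 1 }
    qw(a an the and but or),
    grep { length($_) <= $max_len } @all_prepositions;

# Trimmed tc(): stop-list handling plus the first/last-word rules.
sub tc {
    local $_ = shift;
    s/(\pL[\pL']*)/$nocap{lc $1} ? lc($1) : ucfirst(lc($1))/ge;
    s/^(\pL[\pL']*)/\u\L$1/;    # first word is always capitalized
    s/(\pL[\pL']*)$/\u\L$1/;    # last word is always capitalized
    return $_;
}

print tc($_), "\n" for
    "a walk through the woods",
    "the view from above",
    "over the rainbow";
```

This prints "A Walk Through the Woods", "The View from Above", and "Over the Rainbow": "through" and "above" are too long for the stop list, while "over" is lowercased by the first pass and then raised again only because it leads the title. Setting $max_len to 3 or 5 approximates the other style guides.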
This kind of approach can take you only so far, though, because it
doesn't distinguish between words that can be several parts of
speech. Some prepositions on the list might also double as words that
should always be capitalized, such as subordinating conjunctions,
adverbs, or even adjectives. For example, it's "Down by the
Riverside" but "Getting By on Just $30 a Day", or "A Ringing in My
Ears" but "Bringing In the Sheaves".
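You can see the limitation by running the examples through a stop-list titlecaser: "by" and "in" are lowercased whether they act as prepositions or adverbs. A regex can't tell the difference, so the sketch below only patches specific cases from a hand-maintained list. The @phrasal list and tc_phrasal() are hypothetical names of ours, and tc() here is a trimmed copy with only the stop-list and first/last-word rules.

```perl
use strict;
use warnings;

my %nocap = map { $_ => 1 } qw(
    a an the and but or
    as at by for from in into of off on onto per to with
);

# Trimmed tc(): stop-list handling plus the first/last-word rules.
sub tc {
    local $_ = shift;
    s/(\pL[\pL']*)/$nocap{lc $1} ? lc($1) : ucfirst(lc($1))/ge;
    s/^(\pL[\pL']*)/\u\L$1/;    # first word is always capitalized
    s/(\pL[\pL']*)$/\u\L$1/;    # last word is always capitalized
    return $_;
}

# Hypothetical exception list: verb + particle pairs in which the particle
# works as an adverb and should therefore be capitalized.
my @phrasal = (
    [ getting  => 'by' ],
    [ bringing => 'in' ],
);

sub tc_phrasal {
    my $title = tc(shift);
    for my $pair (@phrasal) {
        my ($verb, $particle) = @$pair;
        # Raise the particle only when it directly follows its verb.
        $title =~ s/\b(\Q$verb\E\s+)(\Q$particle\E)\b/$1\u\L$2/gi;
    }
    return $title;
}

for my $title ('getting by on just $30 a day', 'bringing in the sheaves',
               'down by the riverside', 'a ringing in my ears') {
    print tc($title), " => ", tc_phrasal($title), "\n";
}
```

Plain tc() gives "Getting by on Just $30 a Day" and "Bringing in the Sheaves"; the patched version gives "Getting By on Just $30 a Day" and "Bringing In the Sheaves", and leaves the other two titles alone. Every new phrase has to be added by hand, which is exactly the "only so far" problem.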
Another consideration is that you might prefer to apply the \u or
ucfirst conversion by itself, without also putting the whole string
into lowercase. That way a word that's already in all capital letters,
such as an acronym, doesn't lose that trait. You probably wouldn't
want to convert "FBI" and "LBJ" into "Fbi" and "Lbj".
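A minimal sketch of that variant follows. tc_keep_caps() is a hypothetical name, and it is one possible reading of the idea: stop-list words are still forced to lowercase, but every other word only has its first letter raised with ucfirst, and the parenthesis and colon rules are left out for brevity.

```perl
use strict;
use warnings;

my %nocap = map { $_ => 1 } qw(
    a an the and but or
    as at by for from in into of off on onto per to with
);

# Stop-list words are lowercased; other words get ucfirst only (no lc),
# so words already in capitals, such as acronyms, keep them.
sub tc_keep_caps {
    local $_ = shift;
    s/(\pL[\pL']*)/$nocap{lc $1} ? lc($1) : ucfirst($1)/ge;
    s/^(\pL[\pL']*)/\u$1/;    # first word is always capitalized
    s/(\pL[\pL']*)$/\u$1/;    # last word is always capitalized
    return $_;
}

print tc_keep_caps($_), "\n" for
    "the FBI and the LBJ ranch",
    "THE LORD OF THE RINGS";
```

The first title comes out as "The FBI and the LBJ Ranch". The second shows the cost: it becomes "The LORD of the RINGS", because capitals in the input are now trusted. If your input casing is unreliable, the full lowercasing of tc() is the safer default.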
1.14.4 See Also
The uc, lc, ucfirst, and lcfirst functions in perlfunc(1) and
Chapter 29 of Programming Perl; the \L, \U, \l, and \u string escapes
in the "Quote and Quote-like Operators" section of perlop(1) and
Chapter 5 of Programming Perl.