Source code editor What Is Ajax
↑
There are default settings for character sets and collations at four levels: server, database, table, and column. The following description may appear complex, but it has been found in practice that multiple-level defaulting leads to natural and obvious results.
CHARACTER SET
is used in clauses that specify a character set. CHARSET
may be used as a synonym for CHARACTER SET
.
MySQL Server has a server character set and a server collation. These can be set at server startup and changed at runtime.
Initially, the server character set and collation depend on the options that you use when you start mysqld. You can use --character-set-server
for the character set. Along with it, you can add --collation-server
for the collation. If you don't specify a character set, that is the same as saying --character-set-server=latin1
. If you specify only a character set (for example, latin1
) but not a collation, that is the same as saying --character-set-server=latin1
--collation-server=latin1_swedish_ci
because latin1_swedish_ci
is the default collation for latin1
. Therefore, the following three commands all have the same effect:
shell>mysqld
shell>mysqld --character-set-server=latin1
shell>mysqld --character-set-server=latin1 \
--collation-server=latin1_swedish_ci
One way to change the settings is by recompiling. If you want to change the default server character set and collation when building from sources, use: --with-charset
and --with-collation
as arguments for configure. For example:
shell> ./configure --with-charset=utf-8
Or:
shell>./configure --with-charset=utf-8 \
--with-collation=latin1_german1_ci
Both mysqld and configure verify that the character set/collation combination is valid. If not, each program displays an error message and terminates.
The current server character set and collation can be determined from the values of the character_set_server
and collation_server
system variables. These variables can be changed at runtime.
Every database has a database character set and a database collation. The CREATE DATABASE
and ALTER DATABASE
statements have optional clauses for specifying the database character set and collation:
CREATE DATABASEdb_name
[[DEFAULT] CHARACTER SETcharset_name
] [[DEFAULT] COLLATEcollation_name
] ALTER DATABASEdb_name
[[DEFAULT] CHARACTER SETcharset_name
] [[DEFAULT] COLLATEcollation_name
]
The keyword SCHEMA
can be used instead of DATABASE
.
All database options are stored in a text file named db.opt
that can be found in the database directory.
The CHARACTER SET
and COLLATE
clauses make it possible to create databases with different character sets and collations on the same MySQL server.
Example:
CREATE DATABASE db_name
CHARACTER SET latin1 COLLATE latin1_swedish_ci;
MySQL chooses the database character set and database collation in the following manner:
If both CHARACTER SET
and X
COLLATE
were specified, then character set Y
X
and collation Y
.
If CHARACTER SET
was specified without X
COLLATE
, then character set X
and its default collation.
If COLLATE
was specified without Y
CHARACTER SET
, then the character set associated with Y
and collation Y
.
Otherwise, the server character set and server collation.
The database character set and collation are used as default values if the table character set and collation are not specified in CREATE TABLE
statements. They have no other purpose.
The character set and collation for the default database can be determined from the values of the character_set_database
and collation_database
system variables. The server sets these variables whenever the default database changes. If there is no default database, the variables have the same value as the corresponding server-level system variables, character_set_server
and collation_server
.
Every table has a table character set and a table collation. The CREATE TABLE
and ALTER TABLE
statements have optional clauses for specifying the table character set and collation:
CREATE TABLEtbl_name
(column_list
) [[DEFAULT] CHARACTER SETcharset_name
] [COLLATEcollation_name
]] ALTER TABLEtbl_name
[[DEFAULT] CHARACTER SETcharset_name
] [COLLATEcollation_name
]
Example:
CREATE TABLE t1 ( ... ) CHARACTER SET latin1 COLLATE latin1_danish_ci;
MySQL chooses the table character set and collation in the following manner:
If both CHARACTER SET
and X
COLLATE
were specified, then character set Y
X
and collation Y
.
If CHARACTER SET
was specified without X
COLLATE
, then character set X
and its default collation.
If COLLATE
was specified without Y
CHARACTER SET
, then the character set associated with Y
and collation Y
.
Otherwise, the database character set and collation.
The table character set and collation are used as default values if the column character set and collation are not specified in individual column definitions. The table character set and collation are MySQL extensions; there are no such things in standard SQL.
Every “character” column (that is, a column of type CHAR
, VARCHAR
, or TEXT
) has a column character set and a column collation. Column definition syntax has optional clauses for specifying the column character set and collation:
col_name
{CHAR | VARCHAR | TEXT} (col_length
) [CHARACTER SETcharset_name
] [COLLATEcollation_name
]
Example:
CREATE TABLE Table1 ( column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_german1_ci );
MySQL chooses the column character set and collation in the following manner:
If both CHARACTER SET
and X
COLLATE
were specified, then character set Y
X
and collation Y
are used.
If CHARACTER SET
was specified without X
COLLATE
, then character set X
and its default collation are used.
If COLLATE
was specified without Y
CHARACTER SET
, then the character set associated with Y
and collation Y
.
Otherwise, the table character set and collation are used.
The CHARACTER SET
and COLLATE
clauses are standard SQL.
Every character string literal has a character set and a collation.
A character string literal may have an optional character set introducer and COLLATE
clause:
[_charset_name
]'string
' [COLLATEcollation_name
]
Examples:
SELECT 'string
'; SELECT _latin1'string
'; SELECT _latin1'string
' COLLATE latin1_danish_ci;
For the simple statement SELECT '
, the string has the character set and collation defined by the string
'character_set_connection
and collation_connection
system variables.
The _
expression is formally called an introducer. It tells the parser, “the string that is about to follow uses character set charset_name
X
.” Because this has confused people in the past, we emphasize that an introducer does not cause any conversion; it is strictly a signal that does not change the string's value. An introducer is also legal before standard hex literal and numeric hex literal notation (x'
and literal
'0x
)>. nnnn
Examples:
SELECT _latin1 x'AABBCC'; SELECT _latin1 0xAABBCC;
MySQL determines a literal's character set and collation in the following manner:
If both _X
and COLLATE
were specified, then character set Y
X
and collation Y
are used.
If _X
is specified but COLLATE
is not specified, then character set X
and its default collation are used.
Otherwise, the character set and collation given by the character_set_connection
and collation_connection
system variables are used.
Examples:
A string with latin1
character set and latin1_german1_ci
collation:
SELECT _latin1'Mьller' COLLATE latin1_german1_ci;
A string with latin1
character set and its default collation (that is, latin1_swedish_ci
):
SELECT _latin1'Mьller';
A string with the connection default character set and collation:
SELECT 'Mьller';
Character set introducers and the COLLATE
clause are implemented according to standard SQL specifications.
An introducer indicates the character set for the following string, but does not change now how the parser performs escape processing within the string. Escapes are always interpreted by the parser according to the character set given by character_set_connection
.
The following examples show that escape processsing occurs using character_set_connection
even in the presence of an introducer. The examples use SET NAMES
(which changes character_set_connection
, as discussed in Section 10.4, “Connection Character Sets and Collations”), and display the resulting strings using the HEX()
function so that the exact string contents can be seen.
Example 1:
mysql>SET NAMES latin1;
Query OK, 0 rows affected (0.01 sec) mysql>SELECT HEX('а\n'), HEX(_sjis'а\n');
+------------+-----------------+ | HEX('а\n') | HEX(_sjis'а\n') | +------------+-----------------+ | E00A | E00A | +------------+-----------------+ 1 row in set (0.00 sec)
Here, ‘а
’ (hex value E0
) is followed by ‘\n
’, the escape sequence for newline. The escape sequence is interpreted using the character_set_connection
value of latin1
to produce a literal newline (hex value 0A
). This happens even for the second string. That is, the introducer of _sjis
does not affect the parser's escape processing.
Example 2:
mysql>SET NAMES sjis;
Query OK, 0 rows affected (0.00 sec) mysql>SELECT HEX('а\n'), HEX(_latin1'а\n');
+------------+-------------------+ | HEX('а\n') | HEX(_latin1'а\n') | +------------+-------------------+ | E05C6E | E05C6E | +------------+-------------------+ 1 row in set (0.04 sec)
Here, character_set_connection
is sjis
, a character set in which the sequence of ‘а
’ followed by ‘\
’ (hex values 05
and 5C
) is a valid multi-byte character. Hence, the first two bytes of the string are interpreted as a single sjis
character, and the ‘\
’ is not intrepreted as an escape character. The following ‘n
’ (hex value 6E
) is not interpreted as part of an escape sequence. This is true even for the second string; the introducer of _latin1
does not affect escape processing.
Standard SQL defines NCHAR
or NATIONAL CHAR
as a way to indicate that a CHAR
column should use some predefined character set. MySQL 5.0 uses utf8
as this predefined character set. For example, these data type declarations are equivalent:
CHAR(10) CHARACTER SET utf8 NATIONAL CHARACTER(10) NCHAR(10)
As are these:
VARCHAR(10) CHARACTER SET utf8 NATIONAL VARCHAR(10) NCHAR VARCHAR(10) NATIONAL CHARACTER VARYING(10) NATIONAL CHAR VARYING(10)
You can use N'
to create a string in the national character set. These two statements are equivalent:literal
'
SELECT N'some text'; SELECT _utf8'some text';
For information on upgrading character sets to MySQL 5.0 from versions prior to 4.1, see the MySQL 3.23, 4.0, 4.1 Reference Manual.
The following examples show how MySQL determines default character set and collation values.
Example 1: Table and Column Definition
CREATE TABLE t1 ( c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci ) DEFAULT CHARACTER SET latin2 COLLATE latin2_bin;
Here we have a column with a latin1
character set and a latin1_german1_ci
collation. The definition is explicit, so that's straightforward. Notice that there is no problem with storing a latin1
column in a latin2
table.
Example 2: Table and Column Definition
CREATE TABLE t1 ( c1 CHAR(10) CHARACTER SET latin1 ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
This time we have a column with a latin1
character set and a default collation. Although it might seem natural, the default collation is not taken from the table level. Instead, because the default collation for latin1
is always latin1_swedish_ci
, column c1
has a collation of latin1_swedish_ci
(not latin1_danish_ci
).
Example 3: Table and Column Definition
CREATE TABLE t1 ( c1 CHAR(10) ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
We have a column with a default character set and a default collation. In this circumstance, MySQL checks the table level to determine the column character set and collation. Consequently, the character set for column c1
is latin1
and its collation is latin1_danish_ci
.
Example 4: Database, Table, and Column Definition
CREATE DATABASE d1 DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci; USE d1; CREATE TABLE t1 ( c1 CHAR(10) );
We create a column without specifying its character set and collation. We're also not specifying a character set and a collation at the table level. In this circumstance, MySQL checks the database level to determine the table settings, which thereafter become the column settings.) Consequently, the character set for column c1
is latin2
and its collation is latin2_czech_ci
.