JavaScript Editor Source code editor     What Is Ajax 


Main Page

10.4. Connection Character Sets and Collations

Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections:

Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation system variables.

Consider what a “connection” is: It's what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets, over the connection back to the client. This leads to several questions about character set and collation handling for client connections, each of which can be answered in terms of system variables:

You can fine-tune the settings for these variables, or you can depend on the defaults (in which case, you can skip the rest of this section).

There are two statements that affect the connection character sets:

SET NAMES 'charset_name'
SET CHARACTER SET charset_name

SET NAMES indicates what character set the client will use to send SQL statements to the server. Thus, SET NAMES 'cp1251' tells the server “future incoming messages from this client are in character set cp1251.” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)

A SET NAMES 'x' statement is equivalent to these three statements:

SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;

Setting character_set_connection to x also sets collation_connection to the default collation for x. It is not necessary to set that collation explicitly. To specify a particular collation for the character sets, use the optional COLLATE clause:

SET NAMES 'charset_name' COLLATE 'collation_name'

SET CHARACTER SET is similar to SET NAMES but sets character_set_connection and collation_connection to character_set_database and collation_database. A SET CHARACTER SET x statement is equivalent to these three statements:

SET character_set_client = x;
SET character_set_results = x;
SET collation_connection = @@collation_database;

Setting collation_connection also sets character_set_connection to the character set associated with the collation (equivalent to executing SET character_set_connection = @@character_set_database). It is not necessary to set character_set_connection explicitly.

When a client connects, it sends to the server the name of the character set that it wants to use. The server uses the name to set the character_set_client, character_set_results, and character_set_connection system variables. In effect, the server performs a SET NAMES operation using the character set name.

With the mysql client, it is not necessary to execute SET NAMES every time you start up if you want to use a character set different from the default. You can add the --default-character-set option setting to your mysql statement line, or in your option file. For example, the following option file setting changes the three character set variables set to koi8r each time you invoke mysql:

[mysql]
default-character-set=koi8r

If you are using the mysql client with auto-reconnect enabled (which is not recommended), it is preferable to use the charset command rather than SET NAMES. For example:

mysql> charset utf8
Charset changed

The charset command issues a SET NAMES statement, and also changes the default character set that is used if mysql reconnects after the connection has dropped.

Example: Suppose that column1 is defined as CHAR(5) CHARACTER SET latin2. If you do not say SET NAMES or SET CHARACTER SET, then for SELECT column1 FROM t, the server sends back all the values for column1 using the character set that the client specified when it connected. On the other hand, if you say SET NAMES 'latin1' or SET CHARACTER SET latin1 before issuing the SELECT statement, the server converts the latin2 values to latin1 just before sending results back. Conversion may be lossy if there are characters that are not in both character sets.

If you do not want the server to perform any conversion of result sets, set character_set_results to NULL:

SET character_set_results = NULL;

Note: Currently, UCS-2 cannot be used as a client character set, which means that SET NAMES 'ucs2' does not work.

To see the values of the character set and collation system variables that apply to your connection, use these statements:

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

You must also consider the environment within which your MySQL application executes. For example, if you will send statements using UTF-8 test taken from a file that you create in an editor, you should edit the file with the locale of your environment set to UTF-8 so that the file's encoding is correct and so that the operating system handles it correctly. For a script that executes in a Web environment, the script must handle the character encoding properly for its interaction with the MySQL server, and it must generate pages that correctly indicate the encoding so that browsers know now to display the content of the pages.

JavaScript Editor Source code editor     What Is Ajax