This page last changed on Mar 14, 2007 by don.willis@atlassian.com.

The database used with Confluence should be configured to use the same character encoding as Confluence. The recommended encoding is Unicode UTF-8.

There are two places where character encoding may need to be configured:

  • when creating the database
  • when connecting to the database (JDBC connection URL or properties).

The configuration details for each type of database are different. Some examples are below.

JDBC connection settings

MySQL

Append "useUnicode=true" to your JDBC URL:

jdbc:mysql://hostname:port/database?autoReconnect=true&useUnicode=true
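As a rough sketch, the URL above can be assembled from its parts with a small shell helper. The function name build_jdbc_url and the example host are hypothetical. The extra characterEncoding=UTF-8 parameter is the one reported in the comments on this page as necessary when the MySQL server's own encoding is not UTF-8; it is included here as an assumption, not as a required setting.

```shell
# Hypothetical helper: prints a MySQL JDBC URL with the Unicode settings appended.
# $1 = hostname, $2 = port, $3 = database name
build_jdbc_url() {
  printf 'jdbc:mysql://%s:%s/%s?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8\n' "$1" "$2" "$3"
}

# Example (db.example.com is a placeholder host):
#   build_jdbc_url db.example.com 3306 confluence
```

If the URL is stored in an XML configuration file, each & has to be written as &amp;.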

Creating a UTF-8 database

MySQL

CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_general_ci;

Use the status command to obtain database character encoding information.

For more information see the MySQL documentation.
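The encoding information can also be checked from a script. The following is a minimal sketch, assuming the mysql command-line client is installed and the database is named confluence. SHOW VARIABLES LIKE 'character_set%' lists the encodings in use, and the hypothetical helper non_utf8_charsets prints the name of each one that is not UTF-8.

```shell
# Hypothetical helper: reads tab-separated "name<TAB>value" lines on stdin
# (the batch output of SHOW VARIABLES) and prints the names of the
# client, connection, database, results and server character sets
# whose value does not start with "utf8".
non_utf8_charsets() {
  awk -F '\t' '$1 ~ /^character_set_(client|connection|database|results|server)$/ && $2 !~ /^utf8/ { print $1 }'
}

# Usage (assumes a reachable server; -N drops the header row, -B gives tab-separated output):
#   mysql -u <username> -p -N -B -e "SHOW VARIABLES LIKE 'character_set%'" confluence | non_utf8_charsets
```

No output means all five character sets are already UTF-8.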

PostgreSQL

CREATE DATABASE confluence WITH ENCODING 'UNICODE';

Or from the command-line:

$ createdb -E UNICODE confluence

For more information see the PostgreSQL documentation.

For PostgreSQL running under Windows

Please note that under Microsoft Windows, international character sets are fully supported and functional only in PostgreSQL 8.1 and above.

For PostgreSQL running under Linux

Please check the following to ensure proper handling of international characters in your database:

When PostgreSQL creates an initial database cluster, it sets certain important configuration options based on the host environment. The command responsible for creating the PostgreSQL environment, initdb, checks environment variables such as LC_CTYPE and LC_COLLATE (or the more general LC_ALL) for settings to use as database defaults related to international string handling. It is therefore important to make sure that your PostgreSQL environment is configured correctly before you install Confluence.

To do this, connect to your PostgreSQL instance using psql and issue the following command:

SHOW LC_CTYPE;

If LC_CTYPE is set to either "C" or "POSIX" then certain string functions such as converting to and from upper and lower case will not work correctly with international characters. Correct settings for this value take the form <LOCALE>.<ENCODING> (en_AU.UTF8 for example).
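The rule above can be sketched as a small shell check. The helper name lc_ctype_status and its three result words are hypothetical. It treats only UTF-8 encodings as "ok", on the assumption that the database is meant to use the UTF-8 encoding recommended on this page.

```shell
# Hypothetical helper: classifies an LC_CTYPE value as reported by SHOW LC_CTYPE.
# Prints "no-locale" for C or POSIX, "ok" for <LOCALE>.<ENCODING> values whose
# encoding is UTF-8, and "not-utf8" for anything else.
lc_ctype_status() {
  case "$1" in
    C|POSIX) echo no-locale ;;
    *.UTF8|*.UTF-8|*.utf8|*.utf-8) echo ok ;;
    *) echo not-utf8 ;;
  esac
}

# Usage (assumes psql can connect; -A and -t print the bare value only):
#   lc_ctype_status "$(psql -At -U <username> -c 'SHOW LC_CTYPE;' confluence)"
```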

If your LC_CTYPE is incorrect, please check the PostgreSQL documentation for information on configuring database localisation. These settings are not easy to change once a database already contains data.

Updating an existing database to UTF-8

MySQL database with existing data

Before proceeding with the following changes, please back up your database.

This example shows how to change your database from latin1 to utf8.

  1. Dump the database to a text file using the mysqldump tool from the command line:
    mysqldump -p --default-character-set=latin1 -u <username> --skip-set-charset confluence > confluence_database.sql
  2. Open the SQL file in a text editor and change all character set declarations from 'latin1' to 'utf8'.
  3. Copy the edited file, so that the next step works on the copy:
    cp confluence_database.sql confluence_utf8.sql
  4. Encode all the latin1 characters as UTF-8:
    recode latin1..utf8 confluence_utf8.sql
    (The recode utility is available from http://directory.fsf.org/recode.html.)
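Steps 2 to 4 above could also be scripted. The following is only a sketch: the helper name convert_dump_to_utf8 is hypothetical, iconv is used in place of recode, and sed rewrites just the two most common declarations (CHARSET=latin1 and CHARACTER SET latin1). Collation names such as latin1_swedish_ci, and any page content that happens to contain those exact strings, still need to be reviewed by hand.

```shell
# Hypothetical helper covering steps 2 to 4 for a dump made in step 1.
# $1 = latin1 dump written by mysqldump, $2 = UTF-8 file to create
convert_dump_to_utf8() {
  # Re-encode the bytes first (iconv is swapped in here for recode).
  iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2.tmp" || return 1
  # Then rewrite the two most common character set declarations.
  sed -e 's/CHARSET=latin1/CHARSET=utf8/g' \
      -e 's/CHARACTER SET latin1/CHARACTER SET utf8/g' "$2.tmp" > "$2" || return 1
  rm -f "$2.tmp"
}

# Usage:
#   convert_dump_to_utf8 confluence_database.sql confluence_utf8.sql
```

Unlike recode, which converts the file in place, this leaves the original dump untouched and writes the result to a new file.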

In MySQL:

  1. DROP DATABASE confluence;
  2. CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_general_ci;

Finally, reimport the UTF-8 text file:

  1. mysql -p --default-character-set=utf8 --max_allowed_packet=64M confluence < /home/confluence/confluence_utf8.sql

To support large imports, the '--max_allowed_packet=64M' parameter used above raises the maximum size of a single SQL statement to 64 MB. In some circumstances you may need to increase it further, especially if attachments are stored in the database.
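One rough way to choose a value is to measure the longest line in the dump. This is only a sketch, and it assumes that each SQL statement sits on a single line, as mysqldump normally writes them. The helper name longest_statement_bytes is hypothetical.

```shell
# Hypothetical helper: prints the length in bytes of the longest line in a dump file.
# LC_ALL=C makes awk count bytes rather than multi-byte characters.
# $1 = SQL dump file
longest_statement_bytes() {
  LC_ALL=C awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}

# Usage:
#   longest_statement_bytes confluence_utf8.sql
```

If the printed number approaches 64 MB (67108864 bytes), a larger --max_allowed_packet value is likely to be needed.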

Testing database encoding

See Troubleshooting Character Encodings for a number of tests you can run to ensure your database encoding is correct.

Related Documentation

Known Issues for MySQL


MySQL screenshot.JPG (image/jpeg)

I had a problem with the character encoding.

I have added &characterEncoding=UTF-8 to the JDBC-URL to solve the problem.

Thanks to the Atlassian support team.

Posted by seyfert@gdsys.de at Jul 04, 2006 00:35

This is necessary with MySQL when the server's encoding is not UTF-8 and the database's is UTF-8. Regardless of the database encoding, it appears the MySQL JDBC drivers use the server's encoding for certain operations.

If you have the choice, change the server's encoding to UTF-8 instead.

You can check your server and database encoding with the status command in MySQL.

Posted by mryall at Jul 05, 2006 02:46

Quoting:

Finally, reimport the UTF-8 text file:

  1. mysql -p --default-character-set=utf8 --max_allowed_packet=64M confluence < /home/confluence/confluence_utf8.sql

For large imports, add 'max_allowed_packet=32M' under mysqld in /etc/my.cnf.

Is that right? For larger imports you want to try and reduce the maximum allowed packet size compared to the standard suggested command? Wouldn't the command line override the /etc/my.cnf file anyway?

Posted by dhardiker@adaptavist.com at Jul 18, 2006 05:34

Thanks for spotting that, Dan. I've updated it to be a bit clearer.

Posted by mryall at Jul 19, 2006 02:11

I am setting up Confluence to use MS SQL Server 2000. How do I create a SQL Server 2000 database with UTF-8 encoding? I created a database using the default settings, but it failed the "Character Encoding test". It seems that Microsoft uses COLLATE rather than encoding in its terminology, but I can't find Unicode or UTF8 as an option.

Please advise, thanks.

Posted by bsong@conair.ca at Aug 03, 2006 14:58

Microsoft SQL Server supports Unicode by default in new databases, but you may need to fix your collation settings so the case-sensitivity test doesn't fail.

I noticed you raised a support case for this issue. We will respond to you there.

Posted by mryall at Aug 04, 2006 02:13

The cause of this issue has been patched. See CONF-6742 for details.

Posted by david.soul@atlassian.com at Sep 04, 2006 00:13
Document generated by Confluence on Mar 22, 2007 20:58