How to convert old binary MediaWiki tables to UTF-8

I’ve been using Mediawiki for a long time now. One problem about it is binary tables created by very old (~year 2007) version of MediaWiki, text columns are latin1 encoding and latin1_bin collation. If you try to run them with current version of MediaWiki, you get following error:
Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=' ()

I’ve used following hack in includes/db/DatabaseMysqli.php file to fix it:
$mysqli->query( 'SET names binary' );
But I felt that it’s time to finally fix my MediaWiki MySQL tables.
Please remember to backup your database before trying that!
How I did it (finally!):

ALTER TABLE `archive` MODIFY `ar_title` varbinary(255);
ALTER TABLE `archive` MODIFY `ar_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `archive` MODIFY `ar_user_text` varbinary(255);
ALTER TABLE `archive` MODIFY `ar_user_text` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `category` MODIFY `cat_title` varbinary(255);
ALTER TABLE `category` MODIFY `cat_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `categorylinks` MODIFY `cl_to` varbinary(255);
ALTER TABLE `categorylinks` MODIFY `cl_to` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `categorylinks` MODIFY `cl_sortkey_prefix` varbinary(255);
ALTER TABLE `categorylinks` MODIFY `cl_sortkey_prefix` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `filearchive` MODIFY `fa_name` varbinary(255);
ALTER TABLE `filearchive` MODIFY `fa_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `filearchive` MODIFY `fa_archive_name` varbinary(255);
ALTER TABLE `filearchive` MODIFY `fa_archive_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `filearchive` MODIFY `fa_user_text` varbinary(255);
ALTER TABLE `filearchive` MODIFY `fa_user_text` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `image` MODIFY `img_name` varbinary(255);
ALTER TABLE `image` MODIFY `img_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `image` MODIFY `img_user_text` varbinary(255);
ALTER TABLE `image` MODIFY `img_user_text` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `imagelinks` MODIFY `il_to` varbinary(255);
ALTER TABLE `imagelinks` MODIFY `il_to` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `interwiki` MODIFY `iw_prefix` varbinary(32);
ALTER TABLE `interwiki` MODIFY `iw_prefix` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `ipblocks` MODIFY `ipb_by_text` varbinary(255);
ALTER TABLE `ipblocks` MODIFY `ipb_by_text` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `job` MODIFY `job_title` varbinary(255);
ALTER TABLE `job` MODIFY `job_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `langlinks` MODIFY `ll_title` varbinary(255);
ALTER TABLE `langlinks` MODIFY `ll_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `logging` MODIFY `log_title` varbinary(255);
ALTER TABLE `logging` MODIFY `log_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `logging` MODIFY `log_comment` varbinary(255);
ALTER TABLE `logging` MODIFY `log_comment` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `logging` MODIFY `log_user_text` varbinary(255);
ALTER TABLE `logging` MODIFY `log_user_text` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `oldimage` MODIFY `oi_name` varbinary(255);
ALTER TABLE `oldimage` MODIFY `oi_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `oldimage` MODIFY `oi_archive_name` varbinary(255);
ALTER TABLE `oldimage` MODIFY `oi_archive_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `oldimage` MODIFY `oi_user_text` varbinary(255);
ALTER TABLE `oldimage` MODIFY `oi_user_text` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `page` MODIFY `page_title` varbinary(255) NOT NULL;
ALTER TABLE `page` MODIFY `page_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
ALTER TABLE `pagelinks` MODIFY `pl_title` varbinary(255) NOT NULL;
ALTER TABLE `pagelinks` MODIFY `pl_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
ALTER TABLE `protected_titles` MODIFY `pt_title` varbinary(255);
ALTER TABLE `protected_titles` MODIFY `pt_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `querycache` MODIFY `qc_title` varbinary(255);
ALTER TABLE `querycache` MODIFY `qc_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `querycachetwo` MODIFY `qcc_title` varbinary(255);
ALTER TABLE `querycachetwo` MODIFY `qcc_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `querycachetwo` MODIFY `qcc_titletwo` varbinary(255);
ALTER TABLE `querycachetwo` MODIFY `qcc_titletwo` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `recentchanges` MODIFY `rc_user_text` varbinary(255);
ALTER TABLE `recentchanges` MODIFY `rc_user_text` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `recentchanges` MODIFY `rc_title` varbinary(255);
ALTER TABLE `recentchanges` MODIFY `rc_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `recentchanges` MODIFY `rc_comment` varbinary(255);
ALTER TABLE `recentchanges` MODIFY `rc_comment` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `redirect` MODIFY `rd_title` varbinary(255);
ALTER TABLE `redirect` MODIFY `rd_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `redirect` MODIFY `rd_fragment` varbinary(255);
ALTER TABLE `redirect` MODIFY `rd_fragment` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `revision` MODIFY `rev_user_text` varbinary(255);
ALTER TABLE `revision` MODIFY `rev_user_text` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `templatelinks` MODIFY `tl_title` varbinary(255);
ALTER TABLE `templatelinks` MODIFY `tl_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `user` MODIFY `user_name` varbinary(255);
ALTER TABLE `user` MODIFY `user_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `user` MODIFY `user_real_name` varbinary(255);
ALTER TABLE `user` MODIFY `user_real_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
ALTER TABLE `watchlist` MODIFY `wl_title` varbinary(255);
ALTER TABLE `watchlist` MODIFY `wl_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';

I guess you’ll want to run maintenance/update.php after doing that.

4 comments.

  1. Brischel says:

    If all I need to do is convert a WordPress DB from Latin1 to UTF-8, can I splmiy run this as a MySQL Query?ALTER TABLE categories CHARACTER SET utf8 COLLATE utf8_unicode_ci, CHANGE title title VARBINARY(255)ALTER TABLE categories CHANGE title title VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci

  2. Romnik says:

    This works great, unless you need to reitan the unicode characters. The conversion to VARCHAR rips that away. (or more accurately, converts it to the ascii character that occupies the same byte position in the ascii character table)Not really a solution for dealing with data in an international user environment.

    • Nap says:

      varchar/latin1 -> varbinary -> varchar/utf8 conversion is binary safe and is safe for international wiki. I did test it myself on my wiki with lots of non-ascii unicode characters before posting this.

Add comment


%d bloggers like this: