Thursday, 5 March 2009

The Firebird database: Problem handling UTF8 characters

The 'Latin capital letter I with dot above', İ (Unicode 0130), strikes again! This innocent looking Turkish character seems to be reliable when it comes to breaking software that should be able to handle UTF8. (See also this post for a Java example.)

This time it breaks the Firebird database (in my case, v2.1.1 on a 64-bit Debian system). Downcasing some random characters in a database configured to handle UTF8 works fine:

SELECT LOWER('AӴЁΪΣƓ') FROM RDB$DATABASE

returns the expected string, aӵёϊσɠ.

However, when you throw in the trouble-making İ, everything blows up:

SELECT LOWER('AӴЁΪΣƓİ') FROM RDB$DATABASE
*** IBPP::SQLException ***
Context: Statement::Fetch
Message: isc_dsql_fetch failed.

SQL Message : -104
Invalid token

Engine Code : 335544849
Engine Message :
Malformed string

Slightly different input, generates a different error message:
SELECT LOWER('İA') FROM RDB$DATABASE
*** IBPP::SQLException ***
Context: Statement::Fetch
Message: isc_dsql_fetch failed.

SQL Message : -802
Arithmetic overflow or division by zero has occurred.

Engine Code : 335544321
Engine Message :
arithmetic exception, numeric overflow, or string truncation

There is an item on the Firebird user list, but without any answers so far.

Update: As mariuz points out in a comment below, this defect now seems to be fixed in an upcoming version. See this bug tracker item.

4 comments:

mariuz said...

I have tested on Ubuntu with flamerobin 0.9
x64 with firebird 2.1 your queries and all went ok
Database charset was set to none
Could you tell me the exact version of Debian you use and
versions for wxwidgets , flamerobin ?

Nikolaj Lindberg said...

Hi mariuz,

I don't think Flamerobin has anything to do with this.

I can reproduce the problem on my laptop, running Debian Lenny, using the standard Firebird 2.0 package. Check out the following isql-fb sessions:


SQL> create database 'localhost:/tmp/tr_test.gdb' default character set UTF8;

...

nikolaj@demo:~$ isql-fb -user sysdba -password topsecret -ch utf8
Use CONNECT or CREATE DATABASE to specify a database
SQL> connect '/tmp/tr_test.gdb';
Database: '/tmp/tr_test.gdb', User: sysdba
SQL> SELECT LOWER('M-CM-') FROM RDB$DATABASE;

LOWER
======
ä

SQL> SELECT LOWER('M-DM-0') FROM RDB$DATABASE;

LOWER
======
Statement failed, SQLCODE = -104
Malformed string
SQL>

(Sorry for the strange characters in the input, but isql-fb cannot show these characters correctly. In the first case, the command is LOWER('Ä'), in the second case, it is LOWER('İ').)

mariuz said...

seems to be fixed in 2.1.3

http://tracker.firebirdsql.org/browse/CORE-2355

Anonymous said...

Hi,
I also try to download data from a firebird db to import into Oracle 11g database.
Text files I create with isql statements are terrible with Turkish characters.
Please tell me if you have found a solution to this.
By the way, I am Metin from Istanbul
irmakm_sgc@hotmail.com
Thanks anyway.