Opened 11 years ago

Closed 11 years ago

#1493 closed Defect (Fixed)

Incompatible encoding on certain message in the channel (when using Cyrillic - Windows)

Reported by: snaury@…
Owned by: timothy
Component: Colloquy (Mac)
Version: 2.2 (Mac)
Severity: Normal
Keywords:
Cc: snaury@…

Description

I'm currently using the Cyrillic - Windows encoding on RusNet, and once in a while someone says "всё" on the channel, which shows up as "incompatible encoding" in the channel window. The strange thing is that in the console it sometimes shows as garbage and sometimes shows correctly, as the Russian word "всё".

The Russian word "всё" is E2 F1 B8 in codepage 1251, and I have no idea how these three characters, said alone by someone, can trigger such a bug. If anything else follows the word, the bug doesn't trigger; it happens only when someone says this word alone in the channel.
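As a rough illustration of why these three bytes are interesting, the sketch below classifies a single byte by the role its top bits would give it in UTF-8. The `utf8_role` enum and `classify` function are hypothetical names made up for this example and are not part of Colloquy.

```c
/* Role a single byte would play inside a UTF-8 stream, judged only
   by its leading bits. Names are illustrative, not from Colloquy. */
typedef enum {
    ROLE_ASCII,        /* 0xxxxxxx */
    ROLE_CONTINUATION, /* 10xxxxxx */
    ROLE_LEAD2,        /* 110xxxxx: starts a 2-byte sequence */
    ROLE_LEAD3,        /* 1110xxxx: starts a 3-byte sequence */
    ROLE_LEAD4,        /* 11110xxx: starts a 4-byte sequence */
    ROLE_INVALID       /* 11111xxx: never appears in UTF-8 */
} utf8_role;

static utf8_role classify(unsigned char b) {
    if (b < 0x80)           return ROLE_ASCII;
    if ((b & 0xC0) == 0x80) return ROLE_CONTINUATION;
    if ((b & 0xE0) == 0xC0) return ROLE_LEAD2;
    if ((b & 0xF0) == 0xE0) return ROLE_LEAD3;
    if ((b & 0xF8) == 0xF0) return ROLE_LEAD4;
    return ROLE_INVALID;
}
```

Under this classification E2 looks like a 3-byte lead, F1 looks like a 4-byte lead, and B8 looks like a continuation byte. So the CP1251 bytes begin and end like a 3-byte UTF-8 sequence, but the middle byte is not a continuation byte, which means the sequence as a whole is not valid UTF-8.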

Is there some kind of encoding autodetection that gets mixed up?

Change History (5)

comment:1 Changed 11 years ago by zach

Does this happen when you use UTF-8 encoding as well?

comment:2 Changed 11 years ago by snaury

No, it doesn't happen when I'm using UTF-8. But when I'm connected to a server using CP1251 (Cyrillic Windows), the simplest test case is running "/privmsg MyNickname :всё" in the server console. With CP1251 I see (incompatible encoding) in Colloquy 2.2.1.

comment:3 Changed 11 years ago by snaury

  • Cc snaury@… added

comment:4 Changed 11 years ago by snaury

I found that the problem is an error in the isValidUTF8 function that is used in initWithChatFormat (the whole UTF-8 autodetection here is blatantly wrong, in my opinion). The problem is that E2 is seen as the start of a triplet (E2 & F0 == E0), then it is not a long triplet, then the code checks whether the third character is a continuation character (B8 & C0 == 80), and voila: it thinks the sequence is valid UTF-8 when it's not! The thing here is that all characters following the lead byte must be continuation characters, and isValidUTF8 never checks that; in fact, for all UTF-8 byte lengths it never checks the second character, except in the 2-byte case. Please add a check of s[i + 1] in all cases, please!
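The difference described above can be sketched in C as follows. This is not Colloquy's actual isValidUTF8 and not the change made in r4407. `lax_is_valid_utf8` is a hypothetical reconstruction that, as an assumption, tests only the last byte of each multi-byte sequence. `strict_is_valid_utf8` tests every byte after the lead byte. Both are structural checks only: a complete validator would also reject overlong encodings, surrogate code points, and values above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if b has the form 10xxxxxx (a UTF-8 continuation byte). */
static bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

/* Sequence length implied by a lead byte, or 0 if b cannot start one. */
static size_t sequence_length(unsigned char b) {
    if (b < 0x80)           return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical lax check: only the LAST byte of each multi-byte
   sequence is tested, so a bad middle byte slips through. */
static bool lax_is_valid_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        size_t n = sequence_length(s[i]);
        if (n == 0 || n > len - i) return false;
        if (n > 1 && !is_continuation(s[i + n - 1])) return false;
        i += n;
    }
    return true;
}

/* Stricter check: EVERY byte after the lead byte must be a
   continuation byte, including s[i + 1]. */
static bool strict_is_valid_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        size_t n = sequence_length(s[i]);
        if (n == 0 || n > len - i) return false;
        for (size_t k = 1; k < n; k++) {
            if (!is_continuation(s[i + k])) return false;
        }
        i += n;
    }
    return true;
}
```

With the CP1251 bytes E2 F1 B8, the lax version accepts the input because B8 is a continuation byte, while the strict version rejects it because F1 is not. The genuine UTF-8 encoding of "всё" (D0 B2 D1 81 D1 91) passes both.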

comment:5 Changed 11 years ago by timothy

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in r4407.
