07 Mar, 2010, flumpy wrote in the 1st comment:
Votes: 0
Hey

I'm trying to write some foreign characters onto the user's screen using my annoying telnet library. It's coming out as garbage, and I guess it's doing something weird with the encoding.
For example:

Town Square  
Vous �tes debout dans la place de la ville et groovy tout va bien. C'est une belle journ�e et le ciel est clair.
Les gens bussle propos dans leur vie quotidienne. Vous �tes debout dans le centre-ville.
Vous voyez:Un paquet de flambeaux.


The lib I'm using is the telnetd library for java (well, my own bugfixed version), but I'm not sure if it's specifically a limitation of this library (not got the functionality) or something I could fix with a simple telnet instruction, or what.

Really, I'm a bit stumped. Can anyone help? What would you need to be able to help?
07 Mar, 2010, David Haley wrote in the 2nd comment:
Votes: 0
How are you writing the characters to the stream? Does this happen in all clients? (Or is it a client bug?) What if you print to stdout rather than to telnet? What character data is being received by the client?
07 Mar, 2010, flumpy wrote in the 3rd comment:
Votes: 0
Some background: I'm sending the text off to Google Translate (URL encoding it first) and then HTML unescaping the return values. I am sending the Accept-Charset header as UTF-8 (the same charset as the URL encoding).

I am using windows telnet and mushclient to test.

When I first output to stdout (using println) I got some weird chars too, because I was URL encoding the string beforehand with ISO-8859-1:
G Town Square a 

f Vous ?tes debout dans la place de la ville et groovy tout va bien.
C'est une belle journ?e et le ciel est clair. Les gens bussle propos dans leur vie quotidienne.
Vous ?tes debout dans le centre-ville. ..


When I changed it to UTF-8 I got correct stdout output:

f Vous êtes debout dans la place de la ville et groovy tout va bien.
C'est une belle journée et le ciel est clair.
Les gens bussle propos dans leur vie quotidienne.
Vous êtes debout dans le centre-ville. a

Vous voyez:
Un paquet de flambeaux


.. but mushclient was b0rked:

Vous êtes debout dans la place de la ville et groovy tout va bien.
C'est une belle journée et le ciel est clair. Les gens bussle propos dans leur vie quotidienne.
Vous êtes debout dans le centre-ville.
Vous voyez:Un paquet de flambeaux.




Any clues?
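
For anyone following along, here is a minimal sketch of why the charset used for URL encoding matters. It is in Python rather than the Java used by the original poster, and the sample text is just a fragment of the output above; it is not the poster's actual code. The same character percent-encodes to different bytes under ISO-8859-1 and UTF-8, and a receiver that assumes UTF-8 cannot decode the ISO-8859-1 form.

```python
from urllib.parse import quote, unquote

text = "Vous êtes debout"

# Percent-encode the same text under two different charsets.
latin1_encoded = quote(text, encoding="iso-8859-1")
utf8_encoded = quote(text, encoding="utf-8")

print(latin1_encoded)  # Vous%20%EAtes%20debout
print(utf8_encoded)    # Vous%20%C3%AAtes%20debout

# A receiver that assumes UTF-8 cannot make sense of the lone byte 0xEA,
# so it substitutes the replacement character U+FFFD.
as_seen_by_utf8_receiver = unquote(latin1_encoded, encoding="utf-8", errors="replace")
print(as_seen_by_utf8_receiver)

# The UTF-8 form round-trips cleanly.
print(unquote(utf8_encoded, encoding="utf-8"))
```

The replacement character in the third line of output is the same kind of garbage shown in the first post.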
07 Mar, 2010, Scandum wrote in the 4th comment:
Votes: 0
Looks like mushclient isn't handling utf-8 very well. You'd probably want to use extended ASCII instead.
08 Mar, 2010, David Haley wrote in the 5th comment:
Votes: 0
MUSHclient is supposed to handle UTF-8, although you might have to click an option somewhere. It sounds like a client issue, so if you can't get it working you should probably post to the forums on gammon.com.au with the bug report and the byte values of the text that's not displaying correctly.
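
A quick way to collect the byte values for such a report is sketched below, in Python for brevity. The `hex_dump` helper is a hypothetical name introduced for illustration, not part of any library mentioned in this thread.

```python
def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)

word = "journée"

# UTF-8 uses two bytes (c3 a9) for the accented character.
print(hex_dump(word.encode("utf-8")))       # 6a 6f 75 72 6e c3 a9 65

# ISO-8859-1 uses a single byte (e9) for the same character.
print(hex_dump(word.encode("iso-8859-1")))  # 6a 6f 75 72 6e e9 65
```

Comparing a dump like this against what the client receives shows whether the server or the client is at fault.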
08 Mar, 2010, flumpy wrote in the 6th comment:
Votes: 0
TBH I'll dig through this telnet lib to make sure it's not messing with the char encoding somewhere in its bowels before I do, and I'll try Scandum's suggestion as well.


Cheers guys
08 Mar, 2010, flumpy wrote in the 7th comment:
Votes: 0
Got it working, thanks guys..

Seems like it was a lUser error, I hadn't switched on UTF-8 encoding on the client! duh.

Anyway, it works, if a little slowly at first (I cache any translations, I may even persist them locally at some point). I should have a release soon :)
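
For reference, the symptom of a client that is not decoding UTF-8 can be reproduced in a few lines. This is a sketch in Python, and it assumes the client falls back to a single-byte Windows code page (cp1252), which is a guess about the client's default rather than something confirmed in this thread.

```python
# Bytes as the server sends them when it encodes the text as UTF-8.
sent = "Vous êtes debout".encode("utf-8")

# A client that treats every byte as one cp1252 character shows two
# characters where the server meant one.
misread = sent.decode("cp1252")
print(misread)  # Vous Ãªtes debout

# A client with UTF-8 decoding switched on shows the intended text.
print(sent.decode("utf-8"))  # Vous êtes debout
```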
08 Mar, 2010, donky wrote in the 8th comment:
Votes: 0
flumpy said:
Got it working, thanks guys..

Seems like it was a lUser error, I hadn't switched on UTF-8 encoding on the client! duh.

Anyway, it works, if a little slowly at first (I cache any translations, I may even persist them locally at some point). I should have a release soon :)

Out of curiosity, are you actually changing a client-side setting by hand? Or are you sending ESC%G from the server to ask the client to use the Unicode character set?
08 Mar, 2010, David Haley wrote in the 9th comment:
Votes: 0
If he's referring to MUSHclient, as he stated earlier, then he probably ticked the box asking it to process UTF-8 characters.
08 Mar, 2010, flumpy wrote in the 10th comment:
Votes: 0
Yep, I found the right tick box…

What's the ESC%G code? Does that work too?
08 Mar, 2010, donky wrote in the 11th comment:
Votes: 0
flumpy said:
Yep, I found the right tick box…

What's the ESC%G code? Does that work too?

It should signal the telnet client to decode UTF-8. Here's one reference to it (search for %G).

PuTTY supports it, although I would be surprised if MUSHclient did. If you could try it and let me know, I would appreciate it. Personally, I think that when making a player change a client setting to play a MUD can be avoided, it should be avoided.
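
A minimal sketch of what sending ESC%G from the server could look like, in Python. The function name `utf8_payload` and its `announce` flag are hypothetical, and whether a given client honours the sequence is exactly the open question here. ESC % G is the ISO 2022 sequence that selects UTF-8, and ESC % @ switches back to the default coding system.

```python
ESC = b"\x1b"
SELECT_UTF8 = ESC + b"%G"     # ask the terminal to interpret input as UTF-8
SELECT_DEFAULT = ESC + b"%@"  # switch back to the terminal's default coding

def utf8_payload(text: str, announce: bool = True) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the ESC % G request."""
    body = text.encode("utf-8")
    return SELECT_UTF8 + body if announce else body

payload = utf8_payload("C'est une belle journée\r\n")
print(payload)
```

One convenient property is that UTF-8 never produces the byte 0xFF, so the encoded text cannot be mistaken for a telnet IAC command byte.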
08 Mar, 2010, David Haley wrote in the 12th comment:
Votes: 0
If MUSHclient doesn't, and the standard says that it should, then I'd suggest just mentioning it to Nick on his forums – he's very fast at adding this kind of stuff.
09 Mar, 2010, donky wrote in the 13th comment:
Votes: 0
David Haley said:
If MUSHclient doesn't, and the standard says that it should, then I'd suggest just mentioning it to Nick on his forums – he's very fast at adding this kind of stuff.

I am unclear what is and what is not standard when it comes to these escape codes. This may just be a Linux console affectation which PuTTY adopts, due to its attempt at emulating an xterm terminal type.

I've enumerated 124 characters from 0xC480 onward in PuTTY after sending chr(0x1b)+"%G". If I do not send the given escape code, the Unicode characters do not appear.



Now, I am pretty sure the "shade" character that gets repeated is because I am somehow not following the telnet specification.

If a UTF-8 sequence contains 12, the screen is cleared. And from the 65th two-byte UTF-8 sequence onward, two "shade" characters are printed rather than the correct Unicode character. The two bytes sent are 0xC4C0. Repeating the preceding sequence yields a repeated Unicode character, so it is not leakage from the preceding sequence.

Anyone know anything about a standard way of sending UTF-8 sequences over telnet?


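In this case, the corruption occurs because I am sending an invalid UTF-8 sequence: there is no 0xC4C0. The second byte of a two-byte sequence must be in the range 0x80 to 0xBF, so the sequence after 0xC4BF is 0xC580.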
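
The difference between incrementing the raw two-byte value and incrementing the code point can be checked with a short sketch. The helper name `is_valid_utf8` is made up for illustration, and the count of 124 simply mirrors the experiment described above.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Incrementing the raw 16-bit value: 0xC480, 0xC481, ...
raw = [(0xC480 + i).to_bytes(2, "big") for i in range(124)]
first_bad = next(i for i, seq in enumerate(raw) if not is_valid_utf8(seq))
print(first_bad + 1, raw[first_bad].hex())  # 65 c4c0

# Incrementing the code point and encoding it always yields valid bytes.
good = [chr(0x100 + i).encode("utf-8") for i in range(124)]
print(good[63].hex(), good[64].hex())  # c4bf c580
print(all(is_valid_utf8(seq) for seq in good))  # True
```

The first 64 raw values are valid (U+0100 to U+013F), and the 65th is where the "shade" characters start.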