20 Nov, 2014, softballs wrote in the 1st comment:
Votes: 0
Hi everyone,

I played a MUD for many years and have always wanted to learn how to code on it. Now I am working as a developer (although only in web programming), so I decided to set up a local Merc MUD of my own just to learn.

I had some trouble getting it to compile correctly (the server was 64-bit Ubuntu), but I finally got the code to compile and run smoothly. I posted the updated code on https://github.com/iam-TJ/merc if anyone is interested.

My question is this: I know the code currently doesn't support Swedish characters, so I was wondering if anyone could point me in the right direction on what needs to be updated in the code to allow them?
20 Nov, 2014, alteraeon wrote in the 2nd comment:
Votes: 0
As far as I know, there are no MUD codebases in common use that support Unicode, and there are no MUD clients that support Unicode. Basically, you're screwed.

Your easiest and best bet would be to pick a good 7-bit ASCII transliteration for the language, if such a transliteration exists.

A harder plan would be to pick some 8-bit extended ASCII codepage that contains enough of the language's characters to work with, and find some way to ensure your users are using it. Perhaps using browser-only clients could do this.

A very time-consuming plan would be to convert a codebase to Unicode, or to create a new server that supports Unicode, then release your own client that also supports Unicode.

-dentin

Alter Aeon MUD
http://www.alteraeon.com
20 Nov, 2014, softballs wrote in the 3rd comment:
Votes: 0
Hi Dentin,

First, I would like to thank you for taking the time to help me with an answer! :)

I think my idea (time-consuming or not) was to update the Merc codebase to support Unicode, and that was what I was looking for information on: what it would require, and perhaps some small pointers as to what needs to be done.

When I look at http://en.wikipedia.org/wiki/Comparison_... I can see that some clients support Unicode, although not zMUD/CMUD, which I thought did.
20 Nov, 2014, alteraeon wrote in the 4th comment:
Votes: 0
My guess for MUD clients that do would be MUSHclient, which is actually still being updated, and browser-based clients, which can probably be made to work. You'll simply have to force your users to use a supported client, or provide a transliteration layer for regular telnet.

As for converting a server to Unicode, your best bet is probably to pick one of the newer codebases that has decent support (perhaps the Java-based one, CoffeeMud I think) and import your areas into it. Trying to convert an existing C codebase would be a colossal waste of time and effort.

<rant_mode>
The correct thing to do would be to force the entire world to pick a single 8-bit codepage and transliterate all languages down to that codepage. I've recently gone through a decent chunk of the Unicode character set to create a transliteration table, and there's absolutely no reason to have so many different versions of the same character. Hell, there are thirty different versions of some types of punctuation: things like the double quote character, commas, and dashes. Do we really need sixty different versions of the letters c and z?

I get the feeling that most of this mess is because language purists can't let go of their ego. They can't let go of their history, their tribe - "the invaders want to take away our language, if we give up the double accented n character and the c with the funny tail the invaders will win." Language should be about communicating information, and this kind of idiot thinking puts artificial barriers in the way of that, before we even get to the problem of semantics.

And yes, I would be perfectly fine with changing my own language.
</rant_mode>

-dentin
20 Nov, 2014, softballs wrote in the 5th comment:
Votes: 0
Thank you for your reply, and good job on your MUD server and client; I tried it out before and liked it.

As my main intention with this is to extend my programming knowledge from mostly pure .NET/web/JavaScript to C programming, my goal was to allow the chat command to contain more than English characters. I have no problem reading/writing in English; I just thought it would be a fun task, but perhaps you are correct and this might be too big a task.

The reason I want to go with the Merc codebase is that the old MUD I played on (it's still up and running) was also based on the Merc code, and I know the MUD owner/coder, so it would be fun to help code it one day.
21 Nov, 2014, alteraeon wrote in the 6th comment:
Votes: 0
This is an old thread, but might give you more idea of what to do:

http://www.gammon.com.au/forum/?id=2681

If you've got a client capable of sending and receiving utf-8, you could detect that and allow the server to send and receive utf-8. Unicode strings could be handled on the server as normal strings in utf-8, and you could add a transliteration layer on the server to jam everything down to 7 bit ascii for non-unicode clients. Assuming you can find a client with utf-8 capability, this really wouldn't take very long.
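A minimal sketch of the transliteration layer idea, assuming the server keeps all strings internally as UTF-8 and only needs to flatten the Swedish letters å, ä, ö (and their capitals) for non-UTF-8 clients. The function name `utf8_to_ascii_sv` and the single-letter mapping (å/ä to a, ö to o) are hypothetical choices; any other multi-byte character becomes '?'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: UTF-8 encodings of the Swedish letters and the
 * 7-bit ASCII letter each one is flattened to. */
struct translit {
    const char *utf8;   /* two-byte UTF-8 sequence */
    char ascii;         /* 7-bit replacement */
};

static const struct translit sv_table[] = {
    { "\xC3\xA5", 'a' }, /* å U+00E5 */
    { "\xC3\xA4", 'a' }, /* ä U+00E4 */
    { "\xC3\xB6", 'o' }, /* ö U+00F6 */
    { "\xC3\x85", 'A' }, /* Å U+00C5 */
    { "\xC3\x84", 'A' }, /* Ä U+00C4 */
    { "\xC3\x96", 'O' }, /* Ö U+00D6 */
};

/* Copy UTF-8 `in` to `out` (size `outsz`): 7-bit ASCII is kept as-is,
 * known Swedish letters are flattened, any other multi-byte character
 * becomes '?'. Always NUL-terminates when outsz > 0. */
void utf8_to_ascii_sv(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;

    if (outsz == 0)
        return;
    while (*p && o + 1 < outsz) {
        if (*p < 0x80) {                 /* plain ASCII byte */
            out[o++] = (char)*p++;
            continue;
        }
        char repl = '?';
        for (size_t i = 0; i < sizeof sv_table / sizeof sv_table[0]; i++) {
            if (strncmp((const char *)p, sv_table[i].utf8, 2) == 0) {
                repl = sv_table[i].ascii;
                break;
            }
        }
        out[o++] = repl;
        p++;                             /* skip the lead byte */
        while ((*p & 0xC0) == 0x80)      /* skip continuation bytes */
            p++;
    }
    out[o] = '\0';
}
```

For example, "räksmörgås" comes out as "raksmorgas". Mapping each letter to exactly one ASCII character keeps the on-screen width unchanged, which matters for any column-aligned output.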

-dentin
21 Nov, 2014, softballs wrote in the 7th comment:
Votes: 0
Thank you for looking around for information on the matter. Perhaps this is out of my league to actually implement now, as I am still learning, but I hope I can eventually do it one day.

It is very kind of you to take the time to answer me on this, though!
21 Nov, 2014, Ssolvarain wrote in the 8th comment:
Votes: 0
MUSHclient supports unicode.

Just saying.
22 Nov, 2014, SlySven wrote in the 9th comment:
Votes: 0
Well, TinTin++ has Unicode support (if you enable it with the right #CONFIG option). Mudlet can't be said to fully support Unicode yet, but I have nailed it up as an objective for 4.0, and there is awareness of the things that need to be worked on. One nice thing about Unicode is that (provided your system's font handling has access to a decent range of glyphs) there are all sorts of extra symbols tucked away for users to enhance their playing environment / maps with…

Get familiar with utf-8 as a character encoding and avoid thinking that all characters can be represented as a single 16-bit value - otherwise those combining diacriticals and/or non-BMP characters will come to haunt you and your code! And on a related matter: just because a font is monospaced it does not mean that all characters are the same size! :redface:
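To make the "not a single 16-bit value" point concrete, here is a small UTF-8 encoder sketch (the name `utf8_encode` is hypothetical). Code points up to U+FFFF take one to three bytes, but anything outside the Basic Multilingual Plane, such as most emoji, takes four bytes and would not fit in one 16-bit unit at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into buf (at least 4 bytes).
 * Returns the number of bytes written, or 0 for values that are not
 * valid scalar values (UTF-16 surrogates or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + three continuation bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For instance, U+00E5 (å) encodes as 0xC3 0xA5, while U+1F600 encodes as the four bytes 0xF0 0x9F 0x98 0x80.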
22 Nov, 2014, quixadhal wrote in the 10th comment:
Votes: 0
If you want a MUD to properly support UTF-8, you'll need a full TELNET stack.

UTF-8 uses ASCII 255 to signal an extension character, which TELNET also uses as the IAC escape sequence. So, UTF-8 data has to be properly escaped via TELNET *AND* the MUD has to be able to handle sequences of data that are split across packets.

I.e., if you happen to get some UTF-8 sequence that's escaped via TELNET, and it happens to be split across packets as (…. 255)(255 64…), you need to know that was the UTF-8 sequence (255 64) and NOT the TELNET sequence (IAC 64).

TL;DR version: most MUD server code assumes a "character" is one byte, and doesn't deal with managing state to know how to process anything outside that 8-bit window.
22 Nov, 2014, plamzi wrote in the 11th comment:
Votes: 0
If your goal is just Swedish characters, Extended ASCII should suffice. Google-wiki it.
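If you go the Extended ASCII route, the usual codepage for Swedish is ISO-8859-1 (Latin-1), where å, ä, ö are the single bytes 0xE5, 0xE4, 0xF6 and Å, Ä, Ö are 0xC5, 0xC4, 0xD6. Merc-derived input code commonly keeps only bytes that pass `isascii()` and `isprint()`, which silently drops those letters (worth checking in your own comm.c; this is an assumption, not a quote of the stock code). A minimal sketch of a replacement check, with the hypothetical name `accept_input_byte`:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical input filter for a Latin-1 (ISO-8859-1) MUD: accept
 * printable 7-bit ASCII plus the six Swedish letters. Everything else,
 * including control bytes and 0xFF, is rejected. */
bool accept_input_byte(unsigned char c)
{
    if (c < 0x80)
        return isprint(c) != 0;
    switch (c) {
    case 0xE5: case 0xE4: case 0xF6:   /* å ä ö */
    case 0xC5: case 0xC4: case 0xD6:   /* Å Ä Ö */
        return true;
    default:
        return false;
    }
}
```

This only works if the client also sends and displays Latin-1; a UTF-8 client would send å as the two bytes 0xC3 0xA5, and both would be rejected here.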

The trouble with UTF8 is that it currently limits your choice of client. As far as I know, only MUSHClient and the Mud Portal web app have solid support for it.

FYI, I'm about to publish a codebase in js with full localization support. May be a good choice for tinkering for a web programmer.
23 Nov, 2014, softballs wrote in the 12th comment:
Votes: 0
Well, my primary goal is to learn, but yes, I must admit Swedish characters would be what I want to add first anyway. I will google around for extended ASCII and see if I can find any information about how to add that to my Merc codebase.

Your project looks really cool; I read about it before :)
24 Nov, 2014, SlySven wrote in the 13th comment:
Votes: 0
quixadhal said:
UTF-8 uses ASCII 255 to signal an extension character, which TELNET also uses as the IAC escape sequence. So, UTF-8 data has to be properly escaped via TELNET *AND* the MUD has to be able to handle sequences of data that are split across packets.
Sorry but that is NOT the case, the byte with ALL bits set 0xff is NEVER a valid UTF-8 character. As you can see here, the first byte of a multi-byte sequence that starts a character in UTF-8 that is not one of the ASCII (7 bits remember!) sub-set consists of the top TWO bits set plus an increasing number of lesser bits with the next LS Bit after that those being reset, the total number of those initial set bits is then the same as the total of bytes in that multi-byte sequence, the remaining bytes in that sequence all have the most significant bit set and the second most significant bit reset. As you can see this makes an all bits set byte an invalid character; in practice these means that in decoding an incoming telnet stream encoded as UTF-8 you must extract the telnet protocol stuff signaled by IAC bytes before you can reassemble the remaining data and process it as UTF-8 {the recommended practice is first to validate each character and reject "overlong" and malformed sequences, possibly removing each invalid sequence and replacing the first start character of each with "?" the three byte character (Unicode point: U+FFFD, encoded as 0xef 0xbf 0xbd) a.k.a. the replacement character - this helps to reduce the occurrence of mojibake.}
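A sketch of the validation step described above, assuming the telnet layer has already removed IAC sequences. `utf8_valid_len` is a hypothetical helper: it returns the length of the valid character at the start of the buffer, or 0 if that character is malformed, overlong, a surrogate, beyond U+10FFFF, or truncated. As described, 0xFF can never start a valid sequence, and neither can 0xF8 through 0xFE.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the byte length (1-4) of the valid UTF-8 character at s,
 * or 0 if the bytes are malformed, overlong, a UTF-16 surrogate,
 * above U+10FFFF, or cut short by the end of the buffer (n bytes). */
size_t utf8_valid_len(const unsigned char *s, size_t n)
{
    size_t len;
    uint32_t cp, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                          /* 7-bit ASCII */
    if ((s[0] & 0xE0) == 0xC0) {
        len = 2; cp = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; cp = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; cp = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;   /* stray continuation byte, or 0xF8-0xFF (includes 0xFF) */
    }

    if (n < len)
        return 0;                          /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)         /* must be 10xxxxxx */
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min)                          /* overlong encoding */
        return 0;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}
```

A caller that gets 0 back could emit 0xEF 0xBF 0xBD (U+FFFD) and advance one byte, which is one common way to apply the replacement-character practice.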

What can be an issue, as you suggest, is a multi-byte character split across packets. But UTF-8 was designed as a transmission format, so libraries that handle it know all about the need to maintain "state" between individual incoming packets, provided the coder remembers to allocate space for it: functions that are also used to process an entire "file" at once typically make that state an optional pointer, which is not needed if the processing happens in one swell foop.
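One simple way to keep that state in a server without a full decoding library, sketched here under the assumption that each connection has a small carry-over buffer: after every read, work out how many bytes at the end belong to a character that has not finished arriving, process only the complete part, and prepend the leftover bytes to the next packet. `utf8_incomplete_tail` is a hypothetical name.

```c
#include <stddef.h>

/* Return how many bytes at the end of buf[0..n) form the start of a
 * multi-byte UTF-8 character whose remaining bytes have not arrived yet.
 * Those bytes should be held back and prepended to the next packet. */
size_t utf8_incomplete_tail(const unsigned char *buf, size_t n)
{
    size_t back = 0;

    /* Walk back over at most 3 continuation bytes (10xxxxxx). */
    while (back < 3 && back < n && (buf[n - 1 - back] & 0xC0) == 0x80)
        back++;
    if (back == n)
        return 0;   /* only continuation bytes: malformed, nothing to hold */

    unsigned char lead = buf[n - 1 - back];
    size_t need;
    if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return 0;   /* ASCII or invalid lead byte: nothing to hold */

    /* Hold the sequence back if it is shorter than its lead byte promises. */
    return (back + 1 < need) ? back + 1 : 0;
}
```

For example, a packet ending in "hi" followed by the lone byte 0xC3 returns 1: the 0xC3 waits for its continuation byte in the next read.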
24 Nov, 2014, quixadhal wrote in the 14th comment:
Votes: 0
Your point about IAC is correct, and I was mistaken. My point was that most MUD drivers don't have any kind of state machine, because they don't actually implement TELNET. The vast majority are just raw TCP sockets that ignore TELNET data, or recognize a small subset of "well known" sequences, as long as they aren't split across packets.
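A minimal sketch of such a state machine, assuming a per-connection state field that survives between reads. It strips telnet commands from the incoming bytes and turns the escaped pair IAC IAC back into a literal 0xFF data byte, so a pair split across packets as (… 255)(255 …) is still handled correctly. Option negotiation (WILL/WONT/DO/DONT) and subnegotiation are simply discarded here, whereas a real server would answer or act on them. The function and type names are hypothetical; the byte values are the standard telnet ones.

```c
#include <stddef.h>

/* Standard telnet command bytes (RFC 854/855). */
enum { TC_IAC = 255, TC_DONT = 254, TC_DO = 253, TC_WONT = 252,
       TC_WILL = 251, TC_SB = 250, TC_SE = 240 };

/* Per-connection parser state; must persist between reads. */
enum tn_state { TS_DATA, TS_IAC, TS_OPT, TS_SB, TS_SB_IAC };

/* Copy the data bytes of in[0..n) to out (room for at least n bytes),
 * dropping telnet commands. Returns the number of data bytes written. */
size_t telnet_filter(enum tn_state *st, const unsigned char *in, size_t n,
                     unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        switch (*st) {
        case TS_DATA:
            if (c == TC_IAC)
                *st = TS_IAC;
            else
                out[o++] = c;
            break;
        case TS_IAC:
            if (c == TC_IAC) {                 /* IAC IAC = literal 0xFF */
                out[o++] = c;
                *st = TS_DATA;
            } else if (c >= TC_WILL && c <= TC_DONT) {
                *st = TS_OPT;                  /* one option byte follows */
            } else if (c == TC_SB) {
                *st = TS_SB;                   /* skip until IAC SE */
            } else {
                *st = TS_DATA;                 /* two-byte command, e.g. NOP */
            }
            break;
        case TS_OPT:
            *st = TS_DATA;                     /* option byte consumed */
            break;
        case TS_SB:
            if (c == TC_IAC)
                *st = TS_SB_IAC;
            break;
        case TS_SB_IAC:
            *st = (c == TC_SE) ? TS_DATA : TS_SB;  /* IAC IAC stays in SB */
            break;
        }
    }
    return o;
}
```

Only the bytes this filter outputs are the data stream that should then be treated as UTF-8 (or Latin-1, or plain ASCII).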

While there may well be libraries that support UTF-8 nicely, I'm not sure they'll be of any use for the MUD developer. Your typical MUD will have code that produces messages which have to be sent to the socket output buffer, assets on disk that have to be read and written as UTF-8, and of course user input from sockets, which also has to be as UTF-8. All of that works well, provided your MUD doesn't corrupt anything.

However, there are lots of things in a typical MUD codebase which deal with characters as BYTES. Even if the socket code itself is nicely wrapped to account for multi-byte characters, how many places does old code use snprintf() or custom code to center, right-align, or otherwise pad things based on strlen()?
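For example, a padding helper that counts code points instead of bytes, so that "Åsa" (4 bytes) is padded like any other 3-character name; `snprintf("%-6s")` would pad it by bytes and leave it one column short. This is a sketch with hypothetical names, and counting code points is still not the same as display width once combining marks or double-width characters appear.

```c
#include <stdio.h>
#include <string.h>

/* Count code points in a valid UTF-8 string: every byte that is not a
 * continuation byte (10xxxxxx) starts a new character. */
size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}

/* Left-justify s in a field `width` characters wide, writing into out.
 * Padding is based on characters, not bytes. If outsz is too small,
 * snprintf truncates by bytes and may split a multi-byte character. */
void utf8_pad_right(char *out, size_t outsz, const char *s, size_t width)
{
    size_t chars = utf8_strlen(s);
    size_t pad = chars < width ? width - chars : 0;
    snprintf(out, outsz, "%s%*s", s, (int)pad, "");
}
```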

Then there are color codes. A typical Diku-style game has embedded color tokens like &R, which get translated to ESC[31m for the typical ANSI terminal. I know many MUDs have custom things like color_strlen() to try to allow formatting and padding of things without counting the non-printable color code tokens. THOSE now have to understand UTF-8 as well.
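A sketch of what a UTF-8-aware color_strlen() might look like, assuming a hypothetical color syntax in which '&' followed by one ASCII letter is a color token and "&&" is a literal ampersand; the name `color_utf8_strlen` and the token rules are assumptions to adjust to your codebase.

```c
#include <ctype.h>
#include <stddef.h>

/* Visible length in characters: "&X" color tokens (X an ASCII letter)
 * count as zero, "&&" counts as one '&', and UTF-8 text is counted in
 * code points rather than bytes. */
size_t color_utf8_strlen(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p) {
        if (p[0] == '&' && p[1] == '&') {          /* escaped literal '&' */
            count++;
            p += 2;
        } else if (p[0] == '&' && p[1] < 0x80 && isalpha(p[1])) {
            p += 2;                                /* color token, zero width */
        } else {
            if ((*p & 0xC0) != 0x80)               /* skip continuation bytes */
                count++;
            p++;
        }
    }
    return count;
}
```

For example, "&Rhär&n" (red "här", then reset) has a visible length of 3 even though it is 8 bytes long.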
24 Nov, 2014, SlySven wrote in the 15th comment:
Votes: 0
Yes, and that is where the work has to be done. I'm all too aware that there are a couple of methods in Mudlet's cTelnet class that will have to be gone over with a fine-tooth comb to cover precisely this, but as long as we can differentiate what IS the Telnet wrapper and what is the underlying UTF-8 data, we should be able to work it out. I am talking as a client coder, but the same processes must apply to the server as well; after all, the communication process is bi-directional {though I'm not talking about the language here; if you are writing in Arabic/Oriental scripts, bidi is also something to worry about! :lol: }. If you get everything nailed down right, you can support any language you can think of. After all, my Klingon is non-existent, but I'm sure someone has some choice phrases written in the Unicode code-points created to support it!
24 Nov, 2014, plamzi wrote in the 16th comment:
Votes: 0
SlySven said:
…but if you get everything nailed down right you can support any language that you can think of. After all, my Klingon is non-existent but I'm sure someone has some choice phrases written in the Unicode code-points created to support it!


We're now going offtopic, I realize, but it's not just about world languages. There's a wealth of symbols out there that are fairly commonly supported and could add some welcome change from plain text. Take a look at what I'm doing in the Havoc thread. I would like to support an actively developed client like Mudlet, but the lack of UTF-8 and Unicode is currently a deal-breaker for me. I asked if it's in the pipeline and a dev on the Mudlet forums told me it would be a lot of work to make sure it's supported in the scripting environment properly. I have no doubt that's the case but I don't see why you can't limit it for a phase 1 release and enable rendering only first…
26 Nov, 2014, softballs wrote in the 17th comment:
Votes: 0
Reading both your posts on the matter makes me think more and more that I should stick to learning more basic stuff rather than implementing something that lets the MUD understand my Swedish characters :)
26 Nov, 2014, softballs wrote in the 18th comment:
Votes: 0
If anyone would be willing to help me learn, it would be much appreciated. I am thinking that someone could chat with me on Skype (on weekends) and help me with best practices, examples, and lessons, and help me read and learn the Merc code, as I know that many people here have a lot of knowledge about this.

If not, I don't blame you :)
26 Nov, 2014, alteraeon wrote in the 19th comment:
Votes: 0
Plamzi - how do you handle situations where creature or item names might be in a language with characters that other players cannot type? How do clients do reliable text processing in the presence of unicode strings? We actually had problems with both of these things before I started stomping out extended ascii at the socket layer.
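One defensive approach to the naming problem, sketched here as an assumption rather than anyone's actual code: whatever the transport supports, restrict names that other players must type (characters, command targets) to plain ASCII letters. Merc-style codebases check new names in a function along the lines of check_parse_name(); `name_is_typeable` below is a hypothetical stand-in for that kind of check, and the 3 to 12 length limits are illustrative only.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical name check: 3-12 characters, ASCII letters only, so
 * every player can type the name to target it. Any byte >= 0x80
 * (Latin-1 or UTF-8) is rejected. */
bool name_is_typeable(const char *name)
{
    size_t len = strlen(name);

    if (len < 3 || len > 12)
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
            return false;
    }
    return true;
}
```

Free text such as chat can still carry UTF-8 while identifiers stay typeable from any keyboard.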
26 Nov, 2014, quixadhal wrote in the 20th comment:
Votes: 0
Back in 1988, we used to play a game on the VAX/VMS cluster called "MONSTER". It was, in many ways, the predecessor of MUDs. It had very simple combat, online creation (Sorry Locke!), and used a custom client/server, originally written in Pascal and using VMS "mailboxes" for communication instead of TCP sockets.

A friend of mine discovered that you could create a character using the extended ASCII characters, but that the only part of the game that could accept those extended characters was the character creation screen. The result, a character who could go around killing anyone and they couldn't fight back, nor could the admins kick him offline. Literally, the only way they could stop him was to shutdown the whole game.

Welcome to 1988 again. :)