09 Feb, 2010, Kline wrote in the 1st comment:
Votes: 0
So I was wanting to revamp/improve the existing (deprecated) email functions of my code and figured the first place to start would be writing a sane address validation routine. Just the syntax, for now, and perhaps later I'll at least do a DNS lookup on the host. The existing routine simply checked that the email was all letters (wrong) and had a single @ (ok). I wrote up the below while at work tonight just to throw things at it to test/debug so I can port it to my game when I get home; is there anything I can do to improve this while sticking to only C/C++ and not farming it off to the shell for regex or something?

Thanks for the comments.

#include <string>
#include <iostream>

using namespace std;

int main( int argc, char *argv[] )
{
    string inaddr;
    size_t pos = 0;
    int i = 0;
    static char special[] = "`~!#$%^&*()=[{]}\\|;:'\",<>/?";

    if( argc < 2 )
    {
        cout << "You must supply an email address." << endl;
        return 1;
    }

    inaddr = argv[1];

    // Check for single @
    if( inaddr.find("@") == string::npos )
    {
        cout << "No @ found! Invalid!" << endl;
        return 1;
    }

    // Check for multiple @
    i = 0;
    pos = 0;
    while( pos != string::npos )
    {
        pos = inaddr.find("@",pos);
        if( pos != string::npos )
        {
            i++;
            pos++;
        }
    }
    if( i > 1 )
    {
        cout << i << " @ found! Invalid!" << endl;
        return 1;
    }

    // Check for anything invalid
    i = 0;
    while( special[i] != '\0' )
    {
        if( inaddr.find(special[i]) != string::npos )
        {
            cout << "Invalid character found: " << special[i] << "." << endl;
            return 1;
        }
        i++;
    }

    // Check for single . after the @
    pos = inaddr.find("@");
    if( inaddr.find(".",pos) == string::npos )
    {
        cout << "No . found after @! Invalid!" << endl;
        return 1;
    }

    return 0;
}
09 Feb, 2010, KaVir wrote in the 2nd comment:
Votes: 0
Whenever a mud demands an email address, I always put "none". If that doesn't work, I try "none@none", then "none@none.none" (the last of which would work on your mud).

On one mud I ended up having to put "none@none.com" to log on, and one mud didn't even allow that - so I just made up an email address at random (I think I picked "bob@hotmail.com" or something, I'm sure whoever has that address already gets plenty of spam and didn't mind a bit more).

I see you allow "@.", which is something I hadn't considered trying before. Faster to type than "none", I'll have to remember that one.
09 Feb, 2010, Kline wrote in the 3rd comment:
Votes: 0
This isn't a mandatory thing; it was mostly to be used for immortals to enter an email address at which to receive notifications. Just trying to save people from unknowingly fat-fingering when they meant to enter something proper. Thanks for berating the entire concept though :).
09 Feb, 2010, Orrin wrote in the 4th comment:
Votes: 0
KaVir said:
Whenever a mud demands an email address, I always put "none". If that doesn't work, I try "none@none", then "none@none.none"

I always used to do nobody@nowhere.com. On our account creation you can just hit enter at the email prompt and it records your email address as "none" for you.
09 Feb, 2010, David Haley wrote in the 5th comment:
Votes: 0
Quote
Just trying to save people from fat-fingering un-knowingly when they meant to enter something proper.

Well, I suppose it's better than nothing, but you won't be able to detect my typo of davod@whatever.com, either. If the idea is just to avoid typos, why not have them enter it twice?

Orrin said:
I always used to do nobody@nowhere.com.

I'm partial to foo@bar.com, myself…
09 Feb, 2010, Kline wrote in the 6th comment:
Votes: 0
So although I should have known better, I didn't realize asking for code advice (something common here?) would generate nothing but a string of "What a horrible idea" and banter about what the best fake addresses to use are :). So, if you'd all like to continue that route please use this nicely spawned thread (post 41818). I'd hoped for some actual constructive advice like most others receive.
09 Feb, 2010, KaVir wrote in the 7th comment:
Votes: 0
Kline said:
So although I should have known better, I didn't realize asking for code advice (something common here?) would generate nothing but a string of "What a horrible idea" and banter about what the best fake addresses to use are :)

You said you wanted to improve your address validation routine, so I explained how I usually get around such validation routines. I thought it might be useful.

You're welcome, by the way.
09 Feb, 2010, Idealiad wrote in the 8th comment:
Votes: 0
I think it's pretty clear that a syntax check doesn't equal address validation. If you want to validate the address, send an email there and require a response.
09 Feb, 2010, KaVir wrote in the 9th comment:
Votes: 0
Idealiad said:
I think it's pretty clear that a syntax check doesn't equal address validation.

Not in the long run perhaps, but in his first post he specifically said that "the first place to start would be writing a sane address validation routine. Just the syntax, for now".
09 Feb, 2010, David Haley wrote in the 10th comment:
Votes: 0
Kline said:
So although I should have known better, I didn't realize asking for code advice (something common here?) would generate nothing but a string of "What a horrible idea" and banter about what the best fake addresses to use are :)

A common reaction in code reviews where I work, when you see something odd, is to ask why they're doing it, and if it's really what they want to do. You stated that your goal was to avoid typos ("fat-fingering"). You can solve that problem much more easily with other approaches (confirmation, entering twice, …). I don't think it's inappropriate to suggest how to solve (what has been communicated as) your problem, rather than to try to fix (what can be perceived as) a suboptimal solution, and to be entirely honest, telling people off because they didn't answer your question directly isn't the best way to convince them to spend more time here. :wink:

Anyhow, if you're dead set on checking address syntax, I would implement this as a very simple state machine implementing the email address regular expression rules. The thing is that those rules are complicated, perhaps more complicated than you think. You'll need to decide if you want to accept all legal addresses, or perhaps accept a few more, or be a little restrictive. Anyhow, the rules are given in an RFC somewhere, I would google for "rfc email address" or something like that.
State machines are pretty easy to implement, especially since you don't need this to be general-purpose.
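A minimal sketch of what such a state machine could look like, for illustration only. The function name valid_email_syntax and the set of accepted characters are assumptions; the grammar here is deliberately much stricter than the RFC rules (letters, digits, and a few punctuation marks in the local part; dot-separated labels in the domain), so it rejects some legal addresses.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: a small hand-written state machine that accepts a
// restricted subset of email syntax. It is NOT a full RFC parser.
bool valid_email_syntax(const std::string& addr)
{
    enum State { LOCAL_START, LOCAL, DOMAIN_START, DOMAIN, DOMAIN_HYPHEN };
    State state = LOCAL_START;
    int domain_dots = 0;

    for (unsigned char c : addr) {
        const bool local_char =
            std::isalnum(c) || c == '_' || c == '+' || c == '-';
        switch (state) {
        case LOCAL_START:          // start of the address, or just after a '.'
            if (!local_char) return false;
            state = LOCAL;
            break;
        case LOCAL:                // inside the local part
            if (c == '@') state = DOMAIN_START;
            else if (c == '.') state = LOCAL_START;
            else if (!local_char) return false;
            break;
        case DOMAIN_START:         // first character of a domain label
            if (!std::isalnum(c)) return false;
            state = DOMAIN;
            break;
        case DOMAIN:               // inside a domain label
            if (c == '.') { ++domain_dots; state = DOMAIN_START; }
            else if (c == '-') state = DOMAIN_HYPHEN;
            else if (!std::isalnum(c)) return false;
            break;
        case DOMAIN_HYPHEN:        // a label may not end with '-'
            if (std::isalnum(c)) state = DOMAIN;
            else if (c != '-') return false;
            break;
        }
    }
    // Must end inside a domain label, with at least one '.' after the '@'.
    return state == DOMAIN && domain_dots >= 1;
}
```

With these rules "foo@bar.com" and "none@none.none" pass, while "none", "none@none", "@." and "a@b@c.com" are rejected. Loosening or tightening the accepted set is a matter of editing the per-state character tests.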
09 Feb, 2010, Kjwah wrote in the 11th comment:
Votes: 0
Why don't you generate a key of some sort, email it to the address supplied, and allow them to enter the key at a later time to validate their email? If it hasn't been done in about a week, let them know in game.

edit: if this has been suggested, sorry, I stopped reading the thread when it became about who likes to use what for fake email addresses.
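A rough sketch of the key idea, assuming the mailing itself is handled elsewhere. PendingKey, make_key, key_matches, and reminder_due are hypothetical names, the 16-character length and the alphabet are arbitrary choices, and std::random_device is used for simplicity without any claim about how strong its output is on a given platform.

```cpp
#include <cstddef>
#include <ctime>
#include <random>
#include <string>

// Hypothetical record of a key that was emailed and not yet confirmed.
struct PendingKey {
    std::string key;
    std::time_t issued;   // when the key was sent
};

// Build a random key from an alphabet without look-alike characters
// (no I, O, 0 or 1), so it is easier to retype from an email.
std::string make_key(std::size_t length = 16)
{
    static const char alphabet[] = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
    std::random_device rd;
    std::uniform_int_distribution<std::size_t> pick(0, sizeof(alphabet) - 2);
    std::string key;
    for (std::size_t i = 0; i < length; ++i)
        key += alphabet[pick(rd)];
    return key;
}

// True when the player typed back exactly the key that was mailed.
bool key_matches(const PendingKey& pending, const std::string& entered)
{
    return entered == pending.key;
}

// True once the grace period (one week by default) has passed, i.e. when
// the game should remind the player that the address is still unconfirmed.
bool reminder_due(const PendingKey& pending, std::time_t now,
                  long grace_seconds = 7L * 24 * 60 * 60)
{
    return std::difftime(now, pending.issued) >= grace_seconds;
}
```

The game would store a PendingKey alongside the character when the address is entered, mail make_key()'s result, and clear the record once key_matches succeeds.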
10 Feb, 2010, Scandum wrote in the 12th comment:
Votes: 0
Cleanest would probably be using a regular expression, something like:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
10 Feb, 2010, elanthis wrote in the 13th comment:
Votes: 0
… by what definition is that "cleanest" ? Not trying to pick a fight, it's just that that is a pretty ugly regex… and that's one of the simpler and less complete regex email validators.
10 Feb, 2010, Kline wrote in the 14th comment:
Votes: 0
Thanks for the suggestions (yes, validation code was suggested – may go that route additionally but I'd still like to pursue this as a learning exercise), although is there a way to run a regex direct in a compiled program? That, and it's not even efficient to attempt to list all valid domain extensions: I personally own a .us and a .cc domain which would fail that regex (that I receive email at).
10 Feb, 2010, David Haley wrote in the 15th comment:
Votes: 0
No, .us and .cc are covered by this clause before the .com etc.: [A-Z]{2}
<EDIT: Oops, no they're not! It should be: [a-zA-Z]{2}>

Honestly it's unclear to me what exactly you're trying to achieve here. That regex allows for some pretty funky emails, that are technically valid, but still rather unlikely. (What are the chances of having "{" in your email instead of "P"?) I'm not trying to be difficult here, I'm just really not sure what exact problem you're trying to address. A regular expression like that is a syntactic check (and as Elanthis said, only one of the simpler ones) but it's not exactly a typo catcher.

If the idea is truly to verify that people entered their actual email address, I would go with the confirm or enter twice route, and then use email validation of some sort as has been mentioned (e.g. sending them a link to click on to validate themselves).
10 Feb, 2010, Scandum wrote in the 16th comment:
Votes: 0
I spent a minute googling it; apparently it's "pretty ugly", who would have thought just from looking at it. As far as I can tell it does allow any alphabetic 2 letter domain extensions, and the listed 3 to 6 letter ones. Looks like you'd have to compile it as case insensitive.

Regarding implementation,

Best approach is using the pcre library, the interface is pretty easy. You can use the posix one, but it's a lot less powerful, and pcre is what most mud clients use.
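For illustration, here is a sketch of running that pattern inside a compiled program. Note the substitution: it uses the C++ standard library's std::regex (C++11) rather than pcre or the POSIX interface, purely to keep the example dependency-free. The function name matches_email_regex is hypothetical. The pattern is the one posted earlier, compiled case-insensitively, with the two-letter clause written in lowercase.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole string matches the pattern.
// std::regex stands in here for pcre/POSIX; the icase flag makes the
// entire expression case-insensitive.
bool matches_email_regex(const std::string& addr)
{
    static const std::regex pattern(
        // local part: dot-separated runs of permitted characters
        R"re([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*)re"
        // '@' followed by one or more domain labels, each ending in '.'
        R"re(@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+)re"
        // any two-letter extension, or one of the listed longer ones
        R"re((?:[a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b)re",
        std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(addr, pattern);
}
```

Because regex_match requires the entire input to match, an address such as "none@none.none" fails (its extension is neither two letters nor in the list), while two-letter extensions such as .us and .cc pass.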
10 Feb, 2010, elanthis wrote in the 17th comment:
Votes: 0
Quote
<EDIT: Oops, no they're not! It should be: [a-zA-Z]{2}>


The regex is intended to be used with the case-insensitive option to your respective parser. It's lifted verbatim from http://www.regular-expressions.info/emai... and there's a bit of accompanying text, gotchas, and recommendations on that page.

Here is the latest RFC on email address parsing: http://tools.ietf.org/html/rfc3696. In reality, there's just no good reason to deal with any of that. If your input string has one @ and has at least one . on the right side of the @, that's probably good enough for any purpose you actually have. I wouldn't go any farther unless you absolutely need to have the user's email address, in which case you can only possibly do that by verifying the address with a mailed token.

There's just no good reason to do it half-assed, it'll just be a waste of your time and get you into bad habits.
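The "one @ and at least one . on the right side" rule is small enough to sketch in a few lines. The function name has_one_at_and_dot is an assumption, and the check is intentionally no stricter than the rule as stated.

```cpp
#include <string>

// Hypothetical helper: exactly one '@', and at least one '.' somewhere
// after it. Nothing else about the address is examined.
bool has_one_at_and_dot(const std::string& addr)
{
    const std::string::size_type at = addr.find('@');
    if (at == std::string::npos)
        return false;                                   // no '@' at all
    if (addr.find('@', at + 1) != std::string::npos)
        return false;                                   // more than one '@'
    return addr.find('.', at + 1) != std::string::npos; // '.' after the '@'
}
```

As written this still accepts "@.", since nothing requires characters on either side of the '@' or the '.'; whether that matters depends on how much of a typo catcher the check is meant to be.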
10 Feb, 2010, elanthis wrote in the 18th comment:
Votes: 0
I had the thought today in the shower (yes, I was thinking of you while naked) about just using the same technique as MTA sender verification. I'm not aware of non-MTA software doing it, but it's not particularly complex.

The gist of it is that your MUD will connect to the MX for the domain and initiate the first three steps of an SMTP transfer, checking to see if it gets any permanent error codes in the process. Many (most?) good MTAs will perform a local part check after the RCPT TO command (this may not be the case for particularly large setups like Google where the frontend MX doesn't access the user database) and reject the mail if the recipient is invalid. You will need to read the SMTP RFC, but it's not hard. You can reuse much of your MUD's existing code for handling player input since SMTP is line-oriented.

So your validation steps are thusly:

(1) query the list of MX records for the domain, sorted by priority; if there are no MX records, reject the email address
(2) for each MX record, attempt to connect to the server; continue to the next if the connection fails
(3) once connected, begin speaking SMTP; wait for the server greeting; if you receive a permanent error code at any time, abort the whole process and reject the user's email address
(4) communicate SMTP up to but not including the DATA command; that is, send the proper HELO, MAIL FROM, and RCPT TO commands, and check response codes for each
(5) if you do not receive any errors at this point, the email address is probably valid, so QUIT and accept it; if you receive a temporary error, QUIT and try the next server in the list of MX records
(6) if you run out of MX servers, that means you have received a temporary error from each; the email may be valid but you cannot verify at this time; up to you how to handle this case, but your options are basically either to reject the email just like with a permanent error, reject it but inform the user that their mail exchange is having temporary errors, or accept the email and possibly reattempt validation at a later time
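The decision logic in the steps above can be sketched without any networking, assuming the DNS lookup and socket I/O happen elsewhere and hand back the numeric reply codes. Everything named here (ReplyKind, Verdict, classify, verify, and the "one vector of reply codes per MX server" representation) is hypothetical; the only fact relied on is the SMTP convention that 2xx/3xx replies are positive, 4xx are temporary failures, and 5xx are permanent failures.

```cpp
#include <vector>

enum class ReplyKind { Ok, Temporary, Permanent };
enum class Verdict { Accept, Reject, Unverified };

// Classify an SMTP reply code by its first digit. Anything outside the
// known ranges is treated as temporary, as a conservative assumption.
ReplyKind classify(int code)
{
    if (code >= 200 && code <= 399) return ReplyKind::Ok;
    if (code >= 500 && code <= 599) return ReplyKind::Permanent;
    return ReplyKind::Temporary;
}

// mx_replies holds one entry per MX server, in priority order. An empty
// entry means the connection failed; otherwise it lists the reply codes
// received for the greeting, HELO, MAIL FROM and RCPT TO, stopping at the
// first error.
Verdict verify(const std::vector<std::vector<int>>& mx_replies)
{
    if (mx_replies.empty())
        return Verdict::Reject;              // step 1: no MX records

    for (const auto& replies : mx_replies) {
        if (replies.empty())
            continue;                        // step 2: could not connect

        bool temporary = false;
        for (int code : replies) {
            const ReplyKind kind = classify(code);
            if (kind == ReplyKind::Permanent)
                return Verdict::Reject;      // step 3: permanent error
            if (kind == ReplyKind::Temporary) {
                temporary = true;            // step 5: QUIT, try next MX
                break;
            }
        }
        if (!temporary)
            return Verdict::Accept;          // step 5: no errors seen
    }
    return Verdict::Unverified;              // step 6: ran out of servers
}
```

What to do with Unverified is the policy choice described in step (6): reject, reject with an explanation, or accept and retry later.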

Given the shoddy support for asynchronous TCP connect() in some UNIX variants, you may be better off farming this out to a separate process or thread, same as you may already be doing for DNS queries (although DNS doesn't need it as UDP is connectionless, a lot of async DNS libraries use them anyway to avoid requiring mainloop integration).
10 Feb, 2010, David Haley wrote in the 19th comment:
Votes: 0
That's a pretty interesting idea, Elanthis. One thing I'd keep in mind:
Elanthis said:
(6) if you run out of MX servers, that means you have received a temporary error from each; the email may be valid but you cannot verify at this time; up to you how to handle this case, but your options are basically either to reject the email just like with a permanent error, reject it but inform the user that their mail exchange is having temporary errors, or accept the email and possibly reattempt validation at a later time

More and more servers use greylisting as a defense against spammers, who tend to implement SMTP clients pretty poorly – in particular they don't come back after the greylisting server tells them to try later.

So if you go with Elanthis's suggestion (which again is a pretty interesting approach) you might not want to reject addresses with temporary failures. Or, you could fall back to email validation (where the recipient clicks a link or whatever) for these people, and let your computer's normal mail transport agent handle the try-again-later part.
10 Feb, 2010, elanthis wrote in the 20th comment:
Votes: 0
Implementing support for the "try later" SMTP commands shouldn't be hard. Chances are if you make the user wait, though, they will just stop caring, so you may want to use delayed verification. Basically, let them log in and play after one attempt at verification, and if the verification fails, let them continue to log in anyway. If their account fails verification permanently later, just block them from logging back in until they enter a new email address.

I've seen a few MUDs do similar things for character name verification, for example, where an admin reviews new char names and can flag a name as inappropriate and requiring a change (generally for theme breaks in RP MUDs like making a character called "technodude" on a medieval MUD, not for idiots who make obviously unacceptable names like "masturbatingmonkey" or whatever) which is enforced on next login, or possibly within X minutes of play.