David Haley
Wizard


Group: Members
Posts: 7,783
Joined: Jun 30, 2007
#1 id:27556 Posted Jun 26, 2009, 7:54 am
The useful conversation got derailed and then the thread got locked, so here we are.
The problem statement:
1- Some people don't like having their MUD crawled too often, for a variety of reasons. They want to control the frequency with which a crawler pings the MUD.
2- Some people don't want their MUD crawled at all. A subset of these people (perhaps all of them) don't want to bother with MSSP to tell crawlers to go away.
The solutions to (1) are fairly simple. If people are already implementing MSSP, they can use a 'crawl delay' field to specify the minimum delay between connections.
The solution to (2) is less clear. If one doesn't want to implement or otherwise deal with MSSP, obviously they cannot use the 'crawl delay' field. Therefore they need some other way to tell the crawler to go away.
Possibilities:
1- "robots.txt". If crawling www.example-mud.net port 4000, first check http://www.example-mud.net/mssp.txt (??) for instructions. Such instructions could say "don't ever", or specify delays, or maybe even times at which to not log on. (??)
2- "social approach". Get in touch with the crawler operator and ask them to please not crawl the game.
I think that, realistically, option 2 is the best here. Setting up the specification for option 1 is a lot of work, most of which will be wasted, and besides, this only helps a rather small set of people (namely the ones who are bothered by connection attempts in their logs and who, for various reasons, refuse to solve it on their end). I think that the vast majority of crawler operators will be reasonable, so getting in touch with them and allowing a, well, reasonable delay for them to fix it seems like it would work just fine. For these reasons I think that option 1 incurs far more cost than gain. Besides, not everybody has web space, so it wouldn't work in general anyhow.
So my proposal is that we do two things:
1- Add the crawl delay variable.
2- Very, very strongly suggest that crawlers allow people to opt out of MSSP scanning, preferably relatively easily and ideally without crawler admin intervention.
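A rough sketch of what honoring both points might look like on the crawler side, in Python. Everything here is hypothetical: the opt-out set, the 3-hour fallback, the function name `may_crawl`, and the assumption that the crawl delay is given in hours are illustrations, not part of MSSP or of any existing crawler.

```python
from datetime import datetime, timedelta

# Hypothetical fallback used when a MUD advertises no crawl delay.
DEFAULT_DELAY = timedelta(hours=3)


def may_crawl(address, now, last_crawl, opted_out, crawl_delay_hours=None):
    """Decide whether the crawler may connect to `address` at time `now`.

    address: (host, port) tuple, e.g. ("www.example-mud.net", 4000).
    last_crawl: datetime of the previous connection, or None if never crawled.
    opted_out: set of (host, port) tuples whose admins asked not to be crawled.
    crawl_delay_hours: the MUD's advertised crawl delay, or None if it sent none.
    """
    if address in opted_out:
        return False  # opting out works even for MUDs without MSSP
    if last_crawl is None:
        return True  # never crawled before
    if crawl_delay_hours is None:
        delay = DEFAULT_DELAY
    else:
        delay = timedelta(hours=crawl_delay_hours)
    return now - last_crawl >= delay


now = datetime(2009, 6, 26, 12, 0)
mud = ("www.example-mud.net", 4000)
print(may_crawl(mud, now, now - timedelta(hours=1), set(), crawl_delay_hours=2))  # False
```

The opt-out check comes first, so a MUD whose admin asked to be left alone is skipped without ever needing MSSP.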
Hades_Kane
Wizard


Group: Moderators
Posts: 889
Joined: May 31, 2006
#2 id:27557 Posted Jun 26, 2009, 8:08 am
David Haley said:
1- Some people don't like having their MUD crawled too often, for a variety of reasons.
Considering this is part of the log file I found when I logged in this morning, can you blame me?
Quote:
Read_from_descriptor: Connection reset by peer
Fri Jun 26 02:00:44 2009 :: Sock.sinaddr: 98.20.14.106
EDIT (Asylumius): Trimming this down. It used to show the same IP hitting the MUD every minute over and over. You get the picture.
That aside, from what I was able to gather from the other thread (and I would appreciate some clarification), this is the policy as I understand it:
-Right now, the only way to opt out is to remove your listing
-Eventually, you'll be able to opt out with your listing
-The crawler will ping every 30 minutes regardless
-Having the ability to customize the frequency of the crawler defeats the purpose of the crawler with regards to tracking player trends and connection info
-If you don't want it crawling every 30 minutes, your option is to opt out (in whatever ways that are available at the time)
Is that an accurate summation of what we are looking at?
Last edited Jun 26, 2009, 9:34 am by Asylumius
David Haley
Wizard


Group: Members
Posts: 7,783
Joined: Jun 30, 2007
#3 id:27558 Posted Jun 26, 2009, 8:12 am
Look, we get it. OK. It connects a lot. I don't think we need to say it again, let alone paste so much text in everybody's way. Could you edit that out?
Hades_Kane said:
-Right now, the only way to opt out is to remove your listing
-Eventually, you'll be able to opt out with your listing
Yes, apparently MB combines the connection testing with the MSSP validating, so you can't opt out of MSSP without opting out of the listing as well. But that can change. I don't think we need to dwell too much on how the MB crawler happens to be implemented now; all of this is still in beta, after all.
Quote:
-The crawler will ping every 30 minutes regardless
I don't think it needs to ping every 30 minutes if it's just testing connections.
Quote:
-Having the ability to customize the frequency of the crawler defeats the purpose of the crawler with regards to tracking player trends and connection info
Well, sort of. MSSP isn't just about tracking player trends. It's about getting automatically updated listing information. Having frequently updated player information is nice, but it's not the main point.
Hades_Kane
Wizard


Group: Moderators
Posts: 889
Joined: May 31, 2006
#5 id:27563 Posted Jun 26, 2009, 8:31 am
Scandum said:
From what I gathered the primary purpose of the crawler is to check if the mud is up, and secondary purpose is gathering player stats through MSSP.
I'd suggest:
1) Crawl MUDs every 24 hours.
2) If a MUD supports MSSP, increase the crawl frequency to once every 3 hours.
3) If an MSSP MUD supports CRAWL DELAY, set the crawl interval between 0:30 and 24 hours depending on the value returned, using -1 for the crawler default.
I don't think a crawler should use intervals shorter than once an hour, unless a MUD specifically requests it by returning 0.
I'm with Scandum :)
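Scandum's three rules could be sketched roughly like this in Python. The function name is made up, the delay is assumed to be in hours, and treating any negative value the same as -1 is an assumption the post doesn't address:

```python
# Constants taken from Scandum's proposal; all values are in hours.
NON_MSSP_INTERVAL = 24.0  # rule 1: plain MUDs
MSSP_INTERVAL = 3.0       # rule 2: MUDs that answer MSSP (crawler default)
MIN_INTERVAL = 0.5        # rule 3 lower bound, only when the MUD sends 0
FLOOR_INTERVAL = 1.0      # no sub-hour crawling unless explicitly requested
MAX_INTERVAL = 24.0       # rule 3 upper bound


def crawl_interval(supports_mssp, crawl_delay=None):
    """Hours to wait before the next crawl (hypothetical helper).

    crawl_delay is the MUD's CRAWL DELAY value in hours, or None if absent.
    """
    if not supports_mssp:
        return NON_MSSP_INTERVAL
    if crawl_delay is None or crawl_delay < 0:
        return MSSP_INTERVAL  # -1 (or no value): use the crawler default
    if crawl_delay == 0:
        return MIN_INTERVAL  # MUD explicitly accepts sub-hour crawls
    # Clamp everything else into the 1..24 hour window.
    return min(max(float(crawl_delay), FLOOR_INTERVAL), MAX_INTERVAL)
```

So a MUD sending 6 would be crawled every 6 hours, one sending 48 would be capped at 24, and only an explicit 0 drops below the one-hour floor.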
Banner
Sorcerer


Group: Members
Posts: 391
Joined: Jul 14, 2006
#6 id:27569 Posted Jun 26, 2009, 9:21 am
I don't get that specific crawler coming to my MUD so I can't complain. However, connecting every 24 hours then defeats the purpose of the variable that tells how many players are on when the crawler connects, which would also be confuzzled if the crawler only connected once at like 1am in the morning when most MUDs will be dead. Why don't we change that variable to an average-players variable instead? Or, instead of a fixed 24 hours, 30 minutes, 12 hours, etc., let MUD admins pick when the crawler connects, as previously suggested. Perhaps you could tell it not to connect at all, to connect at a specific time, or to connect every 2 hours or at whatever interval the MUD's admin chooses.
If the MUD doesn't want to be crawled, why crawl it at all, even with a delay? Why can't the crawler just check it once, note 'don't crawl this MUD', and not come back, or maybe ping it every 3-4 days or something?
David Haley
Wizard


Group: Members
Posts: 7,783
Joined: Jun 30, 2007
#7 id:27570 Posted Jun 26, 2009, 9:29 am
Can we please remove that huge log, or at least trim it? It doesn't contribute anything -- we know what the problem is and have already acknowledged it -- and it just gets in the way. Really, we get it, so let's not waste our time and space continually saying how much of a problem it is. Thanks...
Anyhow. I'm not sure we should assume that the MB connection tester and the MSSP crawler have to be the same program. But if they have to be, what Scandum proposed sounds reasonable. Given that player numbers seem to be important, I would suggest that crawl delay be in minutes after all, not hours, so that you can give simpler values like 30 instead of 0:30.
Scandum
Wizard


Group: Members
Posts: 1,783
Joined: Aug 8, 2006
#8 id:27572 Posted Jun 26, 2009, 9:35 am
Banner said:
I don't get that specific crawler coming to my MUD so I can't complain. However, connecting every 24 hours then defeats the purpose of the variable that tells how many players are on when the crawler connects, which would also be confuzzled if the crawler only connected once at like 1am in the morning when most MUDs will be dead.
I think the best options are: crawl every 3 hours or crawl every 11 hours.
If you crawl every 11 hours, the hour crawled will shift by 2 hours per day, so after 6 days you'll have 12 measurements spread out evenly over a 24-hour period.
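The reason an odd interval like 11 hours works: crawl times taken modulo 24 eventually cover 24 / gcd(interval, 24) distinct hours of the day. Since 11 shares no factor with 24, it reaches every hour, while a 3-hour interval keeps landing on the same 8 hours and a 12-hour interval on just 2. A quick Python check, assuming the first crawl happens at hour 0 (the helper name is made up):

```python
from math import gcd


def hours_covered(interval, total_hours):
    """Hours of the day (0-23) hit by crawls at t = 0, interval, 2*interval, ...

    Assumes the first crawl happens at hour 0 and stops before total_hours.
    """
    return {t % 24 for t in range(0, total_hours, interval)}


for interval in (3, 11, 12):
    seen = hours_covered(interval, 24 * 11)  # eleven days of crawling
    # The long-run number of distinct hours is 24 // gcd(interval, 24).
    print(interval, len(seen), 24 // gcd(interval, 24))
```

This prints 8, 24 and 2 distinct hours for intervals of 3, 11 and 12 respectively, matching the gcd formula in each case.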
Hades_Kane
Wizard


Group: Moderators
Posts: 889
Joined: May 31, 2006
#9 id:27573 Posted Jun 26, 2009, 9:38 am
David Haley said:
Can we please remove that huge log, or at least trim it?
Asylumius beat me to it :p
KaVir
Wizard


Group: Members
Posts: 2,149
Joined: Jun 19, 2006
#10 id:27584 Posted Jun 26, 2009, 10:27 am
Scandum said:
From what I gathered the primary purpose of the crawler is to check if the mud is up, and secondary purpose is gathering player stats through MSSP.
TMC already checks if muds are up (and has done so for a long time), and I imagine there are other listing sites that do so as well. So I agree that checking once every 24 hours should be fine. If the mud has MSSP then personally I'd be tempted to check once every hour, simply because it would look good on a graph showing the on- and off-peak playerbase.
KaVir at God Wars II: godwars2.org 3000. Roomless world. Manual combat. Endless possibilities.
Scandum
Wizard


Group: Members
Posts: 1,783
Joined: Aug 8, 2006
#13 id:27594 Posted Jun 26, 2009, 11:58 am
So to get things going again:
Code (text):
"CRAWL DELAY"   Preferred minimum number of hours between crawls. Send "-1"
                to use the crawler's default.
This comes with the implied understanding that a crawl delay of 0 means the MUD doesn't mind being crawled more often than once an hour.
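For the MUD side, a minimal Python sketch of sending CRAWL DELAY inside a telnet MSSP subnegotiation (MSSP is telnet option 70, with MSSP_VAR = 1 and MSSP_VAL = 2). It assumes the client has already answered the server's IAC WILL MSSP with IAC DO MSSP; the function name and the example values are hypothetical, and all names and values are assumed to be plain ASCII so no byte needs IAC escaping.

```python
# Telnet command bytes and MSSP codes.
IAC, SB, SE = 255, 250, 240
MSSP, MSSP_VAR, MSSP_VAL = 70, 1, 2


def mssp_subnegotiation(variables):
    """Encode a dict of MSSP variables as IAC SB MSSP ... IAC SE."""
    out = bytearray([IAC, SB, MSSP])
    for name, value in variables.items():
        out.append(MSSP_VAR)
        out += name.encode("ascii")
        out.append(MSSP_VAL)
        out += str(value).encode("ascii")
    out += bytes([IAC, SE])
    return bytes(out)


# -1 asks the crawler to use its own default interval.
packet = mssp_subnegotiation({"NAME": "Example MUD", "PLAYERS": 12, "CRAWL DELAY": -1})
```

Sending the value as a string ("-1", "0", "6") keeps it consistent with the other MSSP variables, which are all transmitted as text.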
Cratylus
Wizard


Group: Members
Posts: 1,768
Joined: May 22, 2006
#15 id:27611 Posted Jun 26, 2009, 12:52 pm
Hi, which thread is this now? lol