MudBytes
Crawl delay
David Haley
Wizard






Group: Members
Posts: 7,783
Joined: Jun 30, 2007

#1 id:27556 Posted Jun 26, 2009, 7:54 am

The useful conversation got derailed then the thread got locked, so here we are.

The problem statement:
1- Some people don't like having their MUD crawled too often, for a variety of reasons. They want to control the frequency with which a crawler pings the MUD.
2- Some people don't want their MUD crawled at all. A subset of these people (perhaps all of them) don't want to bother with MSSP to tell crawlers to go away.

The solution to (1) is fairly simple. If people are already implementing MSSP, they can use a 'crawl delay' field to specify the minimum delay between connections.

The solution to (2) is less clear. If one doesn't want to implement or otherwise deal with MSSP, obviously they cannot use the 'crawl delay' field. Therefore they need to communicate to the crawler in some other way to go away.

Possibilities:
1- "robots.txt". If crawling www.example-mud.net port 4000, first check http://www.example-mud.net/mssp.txt (??) for instructions. Such instructions could say "don't ever", or specify delays, or maybe even times at which to not log on. (??)
2- "social approach". Get in touch with the crawler operator and ask them to please not crawl the game.

I think that, realistically, option 2 is the best here. Setting up the specification for option 1 is a lot of work most of which will be wasted, and besides this only helps a rather small set of people (namely the ones who are bothered by connection attempts in logs, and who for various reasons refuse to solve it on their end). I think that the vast majority of crawler operators will be reasonable, so getting in touch with them and allowing a, well, reasonable delay to fix it seems like it would work just fine. For these reasons I think that option 1 incurs far more cost than gain. Besides, not everybody has web space, so it wouldn't work in general anyhow.

So my proposal is that we do two things:
1- Add the crawl delay variable.
2- Very, very strongly suggest that crawlers allow people to opt out of MSSP scanning, preferably relatively easily and ideally without crawler admin intervention.
.........................
-- d.c.h --
BabbleMUD Project (custom codebase)
Legends of the Darkstone (head coder)
http://david.the-haleys.org
.........................

Hades_Kane
Wizard






Group: Moderators
Posts: 889
Joined: May 31, 2006

#2 id:27557 Posted Jun 26, 2009, 8:08 am

David Haley said:
1- Some people don't like having their MUD crawled too often, for a variety of reasons.


Considering this is part of the log file that I logged into this morning, can you blame me?

Quote:

Read_from_descriptor: Connection reset by peer
Fri Jun 26 02:00:44 2009 :: Sock.sinaddr:  98.20.14.106

EDIT (Asylumius): Trimming this down. It used to show the same IP hitting the MUD every minute over and over.  You get the picture.


That aside, from what I was able to gather from the other thread (and I would appreciate some clarification) this is the policy as I understand it:

-Right now, the only way to opt out is to remove your listing
-Eventually, you'll be able to opt out with your listing
-The crawler will ping every 30 minutes regardless
-Having the ability to customize the frequency of the crawler defeats the purpose of the crawler with regards to tracking player trends and connection info
-If you don't want it crawling every 30 minutes, your option is to opt out (in whatever ways that are available at the time)

Is that an accurate summation of what we are looking at?
.........................
-Diablos

http://www.mudbytes.net/banners/c61df356665d83760cba2a64a9b15c40
eotmud.com : 4000 (or 23)

http://www.eotmud.com
http://www.facebook.com/eotmud

Final Fantasy based MUD opening soon! Looking for players & builders!

Last edited Jun 26, 2009, 9:34 am by Asylumius
David Haley
Wizard






Group: Members
Posts: 7,783
Joined: Jun 30, 2007

#3 id:27558 Posted Jun 26, 2009, 8:12 am

Look, we get it. OK. It connects a lot. I don't think we need to say it again, let alone paste so much text in everybody's way. Could you edit that out?

Hades_Kane said:
-Right now, the only way to opt out is to remove your listing
-Eventually, you'll be able to opt out with your listing

Yes, apparently MB combines the connection testing with the MSSP validating, so you can't opt out of MSSP without opting out of the listing as well. But that can change. I don't think we need to dwell too much on how the MB crawler happens to be implemented now; all of this is still in beta, after all.

Quote:
-The crawler will ping every 30 minutes regardless

I don't think it needs to ping every 30 minutes if it's just testing connections.

Quote:
-Having the ability to customize the frequency of the crawler defeats the purpose of the crawler with regards to tracking player trends and connection info

Well, sort of. MSSP isn't just about tracking player trends. It's about getting automatically updated listing information. Having frequently updated player information is nice, but it's not the main point.
.........................
-- d.c.h --
BabbleMUD Project (custom codebase)
Legends of the Darkstone (head coder)
http://david.the-haleys.org
.........................

Scandum
Wizard






Group: Members
Posts: 1,783
Joined: Aug 8, 2006

#4 id:27559 Posted Jun 26, 2009, 8:26 am

From what I gathered the primary purpose of the crawler is to check if the mud is up, and secondary purpose is gathering player stats through MSSP.

I'd suggest:

1) Crawl MUDs every 24 hours.
2) If a MUD supports MSSP increase the crawl frequency to every 3 hours.
3) If a MSSP MUD supports CRAWL DELAY set the crawl interval between 0:30 and 24 hours depending on the value returned using -1 for crawler default.

I don't think a crawler should go with shorter intervals than once an hour, unless specifically requested by a mud returning 0.
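
A minimal sketch of the three rules above as a single lookup, assuming CRAWL DELAY arrives as a whole number of hours and that 0 maps to the 0:30 floor. The function name and constants are illustrative only and do not come from any existing crawler.

```python
from typing import Optional

# Hypothetical constants taken from the three numbered rules above.
NO_MSSP_HOURS = 24.0    # rule 1: MUDs without MSSP
MSSP_HOURS = 3.0        # rule 2: MSSP, no CRAWL DELAY (used as the crawler default)
MIN_HOURS = 0.5         # rule 3: lower bound of 0:30
MAX_HOURS = 24.0        # rule 3: upper bound of 24 hours


def crawl_interval_hours(supports_mssp: bool,
                         crawl_delay: Optional[int] = None) -> float:
    """Return how many hours the crawler waits before its next visit."""
    if not supports_mssp:
        return NO_MSSP_HOURS
    # Field missing, or -1 (any negative here): fall back to the crawler default.
    if crawl_delay is None or crawl_delay < 0:
        return MSSP_HOURS
    # 0 is read as "crawls more often than hourly are fine" (assumption).
    if crawl_delay == 0:
        return MIN_HOURS
    # Otherwise honour the requested hours, capped at one day.
    return float(min(crawl_delay, MAX_HOURS))
```

With these assumptions a MUD reporting 6 is visited every 6 hours, and one reporting 100 is capped at 24.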
.........................
TinTin++ Mud Client - I can't believe it's not butter!

Hades_Kane
Wizard






Group: Moderators
Posts: 889
Joined: May 31, 2006

#5 id:27563 Posted Jun 26, 2009, 8:31 am


Scandum said:
From what I gathered the primary purpose of the crawler is to check if the mud is up, and secondary purpose is gathering player stats through MSSP.

I'd suggest:

1) Crawl MUDs every 24 hours.
2) If a MUD supports MSSP increase the crawl frequency to every 3 hours.
3) If a MSSP MUD supports CRAWL DELAY set the crawl interval between 0:30 and 24 hours depending on the value returned using -1 for crawler default.

I don't think a crawler should go with shorter intervals than once an hour, unless specifically requested by a mud returning 0.


I'm with Scandum :)
.........................
-Diablos

http://www.mudbytes.net/banners/c61df356665d83760cba2a64a9b15c40
eotmud.com : 4000 (or 23)

http://www.eotmud.com
http://www.facebook.com/eotmud

Final Fantasy based MUD opening soon! Looking for players & builders!

Banner
Sorcerer






Group: Members
Posts: 391
Joined: Jul 14, 2006

#6 id:27569 Posted Jun 26, 2009, 9:21 am

I don't get that specific crawler coming to my MUD so I can't complain. However, connecting only every 24 hours defeats the purpose of the variable that tells how many players are on when the crawler connects, which would also be confuzzled if the crawler only connected once at, like, 1am, when most MUDs will be dead. Why don't we change that variable to an average-players variable instead? Or, instead of a set 24 hours, 30 minutes, 12 hours, etc., let MUD admins pick when the crawler connects, as previously stated. Perhaps you could tell it to not connect at all, to connect at a specific time, or to connect every 2 hours, or at whatever interval the MUD's admin chooses.

If the MUD doesn't want to be crawled, why still crawl it at a delay? Why can't the crawler just check it, note 'don't crawl this mud', and not come back, or maybe ping it every 3-4 days or something?
.........................
Lead Developer,
Star Wars: Galactic Insights
--
sudo apt-get sandwich

David Haley
Wizard






Group: Members
Posts: 7,783
Joined: Jun 30, 2007

#7 id:27570 Posted Jun 26, 2009, 9:29 am

Can we please remove that huge log, or at least trim it? It doesn't contribute anything -- we know what the problem is and have already acknowledged it -- and it just gets in the way. Really, we get it, so let's not waste our time and space continually saying how much of a problem it is. :smile: Thanks...

Anyhow. I'm not sure we need to assume that the MB connection testing and MSSP crawler need to be the same program. But if they have to be, what Scandum proposed sounds reasonable. Given that player numbers seem to be important, I would suggest that crawl delay be in minutes after all, not hours, so that you can give easier values like 30 instead of 0:30.
.........................
-- d.c.h --
BabbleMUD Project (custom codebase)
Legends of the Darkstone (head coder)
http://david.the-haleys.org
.........................

Scandum
Wizard






Group: Members
Posts: 1,783
Joined: Aug 8, 2006

#8 id:27572 Posted Jun 26, 2009, 9:35 am

Banner said:
I don't get that specific crawler coming to my MUD so I can't complain. However, connecting every 24 hours then defeats the purpose of the variable that tells how many players are on when the crawler connects, which would also be confuzzled if the crawler only connected once at like 1am in the morning when most MUDs will be dead.

I think the best options are: crawl every 3 hours or crawl every 11 hours.

If you crawl every 11 hours, the hour of day at which you crawl shifts by 2 hours per day, so after 6 days you'll have 12 measurements spread out evenly over a 24 hour period.
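
The arithmetic behind the 11-hour idea can be checked with a short sketch, assuming the first crawl lands at hour 0 and hours are counted on a 24-hour clock. The helper name is made up for illustration.

```python
from typing import List


def crawl_hours_of_day(interval_hours: int, crawls: int,
                       start_hour: int = 0) -> List[int]:
    """Hour of day (0-23) at which each successive crawl lands."""
    return [(start_hour + k * interval_hours) % 24 for k in range(crawls)]


# The first twelve crawls at an 11-hour interval (about five days of crawling).
print(sorted(crawl_hours_of_day(11, 12)))
# prints [0, 1, 3, 5, 7, 9, 11, 14, 16, 18, 20, 22]

# 11 and 24 share no common factor, so 24 crawls touch every hour exactly once.
print(len(set(crawl_hours_of_day(11, 24))))
# prints 24
```

Under these assumptions the twelve samples come out roughly, though not perfectly, evenly spaced (gaps of one to three hours), and covering every hour of the day takes 24 crawls, or about eleven days.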
.........................
TinTin++ Mud Client - I can't believe it's not butter!

Hades_Kane
Wizard






Group: Moderators
Posts: 889
Joined: May 31, 2006

#9 id:27573 Posted Jun 26, 2009, 9:38 am


David Haley said:
Can we please remove that huge log, or at least trim it?


Asylumius beat me to it :p
.........................
-Diablos

http://www.mudbytes.net/banners/c61df356665d83760cba2a64a9b15c40
eotmud.com : 4000 (or 23)

http://www.eotmud.com
http://www.facebook.com/eotmud

Final Fantasy based MUD opening soon! Looking for players & builders!

KaVir
Wizard






Group: Members
Posts: 2,149
Joined: Jun 19, 2006

#10 id:27584 Posted Jun 26, 2009, 10:27 am

Scandum said:
From what I gathered the primary purpose of the crawler is to check if the mud is up, and secondary purpose is gathering player stats through MSSP.

TMC already checks if muds are up (and has done so for a long time), and I imagine there are other listing sites that do so as well.  So I agree that checking once every 24 hours should be fine.  If the mud has MSSP then personally I'd be tempted to check once every hour, simply because it would look good on a graph showing the on- and off-peak playerbase.
.........................
KaVir at God Wars II: godwars2.org 3000  Roomless world.  Manual combat.  Endless possibilities.

David Haley
Wizard






Group: Members
Posts: 7,783
Joined: Jun 30, 2007

#11 id:27585 Posted Jun 26, 2009, 10:28 am

Agreed. If a crawler insists on doing both MSSP and normal connection pinging, it makes sense to increase frequency only for MUDs that are known to implement MSSP.
.........................
-- d.c.h --
BabbleMUD Project (custom codebase)
Legends of the Darkstone (head coder)
http://david.the-haleys.org
.........................

Tyche
Wizard






Group: Members
Posts: 1,702
Joined: May 23, 2006

#12 id:27587 Posted Jun 26, 2009, 10:44 am

Would you fellas please stop hitting refresh on this page?  I think you should only refresh and post at most 4 times a day, and then only if you have a new point to make.
This is a turn-based game based on the content of your post.  People who take more turns and post the same tired arguments over and over again ruin it for the casual reader.
*kof*


.........................
Proud member of Team Hetero
http://jlsysinc.gotdns.com/ladybug_laugh2.jpg http://jlsysinc.gotdns.com/teensymud_250x80.png http://jlsysinc.gotdns.com/palin_calendar.jpg
For now we see through a glass, darkly; but then face to face: now I know in part; but then shall I know even as also I am known.

Scandum
Wizard






Group: Members
Posts: 1,783
Joined: Aug 8, 2006

#13 id:27594 Posted Jun 26, 2009, 11:58 am

So to get things going again:

Code (text):
"CRAWL DELAY"        Preferred minimum number of hours between crawls. Send "-1"
                     to use the crawler's default.


With the implied understanding that a crawl delay of 0 means the mud doesn't mind crawl delays of less than 1 hour.
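
For MUD owners wondering what sending this would look like, here is a rough sketch of packing the variable into an MSSP telnet subnegotiation. The byte values (option 70, MSSP_VAR 1, MSSP_VAL 2) are the ones given in the MSSP specification as I understand it, so check them against the spec before relying on this; the function name and the sample MUD values are invented for illustration.

```python
from typing import Dict

# Standard telnet control bytes.
IAC, SB, SE = 255, 250, 240
# MSSP option and markers, per the MSSP specification (verify before use).
MSSP, MSSP_VAR, MSSP_VAL = 70, 1, 2


def build_mssp_payload(variables: Dict[str, str]) -> bytes:
    """Pack name/value pairs as IAC SB MSSP (VAR name VAL value)... IAC SE."""
    out = bytearray([IAC, SB, MSSP])
    for name, value in variables.items():
        out.append(MSSP_VAR)
        out += name.encode("ascii")
        out.append(MSSP_VAL)
        out += value.encode("ascii")
    out += bytes([IAC, SE])
    return bytes(out)


# Hypothetical MUD asking the crawler to use its own default delay.
payload = build_mssp_payload({
    "NAME": "Example MUD",
    "PLAYERS": "12",
    "UPTIME": "1246000000",
    "CRAWL DELAY": "-1",
})
```

Values are sent as text, so "-1" goes out as the two characters, not as a signed byte.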
.........................
TinTin++ Mud Client - I can't believe it's not butter!

David Haley
Wizard






Group: Members
Posts: 7,783
Joined: Jun 30, 2007

#14 id:27595 Posted Jun 26, 2009, 12:04 pm

I think minutes are simpler, and there should be no distinction between "default" and "don't care". -1 should mean "please don't come back for <some long period of time>".
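
One way to read this variant, sketched under loud assumptions: the value is in minutes, a missing field or 0 both mean "crawler's choice", and the "long period" for -1 is set to a week purely as a placeholder. None of these numbers or names are part of any agreed spec.

```python
from typing import Optional

DEFAULT_MINUTES = 180                  # assumption: the crawler's own default
LONG_BACKOFF_MINUTES = 7 * 24 * 60     # assumption: "some long period" = one week


def next_crawl_delay_minutes(crawl_delay: Optional[int]) -> int:
    """Minutes to wait before the next crawl, with the value given in minutes."""
    # Absent or 0: no preference stated, so "default" and "don't care" coincide.
    if crawl_delay is None or crawl_delay == 0:
        return DEFAULT_MINUTES
    # -1 (any negative here): please stay away for a long time.
    if crawl_delay < 0:
        return LONG_BACKOFF_MINUTES
    # Positive values are honoured as given, e.g. 30 for half an hour.
    return crawl_delay
```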
.........................
-- d.c.h --
BabbleMUD Project (custom codebase)
Legends of the Darkstone (head coder)
http://david.the-haleys.org
.........................

Cratylus
Wizard






Group: Members
Posts: 1,768
Joined: May 22, 2006

#15 id:27611 Posted Jun 26, 2009, 12:52 pm


hi which thread is this now lol
