FW: [Tutor] Writing a web bot.

Author: Mike Sheldon <msheldon@desertraven.com>
Date:  
Subject: FW: [Tutor] Writing a web bot.
Well, that would be nice, except almost all servers are HTTP 1.1 compliant.
And HTTP 1.1 states that connections should be left open for additional
requests unless otherwise specified.

Trust me, I learned this the hard way. Most servers will assume that the
client wishes to send multiple requests unless the client specifies
otherwise.

From RFC 2616:


8.1.2 Overall Operation

A significant difference between HTTP/1.1 and earlier versions of
HTTP is that persistent connections are the default behavior of any
HTTP connection. That is, unless otherwise indicated, the client
SHOULD assume that the server will maintain a persistent connection,
even after error responses from the server.

...

8.1.2.1 Negotiation

An HTTP/1.1 server MAY assume that a HTTP/1.1 client intends to
maintain a persistent connection unless a Connection header including
the connection-token "close" was sent in the request. If the server
chooses to close the connection immediately after sending the
response, it SHOULD send a Connection header including the
connection-token close.

An HTTP/1.1 client MAY expect a connection to remain open, but would
decide to keep it open based on whether the response from a server
contains a Connection header with the connection-token close. In case
the client does not want to maintain a connection for more than that
request, it SHOULD send a Connection header including the
connection-token close.

...

8.2.1 Persistent Connections and Flow Control

HTTP/1.1 servers SHOULD maintain persistent connections and use TCP's
flow control mechanisms to resolve temporary overloads, rather than
terminating connections with the expectation that clients will retry.
The latter technique can exacerbate network congestion.
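To make the negotiation rule in 8.1.2.1 concrete, here is a minimal sketch of what the client actually puts on the wire. It assumes a bot that writes raw requests to a socket; `build_get` is a hypothetical helper, and www.example.com and the paths are placeholders. Nothing extra is needed to keep an HTTP/1.1 connection open. The client only adds "Connection: close" to the request after which it wants to hang up.

```python
def build_get(host: str, path: str, keep_alive: bool = True) -> bytes:
    """Build a raw HTTP/1.1 GET request (hypothetical helper, for illustration).

    HTTP/1.1 connections stay open by default (RFC 2616, 8.1.2), so a
    keep-alive request needs no extra header; the client adds
    "Connection: close" only to the request after which it wants to hang up
    (RFC 2616, 8.1.2.1).
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",  # Host is mandatory in HTTP/1.1 requests
    ]
    if not keep_alive:
        lines.append("Connection: close")
    # Each header line ends in CRLF; an empty line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Early requests in a session leave the connection open; the last one closes it.
print(build_get("www.example.com", "/").decode())
print(build_get("www.example.com", "/last", keep_alive=False).decode())
```

Both requests can be written, one after the other, to the same connected socket. The server then closes the connection after answering the second one.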

Michael J. Sheldon
Internet Applications Developer
Phone: 480.699.1084
http://www.desertraven.com/
PGP Key Available on Request

-----Original Message-----
From: [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us] On Behalf Of Rod Roark
Sent: Friday, July 07, 2000 20:02
To:
Subject: RE: FW: [Tutor] Writing a web bot.


From the HTTP 1.0 specification: "Current practice requires that the
connection be established by the client prior to each request and
closed by the server after sending the response."

Certainly cooperating clients and servers can behave otherwise, but the
application in question is a bot, and no such cooperation can be
expected.

-- Rod
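Both positions in this thread can be reconciled in code. A bot cannot force a server to cooperate, but it can read each response and decide whether the connection will survive. The sketch below applies RFC 2616's default for HTTP/1.1 responses. For HTTP/1.0 responses it treats the connection as closed unless the server sent "Connection: keep-alive", the older extension that RFC 2616's compatibility appendix says must be explicitly negotiated. The function name `connection_persists` and its inputs are assumptions for illustration, not part of any library.

```python
from typing import Mapping


def connection_persists(status_line: str, headers: Mapping[str, str]) -> bool:
    """Guess whether the server keeps the TCP connection open after a response.

    status_line: the response's first line, e.g. "HTTP/1.1 200 OK".
    headers: response header names mapped to values (names compared
    case-insensitively).
    """
    version = status_line.split(" ", 1)[0].upper()
    value = ""
    for name, field in headers.items():
        if name.lower() == "connection":
            value = field
    tokens = {token.strip().lower() for token in value.split(",")}
    if "close" in tokens:
        return False  # the server announced it will close (any version)
    if version == "HTTP/1.1":
        return True  # RFC 2616, 8.1.2: persistent unless told otherwise
    # HTTP/1.0: one request per connection unless keep-alive was negotiated
    return "keep-alive" in tokens
```

When this returns False, a bot should expect the server to hang up and must reconnect before its next request. The "continuous connection" requirement can therefore only be met when the server agrees to it.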

On Fri, 07 Jul 2000, Mike Sheldon wrote:
> Actually, HTTP does work that way. You can retrieve multiple files through
> a single connection.
>
> -----Original Message-----
> From: [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us] On Behalf Of Rod Roark
> Sent: Friday, July 07, 2000 18:13
> To:
> Subject: Re: FW: [Tutor] Writing a web bot.
>
>
> HTTP doesn't work that way. The server is going to kill the connection
> after responding to each request.
>
> -- Rod
> ----------------------------------------------------------------------
> Sunset Systems                           Preconfigured Linux Computers
> http://www.sunsetsystems.com/                      and Custom Software
> ----------------------------------------------------------------------

>
> On Fri, 07 Jul 2000, you wrote:
> > Hi all.
> >
> > It appears I have found myself in a position
> > where I could use some help.
> >
> > The task I am trying to perform is to write an
> > internet bot. I was going to use urllib for
> > this project; however, one of the requirements
> > is for the connection to be continuous during
> > the session.
> >
> > Connect to a site.
> > Get page, parse.
> > Get another page, parse.
> > Use the POST method, get another page, parse.
> > Disconnect from the site.
> >
> > The connection is not supposed to be dropped
> > between the requests.
> >
> > Is there a simple way to do this task???
> >
> > thanks.
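The session described above maps fairly directly onto Python's `http.client.HTTPConnection`. Unlike `urllib`, which opens a fresh connection for each `urlopen()` call, it reuses one socket for as long as the server keeps it open. The following is a minimal sketch for Python 3 (the module was called `httplib` in Python 2). The paths `/`, `/page2`, `/form` and the form field `name=value` are hypothetical placeholders. `DemoHandler` is a local stand-in for the real site so the sketch runs offline. Recording the client's local port before each response is one way to check that every request really went over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlencode


def run_session(host: str, port: int) -> list:
    """GET, GET, POST over one connection; return (status, body, local_port) per step."""
    conn = http.client.HTTPConnection(host, port, timeout=10)  # connect to the site
    steps = [  # hypothetical pages and form data
        ("GET", "/", None),
        ("GET", "/page2", None),
        ("POST", "/form", urlencode({"name": "value"}).encode("ascii")),
    ]
    results = []
    try:
        for method, path, body in steps:
            headers = {}
            if body is not None:
                headers["Content-Type"] = "application/x-www-form-urlencoded"
            conn.request(method, path, body=body, headers=headers)
            local_port = conn.sock.getsockname()[1]  # same port => same TCP connection
            resp = conn.getresponse()
            data = resp.read()  # read the body fully so the connection can be reused
            results.append((resp.status, data, local_port))
            # ... parse `data` here ...
    finally:
        conn.close()  # disconnect from the site
    return results


class DemoHandler(BaseHTTPRequestHandler):
    """Local stand-in for the real site, so the sketch runs offline."""
    protocol_version = "HTTP/1.1"  # keep connections open between requests

    def _reply(self):
        body = f"{self.command} {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply()

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # consume form body
        self._reply()

    def log_message(self, *args):
        pass  # keep the demo quiet


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    for status, data, local_port in run_session("127.0.0.1", server.server_address[1]):
        print(status, data, local_port)
    server.shutdown()
    server.server_close()
```

If the server is HTTP/1.0-only or answers with "Connection: close", `HTTPConnection` quietly reconnects on the next request. The recorded ports then differ, and the bot can detect that the connection was not continuous.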


_______________________________________________
Plug-discuss mailing list -
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss