FW: [Tutor] Writing a web bot.

Mike Sheldon msheldon@desertraven.com
Fri, 7 Jul 2000 22:13:43 -0700


Well, that would be nice, except almost all servers are HTTP 1.1 compliant.
And HTTP 1.1 states that connections should be left open for additional
requests unless otherwise specified.

Trust me, I learned this the hard way. Most servers will assume that the
client wishes to send multiple requests unless the client specifies
otherwise.

From RFC 2616:

8.1.2 Overall Operation

   A significant difference between HTTP/1.1 and earlier versions of
   HTTP is that persistent connections are the default behavior of any
   HTTP connection. That is, unless otherwise indicated, the client
   SHOULD assume that the server will maintain a persistent connection,
   even after error responses from the server.

...

8.1.2.1 Negotiation

   An HTTP/1.1 server MAY assume that a HTTP/1.1 client intends to
   maintain a persistent connection unless a Connection header including
   the connection-token "close" was sent in the request. If the server
   chooses to close the connection immediately after sending the
   response, it SHOULD send a Connection header including the
   connection-token close.

   An HTTP/1.1 client MAY expect a connection to remain open, but would
   decide to keep it open based on whether the response from a server
   contains a Connection header with the connection-token close. In case
   the client does not want to maintain a connection for more than that
   request, it SHOULD send a Connection header including the
   connection-token close.

...

8.2.1 Persistent Connections and Flow Control

   HTTP/1.1 servers SHOULD maintain persistent connections and use TCP's
   flow control mechanisms to resolve temporary overloads, rather than
   terminating connections with the expectation that clients will retry.
   The latter technique can exacerbate network congestion.
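
To make the negotiation above concrete, here is a rough sketch of what a client puts on the wire over one persistent HTTP/1.1 connection. `build_request` is a hypothetical helper, not part of any library. The host `example.com`, the paths and the form body are placeholders. Requests that omit the Connection header leave the connection persistent by default (8.1.2). Only the last request in the session opts out with `Connection: close` (8.1.2.1).

```python
def build_request(method, path, host, body=b"", close=False):
    """Serialize one HTTP/1.1 request as raw bytes (hypothetical helper).

    With close=False no Connection header is sent, so the connection stays
    persistent by default. With close=True the request carries
    "Connection: close", marking it as the last one on this connection.
    """
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # Host is mandatory in HTTP/1.1
    if body:
        # Assumes a URL-encoded form body, as a simple POST form would send.
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body)}")
    if close:
        lines.append("Connection: close")
    head = "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the header block
    return head.encode("ascii") + body


# One session on one connection: only the final request asks to close it.
session = [
    build_request("GET", "/", "example.com"),
    build_request("GET", "/next", "example.com"),
    build_request("POST", "/form", "example.com", body=b"q=bot", close=True),
]
```

In practice a library writes these headers for you. The only point here is which header ends the session.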

Michael J. Sheldon
Internet Applications Developer
Phone: 480.699.1084
http://www.desertraven.com/
PGP Key Available on Request

-----Original Message-----
From: plug-discuss-admin@lists.PLUG.phoenix.az.us
[mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us] On Behalf Of Rod Roark
Sent: Friday, July 07, 2000 20:02
To: plug-discuss@lists.PLUG.phoenix.az.us
Subject: RE: FW: [Tutor] Writing a web bot.


From the HTTP 1.0 specification:  "Current practice requires that the
connection be established by the client prior to each request and
closed by the server after sending the response."

Certainly cooperating clients and servers can behave otherwise, but the
application in question is a bot, and no such cooperation can be
expected.
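
Rod's HTTP/1.0 quote and the RFC 2616 text describe two different defaults, so a client has to work out which one applies to each response. Below is a minimal sketch that assumes the status line and headers have already been parsed. `connection_will_persist` is a hypothetical helper. It deliberately ignores the informal HTTP/1.0 "keep-alive" extension.

```python
def connection_will_persist(status_line, headers):
    """Return True if the connection may be reused after this response.

    status_line -- the first line of the response, e.g. "HTTP/1.1 200 OK"
    headers     -- the response headers as a list of (name, value) pairs
    """
    version = status_line.split(" ", 1)[0].upper()

    # Collect the comma-separated tokens of every Connection header
    # (header names are case-insensitive).
    tokens = set()
    for name, value in headers:
        if name.lower() == "connection":
            tokens.update(t.strip().lower() for t in value.split(","))

    # An explicit "close" token ends the connection (RFC 2616, 8.1.2.1).
    if "close" in tokens:
        return False
    # Otherwise HTTP/1.1 defaults to persistent, even after error responses
    # (8.1.2). Anything else is treated like HTTP/1.0, which closes after
    # each response (RFC 1945). "keep-alive" is ignored in this sketch.
    return version == "HTTP/1.1"
```

A bot that gets False here must reconnect before its next request. That is exactly the situation Rod is warning about.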

-- Rod

On Fri, 07 Jul 2000, Mike Sheldon wrote:
> Actually, HTTP does work that way. You can retrieve multiple files through
> a single connection.
>
> -----Original Message-----
> From: plug-discuss-admin@lists.PLUG.phoenix.az.us
> [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us] On Behalf Of Rod Roark
> Sent: Friday, July 07, 2000 18:13
> To: plug-discuss@lists.PLUG.phoenix.az.us
> Subject: Re: FW: [Tutor] Writing a web bot.
>
>
> HTTP doesn't work that way.  The server is going to kill the connection
> after responding to each request.
>
> -- Rod
> ----------------------------------------------------------------------
> Sunset Systems                           Preconfigured Linux Computers
> http://www.sunsetsystems.com/                      and Custom Software
> ----------------------------------------------------------------------
>
> On Fri, 07 Jul 2000, you wrote:
> > Hi all.
> >
> > It appears I have found myself in a position
> > where I could use some help.
> >
> > The task I am trying to perform is to write an
> > internet bot.  I was going to use urllib for
> > this project; however, one of the requirements
> > is for the connection to be continuous during
> > the session.
> >
> > Connect to a site.
> > Get page, parse.
> > Get another page, parse.
> > Use the POST method, get another page, parse.
> > Disconnect from the site.
> >
> > The connection is not supposed to be dropped
> > between the requests.
> >
> > Is there a simple way to do this task???
> >
> > thanks.
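
Here is a rough sketch of the session described in the original question. It assumes Python 3's standard library and a server that speaks HTTP/1.1. It uses http.client (the modern descendant of httplib) rather than urllib, because urllib.request adds a `Connection: close` header to every request. `DemoHandler`, `run_session`, the paths `/` and `/next`, and the form field `q` are all hypothetical stand-ins for the real site. The demo server echoes each client's TCP port, so you can see whether all three requests really shared one connection.

```python
import http.client
import threading
import urllib.parse
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the target site."""
    protocol_version = "HTTP/1.1"  # the default, "HTTP/1.0", closes after every response

    def _reply(self, text):
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        # A known length lets the client find the end of the body
        # without the server having to close the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Echo the client's port so the bot can tell which connection was used.
        self._reply(f"GET {self.path} from port {self.client_address[1]}")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = self.rfile.read(length).decode("ascii")
        self._reply(f"POST {data} from port {self.client_address[1]}")

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_session(host, port):
    """Connect once, GET, GET, POST, then disconnect; return the page bodies."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    pages = []
    try:
        for path in ("/", "/next"):
            conn.request("GET", path)
            # Read each response fully before sending the next request.
            pages.append(conn.getresponse().read().decode("utf-8"))
        form = urllib.parse.urlencode({"q": "bot"})
        conn.request("POST", "/form", body=form,
                     headers={"Content-Type": "application/x-www-form-urlencoded"})
        pages.append(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()  # the explicit "disconnect from the site" step
    return pages


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    for page in run_session(*server.server_address):
        print(page)  # all three lines should report the same client port
    server.shutdown()
    server.server_close()
```

Rod's objection still applies. If the server answers with HTTP/1.0 or sends `Connection: close`, http.client quietly opens a new connection for the next request, and the echoed ports will differ. The client can ask for persistence, but only the server can grant it.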

_______________________________________________
Plug-discuss mailing list  -  Plug-discuss@lists.PLUG.phoenix.az.us
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss