FW: [Tutor] Writing a web bot.

Rod Roark rod@sunsetsystems.com
Sat, 8 Jul 2000 06:59:03 -0700


Perhaps most servers do.  However, notice the capitalized word MAY in
your references.  This means that the behavior is not a requirement of
the specification.  Even the SHOULD behavior is not a requirement for
"conditional compliance" with the 1.1 spec.

I suppose the bot app could hope for a persistent connection, provided
that it has a fallback mechanism if that fails.
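
For illustration, here is a minimal sketch of that "hope, but fall back"
approach using Python 3's standard http.client module.  The names
PersistentFetcher and wants_close are made up for this example, and the
retry-once policy is an assumption, not something the spec prescribes.
The idea is to reuse one connection for as long as the server allows it
and to reconnect quietly when the server closes it.

```python
import http.client


def wants_close(response):
    """Return True if the server signalled that it will close the connection.

    Hypothetical helper: HTTP/1.1 is persistent unless "close" is sent;
    HTTP/1.0 closes unless "keep-alive" was explicitly negotiated.
    """
    token = (response.getheader("Connection") or "").lower()
    if response.version == 10:
        return "keep-alive" not in token
    return "close" in token


class PersistentFetcher:
    """Reuse one connection when possible; reconnect when it is gone."""

    def __init__(self, host, port=80, timeout=10):
        self.host, self.port, self.timeout = host, port, timeout
        self.conn = None
        self.connects = 0  # how many times a new connection was opened

    def _connect(self):
        self.conn = http.client.HTTPConnection(
            self.host, self.port, timeout=self.timeout)
        self.connects += 1

    def close(self):
        if self.conn is not None:
            self.conn.close()
            self.conn = None

    def request(self, method, path, body=None, headers=None):
        # Assumption: one retry on a fresh connection is enough.  Note that
        # blindly retrying a POST may repeat a non-idempotent action.
        for attempt in (1, 2):
            if self.conn is None:
                self._connect()
            try:
                self.conn.request(method, path, body=body,
                                  headers=headers or {})
                resp = self.conn.getresponse()
                data = resp.read()  # drain the body before reusing the socket
            except (http.client.HTTPException, OSError):
                self.close()        # stale or refused: fall back to reconnect
                if attempt == 2:
                    raise
                continue
            if wants_close(resp):
                self.close()        # server opted out of persistence
            return resp.status, data
```

A bot would call request("GET", "/page1"), then request("GET", "/page2"),
then request("POST", "/form", body=...) on one PersistentFetcher for a
host such as www.example.com (a placeholder), and call close() when done.
Afterwards, the connects counter shows whether the server actually kept
the connection open between requests.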

Regards,

-- Rod

On Fri, 07 Jul 2000, you wrote:
> Well, that would be nice, except almost all servers are HTTP 1.1 compliant.
> And HTTP 1.1 states that connections should be left open for additional
> requests unless otherwise specified.
> 
> Trust me, I learned this the hard way. Most servers will assume that the
> client wishes to send multiple requests unless the client specifies
> otherwise.
> 
> From RFC 2616:
> 
> 8.1.2 Overall Operation
> 
>    A significant difference between HTTP/1.1 and earlier versions of
>    HTTP is that persistent connections are the default behavior of any
>    HTTP connection. That is, unless otherwise indicated, the client
>    SHOULD assume that the server will maintain a persistent connection,
>    even after error responses from the server.
> 
> ...
> 
> 8.1.2.1 Negotiation
> 
>    An HTTP/1.1 server MAY assume that a HTTP/1.1 client intends to
>    maintain a persistent connection unless a Connection header including
>    the connection-token "close" was sent in the request. If the server
>    chooses to close the connection immediately after sending the
>    response, it SHOULD send a Connection header including the
>    connection-token close.
> 
>    An HTTP/1.1 client MAY expect a connection to remain open, but would
>    decide to keep it open based on whether the response from a server
>    contains a Connection header with the connection-token close. In case
>    the client does not want to maintain a connection for more than that
>    request, it SHOULD send a Connection header including the
>    connection-token close.
> 
> ...
> 
> 8.2.1 Persistent Connections and Flow Control
> 
>    HTTP/1.1 servers SHOULD maintain persistent connections and use TCP's
>    flow control mechanisms to resolve temporary overloads, rather than
>    terminating connections with the expectation that clients will retry.
>    The latter technique can exacerbate network congestion.
> 
> Michael J. Sheldon
> Internet Applications Developer
> Phone: 480.699.1084
> http://www.desertraven.com/
> PGP Key Available on Request
> 
> -----Original Message-----
> From: plug-discuss-admin@lists.PLUG.phoenix.az.us
> [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us]On Behalf Of Rod
> Roark
> Sent: Friday, July 07, 2000 20:02
> To: plug-discuss@lists.PLUG.phoenix.az.us
> Subject: RE: FW: [Tutor] Writing a web bot.
> 
> 
> From the HTTP 1.0 specification:  "Current practice requires that the
> connection be established by the client prior to each request and
> closed by the server after sending the response."
> 
> Certainly cooperating clients and servers can behave otherwise, but the
> application in question is a bot, and no such cooperation can be
> expected.
> 
> -- Rod
> 
> On Fri, 07 Jul 2000, Mike Sheldon wrote:
> > Actually, HTTP does work that way. You can retrieve multiple files through
> > a single connection.
> >
> > -----Original Message-----
> > From: plug-discuss-admin@lists.PLUG.phoenix.az.us
> > [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us]On Behalf Of Rod
> > Roark
> > Sent: Friday, July 07, 2000 18:13
> > To: plug-discuss@lists.PLUG.phoenix.az.us
> > Subject: Re: FW: [Tutor] Writing a web bot.
> >
> >
> > HTTP doesn't work that way.  The server is going to kill the connection
> > after responding to each request.
> >
> > -- Rod
> > ----------------------------------------------------------------------
> > Sunset Systems                           Preconfigured Linux Computers
> > http://www.sunsetsystems.com/                      and Custom Software
> > ----------------------------------------------------------------------
> >
> > On Fri, 07 Jul 2000, you wrote:
> > > Hi all.
> > >
> > > It appears I have found myself in a position
> > > where I could use some help.
> > >
> > > The task I am trying to perform is write an
> > > internet bot.  I was going to use urllib for
> > > this project however one of the requirements
> > > is for the connection to be continuous during
> > > the session.
> > >
> > > Connect to a site.
> > > Get page, parse.
> > > Get another page, parse.
> > > Use POST method, get another page, parse.
> > > Disconnect from the site.
> > >
> > > The connection is not supposed to be dropped
> > > between the requests.
> > >
> > > Is there a simple way to do this task???
> > >
> > > thanks.