FW: [Tutor] Writing a web bot.
Rod Roark
rod@sunsetsystems.com
Sat, 8 Jul 2000 06:59:03 -0700
Perhaps most servers do. However, notice the capitalized word MAY in
your references. This means that the behavior is not a requirement of
the specification. Even the SHOULD behavior is not a requirement for
"conditional compliance" with the 1.1 spec.
I suppose the bot app could hope for a persistent connection, provided
that it has a fallback mechanism in case the server drops it.
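
A rough sketch of what such a fallback might look like, assuming Python 3's
http.client module (which is newer than this thread). The class name
PersistentFetcher, its connection_factory parameter, and the host
www.example.com are made up for illustration. It tries to reuse one
connection, reopens it if the server has closed it, and only repeats GET or
HEAD requests, since repeating a POST could submit it twice.

```python
import http.client


class PersistentFetcher:
    """Reuse one HTTP connection where possible; reconnect if it was dropped."""

    def __init__(self, host, port=80,
                 connection_factory=http.client.HTTPConnection):
        self.host = host
        self.port = port
        # Hypothetical hook: lets a fake connection be substituted in tests.
        self._factory = connection_factory
        self._conn = None
        self.connects = 0  # number of connection objects created so far

    def _connection(self):
        if self._conn is None:
            self._conn = self._factory(self.host, self.port)
            self.connects += 1
        return self._conn

    def _drop(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def request(self, method, path, body=None, headers=None):
        """Send one request; retry GET/HEAD once on a fresh connection."""
        attempts = 2 if method.upper() in ("GET", "HEAD") else 1
        for attempt in range(1, attempts + 1):
            conn = self._connection()
            try:
                conn.request(method, path, body=body, headers=headers or {})
                resp = conn.getresponse()
                data = resp.read()  # drain the body before reusing the socket
            except (http.client.HTTPException, OSError):
                # The server (or network) gave up on us: start over.
                self._drop()
                if attempt == attempts:
                    raise
                continue
            # Honor "Connection: close" from the server, as RFC 2616 describes.
            tokens = (resp.getheader("Connection") or "").lower().split(",")
            if "close" in [t.strip() for t in tokens]:
                self._drop()
            return resp.status, data

    def close(self):
        self._drop()


# Possible usage (not run here, since it needs a live server):
#   bot = PersistentFetcher("www.example.com")
#   status, page = bot.request("GET", "/")
#   status, page = bot.request("GET", "/other.html")
#   status, page = bot.request(
#       "POST", "/form", body="a=1",
#       headers={"Content-Type": "application/x-www-form-urlencoded"})
#   bot.close()
```

If the server keeps the connection open, all three requests share it. If
not, the bot quietly pays for a new connection and carries on.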
Regards,
-- Rod
On Fri, 07 Jul 2000, you wrote:
> Well, that would be nice, except almost all servers are HTTP 1.1 compliant.
> And HTTP 1.1 states that connections should be left open for additional
> requests unless otherwise specified.
>
> Trust me, I learned this the hard way. Most servers will assume that the
> client wishes to send multiple requests unless the client specifies
> otherwise.
>
> From RFC 2616:
>
> 8.1.2 Overall Operation
>
> A significant difference between HTTP/1.1 and earlier versions of
> HTTP is that persistent connections are the default behavior of any
> HTTP connection. That is, unless otherwise indicated, the client
> SHOULD assume that the server will maintain a persistent connection,
> even after error responses from the server.
>
> ...
>
> 8.1.2.1 Negotiation
>
> An HTTP/1.1 server MAY assume that a HTTP/1.1 client intends to
> maintain a persistent connection unless a Connection header including
> the connection-token "close" was sent in the request. If the server
> chooses to close the connection immediately after sending the
> response, it SHOULD send a Connection header including the
> connection-token close.
>
> An HTTP/1.1 client MAY expect a connection to remain open, but would
> decide to keep it open based on whether the response from a server
> contains a Connection header with the connection-token close. In case
> the client does not want to maintain a connection for more than that
> request, it SHOULD send a Connection header including the
> connection-token close.
>
> ...
>
> 8.2.1 Persistent Connections and Flow Control
>
> HTTP/1.1 servers SHOULD maintain persistent connections and use TCP's
> flow control mechanisms to resolve temporary overloads, rather than
> terminating connections with the expectation that clients will retry.
> The latter technique can exacerbate network congestion.
>
> Michael J. Sheldon
> Internet Applications Developer
> Phone: 480.699.1084
> http://www.desertraven.com/
> PGP Key Available on Request
>
> -----Original Message-----
> From: plug-discuss-admin@lists.PLUG.phoenix.az.us
> [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us]On Behalf Of Rod
> Roark
> Sent: Friday, July 07, 2000 20:02
> To: plug-discuss@lists.PLUG.phoenix.az.us
> Subject: RE: FW: [Tutor] Writing a web bot.
>
>
> From the HTTP 1.0 specification: "Current practice requires that the
> connection be established by the client prior to each request and
> closed by the server after sending the response."
>
> Certainly cooperating clients and servers can behave otherwise, but the
> application in question is a bot, and no such cooperation can be
> expected.
>
> -- Rod
>
> On Fri, 07 Jul 2000, Mike Sheldon wrote:
> > Actually, HTTP does work that way. You can retrieve multiple files through
> > a single connection.
> >
> > Michael J. Sheldon
> > Internet Applications Developer
> > Phone: 480.699.1084
> > http://www.desertraven.com/
> > PGP Key Available on Request
> >
> > -----Original Message-----
> > From: plug-discuss-admin@lists.PLUG.phoenix.az.us
> > [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us]On Behalf Of Rod
> > Roark
> > Sent: Friday, July 07, 2000 18:13
> > To: plug-discuss@lists.PLUG.phoenix.az.us
> > Subject: Re: FW: [Tutor] Writing a web bot.
> >
> >
> > HTTP doesn't work that way. The server is going to kill the connection
> > after responding to each request.
> >
> > -- Rod
> > ----------------------------------------------------------------------
> > Sunset Systems Preconfigured Linux Computers
> > http://www.sunsetsystems.com/ and Custom Software
> > ----------------------------------------------------------------------
> >
> > On Fri, 07 Jul 2000, you wrote:
> > > Hi all.
> > >
> > > It appears I have found myself in a position
> > > where I could use some help.
> > >
> > > The task I am trying to perform is write an
> > > internet bot. I was going to use urllib for
> > > this project however one of the requirements
> > > is for the connection to be continuous during
> > > the session.
> > >
> > > Connect to a site.
> > > Get page, parse.
> > > Get another page, parse.
> > > Use POST method, get another page, parse.
> > > Disconnect from the site.
> > >
> > > The connection is not supposed to be dropped
> > > between the requests.
> > >
> > > Is there a simple way to do this task???
> > >
> > > thanks.