FW: [Tutor] Writing a web bot.

The Wolf codewolf@earthlink.net
Sun, 09 Jul 2000 17:29:54 -0700


Rod Roark wrote:
> 
> Perhaps most servers do.  However notice the capitalized word MAY in
> your references.  This means that the behavior is not a requirement of
> the specification.  Even the SHOULD behavior is not a requirement for
> "conditional compliance" with the 1.1 spec.
> 
> I suppose the bot app could hope for a persistent connection, provided
> that it has a fallback mechanism if it fails.
> 
> Regards,
> 
> -- Rod
> 
> On Fri, 07 Jul 2000, you wrote:
> > Well, that would be nice, except almost all servers are HTTP 1.1 compliant.
> > And HTTP 1.1 states that connections should be left open for additional
> > requests unless otherwise specified.
> >
> > Trust me, I learned this the hard way. Most servers will assume that the
> > client wishes to send multiple requests unless the client specifies
> > otherwise.
> >
> > From RFC 2616:
> >
> > 8.1.2 Overall Operation
> >
> >    A significant difference between HTTP/1.1 and earlier versions of
> >    HTTP is that persistent connections are the default behavior of any
> >    HTTP connection. That is, unless otherwise indicated, the client
> >    SHOULD assume that the server will maintain a persistent connection,
> >    even after error responses from the server.
> >
> > ...
> >
> > 8.1.2.1 Negotiation
> >
> >    An HTTP/1.1 server MAY assume that a HTTP/1.1 client intends to
> >    maintain a persistent connection unless a Connection header including
> >    the connection-token "close" was sent in the request. If the server
> >    chooses to close the connection immediately after sending the
> >    response, it SHOULD send a Connection header including the
> >    connection-token close.
> >
> >    An HTTP/1.1 client MAY expect a connection to remain open, but would
> >    decide to keep it open based on whether the response from a server
> >    contains a Connection header with the connection-token close. In case
> >    the client does not want to maintain a connection for more than that
> >    request, it SHOULD send a Connection header including the
> >    connection-token close.
> >
> > ...
> >
> > 8.2.1 Persistent Connections and Flow Control
> >
> >    HTTP/1.1 servers SHOULD maintain persistent connections and use TCP's
> >    flow control mechanisms to resolve temporary overloads, rather than
> >    terminating connections with the expectation that clients will retry.
> >    The latter technique can exacerbate network congestion.
> >
> > Michael J. Sheldon
> > Internet Applications Developer
> > Phone: 480.699.1084
> > http://www.desertraven.com/
> > PGP Key Available on Request
> >
> > -----Original Message-----
> > From: plug-discuss-admin@lists.PLUG.phoenix.az.us
> > [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us]On Behalf Of Rod
> > Roark
> > Sent: Friday, July 07, 2000 20:02
> > To: plug-discuss@lists.PLUG.phoenix.az.us
> > Subject: RE: FW: [Tutor] Writing a web bot.
> >
> >
> > From the HTTP 1.0 specification:  "Current practice requires that the
> > connection be established by the client prior to each request and
> > closed by the server after sending the response."
> >
> > Certainly cooperating clients and servers can behave otherwise, but the
> > application in question is a bot, and no such cooperation can be
> > expected.
> >
> > -- Rod
> >
> > On Fri, 07 Jul 2000, Mike Sheldon wrote:
> > > Actually, HTTP does work that way. You can retrieve multiple files
> > > through a single connection.
> > >
> > > -----Original Message-----
> > > From: plug-discuss-admin@lists.PLUG.phoenix.az.us
> > > [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us]On Behalf Of Rod
> > > Roark
> > > Sent: Friday, July 07, 2000 18:13
> > > To: plug-discuss@lists.PLUG.phoenix.az.us
> > > Subject: Re: FW: [Tutor] Writing a web bot.
> > >
> > >
> > > HTTP doesn't work that way.  The server is going to kill the connection
> > > after responding to each request.
> > >
> > > -- Rod
> > > ----------------------------------------------------------------------
> > > Sunset Systems                           Preconfigured Linux Computers
> > > http://www.sunsetsystems.com/                      and Custom Software
> > > ----------------------------------------------------------------------
> > >
> > > On Fri, 07 Jul 2000, you wrote:
> > > > Hi all.
> > > >
> > > > It appears I have found myself in a position
> > > > where I could use some help.
> > > >
> > > > The task I am trying to perform is write an
> > > > internet bot.  I was going to use urllib for
> > > > this project however one of the requirements
> > > > is for the connection to be continuous during
> > > > the session.
> > > >
> > > > Connect to a site.
> > > > Get page, parse.
> > > > Get another page, parse.
> > > > use POST method, get another page, parse.
> > > > Disconnect from the site.
> > > >
> > > > The connection is not supposed to be dropped
> > > > between the requests.
> > > >
> > > > Is there a simple way to do this task???
> > > >
> > > > thanks.
> 


We have to assume this server does keep the connection open.

This is actually one of the requirements for the bot.

The problem is that Python's urllib closes the connection
after receiving the first page.  I would like to keep that connection open.
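
For what it is worth, here is a rough sketch of the kind of session I am
after, assuming a modern Python 3 where httplib is called http.client.
The paths (/first, /second, /submit), the form field "q", and the little
PortEchoHandler test server are all made up for illustration; the test
server only exists so the sketch runs without touching a real site. It
replies with the client's TCP port, so seeing the same port three times
suggests all three requests went over one connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlencode


class PortEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real site: replies with the client's TCP port."""

    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def _reply(self):
        # Drain any request body so the next request starts cleanly.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = do_POST = _reply

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_session(host, port):
    """Send GET, GET, POST over one HTTPConnection; return the ports the server saw."""
    steps = [
        ("GET", "/first", None),
        ("GET", "/second", None),
        ("POST", "/submit", {"q": "bot"}),
    ]
    conn = http.client.HTTPConnection(host, port, timeout=10)
    seen = []
    try:
        for method, path, form in steps:
            body = urlencode(form) if form else None
            headers = {"Content-Type": "application/x-www-form-urlencoded"} if form else {}
            conn.request(method, path, body=body, headers=headers)
            response = conn.getresponse()
            page = response.read()  # read fully before reusing the connection
            seen.append(int(page))  # a real bot would parse the page here
    finally:
        conn.close()
    return seen


def demo():
    server = HTTPServer(("127.0.0.1", 0), PortEchoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return run_session("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Against a real server the ports could differ: if the server answers with
"Connection: close", http.client should, as far as I can tell, drop the
socket and open a fresh one on the next request, which is roughly the
fallback Rod described.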

Thanks.

The Wolf
-- 
The Wolf

"The question is not if we are paranoid, 
the question is if we are paranoid enough."