I have looked everywhere and found millions of Python proxy servers, but none do precisely what I would like (I think :s)
I have had quite a bit of experience with Python generally, but I’m quite new to the world of the deep dark secrets of the HTTP protocol.
What I think might be useful would be a very simple proxy example that can be connected to and will then itself try to connect to the address passed to it.
Also, I think what has been confusing me is everything the hidden machinery is doing. For example, if the class inherits from BaseHTTPServer.BaseHTTPRequestHandler, what precisely happens when a page is requested? In many of the examples I have found there is no reference to a path variable, then suddenly, poof! self.path is used in a function. I’m assuming it’s been inherited, but how does it end up holding the path that was requested?
I’m sorry if that didn’t make much sense, as my idea of my problem is probably scrambled 🙁
If you can think of anything which would make my question clearer, please, please suggest I add it. xxx
Also, a link to an explanation of the detailed process by which the proxy handles the request, requests the page (and how to read/modify the data at this point) and passes it back to the original requester would be greatly appreciated xxxx
“a very simple proxy example that can be connected to and will then itself try to connect to the address passed to it.” That is practically the definition of an HTTP proxy.
There’s a really simple proxy example here: http://effbot.org/librarybook/simplehttpserver.htm
The core of it is just 3 lines:
    class Proxy(SimpleHTTPServer.SimpleHTTPRequestHandler):
        def do_GET(self):
            self.copyfile(urllib.urlopen(self.path), self.wfile)
So it’s a SimpleHTTPRequestHandler that, in response to a GET request, opens the URL in the path (a request to a proxy typically looks like “GET http://example.com/”, not like “GET /index.html”). It then just copies whatever it can read from that URL to the response.
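For anyone on Python 3, here is a rough sketch of the same idea that can be run without touching the network. It is not the effbot code: Origin, Proxy, start, demo and DIRECT are names made up for this example, Origin is just a stand-in web site on localhost, and unlike the three-liner above the proxy sends a status line and a Content-Length header. There is no error handling at all.

```python
# Python 3 sketch; Origin, Proxy, start, demo and DIRECT are illustrative names.
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Opener that ignores any system-wide proxy settings, so the proxy
# always contacts the target server directly.
DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class Origin(BaseHTTPRequestHandler):
    """Stand-in for a real web site, so the demo needs no network access."""

    def do_GET(self):
        body = b"origin saw " + self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


class Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # For a proxied request, self.path is the full URL.
        with DIRECT.open(self.path) as upstream:
            body = upstream.read()
        # body is plain bytes here: the point to inspect or modify it.
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def start(handler_class):
    """Serve handler_class on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler_class)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    origin, proxy = start(Origin), start(Proxy)
    try:
        # Connect to the proxy, but put the origin's absolute URL in the request line.
        conn = http.client.HTTPConnection("127.0.0.1", proxy.server_port)
        conn.request("GET", "http://127.0.0.1:%d/index.html" % origin.server_port)
        body = conn.getresponse().read()
        conn.close()
        return body
    finally:
        for server in (origin, proxy):
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```

The client connects to the proxy's port but asks for the origin's full URL, which is exactly the “GET http://example.com/” shape described above. The origin only ever sees “/index.html”, because the proxy makes an ordinary request on the client's behalf.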
Note that this is really minimal. It doesn’t deal with headers at all, I believe.
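If you do want to pass headers along, one thing to watch is that some headers only apply to a single connection and are not meant to be relayed by a proxy. A possible starting point, assuming you have the upstream headers as (name, value) pairs; forwardable and HOP_BY_HOP are hypothetical names, and the list is the hop-by-hop set from RFC 2616 section 13.5.1 plus the "Trailer" spelling used by later RFCs:

```python
# Hypothetical helper: filter out headers a proxy should not relay as-is.
HOP_BY_HOP = {
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailers",
    "trailer",
    "transfer-encoding",
    "upgrade",
}


def forwardable(headers):
    """Return the (name, value) pairs that are not hop-by-hop headers."""
    return [(name, value) for name, value in headers
            if name.lower() not in HOP_BY_HOP]


if __name__ == "__main__":
    upstream = [("Content-Type", "text/html"), ("Connection", "close")]
    print(forwardable(upstream))
```

This is only the filtering step. A fuller proxy would also have to drop any header named inside the Connection header and decide what to do about request headers going the other way.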
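path is documented at http://docs.python.org/library/basehttpserver.html. It was set before your do_* method was called: the base class parses the request line, stores its parts as self.command, self.path and self.request_version, and only then looks up and calls the do_* method matching the command.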
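To see the “poof” happen without a server, you can feed a handler a raw request from memory. This is a Python 3 sketch (the module is http.server there); Recorder is a made-up class, and skipping the normal constructor is a trick for the demo, not something you would do in a real server.

```python
import io
from http.server import BaseHTTPRequestHandler


class Recorder(BaseHTTPRequestHandler):
    """Handler fed from an in-memory buffer instead of a socket."""

    def __init__(self, raw_request):
        # Deliberately skip the normal constructor, which expects a socket.
        self.rfile = io.BytesIO(raw_request)
        self.wfile = io.BytesIO()
        self.seen = None
        # Inherited method: reads the request line, parses it, then
        # calls the do_<COMMAND> method if one exists.
        self.handle_one_request()

    def do_GET(self):
        # By the time we get here the base class has filled these in.
        self.seen = (self.command, self.path, self.request_version)


handler = Recorder(
    b"GET http://example.com/index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"\r\n"
)
print(handler.seen)
```

Nothing in Recorder assigns self.path. The inherited handle_one_request does the parsing and then dispatches to do_GET, which is why the examples can use self.path without ever defining it.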
From the Twisted wiki:
    from twisted.web import proxy, http
    from twisted.internet import reactor
    from twisted.python import log
    import sys

    log.startLogging(sys.stdout)

    class ProxyFactory(http.HTTPFactory):
        protocol = proxy.Proxy

    reactor.listenTCP(8080, ProxyFactory())
    reactor.run()
proxpy looks rather promising; it’s very simple to tweak requests and responses.