If the server fails to respond within the given time, requests raises requests.exceptions.ConnectTimeout:
requests.get('http://github.com', timeout=0.001)
# Raised error:
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='github.com', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f1b16da75f8>, 'Connection to github.com timed out. (connect timeout=0.001)'))
If connect and read timeouts are specified separately and the server connects but fails to send data within the read timeout, requests raises requests.exceptions.ReadTimeout:
requests.get('http://github.com', timeout=(6.05, 0.01))
# Raised error:
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='github.com', port=80): Read timed out. (read timeout=0.01)
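A sketch of handling the two timeout phases separately (the function name and the limits are just illustrative):

```python
import requests

def fetch(url):
    # timeout=(connect, read): the first number bounds establishing the
    # TCP connection, the second bounds waiting for response bytes
    try:
        return requests.get(url, timeout=(3.05, 27))
    except requests.exceptions.ConnectTimeout:
        print("could not connect within 3.05 s")
    except requests.exceptions.ReadTimeout:
        print("connected, but no data arrived within 27 s")
    return None
```

Both exception types are subclasses of requests.exceptions.Timeout, so a single `except Timeout:` handler also covers them.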
If the hostname cannot be resolved (here, an invalid domain), requests raises requests.exceptions.ConnectionError:
requests.get('http://github.comasf', timeout=(6.05, 27.05))
# Raised error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='github.comasf', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f75826665f8>: Failed to establish a new connection: [Errno -2] Name or service not known',))
requests.exceptions.ConnectionError: HTTPSConnectionPool Max retries exceeded
Cause and fix: this HTTPSConnectionPool error came up while crawling through IP proxies with requests; the investigation below tracks it down.
import time
import random
import requests

USER_AGENTS = [
    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10"
]
headers = {
    "User-Agent": ""
}
# Rotate through the USER_AGENTS above to dodge User-Agent-based anti-scraping checks
s = requests.session()
s.keep_alive = False
requests.adapters.DEFAULT_RETRIES = 10
url = "https://baike.baidu.com/item/人工智能/9180?fromtitle=AI&fromid=25417&fr=aladdin"
for i in range(10):
    proxys = {
        # new_ips is a pre-loaded list of proxy IPs; the loading code is omitted here
        "https": "http://" + new_ips[i],
        "http": "http://" + new_ips[i]
    }
    headers['User-Agent'] = random.choice(USER_AGENTS)
    print(proxys)
    print(headers['User-Agent'])
    req = requests.get(url, headers=headers, verify=False, proxies=proxys, timeout=20).content.decode('utf-8')
    print(req)
    time.sleep(5)
Using the approach above, I confirmed the problem was not that the IPs themselves were unusable. Later I saw on Zhihu that someone, when passing the proxy dict to proxies, had written the "https" and "http" keys entirely in uppercase; after trying that, it did indeed work:
for i in range(10):
    proxys = {
        "HTTPS": "HTTP://" + new_ips[i],
        "HTTP": "HTTP://" + new_ips[i]
        # everything uppercased here!
    }
    headers['User-Agent'] = random.choice(USER_AGENTS)
    print(proxys)
    print(headers['User-Agent'])
    req = requests.get(url, headers=headers, verify=False, proxies=proxys, timeout=20).content.decode('utf-8')
    print(req)
    time.sleep(5)
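The snippets above patch requests.adapters.DEFAULT_RETRIES globally. A less invasive alternative, sketched below with illustrative parameter values, is to mount an HTTPAdapter with an explicit urllib3 Retry policy on the session:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry up to 10 times with exponential backoff, also retrying on
# common transient server status codes
retry = Retry(total=10, backoff_factor=0.5,
              status_forcelist=[500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)
```

Requests sent through this session then retry automatically before a "Max retries exceeded" error is raised.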
A few pitfalls I hit tonight, for the record:
If the proxy server refuses to establish the connection (the port refuses connections or is not open), requests raises requests.exceptions.ProxyError:
requests.get('http://github.com', timeout=(6.05, 27.05), proxies={"http": "192.168.10.1:800"})
# Raised error:
requests.exceptions.ProxyError: HTTPConnectionPool(host='192.168.10.1', port=800): Max retries exceeded with url: http://github.com/ (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce3438c6d8>: Failed to establish a new connection: [Errno 111] Connection refused',)))
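Since ProxyError is a subclass of ConnectionError, a dead proxy can be handled separately from other connection failures. A sketch (the function name and timeouts are illustrative):

```python
import requests

def get_via_proxy(url, proxy):
    try:
        return requests.get(url, proxies={"http": proxy, "https": proxy},
                            timeout=(6.05, 27.05))
    except requests.exceptions.ProxyError:
        # the proxy itself refused or dropped the connection
        print("proxy unusable:", proxy)
    except requests.exceptions.ConnectionError:
        # some other connection problem (DNS failure, network down, ...)
        print("connection failed")
    return None
```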
requests.exceptions.ProxyError: HTTPConnectionPool
Checking the proxy settings
Check the machine's current proxy configuration with:
$ env | grep -i proxy
Output:
https_proxy=127.0.0.1:8888
http_proxy=127.0.0.1:8888
socks_proxy=
ftp_proxy=
Solution
export http_proxy=''
export https_proxy=''
For me, though, this had no effect. A search on Google then turned up the fix below:
Edit the .bashrc file in your home directory
View .bashrc:
$ cat ~/.bashrc
# the last four lines read:
export https_proxy='127.0.0.1:8888'
export http_proxy='127.0.0.1:8888'
export socks_proxy=''
export ftp_proxy=''
Edit the .bashrc file:
$ vi ~/.bashrc
# change those lines to the following
export https_proxy=''
export http_proxy=''
export socks_proxy=''
export ftp_proxy=''
# save and quit
# then run the following command so the .bashrc change takes effect
$ source ~/.bashrc
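If the proxy only needs to be gone for the current shell session, unsetting the variables can be enough; a sketch:

```shell
# Remove the proxy variables from the current shell session only
unset http_proxy https_proxy

# Confirm nothing proxy-related remains
env | grep -i proxy || echo "no proxy variables set"
```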
Reportedly this also works (disables proxy lookup for the session):
session = requests.Session()
session.trust_env = False
response = session.get('http://ff2.pw')
import os
# set os.environ['NO_PROXY'] to the domain of your target site
os.environ['NO_PROXY'] = 'stackoverflow.com'
# for several domains, separate them with commas
os.environ['NO_PROXY'] = 'stackoverflow.com,baidu.com'
NO_PROXY means: do not route requests to the listed domains through a proxy.
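The effect can be checked without sending any request via requests.utils.get_environ_proxies, which returns the proxies requests would pick up from the environment (the proxy address below is made up):

```python
import os
import requests

# Pretend a system-wide proxy is configured
os.environ["HTTP_PROXY"] = "http://127.0.0.1:8888"
# Exempt these domains from the proxy
os.environ["NO_PROXY"] = "stackoverflow.com,baidu.com"

# Exempted domain: no proxy is selected
print(requests.utils.get_environ_proxies("http://stackoverflow.com"))  # -> {}
# Non-exempted domain: the environment proxy applies
print(requests.utils.get_environ_proxies("http://github.com"))
```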
If the proxy server never responds, requests raises requests.exceptions.ConnectTimeout:
requests.get('http://github.com', timeout=(6.05, 27.05), proxies={"http": "10.200.123.123:800"})
# Raised error:
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='10.200.123.123', port=800): Max retries exceeded with url: http://github.com/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fa8896cc6d8>, 'Connection to 10.200.123.123 timed out. (connect timeout=6.05)'))
This means the connection to the proxy succeeded and the proxy forwarded the request to the target site, but reading the target site's response timed out. Even a fast proxy yields this error when its own request to the target site times out; the proxy still takes the blame.
Assuming the proxy is usable, timeout bounds the connect and read phases against the proxy server; whether the proxy itself manages to connect to and read from the target is not something you control:
requests.get('http://github.com', timeout=(2, 0.01), proxies={"http": "192.168.10.1:800"})
# Raised error:
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='192.168.10.1', port=800): Read timed out. (read timeout=0.01)
Possibly caused by the network being down; requests raises requests.exceptions.ConnectionError:
requests.get('http://github.com', timeout=(6.05, 27.05))
# Raised error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='github.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc8c17675f8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
You can tell requests to stop waiting for a response after the number of seconds given by the timeout parameter. Nearly all production code should set this parameter; without it, your program can hang indefinitely:
>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
timeout is not a time limit on downloading the whole response; rather, an exception is raised if the server has not answered within timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).
- On a network problem (e.g. DNS failure, refused connection), requests raises a requests.exceptions.ConnectionError exception.
- If the HTTP request returned an unsuccessful status code, Response.raise_for_status() raises an HTTPError exception.
- If the request times out, a Timeout exception is raised.
- If the request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised.
- All exceptions that requests explicitly raises inherit from requests.exceptions.RequestException.
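Because of the hierarchy in the list above, one handler at the root can act as a safety net. A sketch of a wrapper using it (the function name and timeout values are just illustrative):

```python
import requests

def safe_get(url, **kwargs):
    try:
        resp = requests.get(url, timeout=(3.05, 27), **kwargs)
        resp.raise_for_status()  # turn 4xx/5xx status codes into HTTPError
        return resp
    except requests.exceptions.HTTPError as e:
        print("bad status:", e)
    except requests.exceptions.Timeout as e:
        print("timed out:", e)
    except requests.exceptions.RequestException as e:
        # catches everything else requests raises itself
        print("request failed:", e)
    return None
```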