Automatically downloading the latest chromedriver with Python

Date: 12-06

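Preface

chromedriver is the driver for Google Chrome that Web UI automation depends on. It only works properly when its version matches the Chrome version installed on the machine.

Beginners often run into the same problem: Chrome updates itself automatically, chromedriver does not, and a script that used to work suddenly stops working.

So I wanted a script that downloads the latest chromedriver automatically.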

Code

import re

import requests

url = 'http://npm.taobao.org/mirrors/chromedriver/'
rep = requests.get(url).text

time_list = []          # upload times of all versions
time_version_dict = {}  # maps upload time -> version directory

result = re.compile(r'\d.*?/</a>.*?Z').findall(rep)  # match each version directory together with its time

for i in result:
    time = i[-24:-1]                                # extract the time (the trailing 'Z' is dropped)
    version = re.compile(r'.*?/').findall(i)[0]     # extract the version, e.g. '70.0.3538.16/'
    time_version_dict[time] = version               # record which version belongs to this time
    time_list.append(time)                          # collect the time

latest_version = time_version_dict[max(time_list)]              # look up the version with the largest (newest) time
download_url = url + latest_version + 'chromedriver_win32.zip'  # latest_version already ends with '/'

file = requests.get(download_url)
with open("chromedriver.zip", 'wb') as zip_file:                # save into the current working directory
    zip_file.write(file.content)
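
The download URL in this script is built by plain string concatenation, which only works because the version string still carries its trailing '/'. A minimal sketch of a helper that tolerates either form is shown here; build_download_url is a hypothetical name and not part of the original script.

```python
def build_download_url(base_url, version, filename='chromedriver_win32.zip'):
    """Join mirror URL, version directory and file name with exactly one '/' between them.

    Hypothetical helper for illustration; accepts the version with or
    without a trailing '/'.
    """
    parts = [base_url.rstrip('/'), version.strip('/'), filename]
    return '/'.join(parts)


base = 'http://npm.taobao.org/mirrors/chromedriver/'
print(build_download_url(base, '70.0.3538.16/'))
print(build_download_url(base, '70.0.3538.16'))
```

Both calls print the same URL, so the caller no longer has to remember whether the slash was already removed.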

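Approach

1. Visit the page and look for a pattern

http://npm.taobao.org/mirrors/chromedriver/

Opening the page shows that the latest chromedriver version can be found by comparing the timestamps in the right-hand column, as long as the few irrelevant files at the bottom of the listing are excluded.

The links to version directories all start with a digit and end with /, which suggests extracting them with a regular expression: \d.*?/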
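
To see how this rule separates version directories from other entries, here is a small sketch. The link texts are made-up samples rather than the mirror's actual listing, and fullmatch is used so that the whole text has to fit the rule.

```python
import re

# Hypothetical link texts as they might appear in the listing (not real mirror data).
link_texts = ['2.46/', '70.0.3538.16/', 'icons/', 'LATEST_RELEASE', 'index.html']

version_dir = re.compile(r'\d.*?/')  # starts with a digit, ends with '/'

# Keep only the texts that match the rule from start to end.
versions = [text for text in link_texts if version_dir.fullmatch(text)]
print(versions)
```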

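2. Inspect the page source (the response) and choose how to process it

The response is plain HTML, so it cannot be handled as JSON. My first thought was to parse it with BeautifulSoup4.

In practice, though, the timestamp is not inside the a tag, so collecting all the a tags with bs4 does not yield the corresponding times. bs4 was therefore ruled out as well.

That leaves regular expressions for extracting the information, using the re library.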

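3. Processing the data

Each a tag, together with the time that follows it, looks like this:

<a href="/mirrors/chromedriver/70.0.3538.16/">70.0.3538.16/</a> 2018-09-17T20:50:43.843Z

Only the second half is needed: 70.0.3538.16/</a> 2018-09-17T20:50:43.843Z

The pattern is: it starts with a digit, has /</a> in the middle, and ends with Z. As a regex: \d.*?/</a>.*?Z (Strictly speaking, the match begins at the first digit on the line, which is inside the href, but this does not affect the two extraction steps that follow.)

In code: results = re.compile(r'\d.*?/</a>.*?Z').findall(html_content)

The version is then extracted from each match in a second pass: re.compile(r'.*?/').findall(result)[0]

The time could also be extracted with a regex, but since its format is fixed, a string slice is enough: result[-24:-1], i.e. the last 24 characters without the trailing Z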
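
The two extraction steps can be tried directly on the sample entry quoted above. This is only a sketch on that one line, and the variable names are illustrative.

```python
import re

# The sample entry quoted above.
line = '<a href="/mirrors/chromedriver/70.0.3538.16/">70.0.3538.16/</a> 2018-09-17T20:50:43.843Z'

match = re.compile(r'\d.*?/</a>.*?Z').findall(line)[0]
version = re.compile(r'.*?/').findall(match)[0]  # first run of characters up to a '/'
time = match[-24:-1]                             # last 24 characters minus the trailing 'Z'

print(match)
print(version)
print(time)
```

Printing match shows that it starts inside the href, yet the first '.*?/' hit is still the version and the slice still lands on the timestamp.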

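After that it is simple: find the newest time, then look up the version that corresponds to it.

The idea is to do two things while iterating over the matches:

collect all times into a list, time_list;
at the same time, build a dictionary from time to version, time_version_dict;

Finally, use the max function to find the largest value in time_list and use it as the key to look up the version in time_version_dict.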
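
The same lookup can also be written without a separate list and dictionary, by taking max over (time, version) pairs. This is an alternative sketch rather than the original code, and the entries are invented sample data. It relies on all timestamps sharing one fixed format, so that plain string comparison orders them chronologically.

```python
# Invented (time, version) pairs in the same format as the listing.
entries = [
    ('2018-09-17T20:50:43.843', '70.0.3538.16/'),
    ('2018-07-20T09:12:05.120', '2.41/'),
    ('2018-10-05T14:03:27.001', '2.43/'),
]

# Tuples compare element by element, so max picks the pair with the newest time.
latest_time, latest_version = max(entries)
print(latest_time, latest_version)
```

Because the newest time wins, the result is the most recently uploaded directory, which is not necessarily the one with the highest version number.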

PS: A to-do for later: add automatic extraction into a specified directory, so that the old chromedriver on the system path can be replaced directly.

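Update

Time to deliver on that to-do.

The directory and version of the current chromedriver can be queried from cmd with the commands "where chromedriver" and "chromedriver --version".

Code that extracts the archive over the chromedriver found on the system path: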

import os
import zipfile

outstd1 = os.popen('where chromedriver').read()     # find where chromedriver is installed (Windows 'where' command)
path = os.path.dirname(outstd1.splitlines()[0])     # directory of the first hit; str.strip() removes characters, not a file name

outstd2 = os.popen('chromedriver --version').read() # current chromedriver version; optional, only for comparison
version = outstd2.split(' ')[1]

with zipfile.ZipFile("chromedriver.zip", 'r') as f: # extract the zip in the current directory into path
    for file in f.namelist():
        f.extract(file, path)

print(os.popen('chromedriver --version').read())    # check whether the version changed
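
One detail in the path handling deserves a note. It is tempting to cut the file name off the "where" output with str.strip('chromedriver.exe\n'), but str.strip takes a set of characters to remove from both ends, not a suffix, so it can also eat leading characters of the path. The sketch below contrasts it with ntpath.dirname (the Windows flavour of os.path) on a made-up "where" output; driver_dir is a hypothetical helper name.

```python
import ntpath  # Windows path rules, usable on any platform for this demo


def driver_dir(where_output):
    """Return the directory of the first path printed by the 'where' command."""
    first_hit = where_output.splitlines()[0]
    return ntpath.dirname(first_hit)


# Made-up output of 'where chromedriver' with a lowercase drive letter.
sample = 'd:\\tools\\chromedriver.exe\n'

print(repr(sample.strip('chromedriver.exe\n')))  # the leading 'd' is stripped as well
print(repr(driver_dir(sample)))
```

With an uppercase drive letter the strip version happens to work, which is why the problem is easy to miss.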

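Merging the two pieces of code, adding a check for whether the installed driver is already the latest, and wrapping everything in functions: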

import os
import re
import zipfile

import requests

def get_latest_version(url):
    '''Return the latest chromedriver version listed on the mirror page.'''
    rep = requests.get(url).text
    time_list = []                                          # upload times of all versions
    time_version_dict = {}                                  # maps upload time -> version directory
    result = re.compile(r'\d.*?/</a>.*?Z').findall(rep)     # match each version directory together with its time
    for i in result:
        time = i[-24:-1]                                    # extract the time (the trailing 'Z' is dropped)
        version = re.compile(r'.*?/').findall(i)[0]         # extract the version, e.g. '70.0.3538.16/'
        time_version_dict[time] = version                   # record which version belongs to this time
        time_list.append(time)                              # collect the time
    latest_version = time_version_dict[max(time_list)][:-1] # newest time -> version, with the trailing '/' removed
    return latest_version

def download_driver(download_url):
    '''Download the zip file.'''
    file = requests.get(download_url)
    file.raise_for_status()                                 # stop here on an HTTP error instead of saving an error page
    with open("chromedriver.zip", 'wb') as zip_file:        # save into the current working directory
        zip_file.write(file.content)
        print('Download succeeded')

def get_version():
    '''Return the version of the chromedriver currently on the system path.'''
    outstd2 = os.popen('chromedriver --version').read()
    return outstd2.split(' ')[1]

def get_path():
    '''Return the directory of the chromedriver currently on the system path.'''
    outstd1 = os.popen('where chromedriver').read()
    return os.path.dirname(outstd1.splitlines()[0])         # str.strip() would remove characters, not the file name

def unzip_driver(path):
    '''Extract the chromedriver zip into the given directory.'''
    with zipfile.ZipFile("chromedriver.zip", 'r') as f:
        for file in f.namelist():
            f.extract(file, path)

if __name__ == "__main__":
    url = 'http://npm.taobao.org/mirrors/chromedriver/'
    latest_version = get_latest_version(url)
    print('Latest chromedriver version:', latest_version)
    version = get_version()
    print('chromedriver version currently on this system:', version)
    if version == latest_version:
        print('The chromedriver on this system is already the latest')
    else:
        print('The chromedriver on this system is not the latest and needs updating')
        download_url = url + latest_version + '/chromedriver_win32.zip'  # build the download link
        download_driver(download_url)
        path = get_path()
        print('Replacing the driver in:', path)
        unzip_driver(path)
        print('chromedriver version after the update:', get_version())
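
The check in this script treats the two versions as strings and only tests them for equality. If an ordering is ever needed, comparing the dotted numbers as integer tuples is a common approach. A minimal sketch follows; parse_version is a hypothetical helper that assumes purely numeric, dot-separated versions, and the version strings are sample values.

```python
def parse_version(version):
    """Turn a dotted version such as '70.0.3538.16' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split('.'))


current = '70.0.3538.16'
latest = '70.0.3538.97'

print(parse_version(current) < parse_version(latest))
# Plain string comparison orders '9.0' after '10.0'; the tuple form does not.
print('9.0' < '10.0', parse_version('9.0') < parse_version('10.0'))
```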

Run it as a test.

Done.
