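When writing Python crawlers, or whenever a form is submitted via POST, you often have to submit a captcha for verification. Apart from solving it by hand, paying for an API (a captcha-solving service), or recognizing it with deep learning, there are also OCR-based captcha recognition libraries that suit beginners. Simple captchas can be solved fully automatically, for example with the general-purpose captcha recognition library yours truly is sharing below: ddddocr (带带弟弟OCR)!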
- pip install ddddocr
For a faster download, install from a mirror in China:
- pip install ddddocr -i https://pypi.tuna.tsinghua.edu.cn/simple
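DdddOcr() parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| use_gpu | False | bool. Whether to use the GPU for inference; if False, device_id has no effect |
| device_id | 0 | int. CUDA device number; only a single GPU is currently supported |

classification() parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| img | 0 | bytes. The image as bytes |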
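As a rough sketch of how the two constructor parameters interact, the helper below only passes `device_id` along when `use_gpu` is on, since the parameter description above says it is ignored otherwise. `ocr_kwargs` and `build_ocr` are my own hypothetical names, not part of ddddocr. The sketch assumes ddddocr is installed, and GPU inference will additionally need a working CUDA setup.

```python
def ocr_kwargs(use_gpu=False, device_id=0):
    """Build keyword arguments for ddddocr.DdddOcr (hypothetical helper)."""
    kwargs = {"use_gpu": bool(use_gpu)}
    if use_gpu:
        # device_id only takes effect when use_gpu is True
        kwargs["device_id"] = int(device_id)
    return kwargs


def build_ocr(use_gpu=False, device_id=0):
    """Create a recognizer; ddddocr is imported lazily, only when this is called."""
    import ddddocr

    return ddddocr.DdddOcr(**ocr_kwargs(use_gpu, device_id))
```

Calling `build_ocr()` gives the default CPU recognizer, and `build_ocr(use_gpu=True, device_id=0)` asks for the first CUDA device.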
- import ddddocr
-
- # Create the recognizer once and reuse it
- ocr = ddddocr.DdddOcr()
- # Read the captcha image as raw bytes
- with open('code.png', 'rb') as f:
-     img_bytes = f.read()
- res = ocr.classification(img_bytes)
- print(res)
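Recognition output is not always clean, so it can help to sanity-check it before submitting. Here is a minimal sketch, assuming a captcha of four alphanumeric characters. Both the length and the character set are assumptions to adjust for your target site, and `clean_captcha_text` is a hypothetical helper, not part of ddddocr.

```python
import string

# Assumed character set: ASCII letters and digits
ALLOWED = set(string.ascii_letters + string.digits)


def clean_captcha_text(text, length=4):
    """Remove whitespace and return the text only if it looks like a valid captcha, else None."""
    cleaned = "".join(text.split())
    if len(cleaned) != length:
        return None
    if any(ch not in ALLOWED for ch in cleaned):
        return None
    return cleaned
```

With this in place, `clean_captcha_text(ocr.classification(img_bytes))` returns `None` whenever the result should be thrown away and a new captcha fetched.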
Page URL: https://www.feifeidm.com/Play/10919-0-0.html
Captcha URL: https://www.feifeidm.com/include/vdimgck.php?r=0.7145461007261535
Reference source code:
- import ddddocr
- import requests
-
- headers = {
-     "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
- }
- code_url = "https://www.feifeidm.com/include/vdimgck.php?r=0.7145461007261535"
- r = requests.get(url=code_url, headers=headers, timeout=5)
- # Save a copy of the captcha image to disk
- with open('code.png', 'wb') as f:
-     f.write(r.content)
- print("Captcha downloaded successfully!")
-
- ocr = ddddocr.DdddOcr()
-
- # Alternatively, read the saved file back from disk:
- # with open(r'C:\Users\Administrator\Desktop\验证码识别\code.png', 'rb') as f:
- #     img_bytes = f.read()
- img_bytes = r.content
-
- res = ocr.classification(img_bytes)
-
- print(res)
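One thing worth keeping in mind with the script above: a captcha is normally bound to the cookie session that requested it, so the image download and the later POST should go through the same `requests.Session`. The sketch below also generates a new `r` query parameter on each attempt, on the assumption that it is just a random cache-buster (I have not verified that for this site). `fresh_captcha_url` and `fetch_and_solve` are hypothetical names, and the recognizer is passed in as a plain callable so that any library can be plugged in.

```python
import random


def fresh_captcha_url(base_url):
    """Append a new random r value (assumed to be a cache-buster)."""
    return f"{base_url}?r={random.random()}"


def fetch_and_solve(session, base_url, recognize, max_tries=3, timeout=5):
    """Download a captcha through the given session and return the first
    non-empty recognition result, or None if every attempt comes back empty.

    session   -- object with a requests-style get() method, e.g. requests.Session()
    recognize -- callable taking image bytes and returning text
    """
    for _ in range(max_tries):
        resp = session.get(fresh_captcha_url(base_url), timeout=timeout)
        resp.raise_for_status()
        text = recognize(resp.content).strip()
        if text:
            return text
    return None


# Example wiring (not executed here; needs requests and ddddocr installed):
#   session = requests.Session()
#   session.headers.update(headers)
#   ocr = ddddocr.DdddOcr()
#   code = fetch_and_solve(session, "https://www.feifeidm.com/include/vdimgck.php",
#                          ocr.classification)
#   # then submit the form with the same session so the cookies match
```

Passing the session in, rather than calling `requests.get` directly, keeps the cookies consistent between fetching the image and submitting the answer.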
Page URL: http://fankui.help.sogou.com/index.php/web/web/index?type=2
Library installation (pytesseract):
- pip install pytesseract
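Library usage:
- import pytesseract
- from PIL import Image
-
- # pytesseract is only a wrapper; the Tesseract OCR engine itself must be installed separately
- text = pytesseract.image_to_string(Image.open("code.png"))
- print(text)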
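Tesseract is tuned for documents rather than captchas, so it usually helps to tell it that the image is one short line and to restrict the character set. The sketch below builds such a config string. `--psm 7` (treat the image as a single text line) and `tessedit_char_whitelist` are standard Tesseract options, although how well the whitelist is honored can depend on the Tesseract version. `captcha_config` and `read_captcha` are hypothetical helper names, and the digits-only whitelist is just an assumption for illustration.

```python
def captcha_config(whitelist="0123456789", psm=7):
    """Build a Tesseract config string for a single-line captcha."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_captcha(path, whitelist="0123456789"):
    """Run pytesseract on an image file.

    Needs pytesseract, Pillow and the Tesseract engine; they are imported
    lazily so captcha_config() can be used without them.
    """
    import pytesseract
    from PIL import Image

    # Grayscale conversion is a cheap preprocessing step
    image = Image.open(path).convert("L")
    text = pytesseract.image_to_string(image, config=captcha_config(whitelist))
    return text.strip()
```

For a captcha with letters as well, pass a wider whitelist, for example `read_captcha("code.png", whitelist="abcdefghijklmnopqrstuvwxyz0123456789")`.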
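Library installation (PaddleOCR): three packages need to be installed in order, using the commands below. The shapely package may fail to install on some systems.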
- pip install paddlepaddle
- pip install shapely
- pip install paddleocr
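Library usage:
- from paddleocr import PaddleOCR, draw_ocr
- from PIL import Image
-
- # use_angle_cls enables the text-angle classifier; lang="ch" selects the Chinese model
- ocr = PaddleOCR(use_angle_cls=True, lang="ch")
-
- # Path of the image to recognize
- img_path = r"code.png"
-
- # Run detection and recognition
- # Note: some newer PaddleOCR releases nest the output per image, in which case use result[0] below
- result = ocr.ocr(img_path, cls=True)
- for line in result:
-     print(line)
-
- # Draw the boxes, text and scores onto the image and display it
- image = Image.open(img_path).convert('RGB')
- boxes = [line[0] for line in result]
- txts = [line[1][0] for line in result]
- scores = [line[1][1] for line in result]
- im_show = draw_ocr(image, boxes, txts, scores)
- im_show = Image.fromarray(im_show)
- im_show.show()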
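For a captcha you usually only want the text, not the boxes. Here is a minimal sketch, assuming each entry has the `[box, (text, score)]` shape that the list comprehensions above rely on. `extract_text` and the 0.5 threshold are my own choices, and the sample data is made up for illustration rather than real PaddleOCR output.

```python
def extract_text(lines, min_score=0.5):
    """Join recognized text from entries shaped like [box, (text, score)],
    skipping entries whose score is below min_score."""
    parts = []
    for _box, (text, score) in lines:
        if score >= min_score:
            parts.append(text)
    return "".join(parts)


# Made-up sample in the assumed shape, not real PaddleOCR output
sample = [
    [[[0, 0], [40, 0], [40, 20], [0, 20]], ("ab", 0.98)],
    [[[42, 0], [80, 0], [80, 20], [42, 20]], ("??", 0.21)],
    [[[82, 0], [120, 0], [120, 20], [82, 20]], ("12", 0.91)],
]
print(extract_text(sample))  # prints ab12
```

Dropping low-score fragments is a blunt filter; for short captchas it may be safer to reject the whole result and fetch a new image instead.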
Library installation (EasyOCR):
- pip install easyocr
Library usage:
- import easyocr
-
- # Recognize both Simplified Chinese and English
- reader = easyocr.Reader(['ch_sim','en'], gpu = False) # need to run only once to load model into memory
- # detail=0 returns only the recognized text strings
- result = reader.readtext(r"code.png", detail = 0)
- print(result)
Library installation (muggle_ocr):
- pip install muggle_ocr
Library usage:
- import muggle_ocr
-
- # Initialize the SDK; model_type offers two modes, ModelType.OCR and ModelType.Captcha, for ordinary images and captchas respectively
- sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.Captcha)
-
- with open(r"code.png", "rb") as f:
-     img = f.read()
-
- text = sdk.predict(image_bytes=img)
- print(text)
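Since every library above ultimately maps an image to a string, one simple way to combine them is a majority vote. This is only a sketch of that idea, not something any of the libraries provide: `vote` is a hypothetical helper, and the recognizers are passed in as plain callables that take image bytes, for example `ocr.classification` from ddddocr or `lambda b: sdk.predict(image_bytes=b)` from muggle_ocr.

```python
from collections import Counter


def vote(image_bytes, recognizers):
    """Run each recognizer on the same image and return the most common non-empty answer.

    Returns None when every recognizer fails or returns an empty string.
    """
    answers = []
    for recognize in recognizers:
        try:
            text = recognize(image_bytes).strip()
        except Exception:
            # A failing backend should not stop the others
            continue
        if text:
            answers.append(text)
    if not answers:
        return None
    # On a tie the answer seen first wins, so list the most trusted recognizer first
    return Counter(answers).most_common(1)[0][0]
```

Loading several models costs memory and startup time, so this only pays off when a wrong submission is expensive, for instance when the site locks you out after a few failed attempts.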