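First, look at the boxed code in this screenshot:
The text is clearly obfuscated with a custom font, so next we crack the font obfuscation. This is a static scheme: the code points in the font file do not change when the font is reloaded: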
import re
import requests
import base64
from fontTools.ttLib import TTFont
url = 'http://wojiance.b2b.huangye88.com/company_contact.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36'
}
page_text = requests.get(url=url,headers=headers).text
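At this point we find that the font is embedded in the page as base64 data, so we extract it with a regular expression:
ba64 = re.findall(r'base64,(.*?)"\)', page_text)[0]
We decode it with the b64decode function of the base64 module: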
str_data = base64.b64decode(ba64)
with open('2.woff','wb') as f:
    f.write(str_data)
and save the decoded bytes to a file with the .woff extension.
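The extraction and decoding steps above can also be sketched as one small helper that keeps the font in memory. This is only an illustration: the name extract_font_bytes and the stricter pattern (which also accepts single quotes or no quotes around the data URI) are assumptions, not part of the original script, and the sample CSS string is made up.

```python
import base64
import re

# Hypothetical pattern: matches url("data:...;base64,<payload>") and also
# tolerates single quotes or no quotes around the data URI.
FONT_DATA_URI = re.compile(r'base64,([A-Za-z0-9+/=]+)["\']?\)')


def extract_font_bytes(page_text):
    """Return the decoded bytes of the first embedded base64 font, or None."""
    match = FONT_DATA_URI.search(page_text)
    if match is None:
        return None
    return base64.b64decode(match.group(1))


# Made-up sample: the payload is just the 4-byte WOFF signature.
sample_css = '@font-face{src:url("data:application/font-woff;base64,d09GRg==")}'
font_bytes = extract_font_bytes(sample_css)
```

Returning None instead of indexing [0] avoids an IndexError when a page has no embedded font.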
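The fontTools library provides the TTFont class for parsing font files. The binary font file is not human-readable, so we dump it to XML, which is much easier to inspect: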
font = TTFont('2.woff')
font.saveXML('2.xml')
font_map = font['cmap'].getBestCmap()
The idea behind cracking font obfuscation is to build a mapping and substitute the values.
Let's build the mapping:
new_font_dict = {}
for key in font_map:
    if len(str(key)) > 2:
        new_font_dict[hex(key)] = font_map[key]
print(new_font_dict)
font_dict = {'zero':0, 'one':1,'two':2, 'three':3, 'four':4, 'five':5, 'six':6, 'seven':7, 'eight':8, 'nine':9}
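Why create new_font_dict above? Because the parsed cmap also contains entries that are of no use to us, so we drop them: the filter keeps only code points whose decimal form has more than two digits.
new_font_dict looks like this: {'0x8826f': 'zero', '0x88270': 'one', '0x88271': 'two', '0x88272': 'three', '0x88273': 'four', '0x88274': 'five', '0x88275': 'six', '0x88276': 'seven', '0x88277': 'eight', '0x88278': 'nine'}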
for key in new_font_dict:
    new_font_dict[key] = font_dict[new_font_dict[key]]
print(new_font_dict)
This step replaces each value with its digit: the key is the font code point and the value is the digit it stands for.
new_font_dict: {'0x8826f': 0, '0x88270': 1, '0x88271': 2, '0x88272': 3, '0x88273': 4, '0x88274': 5, '0x88275': 6, '0x88276': 7, '0x88277': 8, '0x88278': 9}
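The two loops above can be folded into one pass. The sketch below filters by glyph name rather than by the length of the code point, which is a swapped-in technique and not what the original code does. The helper name build_entity_map is hypothetical, and the sample cmap (including the extra 'x' entry) is made up to resemble what getBestCmap() returns.

```python
DIGIT_NAMES = {'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4,
               'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}


def build_entity_map(cmap):
    """Map hex HTML entities such as '&#x8826f;' to digit strings."""
    entity_map = {}
    for code_point, glyph_name in cmap.items():
        # Skip every glyph that is not named after a digit.
        if glyph_name in DIGIT_NAMES:
            entity = '&#x{:x};'.format(code_point)
            entity_map[entity] = str(DIGIT_NAMES[glyph_name])
    return entity_map


# Made-up cmap in the {code point: glyph name} shape of getBestCmap().
sample_cmap = {0x78: 'x', 0x8826f: 'zero', 0x88270: 'one', 0x88278: 'nine'}
entity_map = build_entity_map(sample_cmap)
```

Filtering by name means an unrelated glyph with a large code point cannot raise a KeyError in the digit lookup.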
for key in new_font_dict:
    page_text = page_text.replace('&#'+str(key)[1:]+';',str(new_font_dict[key]))
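Here we iterate over the mapping dictionary new_font_dict,
which looks like this: {'0x8826f': 0, '0x88270': 1, '0x88271': 2, '0x88272': 3, '0x88273': 4, '0x88274': 5, '0x88275': 6, '0x88276': 7, '0x88277': 8, '0x88278': 9}
For each key we replace the leading 0 with &# and append ;, which turns '0x8826f' into the HTML entity &#x8826f;, and then replace every occurrence of that entity in the page with the digit.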
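The replacement loop scans the page once per key. As an alternative sketch, a single regular-expression pass can handle both hexadecimal and decimal numeric entities. The function name decode_digits, the integer-keyed mapping, and the sample HTML are assumptions for illustration; whether the site ever emits decimal entities is not stated in the original.

```python
import re

# Numeric character references: &#x8826f; (hex) or &#557679; (decimal).
ENTITY = re.compile(r'&#([xX][0-9a-fA-F]+|[0-9]+);')


def decode_digits(page_text, code_point_to_digit):
    """Replace mapped numeric entities with digits, leaving others intact."""
    def substitute(match):
        raw = match.group(1)
        code_point = int(raw[1:], 16) if raw[0] in 'xX' else int(raw)
        digit = code_point_to_digit.get(code_point)
        # Entities outside the font mapping are returned unchanged.
        return match.group(0) if digit is None else str(digit)
    return ENTITY.sub(substitute, page_text)


# Made-up mapping and HTML fragment.
mapping = {0x8826f: 0, 0x88270: 1, 0x88271: 2}
html = 'Tel: &#x88270;&#x8826f;&#x88271; &amp; &#38;'
decoded = decode_digits(html, mapping)
```

Keying the mapping by integer code point avoids depending on how the entity happens to be written in the page source.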
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @software : PyCharm
# @file : 黄页.py
# @Time : 2021/6/16 8:44
import re
import requests
import base64
from fontTools.ttLib import TTFont
url = 'http://wojiance.b2b.huangye88.com/company_contact.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36'
}
page_text = requests.get(url=url, headers=headers).text
# Extract the base64 payload of the embedded font.
ba64 = re.findall(r'base64,(.*?)"\)', page_text)[0]
print(ba64)
# Decode the payload and save it as a .woff file.
str_data = base64.b64decode(ba64)
with open('2.woff', 'wb') as f:
    f.write(str_data)
# Parse the font, dump it to XML for inspection, and read its cmap.
font = TTFont('2.woff')
font.saveXML('2.xml')
font_map = font['cmap'].getBestCmap()
# Keep only the code points we need, keyed by their hex form.
new_font_dict = {}
for key in font_map:
    if len(str(key)) > 2:
        new_font_dict[hex(key)] = font_map[key]
print(new_font_dict)
font_dict = {'zero':0, 'one':1,'two':2, 'three':3, 'four':4, 'five':5, 'six':6, 'seven':7, 'eight':8, 'nine':9}
# Replace glyph names with the digits they stand for.
for key in new_font_dict:
    new_font_dict[key] = font_dict[new_font_dict[key]]
print(new_font_dict)
# '0x8826f'[1:] is 'x8826f', so this replaces the entity &#x8826f; with its digit.
for key in new_font_dict:
    page_text = page_text.replace('&#'+str(key)[1:]+';',str(new_font_dict[key]))
print(page_text)
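Because the scheme is static, the mapping only needs to be built once per distinct font file. One possible way to exploit that, sketched here under assumptions, is to cache the mapping by a hash of the decoded font bytes. The names mapping_for_font and build_mapping are hypothetical, and the lambda stands in for the real cmap parsing done with TTFont.

```python
import hashlib

# Cache of mappings keyed by the SHA-256 digest of the font bytes.
_mapping_cache = {}


def mapping_for_font(font_bytes, build_mapping):
    """Return the mapping for these font bytes, building it at most once."""
    fingerprint = hashlib.sha256(font_bytes).hexdigest()
    if fingerprint not in _mapping_cache:
        _mapping_cache[fingerprint] = build_mapping(font_bytes)
    return _mapping_cache[fingerprint]


# Placeholder builder: a real one would parse the bytes with TTFont.
mapping = mapping_for_font(b'example font bytes', lambda data: {'size': len(data)})
```

If the site ever switches to a different font file, the digest changes and the mapping is rebuilt for the new file instead of being silently reused.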