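First, take a look at the boxed code in this screenshot:
Clearly this text is protected by font encryption, so next we crack it. The encryption here is static: reloading the font file does not change the character codes inside it.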
```python
import base64
import re

import requests
from fontTools.ttLib import TTFont

url = 'http://wojiance.b2b.huangye88.com/company_contact.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36'
}
# Fetch the raw HTML; the digits in it are rendered through an embedded custom font
page_text = requests.get(url=url, headers=headers).text
```
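Because the approach relies on the font being static, it can be worth checking that before caching a mapping. A minimal sketch, assuming you have captured the base64 font string (extracted as described next) from two or more separate page loads: hash the decoded bytes and compare. Identical hashes mean the font file, and therefore its cmap, did not change; differing hashes only say the file changed, so the mapping would need to be re-read. The helpers `font_fingerprint` and `is_static` and the sample captures are hypothetical.

```python
import base64
import hashlib
from typing import Iterable


def font_fingerprint(b64_font: str) -> str:
    """SHA-256 hex digest of the decoded font bytes."""
    return hashlib.sha256(base64.b64decode(b64_font)).hexdigest()


def is_static(b64_fonts: Iterable[str]) -> bool:
    """True if every captured font decodes to exactly the same bytes."""
    digests = {font_fingerprint(f) for f in b64_fonts}
    return len(digests) == 1


# Hypothetical captures: in practice these would be the base64 strings
# extracted from two separate requests to the page.
first = base64.b64encode(b"wOFF-demo-font").decode()
second = base64.b64encode(b"wOFF-demo-font").decode()
print(is_static([first, second]))  # True
```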
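Looking at the page source, we find the font is embedded inline as base64 (an encoding, not real encryption), so we extract it with a regular expression: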
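```python
# Raw string, so "\)" reaches the regex engine instead of being an invalid string escape
ba64 = re.findall(r'base64,(.*?)"\)', page_text)[0]
```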
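To see what that pattern captures, here is a sketch run against a made-up `@font-face` rule (the `SAMPLE_HTML` string and the `extract_font_b64` helper are hypothetical; the real page's CSS may be shaped differently). Using `re.search` with a `None` check fails with a clear message if the page layout changes, instead of the bare `IndexError` that `findall(...)[0]` raises.

```python
import re

# Hypothetical inline @font-face rule in the same data-URI shape.
SAMPLE_HTML = (
    '<style>@font-face{font-family:"secret";'
    'src:url("data:application/font-woff;charset=utf-8;base64,d09GRgABAAA=") '
    'format("woff");}</style>'
)

# Non-greedy capture of everything between "base64," and the closing '")'.
FONT_RE = re.compile(r'base64,(.*?)"\)')


def extract_font_b64(html_text: str) -> str:
    match = FONT_RE.search(html_text)
    if match is None:
        raise ValueError("no base64 font data URI found")
    return match.group(1)


print(extract_font_b64(SAMPLE_HTML))  # d09GRgABAAA=
```

The captured sample string decodes to the 8 bytes `b'wOFF\x00\x01\x00\x00'`, the start of a WOFF header.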
Then we decode it with the `b64decode` function from the `base64` module:
```python
str_data = base64.b64decode(ba64)  # raw font bytes
with open('2.woff', 'wb') as f:
    f.write(str_data)
```
and save the decoded bytes to a file with the .woff extension.
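Before parsing, it can help to confirm that the decoded bytes really are WOFF, since the same trick shows up with WOFF2, TTF or OTF payloads. A sketch that checks the 4-byte signature at the start of the file (the `sniff_font_format` helper is hypothetical; the signatures are the standard ones for each container). Note that fontTools can only read WOFF2 when the `brotli` package is installed.

```python
import base64

# Standard 4-byte signatures at the start of common font files.
FONT_SIGNATURES = {
    b"wOFF": "woff",
    b"wOF2": "woff2",
    b"\x00\x01\x00\x00": "ttf",
    b"true": "ttf",
    b"OTTO": "otf",
}


def sniff_font_format(data: bytes) -> str:
    """Guess the font container from its magic number, or 'unknown'."""
    return FONT_SIGNATURES.get(data[:4], "unknown")


font_bytes = base64.b64decode("d09GRgABAAA=")  # hypothetical 8-byte sample
print(sniff_font_format(font_bytes))  # woff
```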
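Python's fontTools library provides the `TTFont` class for parsing fonts. The binary font file is not human-readable, so we dump it to XML, which is much easier to inspect: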
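```python
font = TTFont('2.woff')
font.saveXML('2.xml')  # dump every table to XML for manual inspection
# {code point (int): glyph name}, taken from the best available Unicode cmap subtable
font_map = font['cmap'].getBestCmap()
```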
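`getBestCmap()` returns a plain dict of `{code point: glyph name}`. The same information appears in the dumped `2.xml` as `<map code="..." name="..."/>` entries under `<cmap>`. As a sketch of reading it from the XML instead (handy if you only kept the dump), here is a parser run on a trimmed, hypothetical excerpt; the attribute values in the excerpt are placeholders, not copied from the real font.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical, trimmed excerpt in the shape of a fontTools XML dump.
SAMPLE_XML = """<ttFont>
  <cmap>
    <tableVersion version="0"/>
    <cmap_format_12 platformID="3" platEncID="10" format="12" reserved="0" length="0" language="0" nGroups="0">
      <map code="0x20" name="space"/>
      <map code="0x8826f" name="zero"/>
      <map code="0x88270" name="one"/>
    </cmap_format_12>
  </cmap>
</ttFont>"""


def cmap_from_xml(xml_text: str) -> Dict[int, str]:
    root = ET.fromstring(xml_text)
    cmap = root.find("cmap")
    if cmap is None:
        raise ValueError("no <cmap> table in XML")
    mapping: Dict[int, str] = {}
    # Each cmap subtable lists its entries as <map code="0x..." name="..."/>
    for node in cmap.iter("map"):
        mapping[int(node.get("code"), 16)] = node.get("name")
    return mapping


print(cmap_from_xml(SAMPLE_XML))
```

For the real dump, `ET.parse('2.xml').getroot()` gives the root element to search in the same way.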
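The core idea behind cracking font encryption is to build a mapping from each encrypted code point to its real value, then substitute the values.
Let's build the mapping: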
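```python
new_font_dict = {}
for key in font_map:
    # Keep only code points >= 100 (three or more decimal digits);
    # in this font that leaves just the ten encrypted digit glyphs
    if len(str(key)) > 2:
        new_font_dict[hex(key)] = font_map[key]
print(new_font_dict)

# Glyph name -> the digit it represents
font_dict = {'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4,
             'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}
```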
Why create a separate new_font_dict? Because the parsed cmap also contains entries that are useless to us, so we drop them.
new_font_dict looks like this: {'0x8826f': 'zero', '0x88270': 'one', '0x88271': 'two', '0x88272': 'three', '0x88273': 'four', '0x88274': 'five', '0x88275': 'six', '0x88276': 'seven', '0x88277': 'eight', '0x88278': 'nine'}
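The `len(str(key)) > 2` test works here because the font's only high code points happen to be the ten digits. A sketch of a sturdier filter, assuming (as the output above shows) that the digit glyphs are named `zero` through `nine`: keep an entry only if its glyph name is a known digit name, and map it to the digit in the same step. Keys stay as integer code points; `build_digit_map` and `sample_cmap` are hypothetical names.

```python
from typing import Dict

DIGIT_NAMES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def build_digit_map(font_map: Dict[int, str]) -> Dict[int, int]:
    """Map each code point whose glyph is a digit name to that digit."""
    return {cp: DIGIT_NAMES[name]
            for cp, name in font_map.items()
            if name in DIGIT_NAMES}


# Hypothetical cmap in the shape getBestCmap() returns: {code point: glyph name}.
sample_cmap = {0x20: "space", 0x8826F: "zero", 0x88270: "one", 0x88278: "nine"}
print(build_digit_map(sample_cmap))
```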
```python
# Swap each glyph name for its digit, e.g. {'0x8826f': 'zero'} -> {'0x8826f': 0}
for key in new_font_dict:
    new_font_dict[key] = font_dict[new_font_dict[key]]
print(new_font_dict)
```
This step swaps in the values: each key is a glyph's code point, and each value is the digit it stands for.
new_font_dict is now: {'0x8826f': 0, '0x88270': 1, '0x88271': 2, '0x88272': 3, '0x88273': 4, '0x88274': 5, '0x88275': 6, '0x88276': 7, '0x88277': 8, '0x88278': 9}
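The string-replace approach works on raw HTML, where the digits appear as entities such as `&#x8826f;`. If the text has already gone through an HTML parser or `html.unescape`, each entity has become the actual character with that code point, and the entity strings no longer occur. A sketch for that case, assuming a digit map keyed by integer code point (`digit_map` here is a hypothetical three-entry example), uses `str.translate`:

```python
import html

# Hypothetical digit map keyed by integer code point.
digit_map = {0x8826F: 0, 0x88270: 1, 0x88271: 2}

# str.translate wants {ordinal: replacement string}.
table = {cp: str(d) for cp, d in digit_map.items()}

# After unescaping, the entities are real (unassigned) code points.
decoded = html.unescape("Tel: &#x88270;&#x8826f;&#x88271;")
print(decoded.translate(table))  # Tel: 102
```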
```python
for key in new_font_dict:
    # '0x8826f'[1:] -> 'x8826f', giving the HTML entity '&#x8826f;'
    page_text = page_text.replace('&#' + key[1:] + ';', str(new_font_dict[key]))
```
We iterate over the mapping dictionary new_font_dict built above. Each key such as '0x8826f' becomes the HTML entity used in the page by replacing its leading 0 with &# and appending a semicolon ('&#x8826f;'); we then replace every occurrence of that entity with its digit.
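The loop runs one full `str.replace` per digit and only matches entities written exactly as `hex()` prints them (lowercase hex). Whether the page ever uses uppercase hex or decimal references is not known; if it might, a single `re.sub` pass with a callback handles all three forms and leaves unrelated entities alone. A sketch, with `DIGIT_MAP` and `decode_entities` as hypothetical names and integer code points as keys:

```python
import re
from typing import Dict

# Hypothetical digit map keyed by integer code point.
DIGIT_MAP = {0x8826F: 0, 0x88270: 1, 0x88271: 2}

# Numeric character references: hex (&#x8826f; / &#X8826F;) or decimal (&#557679;).
ENTITY_RE = re.compile(r"&#([xX][0-9a-fA-F]+|[0-9]+);")


def decode_entities(html_text: str, digit_map: Dict[int, int]) -> str:
    def repl(match):
        ref = match.group(1)
        code_point = int(ref[1:], 16) if ref[0] in "xX" else int(ref)
        digit = digit_map.get(code_point)
        # Leave references that are not encrypted digits untouched.
        return match.group(0) if digit is None else str(digit)

    return ENTITY_RE.sub(repl, html_text)


print(decode_entities("<span>&#x88270;&#X8826F;&#557681;</span> &#169;", DIGIT_MAP))
# <span>102</span> &#169;
```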
The complete script:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @software : PyCharm
# @file : 黄页.py
# @Time : 2021/6/16 8:44
import base64
import re

import requests
from fontTools.ttLib import TTFont

url = 'http://wojiance.b2b.huangye88.com/company_contact.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36'
}
page_text = requests.get(url=url, headers=headers).text

# 1. Extract the base64-embedded font, decode it and save it
ba64 = re.findall(r'base64,(.*?)"\)', page_text)[0]
print(ba64)
str_data = base64.b64decode(ba64)
with open('2.woff', 'wb') as f:
    f.write(str_data)

# 2. Parse the font; font_map is {code point: glyph name}
font = TTFont('2.woff')
font.saveXML('2.xml')
font_map = font['cmap'].getBestCmap()

# 3. Keep only the encrypted code points (>= 100), keyed by hex string
new_font_dict = {}
for key in font_map:
    if len(str(key)) > 2:
        new_font_dict[hex(key)] = font_map[key]
print(new_font_dict)

# 4. Replace each glyph name with the digit it represents
font_dict = {'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4,
             'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}
for key in new_font_dict:
    new_font_dict[key] = font_dict[new_font_dict[key]]
print(new_font_dict)

# 5. Substitute each entity (e.g. '&#x8826f;') with its digit
for key in new_font_dict:
    page_text = page_text.replace('&#' + key[1:] + ';', str(new_font_dict[key]))
print(page_text)
```