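Take a personal grades web page as an example:
Right-click the page and view its source:
Right-click and save the page as a standalone HTML file, then read and process it in code: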
- import re
-
- # Read the saved page; the with-block closes the file automatically
- with open("GP.html", "r", encoding="utf-8") as f:
-     html = f.read()
-
- # Capture everything between each <table ...> and </table>
- tables = re.findall(r'<table(.*?)</table>', html, re.S)
-
- def extract_cells(table):
-     # Strip all whitespace (tabs, newlines, spaces); as a side effect,
-     # <td class="center"> becomes <tdclass="center">
-     table = re.sub(r'\s+', '', table)
-     # All grade-related values sit in the <td class="center"> cells
-     return re.findall(r'<tdclass="center">(.*?)</td>', table, re.S)
-
- # The first two tables hold the grade information
- td0 = extract_cells(tables[0])
- print("Major course information:\n", td0)
- td1 = extract_cells(tables[1])
- print("Elective course information:\n", td1)
- print("First value of the elective course information:\n", td1[0])
Result:
To compute a GPA, convert each grade string to its corresponding numeric value and then do the calculation.
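The GPA step could look roughly like the sketch below. It is only an illustration under several assumptions that the original page does not confirm: the grade scale in `GRADE_POINTS`, the helper names `rows_from_cells` and `weighted_gpa`, and the three-column layout (course name, credits, grade) are all hypothetical, and the sample data is made up. Check the real column order in your own `td0` / `td1` lists and adjust the indices accordingly.

```python
# Hypothetical letter-grade scale; substitute your school's official table.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def rows_from_cells(cells, columns):
    """Split a flat cell list (such as td0) into rows of `columns` cells."""
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


def weighted_gpa(rows, credit_index, grade_index):
    """Return the credit-weighted GPA; rows with an unknown grade are skipped."""
    total_points = 0.0
    total_credits = 0.0
    for row in rows:
        grade = row[grade_index]
        if grade not in GRADE_POINTS:
            continue  # e.g. pass/fail courses or missing grades
        credits = float(row[credit_index])
        total_points += GRADE_POINTS[grade] * credits
        total_credits += credits
    # Avoid dividing by zero when no row carried a recognized grade
    return total_points / total_credits if total_credits else 0.0


# Made-up sample in the assumed layout: course name, credits, grade.
sample_cells = ["Calculus", "4", "A", "Physics", "3", "B", "Seminar", "1", "P"]
rows = rows_from_cells(sample_cells, columns=3)
print(weighted_gpa(rows, credit_index=1, grade_index=2))
```

Weighting by credits is one common convention; if your school uses a plain average or a percentage-based scale, only `GRADE_POINTS` and the arithmetic inside `weighted_gpa` need to change. The sketch also assumes the cell count is an exact multiple of `columns`, so a trailing partial row would need to be dropped first.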