Batch-converting the encoding and replacing a string in all files of a given type in a folder (including subfolders)

Date: 10-15


For example, the folder G:\cdsy\work\gsh contains subfolders, and the whole gsh tree holds 18563 html files in total.

The html files are encoded in gb2312, and each one contains the string <META http-equiv="Content-Type" content="text/html; charset=gb2312">

The goal is to convert every html file to utf-8 and, at the same time, replace the string <META http-equiv="Content-Type" content="text/html; charset=gb2312"> with <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
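Before touching 18563 files, it may help to see the transformation on a single page held in memory. The following is a minimal sketch, not the script itself: the function name `convert_page`, the constants `OLD_META` and `NEW_META`, and the sample page are all made up for illustration, and it assumes the input really is gb2312.

```python
OLD_META = b'<META http-equiv="Content-Type" content="text/html; charset=gb2312">'
NEW_META = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'


def convert_page(raw: bytes, src_encoding: str = "gb2312") -> bytes:
    """Re-encode one page to UTF-8 and update its charset declaration."""
    text = raw.decode(src_encoding)   # bytes -> str, using the old encoding
    utf8 = text.encode("utf-8")       # str -> bytes, in the new encoding
    # The meta tag is pure ASCII, so its bytes are identical in both encodings
    # and a plain byte-level replace is enough.
    return utf8.replace(OLD_META, NEW_META)


# Hypothetical sample page, built in memory for illustration only.
sample = ('<html><head><META http-equiv="Content-Type" '
          'content="text/html; charset=gb2312"></head>'
          '<body>中文</body></html>').encode("gb2312")
converted = convert_page(sample)
```

Each Chinese character takes 2 bytes in gb2312 and 3 bytes in utf-8, so a converted page is normally larger than the original.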

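The code below does this; the whole run took about 13 minutes. It was written in a hurry, so if anything is unclear, please post in the programming section of the community forum at bbs.cdsy.xyz.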

import os

# Third-party package: pip install chardet
import chardet
from chardet.universaldetector import UniversalDetector

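# Root folder to process (raw string, so the backslashes need no escaping)
wpath = r'G:\cdsy\work\gsh'

# Non-recursive variant of the file search, disabled by wrapping it in a string literal
'''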
def get_file_all(path, filetype):
    files = []
    for file in os.listdir(path):
        if file.endswith(filetype):
            temp_path = os.path.join(path, file)
            files.append(temp_path)
    return files
'''
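def getFilst(path, filetype):
    """Return the paths of all files under path (subfolders included) whose names end with filetype."""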
    filelist = []
    for root, subDirs, files in os.walk(path):
        for fileName in files:
            if fileName.endswith(filetype):
                filelist.append(os.path.join(root, fileName))
    return filelist

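def getEncoding(file):
    """Detect a file's encoding incrementally and return chardet's result dict.

    Not called by the loop below, which uses chardet.detect() on the whole file.
    """
    detector = UniversalDetector()
    with open(file, 'rb') as f:
        # Feed line by line and stop as soon as the detector is confident
        for line in f:
            detector.feed(line)
            if detector.done:
                break
    detector.close()
    return detector.result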

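count = 0
old_meta = b'<META http-equiv="Content-Type" content="text/html; charset=gb2312">'
new_meta = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
files = getFilst(wpath, '.html')
for file in files:
    with open(file, 'rb+') as p:
        content = p.read()
        # chardet reports None when it cannot guess the encoding; skip such files
        encoding = chardet.detect(content)['encoding']
        if encoding is None:
            print('Skipped (encoding not detected): ' + file)
            continue
        # Re-encode as utf-8, then update the charset declaration
        content = content.decode(encoding).encode('utf8')
        content = content.replace(old_meta, new_meta)
        p.seek(0)
        p.write(content)
        # Drop leftover bytes in case the new content is shorter than the old
        p.truncate()
    count = count + 1

print('Finished ' + str(count) + ' files')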
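Since the source encoding is already known here, detection with chardet on every file can be skipped, which may account for a good part of the 13 minutes. The following is a sketch under that assumption, not a drop-in replacement: `convert_tree`, `OLD_META` and `NEW_META` are hypothetical names, and it assumes every matching file is still in gb2312 (it decodes with gb18030, a superset of gb2312 and gbk). Files that fail to decode are left untouched and reported.

```python
import os

OLD_META = b'<META http-equiv="Content-Type" content="text/html; charset=gb2312">'
NEW_META = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'


def convert_tree(root, suffix=".html", src_encoding="gb18030"):
    """Convert every matching file under root to UTF-8.

    Returns (number_converted, list_of_skipped_paths).
    """
    converted = 0
    skipped = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if not name.endswith(suffix):
                continue
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                raw = fh.read()
            try:
                text = raw.decode(src_encoding)
            except UnicodeDecodeError:
                skipped.append(path)  # leave undecodable files as they are
                continue
            data = text.encode("utf-8").replace(OLD_META, NEW_META)
            # Mode "wb" truncates the file, so no stale bytes can remain
            with open(path, "wb") as fh:
                fh.write(data)
            converted += 1
    return converted, skipped
```

A call would look like `converted, skipped = convert_tree(r'G:\cdsy\work\gsh')`. Running it a second time on files that are already utf-8 would decode them with the wrong encoding, so work on a backup copy of the folder.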