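Basic function: separate the Chinese and English text of bilingual subtitles so that each language sits on its own single line.
This avoids the awkward case where srtedit, after merging certain subtitles, leaves the Chinese and the English each spanning more than one line, so the final subtitle takes up 4 lines.
Basic example:
Source file: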
- 1
- 00:01:06,107 --> 00:01:07,483
- This is a test
- 这是一个测试
-
- 2
- 00:01:12,906 --> 00:01:16,450
- 测试!
- 快测试呀
-
- 3
- 00:01:18,703 --> 00:01:19,953
- 测试好了没有?
- 我问你好了没有?
- Is the test OK?
- What's the result?
After processing:
- 1
- 00:01:06,107 --> 00:01:07,483
- 这是一个测试
- This is a test
-
- 2
- 00:01:12,906 --> 00:01:16,450
- 测试! 快测试呀
-
- 3
- 00:01:18,703 --> 00:01:19,953
- 测试好了没有? 我问你好了没有?
- Is the test OK? What's the result?
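The transformation shown above can be sketched for a single cue as follows. This is a minimal illustration rather than the script itself: the function name `split_cue` is hypothetical, and it assumes Python 3.7+ (for `str.isascii`) and that any line containing a non-ASCII character is Chinese.

```python
def split_cue(block):
    """Return one subtitle cue with its Chinese and English text on one line each.

    `block` is the text of a single cue: number, timestamp, then text lines.
    split_cue is a hypothetical helper name, not part of the original script.
    """
    lines = block.split('\n')
    header, body = lines[:2], lines[2:]
    # Assumption: a line with any non-ASCII character is Chinese.
    chi = [x for x in body if not x.isascii()]
    eng = [x for x in body if x.isascii()]
    result = header[:]
    if chi:
        result.append(' '.join(chi))
    if eng:
        result.append(' '.join(eng))
    return '\n'.join(result)


# Cue 3 from the source file above.
cue = '\n'.join([
    '3',
    '00:01:18,703 --> 00:01:19,953',
    '测试好了没有?',
    '我问你好了没有?',
    'Is the test OK?',
    "What's the result?",
])
print(split_cue(cue))
```

Chinese lines are emitted before English lines regardless of their order in the input, which is why cue 1 comes out with its two lines swapped.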
The basic functionality works; a few details still have minor glitches…
Here is the complete code:
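- # coding: utf-8
- # Requires Python 3.7+ (uses str.isascii).
- import sys
-
- if len(sys.argv) < 2:
-     sys.exit()
- filename = sys.argv[1]
-
- # Read raw bytes so the encoding can be tried explicitly.
- with open(filename, 'rb') as f:
-     raw = f.read()
-
- try:
-     text = raw.decode('utf8')
- except UnicodeDecodeError:
-     # Not UTF-8: fall back to the system encoding, replacing undecodable bytes.
-     text = raw.decode(sys.getfilesystemencoding(), 'replace')
-
- # Normalize Windows line endings; cues are separated by a blank line.
- text = text.replace('\r\n', '\n')
- blocks = text.split('\n\n')
- out = []
-
- for block in blocks:
-     lines = block.split('\n')
-     if len(lines) < 2:
-         # Not a full cue: pass its lines through.
-         for m in lines:
-             out.append(m.replace('- ', '-'))
-         continue
-     # The first two lines are the cue number and the timestamp.
-     header = "\n".join(lines[:2])
-     # Lines with any non-ASCII character count as Chinese, the rest as English.
-     chi = [x for x in lines[2:] if not x.isascii()]
-     eng = [x for x in lines[2:] if x.isascii()]
-
-     out.append(header)
-     if len(chi) > 0:
-         out.append(" ".join(chi).replace('- ', '-'))
-     if len(eng) > 0:
-         out.append(" ".join(eng).replace('- ', '-'))
-     out.append('')
-
- # Write the result next to the input file, in the system encoding.
- with open(filename + "_fixed.srt", "w", encoding=sys.getfilesystemencoding()) as f:
-     for m in out:
-         f.write(m + '\n')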
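The script treats a line as English only when every character is ASCII, so a Latin line containing an accented letter or a curly quote (for example `Café?`) ends up grouped with the Chinese text. One possible refinement, sketched here as an assumption rather than as part of the original script, is to call a line Chinese only if it contains at least one character in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The helper name `has_cjk` is hypothetical, and the range deliberately ignores extension blocks and CJK punctuation.

```python
def has_cjk(line):
    """Return True if the line contains a CJK Unified Ideograph (U+4E00..U+9FFF).

    Hypothetical helper; extension blocks and CJK punctuation are not covered.
    """
    return any('\u4e00' <= ch <= '\u9fff' for ch in line)


body = ['测试好了没有?', 'Café?', 'Is the test OK?']
chi = [x for x in body if has_cjk(x)]
eng = [x for x in body if not has_cjk(x)]
print(chi)
print(eng)
```

The trade-off is that a Chinese-side line made up only of digits or punctuation would now be classified as English, so which test fits better depends on the subtitles being processed.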