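Information entropy measures the uncertainty of the information emitted by a given source: the more disordered and irregular the information, the larger its entropy. If a source always emits exactly the same message, its entropy is 0, which means the information is completely certain.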
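For reference, this is the standard Shannon definition. If a source emits k distinct symbols with probabilities p_1 to p_k, then, using base-2 logarithms so that entropy is measured in bits:

$$
H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i
$$

For example, a source with probabilities (1/2, 1/4, 1/4) gives

$$
H = \tfrac12\log_2 2 + \tfrac14\log_2 4 + \tfrac14\log_2 4 = 0.5 + 0.5 + 0.5 = 1.5 \text{ bits}
$$

A source that always emits the same symbol has p_1 = 1, so H = -1 · log2 1 = 0. A source whose k symbols are equally likely reaches the maximum, H = log2 k. The code below estimates each p_i by the relative frequency of a value in a list.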
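The main point of this article is to demonstrate Python dictionaries and built-in functions such as len(), sum(), map() and abs(), using an entropy calculation as the example.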
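The counting loop in the function below relies on the dict.get(key, default) idiom: a key seen for the first time gets the default 0 before being incremented. As a small sketch (the helper name count_with_dict and the sample data are made up for illustration), this produces the same counts as the standard library's collections.Counter:

```python
from collections import Counter

def count_with_dict(items):
    """Hypothetical helper: count occurrences with a plain dict and dict.get."""
    counts = {}
    for item in items:
        # get() returns 0 for a key that is not in the dict yet
        counts[item] = counts.get(item, 0) + 1
    return counts

data = [3, 1, 3, 5, 1, 3]
print(count_with_dict(data))  # {3: 3, 1: 2, 5: 1}
print(dict(Counter(data)))    # {3: 3, 1: 2, 5: 1}
```

Counter is itself a dict subclass, so either form can feed the entropy calculation; the article uses the plain-dict version to show off dictionary methods.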
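```python
from math import log
from random import randint

def informationEntropy(lst):
    # Total number of data items
    num = len(lst)
    # Number of occurrences of each distinct value
    numberofNoRepeat = dict()
    for data in lst:
        numberofNoRepeat[data] = numberofNoRepeat.get(data, 0) + 1
    # Print the occurrence counts so the result can be checked by hand
    print(numberofNoRepeat)
    # Return the entropy; x/num is the relative frequency of each value.
    # Every term x/num * log2(x/num) is <= 0, so abs() of the sum
    # is the (non-negative) entropy in bits.
    return abs(sum(map(lambda x: x/num * log(x/num, 2), numberofNoRepeat.values())))

# Functional test: 10 random lists of 5 to 30 integers drawn from 1..5
for _ in range(10):
    lst = [randint(1, 5) for _ in range(randint(5, 30))]
    print('Entropy:', informationEntropy(lst))
    print('=' * 20)
# A list that repeats a single value has zero entropy
print('Entropy:', informationEntropy([1, 1, 1, 1, 1, 1]))
```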
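A more compact variant is sketched below, assuming Python 3 (for math.log2). The name entropy is hypothetical and not from the article. It swaps the hand-written dict for collections.Counter and uses a generator expression instead of map and lambda. Writing each term as (c/n) · log2(n/c), which equals -(c/n) · log2(c/n), keeps every term non-negative, so neither abs() nor negation is needed (negating a zero sum would print -0.0). Unlike the original, it does not print the counts, and it raises ValueError on an empty list, whereas the original returns 0 there.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Hypothetical helper: Shannon entropy in bits of the values in items."""
    n = len(items)
    if n == 0:
        raise ValueError("entropy of an empty sequence is undefined")
    counts = Counter(items)
    # p * log2(1/p) with p = c / n, written as (c / n) * log2(n / c)
    return sum(c / n * log2(n / c) for c in counts.values())

print(entropy([1, 1, 1, 1, 1, 1]))  # 0.0
print(entropy([1, 2, 3, 4]))        # 2.0
```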
Output of one run:
```
{1: 4, 2: 3, 3: 9, 4: 3, 5: 8}
Entropy: 2.1608467607817
====================
{1: 3, 2: 1, 3: 5, 4: 2, 5: 7}
Entropy: 2.057924310831006
====================
{1: 5, 2: 3, 3: 2, 4: 1, 5: 2}
Entropy: 2.1339375660949167
====================
{1: 1, 3: 3, 4: 3, 5: 1}
Entropy: 1.8112781244591327
====================
{1: 3, 2: 4, 3: 1, 4: 3, 5: 2}
Entropy: 2.199687794731328
====================
{1: 1, 2: 2, 3: 5, 4: 3, 5: 3}
Entropy: 2.155968102145908
====================
{1: 1, 3: 2, 4: 2, 5: 1}
Entropy: 1.9182958340544893
====================
{1: 1, 2: 2, 4: 2, 5: 1}
Entropy: 1.9182958340544893
====================
{1: 8, 2: 4, 3: 6, 4: 5, 5: 6}
Entropy: 2.284560633641686
====================
{2: 3, 3: 1, 4: 2, 5: 2}
Entropy: 1.9056390622295662
====================
{1: 6}
Entropy: 0.0
```
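The printed results can be checked independently. The sketch below (the helper name entropy_from_counts is hypothetical) recomputes the entropy directly from one of the count dictionaries shown in the output. It also illustrates a useful upper bound: the values are drawn from 1 to 5, so there are at most 5 distinct symbols, and the entropy of k symbols is at most log2(k), reached only when all k are equally frequent. This is why every random run stays below log2(5) ≈ 2.32, while the constant list gives exactly 0.

```python
from math import log2

def entropy_from_counts(counts):
    """Hypothetical helper: entropy in bits from a {value: count} dict."""
    n = sum(counts.values())
    return sum(c / n * log2(n / c) for c in counts.values())

# Recompute one printed run from its count dictionary
run = {1: 1, 3: 3, 4: 3, 5: 1}
print(entropy_from_counts(run))  # about 1.8113, matching the printed value

# Upper bound for 5 distinct values, reached by a uniform distribution
print(log2(5))                                           # about 2.3219
print(entropy_from_counts({v: 2 for v in range(1, 6)}))  # about 2.3219
```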