Performance Comparison of Algorithms for Generating Unique Tokens in Python


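In many situations we need to generate non-repeating strings to mark an operation as unique. For HTTP requests we need to generate a Session ID, and when storing data we may need a unique string to serve as the record ID for indexing. This article came out of using tornado, where sessions were needed and each Session requires a unique ID. To generate safe, usable Session IDs as quickly as possible, I compared some of the common generation methods currently available in Python. For convenience, the rest of the article uses "token" as the single term for Session IDs, unique strings, and the like.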

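The most popular approach online is to generate a token with uuid4().hex. Another view holds that uuid4().hex is not secure enough and should be replaced with a more secure algorithm. That led to using os.urandom(24) or building a random string by hand (uuid4 itself uses os.urandom(16)), and later to generating random bytes with OpenSSL or M2Crypto. Those two require the pyOpenSSL and M2Crypto packages to be installed in Python. I have little experience with M2Crypto, so it was not tested.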
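As a rough illustration of the candidates discussed above, here is a minimal sketch of three of them. It assumes Python 3 (the original measurements were made on Python 2), and the function names token_uuid4, token_urandom_base64 and token_urandom_sha1 are hypothetical names chosen for this example, not names from the original benchmark.

```python
import binascii
import hashlib
import os
import uuid


def token_uuid4():
    # 32 hex characters taken from a random (version 4) UUID
    return uuid.uuid4().hex


def token_urandom_base64():
    # 24 random bytes encode to exactly 32 base64 characters;
    # b2a_base64 appends a newline, which [:-1] strips
    return binascii.b2a_base64(os.urandom(24))[:-1].decode("ascii")


def token_urandom_sha1():
    # 40 hex characters: the SHA-1 digest of 24 random bytes
    return hashlib.sha1(os.urandom(24)).hexdigest()


if __name__ == "__main__":
    for make_token in (token_uuid4, token_urandom_base64, token_urandom_sha1):
        token = make_token()
        print(make_token.__name__, len(token), token)
```

Each function returns a text string, so the results can be compared or stored in the same way regardless of which generator produced them.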

Test environment:

CPU: Intel Xeon E3 3.30GHz 3.70GHz

Memory: 16GB

System: Windows 7 64-bit

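import time


# times: number of tokens to generate in the test
# func: the function under test
# This function is the entry point of the benchmark.
# Each algorithm is defined as a function that accepts a times argument
# and returns (crashed_times, hash_data_len). (By MitchellChu)
def crash_testx(times, func):
    print('\r\n--------------------------------------------')
    # __name__ works on both Python 2 and 3; func_name is Python 2 only
    print("test function: %s" % func.__name__)
    print("begin time: %s" % time.strftime('%Y%m%d %X'))
    begin_time = time.time()
    # Run the algorithm; it reports collisions and the token length
    (crashed_times, hash_data_len) = func(times)
    print("end time: %s" % time.strftime('%Y%m%d %X'))
    print("take time:%s" % (time.time() - begin_time))
    print("test times: %d, crashed times:%d, hash data length:%d" % (times, crashed_times, hash_data_len))
    print('--------------------------------------------\r\n')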

Method	Token length	Possible values	Time (seconds per 10 million calls)	Imports
base64.b64encode(os.urandom(24),['_','-'])	32	64^32	65.9289999008	base64, os
base64.b32encode(os.urandom(20))	32	32^32	86.3580000401	base64, os
sha1(os.urandom(24)).hexdigest()	40	16^40	29.6259999275	os, hashlib
''.join(random.choice(alphabet_digits) for _ in range(32))	32	62^32	214.484999895	random, string
binascii.b2a_base64(os.urandom(24))[:-1]	32	64^32	22.5640001297	os, binascii
(binascii.b2a_base64(os.urandom(24))[:-1]).translate(translationstr) * (global translationstr)	32	64^32	24.5349998474	os, binascii
(binascii.b2a_base64(os.urandom(24))[:-1]).translate(translationstr) * (translationstr rebuilt per call)	32	64^32	398.623999834	os, binascii
(binascii.b2a_base64(os.urandom(24))[:-1]).replace('/','_').replace('+','-') *	32	64^32	27.271999836	os, binascii
uuid4().hex	32	16^32	159.762000084	uuid
binascii.b2a_base64(OpenSSL.rand.bytes(24))[:-1]	32	64^32	61.1059999466	binascii, OpenSSL

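* The starred rows all use binascii.b2a_base64 to produce base64 data. The difference is that in the first translate row translationstr is built once, globally, while in the second translate row it is rebuilt every time a token is generated. The timings show that rebuilding it on every call is very expensive. The last starred row uses two replace calls as a point of comparison.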
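The three starred variants can be sketched as follows. This is an assumption-laden reconstruction, not the original benchmark code: it targets Python 3, where the translation table is built with bytes.maketrans, and the names TRANSLATION, token_global_table, token_table_per_call and token_replace are hypothetical. Timings on a modern interpreter will differ from the table, so no expected numbers are given.

```python
import binascii
import os
import timeit

# Built once at import time (the "global translationstr" variant)
TRANSLATION = bytes.maketrans(b"+/", b"-_")


def token_global_table():
    return binascii.b2a_base64(os.urandom(24))[:-1].translate(TRANSLATION)


def token_table_per_call():
    # Rebuilds the 256-byte table on every call
    table = bytes.maketrans(b"+/", b"-_")
    return binascii.b2a_base64(os.urandom(24))[:-1].translate(table)


def token_replace():
    # Two replace calls instead of a translation table
    raw = binascii.b2a_base64(os.urandom(24))[:-1]
    return raw.replace(b"/", b"_").replace(b"+", b"-")


if __name__ == "__main__":
    for func in (token_global_table, token_table_per_call, token_replace):
        seconds = timeit.timeit(func, number=100000)
        print("%-22s %.3f s per 100,000 calls" % (func.__name__, seconds))
```

All three return the same kind of value, a 32-byte token in which "+" and "/" have been swapped for "-" and "_", so only the cost of the substitution step differs between them.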

Judging by the results, the best performer is binascii.b2a_base64:

binascii.b2a_base64(os.urandom(24))[:-1]

The next best performer is SHA1:

sha1(os.urandom(24)).hexdigest()

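In terms of the range of possible values, SHA1 covers fewer than base64 (16^40 versus 64^32). However, base64 output contains two special characters, "+" and "/", and the number of input bytes also matters because it determines whether "=" padding appears, so base64 is not suitable in every situation.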
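To make the comparison of ranges concrete, the "possible values" column of the table can be converted to bits. The snippet below is only arithmetic on the figures already in the table; the dictionary name formats is chosen for this example.

```python
import math

# Size of each token space from the table, expressed in bits
formats = {
    "base64, 32 chars (64^32)": 32 * math.log2(64),
    "sha1 hex, 40 chars (16^40)": 40 * math.log2(16),
    "uuid4 hex, 32 chars (16^32)": 32 * math.log2(16),
}

for name, bits in formats.items():
    print("%-28s %d bits" % (name, bits))
```

These are upper bounds given by the output format. A version 4 UUID, for instance, fixes 6 of its 128 bits for the version and variant fields, so it carries 122 random bits.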

uuid4().hex did indeed perform poorly in the test, only slightly better than the hand-rolled token generator (random.choice).

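I did not test how much more secure OpenSSL-generated tokens are than os.urandom, and I am not familiar with the implementation details. If anyone knows, please explain. :P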

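Conclusion:

Where base64 is acceptable, binascii.b2a_base64 is a good choice. According to the W3 definition of the identifier in the Session ID string, Session IDs use base64, but when the token is placed in a Cookie value, watch out for the special character "=";
If only safe characters (letters and digits) are wanted, SHA1 is also a good choice, and it performs well too;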
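Whether "=" shows up depends on the number of random bytes passed in: base64 pads its output only when the input length is not a multiple of 3. The sketch below, which assumes Python 3 and uses the hypothetical helper names b64_token and special_chars, shows the difference between 24 and 20 input bytes and confirms that a SHA1 hex digest contains only letters and digits.

```python
import binascii
import hashlib
import os
import string

ALNUM = set(string.ascii_letters + string.digits)


def b64_token(num_bytes):
    # base64 text for num_bytes random bytes, trailing newline stripped
    return binascii.b2a_base64(os.urandom(num_bytes))[:-1].decode("ascii")


def special_chars(token):
    # Characters in the token that are not letters or digits
    return set(token) - ALNUM


t24 = b64_token(24)  # 24 is a multiple of 3: 32 characters, no padding
t20 = b64_token(20)  # 20 is not: 28 characters ending in one "="
sha = hashlib.sha1(os.urandom(24)).hexdigest()

print(len(t24), t24.endswith("="))
print(len(t20), t20.endswith("="))
print(special_chars(sha))
```

With 24 input bytes the only non-alphanumeric characters that can occur are "+" and "/", which is why the 24-byte variants in the table stay at exactly 32 characters.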


Sunday, November 08, 2015 | Other Technology

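Comments

# re: Performance Comparison of Algorithms for Generating Unique Tokens in Python
- kjeldahl
- 9/17/2018 7:59 PM
Hello. How did you install openssl? In my installation OpenSSL's rand has no bytes, so I cannot test binascii.b2a_base64(OpenSSL.rand.bytes(24))[:-1].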


Mitchell Chu's Blog
