Redis数据类型与指令详解之集合(t

集合编码方式 redis 集合(set)使用redis_enconding_int与redis_enconding_ht两种编码方式 1、redis_enconding_int: intset.c/intset.h 2、redis_enconding_ht: dict.c/dict.h 第一个添加到集合的元素，决定了创建集合时所使用的编码：如果第一个元素可以表示
集合编码方式redis 集合(set)使用redis_enconding_int与redis_enconding_ht两种编码方式
1、redis_enconding_int: intset.c/intset.h
2、redis_enconding_ht: dict.c/dict.h
第一个添加到集合的元素，决定了创建集合时所使用的编码：如果第一个元素可以表示为 long long 类型值（也即是，它是一个整数），那么集合的初始编码为 redis_encoding_intset ；否则，集合的初始编码为 redis_encoding_ht 。
编码切换：如果一个集合使用 redis_encoding_intset 编码，那么当以下任何一个条件被满足时，这个集合会被转换成 redis_encoding_ht 编码：
1、 intset 保存的整数值个数超过 server.set_max_intset_entries （默认值为 512 ）
2、试图往集合里添加一个新元素，并且这个元素不能被表示为 long long 类型
集合指令实现sadd 指令格式: sadd key member [member...]
将一个或多个member元素加入到集合key当中，由于集合成员不能重复，已经存在于集合key中的member元素将被忽略。
如果key不存在，则创建一个包含被添加的member元素的新集合。
如果key不是集合类型（redis_set）时，则操作出错,redis返回一个错误。
时间复杂度:o(n)void saddcommand(redisclient *c) { robj *set; int j, added = 0; set = lookupkeywrite(c->db,c->argv[1]);//写查找数据库中名为c->argv[1]的集合 if (set == null) {//集合不存在 set = settypecreate(c->argv[2]);//创建一个新的集合 dbadd(c->db,c->argv[1],set);//将该集合添加到数据库中，dict中的key就是集合的名称，value就是集合元素 } else { if (set->type != redis_set) {//判断是否是集合类型 addreply(c,shared.wrongtypeerr); return; } } for (j = 2; j argc; j++) {//添加集合元素 c->argv[j] = tryobjectencoding(c->argv[j]);//尝试使用整型存储数据 if (settypeadd(set,c->argv[j])) added++; } if (added) { signalmodifiedkey(c->db,c->argv[1]);//通知数据库哪些键key变化了，把变化的key存储到watched_keys中，只在事务操作时才用的着 notifykeyspaceevent(redis_notify_set,sadd,c->argv[1],c->db->id);//暂不知道啥用途 } server.dirty += added;//数据库中数据变化的数目 addreplylonglong(c,added);}
lookupkeywrite函数在object.c文件中，用来在数据库中查找指定key的value值。
settypecreate函数在创建一个新的redis_set时，根据添加的元素类型为整型还是字符串会创建不同的存储数据结构
//创建一个集合对象,如果value是整型，那么使用intset，否则使用dictrobj *settypecreate(robj *value) { if (isobjectrepresentableaslonglong(value,null) == redis_ok) return createintsetobject();//intset return createsetobject();//dict}robj *createsetobject(void) { dict *d = dictcreate(&setdicttype,null); robj *o = createobject(redis_set,d); o->encoding = redis_encoding_ht; return o;}robj *createintsetobject(void) { intset *is = intsetnew(); robj *o = createobject(redis_set,is); o->encoding = redis_encoding_intset; return o;}robj *createobject(int type, void *ptr) { robj *o = zmalloc(sizeof(*o)); o->type = type; o->encoding = redis_encoding_raw; o->ptr = ptr; o->refcount = 1; /* set the lru to the current lruclock (minutes resolution). */ o->lru = server.lruclock; return o;}
scard scard key
返回集合key中元素的个数
时间复杂度:o(1)
简单根据集合的编码类型:如果是ht编码,那么直接通过dictsize函数得到字典中元素的个数;如果是intset编码,那么直接通过intsetlen函数得到结果.
sismembersismember key member
判断member元素是否集合key的中的元素
时间复杂度: o(1)
简单根据集合的编码类型:如果是ht编码,那么直接通过dictfind函数查找字典中是否存在该member;如果是intset编码,那么直接通过intsetfind函数查找是否存在该member.
smembers smembers key
返回集合key中的所有元素,不存在的key被视为空集合。
时间复杂度: o(n)smove smove source destination member
将member元素从集合source转移到集合destination，如果集合source中不存在member元素，那么smove指令不执行任何操作，返回0。若存在member元素，将member元素从集合source中删除，并添加到集合destination,如果集合destination中已存在member元素,那么仅仅从集合source中删除元素member。void smovecommand(redisclient *c) {//将member元素从source集合移动到destination集合 robj *srcset, *dstset, *ele; srcset = lookupkeywrite(c->db,c->argv[1]);//从数据库中查找集合source dstset = lookupkeywrite(c->db,c->argv[2]); ele = c->argv[3] = tryobjectencoding(c->argv[3]);//member元素 /* if the source key does not exist return 0 */ if (srcset == null) { addreply(c,shared.czero); return; } /* if the source key has the wrong type, or the destination key * is set and has the wrong type, return with an error. */ if (checktype(c,srcset,redis_set) || (dstset && checktype(c,dstset,redis_set))) return;//类型检查 /* if srcset and dstset are equal, smove is a no-op */ if (srcset == dstset) {//source与dest相同 addreply(c,shared.cone); return; } /* if the element cannot be removed from the src set, return 0. */ if (!settyperemove(srcset,ele)) {//从源集合中删除member元素 addreply(c,shared.czero); return; } notifykeyspaceevent(redis_notify_set,srem,c->argv[1],c->db->id); /* remove the src set from the database when empty */ if (settypesize(srcset) == 0) {//移除member元素后，源集合为空，删除 dbdelete(c->db,c->argv[1]); notifykeyspaceevent(redis_notify_generic,del,c->argv[1],c->db->id); } signalmodifiedkey(c->db,c->argv[1]); signalmodifiedkey(c->db,c->argv[2]); server.dirty++; /* create the destination set when it doesn't exist */ if (!dstset) {//目标集合不存在，则新建 dstset = settypecreate(ele); dbadd(c->db,c->argv[2],dstset); } /* an extra key has changed when ele was successfully added to dstset */ if (settypeadd(dstset,ele)) {//添加member元素到目标集合 server.dirty++; notifykeyspaceevent(redis_notify_set,sadd,c->argv[2],c->db->id); } addreply(c,shared.cone);}
spop spop key
移除并返回集合key中的一个随机元素。
时间复杂度:o(1)
由于是随机删除一个元素，那么对复制与aof肯定有影响，因此该操作后，需要将该指令改变为srem即将该元素删除，参考rewriteclientcommandvector函数。
srandmember srandmember key [count]
[count]参数可选,如果没有count,那么返回集合中的一个随机元素。
如果count为正数，且小于集合元素个数，那么命令返回一个包含count个元素的数组，数组中的元素各不相同,如果count大于等于集合基数，那么返回整个集合;如果 count 为负数，那么命令返回一个数组，数组中的元素可能会重复出现多次，而数组的长度为count的绝对值。srem srem key member [member ...]
移除集合key中的一个或多个member元素，不存在的member元素会被忽略。集合求交算法集合求交的指令有两个:sinter与sinterstore
sinter key [key ...]
返回所有给定集合的交集,不存在的 key 被视为空集,当给定集合当中有一个空集时，结果也为空集.
时间复杂度: o(n * m)，n 为给定集合中元素数目最小的集合，m 为给定集合的个数
sinterstore destination key [key ...]
与sinter指令类似,不同的仅仅将结果集存储到目标集合destination中,如果集合destination已存在,那么将其覆盖
void sintergenericcommand(redisclient *c, robj **setkeys, unsigned long setnum, robj *dstkey) { robj **sets = zmalloc(sizeof(robj*)*setnum); settypeiterator *si;//迭代器 robj *eleobj, *dstset = null; int64_t intobj; void *replylen = null; unsigned long j, cardinality = 0; int encoding; for (j = 0; j db,setkeys[j]) : lookupkeyread(c->db,setkeys[j]); if (!setobj) {//任何一个集合不存在，那么总的交集就为空 zfree(sets); if (dstkey) { if (dbdelete(c->db,dstkey)) { signalmodifiedkey(c->db,dstkey); server.dirty++; } addreply(c,shared.czero); } else { addreply(c,shared.emptymultibulk); } return; } if (checktype(c,setobj,redis_set)) {//类型检查 zfree(sets); return; } sets[j] = setobj; } /* sort sets from the smallest to largest, this will improve our * algorithm's performance */ //按照集合元素个数从小到大排序 qsort(sets,setnum,sizeof(robj*),qsortcomparesetsbycardinality); /* the first thing we should output is the total number of elements... * since this is a multi-bulk write, but at this stage we don't know * the intersection set size, so we use a trick, append an empty object * to the output list and save the pointer to later modify it with the * right length */ if (!dstkey) { replylen = adddeferredmultibulklength(c); } else { /* if we have a target key where to store the resulting set * create this key with an empty set inside */ dstset = createintsetobject(); } /* iterate all the elements of the first (smallest) set, and test * the element against all the other sets, if at least one set does * not include the element it is discarded */ /** 求多个集合交集的算法思想：首先按照集合元素个数对集合进行qsort，然后遍历排序后的第一个集合中的元素，查看该元素在其他集合中是否存在，如果在其他集合中都存在，那么该元素为一个结果 */ si = settypeinititerator(sets[0]); while((encoding = settypenext(si,&eleobj,&intobj)) != -1) { for (j = 1; j encoding == redis_encoding_intset && !intsetfind((intset*)sets[j]->ptr,intobj))//在集合sets[j]中没有找到集合sets[0]的intobj { break; /* in order to compare an integer with an object we * have to use the generic function, creating an object * for this */ } else if (sets[j]->encoding == redis_encoding_ht) {//集合sets[j]编码为ht，sets[0]为intset eleobj = createstringobjectfromlonglong(intobj);//将sets[0]中的intobj转换为sds if (!settypeismember(sets[j],eleobj)) {//如果eleobj不在集合sets[j]中 decrrefcount(eleobj); break; } decrrefcount(eleobj); } } else if (encoding == redis_encoding_ht) {//ht /* optimization... if the source object is integer * encoded and the target set is an intset, we can get * a much faster path. */ if (eleobj->encoding == redis_encoding_int && sets[j]->encoding == redis_encoding_intset && !intsetfind((intset*)sets[j]->ptr,(long)eleobj->ptr)) { break; /* else... object to object check is easy as we use the * type agnostic api here. */ } else if (!settypeismember(sets[j],eleobj)) { break; } } } /* only take action when all sets contain the member */ if (j == setnum) { if (!dstkey) { if (encoding == redis_encoding_ht) addreplybulk(c,eleobj); else addreplybulklonglong(c,intobj); cardinality++; } else {//添加到临时目标集合 if (encoding == redis_encoding_intset) { eleobj = createstringobjectfromlonglong(intobj); settypeadd(dstset,eleobj); decrrefcount(eleobj); } else { settypeadd(dstset,eleobj); } } } } settypereleaseiterator(si); if (dstkey) { /* store the resulting set into the target, if the intersection * is not an empty set. */ int deleted = dbdelete(c->db,dstkey);//覆盖原来的目标集合 if (settypesize(dstset) > 0) { dbadd(c->db,dstkey,dstset); addreplylonglong(c,settypesize(dstset)); notifykeyspaceevent(redis_notify_set,sinterstore, dstkey,c->db->id); } else {//空集 decrrefcount(dstset); addreply(c,shared.czero); if (deleted) notifykeyspaceevent(redis_notify_generic,del, dstkey,c->db->id); } signalmodifiedkey(c->db,dstkey); server.dirty++; } else { setdeferredmultibulklength(c,replylen,cardinality); } zfree(sets);}/*sinter key [key...]*/void sintercommand(redisclient *c) {//计算所有给定集合的交集 sintergenericcommand(c,c->argv+1,c->argc-1,null);}/*sinter destination key [key...]*/void sinterstorecommand(redisclient *c) {//计算所有给定集合的交集，但存储在集合destination中 sintergenericcommand(c,c->argv+2,c->argc-2,c->argv[1]);}
集合求差算法集合求差有两个指令:sdiff与sdiffstore
sdiff key [key ...]
返回所有给定集合之间的差集,不存在的集合key视为空集。
时间复杂度: o(n)，n 是所有给定集合的成员数量之和。
sdiffstore destination key [key ...]
与sdiff 类似，但它将差集的结果保存到集合destination,如果集合destination已经存在，则将其覆盖。
集合求并算法集合求并有两个指令:sunion与sunionstore
sunion key [key ...]
返回所有给定集合的并集,不存在的集合key被视为空集。
时间复杂度: o(n)，n 是所有给定集合的成员数量之和。
sunionstore destination key [key ...]
与sunion类似，但它将并集结果保存到集合destination，如果集合destination已经存在，则将其覆盖。
#define redis_op_union 0#define redis_op_diff 1#define redis_op_inter 2void suniondiffgenericcommand(redisclient *c, robj **setkeys, int setnum, robj *dstkey, int op) { robj **sets = zmalloc(sizeof(robj*)*setnum); settypeiterator *si; robj *ele, *dstset = null; int j, cardinality = 0; int diff_algo = 1; for (j = 0; j db,setkeys[j]) : lookupkeyread(c->db,setkeys[j]); if (!setobj) { sets[j] = null; continue; } if (checktype(c,setobj,redis_set)) { zfree(sets); return; } sets[j] = setobj; } /* select what diff algorithm to use. * * algorithm 1 is o(n*m) where n is the size of the element first set * and m the total number of sets. * * algorithm 2 is o(n) where n is the total number of elements in all * the sets. * * we compute what is the best bet with the current input here. */ //对于sdiff指令选择最优算法 if (op == redis_op_diff && sets[0]) { long long algo_one_work = 0, algo_two_work = 0; for (j = 0; j < setnum; j++) { if (sets[j] == null) continue; algo_one_work += settypesize(sets[0]); algo_two_work += settypesize(sets[j]); } /* algorithm 1 has better constant times and performs less operations * if there are elements in common. give it some advantage. */ algo_one_work /= 2;//算法1可能不需要全部比较，因此除2来降低常数时间 diff_algo = (algo_one_work 1) { /* with algorithm 1 it is better to order the sets to subtract * by decreasing size, so that we are more likely to find * duplicated elements asap. */ //对sets[1]至sets[setnum-1]按照集合元素的个数从大到小排序 qsort(sets+1,setnum-1,sizeof(robj*), qsortcomparesetsbyrevcardinality); } } /* we need a temp set object to store our union. if the dstkey * is not null (that is, we are inside an sunionstore operation) then * this set object will be the resulting object to set into the target key*/ dstset = createintsetobject(); if (op == redis_op_union) {//并集操作，把所有元素不重复的操作即可 /* union is trivial, just add every element of every set to the * temporary set. */ for (j = 0; j db,dstkey,dstset); addreplylonglong(c,settypesize(dstset)); notifykeyspaceevent(redis_notify_set, op == redis_op_union ? sunionstore : sdiffstore, dstkey,c->db->id); } else { decrrefcount(dstset); addreply(c,shared.czero); if (deleted) notifykeyspaceevent(redis_notify_generic,del, dstkey,c->db->id); } signalmodifiedkey(c->db,dstkey); server.dirty++; } zfree(sets);}/*sunion key [key..]*/void sunioncommand(redisclient *c) {//计算所有给定集合的并集 suniondiffgenericcommand(c,c->argv+1,c->argc-1,null,redis_op_union);}/*sunionstore destination key [key..]*/void sunionstorecommand(redisclient *c) {//计算所有给定集合的并集，当将结果存储在destination中 suniondiffgenericcommand(c,c->argv+2,c->argc-2,c->argv[1],redis_op_union);}/*sdiff key [key...]*/void sdiffcommand(redisclient *c) {//计算第一个集合与另外所有集合的差集 suniondiffgenericcommand(c,c->argv+1,c->argc-1,null,redis_op_diff);}/*sdiffstore destination key [key...]*/void sdiffstorecommand(redisclient *c) {//与sdiff类似，但将结果存储在destination中 suniondiffgenericcommand(c,c->argv+2,c->argc-2,c->argv[1],redis_op_diff);}
小结集合是redis中重要的数据类型，其存储使用intset与hash table(dict)两种数据结构，集合的所有指令都比较简单易懂，集合求差算法的两种优化方式可以学习。
集合所有指令的注解http://redis.io/commands#set
感谢黄健宏（huangz1990）的redis设计与实现及其他对redis2.6源码的相关注释对我在研究redis2.8源码方面的帮助。

Redis数据类型与指令详解之集合(t_set)

VIP推荐