A bug in memcached's LRU expiry mechanism: the lru_crawler command can wrongly clean slab classes it was never asked to
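In version 1.4.20, when handling a command such as lru_crawler crawl 2, the lru_crawler_crawl function reads an uninitialized local array. As a result it can crawl, and clean data from, slab classes that were never requested.
Here is the code of the lru_crawler_crawl function: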
enum crawler_result_type lru_crawler_crawl(char *slabs) {
    // Crawl the slab classes named in the argument: tell the background
    // thread item_crawler_thread to clean up expired data.
    char *b = NULL;
    uint32_t sid = 0;
    uint8_t tocrawl[POWER_LARGEST]; // slab classes that need to be crawled
    // Note: the array above is never initialized, which is bound to cause
    // trouble below.
    if (pthread_mutex_trylock(&lru_crawler_lock) != 0) {
        return CRAWLER_RUNNING;
    }
    pthread_mutex_lock(&cache_lock);

    if (strcmp(slabs, "all") == 0) {
        for (sid = 0; sid < LARGEST_ID; sid++) {
            tocrawl[sid] = 1;
        }
    } else {
        for (char *p = strtok_r(slabs, ",", &b);
             p != NULL;
             p = strtok_r(NULL, ",", &b)) {

            if (!safe_strtoul(p, &sid) || sid < POWER_SMALLEST
                    || sid > POWER_LARGEST) {
                pthread_mutex_unlock(&cache_lock);
                pthread_mutex_unlock(&lru_crawler_lock);
                return CRAWLER_BADCLASS;
            }
            tocrawl[sid] = 1; // Why no initialization? What if the other entries are not 0?
        }
    }

    for (sid = 0; sid < LARGEST_ID; sid++) {
        if (tocrawl[sid] != 0 && tails[sid] != NULL) {
            if (settings.verbose > 2)
                fprintf(stderr, "Kicking LRU crawler off for slab %d\n", sid);
            crawlers[sid].nbytes = 0;
            crawlers[sid].nkey = 0;
            crawlers[sid].it_flags = 1; /* For a crawler, this means enabled. */
            crawlers[sid].next = 0;
            crawlers[sid].prev = 0;
            crawlers[sid].time = 0;
            crawlers[sid].remaining = settings.lru_crawler_tocrawl; // crawl this many items in this pass
            crawlers[sid].slabs_clsid = sid;
            crawler_link_q((item *)&crawlers[sid]);
            crawler_count++;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    pthread_cond_signal(&lru_crawler_cond); // wake up item_crawler_thread
    STATS_LOCK();
    stats.lru_crawler_running = true;
    STATS_UNLOCK();
    pthread_mutex_unlock(&lru_crawler_lock);
    return CRAWLER_OK;
}
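If this is the memcached version you have installed, it is best not to use the crawler command casually.
I checked the latest code, and the recent 1.4.24 release has already fixed the problem. Adding the following line is all it takes: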
memset(tocrawl, 0, sizeof(uint8_t) * MAX_NUMBER_OF_SLAB_CLASSES);
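To see why that single memset matters, here is a minimal standalone sketch. It is not memcached code: NUM_CLASSES, mark_classes and count_selected are hypothetical names made up for illustration, and the leftover stack contents are simulated by pre-filling the buffer with 0xFF so the result is deterministic rather than relying on actually reading uninitialized memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical class count, standing in for memcached's slab class limit. */
#define NUM_CLASSES 8

/* Mark the classes listed in `spec` (for example "2" or "1,3") in `flags`.
 * With clear_first == 0 the buffer is trusted as-is, like the 1.4.20 code;
 * with clear_first != 0 it is zeroed first, like the 1.4.24 fix.
 * Returns 0 on success, -1 on a malformed or out-of-range class id. */
static int mark_classes(const char *spec, uint8_t *flags, int clear_first) {
    assert(spec != NULL && flags != NULL);
    if (clear_first) {
        memset(flags, 0, sizeof(uint8_t) * NUM_CLASSES);
    }
    const char *p = spec;
    while (*p != '\0') {
        char *end = NULL;
        unsigned long id = strtoul(p, &end, 10);
        if (end == p || id >= NUM_CLASSES) {
            return -1;              /* not a number, or out of range */
        }
        flags[id] = 1;
        if (*end == ',') {
            end++;                  /* skip the separator */
        } else if (*end != '\0') {
            return -1;              /* unexpected trailing character */
        }
        p = end;
    }
    return 0;
}

/* Count how many classes end up selected when the buffer starts out
 * holding stale, non-zero bytes. Returns -1 if `spec` is rejected. */
static int count_selected(const char *spec, int clear_first) {
    uint8_t flags[NUM_CLASSES];
    memset(flags, 0xFF, sizeof(flags)); /* simulated stack garbage */
    if (mark_classes(spec, flags, clear_first) != 0) {
        return -1;
    }
    int n = 0;
    for (int i = 0; i < NUM_CLASSES; i++) {
        if (flags[i] != 0) {
            n++;
        }
    }
    return n;
}
```

Under these assumptions, count_selected("2", 0) reports all 8 classes as selected even though only class 2 was requested, while count_selected("2", 1) reports exactly 1. In the real function the stale bytes are whatever happens to be on the stack, so which extra classes get crawled can differ from one call to the next.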