A bug in memcached's LRU expiration mechanism: the lru_crawler command can clean slab classes it was never asked to clean
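In version 1.4.20, when handling a command such as lru_crawler crawl 2, the lru_crawler_crawl function reads an array that was never initialized. As a result it can crawl, and clean data from, slab classes that were not named in the command.
Here is the code of lru_crawler_crawl: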
enum crawler_result_type lru_crawler_crawl(char *slabs) {
//Crawl the slab classes named in the argument: tell the background thread item_crawler_thread to clean up expired data
char *b = NULL;
uint32_t sid = 0;
uint8_t tocrawl[POWER_LARGEST];//slab classes that should be crawled
//Note: the array above is never initialized, so the code below is bound to misbehave.
if (pthread_mutex_trylock(&lru_crawler_lock) != 0) {
return CRAWLER_RUNNING;
}
pthread_mutex_lock(&cache_lock);
if (strcmp(slabs, "all") == 0) {
for (sid = 0; sid < LARGEST_ID; sid++) {
tocrawl[sid] = 1;
}
} else {
for (char *p = strtok_r(slabs, ",", &b);
p != NULL;
p = strtok_r(NULL, ",", &b)) {
if (!safe_strtoul(p, &sid) || sid < POWER_SMALLEST
|| sid > POWER_LARGEST) {
pthread_mutex_unlock(&cache_lock);
pthread_mutex_unlock(&lru_crawler_lock);
return CRAWLER_BADCLASS;
}
tocrawl[sid] = 1;//why is the array not initialized? what if the other entries are not 0?
}
}
for (sid = 0; sid < LARGEST_ID; sid++) {
if (tocrawl[sid] != 0 && tails[sid] != NULL) {
if (settings.verbose > 2)
fprintf(stderr, "Kicking LRU crawler off for slab %d\n", sid);
crawlers[sid].nbytes = 0;
crawlers[sid].nkey = 0;
crawlers[sid].it_flags = 1; /* For a crawler, this means enabled. */
crawlers[sid].next = 0;
crawlers[sid].prev = 0;
crawlers[sid].time = 0;
crawlers[sid].remaining = settings.lru_crawler_tocrawl;//how many items to crawl this time
crawlers[sid].slabs_clsid = sid;
crawler_link_q((item *)&crawlers[sid]);
crawler_count++;
}
}
pthread_mutex_unlock(&cache_lock);
pthread_cond_signal(&lru_crawler_cond);//wake up item_crawler_thread
STATS_LOCK();
stats.lru_crawler_running = true;
STATS_UNLOCK();
pthread_mutex_unlock(&lru_crawler_lock);
return CRAWLER_OK;
}
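If this is the memcached version you have installed, it is best not to use the crawler command casually.
I checked the latest code: the recent 1.4.24 release has fixed the problem by adding the following line: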
memset(tocrawl, 0, sizeof(uint8_t) * MAX_NUMBER_OF_SLAB_CLASSES);
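
To make the effect of that one line concrete, here is a minimal sketch, not memcached's actual code. The names DEMO_MAX_CLASSES, demo_mark_classes and demo_count_selected are hypothetical, and the zero_first flag exists only for illustration. Rather than reading a truly uninitialized stack array (undefined behavior in C), the caller is assumed to pre-fill the buffer with nonzero bytes to simulate stale stack contents.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size; stands in for memcached's slab class limit. */
#define DEMO_MAX_CLASSES 16

/* Marks the classes listed in spec (e.g. "2" or "2,5") in tocrawl.
 * When zero_first is 0 the buffer is used as-is, mimicking the missing
 * initialization; when nonzero it is cleared first, like the memset fix.
 * Returns 0 on success, -1 on a malformed or out-of-range class id. */
int demo_mark_classes(const char *spec, uint8_t *tocrawl, int zero_first) {
    assert(spec != NULL && tocrawl != NULL);
    if (zero_first) {
        memset(tocrawl, 0, sizeof(uint8_t) * DEMO_MAX_CLASSES);
    }
    const char *p = spec;
    for (;;) {
        char *end = NULL;
        unsigned long sid = strtoul(p, &end, 10);
        /* Reject empty fields and ids outside [1, DEMO_MAX_CLASSES). */
        if (end == p || sid < 1 || sid >= DEMO_MAX_CLASSES) {
            return -1;
        }
        tocrawl[sid] = 1;
        if (*end == '\0') {
            return 0;
        }
        if (*end != ',') {
            return -1;
        }
        p = end + 1;
    }
}

/* Counts how many classes the later "kick off crawler" loop would select,
 * i.e. how many entries are nonzero. */
int demo_count_selected(const uint8_t *tocrawl) {
    int n = 0;
    for (int i = 0; i < DEMO_MAX_CLASSES; i++) {
        if (tocrawl[i] != 0) {
            n++;
        }
    }
    return n;
}
```

With a buffer pre-filled with 0xAA bytes, demo_mark_classes("2", buf, 0) leaves every entry nonzero, so all classes get selected even though only class 2 was requested. With zero_first set to 1, only class 2 is selected.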
