Memcached 缓存过期机制源码分析

首页 > Memcached > Memcached 缓存过期机制源码分析

Memcached 缓存过期机制源码分析

2015年10月7日 kulv 发表评论阅读评论 19196次阅读

之前的文章说了 memcached源码学习-线程框架和 get操作处理代码，其实最重要的是他的缓存过期策略，以及内存管理机制，篇幅过长分几篇文章写，这里写一下memcached的缓存过期策略。

说到缓存过期策略，网上有一大堆帖子会介绍说“LRU, 最近最少使用 ” 算法，而且源码中间也是大量的含lru字样的函数，变量，比如lru_crawler_crawl 等，很多人也就信了，可事实却不是这样的。

零、前言

实际上memcached的缓存策略是LU, 没有R，也就是没有最少使用这回事，能想到，也就是说memcached没有记录一个key最近被访问过几次这回事，他只是记录了最近访问时间，以及过期时间（如果设置了的话）。来看一下一个key-value的内存表示形式：

typedef struct _stritem {
    struct _stritem *next;
    struct _stritem *prev;
    struct _stritem *h_next;    /* hash chain next */
    rel_time_t      time;       /* least recent access *///最后一次访问时间
    rel_time_t      exptime;    /* expire time *///主动设置的过期时间
    int             nbytes;     /* size of data */
    unsigned short  refcount;
    uint8_t         nsuffix;    /* length of flags-and-length string */
    uint8_t         it_flags;   /* ITEM_* above */
    uint8_t         slabs_clsid;/* which slab class we're in */
    uint8_t         nkey;       /* key length, w/terminating null and padding */
    /* this odd type prevents type-punning issues when we do
     * the little shuffle to save space when not using CAS. */
    union {
        uint64_t cas;
        char end;
    } data[];
    /* if it_flags & ITEM_CAS we have 8 bytes CAS */
    /* then null-terminated key */
    /* then " flags length\r\n" (no terminating null) */
    /* then data with terminating \r\n (no terminating null; it's binary!) */
} item;

上面time是最近访问时间，exptime是主动设置的过期时间， refcount是内部使用的引用计数，data为一个联合体，为了节省内存。 time的更新时机为 do_item_update 函数调用的时候。一共有如下几个：

存储的时候设置为current_time；
每次get访问的时候，调用item_update 更新时间（process_get_command)；
do_store_item 去存储一个item的时候，发现已经有了，那就更新时间；
执行touch命令的时候，更新time时间（process_touch_command）；

关于过期时间exptime字段，只会在主动设置的时候才会有效，否则无效，另外有个小坑是如果设置了过期时间超过30天，那么实际上的效果是：这个key是1秒后过期的，用的时候注意，否则就玩死人了。可以看下如下代码：

static rel_time_t realtime(const time_t exptime) {
    /*expiretime参数要么是个大于28天的秒数的时间戳，并且大于服务器启动的时间戳，也就是说，确保这个是个时间戳，
     * 或者为最大30天的秒数，代表从现在起，多少秒过期
        *
     * */
    /* no. of seconds in 30 days - largest possible delta exptime */

    if (exptime == 0) return 0; /* 0 means never expire */

    if (exptime > REALTIME_MAXDELTA) {
        /* if item expiration is at/before the server started, give it an
           expiration time of 1 second after the server started.
           (because 0 means don't expire).  without this, we'd
           underflow and wrap around to some large value way in the
           future, effectively making items expiring in the past
           really expiring never */
        if (exptime <= process_started)
            return (rel_time_t)1;//1秒后过期
        return (rel_time_t)(exptime - process_started);//返回的是服务器启动后多少秒过期
    } else {
        return (rel_time_t)(exptime + current_time);//在这个时间戳之后过期
    }
}

下面说下其缓存的过期机制，和redis不同，memcached不会记录一个即将过期的key列表，并且其过期机制属于“懒惰过期” ， key过期后不会立即删除，而是会在下次访问，或者下次有内存需要分配的时候，等一堆时机上会有过期回收的机会。整体可以分为如下几部分：

do_item_get 查询的时候，如果过期了，删除，返回空。
do_item_alloc 分配一个新的大小类似的数据的时候，看setting.oldest_live 和exptime 两部分来过期一部分item；
执行flush_all 命令的时候，do_item_flush_expired 清空所有数据的时候；
lru_crawler enable 进行后台爬虫扫描过期。

下面一个个介绍其实现原理。

一、查询时检查过期

这个最简单了，因为设置的时候记录了访问时间，所以查询的时候，只要判断一下时间就行了。如下代码：

item *do_item_get(const char *key, const size_t nkey, const uint32_t hv) {
    item *it = assoc_find(key, nkey, hv);
//---------
    if (it != NULL) {
        if (settings.oldest_live != 0 && settings.oldest_live <= current_time &&
            it->time <= settings.oldest_live) {
            //flush_all指令可以控制要不要刷掉所有时间在这之前的数据, 这算懒惰过期机制的一个
            do_item_unlink(it, hv);
            do_item_remove(it);
            it = NULL;
            if (was_found) {
                fprintf(stderr, " -nuked by flush");
            }
        } else if (it->exptime != 0 && it->exptime <= current_time) {
            do_item_unlink(it, hv);//进行懒惰过期清空
            do_item_remove(it);
            it = NULL;
            if (was_found) {
                fprintf(stderr, " -nuked by expire");
            }
        } else {
            it->it_flags |= ITEM_FETCHED;//搞定，找到了
            DEBUG_REFCNT(it, '+');
        }
    }
//----------
}

注意上面的代码中oldest_live相关的逻辑是跟flush_all指令有关系的，这个后面介绍。

二、申请新item的时候，顺便检查过期

也就是说，当memcached接到新的set类似命令的时候，需要申请一个新的item用来存储新数据，此时他会检查跟这个item大小类似的这个slabs组里面的过期数据。

我们知道memcached对于内存池的管理，是使用注明的slab内存池管理机制。所有的slab大小快都存放在static slabclass_t slabclass[MAX_NUMBER_OF_SLAB_CLASSES]; 的数组里面，具体的实现机制后面的文章再写。

当接到update命令时，process_update_command 函数被调用，进而调用item_alloc 申请一个新的item，

static void process_update_command(conn *c, token_t *tokens, const size_t ntokens, int comm, bool handle_cas) {
    //解析 "set aaa 0 0 5" 指令，申请内存用来后续存放数据部分的内容， 注意这里只是申请没存，没做其他操作
    //处理完后悔设置连接的状态为conn_nread， 因此继续读取数据到指定地方, 这样节省了内存的拷贝
//----
    it = item_alloc(key, nkey, flags, realtime(exptime), vlen);
//-----
}
item *item_alloc(char *key, size_t nkey, int flags, rel_time_t exptime, int nbytes) {
    item *it;
    /* do_item_alloc handles its own locks */
    it = do_item_alloc(key, nkey, flags, exptime, nbytes, 0);
    return it;
}

而do_item_alloc去分配新item的时候，为了效率，这里会有一些优化措施。它首先计算存进来的数据需要多大的内存空间，从而用slabs_clsid(ntotal) 函数计算应该放到哪个slab类里面；

然后顺便进行了最多5次扫描，从该类slab的尾部倒序扫描，看有没有要过期的数据，如果有，那就果断复用这个过期的item，从而既达到了懒惰清除过期key的效果，又搞笑的使用了已经申请的内存。看如下代码：

item *do_item_alloc(char *key, const size_t nkey, const int flags,
                    const rel_time_t exptime, const int nbytes,
                    const uint32_t cur_hv) {
    //申请一个item，方法有：
    //1. 从当前这个slabs_clsid里面扫描最旧的item，看有没有过期的；
    //2. 用slabs_alloc 申请一个新的；
    //3. 如果slabs_alloc 申请失败，且evict_to_free开关打开了(默认打开)，就强制删掉一个最旧的
    uint8_t nsuffix;
    item *it = NULL;
    char suffix[40];
    //算一下需要多长的内存，用来决定slab的桶. 大小为： sizeof(item) + nkey+1 + " %flags %nbytes-2\r\n" + nbytes
    //其实就是把命令转为字符串
    size_t ntotal = item_make_header(nkey + 1, flags, nbytes, suffix, &nsuffix);
    if (settings.use_cas) {
        ntotal += sizeof(uint64_t);
    }

    unsigned int id = slabs_clsid(ntotal);//根据数据大小，找一个合适的slab槽
    int tries = 5;
    search = tails[id];
    /* We walk up *only* for locked items. Never searching for expired.
     * Waste of CPU for almost all deployments */
    for (; tries > 0 && search != NULL; tries--, search=search->prev) {//从这个slab链表的尾部扫描，看有没有要过期的，正好用它
//-----------
        /* Expired or flushed */
        if ((search->exptime != 0 && search->exptime < current_time)
            || (search->time <= oldest_live && oldest_live <= current_time)) {//找到了一个已经过期的it了，准备用它
            itemstats[id].reclaimed++;
            if ((search->it_flags & ITEM_FETCHED) == 0) {
                itemstats[id].expired_unfetched++;
            }
            it = search;
            slabs_adjust_mem_requested(it->slabs_clsid, ITEM_ntotal(it), ntotal);//更新统计数据
            do_item_unlink_nolock(it, hv);//从hash和slab链表里面都删除这个it
            /* Initialize the item block: */
            it->slabs_clsid = 0;//临时清空一下，下面还是会设置为id的
        } else if ((it = slabs_alloc(ntotal, id)) == NULL) {
            //这里很巧妙，如果申请成功了，自然不会进去，如果失败了，就进去用当前这个item，清空他，然后使用
            tried_alloc = 1;
            if (settings.evict_to_free == 0) {
                itemstats[id].outofmemory++;
            } else {//干掉一个本slabs_clsid里面最旧的item
                itemstats[id].evicted++;
                itemstats[id].evicted_time = current_time - search->time;//记录我回收到了几秒之前访问的数据了，用来评估回收的程度
//------------------
}

三、执行flush_all 命令的时候，do_item_flush_expired 懒惰清空所有数据

flush_all 指令专门用来清空所有的数据，最开始这个指令没有参数，由于这个操作比较重，后来就增加了一个时间戳的整数参数，用来控制清空的即时性，或者说提高flush_all的响应时间。这个时间默认是current_time-1秒，会设置到settings.oldest_live 上面。处理函数是item_flush_expired 。

void item_flush_expired() {
    //处理flush_all指令，用懒惰机制清空所有数据
    mutex_lock(&cache_lock);
    do_item_flush_expired();
    mutex_unlock(&cache_lock);
}
/* expires items that are more recent than the oldest_live setting. */
void do_item_flush_expired(void) {
    //处理fflush_all 指令，功能是把oldest_live之后访问的数据立即删掉，
    //再老的数据不管，因为等到do_item_alloc需要分配新item时，会复用那些老于oldest_live 的数据的
    //这样就保证了速度了，所以如果想立即'清空'所有数据, 那就不要带个1或者很小的参数，这样都会调用到do_item_unlink_nolock
    //否则设置一个很大的时间戳或者不设置，这样就不会真正去do_item_unlink_nolock所有数据了，而是采用懒惰做法。
    //注意就算设置了1的参数，unlink了所有数据，实际上memcache不会真正清空内存给操作系统，知识放到自己的空闲item列表里面
    int i;
    item *iter, *next;
    if (settings.oldest_live == 0)
        return;
    for (i = 0; i < LARGEST_ID; i++) {
        /* The LRU is sorted in decreasing time order, and an item's timestamp
         * is never newer than its last access time, so we only need to walk
         * back until we hit an item older than the oldest_live time.
         * The oldest_live checking will auto-expire the remaining items.
         */
        for (iter = heads[i]; iter != NULL; iter = next) {
            /* iter->time of 0 are magic objects. */
            if (iter->time != 0 && iter->time >= settings.oldest_live) {
                next = iter->next;
                if ((iter->it_flags & ITEM_SLABBED) == 0) {
                    do_item_unlink_nolock(iter, hash(ITEM_key(iter), iter->nkey));
                }
            } else {
                /* We've hit the first old item. Continue to the next queue. */
                break;
            }
        }
    }
}

从上面的代码中可以看出，实际上do_item_flush_expired 并没有真的清空所有数据，相反，他只是清空过来最近的一部分数据，然后依靠其他时机去用标记的时间oldest_live清理数据。

上的代码是什么意思呢？实际上只清空了访问时间在settings.oldest_live 以后的数据，再老的数据依靠访问时，或者申请内存时进行清理，具体看注释。由此联想到上面第一，二点中，关于 settings.oldest_live 的检查逻辑了。

四、lru_crawler enable 进行后台爬虫扫描过期

有了上面的机制，基本能保证需要申请内存的时候，只要有过期的内存，我能申请到，那么，如果非要立即能清理过期的数据怎么办呢？答案是：lru_crawler指令。

lru_crawler enable 命令或者启动时指定参数，可以让memcached 启动后台crawler线程，也就是所谓的爬虫线程，运行函数是item_crawler_thread，但不会进行爬虫清理；lru_crawler crawl 命令可以让memcached 立即进行爬虫清理，这个命令可以设置我要清理的是哪个slab class；

来看一下crawl 参数触发的lru_crawler_crawl 函数：

enum crawler_result_type lru_crawler_crawl(char *slabs) {
    //对参数指定的slabs的类进行爬虫扫描，告诉后台线程item_crawler_thread 去清理过期数据
    char *b = NULL;
    uint32_t sid = 0;
    uint8_t tocrawl[POWER_LARGEST];//需要清理扫描的大小类
    //注意上面变了数组没有初始化，下面肯定有问题的。不过1.4.24版本已经修复这问题了.  memset(tocrawl, 0, sizeof(uint8_t) * MAX_NUMBER_OF_SLAB_CLASSES);
    if (pthread_mutex_trylock(&lru_crawler_lock) != 0) {
        return CRAWLER_RUNNING;
    }
    pthread_mutex_lock(&cache_lock);

    if (strcmp(slabs, "all") == 0) {
        for (sid = 0; sid < LARGEST_ID; sid++) {
            tocrawl[sid] = 1;
        }
    } else {
        for (char *p = strtok_r(slabs, ",", &b);
             p != NULL;
             p = strtok_r(NULL, ",", &b)) {

            if (!safe_strtoul(p, &sid) || sid < POWER_SMALLEST
                    || sid > POWER_LARGEST) {
                pthread_mutex_unlock(&cache_lock);
                pthread_mutex_unlock(&lru_crawler_lock);
                return CRAWLER_BADCLASS;
            }
            tocrawl[sid] = 1;//为什么不初始化？如果不是0怎么办
        }
    }

    for (sid = 0; sid < LARGEST_ID; sid++) {
        if (tocrawl[sid] != 0 && tails[sid] != NULL) {
            if (settings.verbose > 2)
                fprintf(stderr, "Kicking LRU crawler off for slab %d\n", sid);
            crawlers[sid].nbytes = 0;
            crawlers[sid].nkey = 0;
            crawlers[sid].it_flags = 1; /* For a crawler, this means enabled. */
            crawlers[sid].next = 0;
            crawlers[sid].prev = 0;
            crawlers[sid].time = 0;
            crawlers[sid].remaining = settings.lru_crawler_tocrawl;//这次要清理这么多
            crawlers[sid].slabs_clsid = sid;
            crawler_link_q((item *)&crawlers[sid]);
            crawler_count++;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    pthread_cond_signal(&lru_crawler_cond);//唤起item_crawler_thread
    STATS_LOCK();
    stats.lru_crawler_running = true;
    STATS_UNLOCK();
    pthread_mutex_unlock(&lru_crawler_lock);
    return CRAWLER_OK;
}

这里实际上使用信号量进行线程间通信的，用来告诉爬虫线程，你要爬去的数据在crawlers[sid]结构里面，因此爬虫线程便会按照参数去扫描是否有过期的item。执行函数是item_crawler_thread 。

static void *item_crawler_thread(void *arg) {
    //lru_crawler enable 命令，或者启动时指定crawler
    //start_item_crawler_thread创建线程，执行这里，进行后台定期清理过期key
    int i;

    pthread_mutex_lock(&lru_crawler_lock);
    if (settings.verbose > 2)
        fprintf(stderr, "Starting LRU crawler background thread\n");
    while (do_run_lru_crawler_thread) {
        pthread_cond_wait(&lru_crawler_cond, &lru_crawler_lock);//等待lru_crawler_crawl，stop_item_crawler_thread的指令，这里不会自动运行下去，得手动触发

        while (crawler_count) {//是tocrawl里面有效的slabs组个数，也就是crawler_link_q 的长度
            item *search = NULL;
            void *hold_lock = NULL;

            for (i = 0; i < LARGEST_ID; i++) {//轮着一个个来，知道轮完所有的,类似归并的感觉
                if (crawlers[i].it_flags != 1) {
                    continue;
                }
                pthread_mutex_lock(&cache_lock);
                search = crawler_crawl_q((item *)&crawlers[i]);//存进去的是一个跟item类似的crawler 结构体
                if (search == NULL ||
                        (crawlers[i].remaining && --crawlers[i].remaining < 1)) {
                    if (settings.verbose > 2)
                        fprintf(stderr, "Nothing left to crawl for %d\n", i);
                    crawlers[i].it_flags = 0;
                    crawler_count--;//终于轮完一个了
                    crawler_unlink_q((item *)&crawlers[i]);
                    pthread_mutex_unlock(&cache_lock);
                    continue;
                }
                uint32_t hv = hash(ITEM_key(search), search->nkey);
//----
                item_crawler_evaluate(search, hv, i);
//----
}

item_crawler_thread 函数用类似堆排序的轮流扫描的机制来均匀的清理不同class的数据，将其返回到可用的slabs item列表中以备复用内存。

真正的清理函数是item_crawler_evaluate，函数很简单，判断 oldest_live 和exptime 是否能匹配，如果能，那就调用do_item_unlink_nolock 解除item的关联关系，并且do_item_remove 返回内存给slab管理器。

static void item_crawler_evaluate(item *search, uint32_t hv, int i) {
    //item_crawler_thread定期清理过期的key
    rel_time_t oldest_live = settings.oldest_live;
    if ((search->exptime != 0 && search->exptime < current_time)
        || (search->time <= oldest_live && oldest_live <= current_time)) {
        itemstats[i].crawler_reclaimed++;

        if (settings.verbose > 1) {
            int ii;
            char *key = ITEM_key(search);
            fprintf(stderr, "LRU crawler found an expired item (flags: %d, slab: %d): ",
                search->it_flags, search->slabs_clsid);
            for (ii = 0; ii < search->nkey; ++ii) {
                fprintf(stderr, "%c", key[ii]);
            }
            fprintf(stderr, "\n");
        }
        if ((search->it_flags & ITEM_FETCHED) == 0) {
            itemstats[i].expired_unfetched++;
        }
        do_item_unlink_nolock(search, hv);//减除关联关系
        do_item_remove(search);//返回内存给slab管理器
        assert(search->slabs_clsid == 0);
    } else {
        refcount_decr(&search->refcount);
    }
}

由上可知，memcached的key过期机制不是LRU，而是LU而已，并且其懒惰过期机制性能高效，基本能做到按需过期，懒惰过期，并且提供flush_all和crawl 爬虫的指令让管理员可以手动强制的进行过期清理。

不过需要注意的是，memcached到此为止实际上清理的内存并没有返回给操作系统，而是由slabs_free 函数返回给了slab管理器的可用item列表中了，以备后续使用。后面再介绍下返回过去的内存，memcached是怎么管理的，如何回收和整理内存的。

分类: Memcached 标签:

2025 年 2 月 15 日《灰度决策》读书笔记 (0)

评论 (0) Trackbacks (0) 发表评论 Trackback

本文目前尚无任何评论.

本文目前尚无任何 trackbacks 和 pingbacks.

Redis源码学习-AOF数据持久化原理分析(1) memcached LRU过期机制lru_crawler指令可能会错误的清理不应该的slabs的bug

趁着年轻