Before talking about the Least Recently Used (LRU) cache and the Least Frequently Used (LFU) cache, let us quickly review another standard linear structure: the linked list. Unlike an array, which needs a contiguous block of memory to store its data, a linked list uses scattered blocks of memory connected by pointers.

Allocating a large array places a heavy demand on the memory system: there may not be a contiguous region big enough, even though the total amount of free space is more than sufficient. A linked list handles this situation gracefully, since it stores data in scattered blocks and links them together.

Singly Linked List (SLL)

An SLL has the simplest form: each node holds a data field and a next pointer that stores the address of the following node. By convention, we call the first node of an SLL the head and the last node the tail. We can traverse the entire SLL starting from the head, and the next pointer of the tail points to nullptr (see the sketch after the list below).

  • insert or delete: O(1)
  • random access or find: O(n)
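
To make this concrete, here is a minimal sketch of an SLL node and a linear search over it in C++; the struct name Node and the helper find are illustrative, not taken from any library:

// A minimal singly linked list node: one data field, one next pointer.
struct Node {
    int data;
    Node* next;  // points to the following node; nullptr for the tail
};

// Finding a value requires walking from the head: O(n) in the worst case.
Node* find(Node* head, int target) {
    for (Node* cur = head; cur != nullptr; cur = cur->next)
        if (cur->data == target) return cur;
    return nullptr;
}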

Doubly Linked List (DLL)

The DLL is more commonly used in practice, although it requires extra memory. Each node carries two pointers, next and prev, which let us move forward and backward conveniently. By construction, we can find the previous node of any node in O(1) time with a DLL, compared with O(n) for an SLL.

  • insert or delete: O(1)
  • random access or find: O(n)

A closer look at insertion and deletion: not always O(1)

At first glance the time complexity of a DLL is the same as that of an SLL, so why bother with a DLL at all? The answer lies in the gap between theory and practice. Let us zoom in on deletion first; the analysis for insertion is similar.

In practice, deleting a node from a linked list falls into two cases:

  • deleting a node if it contains data whose value matches a target value
  • deleting a node given the pointer that points to the node

In the first case, both an SLL and a DLL take O(n) time to find the node and then O(1) time to delete it, so the total cost is O(n). In the second case, however, things differ. We already hold a pointer to the target node, but to unlink it we still need to know its previous node. For an SLL this takes O(n) lookup time, while for a DLL it takes only O(1), since every node carries a prev pointer.

By the same reasoning, if we aim to insert a node before some target node, a DLL costs only O(1) time while an SLL still costs O(n). This is why the DLL is widely used in practice despite its extra memory usage; the sketch below illustrates both operations.
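
Here is a minimal sketch in C++, assuming we already hold a pointer to the target node. DNode, erase_node, and insert_before are illustrative names, and for brevity both helpers assume the target is an interior node (neither head nor tail):

// A doubly linked list node: prev lets us reach the predecessor in O(1).
struct DNode {
    int data;
    DNode* prev;
    DNode* next;
};

// Unlink and delete a node given only a pointer to it: O(1). With an
// SLL we would first need an O(n) scan to find the predecessor.
void erase_node(DNode* node) {
    node->prev->next = node->next;  // bypass node in the forward direction
    node->next->prev = node->prev;  // bypass node in the backward direction
    delete node;
}

// Insert a new node holding value immediately before target: also O(1).
void insert_before(DNode* target, int value) {
    DNode* node = new DNode{value, target->prev, target};
    target->prev->next = node;
    target->prev = node;
}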

When to use an array or linked list?

Arrays and linked lists each have their advantages and disadvantages, and we should never decide between them based on complexity analysis alone; it depends on the specific situation of the project. If the project we are working on has limited memory, an array is usually more suitable, since a linked list spends extra memory on pointers. Moreover, frequent insertions and deletions in a linked list trigger a significant amount of memory allocation and deallocation, which can easily cause memory fragmentation.

Trading space for time: LRU/LFU cache

The above discussion of the DLL reflects one important design idea: sacrificing space to improve time efficiency. A cache uses the same approach: it stores recently used or frequently used data in memory (costing space) so that data queries become very fast and efficient (saving time). Keep in mind that in some scenarios where memory is limited, such as mobile app development, we may need to trade time for space instead.

We can implement a simple LRU cache by maintaining a DLL; in fact, we use std::list from the C++ STL here. We let the head node store the most recently accessed data, while the tail node holds the oldest. The cache supports the following two operations:

  • Insertion: every time a new data item comes in, we try to insert it at the beginning of the list and make it the new head
    • if the data item is already in the cache, we move it to the beginning of the list
    • else if there is free space in the cache, we directly insert a new node of the data item at the beginning of the list
    • else we delete the tail node first and then insert a new node of the data item at the beginning of the list
  • Lookup:
    • if the target data item is not in the cache, report a miss
    • else return the data value and move the corresponding node to the beginning of the list, since it is now the most recently used

Note that to bring the time complexity of both insertion and lookup down to O(1), we need a hash table that maps each key to the location of its node; this lets us find a node in constant time, whereas relying on the linked list alone would cost O(n) for both operations. Below is a simple implementation in C++; in fact, this is a classic LeetCode problem, see LRU Cache.

#include <list>
#include <unordered_map>
#include <utility>
using namespace std;

class LRUCache {
public:
    LRUCache(int cap) {
        capacity = cap;
    }

    // Return the value for key, or -1 on a miss. A hit moves the node
    // to the front of the list, i.e., the most recently used position.
    int get(int key) {
        if (!key_to_iter.count(key)) return -1;
        auto it = key_to_iter[key];
        key_value_list.push_front({it->first, it->second});
        key_value_list.erase(it);
        key_to_iter[key] = key_value_list.begin();
        return key_value_list.front().second;
    }

    // Insert or update a key. If the cache is full, evict the tail
    // node first, since it holds the least recently used data.
    void put(int key, int value) {
        if (key_to_iter.count(key)) {
            key_value_list.erase(key_to_iter[key]);
        } else if ((int)key_value_list.size() == capacity) {
            key_to_iter.erase(key_value_list.back().first);
            key_value_list.pop_back();
        }
        key_value_list.push_front({key, value});
        key_to_iter[key] = key_value_list.begin();
    }

private:
    int capacity;
    // Doubly linked list of {key, value} pairs, most recent at the front.
    list<pair<int, int>> key_value_list;
    // Maps each key to the iterator of its node for O(1) location.
    unordered_map<int, list<pair<int, int>>::iterator> key_to_iter;
};
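
As a quick sanity check, here is a small hypothetical driver (not part of the LeetCode template) exercising a cache of capacity 2; expected outputs are in the comments:

#include <iostream>

int main() {
    LRUCache cache(2);
    cache.put(1, 100);
    cache.put(2, 200);
    std::cout << cache.get(1) << "\n";  // 100; key 1 is now most recent
    cache.put(3, 300);                  // evicts key 2, the least recently used
    std::cout << cache.get(2) << "\n";  // -1: key 2 was evicted
    std::cout << cache.get(3) << "\n";  // 300
    return 0;
}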

The LFU cache is another popular cache replacement strategy. Instead of evicting the least recently used item, it keeps a record of how many times each data item has been queried and evicts the least frequently used one, which often makes more sense.

Naturally, the implementation of an LFU cache is a bit more complex than that of an LRU cache. This time there is no need to maintain one linked list over all data items, since the global access order no longer matters. However, the order does matter among items with the same frequency, because among those it is the least recently used one that gets evicted. Hence, we keep one list per frequency, storing the data items that have been queried that many times so far. The corresponding LeetCode problem is LFU Cache, and it further requires that both insertion and lookup run in O(1) time. No surprise, we again resort to hash tables and a more sophisticated design. Below is an implementation of the LFU cache in C++:

#include <list>
#include <unordered_map>
#include <utility>
using namespace std;

class LFUCache {
public:
    LFUCache(int cap) {
        capacity = cap;
        minFre = 0;
    }

    // Return the value for key, or -1 on a miss. A hit moves the key
    // from the list of frequency fre to the list of frequency fre + 1.
    int get(int key) {
        if (!key_to_value_and_fre.count(key)) return -1;
        int fre = key_to_value_and_fre[key].second;
        fre_to_key_list[fre].erase(key_to_iter[key]);
        fre_to_key_list[fre + 1].push_front(key);
        key_to_iter[key] = fre_to_key_list[fre + 1].begin();
        key_to_value_and_fre[key].second++;
        // If the minimum-frequency list just became empty, the moved key
        // (now at minFre + 1) defines the new minimum frequency.
        if (fre_to_key_list[minFre].empty()) minFre++;
        return key_to_value_and_fre[key].first;
    }

    // Insert or update a key. If the cache is full, evict the least
    // recently used key among those with the minimum frequency.
    void put(int key, int value) {
        if (capacity <= 0) return;
        if (get(key) != -1) {
            // Existing key: get() has already bumped its frequency.
            key_to_value_and_fre[key].first = value;
            return;
        } else if ((int)key_to_value_and_fre.size() == capacity) {
            int delete_key = fre_to_key_list[minFre].back();
            key_to_iter.erase(delete_key);
            key_to_value_and_fre.erase(delete_key);
            fre_to_key_list[minFre].pop_back();
        }
        key_to_value_and_fre[key] = {value, 1};
        fre_to_key_list[1].push_front(key);
        key_to_iter[key] = fre_to_key_list[1].begin();
        minFre = 1;  // a brand-new key always has the minimum frequency
    }

private:
    int capacity;
    int minFre;  // the smallest frequency currently present in the cache
    unordered_map<int, pair<int, int>> key_to_value_and_fre;
    unordered_map<int, list<int>> fre_to_key_list;
    unordered_map<int, list<int>::iterator> key_to_iter;
};

In the above code:

  • key_to_value_and_fre maps each key to a {value, frequency} pair
  • fre_to_key_list maps each frequency to the list of keys that have been queried that many times so far; the most recently queried key sits at the front of its list
  • key_to_iter maps each key to its iterator in the list fre_to_key_list[fre], where fre is the key's current frequency (this is what lets us locate a key inside a list in O(1) time)
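
As with the LRU cache, a small hypothetical driver can confirm the eviction behavior. With capacity 2, the key accessed twice survives while the one accessed only once is evicted:

#include <iostream>

int main() {
    LFUCache cache(2);
    cache.put(1, 100);
    cache.put(2, 200);
    std::cout << cache.get(1) << "\n";  // 100; key 1 now has frequency 2
    cache.put(3, 300);                  // evicts key 2: frequency 1, least recent
    std::cout << cache.get(2) << "\n";  // -1: key 2 was evicted
    std::cout << cache.get(3) << "\n";  // 300
    return 0;
}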

- End -