LRU Cache

Problem

Source: LRU Cache

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and set.

  • get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.

  • set(key, value) - Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.

Idea

  • Use HashMap to cache key and value.
  • Use PriorityQueue to mark LRU item as the hottest. Drop coldest element when over capacity.

A Java Solution

class Item {
  int key;
  int value;
  Item prev;
  Item next;
  public Item(int key, int value) {
    this.key = key;
    this.value = value;
    this.prev = null;
    this.next = null;
  }
}
class LRUList {

  Item head = null, tail = null;
  int current_capacity = 0;

  public LRUList() {
    head = new Item(-1, -1);
    tail = new Item(-1, -1);
    head.next = tail;
    head.prev = null;
    tail.next = null;
    tail.prev = head;
  }

  public void update(Item item) { // SHIT: public update(Item item)
    Item orig_prev = item.prev;
    Item orig_next = item.next;
    orig_prev.next = orig_next;
    orig_next.prev = orig_prev;
    this.insert(item);
  }

  public void insert(Item item) {
    Item first = head.next;
    head.next = item;
    item.prev = head;
    first.prev = item;
    item.next = first;
  }

  public void delete(Item item) {
    Item orig_prev = item.prev;
    Item orig_next = item.next;
    orig_prev.next = orig_next;
    orig_next.prev = orig_prev;
    item = null;
  }

  public void deleteTail() {
    this.delete(tail.prev);
  }
}

public class LRUCache {
  Map<Integer, Item> map;
  LRUList list;
  int total;
  int capacity;

  public LRUCache(int capacity) {
    this.map = new HashMap<Integer, Item>(capacity);
    this.list = new LRUList();
    this.total = 0;
    this.capacity = capacity;
  }

  public int get(int key) {
    Item item = this.map.get(key);
    if (item == null) {
      return -1;
    }
    this.list.update(item);
    return item.value;
  }

  public void set(int key, int value) {
    Item item = this.map.get(key);
    if (item == null) {
      item = new Item(key, value);
      this.list.insert(item);
      this.map.put(key, item);
      this.total++;
      if (this.total > this.capacity) {
        Item last = this.list.tail.prev;
        this.list.delete(last);
        this.map.remove(last.key);
        this.total--;
      }
    } else {
      item.value = value;
      this.list.update(item);
    }
  }


}

Scaffold

It’s difficult to participate in Clojure’s abstraction for lacking of documentations. Anyway, a helper function from clojure programming is really useful for us to dive into it.

(defn scaffold
"Given an interface, returns a 'hollow' body suitable for use with `deftype`." [interface]
(doseq [[iface methods] (->> interface
.getMethods
(map #(vector (.getName (.getDeclaringClass %))
(symbol (.getName %))
(count (.getParameterTypes %)))) (group-by first))]
(println (str " " iface))
(doseq [[_ name argcount] methods]
(println
(str " "
(list name (into '[this] (take argcount (repeatedly gensym)))))))))

By looking at the output of the ancestors and methods of them, we can determine what to do next:

user=> (scaffold clojure.lang.IPersistentMap)
 clojure.lang.IPersistentMap
 (assoc [this G__809 G__810])
 (without [this G__811])
 (assocEx [this G__812 G__813])

 java.lang.Iterable
 (iterator [this])

 clojure.lang.Associative
 (entryAt [this G__814])
 (assoc [this G__815 G__816])
 (containsKey [this G__817])

 clojure.lang.IPersistentCollection
 (empty [this])
 (equiv [this G__818])
 (count [this])
 (cons [this G__819])

 clojure.lang.Seqable
 (seq [this])

 clojure.lang.ILookup
 (valAt [this G__820])
 (valAt [this G__821 G__822])

 clojure.lang.Counted
 (count [this])
nil

To implement a LRU Cache, we need to implement those methods. Some trival knowledge:

  • cons is the name that backs conj;
  • equiv is similar to equals, but ensures sane equivalence semantics when applied to numerics
  • There is two assoc definition, but only 1 is available. The one in clojure.lang.Associative will be replaced by clojure.lang.IPersistentMap. That means we only need to implement the one in clojure.lang.Associative.
  • Nothing ever uses assocEx anymore. It is a remnant of an earlier time. If you are writing your own map type, you can implement (assocEx [m k v] (throw (Exception.))).1

Solution

(defprotocol CacheProtocol
  "This is the protocol describing the basic cache capability."
  (lookup [cache e]
          [cache e not-found]
   "Retrieve the value associated with `e` if it exists, else `nil` in
   the 2-arg case.  Retrieve the value associated with `e` if it exists,
   else `not-found` in the 3-arg case.")
  (has?    [cache e]
   "Checks if the cache contains a value associated with `e`")
  (hit     [cache e]
   "Is meant to be called if the cache is determined to contain a value
   associated with `e`")
  (miss    [cache e ret]
   "Is meant to be called if the cache is determined to **not** contain a
   value associated with `e`")
  (evict  [cache e]
   "Removes an entry from the cache")
  (seed    [cache base]
   "Is used to signal that the cache should be created with a seed.
   The contract is that said cache should return an instance of its
   own type."))

(deftype LRUCache
  [cache lru limit]

  clojure.lang.ILookup
  (valAt
   [this key]
   (lookup this key))
  (valAt
   [this key not-found]
   (if (has? this key)
     (lookup this key)
     not-found))

  clojure.lang.IPersistentMap
  (assoc [this key value]
    (miss this key value))
  (without [this key]
    (evict this key))

  clojure.lang.Associative
  (containsKey
   [this key]
   (has? this key))
  (entryAt
   [this key]
   (when (has? this key)
     (clojure.lang.MapEntry. key (lookup this key))))

  clojure.lang.IPersistentCollection
  (cons
   [this elem]
   (seed this (conj cache elem)))
  (empty
   [this]
   (seed this (empty cache)))
  (equiv
   [this other]
   (= other cache))

  clojure.lang.Seqable
  (seq [this]
       (seq cache))

  CacheProtocol
  (lookup
   [this key]
   (get cache key))
  (lookup
   [this key not-found]
   (get cache key not-found))
  (has?
   [this key]
   (contains? cache key))
  (hit
   [this key]
   (LRUCache. cache
              (if (contains? cache key)
                (for [k lru :when (not= k key)] k)
                lru)
              limit))
  (miss
   [this key value]
   (if (>= (count lru) limit)
     (let [miss-key (if (some #{key} lru)
                      key
                      (last lru))]
       (println (-> cache (dissoc miss-key) (assoc key value)))
       (LRUCache. (-> cache (dissoc miss-key) (assoc key value))
                  (conj (for [k lru :when (not= k miss-key)]  k) key)
                  limit))
     (LRUCache. (assoc cache key value)
                (conj lru key)
                limit)))
  (evict
   [this key]
   (let [v (get cache key ::miss)]
     (if (= v ::miss)
       this
       (LRUCache. (dissoc cache key)
                  (for [k lru :when (not= k key)] k)
                  limit))))
  (seed
   [this key]
   (LRUCache. cache
              (clojure.lang.PersistentQueue/EMPTY)
              limit))

  Object
  (toString
   [this]
   (str cache \, \space lru \, \space limit)))

(defn lru-cache-factory
  [limit]
  (LRUCache. {} (clojure.lang.PersistentQueue/EMPTY) limit))

Improvements

  • LRU remove: O(N). A better solution is using PriorityDataStructure to reduce to O(1).
  • Use Memcached or Redis to replace clojure map and list for better scalable.
  • Why not rebuild a wheel? https://github.com/clojure/core.cache/wiki/LRU