Digital Magpie

Ooh, ooh, look - shiny things!

Squeeze!

One of the neat features of Clojure is the sequence abstraction — it makes solving a whole host of data processing tasks much easier, simply get you data into a sequence and you’ve got a huge toolbox available to work on it. Of course being a guy I’m firmly of the belief that more tools are better, with that in mind let’s add another one to our toolbox. Given a sequence the squeeze function returns another sequence with any adjacent items which match a supplied predicate merged together using a supplied function. It’s probably easier to illustrate by example, suppose I have a sequence of strings and I want to merge them together when the trailing string starts with whitespace, I can squeeze them like this:

1
2
3
(squeeze #(and %2 (re-matches #"\A\s.*" %2))
         #(apply str (apply concat %&))
         ["hello" " world." "foo" " bar"])

Another example, given a sequence of characters (read from an InputStream for example), I could group them into words by squeezing then thusly (the first line is just to remind you that calling seq on a string produces a sequence of characters):

1
2
3
4
5
6
7
8
user=> (seq "Cheers, chars!")
(\C \h \e \e \r \s \, \space \c \h \a \r \s \!)

user=> (map str/trim
         (squeeze #(and %2 (not= \space %2))
                  #(apply str (apply concat %&))
                   (seq "I'm sorry Dave, I can't let you do that.")))
("I'm" "sorry" "Dave," "I" "can't" "let" "you" "do" "that.")

So how does it work? Well, here’s the interface:

1
2
3
(defn squeeze
  [pred merge-fn coll]
  (squeeze- pred merge-fn coll nil))

And here’s the actual function that does the work, it’s declared private because I don’t want to expose the matched parameter to the outside world.

1
2
3
4
5
6
7
8
9
10
11
(defn- squeeze-
  ([pred merge-fn coll matched]
    (lazy-seq
      (when-let [s (seq coll)]
        (let [f (first s)
              s (second s)
              rest (rest coll)]
          (if (pred f s)
            (squeeze- pred merge-fn rest (cons f matched))
            (let [next (if matched (merge-fn (cons f (reverse matched))) f)]
              (cons next (squeeze- pred merge-fn rest nil)))))))))

I should probably point out that all of this playing around with sequences was inspired by Sean Devlin’s excellent proposal for some new sequence functions for Clojure 1.2. The full code for this is available here (it’s just the above, but with an added doc comment on the squeeze function definition).