Samson's Machete

8 essential patterns you should know about functional programming in C++14

Posted by NalaGinrut 31 October 2019 9:14 AM

To C++ folks, if the Functional Programming is still an academic theory or confusing concept to you, then you're out. It's already used a lot in our daily product development today. I'd like to share you 8 essential patterns to help you grab this powerful weapon quickly in a practical way. It's not hard, and I hope you're familiar with C++14 features.

This article is not for mathematicians, or any deeper PLT (Programming Language Theory) researchers. It's for you, daily professional programmers.

"Hey, if it's yet another math-textbook-copying article to waste my time, I'll close it at once!"

I promise no. I'll show you standard C++ code, not formulas. Let's talk about functional programming like a real daily programmer. The aim is to help your daily programming, not for researching PLT. If some important math concepts are not so useful in real programming experiences, then I'll skip them in this article.

Please find you a C++14 compiler, I'd recommend GCC-6.0+ or Clang-7.0+.

No, you don't need any pre-knowledge of Haskell, or Lisp/Scheme.

The provided example code is not optimized for product, on the contrary, I intended wrote them without further optimizations for better understanding. Anyway, I hope you get the idea, not just copy to your work.

Before we start

I know there're many articles tried to explain some basics about functional programming. I'm not going to judge any of them in this article, however, if you find my explanation looks different from other articles, don't be surprised. ;-)

Unfortunately, we have to learn a little about Category Theory although you may dislike mathematics. Fortunately, I'm here to give you just one line explanation:

In practical daily programming, we mainly care about FinSet which is a subset of Category, a typical FinSet is an array or list.

OK, that's enough for a non-CS background programmer. You don't even need to read FinSet on Wikipedia (it's useless even you do).

Easy, right?

So why it's so important? Well, it's important because you can "learn once and do it everywhere", say, the ability of higher-level abstraction. Of course, you don't have to use it, however, people still wants to gain more expressiveness for productivity. That's why people always have been trying to study the higher-level abstraction.

(The Category is also used to abstract a database schema, which is out of the scope of this article.)

Closure

In functional programming, the most basic concept is closure.

"Wait, I've heard it should be lambda!"

Closure in functional programming has nothing to do with the closure in automata theory.

Closure can be thought as the technical implementation of the mathematical concept "lambda". Usually, it contains two essential parts, "environment" and "entry", if you're from the imperative programming world, I may replace them with "symbol table of the function" and "entry pointer of the function". Enough, you don't need more CS for closure here anymore.

I guess you've already learned how to use lambda in C++, it's the simplest concept in this article. If you don't, then I'll give you an example:

#include <functional>
using fn_t = std::function<int(int)>;
fn_t func(int x)
{
  return [=](int y){ return x + y; };
}

"Hey, I know it's curried function, and it could be done with std::bind!"

OK, OK, you know more. But it's just an example for showing closure.

Anyway, the point is that you can create/pass/return a first-class-function. That's important for a functional programming featured language.

Lazy

Lazy means delay to evaluate the result till it's necessary.

First we introduce thunk. A thunk is nullary function, say, a function without any parameter:

using thunk_t = std::function<int(void)>;

The return type is trivial, the point is "nullary".

Why the "nullary" matters? You have closure, so you can capture the values you need, parameters are unnecessary for you. You may realize that thunk may help you to unify the interface.

Why the thunk matters for lazy? In a brief, the thunk captures the value you need, but it never compute the function body until you call it. This feature helps you to implement lazy evaluation.

#include <functional>
#include <iostream>
int main()
{
  int x = 5;
  auto thunk = [x](){
                 std::cout << "now it run" std::endl;
                 return x + 10;
               };

  std::cout << "Thunk will not run before you run it" << std::endl;
  std::cout << thunk() << std::endl;
  return 0;
}

Lazy is sometimes useful. Years ago, I've ever written a message-passing framework in product. And I used lazy for performance. Usually, you create a message by encapsulating the computed result. However, some messages may require bigger computation, unfortunately, the receiver may drop the messages, then you wasted the computation. If you use lazy, the computation occurs when you called the thunk iff the message was not dropped.

Map (as high order function)

First, the explanation of Map procedure seems largely confusing by Haskell programming language, because Haskell has fmap which is mapping between Categories of Functors. However, it doesn't mean every kind of Map can be used to map between Functors. The general Map is only guaranteed to map between Categories by its own usage. And not all Categories are Functors.

I'm not here to blame Haskell, but you should aware of the non-trivial difference between Map and fmap, in case you lost yourself in my article.

Personally, I like the alternative name in C++, std::transform, which is easier for newbies to understand what it's used for. Obviously, it transforms an iterate-able data structure to another by given transform function. This is exactly what we need to construct a Functor class, I'll explain it later.

#include <algorithm>
#include <iostream>
#include <list>
int main()
{
  std::list<int> c = {1, 2, 3};
  std::transform (c.begin (), c.end (), c.begin (), 
                  [](int i){ return '0' + i; });
  for(char i : c) std::cout << i << std::endl;
  return 0;
}

Functors

"You said Array is a Category, but I've heard that an Array is a functor!"

Not exactly, ADT (abstract data type) could be a Functor under certain condition, but Array is usually referred to a data structure which is not a mapping at all. ADT is unnecessary to be a Functor, if you're really interested in this topic, please read <<When is an abstract data type a functor?>> from Pablo Nogueira.

In mathematics, a Functor is a special map from one Category to another. "Special" here means its mathematical-structure MUST be preserved after the mapping. This could be even easier, if two Categories are both FinSet, then the Functor between them is actually a Function. Remember what I said previously? Most containers you use in daily programming are FinSet.

So if you try to convert an array to another array by a certain function, you're actually constructing a Functor. If you can abstract you operation to be more general, you've done a Functor class.

There're many code examples to construct a Functor class if you Google it. However, some code examples are a bit outdated. Let me share you how to do it more elegantly with Lambdas.

Here's our Functor class:

// functor.hh
#include <algorithm>
#include <functional>
using namespace std;
template <class from, class to> class Functor
{
public:
  Functor (function<to (from)> op) : op_ (op){};
  ~Functor (){};
  template <class T> T operator()(T c)
  {
    transform (c.begin (), c.end (), c.begin (), op_);
    return c;
  };
private:
  function<to (from)> op_;
};

In theory, this Functor class should map any Category to another, depends on the given surjection between objects, say, the /op/ function here.

Then let's try to map a list:

// test1.cpp
#include <iostream>
#include <list>
#include <array>
#include "functor.hh"


using namespace std;


int main ()
{
  list<int> a = {1, 2, 3};

  // <int, int> implies "int -> int" type annotation
  auto f = Functor<int, int> ([](int x) { return 2 * x; });
  auto g = Functor<int, int> ([](int x) { return 10 * x; });
  auto z = Functor<int, int> ([](int x) { return x + 1; });
  // Function composition preseving
  auto result1 = g(f(a));
  auto result2 = f(g(a));
  cout << "Check: " << (result2==result1?"yes":"no") << endl;
  // We use Functor p to print out the final result
  auto p = Functor<int, int> ([](int x) {
                              cout << x << endl;
                              return x;
                             });
  p(result1);
  p(result2);
  return 0;
}

You may uncomment the array line to checkout the array as well. The Functor class is general enough for the infinite enumerable collections which are the so-called FinSet, as we mentioned previously in this article, a special subset of general Category. In fact, it should cover any enumerable data structure which provides iterator. I'll leave it to you to polish it for product-level development.

You may realize that result1 equals to result2, that's the so-called "function composition", say, after Functor mapping, the order of calling functions is independent.

You may realize that function g(z(a)) != z(g(a)), however, you may add a transformation to let them equivalent. In math, such kind of transformation is called natural transformation.

Exercise: Try a "Byte -> char" Functor to map an integer collection to chars, and filter out all the non-printing chars. HINT: first you may need to define a Byte class.

Monads

Many years ago, the most confusing, powerful concept in functional programming is Continuation. But nowadays, the hottest one is Monad.

Both of them are not easy to understand. Oh well, now that it's so hot...

First thing you need to know is that Monad in programming is different from Monad in math, although they're related. That's a good news, since you don't have to learn math to understand it.

Let me ask you a question:

using vp = std::shared_ptr<int>;
using box_t = struct box { vp val; };
using box_tp = std::shared_ptr<box_t>;

int sum(box_tp x, box_tp y, box_tp z)
{
  int xv = (x && x->val)? *x->val : 0;
  int yv = (y && y->val)? *y->val : 0;
  int zv = (z && z->val)? *z->val : 0;
  return xv + yv + zv;
}

Do you see the problem? Each time you define a function to use box_tp, you have to check the null pointer for every argument. Unless you want to let users check it. All these validations are tiny but non-trivial code snippet, and they're in a certain repeated form.

"It sounds like what Maybe Monad does in Haskell!"

Good, that's the point.

We expect such kind of Macros in C++:

/* Fake Code! */
#define SUM(v1, v2, ...) sum(check(v1), check(v2), ...)

So that it will expand recursively to add "check" to each argument. What an elegant language!

"I know Scheme macros can do that!"

Hmm...but your boss knows nothing about Scheme. Let's back to C++.

BTW, this kind of repeated but non-trivial code snippet is called *boilerplate code*. Monads can help to eliminate them elegantly. Monads were used to embed an object into an object with a richer structure. The critical point is that such kind of embedding will not change the original semantics, and this is guaranteed by mathematics. This will be used to verify you program.

You may wonder how to eliminate, it's very easy: wrap the boilerplate code into a Functor then bind it to the function you want to apply. Please remember that your function implementation doesn't require to do any validation anymore:

int sum2(int x, int y, int z)
{
  return x + y + z;
}
// box_tp -> int
auto check = [](box_tp b){ return ((p && p->val) ? *p->val : 0); };
auto safe_sum = BIND(check, sum2);

The purpose is to bind our preferred function sum2 to check function, so that each time when we call sum, the validation code was injected when applying the raw sum2 function, even if we pass a nullptr, it won't crash. Please notice that we wrote the validation code in check just once, no matter how many arguments need to validate.

"So what is BIND?"

Good question, hold my beer...

#include "functor.hh"
#include <functional>
#include <iostream>
#include <memory>
#include <vector>

using vp = std::shared_ptr<int>;
using box_t = struct box { vp val; };
using box_tp = std::shared_ptr<box_t>;
box_tp make_box (int x)
{
  box b{ std::make_shared<int> (x) };
  return std::make_shared<box_t> (b);
}


// An over simplified Monad, but it's enough to show the usage.
class Monad
{
public:
  Monad (){};
  ~Monad (){};
  /* I embedded the boilerplate nullptr validation code inside,
     so this Monad become a Maybe Monad.
     NOTE: Must be static for lambda capture.
   */
  static int Maybe(box_tp p){
    return ((p && p->val) ? *p->val : 0);
  }

  /*NOTE: The orginal >>= should pass Maybe as an argument but
          here we use Object-Oriented to use an static public
          method to define Maybe.
  */
  std::function<box_tp(box_tp, box_tp, box_tp)>
  operator>>= (std::function<int(int, int, int)> fn)
  {
    return [this, fn](box_tp x, box_tp y, box_tp z) {
             std::vector<box_tp> args = {x, y, z};
             // ugly, will be better in C++17 
             std::vector<int> safe_args = {0, 0, 0};
             std::transform (args.begin (), args.end (),
                             safe_args.begin (), Maybe);
             /* I know it's stupid to unwrap the safe_args here,
                but the more elegant std::apply appears in C++17.
                return std::apply(fn, args_in_tuple);
              */ 
             return make_box(fn (safe_args[0], safe_args[1],
                                 safe_args[2]));
           };
  };
};


int main ()
{
  auto m = Monad ();
  // The original function you defined
  auto sum = [](int x, int y, int z) { return x + y + z; };

  // Bind to the Monad for validation
  auto safe_sum = m >>= sum;

  auto a = make_box (1);
  auto b = make_box (2);
  auto c = make_box (3);

  /* We have to use Monad::Maybe to get the available result,
     because according to Monad laws, we have to make safe_sum
     return box_tp type either.
   */
  std::cout << "1 + 2 + 3 = "
            << Monad::Maybe(safe_sum (a, b, c))
            << std::endl;
  std::cout << "1 + 2 + nullptr = "
            << Monad::Maybe(safe_sum (a, b, nullptr))
            << std::endl;

  return 0;
}

Have you realized that I renamed bind to >>= which is the bind operator in Haskell? I guess you've seen it somewhere before.

Alright, I confess it's not so nice for general typing, but it's C++, there's only limited type-inference in C++14, and std::any is in C++17.

I have to tell you, the Monad in math is useless for daily programming, so you don't have to spend much time on its original math concept. (unless you love math just like me).

As I wrote in the comment, the nullptr validation code was inlined into Monad class, so it's actually a Maybe Monad. Although C++17 have added std::optional, I hope you can understand the basic principle of Monad: embedding boilerplate code by binding to a special pre-called function (Functor, actually).

This is not a complete Monad implementation, because it lacks of many mathematical attributes. All these attributes are critical for program verification. In a language like Haskell, all the attributes are implemented in the language. If your code can pass the compiling, then that means some code snippets were guaranteed to be correct by mathematics.

The Monad we talk in today's programming that is a refined and constrained concept from Category theory. Thanks Phil Wadler, his famous paper <<Comprehending Monads>> made this significant contribution. So that we can use Monad for enhancing our programming practice.

A Maybe Monad is the simplest case of Monad. I may write deeper article for Monad in the future. IMHO, the ability of Monad is very similar to Continuation, it can be used for parser, interpreter, control flow, exceptions handling, etc. If you're familiar with CPS (Continuation-Passing-Style), then you may realize that they're also the ability of Continuation. That is to say, both Monad and CPS can be used for IR (Intermediate Representation) in compiler development.

Exercise: It seems >>= can be implemented with Functor, and in Category theory, it should. Try to implement >>= with Functor described previously. You may need to consider to change the implementation of Functor a little.

Exercise: Actually, box_tp is a Monad too. From elegant perspective, it's better to define it properly so everything obey Monad laws. I spent too much time to write this article so that I don't want to write these code anymore. Do you want to challenge it?

Question: Do you think the current C++ (14 or higher) is good enough to do functional programming elegantly? Elegantly here means less redundant or confusing code.

To Mathematicians and Categorists:

1. "m >>= eta(x)" should be either "Monad::Maybe >>= [](box_tp x){ return x; }", the Monad laws are hold.

2. "m >>= sum(args)" is actually "m.bind(Monad::Maybe(x), f)", the Monad laws are hold.

3. We can composite Monads: "(m >>= sum(args)) >>= print(args)", this requires to improve the Monad class to be more complete. But I don't want to spend much time to write mathematical-elegant code in C++, so irritated...after all, C++ is an industrial engineering language.

Fold and Reduce

Wait, fold is not fold-expression which appears in C++17. And std::reduce appears in C++17. In this article, we only talk about C++14.

Fold and Reduce are very similar high-order-function, actually they're same function defined by different interfaces.

Fold is a little different from the Functor that we introduced, the Functor will give you another Category, but Fold usually give you a value. Of course, a Fold could be a Functor, depends on what you return finally.

In C++, it's called std::accumulate.

#include <iostream>
#include <numeric>
#include <vector>

int main()
{
  std::vector<int> v{1, 2, 3, 4, 5};
  int result = std::accumulate(v.begin(), v.end(), 0,
                              [](int x, int y){ return x + y; });
  std::cout << result << std::endl;
  return 0;
}

In this case, we compute 1+2+3+4+5. You may see the pattern, and use it in your cases.

Exercise: Modify the Functor class to implement a Fold-Functor.

Multple return values (MRV)

Originally, MRV is implemented for passing multiple values conveniently between continuations. Today, many languages are not continuation-based. So IMHO, MRV is only useful for compacted coding style. In Python, it's relative explicit:

def func():
 return 1, 2

a, b = func()

In C++, there's no syntactic MRV support. You have to use both std::tie and std::tuple:

#include <tie>
#include <iostream>
std::tuple<int, int> func()
{
  return {1, 2};
}

int main()
{
  int a, b;
  std::tie(a, b) = func();
  std::cout << "a: " << a << ", b: " << b << std::endl;
  return 0;
}

"Why can't I use struct to return multiple values?"

MVR just likes anonymous struct for returning multiple values, you don't have to define a specific struct for a function. And no explicit assignment for each value. As I said, compacted coding style.

Dynamic Wind (RAII Guard)

You may be familiar with std::lock_guard which is helpful to release the locks automatically, when you get out of certain scope. However, it's only locks. What if you want to do similar clean work for other data structure? Say, you want to restore your integer to a safe value in a similar context?

// guard.hh
#include <functional>
#include <exception>
#include <iostream>

using namespace std;
using thunk_t = function<void()>;
using GUARD =
  struct my_guard
  {
    my_guard (){};
    my_guard (thunk_t init, thunk_t clean) : clean_ (clean) 
    { init (); }
    ~my_guard () { clean_ (); };
    thunk clean_;
  };

Here's an example. Assume we have a function named work which accepts two int a and b, and it will change the values of the int variables. We need to make sure the values of a and b be kept back to a safe value when there's any exception happened. In this example, 0 is the safe value. The expected good result is that a and b should be 0 after calling work().

// test2.cc
#include "guard.hh"
using namespace std;

void work (int &a, int &b) noexcept
{
  auto init = [&]() { a = 1; b = 2; };
  auto clean = [&]() { a = 0; b = 0; };
  GUARD guard (init, clean);

  try
    {
      cout << "Start working, a = " << a << ", b = " << b << endl;

      /* Unfortunately, error happend!
        (of course, we trigger it intendedly, it's just an example)
       */
      throw runtime_error ("I was shot!");


      /* The rest code will never be excecuted because the
         exception happend before.
         So we have to rely on the guard.
      */
    }
  catch (const exception &e)
    {
      cout << "Error happened: " << e.what () << endl;
      /* Sometimes we need to return or throw to the upper level
         here, so that we don't have chance to keep value safe
         by unwinding the stack of catch in C++.
       */ 
      return;
    }

  cout << "The rest lines have no chance to be excecuted!" << endl;
}

int main ()
{
  int a = 0, b = 0;
  cout << "In the beginning, a = " << a << ", b = " << b << endl;
  work (a, b);
  cout << "work done!" << endl;
  cout << "b' is " << b << ", it's "
       << (b ? "dangerous" : "safe") << endl;
  return 0;
}

Conclusion

All these are very basic concepts in functional programming. Fortunately, the more complex things are constructed with these basic blocks. I've skipped many mathematical concepts to simplify this article, so that it may make the article not so precise and formal. But anyway, I hope you can understand these concepts from C++ programming perspective.

Thinking:

1. What's the benefit to use functional programming patterns rather than putting a bunch of redundant procedures together?

2. Is it necessary for a small program?

3. What about the small core but flexible and scalable system?

4. What about a huge serious long term maintainable system?

Enjoy!