New Actuaries Must Know About Machine Learning

Machine Learning (ML) is an inescapable topic — and it’s also caused a stir within the actuarial industry. Machine Learning can be defined simply as “the science (and art) of programming computers so they can learn from data”, courtesy of A. Géron in his 2019 book. It’s proved itself as an invaluable tool that companies with deluges of data can use to extract insights to enhance their products and services.

The story is the same for actuaries. Actuaries are occasionally described as being the first data scientists, thanks to their long history of working with data and modelling techniques. There’s clearly a significant overlap between these two fields of study — and that’s why an increasing number of actuaries are finding themselves applying Machine Learning tools in their day-to-day work. Whether it’s for the purposes of unlocking the potential from data that has traditionally been overlooked (eg extracting insights from text fields or images, rather than just cold, hard numeric data) or whether it’s to leverage the data they already have with more powerful techniques and algorithms, one thing is clear — Machine Learning is here to stay.

There’s just one problem

Although some Machine Learning algorithms have been working quietly in the background since the 1990s, AI has recently come to dominate the conversation. Business use cases abound, and AI and ML are built into all kinds of products and services (powering our phones, our favourite websites, and the industrial production processes that we depend on every day). ML also has applications in actuarial fields: analysing mortality experience to find new trends, pricing insurance products, forecasting financial data…

And yet the material required to be taught to actuaries-in-training remained static. The world was moving on, and this educational disconnect was growing. The actuarial syllabus looked like it was in danger of becoming anachronistic.

The solution

That is, until 2019. The syllabus was revamped — with fresh new material being introduced and old material of dwindling importance being retired. Lo and behold, one of the new additions to the Institute and Faculty of Actuaries’ (IFoA) Subject CS2 exam was Machine Learning — along with the closely related Data Analysis topic in Subject CM1. These changes were effected “to ensure that the curriculum is relevant, up-to-date and reflects the skills, knowledge and attributes required of actuaries in an ever-changing global business environment.” That sounds like a good enough rationale — but does the new material live up to this claim?

According to the new syllabus, Machine Learning now makes up 10% of the CS2 exam and covers five learning objectives, namely:

Branches of ML and the types of problems it addresses
High-level concepts concerning learning from data
Descriptions and examples of key techniques
Applying ML techniques to simple problems
Understanding the perspectives of other non-actuarial quantitative researchers (data scientists, statisticians…)

Sounds like quite a well-rounded introduction to the subject. Let’s take a look at what key themes are addressed in the core reading — the testable material for the CS2 exam.

Definitions and scope of ML

Quite reasonably, we start with some basic definitions and explanations as to what kind of problems ML is useful for. There’s not much point developing a fancy ML algorithm for a situation where a classical method will do — and equally, ML has nothing to offer if there are no genuine patterns to detect in the data.

After starting with some concrete examples (such as targeted advertising, forecasting elections and predicting defaults on loans), the material becomes more formal, describing the Machine Learning process as approximating a target function that maps a set of measurable variables onto an output. The material isn’t afraid to reveal the mathematical nature of algorithms and problems. This could be a welcome change to those who have only heard ML thrown around in imprecise terms (or have only been repeatedly bashed over the head with the various related buzzwords).
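
To make the target-function idea concrete, here is a minimal Python sketch. Everything specific in it is an assumption made for illustration rather than something taken from the core reading: the "true" relationship y = 1 + 2x, the noise level, and the names target and fit_line. In practice the target function is unknown and only the data are observed; the algorithm’s job is to recover a good approximation from those data.

```python
import random


def target(x):
    # Hypothetical "true" relationship; unknown to the modeller in practice.
    return 1.0 + 2.0 * x


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


rng = random.Random(0)  # fixed seed so the run is reproducible
xs = [i / 10 for i in range(50)]
ys = [target(x) + rng.gauss(0, 0.1) for x in xs]  # noisy observations

a, b = fit_line(xs, ys)
print(f"learned approximation: y = {a:.2f} + {b:.2f}x")
```

The fitted intercept and slope should land close to the assumed values of 1 and 2, which is all that "learning from data" means in this simple setting.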

Overview of key concepts

Next up comes a discussion of key concepts in the field. In particular:

Loss functions
Methods of evaluating models (accuracy, precision, recall, confusion matrix, and so on)
Parameters and hyperparameters
Train, validation, and test sets
Overfitting
Model regularisation

The concepts are described well — but it’s just that: a description. Machine Learning is one of those fields where you need to get your hands dirty to really get a good grasp of the concepts and why they’re important. It may be advisable to use some of the excellent ML online resources or books out there to see some end-to-end implementations of these concepts and really take your understanding to the next level.
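
As a small taste of getting your hands dirty, the evaluation metrics listed above can be computed from scratch in a few lines of Python. This is only a sketch: the labels are made-up data (1 might stand for "policy had a claim"), and the function names confusion_counts and metrics are my own choices rather than anything from the core reading.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, precision and recall derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Guard against division by zero when nothing is predicted positive
        # or there are no actual positives.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up example: actual outcomes versus a model's predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]

print(confusion_counts(y_true, y_pred))
print(metrics(y_true, y_pred))
```

Notice that the three metrics can tell different stories about the same predictions, which is exactly why the choice of metric matters.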

Branches of Machine Learning

The essential distinctions between the different types of ML algorithms are made here, covering supervised, unsupervised, semi-supervised and reinforcement learning techniques. This section also tackles regression versus classification and generative versus discriminative models, along with additional theory, examples and actuarial applications. Again, it’s a great survey of the topics but you’ll want to do some further research to solidify some of the concepts. Watch videos, read books and blogs — try and make the connections for yourself to get the most out of the material.

The Machine Learning Process

The general steps of an ML project are given a reasonable amount of real estate in the notes, and rightly so. Machine Learning isn’t just developing, training and evaluating models — there’s a lot more that goes into the successful deployment of an ML solution. Here we talk about:

Collecting data
Exploratory data analysis
Data preparation
Model training
Validation and testing
Improving model performance
The importance of documentation and reproducibility

Actuarial students will feel at home with the mathematics of the models they encounter in the course — but it’s vital to know that this is only one chapter in the ML story. If you want to go on to apply your knowledge to real-world business use cases, you need to have a solid grasp of the big picture.
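
One piece of that bigger picture is holding data back for validation and testing before any model is trained. The sketch below shows one simple way to do it in Python. The 60/20/20 proportions, the fixed seed and the function name train_validation_test_split are assumptions chosen for illustration, and the list of integers is just a stand-in for real policy records.

```python
import random


def train_validation_test_split(records, validation_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle a copy of the records and cut it into three disjoint sets."""
    shuffled = list(records)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * validation_frac)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


policies = list(range(100))  # stand-in for 100 policy records
train, validation, test = train_validation_test_split(policies)
print(len(train), len(validation), len(test))
```

Fixing the seed is a small nod to the documentation and reproducibility step: anyone rerunning the analysis gets the same split.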

Key algorithms

Next up, some key algorithms are discussed — notably:

Penalised generalised linear models
Naïve Bayes classification
Decision trees
K-means clustering

There’s a great deal of material on these, which is a good thing — it builds on some of the concepts introduced earlier. What’s more, it builds on some of the topics covered in other parts of the course and beyond. Those who are familiar with Regression, Generalised Linear Models and Principal Components Analysis from CS1 and Proportional Hazards Models from elsewhere in CS2 will be glad to see some of these concepts extended.
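
To give a flavour of how one of these algorithms works under the hood, here is a bare-bones sketch of K-means clustering in Python (the standard assign-then-update loop, often called Lloyd’s algorithm). It is not the core reading’s presentation: the function names, the made-up two-dimensional points and the hand-picked starting centroids are all assumptions for illustration. Real implementations usually choose starting centroids at random or with a smarter scheme.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid updates until the labels stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Made-up data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

No labels are supplied anywhere, which is what makes this an unsupervised technique: the grouping comes purely from the distances between points.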

Perspectives of other quantitative researchers

Finally, we discuss how the perspectives of an ML practitioner differ from perhaps more traditional researchers and modellers, such as statisticians. This section talks about the difficulties of communication between groups — particularly when different terms may be used for what are essentially the same concepts, or when the two groups may be interested in different aspects of the analysis. Machine Learning is absolutely a cross-disciplinary field of study, so practitioners will be frequently communicating with professionals from other backgrounds. “Zooming out” to discuss some of the more general, but equally important, issues that arise when developing and applying ML solutions in the real world nicely rounds off the material, in that we emerge from the theoretical depths to end on a pragmatic note.

So what’s actually made it onto the exam?

Any student (who’s brave enough to admit it) will know that more effort tends to be put towards material that is actually tested. Strictly, everything in the reading is examinable, but this is what has come up so far in the two sittings since the new material was added.

April 2019 — short question on supervised versus unsupervised learning and examples of each. Longer question on the calculation and interpretation of model performance metrics.

September 2019 — one longer question discussing the train-validation-test approach. Assessment of whether Machine Learning is appropriate in a given example scenario. A discussion of the pros and cons of a more complex model with a higher number of parameters.

Where’s the code?

It may seem like a missed opportunity not to engage students in some more practical, end-to-end ML practice projects in Python or R, especially now that data analysis and actuarial statistics with R make up part of not just one but two IFoA exams. Even the Institute’s brand-new Certificate in Data Science doesn’t require the writing of any code to complete. On the one hand this may seem to be completely bonkers. After all, the best way to learn Data Science or Machine Learning is arguably to immerse yourself in it and start playing around with a dataset that interests you.

However, it’s important to note that the IFoA is not trying to create data scientists. Rather, it is aiming to “help actuaries, at whatever stage of their career to gain a basic understanding of data science tools and techniques, and how they can be applied, through examples and case studies to actuarial practice” (quoted from the FAQs about the Certificate in Data Science). Essentially, you don’t have to become an expert in Machine Learning, but you do need to be aware of what it is and how it is used as it becomes increasingly common in industry and society at large.

The outlook

The actuarial role is evolving. IFoA President John Taylor has presided over a rapid Data Science expansion within the profession in the last year. It’s clear that this is just the first step — updating an entire profession is certainly no easy task, but it’s undeniably the right thing to do. The formalisation of Data Science and Machine Learning techniques in the IFoA’s student curriculum is a move that acts to consolidate actuaries’ positions as professionals that can add real value for clients — and one that will only prove more and more useful as actuaries continue to work in industries further afield.

More info and credits

Andrew Hetherington is an actuary and data enthusiast working in London, UK. All views are Andrew’s own and not those of his employer.

Photos by Štefan Štefančík, CoWomen and Headway.