This was exactly the situation that our Data Science Team faced during the Covid-19 crisis in March 2020. Find out how we created an ML quiz engine with no data or questions.

To find out how we did, try it for yourself!



In the first week of the UK’s lockdown, a co-worker shared this article from the Washington Post. I was inspired by the great simulations of how diseases spread and the demonstrations of how some social distancing measures can help to reduce the spread.

After reading the article, I decided to recreate the simulations and add far more detail to them. In this article I will discuss some particular scenarios of interest. …



A beautiful data visualisation speaks a thousand words. Inspired by a talk I saw by David Spiegelhalter last year, I’ve created an easy-to-use Python package to create visually stunning dot plots.

Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here.

Visualising Covid-19 Symptoms

Due to the global pandemic currently affecting every part of the world I thought…


I’ve previously discussed how term-frequency inverse document-frequency (tf-idf) can be used to calculate content association metrics between individual pieces of BBC content. By using our customer browsing data we can study the overlap in users between each pair of content items and deduce how similar they are. One challenge with such an approach is that popular content items will share a user-base with many other content items. Using tf-idf weights mitigates this problem by penalising the weights associated with popular content.

With the added application of Louvain clustering we can identify collections of different content enjoyed by the same people…
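The popularity penalty described above can be illustrated with a minimal sketch. This is not the scoring used in the original post: the function name `content_associations`, the toy `views` data and the exact weighting (share of one item's audience that also viewed another item, multiplied by the log inverse popularity of that other item) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def content_associations(views):
    """Score each ordered pair of content items (a, b).

    views: dict mapping a user id to the set of content items they viewed.
    The weighting is an assumed tf-idf analogue, not the author's formula:
      tf  = share of a's audience that also viewed b
      idf = log(total users / users who viewed b), which shrinks for popular b
    """
    n_users = len(views)

    # Invert the mapping: content item -> set of users who viewed it.
    audience = defaultdict(set)
    for user, items in views.items():
        for item in items:
            audience[item].add(user)

    scores = {}
    for a, users_a in audience.items():
        for b, users_b in audience.items():
            if a == b:
                continue
            tf = len(users_a & users_b) / len(users_a)
            idf = math.log(n_users / len(users_b))
            scores[(a, b)] = tf * idf
    return scores


# Hypothetical browsing data: everyone viewed "popular",
# while only two users viewed both niche items.
views = {
    "u1": {"popular", "niche_a", "niche_b"},
    "u2": {"popular", "niche_a", "niche_b"},
    "u3": {"popular"},
    "u4": {"popular"},
}
scores = content_associations(views)
print(scores[("niche_a", "popular")])
print(scores[("niche_a", "niche_b")])
```

In this toy data every viewer of `niche_a` also viewed `popular`, but because all four users viewed `popular` its inverse-popularity weight is log(1) = 0, so that pairing scores zero while the pairing with `niche_b` does not.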


The BBC produces fantastic content that appeals to a mass audience, such as Killing Eve and Bodyguard on BBC iPlayer, the Danger Mouse games for CBBC, and Match of the Day on BBC Sport, to name a few. While it’s great that we can produce content that is enjoyed by so many different people, it does create data science challenges around understanding the personalities and preferences of individuals. Episode 1 of Killing Eve, for example, attracted a whopping 26% of the TV audience during first transmission.

Term frequency-inverse document frequency (tf-idf) is a metric most associated with text analysis…


BANs, or big ass numbers, are a visualisation technique that is frequently overlooked in favour of unnecessarily complex graphs. While some stakeholders might need to know sales figures going back 18 months and need lots of stats about how last week performed, for others a top-level stat that can identify good or bad weeks will suffice and have more impact. In this post I’m going to share some nice BAN-based doughnut plot code that I’ve developed in Python to do just that, and I’ll apply it to 2 common datasets.

Air Passengers Dataset

Let’s start with the air passengers dataset. This set contains…


So you’ve done some clustering using k-means; you’ve scaled your data, applied PCA, checked the clusters using the elbow and the silhouette method and you’re pretty confident that your model is giving you the absolute best clustering you can get. But just how much trust can you place in the classification of individual data points? That’s what I’ll be discussing here.

Most metrics for assessing a clustering algorithm are measures of the global success of your model. But what if you’re not looking to classify every data point but are looking to identify specific data points that definitely fit into…
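To make the contrast between a global metric and a per-point one concrete, here is a minimal sketch that computes a silhouette coefficient for each individual data point rather than averaging over the whole dataset. The excerpt above is cut off before it names the approach the post takes, so the per-point silhouette is a stand-in chosen for illustration; the function name `silhouette_per_point` and the toy points and labels are hypothetical.

```python
import math


def silhouette_per_point(points, labels):
    """Return one silhouette coefficient per point (needs at least 2 clusters).

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    score = (b - a) / max(a, b). Scores near 1 suggest a confident assignment;
    scores near 0 suggest the point sits between clusters.
    """
    # Group point indices by cluster label.
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)

    scores = []
    for i, p in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            # By convention a singleton cluster gets a score of 0.
            scores.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return scores


# Hypothetical data: two tight pairs plus one point halfway between them.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0.5)]
labels = [0, 0, 1, 1, 0]
scores = silhouette_per_point(points, labels)
least_confident = min(range(len(points)), key=lambda i: scores[i])
print(scores, least_confident)
```

The midpoint is equidistant from both pairs, so it gets the lowest score of the five points even though the average over all points might still look healthy.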


Most of us data scientists go into the industry because we love data (whatever that means? No, I don’t know either!). The ability to create easily readable plots is often an afterthought. Most job descriptions will mention that being able to visualise data is important but I have never had a sensible conversation with anyone either at interview or on the job about best practices around visualisation. Given how many bad plots are out there it is most definitely not because we’re all experts at it!

There’s no point in spending 2 months doing some analysis to then have all…


Love Island 2018 was all about the “geek” – Wes was an engineer, Megan just wanted people to see her geekiness… it was everywhere!

Now, 2019 is all about the science and, I’ve got to be honest, I’m loving it! It’s providing me with endless hours of entertainment. But do our 2019 Love Island contestants know their Newton’s law from their Jude Law or are their heads just going round in circles trying to work out what a 560 degree head spin looks like?

In this series I’m going to be coupling up the contestants’ attempts at science…

Matt Crooks

Data Scientist at the BBC
