What's the most popular car in the UK?

The Office of National Statistics is a brilliant source of interesting data sets. This page lists recently released datasets. Check regularly and you’ll soon notice these data sets feed into news articles by the Economist, BBC news and more.

One I recently thought to explore detailed the count of car makes and models in the UK. I wonder what’s the most popular car make and model in the UK?

library(tidyverse)
library(readxl)
library(stringr)

path <- "~/Learning/R/Datasets/vehicle-statistics-make-model.xlsx"
car_data <- read_excel(path = path, skip = 8, col_names = FALSE, sheet =  1)

Great, we’ve got our data. But what does it look like?

X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 X__10 X__11 X__12 X__13 X__14 X__15 X__16 X__17 X__18 X__19 X__20 X__21 X__22 X__23 X__24 X__25 X__26 X__27 X__28 X__29 X__30 X__31 X__32 X__33 X__34 X__35 X__36 X__37 X__38 X__39 X__40 X__41 X__42 X__43 X__44 X__45 X__46 X__47 X__48 X__49 X__50 X__51 X__52 X__53 X__54 X__55 X__56 X__57 X__58 X__59 X__60 X__61 X__62 X__63
ABARTH 124 SPIDER MULTIAIR AUTO 72 71 59 58 36 35 26 25 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
ABARTH 500 4456 4382 4447 4370 4490 4411 4525 4448 4568 4487 4550 4469 4580 4500 4574 4494 4548 4467 4462 4387 4421 4347 4263 4094 3977 3834 3782 3622 3535 3387 3296 3148 2982 2752 2630 2417 2293 2055 1909 1677 1358 1039 706 456 232 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
ABARTH 500 C 608 596 608 598 603 592 614 602 615 604 614 603 631 619 627 615 638 624 640 626 640 626 640 648 642 628 604 587 560 520 499 455 401 342 316 239 190 127 74 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
ABARTH 500 CUSTOM 367 362 363 358 374 370 382 378 378 373 380 376 380 376 373 369 329 326 196 196 16 16 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
ABARTH 500 CUSTOM S-A 112 110 112 110 114 112 113 111 111 109 89 87 57 55 26 26 18 18 15 15 2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
ABARTH 500 S-A 90 88 87 85 92 90 92 90 94 92 96 94 95 93 95 93 90 88 82 80 78 76 71 62 55 48 43 35 24 15 8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
oh.

oh no.

It’s not as we’d like it, but we can certainly handle it.

We’ll need to tidy it up a little before the real fun begins.

names <- read_excel(path = path, skip = 6, col_names = F, n_max = 1)
names[2] <- "Model"

names(car_data) <- names

tidy_cars <- car_data %>%
  gather(key = Period, value = Count, -Make, -Model) %>%
  extract(Period, c("Year", "Quarter", "Region"), "(\\d{4}) Q(1|2|3|4) (UK|GB)") %>%
  mutate(Year = as.integer(Year),
         Quarter = as.integer(Quarter),
         Make = str_trim(Make),
         Model = str_trim(Model),
         YQ = lubridate::yq(paste(Year,"/",Quarter))) %>%
  filter(stringr::str_length(Make) < 20,
         !is.na(Count))
Make Model Year Quarter Region Count YQ
ABARTH 124 SPIDER MULTIAIR AUTO 2017 2 UK 72 2017-04-01
ABARTH 500 2017 2 UK 4456 2017-04-01
ABARTH 500 C 2017 2 UK 608 2017-04-01
ABARTH 500 CUSTOM 2017 2 UK 367 2017-04-01
ABARTH 500 CUSTOM S-A 2017 2 UK 112 2017-04-01
ABARTH 500 S-A 2017 2 UK 90 2017-04-01

Much better!

Now that we have our tidy dataset, lets explore the latest counts in the UK, from Q2 2017.

tidy_cars %>% 
  filter(Year == 2017, Quarter == 2, Region == "UK") %>%
  count(Make, wt = Count) %>%
  arrange(desc(n)) %>%
  top_n(n, n = 30) %>%
  ggplot(aes(reorder(Make, n), n)) + 
  geom_bar(stat = "identity") + 
  labs(x = "Make", y = "Count", 
       title = "The 30 most popular car makes in the UK as of Q2 2017", 
       subtitle = "Ford, Vauxhal and Volkswagen are significantly more popular than other makes") +
  coord_flip() + scale_y_continuous(labels = scales::comma)