For busy readers, here’s a potential approach to choosing a dental practice:

- Look at Care Quality Commission reports
- Look at practice-level survey information, but note that for many practices the sample size is small and the results vary widely
- Consider the proportion of different treatment courses carried out. More Band 1 (non-urgent) treatments seem to be good
- Look at the proportion of adults at the practice and the 3-month revisit rate. See below for more details but in contrast to the NHS’s view, a high revisit rate seems to be a good thing.

You may find the combined measure described below useful for bringing some of these measures together.

There’s a huge range in sizes of practices. We want to focus on practices that primarily offer NHS dentistry rather than just as an add-on to private dentistry. There’s no clear way to do this in the data. Arbitrarily we’ll filter to practices that have more than 2000 NHS dental courses in a year; that suggests they’re doing about 8 or more NHS dental courses in a day.

NHS Dental Services surveys a small number of patients per practice. They get asked “How satisfied are you with the NHS dentistry you received?” and get to answer “Completely satisfied”, “Fairly satisfied”, “Fairly dissatisfied” or “Very dissatisfied”. Most answers are “Completely satisfied”, at 79% of responses. Our aim will be to use these responses to understand the properties of a practice that lead to these “Completely satisfied” responses. In other words, what leads to “happy” patients?

A challenge to overcome is that there are very few surveys per practice. A typical practice has 7 patients surveyed (median) and on average there are 14 patients surveyed (mean). Given these small samples we cannot score a practice by its proportion of positive responses, as such an estimate would be subject to significant noise.
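To get a feel for how noisy such a score would be, here is a minimal Python sketch (not part of the original analysis). It assumes each surveyed patient is an independent draw who answers “Completely satisfied” with the overall 79% rate, and uses the standard binomial formula; the function name is just illustrative.

```python
import math

def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion, assuming n independent responses."""
    return math.sqrt(p * (1.0 - p) / n)

overall_rate = 0.79  # share of "Completely satisfied" responses quoted in the post

# 7 is the median and 14 the mean number of surveys per practice; 100 is for contrast.
for n in (7, 14, 100):
    se = proportion_standard_error(overall_rate, n)
    print(f"n={n:3d}: standard error of the observed proportion = {se:.3f}")
```

Under these assumptions a practice with 7 responses has a standard error of roughly 15 percentage points, which is large compared with the differences between practices we would like to detect.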

So we turn to all the other data about a practice released by NHS Dental Services about delivery of dental courses (FP17s). We’re looking at practices with more than 2000 FP17s, so such measures are likely to be reasonably accurate about the underlying proportions. The question is then whether we can use these measures to model patient happiness at a practice.

As a starting point for quality of dental treatment NHS Dental Services report: “The number of FP17s involving an adult for the same patient identity (surname, initial, gender and date of birth) where the previous course of treatment for that patient identity at the same contract ended 3 months or less prior to the most recent course of treatment. … In general, a patient who has completed a course of treatment that renders him or her “dentally fit” should not need to see a dentist again within the next three months. A high rate would indicate that further treatment has been provided outside the recall interval but could include urgent treatment etc.”

We can create a logistic regression model to see how this factor (and the equivalent for children) relates to happiness. As we might expect, this produces a model that tells us that an increasing rate of prompt revisits is associated with lower happiness.
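For readers who want to see what such a fit involves, here is a rough single-predictor sketch in Python (the post itself used R). The counts are invented purely for illustration, so the negative slope comes from the toy data rather than from the NHS figures, and `fit_logistic` is a hypothetical helper, not a library function.

```python
import math

def fit_logistic(xs, successes, trials, iterations=25):
    """Fit logit(p) = b0 + b1 * x to grouped binomial counts by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # Fisher information (negative Hessian)
        for x, k, n in zip(xs, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)
            g0 += k - n * p
            g1 += (k - n * p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented toy data: one row per practice.
revisit_rate = [0.05, 0.10, 0.15, 0.20, 0.25]   # proportion of prompt adult revisits
surveyed = [20, 20, 20, 20, 20]                 # patients surveyed
satisfied = [18, 17, 16, 15, 13]                # "Completely satisfied" answers

b0, b1 = fit_logistic(revisit_rate, satisfied, surveyed)
print(f"intercept={b0:.2f}, slope={b1:.2f}")
```

A negative slope means the modelled probability of a “Completely satisfied” answer falls as the revisit rate rises. In practice you would use an established routine (for example `glm` with a binomial family in R) rather than hand-rolled Newton steps.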

They also publish many other metrics. Of particular interest to me is the proportion of “band 1” minor treatments. “A Band 1 course of treatment covers an examination, diagnosis (including X-rays), advice on how to prevent future problems, a scale and polish if needed, and application of fluoride varnish or fissure sealant.” It seems likely that people will be happier if their dentist can restrict themselves to minor work. A logistic regression model shows this to be the case: an increasing proportion of band 1 treatments goes with increasing happiness.

What’s interesting is that if we create a model with both the proportion of band 1 treatments and the proportion of prompt revisits, then prompt revisits become a positive effect: more revisits go with happier patients. So perhaps people like the ability to make regular checkups.

So there are interesting interactions going on between variables. There are also some extreme values that might make predictions from a linear model suspect. To produce our final model we turn to regression trees (formally gradient boosted machines) using the gbm package in R. We include the following variables based on courses of treatment:

- Proportion Band 1 treatment
- Proportion Band 2 treatment
- Proportion Band 3 treatment
- Proportion Band 1 urgent treatment
- Proportion Prescription
- Proportion Arrest of Bleeding
- Proportion Adult
- Proportion adult revisits within 3 months
- Proportion child revisits within 3 months

We start by showing partial dependence plots for the variables, ordered by relative influence (left-to-right, top-to-bottom):

The central range of 95% of the data is shown by red lines. Some of the variables are approximately linear in this range but many non-linear sections can be seen.

In terms of happiness there are negative effects with treatment bands 2 and 3 and positive effects of treatment band 1. Adult and child revisits within 3 months are positive factors (the same potential surprise as we saw above). Note there’s interesting behaviour with the Proportion of Adults – it seems that 86% adults is ideal.

We can calculate model predictions for any particular practice by creating bootstrap models and then taking out-of-bag samples.
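The idea can be sketched as follows in Python. This is only an illustration of bootstrap out-of-bag prediction: a one-variable least-squares line stands in for the gradient boosted model, the data points are invented, and all names are hypothetical.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = a + b * x on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope

def out_of_bag_predictions(points, n_boot=200, seed=1):
    """For each observation, average the predictions of models that never saw it."""
    rng = random.Random(seed)
    n = len(points)
    collected = [[] for _ in range(n)]
    for _ in range(n_boot):
        drawn = [rng.randrange(n) for _ in range(n)]    # resample with replacement
        intercept, slope = fit_line([points[i] for i in drawn])
        in_bag = set(drawn)
        for i in range(n):
            if i not in in_bag:                         # out-of-bag for this model
                collected[i].append(intercept + slope * points[i][0])
    return [sum(p) / len(p) if p else None for p in collected]

# Invented toy data: (proportion of band 1 treatments, observed happiness)
toy_points = [(0.40, 0.72), (0.45, 0.75), (0.50, 0.74), (0.55, 0.78), (0.60, 0.79),
              (0.65, 0.80), (0.70, 0.83), (0.75, 0.82), (0.80, 0.84), (0.85, 0.86)]

for (x, y), pred in zip(toy_points, out_of_bag_predictions(toy_points)):
    print(f"x={x:.2f} observed={y:.2f} out-of-bag prediction={pred:.3f}")
```

Because each practice is only scored by models fitted without it, the prediction is not flattered by the practice’s own (noisy) survey responses. Keeping the individual out-of-bag predictions instead of averaging them would also give the raw material for percentile intervals.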

To see the quality of fit, we divide the practices into deciles by predictions from the model. We then report the mean of surveyed happiness in each decile, firstly weighted equally by practice (Prac) and secondly weighted equally by response (Resp). We’re generally hitting the right prediction range for the latter case but are less accurate for the former. This suggests there may be some unexplained variability in practices with a small number of responses (these tend to be smaller practices).

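| Decile | Predicted range | Prac | Resp |
|---|---|---|---|
| 1 | [0.572,0.740) | 0.709 | 0.707 |
| 2 | [0.740,0.765) | 0.718 | 0.745 |
| 3 | [0.765,0.778) | 0.775 | 0.774 |
| 4 | [0.778,0.787) | 0.764 | 0.782 |
| 5 | [0.787,0.795) | 0.789 | 0.792 |
| 6 | [0.795,0.803) | 0.801 | 0.793 |
| 7 | [0.803,0.810) | 0.798 | 0.803 |
| 8 | [0.810,0.817) | 0.825 | 0.817 |
| 9 | [0.817,0.825) | 0.826 | 0.822 |
| 10 | [0.825,0.859] | 0.843 | 0.841 |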
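The difference between the Prac and Resp columns is easy to miss, so here is a small Python sketch of the two weightings. The input format and the function name are assumptions, the example data are invented, and the groups are formed by rank (equal numbers of practices per group), which may differ from how the original cut points were chosen.

```python
def calibration_by_group(practices, n_groups=10):
    """practices: list of (predicted, satisfied_count, surveyed_count) per practice.

    Returns one row per group: (group, lowest prediction, highest prediction,
    mean weighted equally by practice, mean weighted equally by response).
    """
    ranked = sorted(practices, key=lambda p: p[0])
    rows = []
    for g in range(n_groups):
        lo = g * len(ranked) // n_groups
        hi = (g + 1) * len(ranked) // n_groups
        group = ranked[lo:hi]
        if not group:
            continue
        # Each practice counts once, however few patients it had surveyed.
        by_practice = sum(k / n for _, k, n in group) / len(group)
        # Each response counts once, so heavily surveyed practices dominate.
        by_response = sum(k for _, k, _ in group) / sum(n for _, _, n in group)
        rows.append((g + 1, group[0][0], group[-1][0], by_practice, by_response))
    return rows

# Invented example with four practices split into two groups.
example = [(0.70, 3, 5), (0.75, 30, 40), (0.80, 4, 4), (0.85, 16, 20)]
for row in calibration_by_group(example, n_groups=2):
    print(row)
```

In the first group the practice with only 5 responses pulls the by-practice mean down to 0.675, while the by-response mean is 33/45, about 0.733. That is the kind of gap small practices can create.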

A quick glance at the data geographically (not shown) suggests that practices with low-end predictions are concentrated towards cities.

Despite knowing there’s unexplained variability in the data I found the model predictions useful to help me choose a practice. In case you do too, the 95% bootstrap confidence intervals from our model by practice are here.

Q: What else can we do with the data? Are there any other data sources so we can factor out patient demographics in some way? Are there some measures for patient healthiness we can use rather than happiness?

*Disclaimer: This post only addresses correlations, not causal relationships. The properties observed could be due to the demographics of the patients rather than anything to do with the practice. There is unexplained variation in the data that this model doesn’t capture. Happiness is not the same as healthiness. Do look at other sources of information, in particular the Care Quality Commission reports on inspections.*

*Data licence: “NHSBSA DS Data Warehouse, NHSBSA Copyright 2014” This information is licenced under the terms of the Open Government Licence: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3*

]]>

The first plot I made of the data was the number of rainy 15-minute periods during May (shown with and without a base map):

There are many interesting features here. Firstly a few probably true observations about rain:

- You can clearly distinguish land and sea – there is more rain over land.
- You can see hilly regions with yet more rain.

However there are many other features that I guess aren’t true observations about rain:

- You can see the two shipping lanes in the English Channel with the separation zone between them.
- You can see the individual radar sites don’t give uniform coverage in all directions.
- There are other linear features whose origin I’m not sure of – any ideas?

All these features suggest that I can’t treat this data as perfect ground truth for nowcasting.

]]>

We model participants as meeting randomly for one-on-one fights to the death. This isn’t a perfect model but hopefully it’s a useful one. The random meeting assumption seems OK as it is hard for an individual to find a particular individual in the Hunger Games arena. We imagine that group activity can be broken down into steps of one person administering the coup de grâce to one victim. This makes the Hunger Games look like a random binary tree with 24 leaf nodes.

We consider two ways of determining who wins a one-on-one fight to the death: “random” and “deterministic”. In “random” we flip a coin to determine the victor – this randomness seems realistic and could include the risk of death by infection or any of the other uncertainties that are in the arena. “Deterministic” is the opposite extreme and we know in advance a strength order for individuals which determines the winner of any fight; this model may more closely model the “Careers” approach.

I couldn’t find a closed form solution for the outcome of these models so I simulated it in R. I show below the probability mass function and the cumulative distribution function for the number of fights that the victor took part in (and won).

You can see that in the “random” case the modal number of fights is pretty low at 3, and 95% of outcomes lie in the range [1,6]. The “deterministic” case typically gives more fights, with a modal number of fights of 5 and 95% of outcomes in the range [3,9]. In terms of a random binary tree these two models can be thought of as the number of branches from the root to take at random until you reach a leaf node and the depth of a random node respectively.
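The original simulation was written in R and isn’t reproduced here. As a rough Python sketch of the two models as described above (function names, the number of simulations and the seed are my own choices):

```python
import random
from collections import Counter

def simulate_games(n_players=24, deterministic=False, rng=random):
    """Play one arena and return the number of fights the victor won."""
    alive = list(range(n_players))   # in the deterministic model a lower number is stronger
    wins = [0] * n_players
    while len(alive) > 1:
        a, b = rng.sample(alive, 2)  # two survivors meet at random
        if deterministic:
            winner = min(a, b)       # the stronger participant always wins
        else:
            winner = a if rng.random() < 0.5 else b   # coin flip
        loser = b if winner == a else a
        wins[winner] += 1
        alive.remove(loser)
    return wins[alive[0]]

def fight_distribution(deterministic, n_sims=5000, seed=42):
    """Estimated probability mass function of the victor's number of fights."""
    rng = random.Random(seed)
    counts = Counter(simulate_games(deterministic=deterministic, rng=rng)
                     for _ in range(n_sims))
    return {k: counts[k] / n_sims for k in sorted(counts)}

for label, flag in (("random", False), ("deterministic", True)):
    dist = fight_distribution(flag)
    mean = sum(k * p for k, p in dist.items())
    print(label, "mean fights for the victor:", round(mean, 2))
```

Every game has exactly 23 fights in total, so the victor’s count always lies between 1 and 23. The estimated distributions depend on the seed and the number of simulations, so treat any single run as approximate.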

Anyway, Suzanne Collins seems to have got things looking statistically typical. You could count Katniss’s encounters in various ways but to me they all lie within the 95% ranges of both models. Did Suzanne Collins have a statistician on board when writing the books?

Q: Can you advise on how to solve this problem analytically rather than by simulation?

]]>

So watching a lot of films from the US is probably to be expected but are you up enough on European and Japanese cinema?

[For info I used the Rcartogram package to make this plot.]

]]>

We can use the cyclic nature of the colour wheel to show both the month and the direction of travel. The key is in the middle of the plot. By month, red means January and cyan means July. By direction, red is north, greeny-yellow is east, cyan is south and dark purple is west.
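A mapping like this can be sketched with Python’s standard `colorsys` module. This is an assumption about how such a key could be built, not the code behind the plot: it uses a plain HSV hue circle at full saturation and brightness, so the exact shades (for example how dark the purple is) may differ from the original.

```python
import colorsys

def month_to_rgb(month: int):
    """Map month 1..12 onto the hue circle: January is red, July is cyan."""
    return colorsys.hsv_to_rgb((month - 1) / 12.0, 1.0, 1.0)

def bearing_to_rgb(bearing_degrees: float):
    """Map a compass bearing onto the hue circle: north (0) is red, south (180) is cyan."""
    return colorsys.hsv_to_rgb((bearing_degrees % 360.0) / 360.0, 1.0, 1.0)

for name, bearing in (("north", 0), ("east", 90), ("south", 180), ("west", 270)):
    print(name, bearing_to_rgb(bearing))
```

Because hue is cyclic, December sits next to January and a bearing of 359 degrees sits next to 0, which is what makes the colour wheel a natural fit for both quantities.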

So what can we see?

In both the Spanish and French shipping you can see the effect of the trade winds, which means you want to head towards higher latitudes to pick up a westerly wind to drive ships east towards home. You can also see that ships seem to set off from the West Indies around June/July (does that correspond with harvests out there?). You can also see the Spanish ships reaching into South America, unlike the French ships.

The Dutch and British shipping reach out east too. Similarly you can see the effect of the trade winds as ships go far south to go east but take the shortest path to come back west. British shipping is doing a lot with India and east Africa whereas the Dutch shipping is concentrated on the Dutch East Indies. The time of year story looks a little less clear but it looks like Dutch ships come home around January.

Also in the Dutch shipping you can clearly see triangular trade: from Europe down to Africa, over to the West Indies and back to Europe.

]]>