I just added an example of simple model construction to my textbook, Statistical Inference for Everyone. It's a process I don't think I've ever seen in an intro stats book, but one that is common in scientific work. The idea is that you start with a simple model, collect data, notice where the simple model breaks, propose a new, more complex model, and do the analysis again.
The entire data set I use is here; it consists of the masses of US pennies for several years.
Single "True" Value Model
One starts this analysis by loading the first part of the data (earlier than 1975) and applying a model which states that there is a single "true" value. The best estimate of this value is the sample mean, and the posterior distribution is normal. A plot of this posterior looks like the following.
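As a minimal sketch of what this single-value model amounts to: the posterior for the true mass is (approximately) normal, centered on the sample mean with a width given by the standard error. The penny masses below are made-up placeholder numbers, not the actual data set from the book.

```python
import numpy as np
from scipy import stats

# Hypothetical pre-1975 penny masses in grams (placeholder values,
# not the actual data from the book).
masses = np.array([3.11, 3.08, 3.09, 3.12, 3.10, 3.07, 3.11])

n = len(masses)
sample_mean = masses.mean()
sample_std = masses.std(ddof=1)        # sample standard deviation
std_err = sample_std / np.sqrt(n)      # standard error of the mean

# Posterior for the single "true" value: approximately normal,
# centered on the sample mean with width equal to the standard error.
posterior = stats.norm(loc=sample_mean, scale=std_err)

# A 95% credible interval for the true mass.
lo, hi = posterior.interval(0.95)
print(f"best estimate: {sample_mean:.3f} g")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f}) g")
```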
If you apply the same model to all of the data, you get something that clearly looks ridiculous.
At that point it makes sense to change to a model with two "true" values.
Double "True" Value Model
With this model we have separate means for the pre- and post-1975 data, and we can look at either the overlap of the credible intervals or the posterior distribution of the difference; both clearly show a statistically significant difference. A sketch of this calculation is below.
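A rough sketch of the two-value comparison, again with made-up placeholder masses rather than the book's data: each group gets its own normal posterior, and the posterior for the difference of the means is also normal, with the variances adding.

```python
import numpy as np
from scipy import stats

# Hypothetical penny masses in grams (placeholder values, not the book's data).
pre_1975 = np.array([3.11, 3.08, 3.09, 3.12, 3.10, 3.07, 3.11])
post_1975 = np.array([2.52, 2.49, 2.51, 2.50, 2.48, 2.53])

def mean_posterior(x):
    """Approximate normal posterior for a group's 'true' mean."""
    n = len(x)
    return stats.norm(loc=x.mean(), scale=x.std(ddof=1) / np.sqrt(n))

post_a = mean_posterior(pre_1975)
post_b = mean_posterior(post_1975)

# Posterior for the difference of the two means: normal, with the
# means subtracting and the variances adding.
diff = stats.norm(loc=post_a.mean() - post_b.mean(),
                  scale=np.sqrt(post_a.std()**2 + post_b.std()**2))

lo, hi = diff.interval(0.95)
print(f"pre-1975 95% interval:  {post_a.interval(0.95)}")
print(f"post-1975 95% interval: {post_b.interval(0.95)}")
print(f"difference 95% interval: ({lo:.3f}, {hi:.3f}) g")
# If the interval for the difference excludes zero, the two "true"
# values differ in a statistically significant way.
```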
This approach has several advantages over the typical methods used to teach this topic:
- it progresses systematically from simple to complex
- it shows the benefits and limitations of the simple models
- it connects the procedures of the complex models to the earlier ones, so they don't seem like disjoint, unrelated topics.