A program for resampling by Michael Wood


Michael Wood. Department of AMS, University of Portsmouth, Locksway Road, Portsmouth, PO4 8JF, UK.
email: michael.wood@port.ac.uk

This program selects random resamples (with or without replacement, i.e. with or without putting each number back before the next one is chosen) from a list of numbers (i.e. your data). It will work with the mean, median, range, sum, interquartile range, standard deviation, variance or any specified percentile of the resamples. It will then produce a frequency diagram and histogram of the resample statistics, and work out percentiles, confidence intervals, probabilities and various standard statistics of the distribution of resamples.

It can be used for many things - including estimating confidence intervals for any of the statistics above, estimating binomial probabilities, estimating control limits for most types of Shewhart control charts, demonstrating the ubiquity of the normal distribution, simulating the hypergeometric distribution, simulating sampling errors with small and large populations, and so on.

The notes on this page are about how the program works, not about the principles of resampling and how to get the program to do all these things - although some of this should be obvious once you've tried it.

It is a very simple program without a proper Windows interface – it does not use the mouse. The graphics and editing facilities, in particular, are crude.

Save program to disc (You may need to use the right hand mouse button.)

Running the program

When you've downloaded it, click on the filename, resample.exe, and it should work.

The program should be self-explanatory. If in doubt press Enter to choose the highlighted default.

There are four main menus entitled:

Entry and editing data
Resampling scheme set up (i.e. decide the resample size and whether you want to work with the mean, median, etc.)
Do some resampling and display histogram
Analyse resamples

An example

I would suggest running the program with some simple data to see how it works. For example the following are the weights (in grams) of 20 apples from a tree in my garden:
133 193 134 155 99 145 182 149 169 157 156 152 163 166 139 143 158 183 160 146

Choose Key in data from the first menu and enter this data. Press Enter after each number, and Q and then Enter after the last number.

Choose Resampling scheme set up.

Change the default settings to (say) the Mean of resamples of 6 with replacement. Each number is replaced after it is chosen by the resampling procedure. This is like choosing samples from a very large, or infinite, population, since removing a particular number from such a population will make no practical difference to what is left.

Choose Do some resampling and display histogram.

The default number of resamples will be 1. Press Enter to accept this. Check the next screen carefully to make sure you see what the program has done.

If you want to, do one more resample and check the output again. And perhaps again.

Then replace 1 by 10000 to do this number of resamples. Study the output carefully.

Choose Analyse and choose the analysis you would like to do. (If in doubt choose the highlighted default.) Again, make sure that you can see what the program is doing. The standard percentiles will give you the range within which 95% of the means of samples of 6 will lie.

Running the simulation again with means of resamples of 20 (with replacement) will show how much these vary. (Note that these resamples of 20 will not - usually - be the same as the original sample because some numbers will be selected more than once and others may not be selected at all. The resampling process - with replacement - is like sampling from a large population with the same distribution as the sample of data.) This is the way bootstrap confidence intervals for the mean are worked out.
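For readers who want to see the idea outside the program, the walk-through above can be sketched in Python. This is only an illustration under stated assumptions, not the program's own code: the function names resample_means and percentile_interval are made up for this sketch, and the simple rank-based percentile rule may differ from the one the program uses.

```python
import random

# Apple weights (grams) from the example above.
apples = [133, 193, 134, 155, 99, 145, 182, 149, 169, 157,
          156, 152, 163, 166, 139, 143, 158, 183, 160, 146]


def resample_means(data, size, n_resamples, rng):
    """Mean of each of n_resamples resamples of the given size, drawn with replacement."""
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=size)  # choices() samples with replacement
        means.append(sum(resample) / size)
    return means


def percentile_interval(values, lower=2.5, upper=97.5):
    """Rank-based percentile interval (an assumed rule, chosen for simplicity)."""
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int(n * lower / 100)]
    high = ordered[min(n - 1, int(n * upper / 100))]
    return low, high


rng = random.Random(1)  # fixed seed only so that the sketch is repeatable
means_6 = resample_means(apples, 6, 10000, rng)
means_20 = resample_means(apples, 20, 10000, rng)

interval_6 = percentile_interval(means_6)
interval_20 = percentile_interval(means_20)
print("95% of means of resamples of 6 lie between", interval_6)
print("95% of means of resamples of 20 lie between", interval_20)
```

The interval for resamples of 20 should come out narrower than the one for resamples of 6, which is the point of running the simulation twice.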

If you try resampling with no replacement, you should see that no item of data is ever chosen more than once in a resample. If you set the resample size equal to the size of the sample of data, you will find that all the resamples contain the same numbers (and so give the same statistics), since there is only one possible resample. Resampling with no replacement gives a direct simulation of the process of taking a sample from a small population.
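Resampling with no replacement can be sketched in the same way. Again this is an illustration rather than the program's own code; it relies on Python's random.sample, which never picks the same item twice.

```python
import random

# Apple weights (grams) from the example above; all 20 values are different.
apples = [133, 193, 134, 155, 99, 145, 182, 149, 169, 157,
          156, 152, 163, 166, 139, 143, 158, 183, 160, 146]

rng = random.Random(2)  # fixed seed only so that the sketch is repeatable

# A resample of 6 with no replacement: each apple can appear at most once.
resample = rng.sample(apples, k=6)

# A resample of 20 with no replacement uses every apple exactly once,
# so only the order can differ from the original data.
full = rng.sample(apples, k=len(apples))
same_numbers = sorted(full) == sorted(apples)

print("Resample of 6:", resample)
print("Resample of 20 contains the same numbers as the data:", same_numbers)
```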

Now I would suggest browsing through the menus to see what other possibilities there are.

The program deals with three different types of "samples":

the real data (in the example above the size of this sample is 20 individual numbers),
the individual resamples (the size of each resample is 6),
the collection of simulated resamples (about 10000 of them if you have followed the instructions above)

Using "attribute" (yes/no) data

Use 0 and 1 for the data. For example, a sample of 40 components, 10 of which are defective, would be entered as (press Enter after each number):
1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
(When keying in these data, it is easier to answer Yes to the question about using the previous number as a default.)

Coding the data like this, the Sum of a sample (10 in this example) is the number of defectives and the mean (0.25) is the proportion of defectives.
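The coding can be checked with a few lines of Python, which also show how such data might be used to estimate a binomial-type probability. The resample size of 20 and the question asked (3 or fewer defectives) are assumptions chosen only for this illustration.

```python
import random

# 40 components, 10 of them defective (1 = defective, 0 = not defective).
components = [1] * 10 + [0] * 30

defectives = sum(components)                 # number of defectives
proportion = defectives / len(components)    # proportion of defectives
print(defectives, proportion)

# Illustrative question: in resamples of 20 drawn with replacement,
# how often are there 3 or fewer defectives?
rng = random.Random(3)  # fixed seed only so that the sketch is repeatable
n_resamples = 10000
counts = [sum(rng.choices(components, k=20)) for _ in range(n_resamples)]
prob_at_most_3 = sum(1 for c in counts if c <= 3) / n_resamples
print("Estimated probability of 3 or fewer defectives:", prob_at_most_3)
```

Because the resamples are taken with replacement, the estimate should be close to the corresponding binomial probability with n = 20 and p = 0.25.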

Loading and saving data files

You can load data from a file. This file should be a text file, which can be produced by any spreadsheet or word processor. If there is more than one number on a line they should be separated by spaces.
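As a rough guide to the file layout, the following Python sketch writes a small text file in this format and reads it back. The function name load_numbers and the file name are hypothetical, and the sketch only assumes what is stated above (plain text, numbers separated by spaces, one or more per line); the program itself may be stricter or more lenient.

```python
import tempfile
from pathlib import Path


def load_numbers(path):
    """Read numbers separated by spaces or line breaks from a plain text file."""
    text = Path(path).read_text()
    return [float(token) for token in text.split()]


# Write a small example file in a temporary folder, then read it back.
with tempfile.TemporaryDirectory() as folder:
    example = Path(folder) / "apples.txt"  # hypothetical file name
    example.write_text("133 193 134\n155\n99 145\n")
    data = load_numbers(example)

print(data)
```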

Before running the Resample program note exactly what your data file is called as I have not got round to telling the program to remind you. You can also edit and resave your data file (see the first menu), although I would suggest using a spreadsheet or word processor for most editing.

Printing and recording results

All results appearing on the screen are saved to a text file called Resample.out. This can be edited and printed with any word processor.

22 December 2002.