You can obtain several different types of insight from the data you collect. Some of the most common questions are outlined below:
- What is the central tendency of the data?
- What is the variability of the data?
- What is the distribution of the data?
- Are there any trends in the data?
- Are there any apparent cause and effect or other relationships between different variables?
- How stable is the data?
The central tendency of the data refers to single number estimates that can be used to represent the entire data set. The most typical measures for central tendency are:
- Mean – the average value of the data
- Median – the middle number when the data is ranked from lowest to highest value
- Mode – the most frequent number encountered in the data
In nearly every case, you will want to know the central tendency of the data. In most cases, this will be the mean. However, if the data has some values that are extraordinarily high or low, you may want to use the median instead, because the median is not pulled toward extreme values the way the mean is.
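As a minimal sketch of the three measures, the Python example below uses the standard `statistics` module on a small, made-up list of service times (the data and variable names are illustrative assumptions, not taken from any real system). One unusually large value is included to show why the median can be the better choice.

```python
import statistics

# Hypothetical service times in minutes; the last value is an extreme one.
service_times = [12, 14, 14, 15, 16, 17, 95]

mean_time = statistics.mean(service_times)      # pulled upward by the extreme value
median_time = statistics.median(service_times)  # middle value of the ranked data
mode_time = statistics.mode(service_times)      # most frequently occurring value

print(f"mean={mean_time:.1f}  median={median_time}  mode={mode_time}")
```

Here the mean is roughly 26 minutes even though six of the seven values are 17 or below, while the median stays at 15.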
The variability of the data refers to how divergent the data is from the central value. In general, the more variability there is in a system, the more problems there are with the system. Variability is often one of the most critical data insights you can develop.
Essentially there are two measures of variability:
- Variance – a statistical measure of the degree by which the data is spread from the mean. (The square root of the variance, the standard deviation, is often used as well.)
- Range – the spread in the data from the highest value to the lowest value. The range is a quick estimate of the variability, but because it depends only on the two most extreme values, it is of limited use.
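The two measures can be computed as in the sketch below. The wait times are made up for illustration, and the example assumes the data is a sample, so it uses the sample variance (dividing by n - 1); `statistics.pvariance` would be the equivalent for a whole population.

```python
import statistics

# Hypothetical wait times in minutes.
wait_times = [4, 8, 6, 5, 7]

variance = statistics.variance(wait_times)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(wait_times)      # square root of the sample variance
data_range = max(wait_times) - min(wait_times)

print(f"variance={variance:.2f}  std dev={std_dev:.2f}  range={data_range}")
```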
The distribution of the data represents a plot of the frequency of different observations. Your analysis of the data distribution can take one of two forms. You may want to simply plot the data to get a sense of what the distribution looks like. In some cases, you may want to fit the data to a known distribution (e.g. normal, exponential). A more formal analysis is needed for some of the modeling tools you will be using (e.g. simulation, queuing models, quality control applications).
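As one possible illustration of fitting data to a known distribution, the sketch below fits a normal distribution to made-up measurements using `statistics.NormalDist` (Python 3.8 or later). It then compares the share of observations within one standard deviation of the mean against the share the fitted distribution predicts. This is only a rough check under assumed data, not a formal goodness-of-fit test.

```python
from statistics import NormalDist

# Hypothetical measurements.
samples = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]

# Estimate the mean and standard deviation of a normal distribution from the data.
fitted = NormalDist.from_samples(samples)

# Compare the observed and expected share of values within one standard deviation.
low = fitted.mean - fitted.stdev
high = fitted.mean + fitted.stdev
observed_share = sum(low <= x <= high for x in samples) / len(samples)
expected_share = fitted.cdf(high) - fitted.cdf(low)  # about 0.68 for any normal

print(f"fitted mean={fitted.mean:.2f}  std dev={fitted.stdev:.2f}")
print(f"within one std dev: observed={observed_share:.2f}  expected={expected_share:.2f}")
```

A large gap between the observed and expected shares would suggest that the normal distribution is a poor description of the data.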
Should your analysis be more informal, you may want to ask yourself some of the following questions.
- Does the distribution have one peak, two, or more peaks? When there is more than one peak, you need to understand why this is the case. Systems with multiple peaks often suggest a need to diagram multiple versions of the system.
- Does the distribution have an unexpected number of data points at one or both of the tails? If this is the case, you need to think about what is causing these unusual values. Often these extreme values can be a major concern in how a system operates.
- What is the shape of the distribution? Is it symmetrical or is it skewed to one end or the other? While you can’t draw general conclusions from the shape, you do need to take the shape into account in your design.
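For an informal look at the questions above, a simple text histogram is often enough. The sketch below groups made-up observations into bins of an assumed width, prints one row per bin, and counts the bins that are taller than both neighbors as a crude way to spot multiple peaks. The data, the bin width, and the peak rule are all illustrative assumptions; flat-topped peaks would not be counted by this rule.

```python
from collections import Counter

# Hypothetical observations (for example, minutes per task).
observations = [2, 3, 3, 4, 4, 4, 5, 5, 9, 10, 10, 10, 11]

bin_width = 2  # assumed bin width
counts = Counter((x // bin_width) * bin_width for x in observations)
bins = list(range(min(counts), max(counts) + bin_width, bin_width))

# Print a text histogram, one row per bin.
for start in bins:
    print(f"{start:>2}-{start + bin_width - 1:<2} {'#' * counts[start]}")

# A bin counts as a peak if it is strictly taller than both neighbors.
heights = [0] + [counts[b] for b in bins] + [0]
peaks = sum(
    1
    for i in range(1, len(heights) - 1)
    if heights[i - 1] < heights[i] > heights[i + 1]
)
print(f"number of peaks: {peaks}")
```

With this data the histogram shows two separate clusters, which is the kind of pattern that should prompt the question of whether two different processes are being measured.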
Trends in the data are easier to detect than they are to define. Trends can be of a number of varieties:
- Growth or decline
- Cause and effect
- Single variable or multiple variables
Growth and decline trends are the most common trends. Typically these involve a plot of a variable versus time. A regression equation is fit to the data to see whether the trend is statistically significant.
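A minimal sketch of such a trend check is shown below. It fits a straight line by ordinary least squares to made-up monthly demand figures and computes the t statistic for the slope. The data and names are illustrative assumptions. Judging significance still requires comparing the t statistic with a critical value from the t distribution with n - 2 degrees of freedom, which is not part of the Python standard library.

```python
import math

# Hypothetical monthly demand figures.
months = [1, 2, 3, 4, 5, 6]
demand = [10, 12, 15, 15, 18, 20]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(demand) / n

# Least-squares slope and intercept.
sxx = sum((x - mean_x) ** 2 for x in months)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, demand))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error of the slope and the resulting t statistic.
residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(months, demand))
slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
t_stat = slope / slope_se

print(f"slope={slope:.3f}  intercept={intercept:.3f}  t={t_stat:.2f}")
```

For six data points there are four degrees of freedom, and the two-sided 5% critical value is about 2.78, so a t statistic well above that would indicate a significant upward trend.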
Cause and effect trends are not as easy to identify. Typically in these cases, you are looking at anomalies (the effect) in the data to see if there is an apparent cause for them. Suppose that we had instances of extraordinarily long surgical operations. Is there another variable that suggests why this is the case (e.g. doctor, patient condition)?
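The surgical example can be sketched as follows. The records, the doctor labels, and the rule for what counts as "extraordinarily long" (more than 1.5 times the median duration) are all assumptions made for illustration. The code flags the long operations and counts them by doctor to see whether they cluster around one value of the second variable.

```python
from collections import Counter
import statistics

# Hypothetical records: (doctor, operation duration in minutes).
operations = [
    ("Dr. A", 62), ("Dr. B", 58), ("Dr. C", 65), ("Dr. A", 60),
    ("Dr. B", 140), ("Dr. C", 61), ("Dr. B", 155), ("Dr. A", 59),
    ("Dr. C", 63), ("Dr. B", 57),
]

durations = [minutes for _, minutes in operations]

# Assumed rule: an operation is "long" if it exceeds 1.5 times the median.
threshold = 1.5 * statistics.median(durations)

# Count the long operations for each doctor.
long_ops = Counter(doctor for doctor, minutes in operations if minutes > threshold)

print(f"threshold={threshold:.1f} minutes")
print(dict(long_ops))
```

A cluster like this only points to an apparent association. Other variables, such as patient condition, would need to be examined before concluding anything about the cause.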
Trends can also be single variable and multi-variable. Single variable trends are common. Multi-variable trends are harder to detect.
Relationships in the data refer to whether there is an apparent connection between the different variables. This is often done in one of two ways:
- Graphical plots
- Correlations
Graphical plots look at data points along two dimensions.
The graphical plot only gives you a sense of the relationship. Correlations give you a statistical measure of it. Whenever possible, correlations should be calculated.
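As a sketch of the statistical measure, the example below computes the Pearson correlation coefficient directly from its definition. The function name `pearson_r` and the staffing data are hypothetical, chosen only for illustration. Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: staff on duty versus average customer wait in minutes.
staff_on_duty = [2, 3, 4, 5, 6]
avg_wait = [30, 24, 20, 15, 11]

r = pearson_r(staff_on_duty, avg_wait)
print(f"r = {r:.3f}")
```

In this made-up data the coefficient is close to -1, meaning that waits fall almost linearly as staffing rises. A strong correlation alone does not establish cause and effect.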
When we look at data, we often assume that the data is stable across a number of different situations.
Checking for stability allows you to determine if a parameter you are measuring is the same for different situations. You can compare these situations by doing the analysis separately for each situation.
Should the situations turn out to be different, you need to think about why these differences exist. Often these differences can be the most significant part of your analysis.
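Doing the analysis separately for each situation can be sketched as below. The records and the choice of "shift" as the situation are assumptions for illustration. The code groups the measurements by situation and reports the mean and standard deviation of each group so that they can be compared side by side.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (situation, processing time in minutes).
records = [
    ("day", 20), ("day", 22), ("day", 21), ("day", 19),
    ("night", 27), ("night", 30), ("night", 29), ("night", 26),
]

# Group the measurements by situation.
by_situation = defaultdict(list)
for situation, value in records:
    by_situation[situation].append(value)

# Run the same analysis separately for each situation.
summary = {
    situation: (statistics.mean(values), statistics.stdev(values))
    for situation, values in by_situation.items()
}

for situation, (mean, std_dev) in summary.items():
    print(f"{situation}: mean={mean:.1f}  std dev={std_dev:.2f}")
```

If the group means differ by much more than the spread within each group, as they do in this made-up data, the parameter is probably not stable across the situations and the reason deserves a closer look.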