How statistical interpretation can cause data to appear misleading

Twain (Date Unknown, cited in h2g2, 2003; Taflinger, 1996) stated that “there are three ways to not tell the truth: lies, damn lies and statistics”, with Wright (Date Unknown, cited in h2g2, 2003) claiming that “47.3% of all statistics are made up on the spot”.

In this report the author discusses how statistical interpretation can cause data to appear misleading, covering five main points: how data is presented, how data is gathered, the effect that sample size has on the analysis, how samples can have a built-in bias, and how a correlation between two variables is not proof that one causes the other.

Bonoma (1985, cited in h2g2, 2003) claimed that different kinds of data can be obtained from different sources; collecting data from a variety of sources in a variety of ways gives the researcher wider coverage and therefore a fuller picture. Two main styles of research can be undertaken when collecting data: quantitative and qualitative. Quantitative research deals with numerical data, which are then analysed to establish, for example, whether a correlation exists between variables.

For example, the number of customers attending two separate gyms could be recorded and plotted on a graph to identify any existing relationship. Qualitative research, by contrast, would investigate why these customers attend their gym, or what would make them stop attending; it aims to describe qualities and to understand concepts (Gratton and Jones (a), 2004). Interviews, surveys or questionnaires, and observation are the three main methods of collecting data for analysis.

Although interviewing allows respondents to talk about their own opinions, the reliability of the data depends on the individual’s responses (Gratton and Jones (e), 2004). Eggert (2007) notes that there is research to suggest that interviewing is one of the most unreliable techniques; this could be because the respondent may recall information inadequately, have incorrect knowledge of the question at hand, or misinterpret the question.

Another reason interviewing fares so poorly could be that most people, including managers, have not been adequately trained in interviewing skills and may even introduce unintentional bias through facial expressions and head nodding during some answers. Wording is a further problem, “as the spoken word has always a residue of ambiguity, no matter how carefully we word the questions” (Fontana and Frey, 1998, cited in Gratton and Jones, 2004 p. 143).

Questionnaires need to be formulated with care so that the questions are understandable to all respondents, including those with the lowest level of education, in order to prevent non-response bias. Gratton and Jones ((d), 2004) noted that questions can introduce errors in five main ways. The first is incorrectly pre-coding closed questions, for example “How many times a week do you train? Never, 1-2, 2-3, 3+”. This question raises problems of validity because the categories overlap (a respondent who trains twice could tick either “1-2” or “2-3”), and because training may vary from week to week: one week the respondent may train only once and the next week five times, so they might tick two of the boxes or none.
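Overlapping or missing answer categories can be caught before a questionnaire is issued. The short Python sketch below is illustrative only: `check_categories` is a hypothetical helper, it assumes answers are whole numbers of sessions per week, and it represents each option as an inclusive (low, high) range, with `None` meaning “or more”. It reports every answer value that falls into more than one box or into none.

```python
def check_categories(options, max_value=20):
    """Return (overlaps, gaps) for whole-number answers 0..max_value.

    options maps each answer label to an inclusive (low, high) range;
    high=None means the option is open-ended ("or more").
    """
    overlaps, gaps = [], []
    for value in range(max_value + 1):
        matches = [label for label, (low, high) in options.items()
                   if value >= low and (high is None or value <= high)]
        if len(matches) > 1:
            overlaps.append((value, matches))   # respondent could tick two boxes
        elif not matches:
            gaps.append(value)                  # respondent has no box to tick
    return overlaps, gaps


# The pre-coded question discussed above.
original = {"Never": (0, 0), "1-2": (1, 2), "2-3": (2, 3), "3+": (3, None)}
print(check_categories(original))  # ([(2, ['1-2', '2-3']), (3, ['2-3', '3+'])], [])

# A hypothetical corrected version with mutually exclusive categories.
revised = {"Never": (0, 0), "1-2": (1, 2), "3-4": (3, 4), "5+": (5, None)}
print(check_categories(revised))   # ([], [])
```

A check like this only addresses overlapping and missing categories; the week-to-week variation described above is a wording problem (for example, asking about “a typical week”) that no category scheme fixes on its own.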

Leading questions such as “Do you agree that...?” and double-barrelled questions such as “Do you agree that smoking should be banned because it can cause cancer?” can produce misleading results. With the latter question, the respondent may agree that smoking should be banned, but for a different reason, such as disliking the smell of the smoke. With leading questions, the wording indicates that the researcher agrees with the statement, so the respondent may feel pressured to answer accordingly.

The final method is incorrectly operationalising concepts (Gratton and Jones (d), 2004). For example, a study may be carried out to find an association between a player’s training commitment and the amount of match play they receive without taking external factors into account. An individual may attend every training session and train outside scheduled times, yet not play regularly because of family commitments.

Another player may train very infrequently but play the entire match, every match, perhaps because there are too few players available for that position. Neither example supports the claim that the more a player trains, the more match play they receive. This supports McWalters (1999) and Williams and Wragg’s ((g), 2004) view that correlation studies are one of the weakest designs for establishing cause and effect, as a relationship does not always imply causation.

For another example of how an association between two variables is not evidence that one causes the other, see Appendix 1.

“Data can be presented in three forms: text, tables and figures” (Williams and Wragg (d), 2004 p. 102). Figures are used to establish trends and patterns, although they can be very misleading if interpreted incorrectly. Often the title does not describe the results in the figure, so the viewer does not know what they are actually looking at, and the scale of units can be manipulated in a way that causes the viewer to misread the results (h2g2, 2003) (see Appendix 4).
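The effect of a manipulated scale can be quantified. When the value axis of a bar chart starts above zero, each bar is drawn only from the axis start upwards, so small differences look large. The Python sketch below uses a hypothetical helper, `bar_height_ratio`, and invented figures (two gyms with 50 and 52 members) purely for illustration.

```python
def bar_height_ratio(small, large, axis_start=0.0):
    """How many times taller the larger bar is drawn than the smaller one
    when the value axis begins at axis_start rather than at zero."""
    if axis_start >= small:
        raise ValueError("the axis must start below the smaller value")
    return (large - axis_start) / (small - axis_start)


# Invented membership figures for two gyms.
print(bar_height_ratio(50, 52))                 # 1.04: the true ratio, axis from zero
print(bar_height_ratio(50, 52, axis_start=48))  # 2.0: axis starts at 48
```

A 4% difference is drawn as one bar twice the height of the other, which is why the starting point of the axis should always be checked before a figure is interpreted.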

Williams and Wragg ((e), 2004) and Simon ((a), 2009; (b), 2009) claimed that four main types of figure can be used in statistical analysis: bar or column charts, used to show differences between variables; scatterplots, used to assess the correlation between two variables; pie charts, used to show the relative proportions of categories as percentages; and histograms, used to display an approximate frequency distribution of a data set.

“Statistics should be interpreted with caution as they can be misleading; they can both lie and tell the truth.” (Joey0744, 2008). A single statistical finding can contain both truth and lies: the questions to ask are what the data are being compared with, and whether that comparison is relevant and valid to the research. For example, a study may show that gun crime is higher in New York than in London.

The raw figures in a table may well show this to be true; however, the study does not take into account that firearm laws in New York are far more permissive than those in the United Kingdom. The results would therefore be misleading, because London is not a comparable city. To avoid this built-in bias, the researcher should have compared New York with a city or state that has similar laws. It could be argued that many researchers carry out studies with a built-in bias in order to persuade the audience towards one side.

For example, many studies have been carried out on whether smoking causes cancer; Martin (2005) claims that smoking causes about ninety percent of lung cancer deaths in males and almost eighty percent of lung cancer deaths in females. The question to ask here is who was being studied, and whether the sample was representative. The researcher may have been very selective in the sampling process, studying only individuals who smoke and have lung cancer, or ensuring that smokers with lung cancer outnumbered those without the disease.

Such a study would not represent the population as a whole, only the group that was studied. Percentages should not always be taken at face value, because the reader may not know who was included in the sample. If, for instance, the sample consisted mainly of smokers who already had lung cancer, the reported percentages could be accurate for that sample while saying little about smokers, or the population, in general.
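The same set of people can yield very different percentages depending on which group is sampled or used as the base. The Python sketch below uses entirely invented counts (they are not real epidemiological data) to show how a study that samples only lung cancer patients answers a different question from one that follows smokers.

```python
# Invented counts for illustration only; NOT real epidemiological data.
counts = {
    ("smoker", "lung cancer"): 90,
    ("smoker", "no lung cancer"): 910,
    ("non-smoker", "lung cancer"): 10,
    ("non-smoker", "no lung cancer"): 3990,
}


def percent(part, whole):
    return 100 * part / whole


smokers_with_cancer = counts[("smoker", "lung cancer")]
all_cases = smokers_with_cancer + counts[("non-smoker", "lung cancer")]
all_smokers = smokers_with_cancer + counts[("smoker", "no lung cancer")]

# Sample drawn only from lung cancer patients: what share of them smoke?
smokers_among_cases = percent(smokers_with_cancer, all_cases)
# Sample drawn from smokers: what share of them have lung cancer?
cases_among_smokers = percent(smokers_with_cancer, all_smokers)

print(f"{smokers_among_cases:.0f}% of lung cancer patients smoke")  # 90%
print(f"{cases_among_smokers:.0f}% of smokers have lung cancer")    # 9%
```

Both figures are “correct” for the same hypothetical population, yet a reader who does not know which group was asked could easily confuse one with the other.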

Such a study would therefore have created artificial sampling conditions that do not reflect the population as a whole, known as low external validity, producing misleading results (Ellis-Christensen, Date Unknown; Taflinger, 1996; Williams and Wragg (a), 2004; Williams and Wragg (b), 2004; Williams and Wragg (f), 2004). “Garbage in, garbage out” (h2g2, 2003). When reading research, the reader must remember that statistics can be found to support almost any position; they should ask who carried out the study, and whether that person was for or against the hypothesis.

If the question asked was specific, the resulting statistics are more likely to be accurate; if it was vague, no meaningful information can be drawn from the raw data (h2g2, 2003; Taflinger, 1996). For example, the question “Which women’s running trainers are better, Asics or Reebok?” can produce misleading results because it is so unspecific: it does not state whether “better” refers to comfort, design, available colours, price, performance or something else. Any statistical analysis of the answers will therefore be deceptive, as it does not identify a specific factor.

Gratton and Jones ((b), 2004) stated that there are four sampling techniques for statistical analysis: random, stratified random, cluster and systematic. Which technique is preferable depends on the aim of the study and the population groups required, with random sampling being the ideal technique because it provides a representative sample of the population as a whole. However, all four techniques can produce misleading statistics if not performed with care; more information on what each technique entails can be found in Appendix 2.
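As a rough illustration of how the four techniques differ in practice, the Python sketch below implements each one with the standard `random` module. The function names and the sampling frame (100 hypothetical gym members across five gyms) are invented for this example and are not taken from Gratton and Jones or Appendix 2; the stratified version draws an equal number from each stratum, which is a simplification (proportional allocation is also common).

```python
import random


def simple_random(population, k, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, k)


def stratified_random(population, key, k_per_stratum, rng):
    """Split the population into strata by key(), then sample randomly within each."""
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k_per_stratum))
    return sample


def cluster_sample(population, key, n_clusters, rng):
    """Randomly choose whole groups (clusters) and keep every member of them."""
    groups = {}
    for item in population:
        groups.setdefault(key(item), []).append(item)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [item for name in chosen for item in groups[name]]


def systematic(population, k, rng):
    """Take every step-th member of an ordered list, after a random start."""
    if not 0 < k <= len(population):
        raise ValueError("k must be between 1 and the population size")
    step = len(population) // k
    start = rng.randrange(step)
    return population[start::step][:k]


# Hypothetical sampling frame: 100 gym members spread across five gyms.
members = [{"id": i, "gender": "F" if i % 2 else "M", "gym": f"gym{i % 5}"}
           for i in range(100)]
rng = random.Random(7)

print(len(simple_random(members, 10, rng)))                            # 10
print(len(stratified_random(members, lambda m: m["gender"], 5, rng)))  # 10
print(len(cluster_sample(members, lambda m: m["gym"], 2, rng)))        # 40
print(len(systematic(members, 10, rng)))                               # 10
```

Each technique has its own way of going wrong: systematic sampling, for instance, can be biased if the list has a repeating pattern that lines up with the step size, and cluster sampling can be unrepresentative if the chosen clusters differ from the rest.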