# Biostatistics

·        Choosing study subjects:

o  Populations may be people, institutions, records or events

o  Sampling frame is a complete list of individuals in the accessible population

o  A sampling procedure is used to select a representative intended sample from the sampling frame

·        Avoiding systematic errors:

o  Trying to use sample mean to estimate the population mean

o  Statistic = parameter + bias + confounding bias + chance

o  Statistic: summary measure in a sample

o  Parameter: underlying value in the target population

o  Bias or confounding → ↓ internal validity

o  External validity is whether the study can be applied to the population I'm interested in: is it similar enough to the study population?
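The decomposition "statistic = parameter + bias + chance" can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population mean of 120, the standard deviation of 10 and the fixed measurement offset of 3 are hypothetical values, and `sample_mean` is an illustrative helper, not part of these notes. A larger sample shrinks the chance component, but the bias stays.

```python
import random
import statistics

def sample_mean(n, true_mean, sd, measurement_bias, rng):
    """Mean of n draws from a normal population, each shifted by a fixed bias."""
    draws = [rng.gauss(true_mean, sd) + measurement_bias for _ in range(n)]
    return statistics.fmean(draws)

rng = random.Random(0)   # fixed seed so the run is repeatable
PARAMETER = 120.0        # hypothetical population mean (the parameter)
SD = 10.0                # hypothetical population standard deviation
BIAS = 3.0               # hypothetical systematic measurement error

for n in (20, 20000):
    unbiased = sample_mean(n, PARAMETER, SD, 0.0, rng)
    biased = sample_mean(n, PARAMETER, SD, BIAS, rng)
    # Error of each statistic relative to the parameter
    print(n, round(unbiased - PARAMETER, 2), round(biased - PARAMETER, 2))
```

With the large sample the unbiased error should be close to 0 while the biased error should stay close to 3: increasing the sample size does not remove systematic error.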

·        Bias:

o  Bias = systematic deviation between the statistic and the parameter, due to a defect in the design, conduct or interpretation of the study

o  Occurs predominantly in design and data collection

o   Selection bias: systematic error due to differences between those who were selected and those who were not, so the sample is not representative of the defined population

o   Information bias: a flaw in measuring exposures or outcomes that results in a differential quality of information between sub-groups/individuals.

§  Misclassification bias: subjects erroneously categorised. If the misclassification is random (non-differential) then ↓ association in results and the odds ratio moves towards 1

§  Interviewer bias: systematic difference in soliciting, recording and interpreting of responses (↓ by training the interviewers; always check this has been done)

§  Recall bias: the recall period should be < 2 weeks for health events. Diet recall ~ 24 hours. If recall error is not random (eg case-control studies) then the results are biased (eg if cases are taken from records then there is variability in what was asked and recorded, versus a uniform questionnaire for controls)

o   Response bias: systematic error due to differences between those who volunteer and those who do not (eg bias from drop-outs and non-responders)

o   See Topic: EBM Glossary, for further examples of bias

o   Confounding bias:

§  A measure of the effect of an exposure on the risk of an outcome is distorted by an association of the exposure with other factors that influence the outcome

§  Standard ones: age, gender, ethnicity, socio-economic status, obesity, smoking, alcohol

§  As long as you collect data about the confounding factor, you can do something about it

§  Can control for confounding using matching, logistic regression or stratifying data
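As one illustration of controlling for confounding by stratifying, the sketch below compares a crude odds ratio with a Mantel-Haenszel pooled odds ratio. The Mantel-Haenszel estimator is a common choice for stratified 2x2 tables, used here as an assumption rather than something these notes specify, and the counts are invented. Each table is `(a, b, c, d)` = exposed with outcome, exposed without, unexposed with outcome, unexposed without.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata: sum(a*d/n) / sum(b*c/n)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts, stratified by the confounder (smoking)
smokers = (80, 20, 40, 10)
non_smokers = (10, 40, 20, 80)
strata = [smokers, non_smokers]

# Collapsing the strata gives the crude (unadjusted) table
crude = tuple(sum(cells) for cells in zip(*strata))

print("crude OR:", odds_ratio(*crude))
print("stratum ORs:", [odds_ratio(*s) for s in strata])
print("Mantel-Haenszel OR:", mantel_haenszel_or(strata))
```

In this invented data the exposure is more common among smokers and smokers have more of the outcome, so the crude odds ratio suggests an association that disappears within each stratum.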

·        Chance effect:

o   Standard error:

§  Quantifies the precision with which the sample mean estimates the population mean

§  Says NOTHING about variability in the data

o   Confidence interval:

§  Turns standard error into something we can interpret: sample mean +/- 1.96 * standard error

§  Interpreted as being 95% confident the underlying value lies in the range (about 95% of intervals built this way would contain it)

§  Width is dependent on:

·        Variation in observed data

·        The sample size (larger sample → narrower confidence interval → more precise estimate)

·        Degree of confidence we want

§  Accuracy depends on presence or absence of bias
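The standard error and 95% confidence interval described above can be sketched as follows. The helper name `mean_ci` and the data are illustrative; the multiplier 1.96 is the normal approximation used in these notes, and a t-quantile would usually be more appropriate for samples this small.

```python
import math
import statistics

def mean_ci(values, z=1.96):
    """Return mean, standard deviation, standard error and the z-based interval."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample standard deviation (spread of the data)
    se = sd / math.sqrt(n)          # precision of the sample mean
    return mean, sd, se, (mean - z * se, mean + z * se)

small = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up measurements
large = small * 4                   # same spread, four times the sample size

for label, data in (("n=8", small), ("n=32", large)):
    mean, sd, se, (low, high) = mean_ci(data)
    print(label, round(mean, 2), round(sd, 2), round(se, 2),
          (round(low, 2), round(high, 2)))
```

The standard deviation is similar in both samples, but the standard error roughly halves when the sample size is quadrupled, so the interval narrows.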

o   Tests of significance:

§  Tests of significance are a tool for statistical inference

§  Test compatibility of a set of data with the null hypothesis: assume there is no difference between the means, then ask what the probability is of observing a difference at least as big by chance

§  P value: the probability of getting a value at least as extreme as the observed statistic if the null hypothesis is true. Threshold usually 0.05

§  Most common test statistics are chi-squared and t-statistic (compares two means). Both depend on degrees of freedom

o   Power: probability that the study will find a statistically significant difference if a true difference of a given size exists
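A rough sketch of a P value and of power for comparing two means. To stay within the Python standard library it uses a z-test (normal approximation with a known, common standard deviation) in place of the t-statistic mentioned above; the function names and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def two_sided_p_value(mean_a, mean_b, sd, n_per_group):
    """P value for the difference of two group means (z-test, equal group sizes)."""
    se_diff = sd * math.sqrt(2 / n_per_group)
    z = (mean_a - mean_b) / se_diff
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def power(true_diff, sd, n_per_group, alpha=0.05):
    """Probability of a significant result if the true difference is true_diff."""
    se_diff = sd * math.sqrt(2 / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_diff / se_diff
    # Chance the test statistic lands in either rejection tail
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

print(round(two_sided_p_value(123.92, 120.0, sd=10, n_per_group=50), 3))
for n in (16, 64, 128):
    print(n, round(power(true_diff=5, sd=10, n_per_group=n), 2))
```

Under these assumptions power rises with the sample size, and when the true difference is zero the "power" equals the significance threshold, which is the false-positive rate.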

·        Data:

o   Qualitative: not numeric (eg hair colour)

o   Quantitative: can be continuous or discrete

o   Measurement scales can be nominal (categorical and unordered), ordinal (categorical and ordered) or interval (continuous)

o   Data description:

§  Categorical and discrete data: bar graphs, frequency distributions

§  Continuous data: histograms, frequency polygons

§  Central tendency: mean, or median (the better measure of central tendency if the distribution is skewed)

§  Spread/variability: Standard deviation, percentiles or inter-quartile range

§  Correlation coefficient: degree of clustering around a straight line

§  If two variables are categorical and unordered then use relative risks and odds ratios
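The descriptive measures listed above can be sketched with the Python standard library. The data and the helper names `describe`, `relative_risk` and `odds_ratio` are invented for illustration; the 2x2 table is laid out as exposed with outcome (`a`), exposed without (`b`), unexposed with outcome (`c`), unexposed without (`d`).

```python
import statistics

def describe(values):
    """Central tendency and spread of a continuous variable."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }

def relative_risk(a, b, c, d):
    """Risk of the outcome in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)

# Skewed, made-up data: one extreme value pulls the mean but not the median
length_of_stay = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
print(describe(length_of_stay))

# Made-up 2x2 table for two unordered categorical variables
print(relative_risk(30, 70, 10, 90), odds_ratio(30, 70, 10, 90))
```

In the skewed example the mean is pulled well above the median by the single extreme value, which is why the median is preferred for skewed distributions.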
