Biostatistics

Data analysis is a five-step process. The first step is to define your research question clearly during the study design. The second step is to determine how you are going to measure each component of your question. The third step is to collect data that meets your measurement definition and prepare it appropriately. The fourth step is to analyze your data using an appropriate method: statistical analysis for quantitative data, and qualitative methods such as thematic analysis and content analysis for qualitative data. The fifth step is to interpret your results, which becomes part of your research paper or manuscript.

If you are a student, we can, with your supervisor's permission, help you compute your sample size, determine the appropriate data to collect for your outcome and predictor variables, guide your data analysis approach, and help you interpret your results. We cannot assist you with coursework.

If you are a faculty member, we can assist with research design, data collection, data analysis, and interpreting results.

For a pilot study, a good rule of thumb is to obtain 25% of the sample size needed for the full research study.

The sample size calculation depends on your hypothesis test, the significance level (usually set at 5%), the desired power, and the results from your pilot study. There are many formulas available for different research situations. Please submit an online service request for assistance.
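As an illustration of how these inputs combine, here is a minimal sketch for one common situation: a two-sided comparison of two group means, using the normal-approximation formula. The function name and the input values (a 5-unit difference, a pilot standard deviation of 10) are hypothetical, and other designs need other formulas.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal-approximation formula:
        n = 2 * ((z_(1 - alpha/2) + z_power) * sigma / delta) ** 2
    delta: smallest difference in means worth detecting (assumed input)
    sigma: common standard deviation, e.g. estimated from a pilot study
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return ceil(n)  # round up to a whole participant


# Hypothetical inputs: detect a 5-unit difference when the pilot SD is 10.
print(sample_size_per_group(delta=5, sigma=10))
```

Because the formula uses normal rather than t quantiles, it can slightly understate the sample size needed for small studies.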

For surveys, Qualtrics has a sample size calculator.

The appropriate analysis will depend on the type of data that you collected and the hypothesis or research objective that you want to answer. Please submit an online service request for assistance.

Many common statistical analyses, such as ANOVA and linear regression, assume normality (strictly, of the residuals rather than of the raw data). There are also nonparametric tests that do not need the normality assumption to be valid. You can also transform your data, but that is trickier to do and to interpret. When your sample size is relatively large, the assumption of normality may be relaxed.

When the sample size is sufficiently large (>200 per group), the normality assumption can be relaxed. By the Central Limit Theorem, the sampling distribution of the mean (and of similar estimates) approximates normality even when the underlying data are not normally distributed. When you have very small samples, it is important to check for a possible violation of the normality assumption.
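The effect of sample size can be seen in a small simulation. The sketch below draws repeated samples from a strongly right-skewed (exponential) population and measures how skewed the resulting sample means are; the population, the number of repetitions, and the function names are all assumptions chosen for illustration.

```python
import random
from statistics import fmean


def skewness(values):
    """Moment-based skewness; roughly 0 for a symmetric distribution."""
    m = fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_of_sample_means(n, reps=2000, seed=0):
    """Skewness of the means of `reps` samples of size n from Exp(1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [
        fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)


for n in (5, 200):
    print(n, round(skewness_of_sample_means(n), 3))
```

With only 5 observations per sample the means remain clearly skewed, while with 200 per sample their distribution is close to symmetric, which is why the normality assumption matters less in large samples.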

The level of significance, α, is the probability of a Type I error (rejecting a null hypothesis that is actually true) that you are willing to accept in your research. It is set before your data collection, usually at 0.05 or 0.01, more rarely 0.10. The p-value is the probability, computed assuming the null hypothesis is true, of obtaining results at least as extreme as those you observed. When p-value < α, the data provide evidence against the null hypothesis (no effect) and your results are ‘statistically significant’. The smaller the p-value, the stronger the evidence.
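To make the comparison of the p-value with α concrete, here is a minimal sketch of a two-sided z-test on the difference between two group means. The summary statistics are hypothetical, and a z-test is only one of many possible tests; it assumes a known common standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a |Z| at least this large if the null hypothesis is true."""
    return 2 * NormalDist().cdf(-abs(z))


def two_sample_z(mean_a, mean_b, sd, n_per_group):
    """z statistic for a difference in means with a common, known SD."""
    standard_error = sd * sqrt(2 / n_per_group)
    return (mean_a - mean_b) / standard_error


alpha = 0.05  # chosen before data collection
# Hypothetical summary data: means 132 and 127, SD 10, 40 people per group.
z = two_sample_z(mean_a=132, mean_b=127, sd=10, n_per_group=40)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.4f}, significant at alpha: {p < alpha}")
```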

It is a good idea to support a p-value with a confidence interval for the estimate or effect size being tested. The ‘estimate of the effect’ you found is only for the sample data you collected. The ‘true effect’ for the whole population may be different. A 95% confidence interval gives a range of plausible values for it: if the study were repeated many times, about 95% of the intervals computed this way would contain the true effect.
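The repeated-sampling meaning of a confidence interval can be checked by simulation. The sketch below computes a normal-approximation interval for a mean and counts how often it contains a known population mean; the population values (mean 120, SD 15), the sample size, and the function names are hypothetical. For small samples a t-based interval would be the usual choice.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    centre = fmean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return centre - half_width, centre + half_width


def coverage(true_mean=120.0, sd=15.0, n=50, reps=2000, seed=1):
    """Fraction of simulated studies whose interval contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = mean_confidence_interval(sample)
        hits += low <= true_mean <= high
    return hits / reps


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

Any single interval either contains the true mean or does not; the 95% describes how the procedure behaves across many repeated studies.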

Power is the probability that the statistical test detects a difference or effect that really exists, leading to rejection of the null hypothesis. It depends on the sample size, as well as on the size of the effect, the variability of the data, and the significance level. Other things being equal, the larger the sample size, the greater the power. It is important to calculate the sample size to have sufficient power before you begin your data collection. When your sample size is small, your study might not be able to detect the difference or effect, even when it is real, because of lack of power.
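A rough sketch of how power grows with sample size, again for a two-sided comparison of two means under a normal approximation. The effect size (5 units), the standard deviation (10), and the function name are hypothetical; the formula ignores the negligible chance of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma * sqrt(2 / n_per_group)
    return NormalDist().cdf(abs(delta) / standard_error - z_alpha)


# Hypothetical scenario: true difference of 5 units, SD of 10.
for n in (10, 20, 40, 63, 100):
    print(n, round(power_two_sample(delta=5, sigma=10, n_per_group=n), 2))
```

In this scenario a study with 20 participants per group would miss a real 5-unit difference more often than it would detect it.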

Statistical significance simply means that we reject the null hypothesis (no effect). It does not tell you whether the effect is large enough to matter. For example, a clinical trial may enrol hundreds of thousands of patients to compare a new antihypertensive drug with the current one. Because of the large sample, the test may reject the null hypothesis that the two drugs are equivalent. However, in practice, the difference between them may be relatively small and have no real clinical significance. The clinician should not blindly follow the results, but should combine professional judgement with statistical evidence.
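The trial example can be put into numbers. In the sketch below, every figure is invented for illustration: two drugs whose mean blood pressure reduction differs by only 0.5 mmHg, a standard deviation of 15 mmHg, and 100,000 patients per arm, analysed with a simple two-sided z-test.

```python
from math import sqrt
from statistics import NormalDist

# All values are hypothetical and chosen only to illustrate the point.
difference_mmhg = 0.5      # observed difference in mean reduction
sd_mmhg = 15.0             # common standard deviation
n_per_arm = 100_000        # patients in each treatment arm

standard_error = sd_mmhg * sqrt(2 / n_per_arm)
z = difference_mmhg / standard_error
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}, p = {p_value:.2e}")
print(f"Estimated difference: {difference_mmhg} mmHg")
```

The p-value is far below any conventional α, yet a difference of half a millimetre of mercury is unlikely to change how a patient is treated.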