Differential-Expression Analysis of Biological Data

Background. A wet lab conducted a series of biological measurements to discover which biological markers can be used to detect coronary artery disease and estimate its prognosis. Your task is to make a first selection of genes, metabolites and lipids that should be studied further. In particular, you should find which biomarkers have significantly different expression levels in different groups. For that, you should use an appropriate statistical test, such as the t-test or the Wilcoxon test, and apply an appropriate correction method for multiple hypothesis testing. There is also a way to prune the set of hypotheses before testing. Namely, biological background information suggests that errors in the measurement procedure and natural biological variability that is not condition-specific can cause fluctuations of up to 20--30% of the base levels. Differences between groups that stay within this range are therefore unlikely to be informative, and the corresponding biomarkers can be set aside before testing.
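The workflow described above (prune, test, adjust) could look roughly like the following sketch in GNU R. Everything in it is an assumption made for illustration: the data are simulated, the marker names m1 to m50 are made up, the 30% cut-off is just the upper end of the fluctuation range mentioned above, and the Welch t-test with Benjamini-Hochberg adjustment is only one of several reasonable choices.

```r
set.seed(1)
n <- 20                        # patients per group (made-up number)
markers <- paste0("m", 1:50)   # hypothetical marker names

# Simulated expression levels: rows = patients, columns = markers.
healthy  <- matrix(rnorm(n * 50, mean = 10, sd = 1), nrow = n,
                   dimnames = list(NULL, markers))
diseased <- matrix(rnorm(n * 50, mean = 10, sd = 1), nrow = n,
                   dimnames = list(NULL, markers))
diseased[, 1:5] <- diseased[, 1:5] + 5   # five markers shifted by about 50%

# Pruning: keep only markers whose group means differ by more than 30%
# of the base (healthy) level.
rel_change <- abs(colMeans(diseased) - colMeans(healthy)) / colMeans(healthy)
candidates <- names(rel_change)[rel_change > 0.30]

# Test only the remaining hypotheses. t.test() is Welch's t-test by default;
# wilcox.test() would be the rank-based alternative.
p_raw <- sapply(candidates,
                function(m) t.test(diseased[, m], healthy[, m])$p.value)

# Adjust for multiple testing. "BH" controls the false discovery rate;
# "holm" or "bonferroni" control the family-wise error rate instead.
p_adj <- p.adjust(p_raw, method = "BH")
selected <- names(p_adj)[p_adj < 0.05]
```

Pruning first reduces the number of hypotheses, so the adjustment in p.adjust is less severe for the markers that remain. Whether a cut-off of 20% or 30% is more suitable depends on the data and is left open here.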

Description of data. The data consist of four tables. The first table, patients.csv, contains a list of patient IDs and the corresponding attributes Gender, High.Blood.Pressure and Coronary.Artery.Disease. You should split the measurement data according to these attributes. In your analysis, you can use the fact that high blood pressure is known to be a major risk enhancer for coronary artery disease. The three remaining data sets contain expression levels of genes, metabolites and lipids: gene-expression.csv, metabolite-expression.csv and lipid-expression.csv. To be precise, each row in a file contains a patient ID, a measurement date and the corresponding expression levels. You can use the merge command in GNU R to join the patient data with the measurements, which significantly simplifies group formation.
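A minimal sketch of the join with merge is given below. The column names ID and Date, the marker names geneA and geneB, and all values are assumptions, since the actual headers of the files are not given here. Small data frames are built in place so that the example runs without the real files.

```r
# Toy stand-ins for patients.csv and gene-expression.csv (values invented).
patients <- data.frame(
  ID = c("P1", "P2", "P3", "P4"),
  Gender = c("F", "M", "F", "M"),
  High.Blood.Pressure = c(TRUE, FALSE, TRUE, FALSE),
  Coronary.Artery.Disease = c(TRUE, FALSE, FALSE, TRUE)
)
genes <- data.frame(
  ID = c("P1", "P2", "P3", "P4"),
  Date = c("2010-01-05", "2010-01-05", "2010-01-06", "2010-01-06"),
  geneA = c(5.1, 3.2, 3.0, 5.4),
  geneB = c(1.0, 1.1, 0.9, 1.2)
)

# With the real files one would read them first, for example:
# patients <- read.csv("patients.csv")
# genes    <- read.csv("gene-expression.csv")

# Join patient attributes to the measurements on the shared ID column
# (the name "ID" is an assumption about the file headers).
merged <- merge(patients, genes, by = "ID")

# Form groups for one marker by disease status.
groups <- split(merged$geneA, merged$Coronary.Artery.Disease)
```

After the merge, every measurement row carries the patient attributes, so groups for any combination of Gender, High.Blood.Pressure and Coronary.Artery.Disease can be formed with split or by logical indexing.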

Restrictions on reports. Describe on at most a single A4 sheet what kind of pre-processing and tests you did, which p-value adjustment methods you used, and which biomarkers should be studied further in your opinion. Besides the text, you should provide illustrations and visualisations to support your selection. For that you can use up to 3 pages. The figures should be large enough to remain readable after I have photocopied them twice. In particular, they should remain understandable in black and white.