ECTS credits: 3 ECTS
Course parameters:
Language: English
Level of course: PhD course
Time of year: 5 October 2020 to 9 October 2020. Please note that course dates may change due to the COVID-19 situation.
No. of contact hours/hours in total incl. preparation, assignment(s) or the like: Lectures and preparation 15 hours, exercises 12 hours, mentoring and reporting 45 hours.
Capacity limits: Due to the COVID-19 situation, a maximum of 12 students is currently allowed. Any further applicants will be put on a waiting list and admitted to the course if the COVID-19 situation permits.
Objectives of the course:
The objective of the course is to introduce intermediate-level skills in the R programming language and its IDE, RStudio. We will review three topics: i) data wrangling, using the tidyverse approach to data transformation and visualization; ii) exploratory data analysis, which is necessary for conducting proper statistical tests, together with the basic assumptions of linear models; and iii) statistical analysis with general and generalized linear and additive models, including an overview of mixed-effects models.
Learning outcomes and competences:
At the end of the course, the student should be able to:
Compulsory programme:
The course consists of four modules, in which students will work with real data and on their own projects. The schedule is designed to run over four days, leaving enough time at the end of the course to discuss the techniques used and to address any specific questions students may have.
For ECTS to be awarded, PhD students must take active part in all parts of the course.
Course contents:
Module I. Introduction to programming in R and RStudio
Introduction to workflow management with .R, .RData and .Rproj files
Data Wrangling I: “tidyverse – dplyr” package
Single table verbs (I)
Single table verbs (II)
Two-table verbs
Grouped operations
Piping
Functional programming
Data Wrangling II: “tidyverse – ggplot2” package
Advanced features of the graphics package “ggplot2”
Module II. Exploratory Data Analysis (EDA)
Variation and Co-Variation
Outlier detection
Outliers in one dimension
Outliers in two dimensions
Assumptions of Linear Models
Normality
Homogeneity of variance
Zero-Inflation
Collinearity
Relationship between y and x(s)
Interactions
Independence (Spatial and Temporal)
Module III. Univariate Statistical Analysis
General Linear Models (LM)
Simple & Multiple Linear Regression
Analysis of Variance (ANOVA) & Analysis of Covariance (ANCOVA)
Generalized Linear Models (GLM)
Generalized Linear Mixed Models (GLMM)
Additive and Generalized Additive Models (GAM)
Generalized Additive Mixed Models (GAMM)
Module IV. Student Work and Consultancy
Student work on (their own) projects
Prerequisites:
Basic skills in R, including basic notions of R programming and familiarity with the ggplot2 package. Some experience with statistical analysis.
Lecturer:
Antonio Canepa (OneMind-DataScience & University of Burgos)
Type of course/teaching methods:
Lectures and practical “hands-on” exercises with a final presentation and discussion session.
Literature:
All the necessary material and the references for each chapter will be given by Antonio Canepa during the course.
Course homepage:
Course assessment:
PhD students will be evaluated based on their active participation in all course elements and on the final discussion of the results.
Provider:
OneMind-DataScience
Special comments on this course:
Students are expected to bring their own computer with the latest versions of R and RStudio installed.
Time and schedule:
Place:
Department of Bioscience, Frederiksborgvej 399, 4000 Roskilde
Registration: Please register by sending an e-mail to Niels Martin Schmidt (nms@bios.au.dk)