1 German Version

1.1 Links

1.2 Notes on working through the sheet

Please answer the questions in an .Rmd file. You can create a new R Markdown file via File > New File > R Markdown.... You can delete the text below the setup chunk (from line 11 onward). You can also download our template file via this link (right-click > Save as…).
The information you need to complete the sheet is available on the course website.
Do not hesitate to search the internet for solutions. Searching effectively for solutions to R problems is a genuinely useful skill, and professionals work this way too. The best place to start is the R section of the programming platform Stackoverflow.
The RStudio website offers very helpful cheat sheets on many R-related topics. A good place to start is the Base R Cheat Sheet.

1.3 Resources

Because this is a hands-on exercise, we cannot introduce every useful command individually. Instead, here are pointers to resources you can consult while working on the tasks.

Resource	Description
Field, Chapter 4	Book chapter that introduces working with `ggplot2` step by step. Highly recommended!
ggplot2 Cheat Sheet	Overview of the most common ggplot2 commands
Peters ggplot2-Referenz	Peter maintains a large collection of example plots with the corresponding code: a resource for looking things up

1.4 Tip of the week

The keyboard shortcut Ctrl + Shift + M (Windows) or Cmd + Shift + M (Mac) inserts the pipe symbol %>% with a single keystroke.

1.5 Reading in data

Set a sensible working directory for this exercise sheet (usually the folder that contains your .Rmd file). Add a suitable line of code at the beginning of your .Rmd document.
Download the dataset mpg.csv and save it in your working directory (ideally you still have the folder from the last exercise sheet; in that case, save the dataset in the subfolder /data).
Open mpg.csv in a plain text editor and check which character separates the columns.
Load the tidyverse packages. Add a suitable line of code at the beginning of your .Rmd document.
Use read_delim() to read mpg.csv into an object named mpg_data. read_delim() is a generalized form of read_csv() in which the argument delim = "<separator>" lets you state manually which character separates the columns in your dataset.

1.6 Plot basics

We use the package ggplot2 to create plots. Because it is part of the tidyverse, you do not need to load it separately if you have already loaded the tidyverse (see section 1.5).

1.6.1 Creating the object

Plots in ggplot2 are almost always built following the same pattern. First, use ggplot() to create a plot object:

example_plot <- ggplot(data, aes(x = variable_1, y = variable_2))

As you can see, the first argument is the dataset you want to use. Then, inside aes(), you specify which variable goes on the x-axis and which on the y-axis.

1.6.2 Adding a graphical layer

Next, you put a layer of graphical elements (a so-called geom) on top of this object. This is how the plot gets “built”. You can stack as many layers as you like; they are always joined with +.

example_plot + geom_point()

1.6.3 Saving

There are two ways:

1.6.3.1 Via the viewer

You can use the “Export” button in the viewer.

1.6.3.2 Via syntax

To save a plot via syntax, first store the plot object together with the layers you have added:

example_plot <- example_plot + geom_point()

Then use ggsave() to save the plot. It is written to your working directory. Pass the desired file name in quotation marks first (do not forget the extension!), followed by the plot you want to save.

ggsave("example_plot.png", example_plot)

1.7 Scatterplot

Use ?mpg to display a description of the dataset.
Use ggplot() to create an object named displ_plot with mpg_data as the dataset. Put the engine displacement of the model on the x-axis and the city miles per gallon on the y-axis.
Use geom_point() to add a layer of points to this object.
Use geom_smooth() with the arguments method = "lm" and se = FALSE to add a regression line to the plot.
Try out what happens when you omit the two arguments from the previous task in geom_smooth().

1.8 Barplot

Create a new plot object named drv_plot with the following specifications: the dataset should again be mpg_data, the x-axis should show the type of drive train (front, rear or 4wd), and the y-axis should again show fuel economy in miles per gallon.
Add a summary layer to the plot with stat_summary(). In stat_summary(), use the argument fun = mean (fun.y = mean in ggplot2 versions before 3.3.0; fun.y has been deprecated since then) to tell ggplot that you want to display the mean of the observations on the y-axis. Also use the argument geom = "bar" to get a barplot.
Add the argument fill = drv to aes() in the definition of drv_plot to add some color to the barplot. Then run the command from the previous task again to update the plot.
Now add a label layer to the plot with labs(). In labs(), use the arguments x = "text", y = "text", title = "text" and fill = "text" to give the x- and y-axis meaningful names, the plot a title, and the color legend a label.
Give the bars a black outline by adding the argument color = "black" to stat_summary().

1.9 Line plot

Recreate the following plot. Hint: the points represent the mean values.

[Figure: target line plot; image not included]

Use stat_summary() to add a layer with a line connecting the points. Tip: use the argument aes(group = 1) in stat_summary() to tell ggplot that all points belong to a single group.
Use stat_summary() with the arguments fun.data = mean_cl_normal and geom = "errorbar" to add a layer of error bars showing the 95% confidence intervals of the means.
Use the argument width = 0.2 in stat_summary() to set the width of the error bars.
Add meaningful axis labels and a title to the plot.

1.10 Histogram

Use geom_histogram() to display a histogram of miles per gallon (city).
Page 2 of the ggplot2 cheat sheet (bottom, center right) gives an overview of various themes you can use to polish your plots. Give your histogram a theme you like.

1.11 Recreating a plot

Use the dataset text_messages.dat to recreate the plot shown below. To do so, you first have to read it in and then convert it from wide to long format. The dataset contains the results of a grammar test at baseline and at a 6-month follow-up. There were two groups, which are defined in the variable Groups.

[Figure: target plot; image not included]

Tips:

The columns in the dataset are separated by tabs. In R, a tab is coded as "\t".
Besides reading in the dataset, you also have to convert it from wide to long format. Think of the command gather() here.
Remember to specify suitable groupings for the layers. You can use variables for grouping.
Take a look at the ggplot2 cheat sheet for the commands geom_point() and geom_line(). You can also use their arguments in stat_summary() once you have specified geom = "point" or geom = "line" there.

1.11.1 As a barplot

Now try to show the same information, including error bars, as a barplot.

Tips:

In barplots, fill sets the color of the bar (the filling) and color sets the color of the outline.
The argument position = "dodge" can help when bars overlap.
The argument position = position_dodge(width = .90) can help to align narrow layers with wider layers.

1.12 Rendering

Now render (knit) the document with the keyboard shortcut Ctrl + Shift + K (Windows) or Cmd + Shift + K (Mac). If that works: well done! If not, look at the error message and pay particular attention to the lines of your syntax that it mentions. Find the error and try again!

1.13 References

Note: These exercise sheets are based in part on exercises from the textbook Discovering Statistics Using R (Field, Miles & Field, 2012). They were modified for the purpose of this exercise, and the R code used was updated.

Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. London: SAGE Publications Ltd.

2 English Version

Exercise sheet in PDF format for printing: sheet_plots.pdf

2.1 Some hints

Please give your answers in an .Rmd file. You can create one from scratch via the menu File > New File > R Markdown.... You can delete the text below the setup chunk (from line 11 onward). Alternatively, you can download and use this sample Rmd.
You may need the information on the start page of this course.
Don’t hesitate to search the web for solutions. Searching effectively for solutions to R problems is a very useful skill, and professionals do it too. A really good starting point is the R section of the programming platform Stackoverflow.
You can find very useful cheat sheets for various R-related topics. A good starting point is the Base R Cheat Sheet.

2.2 Resources

This is a hands-on course, so we cannot present every useful command in detail. Instead, we give you links to useful resources where you can find hints to help you with the exercises.

Resource	Description
Field, Chapter 4	Book chapter with a step-by-step introduction to `ggplot2`. Highly recommended!
ggplot2 Cheat Sheet	Overview of the most common ggplot2 commands
Peters ggplot2-Referenz	Peter offers a large collection of plots with the source code that generates them. A resource for finding working examples.

2.3 Tip of the week

You can insert the pipe symbol %>% using the shortcut Ctrl + Shift + M (Windows) or Cmd + Shift + M (Mac).

2.4 Read data

Define an appropriate working directory for this exercise sheet. This should usually be the folder where your Rmd file is located. Insert a command that does this at the beginning of your Rmd document.
Download the data file mpg.csv and store it in your working directory. You might still have the folder you used for the last sheet; if so, store the data in its data subdirectory.
Open the file mpg.csv in a text editor and check which character separates the columns in each line.
Make sure that the tidyverse packages are loaded. Insert a line of code for that at the beginning of your Rmd file.
Use the command read_delim() to read the file mpg.csv into a data object named mpg_data. read_delim() is a generalized version of read_csv() that lets you specify the separator used in your data file manually by setting the argument delim = "<separator>". For example, a tab is specified by "\t".
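As a rough sketch of the reading step, the snippet below writes a tiny semicolon-separated file to a temporary location and reads it back with read_delim(). The file contents, the semicolon separator, and the name toy_data are made up for illustration; check your own mpg.csv for the separator it actually uses.

```r
library(readr)  # part of the tidyverse; library(tidyverse) loads it as well

# Write a small stand-in file so the example runs without mpg.csv.
# The semicolon separator is an assumption made for this sketch.
toy_file <- tempfile(fileext = ".csv")
writeLines(c("model;displ;cty", "a4;1.8;18", "a4;2.0;21"), toy_file)

# delim names the character that separates the columns
toy_data <- read_delim(toy_file, delim = ";")
toy_data
```

With your own file, only the path and the delim value should need to change.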

2.5 Plot basics

We use the package ggplot2 for creating plots. It is part of the tidyverse, so we do not have to load it separately once the tidyverse is loaded (see section 2.4).

2.5.1 Create plot object

Generating plots always follows the same pattern. We create a plot object using the command ggplot():

example_plot <- ggplot(data, aes(x = variable_1, y = variable_2))

As we can see, the first argument is the data object to use. Inside aes() we define which variables are assigned to the x-axis and the y-axis, respectively.

2.5.2 Add a layer

The next step is to add a layer of graphical elements, a so-called geom, to the object. This is how plots are constructed. We can add as many layers as we like. They are always connected using +.

example_plot + geom_point()
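Put together, the two steps might look like this. The sketch uses the mpg dataset that ships with ggplot2 as a stand-in for your own data, and the variables displ and hwy are picked only for illustration.

```r
library(ggplot2)

# Step 1: create the plot object (data plus axis mapping, no layers yet)
example_plot <- ggplot(mpg, aes(x = displ, y = hwy))

# Step 2: add a layer of points; printing the result draws the plot
example_plot + geom_point()
```

Note that example_plot itself is unchanged by the second line, because the result of the + was not assigned to a name.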

2.5.3 Saving

There are two variants:

2.5.3.1 Via menu

You can use the “Export” button in the viewer.

2.5.3.2 Via syntax

To save a plot object with all its layers, we first have to store the complete plot object:

example_plot <- example_plot + geom_point()

Then we use ggsave() to save the plot object. The file is stored in your working directory. Give the name of the plot file in quotation marks (including the extension) as the first argument. The second argument is the plot object to save.

ggsave("example_plot.png", example_plot)
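A minimal sketch of saving via syntax, again using ggplot2's built-in mpg data as a stand-in. It writes to a temporary folder so that nothing lands in your working directory; the width and height values are arbitrary choices for this example.

```r
library(ggplot2)

# Store the complete plot, layers included, under one name
example_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Build a file path in a temporary folder; the extension picks the format
out_file <- file.path(tempdir(), "example_plot.png")

# width and height are given in inches by default
ggsave(out_file, example_plot, width = 6, height = 4)
file.exists(out_file)
```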

2.6 Scatterplot

Use the command ?mpg to get a description of the data we use.
Create a ggplot object named displ_plot that uses mpg_data as its dataset. The x-axis should show the engine displacement of the model. The y-axis should show the city miles per gallon.
Use geom_point() to add a layer to this object that shows a point at each pair of values of the two variables.
Use geom_smooth() with the arguments method = "lm" and se = FALSE to add another layer with a regression line.
Try what happens if you leave out these two arguments in your geom_smooth() command.
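The two geom_smooth() variants can be compared side by side. This sketch deliberately uses highway mileage (hwy) from ggplot2's built-in mpg data rather than the variables asked for in the task, so treat it as an illustration of the technique, not as the solution.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Straight regression line, no confidence band
lm_plot <- base_plot + geom_smooth(method = "lm", se = FALSE)

# Defaults: ggplot2 chooses a smoother itself and draws a confidence band
default_plot <- base_plot + geom_smooth()

lm_plot
default_plot
```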

2.7 Barplot

Create a new plot object named drv_plot with the following specifications: The dataset should be mpg_data again. The x-axis should show the type of drive train drv (front, rear or 4wd). The y-axis should again show fuel economy in miles per gallon.
Add a summary layer using stat_summary(). In stat_summary(), use the argument fun = mean (fun.y = mean in ggplot2 versions before 3.3.0; fun.y has been deprecated since then) to tell ggplot that you want to show the mean of the observed data on the y-axis. Also use the argument geom = "bar" to get a barplot.
Add the argument fill = drv to aes() in the definition of drv_plot to color the barplot. Rerun the command from the previous subtask to update the plot.
Add a label layer to the plot using the command labs(). In labs(), set the arguments x = "text", y = "text", title = "text" and fill = "text" to give the axes meaningful names, add a title, and label the color legend.
Add a black outline to the bars using the argument color = "black" in stat_summary().
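A sketch of the same building blocks on a different grouping variable (class instead of drv, so the task itself stays open). It assumes ggplot2 3.3.0 or later, where the argument is called fun; the data is ggplot2's built-in mpg, and the label texts are placeholders.

```r
library(ggplot2)

class_plot <- ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  # one bar per class, bar height = mean of hwy, black outline
  stat_summary(fun = mean, geom = "bar", color = "black") +
  labs(
    x = "Vehicle class",
    y = "Mean highway miles per gallon",
    title = "Fuel economy by vehicle class",
    fill = "Vehicle class"
  )

class_plot
```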

2.8 Lineplot

Create the following plot.

Hint: points show mean values.

[Figure: target line plot; image not included]

Use the command stat_summary() to add a layer that connects the points with a line. Tip: use the argument aes(group = 1) in stat_summary() to tell ggplot that all points belong to one single group.
Use the command stat_summary() with the arguments fun.data = mean_cl_normal and geom = "errorbar" to add a layer with error bars that show the 95% confidence interval around the means.
Use the argument width = 0.2 in stat_summary() to set the width of the error bar whiskers.
Add meaningful labels for the axes and an appropriate title.
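mean_cl_normal relies on the Hmisc package being installed. If it is not available, the interval can be computed by hand, which also shows what the error bars mean. The helper ci95() below is a hypothetical stand-in written for this sketch (mean plus or minus a t-based 95% margin), and class and hwy from ggplot2's built-in mpg data are placeholder variables.

```r
library(ggplot2)

# Hypothetical helper: mean and t-based 95% confidence limits,
# returned in the y / ymin / ymax form that fun.data expects
ci95 <- function(x) {
  m <- mean(x)
  half <- qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
  data.frame(y = m, ymin = m - half, ymax = m + half)
}

line_plot <- ggplot(mpg, aes(x = class, y = hwy)) +
  stat_summary(fun = mean, geom = "point") +
  # group = 1 puts all means into one group so a single line connects them
  stat_summary(fun = mean, geom = "line", aes(group = 1)) +
  stat_summary(fun.data = ci95, geom = "errorbar", width = 0.2)

line_plot
```

Swapping ci95 for mean_cl_normal should give the same kind of layer once Hmisc is installed.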

2.9 Histogram

Use geom_histogram() to get a histogram of miles per gallon (city).
On page 2 of the ggplot2 cheat sheet (bottom, center right) you will find an overview of various themes that can make your plots look nicer. Apply a theme you like to your histogram.
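For instance, a histogram with a theme applied could be sketched as follows. It plots highway mileage from the built-in mpg data instead of the city values the task asks for; binwidth = 2, the colors, and theme_minimal() are arbitrary picks.

```r
library(ggplot2)

hist_plot <- ggplot(mpg, aes(x = hwy)) +
  # only x is mapped; the bar heights are counts computed by ggplot2
  geom_histogram(binwidth = 2, color = "black", fill = "grey70") +
  theme_minimal()

hist_plot
```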

2.10 Reconstruct plot

Use the dataset text_messages.dat to reconstruct the plot below. You first have to read in the data and then convert them from wide to long format. The data are the results of a grammar test at baseline and at a 6-month follow-up. There are two groups, which are defined in the variable Groups.

[Figure: target plot; image not included]

Tips:

The delimiters used in the data file are tabs. In R, a tab is coded as "\t".
Reading in the data is not enough; you also have to convert it from wide to long format. Remember the command gather() in this context.
Remember to set suitable groupings for the layers. You can use variables for grouping.
See the ggplot2 cheat sheet for the commands geom_point() and geom_line(). Their arguments can also be used in stat_summary() if you specified geom = "point" or geom = "line" there.
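The wide-to-long step might look like the sketch below. The toy data and its column names (Group, Baseline, Six_months) are invented for illustration and may not match text_messages.dat; inspect your own file first.

```r
library(tidyr)

# Toy data in wide format: one row per person, one column per time point
wide <- data.frame(
  Group      = c("A", "A", "B", "B"),
  Baseline   = c(52, 68, 85, 47),
  Six_months = c(32, 48, 62, 16)
)

# gather() stacks the two score columns into key/value pairs:
# 'Time' holds the former column name, 'Score' holds the value
long <- gather(wide, key = "Time", value = "Score", Baseline, Six_months)
long
```

gather() still works, but newer tidyr versions recommend pivot_longer() for the same job.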

2.11 As Barplot

Try to visualize the same information, including error bars, as a barplot.

Tips:

For barplots, fill specifies the color of the bar (the filling) and color specifies the color of the outline.
The argument position = "dodge" may help if the bars overlap.
The argument position = position_dodge(width = .90) may help to align narrower layers with wider layers.
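A sketch of dodged bars with aligned error bars, using the built-in mpg data with drv and year as placeholder grouping variables. mean_se is used here because it ships with ggplot2; it draws the mean plus or minus one standard error, not a 95% confidence interval.

```r
library(ggplot2)

dodge <- position_dodge(width = 0.9)  # same offset for bars and error bars

dodged_plot <- ggplot(mpg, aes(x = drv, y = hwy, fill = factor(year))) +
  stat_summary(fun = mean, geom = "bar", color = "black", position = dodge) +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               width = 0.2, position = dodge) +
  labs(fill = "Model year")

dodged_plot
```

Passing the same position object to both layers is what keeps the narrow error bars centered on the wider bars.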

2.12 Rendering

Render or knit your Rmd file using the shortcut Ctrl + Shift + K (Windows) or Cmd + Shift + K (Mac). If that works: well done! If not, look at the error message, in particular at the lines of your syntax where the error occurred. Correct the error and try again.

2.13 Literature

Note: This exercise sheet is based in part on the exercises from the textbook Discovering Statistics Using R (Field, Miles & Field, 2012). We modified it for the purpose of this exercise and updated the R code.

Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. London: SAGE Publications Ltd.

Version: 16 April, 2021 22:38

Übungszettel Plots und Grafiken - Exercise sheet plots and graphics

M.Psy.205, Lecturer: Dr. Peter Zezula

Johannes Brachem (johannes.brachem@stud.uni-goettingen.de)
