32 Home Exercises

Instructions

The at-home exercises should be completed using Posit Cloud, Home Exercise 6. Create a new .Rmd file (use File -> New File -> R Markdown).

I advise saving the new file straight away. Because you’ll submit this file as an assignment, use a standardised name: lastname_firstname5. Regularly re-save the file.

When you have finished the exercise (or part of it), ‘knit’ the file. Export the .Rmd file and the knitted .html file together as a .zip file, and upload this to the assignment area.

Task

Using the metadata dataset from the movie dialogue dataset we used previously (meta_data7.csv), chart the average gross per year using an appropriate geom.
1. Limit the x axis of the chart to the years 1990 - 2000. Do this using the scale, and not by filtering the data first.
2. Set the axis labels on your chart to display every two years beginning from 1990 (1990, 1992, etc.)
3. Set the fixed color (not mapped to data) of the geom of the chart to something other than the default. Here is a good list of all the named colors in R. You can also use any hex code, which you can find with a color picker.
Create a bar graph of the gross takings for each movie in the Star Wars franchise.
1. Reorder the chart going from the highest to the lowest takings (see this section in the course book for tips)
2. Flip the coordinates of the chart so that the movie names are on the vertical axis.
3. (optional) Give the bars a black outline, set a custom fill colour, and make the bars slightly transparent.
Import the Nobel Prize winners ‘Laureates’ dataset (laureates_df.csv) and the ‘Prizes’ dataset (prize_df.csv). These datasets contain information on the winners of the Nobel Prize, up until 2024.
1. Create a bar chart showing the number of winners from each continent per year. You’ll need to join the two datasets together using the relevant column so that you have the information on the winner continent and the prize date in the same dataframe.
2. Change the palette for the fill of the bars to ‘Dark2’.
3. Set the major breaks on the y axis to every 20 and the minor breaks to every 5.
Import the Titanic dataset (titanic_df_full.csv) and do the following:
1. Calculate the average fare paid for each embarking location, and draw it as a bar chart, with the embarking location on the x axis and the average fare on the y axis.
2. Change the default fill scale of the bars to a gradient starting with the colour aquamarine and ending with the colour deepskyblue.
3. Reverse the legend bar and set the height to 10 cm.
(optional, slightly more challenging). Last week we looked at gender in the IMDB dataset. Can you create a visualisation which investigates which genres became more or less equal over time? Limit your investigations to a sensible number of the top genres to help with visualisation, or group the genres together into 6 or 7 categories. Once you’re done, change 5 scale elements.

Tip

The IMDB genre dataset lists up to three genres for each movie, separated by commas. This isn’t very helpful for data analysis or visualisation - let’s assume it would be better to treat each movie as having a separate observation for each genre listed. One way to do this is by using some new functions: separate() and pivot_longer(). You may also need to clean the results slightly using str_trim().
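As a rough illustration of the approach described in this tip, here is a minimal sketch on a small made-up data frame. The data, the column names (title, genres) and the new column names (genre_1 to genre_3, genre_position, genre) are all assumptions for the example, not the names used in the real IMDB file, so you will need to adapt them.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data standing in for the IMDB genre dataset.
# The column names `title` and `genres` are assumptions.
movies <- tibble(
  title  = c("Movie A", "Movie B", "Movie C"),
  genres = c("Action, Adventure, Sci-Fi", "Comedy, Romance", "Drama")
)

movies_long <- movies %>%
  # Split the comma-separated string into up to three columns.
  # fill = "right" pads movies with fewer than three genres with NA.
  separate(genres,
           into = c("genre_1", "genre_2", "genre_3"),
           sep = ",",
           fill = "right") %>%
  # Stack the three genre columns so each movie-genre pair is one row,
  # dropping the NA padding created in the previous step.
  pivot_longer(cols = starts_with("genre_"),
               names_to = "genre_position",
               values_to = "genre",
               values_drop_na = TRUE) %>%
  # Remove the leading space left behind by the ", " separator.
  mutate(genre = str_trim(genre))

movies_long
```

After these steps each movie appears once per genre, so the genre column can be counted, grouped, or mapped to an aesthetic like any other variable. Note that a movie with three genres now contributes three rows, which matters if you later count movies rather than movie-genre pairs.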