Lab 2: Julia Quickstart
Functions, Logic, and Packages
First steps
We start by loading the packages we will use in this lab.
Defining a function
In index.qmd, we read in a CSV file from scratch. However, we’d like to repeat this process for each year of data, and to do it in a consistent way so that we can read in the data for all available years into a single file. To do this, we’ll write a function that we can use to read in the data for any year. Specifically, our function will take in the year as an argument, and return a DataFrame with the data for that year.
Before we do that, let’s define a function that will return the filename for a given year. It’s often valuable to stack several functions together.
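As a rough illustration of stacking small functions, here is a minimal sketch. The directory name and the file naming pattern are hypothetical placeholders, not the actual layout of the lab's data; the real get_fname should match wherever the CSV files are stored.

```julia
# Hypothetical helper: the folder where the yearly CSV files live (assumed name)
data_dir() = "data"

# Build the filename for a given year by stacking on top of data_dir().
# The "tides_<year>.csv" pattern is an assumption made for illustration only.
get_fname(year::Int) = joinpath(data_dir(), "tides_$(year).csv")

fname = get_fname(1928)
```

Keeping the path logic in its own small function means read_tides does not need to know how files are named, so a change in the naming scheme only touches one place.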
Now we’re ready to define our function:
```julia
function read_tides(year::Int)

    # define the CSV file corresponding to our year of choice
    fname = get_fname(year)

    # a constant, don't change this
    date_format = "yyyy-mm-dd HH:MM"

    # <YOUR CODE GOES HERE>
    # 1. read in the CSV file and save as a dataframe
    # 2. convert the "Date Time" column to a DateTime object
    # 3. convert the " Water Level" column to meters
    # 4. rename the columns to "datetime" and "lsl"
    # 5. select the "datetime" and "lsl" columns
    # 6. return the dataframe
end

# print out the first 10 rows of the 1928 data
first(read_tides(1928), 10)
```
Fill out this function. Your function should implement the six steps indicated in the instructions. Use the example code from index.qmd to help you. When it’s done, convert it to a live code block by replacing ```julia``` with ```{julia}```. When you run this code, it should print out the first 10 rows of the 1928 data. Make sure they look right!
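For step 2, it may help to see how the date_format string behaves on its own. The sketch below uses only the Dates standard library; the timestamp strings are made up for illustration and are not taken from the data files.

```julia
using Dates

# Same format string as in read_tides
date_format = "yyyy-mm-dd HH:MM"

# Made-up timestamps in that format (illustrative only)
raw = ["1928-01-01 00:00", "1928-01-01 13:00"]

# Parse a single string into a DateTime
single = DateTime(raw[1], date_format)

# Parse every string at once; the dot applies DateTime elementwise
parsed = DateTime.(raw, date_format)
```

The same elementwise call should work on a column of strings pulled out of a DataFrame, since a column is just a vector.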
Building the dataset
Now that we have the ability to read in the data corresponding to any year, we can read them all in and combine into a single DataFrame. First, let’s read in all the data.
- Hint: to vectorize a function means to apply it to each element of a vector. For example, f.(x) will apply the function f to each element of the vector x. This is a very common operation in Julia!
- Update the code blocks below, then replace ```julia``` with ```{julia}```.
```julia
years = 1928:2021 # all the years of data
annual_data = # call the read_tides function on each year (see hint above!)
typeof(annual_data) # should be a vector of DataFrames
```
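To make the dot syntax from the hint concrete, here is a small sketch with a toy function. The function label is a hypothetical stand-in used only for illustration; it returns a string instead of reading any data.

```julia
# Toy stand-in for a function that takes a year (hypothetical, not read_tides)
label(year::Int) = "year-$(year)"

# A short range of years
years = 1928:1930

# The dot applies label to each element and collects the results in a vector
labels = label.(years)
```

Here labels is a vector with one entry per year, which is the same shape of result the exercise asks for.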
Next, we’ll use the vcat function to combine all the data into a single DataFrame.
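The idea behind vcat can be sketched with plain vectors, which keeps the example free of package dependencies. DataFrames.jl also defines vcat for data frames, so the same pattern should carry over to a vector of DataFrames, though that is not shown here.

```julia
# A vector of smaller vectors, standing in for one chunk of data per year
chunks = [[1, 2], [3], [4, 5, 6]]

# Splat the chunks into vcat to stack them end to end
combined = vcat(chunks...)

# reduce gives the same result without splatting a long argument list
combined_reduce = reduce(vcat, chunks)
```

For a long list of chunks, the reduce form avoids passing many separate arguments to a single call.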
We can then look at the last 5 rows.
Finally, we’ll make sure we drop any missing data.
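As a sketch of what dropping missing data means, the example below uses a plain vector and Base Julia's skipmissing. The values are made up. For a DataFrame, DataFrames.jl provides dropmissing, which removes rows containing missing values; that call is not shown here.

```julia
# Made-up water levels with gaps marked as missing
levels = [1.2, missing, 0.9, missing, 1.5]

# skipmissing iterates over the non-missing values; collect makes a vector
clean = collect(skipmissing(levels))
```

After this step the element type no longer allows missing, so numeric functions such as sum work without special handling.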
Plots
- Plot the hourly water levels for March 2020, using subsetting and plotting techniques from the instructions.
- In the instructions, we plotted the average monthly water level from each month using groupby. Repeat this analysis, using the full dataset (all years).
- Now repeat the analysis, but group by day of the year. What do you notice? (Hint: use Dates.dayofyear to get the day of the year from a DateTime object)
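The date handling behind these tasks can be sketched with the Dates standard library alone. The timestamps below are generated for illustration rather than read from the tide data, and the variable names are hypothetical.

```julia
using Dates

# Generated hourly timestamps from late February to early April 2020
times = collect(DateTime(2020, 2, 28):Hour(1):DateTime(2020, 4, 2))

# Keep only the timestamps that fall in March 2020
march = filter(t -> year(t) == 2020 && month(t) == 3, times)

# Day of the year for the first March timestamp (2020 is a leap year)
doy = Dates.dayofyear(first(march))

# Day of the year for every March timestamp, via the dot syntax
doys = Dates.dayofyear.(march)
```

The same filter condition and the dayofyear call can be applied to the datetime column of the full dataset before grouping.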