This was my second RStudio Conference following last year’s edition in San Diego! In addition, at Tidyverse Developer Day I got a really cool chance to work on issues and contribute to making the Tidyverse better. This post won’t be a complete overview of the talks at the conference (others have already released some good blog posts on that note: Julia Silge, Brooke Watson, Zev Ross, etc.) and will be more of a reflection on how I contributed to the Tidyverse at #TidyverseDevDay and how I felt being at the conference.
As usual I collected a bunch of hex stickers at this conference, many of which I already own… I seem to have a weird thing about collecting them but never using them (from Twitter I can see this isn’t exclusive to me, however). Speaking of hex stickers… for Tidyverse Developer Day each participant got a shiny Tidyverse hex medallion!
(Photos: front side and back side of the medallion.)
Too bad it’s not like the Dumbledore’s Army galleon, or Hadley could just send a covert message to all the participants, like “TidyDevs, Assemble!” Maybe next year, I suppose.
Anyways, let’s get started!
Tidyverse Developers’ Day
The day following RStudio::Conf, the lucky people who got a ticket gathered at the “Sunset Room” to contribute to the Tidyverse. After we grabbed a quick coffee and breakfast taco, Hadley gave a short introduction outlining exactly how the day was going to go, and then we all got to work. There was a large list of tagged issues ready for us, but we could also choose our own and ask an RStudio member to tag it as “tidy-dev-day” for us.
After finding an issue I wanted to work on, here was my basic workflow:
- Fork the repo of the package I needed to work on.
- Go into RStudio: File > New Project > Version Control > Git > Paste the .git link from your forked repository on GitHub (click on the big green “Clone or download” button)
- Once you’ve opened up the project, make sure to create a new branch through the Git tab (click on the icon with two purple boxes next to the gear icon)
- You’re all set to start coding!
There is another way to do most of this through the Git Bash terminal, which you can learn from Tony’s blog post here.
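For anyone who prefers the terminal, the RStudio steps above can be roughly sketched in shell. This is only a minimal sketch and not necessarily the approach from Tony’s post: the helper name clone_and_branch is made up for illustration, and the fork URL and branch name in the usage comment are placeholders you would swap for your own.

```shell
# clone_and_branch FORK_URL BRANCH [DIR]
# Hypothetical helper (not part of git): clone your fork, then create and
# switch to a fresh branch for the issue you want to work on.
clone_and_branch() {
  fork_url="$1"
  branch="$2"
  # Default directory name: last path component of the URL without ".git"
  dir="${3:-$(basename "$fork_url" .git)}"

  git clone "$fork_url" "$dir" || return 1
  git -C "$dir" checkout -b "$branch"
}

# Usage (placeholder URL and branch name):
#   clone_and_branch "https://github.com/<your-username>/<package>.git" issue-123-add-examples
```

After this you can open the cloned folder as a project in RStudio and carry on from the Git tab as usual.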
An important thing that I learned was that it’s good practice to create a separate branch for each Pull Request when you’re working on different issues in the same package! In terms of labeling the branches you’re working on, Claus Wilke recommends the format “issue-issue#-brief-description-of-an-issue”.
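To make that naming format concrete, here is a small shell sketch that builds a branch name from an issue number and a short description. The function name issue_branch_name and the example issue number are my own inventions for illustration, and the lower-casing and hyphenating are just one reasonable reading of the format.

```shell
# issue_branch_name ISSUE_NUMBER DESCRIPTION...
# Hypothetical helper: prints a branch name of the form
# issue-<number>-<brief-description>, lower-cased and hyphenated.
issue_branch_name() {
  issue="$1"
  shift
  # Lower-case the description, turn every run of non-alphanumeric
  # characters into a single hyphen, then trim hyphens at both ends.
  slug=$(printf '%s' "$*" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-*//; s/-*$//')
  printf 'issue-%s-%s\n' "$issue" "$slug"
}

issue_branch_name 42 "Add examples to docs"
# prints: issue-42-add-examples-to-docs
```

You could then create the branch with something like git checkout -b "$(issue_branch_name 42 "Add examples to docs")".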
The main things I focused on were improving documentation and providing additional examples. For these tasks I found it important to do a lot of research first. Thankfully I was able to find many Stack Overflow posts of people explaining the issues that I wanted to write about as well as #rstats blog posts/tutorials that could provide me with ideas on how to phrase things and write good small examples!
When you’re changing documentation in a package, it’s important to make sure you run the function that regenerates the documentation so that your changes take effect. Don’t forget to run R CMD check as well (the “Check” icon in the Build tab). After you’re done with all of that, it’s time to commit, push, and then create a pull request (PR) to merge your proposed changes into the master branch!
When you write a commit message you can use a hashtag followed by a number to refer to issues in the GitHub repo, and you can also use a number of keywords to close these issues automatically (in our case when the PR is merged).
If you check GitHub you can see that it automatically prepended the repo name and a link to the issue being referenced.
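As a rough illustration of such a commit message: GitHub recognizes keywords like close, closes, fix, fixes, resolve, and resolves when they are followed by an issue reference such as #123. The helper below is a hypothetical wrapper I made up for this sketch, and the summary text and issue number in the usage comment are placeholders.

```shell
# commit_closing_issue SUMMARY ISSUE_NUMBER
# Hypothetical helper: commit the staged changes with a summary line plus a
# second paragraph containing a closing keyword and the issue reference.
commit_closing_issue() {
  summary="$1"
  issue="$2"
  # Each -m adds a separate paragraph to the commit message.
  git commit -m "$summary" -m "Closes #$issue"
}

# Usage (placeholder summary and issue number), after staging with git add:
#   commit_closing_issue "Add examples to function documentation" 123
```

The issue is only closed once the commit lands in the repository’s default branch, which for a contribution like this happens when the PR is merged.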
Then I waited to see if those changes were approved or if there were still a few things that needed changing:
OK! After reading the comments from Hadley (!) and Lionel (!) I went back into my branch in RStudio and made the requested fixes. When I committed and pushed to my forked repo again, the new commits were automatically tracked in the PR. I usually use the commit message “edits to comply with PR review” when pushing again.
There we go, I have now officially contributed to the Tidyverse!
Another good resource for contributing to open source is Nic Crane’s step-by-step blog post (she also presented at RStudio::Conf on building a Shiny app for genomic medicine), as is the #TidyverseDevDay blog post by Tony ElHabr (whom I actually met at the Sports Analytics Birds-of-a-Feather session!).
I felt this was a fantastic opportunity for people of all skill levels to experience contributing to open source. The RStudio team were very helpful, buzzing around the event space, and for those extremely new to programming or git it was a valuable lesson, as you were guided through the process from start to finish. As for myself, I have previous experience contributing to open source packages as well as creating, testing, and bug-squishing R packages at my workplace, but this was a great way for me to give back to the R community in what little way I could. I actually still have a few more issues from Tidyverse Developer Day that are a work in progress and I hope to continue contributing in the future!
To get to Austin I had a long flight in from Japan with a 5-hour layover in Minneapolis. Bored, I decided to do some #TidyTuesday to pass the time. It turned out alright in the end, but jet lag does not make for very interpretable code… While I’m still on the topic of #TidyTuesday… apparently Thomas Mock had some TidyTuesday hex stickers, but unfortunately I couldn’t get my hands on them!
Here were some of my highlights from Day One:
An awesome #DataForGood type of presentation by Brooke Watson who talked about using R to tidy data on families separated at the US border.
Tyler Morgan-Wall on rayshader: I’ve been casually keeping up with developments on Twitter but I was still wowed by the presentation, especially the 3D printing. If I had that kind of tech when I was a kid I would’ve won ALL the science fairs with the most realistic looking baking soda volcano!
Thomas Pedersen came out with another gganimate presentation showing all the new features introduced since his last gganimate talk at useR! 2018. This is definitely a talk that you need to watch for all the examples! (Slides)
All in all, Day One was great, but I was still pretty exhausted from my long trip so I didn’t get to talk to as many people as I would have liked.
Day Two began with a great talk on teaching programming by Felienne. Her talk was so good that I only realized she hadn’t said anything about R after she finished! My biggest take-away from her was “You don’t become an expert by doing expert things!”, which I agreed with as a self-taught R user. For me it was really about starting with the basics, integrating what I already knew outside of R into what I did with R (e.g. bringing my love of soccer into creating World Cup animations with gganimate), and incrementally building up my skills through reading blog posts and tutorials.
One of the most informative talks from my perspective was by Jim Hester on dependencies. He talked about how “not all dependencies are created equal” due to differences between dependencies in install times, package sizes, and system requirements. He also talked about the “illusory superiority” problem every package developer faces: overestimating their own abilities while underestimating the probability of introducing new bugs by adding dependencies. To address these concerns Jim introduced the itdepends package, which acts as a toolbox for dependency decision-making. This package allows one to assess usage, measure weights, visualize proportions, and assist in the removal of dependencies through a series of dep_*() functions. As I help develop and maintain all the R packages that my NGO uses for data processing/visualization, this talk and package will be extremely useful for me to do some code “auditing” and find ways to reduce technical debt.
Several other highlights from Day Two were:
Jesse Sadler talked about tackling problems dealing with accounting/inheritance data from 16th Century Europe using R. Along the way he created the debkeepr package to help himself analyze non-decimal currencies! (Slides)
On Day Two I mustered up the courage and energy to go to two different Birds-of-a-Feather sessions, Public Sector/Government and Sports Analytics. At lunch I was able to meet R users from places like the Federal Reserve and the Federal Aviation Administration. I heard stories about how hard it was to convince people, especially non-technical higher-ups, to give them the green light to switch to R, as well as more recent success stories of running workshops and tutorials within their departments. Even though I work for an NGO I felt comfortable talking to these people and it was a great way to exchange knowledge with people in a somewhat similar industry (especially since I was unable to attend the “Data for Good” Birds-of-a-Feather session). The shadow that hung over a lot of the people I met was that they were unable to work due to the government shutdown; I can only hope that the conference provided some good cheer and that they can get back to work soon.
During the afternoon break, the Sports Analytics Birds-of-a-Feather session was held in the main conference lounge area. While I was there I finally got to meet Mara in person for the first time and I had an enjoyable time talking with her and the surrounding group of baseball and hockey team analysts about the latest trends and topics like fantasy sports and analytics in the betting industry. Overall these Birds-of-a-Feather groups were a great way to mingle with people in industries you’re interested in, but I thought it was a shame that some were longer or shorter depending on which slot the event happened in. Understandably it is quite hard to schedule so many different groups equally, but maybe a dedicated “industry” session block could be worked in next year?
To wrap the conference up, David Robinson gave a great keynote on spending time contributing to open source, or “public work”. Whether it’s answering questions on Stack Overflow or Twitter, writing up a blog post, or giving a talk at a conference/meetup, David talked about the many ways to contribute to the knowledge pool, not just in R and data science but also in your own research domain. His words really resonated with me, as he was the one who, about a year and a half ago, gave me the confidence to start my own blog and share my stuff with the #rstats community. Since then I got a job doing R stuff and even gave a talk at the TokyoR meetup last summer! One of my goals for this year is to try to do a talk in Japanese, while a long-term goal is to present at one of the big R conferences. (Slides)
Throughout the conference I managed about four to five hours of sleep on average, which seems to have been the case for other people as well.
For me it was mostly jet lag but also I was kept up by looking up all the cool stuff I learned and how I could apply it at work and for my own personal projects… well, and looking up taco places to eat at on the next day too!
This was the conference where I’ve talked to the most people so far, as I’ve slowly gained confidence in working in R and being a member of this community. I was even recognized by some people for my soccer-related blog posts, which is a first! I almost feel stupid for being rather timid in the past and I want to try to be more outgoing at future conferences (possibly useR! in Toulouse this year)!
I already grabbed a SuperFan ticket for next year, so I hope to see some old and new faces in San Francisco. It’s going to be nice to go back to the Bay!