This past weekend was the 9th JapanR Conference hosted at LINE Corporation in Tokyo, Japan!

I’ve been back in Japan for nearly a year now and I’ve been going to nearly every one of the local R user meetups, TokyoR. It’s been a great experience to learn about R and its wide variety of uses by Japanese practitioners and academics. Besides the near-monthly TokyoR meetup, there are smaller gatherings spread throughout Japan, such as FukuokaR and TsukubaR, but the meeting that gathers the biggest crowd is the JapanR Conference, held every December since 2010. Of course, there are outliers such as the special TokyoR session this past July when Joe Rickert and Hadley Wickham visited Japan!

  • Want to learn more about TokyoR and when the next meeting is? Check this link! (Next meeting is January 19th, 2019)
  • Want to learn more about JapanR? Check this link!

This time around I took notes on the presentations and wrote up a little round-up blog post about them. As much as I would like to write about every single presentation, there were a number of topics that I really wouldn’t have been able to explain well even if the presentation had been given in English! You can watch most of the presentations on the JapanR YouTube channel. Although the talks are in Japanese, maybe you’ll still find something useful in their slides… or you can read on as I summarize 9 (out of 22) presentations that I found interesting!

NOTE: Some people presented using only their Twitter/online name; it’s a cultural thing I’ve found here relating to privacy.

NOTE 2: There are still several presentations/slides that haven’t been uploaded yet but I will put more screenshots in as they become available so please check back in the coming days!

The Presentations

Creating your own RMarkdown template! - Kazuhiro Maeda

Kazuhiro Maeda is well known in the Japanese R community mainly through his online avatar (an elephant plushie) and his love for R Markdown. For this conference he presented on creating your own customized R Markdown template. Maeda-san noted that knowledge of CSS, JavaScript, and Pandoc is crucial for this task, and he first explained how the render() function works to create a document of your choice. Within this explanation he highlighted how the output from a render() call depends on the templates and options passed to Pandoc, so it is important to create a template whose options Pandoc can actually use. As making a template from scratch is extremely difficult, Maeda-san recommended finding an existing template and playing around with it to get used to the process involved.

An example template Maeda-san worked on makes an image pop out when you click on it in your R Markdown document. To do this you need to use “lightbox” (a JavaScript library) and add a script to your Pandoc template that calls this library at the appropriate time (when you click on an image). Following a very thorough and technical explanation, he showed us the fruits of his labor in a live demo that you can see here, where he knits the R Markdown document and clicks on a plot image, et voilà! It pops up very nicely!

Since I wasn’t able to translate the template creation process well enough (live-translating technical stuff is hard!), I will leave some good links to creating your own R Markdown template in English below:

Easy and modern data analysis with “R AnalyticFlow”! - Ryota Suzuki

Ryota Suzuki, CEO of ef-prime and author of the pvclust package, gave a talk on R AnalyticFlow, free software built by his company that wraps the R environment for statistical computing in a GUI. R AnalyticFlow is written in Java and is compatible with Windows, Mac, and Linux, and it is available in English, Japanese, Chinese, and many more languages.

As you can see in the picture above, R AnalyticFlow allows you to represent your data analysis workflow through nodes and edges in a descriptive flow chart. In previous versions of the GUI, the goal was to use as many base R functions as possible, but more recently R data analyses, including predictive modeling, have been relying heavily on external packages such as the tidyverse, glmnet, xgboost, etc. So the new direction Suzuki-san wants to take is to build these packages into R AnalyticFlow and provide support for users who want to install their own packages to use in the GUI. Lastly, in a live demonstration he showed us a development version of the GUI as he made some simple ggplot2 plots with just a few mouse clicks. Due to the new direction R AnalyticFlow is taking, Suzuki-san is looking for Java developers to help contribute to the development of the new versions of the GUI. If you know your way around Java and want to help, let him know!

I have completely understood Shiny! - Med_KU

In what was a very lively and fun presentation, @Med_KU took us through a very comprehensive tour of Shiny apps. First he talked about how to create a Shiny app via RStudio, working with the app.R and ui.R files, and publishing through RStudio Connect. Afterwards, he went through many examples with plotly and googleVis showing all the interactive/reactive capabilities that Shiny apps are known for. Personally, I’m more of a ggiraph fan myself (I use it at work for flexdashboards and Shiny apps) but @Med_KU’s presentation has gotten me interested in trying googleVis out as well!

I recommend watching the recording of the presentation as @Med_KU goes through a lot of different examples!

DID Analysis with R! - Yuki Yagi

University student Yuki Yagi presented on DID (difference-in-differences) analysis and how he utilized it in one of his research papers. For those unfamiliar, DID is a statistical technique that estimates the differential effect of a treatment (training program, medication intake, etc.) on a treatment group vs. a control group. A quick overview of DID can be found here.

The main question that Yagi-san investigated in his research paper was: “What would be the impact on the number of patents produced when research subsidies were given to companies that were already highly skilled and had a track record of producing many patents?”

DID is really easy to understand given the above diagram. On the left-hand side is the measurement of the outcome variable, in this case the number of patents, before the treatment, while on the right is the number of patents after the treatment (research subsidies) was given to the treatment group (the blue dot is the control group and the red dot is the treatment group).

Above is what the model looked like for the research question, with dummy variables for B (subsidy), C (post-treatment), and D (subsidy and post-treatment together, i.e. the interaction of B and C).
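With the three dummies described above, a DID regression is commonly written as patents = b0 + b1*B + b2*C + b3*D, where D = B*C equals 1 only for subsidized companies after the treatment. That is the textbook form and an assumption on my part, not a transcription of Yagi-san's slide. The minimal sketch below (plain Python rather than R, so it runs without any packages) uses made-up patent counts to show that, in this four-cell setup, the coefficients can be read straight off the group means and that b3 is the difference-in-differences itself.

```python
from statistics import mean

# Hypothetical patent counts per company, keyed by (B, C).
# B = 1 if the company received a subsidy, C = 1 if measured after treatment.
patents = {
    (0, 0): [14, 15, 16],  # control group, before
    (0, 1): [17, 18, 19],  # control group, after
    (1, 0): [19, 20, 21],  # treatment group, before
    (1, 1): [29, 30, 31],  # treatment group, after
}

cell_mean = {cell: mean(values) for cell, values in patents.items()}

# For the model  patents = b0 + b1*B + b2*C + b3*D  with D = B*C,
# the least-squares coefficients follow directly from the four cell means.
b0 = cell_mean[(0, 0)]                       # control group baseline
b1 = cell_mean[(1, 0)] - cell_mean[(0, 0)]   # pre-existing gap between groups
b2 = cell_mean[(0, 1)] - cell_mean[(0, 0)]   # time trend shared by both groups
b3 = (cell_mean[(1, 1)] - cell_mean[(1, 0)]) - (cell_mean[(0, 1)] - cell_mean[(0, 0)])

print(f"baseline={b0}, group gap={b1}, time trend={b2}, DID effect={b3}")
```

The point of subtracting the control group's change is that any trend affecting both groups (b2) is removed, leaving only the extra change in the treated group.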

Rugby Analytics with R! - Koichi Kinoshita

Koichi Kinoshita, a rugby performance analyst for the HITO-Communications Sunwolves and the Northland Rugby Union (in New Zealand), gave a presentation on how he applied his nascent R skills to his favorite sport. After giving a brief explanation about the state of sports analytics in rugby and his resolve to improve his data analytics skills in R, he showed us a number of plots from data he gathered from the Japanese national rugby league website.

Throughout the presentation Kinoshita-san tried to answer several questions such as “Is tackling percentage related to lost tries?”, “Does a higher number of tackles help stop line-breaks?”, and “Can you stop line-breaks if you have a higher tackle success percentage?”, among others, as he explained his results in thorough detail using plots as a visual aid. Ultimately, his data showed that Japanese rugby teams with a tackle success rate over 86% were able to limit line-breaks to 10 or fewer and were very likely to win matches, while teams with a tackle success rate below 78% mostly wound up losing.

Armed with this knowledge, Kinoshita-san investigated further and found that across the entire season around half the teams totaled around 100 to 150 tackles in any single game. Assuming an 80% tackle success rate, a team with 100 tackles will have 20 mistackles while a team with 150 tackles will have 30 mistackles. So, the big question was: “How much will this 10 mistackle difference cost a team?”

Consequently, Kinoshita-san ran a regression of line-breaks on mistackles and found that, on average, a team concedes 10 line-breaks from 30 mistackles. Coupled with his data showing that a team conceding 10 or more line-breaks is ~70% likely to lose the game, this means a higher number of attempted tackles isn’t necessarily a good thing. What matters is preventing line-breaks with successful tackles!
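To make the mistackle numbers behind this argument concrete, here is a tiny sketch in plain Python. The function name `mistackles` is my own, and the only inputs are the figures quoted in the talk (100 and 150 attempted tackles at an 80% success rate).

```python
def mistackles(tackles_attempted, success_pct):
    """Missed tackles implied by an attempt count and a success rate in percent.

    Uses integer arithmetic (rounding down) to avoid floating-point noise.
    """
    return tackles_attempted * (100 - success_pct) // 100


low = mistackles(100, 80)   # team that attempts 100 tackles
high = mistackles(150, 80)  # team that attempts 150 tackles
print(low, high, high - low)
```

At the same success rate, the team that has to make more tackles simply misses more of them, which is why the raw tackle count alone says little about defensive quality.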

In his conclusion, Kinoshita-san made a really good point: it’s not enough to look at pure success/fail percentages, and he brought up pass completion rate in soccer as an example. I agree with him, as in soccer you could naively assume a team is “dominant” or “good” if they have a really high pass completion rate, but if most of those successful passes came from the defenders and goalkeeper passing among themselves you can’t really say that that is a good thing. In soccer (and in other sports) it’s important to dig a little bit deeper; for example, it might be more insightful to look at a soccer team’s pass completion rate in the opposition’s third of the pitch!

Using linear regression to find a new home in Tokyo! - Kaori Sawamura

This presentation by Kaori Sawamura showed off a fun real-life case study using R. One of Sawamura-san’s co-workers wanted to move to Tokyo and become a “city boy”, so she set out to use some of her newly learned R skills to try to find a dream home for him!

Here are the 3 basic requirements that Sawamura-san was given:

  • Somewhere with a gym close by (preferably Tokyo Metropolitan Gymnasium)
  • Somewhere close to Sendagaya Station
  • Somewhere with a monthly rent below 200,000 Yen

After filtering out houses with a monthly rent above 200,000 Yen, Sawamura-san fitted a multiple linear regression as seen below:

After looking at the diagnostic plots for the model, she took out a few outliers that she confirmed were mainly due to incorrect data on the housing website and was able to cut her list down to around 70 houses! Then, using leaflet, Sawamura-san mapped out all the potential houses that fit her criteria and labelled them with details about the house/apartment.

After doing the analysis, she showed it to her co-worker and got him to take a look around. Unfortunately, the information provided by the website didn’t account for things such as construction work, cleanliness and safety of the neighborhood, along with the added bonus of having to live with the landlord. So this case study was also a great reminder about the necessity of doing some field work in addition to analytics!

Using external C/C++ libraries with R! - Wataru Iwasaki

Wataru Iwasaki loves using C++ and R. He has given talks on the subject before, and this time was no different as he talked about incorporating and using C/C++ libraries with R. First, he introduced a number of great online resources for developing Rcpp packages, including material from his own website as well as the free “Rcpp for Everyone (English version)” written by fellow Japanese R user Masaki Tsuda.

Next, he talked about a couple of basic steps needed to incorporate C/C++ libraries as well as the advantages and disadvantages of the various styles of doing so.

In Iwasaki-san’s concluding remarks he called on the community to share their knowledge of handling external C++ libraries via social media or through blog posts.

Build an R compiler…with R! - igjit

For this presentation @igjit told us about his attempt at creating an R compiler written in R! You can see the fruits of his labor in the nrc package that he created here. Note: currently it only works on Linux so you should use something like Docker if you want to try it out on other OSs.

Here are some examples:

Simple addition:

You can even assign and use variables! (the function that got cut off is execute):

I unfortunately don’t have much experience with compilers so it was tough for me to understand the technical details but it was another pretty cool example of how you can use R for just about anything!

Tennis Player Ratings with R! - flaty13

In the second use of R in sports analysis we had @flaty13 take a look at tennis player Elo ratings. After a brief introduction relating to tennis, Elo ratings, and his views on the difficulty of rating/ranking tennis players, he used a data set from Kaggle and the PlayerRatings package to conjure up some visualizations on tennis player rankings over time.
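For readers who haven't met Elo ratings before, the sketch below shows the standard Elo update for a single match in plain Python. It is only an illustration of the general idea: the function names are my own, the K-factor of 32 is an assumed value, and I am not claiming this is exactly what @flaty13 or the PlayerRatings package computes.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a, rating_b, score_a, k=32):
    """Return both players' new ratings after one match.

    score_a is 1 if A won and 0 if A lost; k=32 is an assumed K-factor.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated (hypothetical) players; A wins the match.
new_a, new_b = update(1500, 1500, score_a=1)
print(new_a, new_b)
```

Beating a much higher-rated opponent moves a rating more than beating a weaker one, because the gap between the actual and expected score is larger. A string of such wins is what shows up as a steep climb in a ratings-over-time plot.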

As a matter of course, he also looked at Kei Nishikori’s performance and highlighted his rapid rise in the rankings during 2014, when he became the first Asian man to reach a Grand Slam singles final!

Overall, it was nice to see how sports analytics has grown these past few years in Japan: besides this tennis presentation and the rugby presentation earlier, there have been presentations relating to soccer and baseball analytics at past TokyoR meetups.

Conclusion

Following both the main talks and the lightning talks (LTs), sushi and drinks were served at the after-party as R users from all over Japan shared their stories of success and struggle alike!

The main organizer, Atsushi Hayakawa, mentioned that he eventually wants JapanR to grow even bigger in the coming years and to have every participant give an LT! Whether that was a joke or is actually feasible, it would be cool if we set the Guinness World Record for Most Presentations at an R Conference!

As you saw, the quality of presentations at JapanR was very high. Unfortunately, most of the content was only in Japanese, which I thought was a shame. That’s why I thought of doing this to share the knowledge of Japanese R users with those around the world! This is my first time writing up one of these and I hope to contribute more and improve in the years to come!

If you’re ever in Japan, come join us for some R&R … and R!