1: Welcome and RStudio Vision

Tareef Kawaf

Three key concepts
The way of RStudio
Reproducibility
Imporantance of 1439 (printing press technology)
Mass communication
Code is the reproducibility, Reproducibility is good science, and good science is good business.
If you belileve in the code and reproducibility, you can get reuse, automation, scheduling, and parametrization.

4 reasons to love code:

Repeatable
Inspectable
Reusable
Diffable

Code Importance

Complex problems will require code
Code is communication, and communication is critical
With code you can inspect how a problem is solved and either adopt it, or figure out how to improve it yourself.
With code, the answer is always YES!!

2: R shiny Keynote Talk

Joe Cheng

slide: https://speakerdeck.com/jcheng5/shiny-in-production

rstudio/cranwhales

cranwhales is a Shiny example application, meant to demonstrate how to convert synchronous (traditional) Shiny apps to asynchronous ones.

Staging and Production

Goals:

keep it up -Unplanned outages are rare or nonexistent
keep it safe -Data, functionality, and code are all kept safe from unauthorized access
keep it correct
-works as intended, provide right answers
keep it snappy
- Fast response time, ability to predict needed capacity for expected traffic

New tools for Shiny in production

Rstudio connenct
shinytest- automated UI testing for Shiny
shinyloadtest- load testing for shiny
profvis - profiler for R
Plot caching
Async -last resort technique for dealing with slow operations

Big Question: Is Cranwhales fast enough?

Performance workflow

Using shinyloadtest to see if it’s fast enough
if not, use profvis to see what’s making it slow.
optimize
3.1 Move work out of Shiny 3.2 Make code faster
3.3 Use Caching
3.4 Use async
Repeat!

3: Getting it right: Writing reliable and maintainable R code

Amanda Gadrow, RStudio

Characteristics of good code:

Reliability
Reproducibility
Flexibility
Longevity
Scalability

How do we determine the quality of our code?

Execution feedback
User feedback
measure it with tests; functional, integration, perfromance.

make the part of your regular development cycle for;

faster updates
R package-testthat, usethis
R package-shinytest

Modular design for ease of testing and maintenance

Decoupling, code isolation
DRY: Don’t Repeat Yourself
KISS: Keep It Simple, Stupid
Consistent code style
Happy Git and GitHub for the useR

Resources to read/watch:

R packages by Hadley Wickham, particularly the chapter on Testing
Advanced R by Hadley Wickham, for advice on good software design
Jenny Bryan’s video on “Code smells and feels” from useR!2018
Charles Gray’s blog post on the benefits of a workflow based on testthat::auto_test
Winston Chang’s talk on testing shiny apps with shinytest from rstudio::conf(2018)

4: A guide to making your data reproducible

Karthik Ram

Set up Research compendium principles

Stick with the conventions of your peers
Keep data, methods and outputs separate
Specify your computational environment as clearly as you can

3 separate different components ofa compendium

Data (Small –> Medium)
1.1 Small data: put small data inside packages, especially if you ship a methods package with your analysis
1.1.1 CRAN =< 5mb
1.2 piggyback package: Attach large [data] files to Github repositories
1.3 Medium data arkdb package: Attach large [data] files to Github repositories
Computing environment

It’s important to isolate the computing envrionment so that changes in software dependencies don’t break your analysis.
Adding a Dockerfile to your compendium

Many ways to write a Dockerfile for your project

o2r/containerit
jupyter/repo2docker

API
metadata (Tidy data)
tensorflow
Binder
Binder is an open source project that is designed to make it really easy to share analyses that are in notebooks.

http://github.com/karthik/rstudio2019

Git + Docker + RStudio

Workflows

Include a workflow to manage relationsihps between data output and code.

drake

General purpose workflow manager & pipeline toolkit for reporducibility and high-performance computing.

5: Solving R for data science

Jeffrey Arnold

R for Data Science: Exercise Solutions: https://jrnold.github.io/r4ds-exercise-solutions

github link: https://github.com/jrnold/r4ds-exercise-solutions

tidyverse work flow

6: Working with categorical data in R without losing your mind

Amelia McNamara

Slides: http://www.amelia.mn/WranglingCats.pdf

Factors

R’s representation of categorical data. Consists of: 1. A set of values 2. An ordered set of valid levels

eyes <- factor(x = c("blue", "green", "green"),
               levels = c("blue", "brown", "green"))
eyes

## [1] blue  green green
## Levels: blue brown green

stringAsFactors: An unauthorized biography

Historical view from Roger Peng’s blog

Solution : Use Tidyverse family read_csv() function instead of read.csv()
But sometimes, you still need factors…
- In particular, for modeling (changing reference levels, etc) and plotting (reordering elements)
wrangling categorical data in R : https://github.com/dsscollection/factor-mgmt/blob/master/analysis/working_with_factors.pdf
package - forcats https://forcats.tidyverse.org/

Level manipulation functions
values change to match levels

fct_recode() Relabel levels “by hand”
fct_relevel() Reorder levels “by hand”
fct_reorder() Reorder levels by another variable
fct_collapse() Collapse levels “by hand”
fct_lump() Lump levels with small counts together
fct_other() Replace levels with “Other”
R Syntax Comparison:: cheat sheet Scroll to the bottom and find the Factors with forcats Cheat Sheet https://www.rstudio.com/resources/cheatsheets/

library(tidyverse)
library(forcats)
summary(GSS$OpinionOfIncome)
GSS <- GSS %>%
    mutate(tidyOpinionOfIncome = 
               fct_relevel(OpinionOfIncome,
                           "Far above average",
                           "Above average",
                           "Below average",
                           "Far below average"))
summary(GSS$tidyOpinionOfIncome)

defensive coding
summary(), summary(), summary()
Certain testing function

Takeaways:

• Use forcats • Practice defensive coding • summary() is your friend • assertthat and testthat • Check out http://bit.ly/WranglingCats

7: R4DS online learning community

Jesse Mostipak

1. install R base
1. install RStudio
1. google How to [XYZ] in R?

R4DS Online community

8: Scaling R with Spark by Javier Luraschi, RStudio

how to use R with Spark?

library(sparklyr)
spark_install()
sc <- spark_connect(master = "local")

Doc: <spark.rstudio.com>

Blog: <blog.rstudio.com/tags/sparklyr>

9: Introducing MLflow for R-Managing the Machine Learning Lifecycle by Kevin Kuo

Mlflow- open source platform

MLflow

library(mlflow)
mlflow_log_param("foo")

10: Democratizing R using plumber

download today

Resource:

Plumber

OperAPI

Github Repo

Example of API library?

11: Solving a $15M/year Employee Attrition Problem with the tidyverse (work flow) by Matt Dancho

Data science workflow

Business problem –> techinical skills –> efficient process –> business value

12: 3D mapping by Tyler

what is rayshader?

https://github.com/tylermorganwall/rayshader

13: R/qtl2: Rewrite of a very old R package by Karl Broman, University of Wisconsin

https://www.biostat.wisc.edu/~kbroman/presentations/rqtl2_rstudio2019.pdf

14: Keynote talk Explicit direct instruction in programming education by Felienne

Why they all have twitter?

How to teach programming and other things?

15: R Markdown the bigger picture

Garret Grolemund

Slide: https://github.com/garrettgman/rmarkdown-the-bigger-picture/blob/master/rstudioconf2019-rmarkdown.pdf

The replication prices is the dark cloud.
Challenges in irresproducible researech.
The replication crisis for you is the credibility crisis.
Many applications of statistics are cargo-cult statistics.

One last thing: Reproducibility is an opportunity.

16: Pagedown: Creating Beautiful PDFs with R Markdown + CSS by Yihui Xie

pagedown_github * https://github.com/rstudio/pagedown

slide : http://bit.ly/pagedown

In HTML and the Web I trust.

It is easier parsing HTML than parsing PDF.

HTML and CSS will soon catch up the Typesetting style in Latex but it is difficult for Latex or Word to catch up in other aspects in HTML, such as the interactivity.

Installation

You can install from CRAN (an initial version has been released), but this package is only two months old. You are recommended to install from Github:

remotes::install_github($\color{red}{\text{'rstudio/pagedown'}}$)

This package requires Pandoc 2.x, which is currently bundled in RStudio 1.2.x (you may install the preview version of RStudio).

Google Chrome or Chromium is recommended to view and print HTML pages generated from this package.

Preview HTML/ Print to PDF

Manual: https://pagedown.rbind.io/

business card
letter with Letterhead (pagedown::html_letter) Example https://pagedown.rbind.io/html-letter/
resume Example https://pagedown.rbind.io/html-resume/
poster Example https://pagedown.rbind.io/poster-relaxed/
book

css grid

17: Introducing the gt Package

Rich Iannone

This package is mainly used for generating tables.

package github : https://github.com/rstudio/gt

18: The lazy and easily distracted report writer

Mike Smith

xkcd figure

Who is your audience?

Present (distracted) me
Future (6 months later) me
Quantitative colleagues / reviewers
Decision makers (may not be quantitative)

Rule of three

Copy & paste code ≥3 times?
- Write and use a function
Perform analysis across ≥ 3 endpoints?
- Multiple markdown reports?
- NOPE. Parameterised reports.

### Hide code if we're not rendering the report for quantitative audience.
if (!params$quantAudience) knitr::opts_chunk$set(echo = FALSE)

### Hide code if we're not rendering the report for quantitative audience.
data %>% 
    head(10)

# pull child text.Rmd

# r To pull child text, eval = params$p, echo=TRUE, child = "datamanipulate_txt.Rmd"

19 Visualizing uncertainty with hypothetical outcomes plots

Claus Wilke

Tools for visualizing uncertainty with ggplot2: https://wilkelab.org/ungeviz/

Hypothetical outcome plots (HOPs) are a great way of communicating uncertainty to non-experts.
Hadley’s challenge poses three questions

How do we generate outcomes?
- Bayesian MCMC sampling (tidybayes)
- Bootstrapping/resampling input data
- Normal approximation to sampling distribution
How do we get them into ggplot?
- Use package-ungeviz

library(ungeviz)

Is there anything to be done?

20: New language features in RStudio

Jonathan McPherson https://github.com/jmcphers

21: gganimate live cookbook

Thomas Lin Pedersen

What is gganimate?

An implementation of the grammar of animated graphics
An extension to ggplot2
Not what it was a year ago!

How do I use gganimate?

transitions: You want your data to change
views: You want your viewpoint to change
shadows: You want the animation to have memory
Getting Started Guide: https://gganimate.com/
useR presentation: https://youtu.be/21ZWDrTukEs

22: pkgman: A fresh approach to package installation

Gábor Csárdi https://github.com/gaborcsardi

R-package: pak https://github.com/r-lib/pak

The main goals of pkgman is to make package installation fast and more reliable. This allows new, simpler and safer workflows, such as separate package libraries for projects.
pak installs R packages from CRAN, Bioconductor, GitHub, and local files and directories. It is an alternative to install.packages() and devtools::install_github(). pak is fast, safe and convenient.

Installation

Install the package from CRAN:

install.packages("pak")

(After installation, you might also want to run pak::pak_setup(); it’ll be run automatically when needed but you might want to do it now to save some time later.)

Usage

Call pkg_install() to install CRAN or Bioconductor packages:

pak::pkg_install("usethis")

To install GitHub packages, use the user/repo syntax:

pak::pkg_install("r-lib/usethis")

All dependencies will be installed as well, to the same library.

Why pak?

Fast

Fast downloads and HTTP queries. pak performs all HTTP requests concurrently.
Fast installs. pak builds and installs packages concurrently.
Metadata and package cache. pak caches package metadata and all downloaded packages locally. It does not download the same package files over and over again.
Lazy installation. pak only installs the packages that are really necessary for the installation. If the requested package and its dependencies are already installed, pak does nothing.

Safe

Private library (pak’s own package dependencies do not affect your regular package libraries and vice versa).
Every pak operation runs in a sub-process, and the packages are loaded from the private library. pak avoids loading packages from your regular package libraries. (These package files would be locked on some systems, and locked packages cannot be updated. pak does not load any package in the main process, except for pak itself).
To avoid updating locked packages, pak warns and requests confirmation for loaded packages.
Dependency solver. pak makes sure that you end up in a consistent, working state of dependencies. If finds conflicts up front, before attempting installation.

Convenient

BioC packages. pak supports Bioconductor packages out of the box. It uses the Bioconductor version that is appropriate for your R version.
GitHub packages. pak supports GitHub packages out of the box. It also supports the `Remotes entry in DESCRIPTION files, so that GitHub dependencies of GitHub packages will also get installed. See e.g. https://cran.r-project.org/package=remotes/vignettes/dependencies.html
Package sizes. For CRAN packages pak shows the total sizes of packages it needs to download.

Summary and Links

Goal 1: cheap and reliable package installation
Goal 2: project centered work flows
pak Features: fast(er), (more) reliable, (more) convenient
Installation:

source("https://install-github.me/r-lib/pak")

Repository: https://github.com/r-lib/pak

23: Effecitive use of Shiny modules by Eric Nantz

http://r-prodcast.org

slide: http://bit.ly/modules2019

24: Reactlog 2.0: Debugging the state of Shiny by Barret Schloerke

slide: https://bit.ly/rstudio-conf-2019-reactlog

demo: https://bit.ly/rstudio-conf-2019-reactlog-demo

25: Integrating React.js and Shiny by Alan Dipert

slide: http://bit.ly/reactR-tutorial

tutorial: https://github.com/react-R

26: Learning R survey

http://rstd.io/learning-r-survey

27: Lazy evaluation to R

http://rstd.io/tidy-eval-context

28: RStudio Cloud for Education

Mel Gregory

https://rstudio.cloud
Log in
Create a new project

Basic terminology: projects and spaces

A project is a fundamental unit of work.

A space is a place for a group of people to collaborate on work.

Basic terminology: roles

Admin: can view, manage and edit all projects, manage space membership, and delete the space itself. Good for: System Administrator
Moderator: can view, manage, and edit all projects in the space. Good for: project lead
Contributor: seee public projects in the space, and create new projects in the space. Good for: project memeber/students.

Basic terminology: visibility

A project can be public or private.

Private projects in a space can be viewed by the author and space admins and moderators.
Provate projects have a lock icon.

Summary

RStudio Cloud: Not just for education

We create RStudio Cloud to make it easy for professionals, hobbyists, trainers, teachers and students to do, share, teach and learn data science using R.

Try it out: $\color{red}{\text{https://rstudio.cloud}}$

Talk to us: $\color{red}{\text{https://community.rstudio.com}}$

29: Pimp my RMD: a few tips for R Markdown by Yan Holtz

https://holtzy.github.io/Pimp-my-rmd/

30: Keynote 3: The unreasonable effectiveness of public work

David Robinson, DataCamp

Why spend time on public work?

Advance your career
Practice good habits
Contribute to the community

Disclaimer:

A lot of this talk is about what worked for me
The unresonable Effectiveness of Public work (~_Necessity~)

Write a blog

if you repeat your code 3 times, write a function.
if you give in-person advice to the same question 3 times, write a blog to explain/post it.

blogdown package

But what I can blog about?
1. provide some information
TidyTuesday (R4DS online community)
2. Teach a concept

Contribute to the community

Contribute by creating packages

Give a talk

Imagine giving the talk to yourself 3-6 months ago
Practice giving the talk out loud
Don’t rely on bullet points
Be sure to publish your slides

where to give a talk?

Internal seminar
Local meetup
Larger conference

Recording a screencasts

Limination of screencasts

you need to be capable and confident enough to improvise
You have to be comfortable embarrassing yourself!

Write a book

Introduce the bookdown package

31: Final Panel discussion

By Hilary Parker, Karthik Ram, Angela Bassa, Tracy Teal, Eduarado

Key words worth to mention in the conference

API
css
plumber
metadata
<www.stackoverflow.com>

RStudio Conference 2019 Notes @Austin

Ou Zhang

January 17-18, 2019

1: Welcome and RStudio Vision

4 reasons to love code:

Code Importance

2: R shiny Keynote Talk

rstudio/cranwhales

Goals:

New tools for Shiny in production

Big Question: Is Cranwhales fast enough?

Performance workflow

3: Getting it right: Writing reliable and maintainable R code

Characteristics of good code:

How do we determine the quality of our code?

Modular design for ease of testing and maintenance

Resources to read/watch:

4: A guide to making your data reproducible

Set up Research compendium principles

Key components you’ll need for sharing a compendium

3 separate different components ofa compendium

5: Solving R for data science

6: Working with categorical data in R without losing your mind

Factors

stringAsFactors: An unauthorized biography

Takeaways:

7: R4DS online learning community

8: Scaling R with Spark by Javier Luraschi, RStudio

9: Introducing MLflow for R-Managing the Machine Learning Lifecycle by Kevin Kuo

10: Democratizing R using plumber

download today

11: Solving a $15M/year Employee Attrition Problem with the tidyverse (work flow) by Matt Dancho

12: 3D mapping by Tyler

13: R/qtl2: Rewrite of a very old R package by Karl Broman, University of Wisconsin

14: Keynote talk Explicit direct instruction in programming education by Felienne

Why they all have twitter?

15: R Markdown the bigger picture

One last thing: Reproducibility is an opportunity.

16: Pagedown: Creating Beautiful PDFs with R Markdown + CSS by Yihui Xie

Installation

Preview HTML/ Print to PDF

17: Introducing the gt Package

18: The lazy and easily distracted report writer

Who is your audience?

Rule of three

19 Visualizing uncertainty with hypothetical outcomes plots

20: New language features in RStudio

21: gganimate live cookbook

22: pkgman: A fresh approach to package installation

Installation

Usage

Why pak?

Fast

Safe

Convenient

Summary and Links

23: Effecitive use of Shiny modules by Eric Nantz

24: Reactlog 2.0: Debugging the state of Shiny by Barret Schloerke

25: Integrating React.js and Shiny by Alan Dipert

26: Learning R survey

27: Lazy evaluation to R

28: RStudio Cloud for Education

Basic terminology: projects and spaces

Basic terminology: roles

Basic terminology: visibility

Summary

29: Pimp my RMD: a few tips for R Markdown by Yan Holtz

30: Keynote 3: The unreasonable effectiveness of public work

A few types of sharing:

Why spend time on public work?

Disclaimer:

Write a blog

Contribute to the community

Give a talk

Recording a screencasts

Write a book

31: Final Panel discussion

Key words worth to mention in the conference