Three key concepts
The way of RStudio
Reproducibility
Importance of 1439 (printing press technology)
Mass communication
Code is reproducibility, reproducibility is good science, and good science is good business.
If you believe in code and reproducibility, you get reuse, automation, scheduling, and parametrization.
slide: https://speakerdeck.com/jcheng5/shiny-in-production
cranwhales is a Shiny example application, meant to demonstrate how to convert synchronous (traditional) Shiny apps to asynchronous ones.
Staging and Production
Keep it up: unplanned outages are rare or nonexistent
Keep it safe: data, functionality, and code are all kept safe from unauthorized access
Keep it correct: works as intended, provides the right answers
Keep it snappy
RStudio Connect
shinytest: automated UI testing for Shiny
shinyloadtest: load testing for Shiny
profvis: profiler for R
Plot caching
Async: last-resort technique for dealing with slow operations
1. Use shinyloadtest to see if it’s fast enough
2. If not, use profvis to see what’s making it slow
3. Optimize
3.1 Move work out of Shiny
3.2 Make code faster
3.3 Use caching
3.4 Use async
4. Repeat!
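A minimal sketch of the "use caching" step in plain R. This is not Shiny's own plot-caching API; `slow_summary()` and `with_cache()` are hypothetical names used only for illustration. The idea is the same: pay for the slow work once per distinct input, then reuse the stored result.

```r
# Hypothetical stand-in for a slow computation (e.g. a model fit or a big query).
slow_summary <- function(n) {
  Sys.sleep(0.2)          # simulate expensive work
  sum(seq_len(n)^2)
}

# Wrap a one-argument function so results are cached by argument value.
with_cache <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)   # first call: compute and store
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

cached_summary <- with_cache(slow_summary)

# The first call pays the cost; the second is served from the cache.
first  <- system.time(r1 <- cached_summary(1000))[["elapsed"]]
second <- system.time(r2 <- cached_summary(1000))[["elapsed"]]
```

Measure before and after (for example with profvis) to confirm the cache actually removes the bottleneck.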
Reliability
Reproducibility
Flexibility
Longevity
Scalability
Make testing part of your regular development cycle for:
faster updates
R packages: testthat, usethis
R package: shinytest
Decoupling, code isolation
DRY: Don’t Repeat Yourself
KISS: Keep It Simple, Stupid
Consistent code style
testthat::auto_test
shinytest
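A small sketch of what a testthat unit test can look like. `to_percent()` is a hypothetical helper invented for this example, not something from the talk; the point is that isolated, simple functions are easy to test.

```r
library(testthat)

# Hypothetical helper that an app or analysis might rely on.
to_percent <- function(x, digits = 1) {
  stopifnot(is.numeric(x))
  paste0(round(100 * x, digits), "%")
}

test_that("to_percent formats proportions", {
  expect_equal(to_percent(0.256), "25.6%")
  expect_equal(to_percent(c(0, 1)), c("0%", "100%"))
})

test_that("to_percent rejects non-numeric input", {
  expect_error(to_percent("a"))
})
```

In a package, tests like these would normally live under tests/testthat/ so they run on every check.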
from rstudio::conf(2018)
Data (Small –> Medium)
1.1 Small data: put small data inside packages, especially if you ship a methods package with your analysis
1.1.1 CRAN: <= 5 MB
1.2 piggyback package: attach large [data] files to GitHub repositories
1.3 Medium data: arkdb package: archive and unarchive databases using flat files
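A rough sketch of the small-versus-larger decision in base R. `suggest_data_home()` is a hypothetical helper, and reading the 5 MB limit as 5 * 1024^2 bytes is an assumption made for illustration.

```r
# Assumption: "5 MB" is interpreted as 5 * 1024^2 bytes.
cran_limit_bytes <- 5 * 1024^2

# Hypothetical helper: suggest where a data file should live based on its size.
suggest_data_home <- function(path, limit = cran_limit_bytes) {
  size <- file.size(path)
  if (is.na(size)) stop("File not found: ", path)
  if (size <= limit) "inside the package" else "outside the package"
}

# Example with a tiny file written to a temporary location.
tmp <- tempfile(fileext = ".csv")
write.csv(head(mtcars), tmp)
suggest_data_home(tmp)
```

Files that land "outside the package" are the ones that tools such as piggyback or arkdb are meant for.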
Computing environment
It’s important to isolate the computing environment so that changes in software dependencies don’t break your analysis.
Adding a Dockerfile to your compendium
Many ways to write a Dockerfile for your project
o2r/containerit
jupyter/repo2docker
http://github.com/karthik/rstudio2019
Git + Docker + RStudio
Include a workflow to manage relationships between data, output, and code.
General purpose workflow manager & pipeline toolkit for reproducibility and high-performance computing.
R for Data Science: Exercise Solutions: https://jrnold.github.io/r4ds-exercise-solutions
github link: https://github.com/jrnold/r4ds-exercise-solutions
Slides: http://www.amelia.mn/WranglingCats.pdf
R’s representation of categorical data. Consists of: 1. A set of values 2. An ordered set of valid levels
eyes <- factor(x = c("blue", "green", "green"),
levels = c("blue", "brown", "green"))
eyes
## [1] blue green green
## Levels: blue brown green
• Use forcats
• Practice defensive coding
• summary() is your friend
• assertthat and testthat
• Check out http://bit.ly/WranglingCats
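A short base-R sketch of why defensive coding matters with factors. `checked_factor()` is a hypothetical helper written for this example; it is not part of forcats or assertthat.

```r
eye_levels <- c("blue", "brown", "green")

# Values outside the declared levels silently become NA.
eyes_raw <- c("blue", "green", "grey")
eyes <- factor(eyes_raw, levels = eye_levels)
sum(is.na(eyes))

# Defensive wrapper (hypothetical helper): fail loudly on unexpected values.
checked_factor <- function(x, levels) {
  unknown <- setdiff(unique(x[!is.na(x)]), levels)
  if (length(unknown) > 0) {
    stop("Unexpected values: ", paste(unknown, collapse = ", "))
  }
  factor(x, levels = levels)
}

# Numeric-looking factors: convert via character, not directly.
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # integer codes of the levels, not the values
as.numeric(as.character(f))   # the original numbers
```

Calling `checked_factor(eyes_raw, eye_levels)` stops with an error naming "grey", which is usually preferable to discovering stray NAs later.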
library(sparklyr)
spark_install()
sc <- spark_connect(master = "local")
Doc: <spark.rstudio.com>
Blog: <blog.rstudio.com/tags/sparklyr>
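MLflow: open source platform for the machine learning lifecycle
library(mlflow)
mlflow_log_param("foo", 42)  # key and value are both required; 42 is an illustrative value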
Data science workflow
Business problem –> technical skills –> efficient process –> business value
https://www.biostat.wisc.edu/~kbroman/presentations/rqtl2_rstudio2019.pdf
The replication crisis is the dark cloud.
Challenges in irreproducible research.
For you, the replication crisis is a credibility crisis.
Many applications of statistics are cargo-cult statistics.
pagedown on GitHub: https://github.com/rstudio/pagedown
slide : http://bit.ly/pagedown
In HTML and the Web I trust.
It is easier parsing HTML than parsing PDF.
HTML and CSS will soon catch up with LaTeX's typesetting, but it is difficult for LaTeX or Word to catch up with HTML in other respects, such as interactivity.
You can install from CRAN (an initial version has been released), but this package is only two months old, so installing from GitHub is recommended:
remotes::install_github('rstudio/pagedown')
This package requires Pandoc 2.x, which is currently bundled in RStudio 1.2.x (you may install the preview version of RStudio).
Google Chrome or Chromium is recommended to view and print HTML pages generated from this package.
Manual: https://pagedown.rbind.io/
css grid
This package is mainly used for generating tables.
Present (distracted) me
Future (6 months later) me
Quantitative colleagues / reviewers
Decision makers (may not be quantitative)
### Hide code if we're not rendering the report for quantitative audience.
if (!params$quantAudience) knitr::opts_chunk$set(echo = FALSE)
data %>%
head(10)
# Pull child text from another .Rmd file with a chunk header such as:
# {r To pull child text, eval = params$p, echo=TRUE, child = "datamanipulate_txt.Rmd"}
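A sketch of how one report might be rendered for different audiences through a `quantAudience` parameter. The file name, title, and chunk contents below are assumptions made for illustration; only the `params$quantAudience` check comes from the notes. Rendering runs only if rmarkdown and Pandoc are available.

```r
fence <- strrep("`", 3)  # build chunk fences without nesting them in this example

# A minimal parameterized R Markdown document (hypothetical content).
rmd_lines <- c(
  "---",
  "title: \"Audience-aware report\"",
  "output: html_document",
  "params:",
  "  quantAudience: true",
  "---",
  "",
  paste0(fence, "{r setup, include = FALSE}"),
  "if (!params$quantAudience) knitr::opts_chunk$set(echo = FALSE)",
  fence,
  "",
  paste0(fence, "{r}"),
  "head(mtcars, 10)",
  fence
)

rmd_file <- file.path(tempdir(), "audience_report.Rmd")
writeLines(rmd_lines, rmd_file)

# Render once for decision makers (code hidden); needs rmarkdown and Pandoc.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(
    rmd_file,
    params = list(quantAudience = FALSE),
    output_file = "report_decision_makers.html",
    quiet = TRUE
  )
}
```

Rendering again with `params = list(quantAudience = TRUE)` would keep the code visible for quantitative colleagues, from the same source file.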
Tools for visualizing uncertainty with ggplot2: https://wilkelab.org/ungeviz/
Hypothetical outcome plots (HOPs) are a great way of communicating uncertainty to non-experts.
Hadley’s challenge poses three questions
How do we generate outcomes?
How do we get them into ggplot?
library(ungeviz)
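One possible answer to "how do we generate outcomes?" is the bootstrap. The sketch below uses base R only and does not use ungeviz's own API; the simulated sample and the `.draw` column name are assumptions made for illustration.

```r
set.seed(2019)

# Hypothetical sample: 30 measurements.
obs <- rnorm(30, mean = 10, sd = 2)

# Each bootstrap resample yields one hypothetical outcome (here, a mean).
n_draws <- 20
outcomes <- data.frame(
  .draw = seq_len(n_draws),
  mean  = vapply(
    seq_len(n_draws),
    function(i) mean(sample(obs, replace = TRUE)),
    numeric(1)
  )
)
head(outcomes)
```

With one row per draw, a data frame like this can go into ggplot directly, with the draw index used for faceting or as the animation state.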
source: https://github.com/thomasp85 https://www.data-imaginist.com/
Video: https://resources.rstudio.com/rstudio-conf-2019/gganimate-live-cookbook
What is gganimate?
How do I use gganimate?
transitions: You want your data to change
views: You want your viewpoint to change
shadows: You want the animation to have memory
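A minimal sketch combining the three ideas on the built-in `iris` data. The choice of data and parameter values is arbitrary and only for illustration; building the object does not render anything.

```r
library(ggplot2)
library(gganimate)

anim <- ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_point() +
  # transition: the data changes from one species to the next
  transition_states(Species, transition_length = 2, state_length = 1) +
  # view: the axes follow the data as it moves
  view_follow() +
  # shadow: points leave a short trail, giving the animation memory
  shadow_wake(wake_length = 0.1)

# animate(anim)  # renders the animation (a GIF when gifski is installed)
```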
Getting Started Guide: https://gganimate.com/
useR presentation: https://youtu.be/21ZWDrTukEs
Gábor Csárdi https://github.com/gaborcsardi
R-package: pak https://github.com/r-lib/pak
The main goal of pak (formerly pkgman) is to make package installation faster and more reliable. This allows new, simpler and safer workflows, such as separate package libraries for projects.
pak installs R packages from CRAN, Bioconductor, GitHub, and local files and directories. It is an alternative to install.packages() and devtools::install_github(). pak is fast, safe and convenient.
Install the package from CRAN:
install.packages("pak")
(After installation, you might also want to run pak::pak_setup(); it’ll be run automatically when needed but you might want to do it now to save some time later.)
Call pkg_install() to install CRAN or Bioconductor packages:
pak::pkg_install("usethis")
To install GitHub packages, use the user/repo syntax:
pak::pkg_install("r-lib/usethis")
All dependencies will be installed as well, to the same library.
Fast downloads and HTTP queries. pak performs all HTTP requests concurrently.
Fast installs. pak builds and installs packages concurrently.
Metadata and package cache. pak caches package metadata and all downloaded packages locally. It does not download the same package files over and over again.
Lazy installation. pak only installs the packages that are really necessary for the installation. If the requested package and its dependencies are already installed, pak does nothing.
Private library (pak’s own package dependencies do not affect your regular package libraries and vice versa).
Every pak operation runs in a sub-process, and the packages are loaded from the private library. pak avoids loading packages from your regular package libraries. (These package files would be locked on some systems, and locked packages cannot be updated. pak does not load any package in the main process, except for pak itself).
To avoid updating locked packages, pak warns and requests confirmation for loaded packages.
Dependency solver. pak makes sure that you end up in a consistent, working state of dependencies. It finds conflicts up front, before attempting installation.
BioC packages. pak supports Bioconductor packages out of the box. It uses the Bioconductor version that is appropriate for your R version.
GitHub packages. pak supports GitHub packages out of the box. It also supports the Remotes entry in DESCRIPTION files, so that GitHub dependencies of GitHub packages will also get installed. See e.g. https://cran.r-project.org/package=remotes/vignettes/dependencies.html
Package sizes. For CRAN packages pak shows the total sizes of packages it needs to download.
Goal 1: cheap and reliable package installation
Goal 2: project-centered workflows
pak Features: fast(er), (more) reliable, (more) convenient
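A sketch of what a project-centered workflow could look like with pak, using the `lib` argument of `pak::pkg_install()`. The helper name and the "r-lib" directory name are assumptions made for illustration, not a pak convention.

```r
# Hypothetical helper: install packages into a library that belongs to one project.
install_into_project <- function(pkgs, project_dir = getwd()) {
  lib <- file.path(project_dir, "r-lib")   # assumed directory name
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  pak::pkg_install(pkgs, lib = lib, ask = FALSE)
  invisible(lib)
}

# Usage (not run here because it downloads packages):
# lib <- install_into_project("usethis")
# .libPaths(c(lib, .libPaths()))   # put the project library first on the search path
```

Because each project gets its own library, upgrading a package for one analysis does not change the versions another analysis depends on.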
Installation:
source("https://install-github.me/r-lib/pak")
slide: http://bit.ly/modules2019
slide: http://bit.ly/reactR-tutorial
tutorial: https://github.com/react-R
A project is a fundamental unit of work.
A space is a place for a group of people to collaborate on work.
Admin: can view, manage and edit all projects, manage space membership, and delete the space itself. Good for: System Administrator
Moderator: can view, manage, and edit all projects in the space. Good for: project lead
Contributor: can see public projects in the space and create new projects in the space. Good for: project members/students.
A project can be public or private.
Private projects in a space can be viewed by the author and space admins and moderators.
Private projects have a lock icon.
RStudio Cloud: Not just for education
We created RStudio Cloud to make it easy for professionals, hobbyists, trainers, teachers and students to do, share, teach and learn data science using R.
Try it out: https://rstudio.cloud
Talk to us: https://community.rstudio.com
If you repeat your code 3 times, write a function.
If you give in-person advice on the same question 3 times, write a blog post to explain it.
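A small illustration of the "three times, write a function" rule. The vectors and the `rescale01()` name are made up for this example.

```r
# Before: the same rescaling written out three times.
a <- c(2, 4, 6)
b <- c(10, 20, 40)
d <- c(-1, 0, 1)
a_scaled <- (a - min(a)) / (max(a) - min(a))
b_scaled <- (b - min(b)) / (max(b) - min(b))
d_scaled <- (d - min(d)) / (max(d) - min(d))

# After: one function, three calls.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

scaled <- lapply(list(a = a, b = b, d = d), rescale01)
```

A fix or improvement now happens in one place instead of three.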
blogdown package
But what can I blog about?
1. provide some information
TidyTuesday (R4DS online community)
2. Teach a concept
where to give a talk?
Limitations of screencasts
you need to be capable and confident enough to improvise
You have to be comfortable embarrassing yourself!
By Hilary Parker, Karthik Ram, Angela Bassa, Tracy Teal, Eduardo