Productivity Hacks

Do more Data Science, faster!

Andrew Collier
satRday, Cape Town
17 March 2018


andrew@exegetic.biz | DataWookie | DataWookie


How can I be more productive?
(And have more fun in the process.)


These are just four ways. There are many more.

1. Configure in Head

Set configuration variables before anything else

Typical Script Layout

  1. Load libraries
  2. Load data
  3. Prepare data
  4. Do analysis
  5. Save or report results (optional)
  6. Clean up

Where are configuration variables defined?

Often they're scattered across the script.

Inefficient and prone to errors.

A Better Way

Mostly this is good, but storing credentials in a script is a very bad idea!

An Even Better Way

Storing credentials in the environment is somewhat better.
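
A minimal sketch of the idea from the shell side. The variable name DB_PASSWORD and its value are hypothetical, and the child `sh -c` process is only a stand-in for an R script, which would read the value with Sys.getenv("DB_PASSWORD") instead of carrying the secret in its source.

```shell
# Hypothetical credential, for illustration only.
# Set it in the shell session (or a protected profile file),
# never in the analysis script itself.
export DB_PASSWORD="not-a-real-password"

# A child process (standing in for the R script) inherits exported variables.
seen=$(sh -c 'printf "%s" "$DB_PASSWORD"')
echo "Child process sees a password of length ${#seen}"

# Fail early with a clear message if a required variable is missing.
: "${DB_PASSWORD:?DB_PASSWORD must be set}"
```

The secret still lives in the process environment, which is why this is only "somewhat" better: anyone who can inspect the session can read it.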

2. Ditch RStudio

Learn to love the shell

There was a time before RStudio!


Actually there was quite a long time before RStudio!

  • 2011 - First RStudio release
  • 1995 - First R release

Using Rscript

Write a hello-world.sh script.

The first line (the "shebang") tells the system which interpreter to use.

Make it executable.


$ chmod u+x hello-world.sh
                        

Run it.


$ ./hello-world.sh
Hello World!
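
The steps above can be sketched end to end as follows. The contents of hello-world.sh are an assumption (the original script is not reproduced here): a shebang line pointing at Rscript, followed by ordinary R code. The run is skipped if Rscript is not installed.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed contents of hello-world.sh: a shebang line, then ordinary R code.
cat > hello-world.sh <<'EOF'
#!/usr/bin/env Rscript
cat("Hello World!\n")
EOF

# Give the file's owner execute permission.
chmod u+x hello-world.sh

# Run it only if Rscript is installed on this machine.
if command -v Rscript >/dev/null 2>&1; then
  ./hello-world.sh
else
  echo "Rscript not found; skipping the run."
fi
```

Using `/usr/bin/env Rscript` rather than a hard-coded path is a design choice: it picks up whichever Rscript comes first on the PATH.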
                        

When to use Rscript?

  • If you want a standalone executable.
  • If configuration comes from the environment.

Using (Command Line) R

Write a hello-world.R script.

Pipe it through the R interpreter.


$ cat hello-world.R | R --slave
Hello World!
                        

When to just use R?

  • If you want a script that still plays nicely in RStudio.
  • If you want to launch multiple configurations.

3. Flexible Configuration

One script, many jobs

Text Substitution

By default name is set to "World".


$ cat hello-world-variable.R | R --slave
Hello World!
                        

Sprinkle some shell magic! Use sed to substitute "useRs" in place of "World".


$ sed 's/World/useRs/' hello-world-variable.R | R --slave
Hello useRs!
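
To see what sed actually hands to R, it can help to look at the edited script before piping it anywhere. The sketch below assumes a plausible hello-world-variable.R (the original file is not shown) and needs no R installation.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed contents of hello-world-variable.R (the original is not shown).
cat > hello-world-variable.R <<'EOF'
name <- "World"
cat("Hello ", name, "!\n", sep = "")
EOF

# sed writes the edited script to standard output;
# the file on disk is left unchanged.
modified=$(sed 's/World/useRs/' hello-world-variable.R)
printf '%s\n' "$modified"
```

Without a trailing `g` flag, `s/World/useRs/` replaces only the first match on each line, which is enough here.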
                        

Environment Variables

By default name is set to "World".


$ cat hello-world-environment.R | R --slave
Hello World!
                        

Defining a NAME environment variable causes name to be set to "useRs".


$ export NAME=useRs && cat hello-world-environment.R | R --slave
Hello useRs!
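
One detail worth knowing: a child process such as R only sees variables that are part of the environment, not plain shell variables. The sketch below illustrates the difference, with a small `sh -c` command standing in for the R script (which would read the value with Sys.getenv("NAME")). The helper variable `greet` is hypothetical.

```shell
# Start from a clean slate so an inherited NAME does not mask the effect.
unset NAME

# Stand-in for the R script: print NAME, falling back to "World" when unset.
greet='echo "Hello ${NAME:-World}!"'

# Plain assignment: a shell variable only, invisible to child processes.
NAME=useRs
plain=$(sh -c "$greet")

# Exported: now part of the environment that child processes inherit.
export NAME
exported=$(sh -c "$greet")

# Prefix form: set for one command only, leaving the current shell untouched.
unset NAME
prefixed=$(NAME=useRs sh -c "$greet")

echo "plain:    $plain"
echo "exported: $exported"
echo "prefixed: $prefixed"
```

The prefix form is handy for launching several configurations, since nothing leaks into the session between runs.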
                        

Buffon's Needle experiment.


Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips?
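
For reference, the classical answer for a needle no longer than the strip width can be written as below. The symbols are introduced here for illustration: \ell is the needle length, t the strip width and r their ratio. That the RATIO setting in the script corresponds to r is an assumption.

```latex
P(\text{needle crosses a line}) \;=\; \frac{2\ell}{\pi t} \;=\; \frac{2r}{\pi},
\qquad r = \frac{\ell}{t} \le 1 .
```

For example, r = 0.25 gives P = 0.5/\pi, roughly 0.159, which is the value a simulation with that ratio should approach.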

Multiple Configurations

Changing the value of RATIO.


$ sed "s/\(^RATIO   =\).*/\1 0.25/" buffon-needle-configuration.R
# CONFIGURATION ---------------------------------------------------------------

RATIO   = 0.25
SAMPLES = 500000        # Number of times that the needle is dropped.
SEED    = 13            # Random seed for repeatability.

# -----------------------------------------------------------------------------
                        

The value of SAMPLES can be changed in the same way.

Both RATIO and SAMPLES can be changed in a single command.
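
A minimal sketch of changing both values at once, using one `-e` expression per setting. The starting file is an assumption modelled on the configuration block shown earlier; the initial RATIO of 1 and the new SAMPLES of 1000 are placeholders.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed starting file; the initial RATIO value of 1 is a placeholder.
cat > buffon-needle-configuration.R <<'EOF'
RATIO   = 1
SAMPLES = 500000        # Number of times that the needle is dropped.
SEED    = 13            # Random seed for repeatability.
EOF

# Each -e adds one substitution; both are applied in a single pass.
configured=$(sed -e 's/^\(RATIO *=\).*/\1 0.25/' \
                 -e 's/^\(SAMPLES *=\).*/\1 1000/' \
                 buffon-needle-configuration.R)
printf '%s\n' "$configured"
```

Because `.*` matches to the end of the line, any trailing comment on a substituted line is dropped along with the old value.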

Script It!
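
One possible way to script this is a pair of nested loops over the settings. This is a sketch under assumptions, not the script from the talk: the configuration file contents, the parameter values and the run-*.R output names are all hypothetical.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed configuration file (placeholder values).
cat > buffon-needle-configuration.R <<'EOF'
RATIO   = 1
SAMPLES = 500000
SEED    = 13
EOF

# Generate one configured copy per combination of RATIO and SAMPLES.
for ratio in 0.25 0.50 1.00; do
  for samples in 1000 100000; do
    sed -e "s/^\(RATIO *=\).*/\1 $ratio/" \
        -e "s/^\(SAMPLES *=\).*/\1 $samples/" \
        buffon-needle-configuration.R > "run-$ratio-$samples.R"
  done
done

# List the generated scripts.
ls run-*.R
```

To run each configuration instead of saving it, pipe the sed output straight into `R --slave`, as in the earlier examples.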

4. Use the Cloud

Massive compute (on a budget)

Wish List

  1. World Peace.
  2. A massive computer with
    1. 128 CPUs,
    2. 4192 GB of RAM and
    3. a blazing fast network connection.

But I only need it for half an hour.

And my budget is $15.

Cloud Solution

If you have

  • a credit card and
  • an SSH client

then you can have this massive computer.

You can spin it up for just half an hour. And it'll be within budget.



                Machine #1   Machine #2
CPUs                     1          128
RAM (GB)                 1         4192
Cost ($/hour)       0.0035       26.688
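
The budget claim is easy to check with the figures quoted in the talk: the hourly price of the large machine, half an hour of use and a $15 budget. A small sketch, using awk because POSIX shell arithmetic is integer-only:

```shell
# Figures from the talk: hourly price, run time and budget.
rate=26.688    # $/hour for the large machine
hours=0.5
budget=15

# Delegate the floating-point multiplication to awk.
cost=$(awk -v r="$rate" -v h="$hours" 'BEGIN { printf "%.2f", r * h }')
echo "Cost for $hours hours: \$$cost"

# awk exits 0 (success) when the cost fits the budget.
if awk -v c="$cost" -v b="$budget" 'BEGIN { exit !(c <= b) }'; then
  echo "Within budget"
fi
```

This ignores any extras a provider might bill separately, such as storage or data transfer.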

Demo

Show us!


View demo on .



Code available here on .