Introduction

Author

Derek Sollberger

Published

January 17, 2023

library("tidyverse")
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.1 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Introducing the Presenter

  • Lecturer: Derek Sollberger

    • I go by “Derek” or “teacher”
  • BA in Applied Mathematics, UC Berkeley

  • MS in Applied Mathematics, CSULB

  • MS in Applied Mathematics, UC Merced

Introducing the Presenter

  • Continuing Lecturer in Applied Mathematics
  • 10+ years of teaching at UC Merced
  • Courses:
    • Bio 18: Data Science
    • Bio 175: Biostatistics
    • Bio 184: Python for Bioinformatics
    • Math 32: Probability and Statistics

Current Research in Pedagogy

  • active learning
  • computer programming
  • augmented reality

Identity Statement

  • Originally from Los Angeles
  • Math: easier to understand through graphs
  • Computer Programming: years of experience with R, Python, MATLAB, PHP, HTML, etc.
  • Learning: drawn to puzzles and manageable tasks
  • Personality: shy, introvert

On Notetaking

  • Do not write all of the information from the slides
  • Do write along with what Derek writes on the whiteboard
  • Make a few notes for main ideas and computer programming
  • Retain placement notes, such as “Example #2” or “Survey #1”
  • No need to copy computer code from lecture (code will be provided)

Why Probability?

The Classic Birthday Problem

How many students have to enter the classroom until there are two students that share a birthday?
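The question is usually made precise by asking for the smallest class size at which a shared birthday is more likely than not. The following is a minimal sketch of that calculation, assuming 365 equally likely birthdays (no leap years) and independent students. The helper name `p_shared` is made up for this illustration.

```r
# Probability that at least two of n students share a birthday.
# Assumption: 365 equally likely birthdays, students independent.
p_shared <- function(n) {
  # complement of "all n birthdays are distinct"
  1 - prod((365 - seq_len(n) + 1) / 365)
}

class_sizes <- 1:60
probabilities <- sapply(class_sizes, p_shared)

# smallest class size where a shared birthday is more likely than not
first_over_half <- class_sizes[which.max(probabilities > 0.5)]
print(first_over_half)  # 23
```

The number of students who actually have to walk in before a match occurs is itself random. This calculation only locates the point where a match becomes more likely than not.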

Deterministic vs Probabilistic

Deterministic: a situation that can be solved with equation solving and/or an algorithm

  • Example: If water boils at 100 degrees Celsius, what is that threshold in Fahrenheit?
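As a small illustration of a deterministic calculation, the conversion can be written as a one-line R function (the function name is chosen here for illustration). The same input always gives the same output.

```r
# Deterministic: a fixed formula maps each input to exactly one output.
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

boiling_point_f <- celsius_to_fahrenheit(100)
print(boiling_point_f)  # 212
```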

Probabilistic: a situation that cannot be completely solved due to an element of chance

  • Example: What is the chance that it will rain tomorrow?

Probability and You

Does a probabilistic sequence converge or diverge?

What percentage of Lyme disease patients would be cured with the current but experimental treatments?

What proportion of reactants undergo a reaction early in the reaction?

How many computers in a network would be affected after a virus infection?

How many of a certain species of plants are in the Vernal Pools Reserve?

What percentage of a semiconductor is made of impurities?

For a commercial passenger airplane, what is the probability that at least two engines fail during a flight?

How many stars are in the Milky Way?

Ugh, the Syllabus

Concepts of probability and statistics. Conditional probability, independence, random variables, distribution functions, descriptive statistics, transformations, sampling errors, confidence intervals, least squares and maximum likelihood. Exploratory data analysis and interactive computing.

  1. Develop probabilistic models of random phenomena.
  2. Infer statistical models from real data.
  3. Apply mathematical methods to probabilistic/statistical models to
    • Make predictions and
    • Quantify the uncertainty in these predictions.
  4. Write and run “simple” R programs for the purposes of data analysis, modeling, and visualization.

  1. Solve mathematical problems using analytical methods.
  2. Solve mathematical problems using computational methods.
  3. Recognize the relationships between different areas of mathematics and the connections between mathematics and other disciplines.
  4. Give clear and organized written and verbal explanations of mathematical ideas to a variety of audiences.
  5. Model real-world problems mathematically and analyze those models using their mastery of the core concepts.

Assessment

  • 10 percent of semester grade
  • quizzes due before lecture
    • no extensions
  • 5 to 10 minutes per quiz
    • review concepts and formulas
    • preview thought exercises

  • 15 percent of semester grade
  • language: R
  • platform: JupyterHub
  • answers to frequently asked questions
    • no, work may not be done in another language (e.g. Python)
    • no, work may not be done in another IDE (e.g. VS Code)
  • 10 to 20 minutes per week

  • 20 percent of semester grade
  • classical math textbook homework
  • advice: do most of the work during your discussion section

  • 10 percent of semester grade for discussion section participation
  • TA will track attendance
  • advised to work on written and computer assignments during discussion sections

  • 5 percent of semester grade
  • graded quickly on effort and completion
  • 5 to 10 minutes per survey/reading

  • Exam 1: 10 percent of semester grade (Wed., Mar. 1)
  • Exam 2: 15 percent of semester grade (Mon., Apr. 10)
  • Final Exam: 15 percent of semester grade (Sat., May 6)
  • based on the written assignments (i.e. no computer code)

Student Accessibility Services

Special Accommodations: University of California, Merced is committed to creating learning environments that are accessible to all. If you anticipate or experience physical or academic barriers based on a disability, please feel welcome to contact me privately so we can discuss options. In addition, please contact Student Accessibility Services (SAS) at (209) 228-6996 or disabilityservices@ucmerced.edu as soon as possible to explore reasonable accommodations. All accommodations must have prior approval from Student Accessibility Services on the basis of appropriate documentation. If you anticipate or experience barriers due to pregnancy, temporary medical condition, or injury, please feel welcome to contact me so we can discuss options. You are encouraged to contact the Dean of Students for support and resources at (209) 228-3633 or https://studentaffairs.ucmerced.edu/dean-students.

Academic Integrity

Academic integrity is the foundation of an academic community, and without it none of the educational or research goals of the university can be achieved. All members of the community are responsible for its academic integrity. Existing policies forbid cheating on examinations, plagiarism, and other forms of academic dishonesty. The UC Merced Academic Honesty Policy can be found on the Student Conduct website. Infractions against academic integrity will incur consequences such as an “F” on the assignment/exam and/or a report to the Academic Senate.

Nerdy Example

How many numbers between zero and one do we have to add up to have a sum that is greater than one?

  • Assume selection from a uniform distribution

Cumulative Summation

Let us start with the natural numbers

\[i \in \{1, 2, 3, \ldots\}\]

Then cumulative summation takes place with

\[F(n) = \sum_{i = 1}^{n} i\]

Cumulative Summation

In R, we can define a sequence of natural numbers

natural_numbers <- 1:10
print(natural_numbers)
 [1]  1  2  3  4  5  6  7  8  9 10

and then employ the cumsum() function to perform the cumulative summation.

cumsum(natural_numbers)
 [1]  1  3  6 10 15 21 28 36 45 55
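The printed values are the triangular numbers. As a quick sanity check, here is a sketch that compares `cumsum()` against the well-known closed form \(F(n) = n(n+1)/2\), an identity that is brought in here and is not stated on the slide.

```r
natural_numbers <- 1:10

# closed form for the sum of the first n natural numbers
closed_form <- natural_numbers * (natural_numbers + 1) / 2

# TRUE when every cumulative sum matches the closed form
sums_match <- all(cumsum(natural_numbers) == closed_form)
print(sums_match)  # TRUE
```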

Random Number Generation

In R, we generate a random number between zero and one (here: assumed from a uniform distribution) with the runif function.

runif(1)
[1] 0.3763247

From there, we can (for example) produce a sample of \(n = 32\) such random numbers

runif(32)
 [1] 0.808496253 0.530255245 0.575500263 0.925266754 0.218109261 0.948042712
 [7] 0.082536464 0.895080403 0.030557328 0.600224842 0.908927160 0.995998403
[13] 0.506766883 0.232773983 0.097109339 0.935733052 0.008453035 0.821926254
[19] 0.446838635 0.138450059 0.236597474 0.374993653 0.606477449 0.855093764
[25] 0.869073509 0.978764013 0.936657508 0.248463629 0.316659372 0.007899186
[31] 0.148523741 0.414026578
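Each call to `runif()` gives different numbers, so the values above will not match a fresh run. If reproducible draws are wanted, one option is to fix the seed first. The seed value 32 below is an arbitrary choice for illustration.

```r
# Fixing the seed makes the pseudo-random draws repeatable.
set.seed(32)            # arbitrary seed, chosen only for this sketch
first_draw <- runif(5)

set.seed(32)            # same seed again
second_draw <- runif(5)

# TRUE: identical seeds give identical draws
identical(first_draw, second_draw)
```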

One Iteration

Next, we employ function composition to apply cumulative summation to our vector of random numbers

X <- cumsum(runif(32))
print(X)
 [1]  0.7794607  1.1846579  2.1206889  3.0557736  3.3456139  3.9987518
 [7]  4.0430161  4.9879634  5.4354227  6.0101752  6.8707419  7.3198442
[13]  7.6528439  8.3454212  8.3885242  9.1884093 10.0467503 10.9859400
[19] 11.2546798 11.3066498 11.7020358 12.0357989 12.5634824 12.8123160
[25] 12.8256348 12.9948029 13.3551376 13.6047064 13.7093181 13.9621456
[31] 14.6875053 15.0901570

and then we can check when our cumulative summation first exceeded 1.0

which.max(X > 1.0)
[1] 2
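The comparison `X > 1.0` produces a logical vector, and `which.max()` returns the position of its first `TRUE`. A small sketch with made-up partial sums shows this, along with one caveat: if no element exceeds 1.0, `which.max()` still returns 1, so `which(...)[1]` can be a safer alternative.

```r
# made-up partial sums for illustration
partial_sums <- c(0.4, 0.9, 1.3, 1.7)
partial_sums > 1.0                 # FALSE FALSE  TRUE  TRUE
which.max(partial_sums > 1.0)      # 3: position of the first TRUE

# caveat: nothing exceeds 1.0 here
short_sums <- c(0.2, 0.5)
which.max(short_sums > 1.0)        # 1, which is misleading
which(short_sums > 1.0)[1]         # NA, which flags the problem
```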

Simulation

To try to understand the randomness, we can repeat the procedure for many iterations (here, \(N = 100000\)).

N <- 1e5 #number of iterations
our_results <- rep(NA, N) #initialize space for results
for(i in 1:N){
  this_vector <- cumsum(runif(10))
  this_result <- which.max(this_vector > 1.0)
  our_results[i] <- this_result
}
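The same simulation can also be written without an explicit loop. The sketch below is one possible alternative, not the version used in lecture. It wraps a single iteration in a helper (the name `count_until_exceeds_one` is made up here) and repeats it with `replicate()`, using a smaller number of iterations and an arbitrary seed.

```r
set.seed(32)  # arbitrary seed so the sketch is repeatable

# One iteration: how many uniform draws until the running sum exceeds 1?
# Returns NA in the (very rare) case that max_draws draws are not enough.
count_until_exceeds_one <- function(max_draws = 10) {
  partial_sums <- cumsum(runif(max_draws))
  which(partial_sums > 1.0)[1]
}

# Repeat the experiment many times and collect the counts
replicate_results <- replicate(1e4, count_until_exceeds_one())
mean(replicate_results, na.rm = TRUE)
```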

Visualization

To understand a distribution of a probabilistic setting, we can visualize the results.

df <- data.frame(our_results)
df |>
  ggplot() +
  geom_histogram(aes(x = our_results), binwidth = 1,
                 color = "black", fill = "blue") +
  labs(title = "Histogram of Results",
       subtitle = "How would you describe the distribution?",
       caption = "Math 32",
       x = "number of numbers needed",
       y = "count") +
  scale_x_continuous(breaks = seq(2,8))
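A numeric companion to the histogram can help when describing the distribution. The sketch below reruns a smaller simulation (arbitrary seed, 10,000 iterations) so that it stands alone, and then tabulates the share of iterations at each count.

```r
set.seed(32)  # arbitrary seed for repeatability

# smaller re-run of the simulation so this block is self-contained
simulated_counts <- replicate(1e4, which(cumsum(runif(10)) > 1.0)[1])

# proportion of iterations that needed 2, 3, 4, ... numbers
count_proportions <- prop.table(table(simulated_counts))
round(count_proportions, 3)
```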

Measure of Centrality

To home in on our understanding of the distribution, let us take the mean() of our_results

mean(our_results, na.rm = TRUE)
[1] 2.71241
  • Note: mean() returns NA if the vector contains a missing value. For our intents and purposes, we drop any missing values before averaging with na.rm = TRUE
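A small illustration of the `na.rm` argument, using a made-up vector with one missing value:

```r
# made-up values with one missing entry
values_with_gap <- c(2, 3, NA, 4)

mean(values_with_gap)                # NA: the missing value propagates
mean(values_with_gap, na.rm = TRUE)  # 3: average of 2, 3, and 4
```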

Nerdy Example

How many numbers between zero and one do we have to add up to have a sum that is greater than one?

\[ e \approx 2.718282\]

# theoretical answer
exp(1)
[1] 2.718282
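A sketch of the standard argument for why the answer is \(e\), under the same uniform assumption. Let \(N\) denote the number of draws needed (a symbol introduced here). The sum of the first \(n\) draws stays at or below one with probability \(1/n!\), the volume of the region \(\{u_1 + \cdots + u_n \le 1\}\) inside the unit cube, so

\[P(N > n) = P(U_1 + U_2 + \cdots + U_n \le 1) = \frac{1}{n!}\]

and summing the tail probabilities gives the expected value

\[E[N] = \sum_{n = 0}^{\infty} P(N > n) = \sum_{n = 0}^{\infty} \frac{1}{n!} = e\]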

Thought questions:

  • how do we know that the answer converges?
  • how many iterations did we need for a sufficient answer?
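One way to explore these questions numerically is to watch the running average settle as iterations accumulate, and to estimate its standard error. The sketch below is self-contained, with an arbitrary seed and variable names chosen for illustration. It draws 20 numbers per iteration so that a missing result is practically impossible.

```r
set.seed(32)  # arbitrary seed for repeatability

# 20 draws per iteration makes "never exceeded 1" practically impossible
simulated_counts <- replicate(1e4, which(cumsum(runif(20)) > 1.0)[1])

# average of the first k results, for every k
running_mean <- cumsum(simulated_counts) / seq_along(simulated_counts)
running_mean[c(10, 100, 1000, 10000)]

# rough measure of how far the final average may be from the true mean
standard_error <- sd(simulated_counts) / sqrt(length(simulated_counts))
standard_error
```

The standard error shrinks in proportion to \(1/\sqrt{N}\), so each additional digit of accuracy costs roughly 100 times as many iterations.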

Looking Ahead

  • Be mindful of before-lecture quizzes
  • due Fri., Jan. 20:
    • Perceptions of Probability (survey)
    • WHW1
  • Identity Statement (essay)

Exam 1 will be on Wed., Mar. 1