aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/analysis/analysis.Rmd
diff options
context:
space:
mode:
Diffstat (limited to 'analysis/analysis.Rmd')
-rw-r--r--analysis/analysis.Rmd315
1 files changed, 310 insertions, 5 deletions
diff --git a/analysis/analysis.Rmd b/analysis/analysis.Rmd
index b78d169..738db5f 100644
--- a/analysis/analysis.Rmd
+++ b/analysis/analysis.Rmd
@@ -5,15 +5,320 @@ author:
output:
html_document:
toc: true
- toc_float: true
+ toc_float: false
number_sections: false
- theme: united
- highlight: tango
- code_folding: show
+ theme: readable
+ highlight: espresso
+ code_folding: hide
fig_width: 10
fig_height: 8
fig_caption: true
+ df_print: paged
---
-```{r setup, echo=FALSE}
+```{r setup, include=FALSE}
+library(tidyverse)
+library(plotly)
+library(modelsummary)
```
+
+# Hypotheses
+
+1. Computing complex fractals parallelizes very well.
+2. The runtime has a linear relation with the number of samples.
+
+# Data
+
+Each data point is an average of 5 runtimes collected from running the program.
+To recreate out data, compile the programs using the instructions in `README.md` then run `gather_data` in the `analysis` directory from the project root.
+After the the SLURM jobs are finished, run `analysis/collate_data` to collect the data into a single file `analysis/data.csv`.
+Note that the SLURM scripts are unique to MU Cluster and will need to be tweaked to work on other systems.
+
+```{r data_ingest, include=FALSE}
+full_data <- read_csv("data.csv") %>%
+ mutate(samples = horizontal_samples * vertical_samples,
+ program = gsub("^build/(.*?)-fractals$", "\\1", program),
+ threads = ifelse(program == "cuda", 1024, threads)) %>%
+ select(program, threads, fractal, samples, runtime)
+```
+
+For a raw csv view of the data see `data.csv`.
+We collected data for the following fractals:
+
+* Mandelbrot
+* Burning Ship
+* Tricorn
+* Multibrot
+* Multicorn
+* Julia
+
+Using a maximum iterations of 25.
+We collect 3 data points per serial experiment, and 5 data points per shared and CUDA experiment.
+For a more detailed view of our collection methods see the scripts in `analysis/scripts`.
+The runtimes for CUDA experiments include the time to copy the data to and from the GPU for reasons that will become apparent later.
+Note the block size is set to $16x16$ for all CUDA experiments since we found a maxium percentage difference of $0.2%$ between $16x16$ and other block sizes that yield the maximum number of warps.
+
+```{r data_table, echo=FALSE}
+full_data
+```
+
+# Runtimes
+
+## Sampling's Effect on Runtimes
+
+The following plots are interactive, allowing you to zoom in on the data.
+By clicking on the legend you can hide or show data for each fractal.
+Each plot has a line representing the median runtime for each fractal and program.
+
+### Serial, Shared, CUDA
+
+For larger sample sizes we see the GPU with memory copy overhead is far faster than the CPU implementations.
+Without the memory copy overhead the CUDA runtimes are barely measurable.
+We expect this result, since each value in the fractal is computed independently, which is the type of task the GPU is designed for.
+
+Observe that the fractals that require complex exponentiation, such as the Multibrot and Multicorn, have the longest runtimes.
+We expect this, since the complex exponentiation is the most computationally expensive part of the fractal generation.
+
+Also take note of how linear the runtimes are with respect to the number of samples.
+We explore this further in the linear models section.
+
+```{r sampling_effect}
+full_plot <- full_data %>%
+ ggplot(aes(x = samples, y = runtime, color = program,
+ shape = factor(threads))) +
+ geom_point() +
+ facet_wrap(~fractal, scales = "free_y") +
+ stat_summary(fun = median, geom = "line",
+ aes(group = interaction(program, threads))) +
+ labs(title = "Sampling's Effect on Runtimes",
+ x = "Samples",
+ y = "Runtime (s)",
+ shape = "Threads",
+ color = "Implementation")
+
+ggplotly(full_plot)
+```
+
+### Serial
+
+```{r serial_sampling}
+serial_plot <- full_data %>%
+ filter(program == "serial") %>%
+ ggplot(aes(x = samples, y = runtime, color = fractal)) +
+ geom_point() +
+ stat_summary(fun = median, geom = "line",
+ aes(group = fractal)) +
+ labs(title = "Serial Sampling's Effect on Runtimes",
+ x = "Samples",
+ y = "Runtime (s)",
+ color = "Fractal")
+
+ggplotly(serial_plot)
+```
+
+### Shared
+
+```{r shared_sampling}
+shared_plot <- full_data %>%
+ filter(program == "shared") %>%
+ rename(Threads = threads) %>%
+ ggplot(aes(x = samples, y = runtime, color = fractal)) +
+ geom_point() +
+ stat_summary(fun = median, geom = "line",
+ aes(group = fractal)) +
+ facet_wrap(~Threads, scales = "free_y", labeller = label_both) +
+ labs(title = "Shared Sampling's Effect on Runtimes",
+ x = "Samples",
+ y = "Runtime (s)",
+ color = "Fractal")
+
+ggplotly(shared_plot)
+```
+
+### CUDA
+
+```{r cuda_sampling}
+cuda_plot <- full_data %>%
+ filter(program == "cuda") %>%
+ ggplot(aes(x = samples, y = runtime, color = fractal)) +
+ geom_point() +
+ stat_summary(fun = median, geom = "line",
+ aes(group = fractal)) +
+ labs(title = "CUDA Sampling's Effect on Runtimes",
+ x = "Samples",
+ y = "Runtime (s)",
+ color = "Fractal")
+
+ggplotly(cuda_plot)
+```
+
+## Linear Models
+
+We fit linear models to the data to test our hypothesis that the runtime has a linear relationship with the number of samples.
+By doing so we can give with a high confidence models for computing the runtime of a fractal given the number of samples.
+In all the models below, the $R^2$ value is very close to 1, indicating that the model fits the data very well.
+
+### Serial
+
+```{r serial_models, include=FALSE}
+serial_mandelbrot_model <- full_data %>%
+ filter(program == "serial", fractal == "mandelbrot") %>%
+ lm(runtime ~ samples, data = .)
+
+serial_burning_ship_model <- full_data %>%
+ filter(program == "serial", fractal == "burning ship") %>%
+ lm(runtime ~ samples, data = .)
+
+serial_tricorn_model <- full_data %>%
+ filter(program == "serial", fractal == "tricorn") %>%
+ lm(runtime ~ samples, data = .)
+
+serial_multibrot_model <- full_data %>%
+ filter(program == "serial", fractal == "multibrot") %>%
+ lm(runtime ~ samples, data = .)
+serial_multicorn_model <- full_data %>%
+ filter(program == "serial", fractal == "multicorn") %>%
+ lm(runtime ~ samples, data = .)
+
+serial_julia_model <- full_data %>%
+ filter(program == "serial", fractal == "julia") %>%
+ lm(runtime ~ samples, data = .)
+```
+
+<div style="display: flex; justify-content: center; align-items: center;">
+```{r serial_models_table, echo=FALSE}
+modelsummary(list(
+ "Mandelbrot" = serial_mandelbrot_model,
+ "Burning Ship" = serial_burning_ship_model,
+ "Tricorn" = serial_tricorn_model,
+ "Multibrot" = serial_multibrot_model,
+ "Multicorn" = serial_multicorn_model,
+ "Julia" = serial_julia_model), output = "html")
+```
+</div>
+
+### Shared
+
+For the sake of brevity we only show models that include all threads.
+Separating the models by thread counts provided much better models.
+
+```{r shared_models, include=FALSE}
+shared_mandelbrot_model <- full_data %>%
+ filter(program == "shared", fractal == "mandelbrot") %>%
+ lm(runtime ~ samples, data = .)
+shared_burning_ship_model <- full_data %>%
+ filter(program == "shared", fractal == "burning ship") %>%
+ lm(runtime ~ samples, data = .)
+shared_tricorn_model <- full_data %>%
+ filter(program == "shared", fractal == "tricorn") %>%
+ lm(runtime ~ samples, data = .)
+shared_multibrot_model <- full_data %>%
+ filter(program == "shared", fractal == "multibrot") %>%
+ lm(runtime ~ samples, data = .)
+shared_multicorn_model <- full_data %>%
+ filter(program == "shared", fractal == "multicorn") %>%
+ lm(runtime ~ samples, data = .)
+shared_julia_model <- full_data %>%
+ filter(program == "shared", fractal == "julia") %>%
+ lm(runtime ~ samples, data = .)
+```
+
+<div style="display: flex; justify-content: center; align-items: center;">
+```{r shared_models_table, echo=FALSE}
+modelsummary(list(
+ "Mandelbrot" = shared_mandelbrot_model,
+ "Burning Ship" = shared_burning_ship_model,
+ "Tricorn" = shared_tricorn_model,
+ "Multibrot" = shared_multibrot_model,
+ "Multicorn" = shared_multicorn_model,
+ "Julia" = shared_julia_model), output = "html")
+```
+</div>
+
+### CUDA
+
+```{r cuda_models, include=FALSE}
+cuda_mandelbrot_model <- full_data %>%
+ filter(program == "cuda", fractal == "mandelbrot") %>%
+ lm(runtime ~ samples, data = .)
+cuda_burning_ship_model <- full_data %>%
+ filter(program == "cuda", fractal == "burning ship") %>%
+ lm(runtime ~ samples, data = .)
+cuda_tricorn_model <- full_data %>%
+ filter(program == "cuda", fractal == "tricorn") %>%
+ lm(runtime ~ samples, data = .)
+cuda_multibrot_model <- full_data %>%
+ filter(program == "cuda", fractal == "multibrot") %>%
+ lm(runtime ~ samples, data = .)
+cuda_multicorn_model <- full_data %>%
+ filter(program == "cuda", fractal == "multicorn") %>%
+ lm(runtime ~ samples, data = .)
+cuda_julia_model <- full_data %>%
+ filter(program == "cuda", fractal == "julia") %>%
+ lm(runtime ~ samples, data = .)
+```
+
+<div style="display: flex; justify-content: center; align-items: center;">
+```{r cuda_models_table, echo=FALSE}
+modelsummary(list(
+ "Mandelbrot" = cuda_mandelbrot_model,
+ "Burning Ship" = cuda_burning_ship_model,
+ "Tricorn" = cuda_tricorn_model,
+ "Multibrot" = cuda_multibrot_model,
+ "Multicorn" = cuda_multicorn_model,
+ "Julia" = cuda_julia_model), output = "html")
+```
+</div>
+
+
+# Parallelism
+
+```{r serial_times, include=FALSE}
+serial_times <- full_data %>%
+ filter(program == "serial") %>%
+ group_by(samples) %>%
+ summarize(serial_runtime = median(runtime))
+```
+
+In the plots below we show the percentage of parallelism according to Amdahls Law as a function of samples.
+We truncated the data to only show percentage parallelism greater than -75%.
+As we expected, the CUDA implementation is highly parallel, with the shared implementations have varying degrees of parallelism.
+For each thread count, the shared implementations tend to level off at a certain percentage of parallelism.
+An oddity in the data is that the shared multibrot and multicorn implementations have a negative percentage of parallelism.
+We believe this is from their usage of complex exponentiation, which is their ownly major difference between the other fractals.
+From this, we conclude that that there is some resource that each thread must share in multibrot and multicorn that is not shared in the other fractals.
+Most likely this resource is a specialized arrithmetic unit.
+
+On the GPU we have a similar issue, the register count per thread from about 32 to 48 when computing the multibrot and multicorn fractals.
+
+```{r program_parallelism, echo=FALSE}
+parallel <- full_data %>%
+ filter(program != "serial") %>%
+ left_join(serial_times, by = "samples") %>%
+ mutate(speedup = serial_runtime / runtime,
+ percentage_parallel = (threads/(threads-1))*(1-1/speedup)) %>%
+ filter(percentage_parallel > -0.75, is.finite(percentage_parallel))
+```
+
+```{r parallel_plot}
+parallel_plot <- parallel %>%
+ ggplot(aes(x = samples, y = percentage_parallel, color = program,
+ shape = factor(threads))) +
+ geom_point() +
+ stat_summary(fun = median, geom = "line",
+ aes(group = interaction(program, fractal, threads))) +
+ facet_wrap(~fractal, scales = "free_y") +
+ labs(title = "Program/Fractal Parallelism",
+ x = "Samples",
+ y = "Percentage Parallel",
+ color = "Implementation",
+ shape = "Threads")
+
+ggplotly(parallel_plot)
+```
+
+
+# Conclusions
+
+From this data we can conclude that the runtime of fractal generation is linear with respect to the number of samples.
+We also conclude that the CUDA implementation is highly parallel, with the shared implementations having varying degrees of parallelism. \ No newline at end of file