Diffstat (limited to 'analysis/analysis.Rmd')
-rw-r--r--  analysis/analysis.Rmd | 315
1 file changed, 310 insertions, 5 deletions
diff --git a/analysis/analysis.Rmd b/analysis/analysis.Rmd
index b78d169..738db5f 100644
--- a/analysis/analysis.Rmd
+++ b/analysis/analysis.Rmd
@@ -5,15 +5,320 @@ author:
 output:
   html_document:
     toc: true
-    toc_float: true
+    toc_float: false
     number_sections: false
-    theme: united
-    highlight: tango
-    code_folding: show
+    theme: readable
+    highlight: espresso
+    code_folding: hide
     fig_width: 10
     fig_height: 8
     fig_caption: true
+    df_print: paged
 ---

-```{r setup, echo=FALSE}
+```{r setup, include=FALSE}
+library(tidyverse)
+library(plotly)
+library(modelsummary)
 ```

# Hypotheses

1. Computing complex fractals parallelizes very well.
2. The runtime has a linear relationship with the number of samples.

# Data

Each data point is an average of 5 runtimes collected from running the program.
To recreate our data, compile the programs using the instructions in `README.md`, then run `gather_data` in the `analysis` directory from the project root.
After the SLURM jobs are finished, run `analysis/collate_data` to collect the data into a single file, `analysis/data.csv`.
Note that the SLURM scripts are specific to the MU Cluster and will need to be tweaked to work on other systems.

```{r data_ingest, include=FALSE}
full_data <- read_csv("data.csv") %>%
  mutate(samples = horizontal_samples * vertical_samples,
         program = gsub("^build/(.*?)-fractals$", "\\1", program),
         threads = ifelse(program == "cuda", 1024, threads)) %>%
  select(program, threads, fractal, samples, runtime)
```

For a raw CSV view of the data, see `data.csv`.
We collected data for the following fractals, using a maximum iteration count of 25:

* Mandelbrot
* Burning Ship
* Tricorn
* Multibrot
* Multicorn
* Julia

We collect 3 data points per serial experiment and 5 data points per shared and CUDA experiment.
For a more detailed view of our collection methods, see the scripts in `analysis/scripts`.
The runtimes for CUDA experiments include the time to copy the data to and from the GPU, for reasons that will become apparent later.
Note that the block size is set to $16 \times 16$ for all CUDA experiments, since we found a maximum percentage difference of $0.2\%$ between $16 \times 16$ and other block sizes that yield the maximum number of warps.

```{r data_table, echo=FALSE}
full_data
```

# Runtimes

## Sampling's Effect on Runtimes

The following plots are interactive, allowing you to zoom in on the data.
By clicking on the legend you can hide or show the data for each fractal.
Each plot has a line representing the median runtime for each fractal and program.

### Serial, Shared, CUDA

For larger sample sizes we see that the GPU, even with the memory-copy overhead, is far faster than the CPU implementations.
Without the memory-copy overhead the CUDA runtimes are barely measurable.
We expect this result, since each value in the fractal is computed independently, which is exactly the type of task the GPU is designed for.

Observe that the fractals that require complex exponentiation, such as the Multibrot and Multicorn, have the longest runtimes.
We expect this, since complex exponentiation is the most computationally expensive part of the fractal generation; the sketch below illustrates why.
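
As a rough illustration of the cost difference, here is a minimal escape-time sketch in R (a hypothetical illustration only; the benchmarked programs are the compiled implementations, not this R code). The Mandelbrot iterate $z^2 + c$ needs one complex multiply and add per step, while a Multibrot iterate $z^d + c$ with a general exponent $d$ goes through a far more expensive complex power.

```{r escape_sketch}
# Count iterations until |z| leaves the radius-2 disk, or max_iter is hit.
# The per-step cost is dominated by z^d: squaring amounts to a single
# complex multiply, while a general complex power needs log/exp, which is
# what makes the Multibrot and Multicorn the slowest fractals to generate.
escape_iterations <- function(c, d = 2, max_iter = 25) {
  z <- complex(real = 0, imaginary = 0)
  for (i in seq_len(max_iter)) {
    z <- z^d + c
    if (Mod(z) > 2) return(i)
  }
  max_iter
}

escape_iterations(complex(real = -0.5, imaginary = 0.5), d = 3)
```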
Also take note of how linear the runtimes are with respect to the number of samples; we explore this further in the Linear Models section.

```{r sampling_effect}
full_plot <- full_data %>%
  ggplot(aes(x = samples, y = runtime, color = program,
             shape = factor(threads))) +
  geom_point() +
  facet_wrap(~fractal, scales = "free_y") +
  stat_summary(fun = median, geom = "line",
               aes(group = interaction(program, threads))) +
  labs(title = "Sampling's Effect on Runtimes",
       x = "Samples",
       y = "Runtime (s)",
       shape = "Threads",
       color = "Implementation")

ggplotly(full_plot)
```

### Serial

```{r serial_sampling}
serial_plot <- full_data %>%
  filter(program == "serial") %>%
  ggplot(aes(x = samples, y = runtime, color = fractal)) +
  geom_point() +
  stat_summary(fun = median, geom = "line",
               aes(group = fractal)) +
  labs(title = "Serial Sampling's Effect on Runtimes",
       x = "Samples",
       y = "Runtime (s)",
       color = "Fractal")

ggplotly(serial_plot)
```

### Shared

```{r shared_sampling}
shared_plot <- full_data %>%
  filter(program == "shared") %>%
  rename(Threads = threads) %>%
  ggplot(aes(x = samples, y = runtime, color = fractal)) +
  geom_point() +
  stat_summary(fun = median, geom = "line",
               aes(group = fractal)) +
  facet_wrap(~Threads, scales = "free_y", labeller = label_both) +
  labs(title = "Shared Sampling's Effect on Runtimes",
       x = "Samples",
       y = "Runtime (s)",
       color = "Fractal")

ggplotly(shared_plot)
```

### CUDA

```{r cuda_sampling}
cuda_plot <- full_data %>%
  filter(program == "cuda") %>%
  ggplot(aes(x = samples, y = runtime, color = fractal)) +
  geom_point() +
  stat_summary(fun = median, geom = "line",
               aes(group = fractal)) +
  labs(title = "CUDA Sampling's Effect on Runtimes",
       x = "Samples",
       y = "Runtime (s)",
       color = "Fractal")

ggplotly(cuda_plot)
```

## Linear Models

We fit linear models to the data to test our hypothesis that the runtime has a linear relationship with the number of samples.
By doing so we can give, with high confidence, models for computing the runtime of a fractal given the number of samples.
In all the models below, the $R^2$ value is very close to 1, indicating that the models fit the data very well.

### Serial

```{r serial_models, include=FALSE}
serial_mandelbrot_model <- full_data %>%
  filter(program == "serial", fractal == "mandelbrot") %>%
  lm(runtime ~ samples, data = .)

serial_burning_ship_model <- full_data %>%
  filter(program == "serial", fractal == "burning ship") %>%
  lm(runtime ~ samples, data = .)

serial_tricorn_model <- full_data %>%
  filter(program == "serial", fractal == "tricorn") %>%
  lm(runtime ~ samples, data = .)

serial_multibrot_model <- full_data %>%
  filter(program == "serial", fractal == "multibrot") %>%
  lm(runtime ~ samples, data = .)

serial_multicorn_model <- full_data %>%
  filter(program == "serial", fractal == "multicorn") %>%
  lm(runtime ~ samples, data = .)

serial_julia_model <- full_data %>%
  filter(program == "serial", fractal == "julia") %>%
  lm(runtime ~ samples, data = .)
```

<div style="display: flex; justify-content: center; align-items: center;">
```{r serial_models_table, echo=FALSE}
modelsummary(list(
  "Mandelbrot" = serial_mandelbrot_model,
  "Burning Ship" = serial_burning_ship_model,
  "Tricorn" = serial_tricorn_model,
  "Multibrot" = serial_multibrot_model,
  "Multicorn" = serial_multicorn_model,
  "Julia" = serial_julia_model), output = "html")
```
</div>
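
With a fit in hand, predicting the runtime for a new sample count is a single `predict()` call. A hypothetical usage sketch (the $4096 \times 4096$ grid is illustrative, not necessarily one of our measured configurations):

```{r serial_predict_sketch, eval=FALSE}
# Predict the serial Mandelbrot runtime, with a 95% prediction interval,
# for a hypothetical 4096 x 4096 sampling grid.
predict(serial_mandelbrot_model,
        newdata = data.frame(samples = 4096 * 4096),
        interval = "prediction")
```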
### Shared

For the sake of brevity we only show models that include all thread counts; separating the models by thread count produced much better fits.

```{r shared_models, include=FALSE}
shared_mandelbrot_model <- full_data %>%
  filter(program == "shared", fractal == "mandelbrot") %>%
  lm(runtime ~ samples, data = .)

shared_burning_ship_model <- full_data %>%
  filter(program == "shared", fractal == "burning ship") %>%
  lm(runtime ~ samples, data = .)

shared_tricorn_model <- full_data %>%
  filter(program == "shared", fractal == "tricorn") %>%
  lm(runtime ~ samples, data = .)

shared_multibrot_model <- full_data %>%
  filter(program == "shared", fractal == "multibrot") %>%
  lm(runtime ~ samples, data = .)

shared_multicorn_model <- full_data %>%
  filter(program == "shared", fractal == "multicorn") %>%
  lm(runtime ~ samples, data = .)

shared_julia_model <- full_data %>%
  filter(program == "shared", fractal == "julia") %>%
  lm(runtime ~ samples, data = .)
```

<div style="display: flex; justify-content: center; align-items: center;">
```{r shared_models_table, echo=FALSE}
modelsummary(list(
  "Mandelbrot" = shared_mandelbrot_model,
  "Burning Ship" = shared_burning_ship_model,
  "Tricorn" = shared_tricorn_model,
  "Multibrot" = shared_multibrot_model,
  "Multicorn" = shared_multicorn_model,
  "Julia" = shared_julia_model), output = "html")
```
</div>

### CUDA

```{r cuda_models, include=FALSE}
cuda_mandelbrot_model <- full_data %>%
  filter(program == "cuda", fractal == "mandelbrot") %>%
  lm(runtime ~ samples, data = .)

cuda_burning_ship_model <- full_data %>%
  filter(program == "cuda", fractal == "burning ship") %>%
  lm(runtime ~ samples, data = .)

cuda_tricorn_model <- full_data %>%
  filter(program == "cuda", fractal == "tricorn") %>%
  lm(runtime ~ samples, data = .)

cuda_multibrot_model <- full_data %>%
  filter(program == "cuda", fractal == "multibrot") %>%
  lm(runtime ~ samples, data = .)

cuda_multicorn_model <- full_data %>%
  filter(program == "cuda", fractal == "multicorn") %>%
  lm(runtime ~ samples, data = .)

cuda_julia_model <- full_data %>%
  filter(program == "cuda", fractal == "julia") %>%
  lm(runtime ~ samples, data = .)
```

<div style="display: flex; justify-content: center; align-items: center;">
```{r cuda_models_table, echo=FALSE}
modelsummary(list(
  "Mandelbrot" = cuda_mandelbrot_model,
  "Burning Ship" = cuda_burning_ship_model,
  "Tricorn" = cuda_tricorn_model,
  "Multibrot" = cuda_multibrot_model,
  "Multicorn" = cuda_multicorn_model,
  "Julia" = cuda_julia_model), output = "html")
```
</div>

# Parallelism

```{r serial_times, include=FALSE}
serial_times <- full_data %>%
  filter(program == "serial") %>%
  group_by(samples) %>%
  summarize(serial_runtime = median(runtime))
```

In the plots below we show the percentage of parallelism according to Amdahl's Law as a function of samples.
We truncated the data to only show percentage parallelism greater than -75%.
As we expected, the CUDA implementation is highly parallel, while the shared implementations have varying degrees of parallelism.
For each thread count, the shared implementations tend to level off at a certain percentage of parallelism.
An oddity in the data is that the shared Multibrot and Multicorn implementations have a negative percentage of parallelism.
We believe this stems from their use of complex exponentiation, which is their only major difference from the other fractals.
From this, we conclude that there is some resource that each thread must share in Multibrot and Multicorn that is not shared in the other fractals; most likely this resource is a specialized arithmetic unit.

On the GPU we see a similar issue: the register count per thread rises from about 32 to 48 when computing the Multibrot and Multicorn fractals.
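
Concretely, Amdahl's Law relates the speedup $S$ on $N$ threads to the parallel fraction $P$ of a program:

$$S = \frac{1}{(1 - P) + P / N}.$$

Solving for $P$ gives

$$P = \frac{N}{N - 1}\left(1 - \frac{1}{S}\right),$$

which is the `percentage_parallel` computed in the chunk below, with $S$ measured as the median serial runtime divided by the parallel runtime.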

```{r program_parallelism, echo=FALSE}
parallel <- full_data %>%
  filter(program != "serial") %>%
  left_join(serial_times, by = "samples") %>%
  mutate(speedup = serial_runtime / runtime,
         percentage_parallel = (threads / (threads - 1)) * (1 - 1 / speedup)) %>%
  filter(percentage_parallel > -0.75, is.finite(percentage_parallel))
```

```{r parallel_plot}
parallel_plot <- parallel %>%
  ggplot(aes(x = samples, y = percentage_parallel, color = program,
             shape = factor(threads))) +
  geom_point() +
  stat_summary(fun = median, geom = "line",
               aes(group = interaction(program, fractal, threads))) +
  facet_wrap(~fractal, scales = "free_y") +
  labs(title = "Program/Fractal Parallelism",
       x = "Samples",
       y = "Percentage Parallel",
       color = "Implementation",
       shape = "Threads")

ggplotly(parallel_plot)
```

# Conclusions

From this data we conclude that the runtime of fractal generation is linear with respect to the number of samples.
We also conclude that the CUDA implementation is highly parallel, with the shared implementations having varying degrees of parallelism.
\ No newline at end of file