class: title-slide, left, bottom
# CDU Data Science Team — RAP

----

## **Reproducible Analytical Pipelines**

### Zoë Turner | September 2022

---
class: inverse, middle, center

# What is RAP?

How many of you have heard of Reproducible Analytical Pipelines (RAP)?

---
class: inverse-white, middle

# A story in Chrome tabs

Guess how many tabs I have open in a "folder" called RAP?

.pull-left[
![Chalk drawn question mark on a black board](img/pexels-pixabay-356079.jpg)
]

--

.pull-right[
1. 1
1. 15
1. 30
1. 100
]

---
class: inverse-white, middle

# Civil Service know about RAP!

Since when?

.pull-left[
![Chalk drawn question mark on a black board](img/pexels-pixabay-356079.jpg)
]

--

.pull-right[
1. 1999
1. 2006
1. 2017
1. 2020
]

---
class: inverse-white, middle

# Quoting the Government Analysis Function

> Reproducibility is the cornerstone of analysis. Analysts should get the same results as each other when using the same data and methods.

--

This is the very first line of the [foreword](https://analysisfunction.civilservice.gov.uk/policy-store/reproducible-analytical-pipelines-strategy/) from the Head of the Analysis Function and National Statistician, Prof Sir Ian Diamond.

---
class: inverse-white, middle

# Reproducible...

## research

> Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them.

[Coursera Reproducible Research course](https://www.coursera.org/learn/reproducible-research)

--

## analytical pipelines

> ... produce high quality, shared, reviewable, re-usable, well-documented code for data curation and analysis; minimise inefficient duplication; avoid unverifiable ‘black box’ analyses; and make each new analysis faster.

[Better, broader and safer: using health data for research and analysis](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis), also known as the Goldacre Review

.footnote[
[A manifesto for reproducible science](https://www.nature.com/articles/s41562-016-0021) Munafò, M., Nosek, B., Bishop, D. et al. A manifesto for reproducible science. Nat Hum Behav 1, 0021 (*2017*). https://doi.org/10.1038/s41562-016-0021
]

---
class: center, middle

# Breaking it down

![Girl with wooden blocks which are rounded in overall shape but have straight sides](img/pexels-artem-podrez-6951915.jpg)

---

# Building blocks

- automate every step of analysis

--

- have open code (publicly where possible and at least within the team)

--

- use version control

--

- comment and document code

---

# Read more about it!

Just a few of those Government links I've got open:

[Better, broader, safer Review](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis) and a Twitter breakdown of the document by [Jess Morley](https://twitter.com/jessRmorley/status/1512013395897339909), as well as a [podcast](https://soundcloud.com/nhs-r-community/nhsr-jess-morley).

[Government Analysis Function](https://analysisfunction.civilservice.gov.uk/policy-store/reproducible-analytical-pipelines-strategy/)

[Data Science Campus](https://datasciencecampus.ons.gov.uk/)

[Government Data Science Slack](https://govdatascience.slack.com) - channel #rap_collaboration, where RAP champion information is also shared

[Udemy course RAP using R](https://www.udemy.com/course/reproducible-analytical-pipelines/) - free

[Government RAP companion guide](https://ukgovdatascience.github.io/rap_companion/index.html#discovery)

[NHS-R Community](https://nhsrcommunity.com/) and [NHS PyCom](https://nhs-pycom.net/) - support networks for R and Python, including Slack groups

[NHS Digital](https://nhsdigital.github.io/analytics-services-blog/2022/02/24/reproducible-analytical-pipelines-blog-1.html)

--

[CDU Data Science Team](https://cdu-data-science-team.github.io/team-blog/) - notes that we would otherwise share only within the team, published publicly instead; a nice way to practise being open

---

# Last word - spreadsheets

.pull-left[
Where spreadsheets are necessary, please make them accessible (which also makes them machine readable!).

There is good guidance from the [Government Analysis Function](https://analysisfunction.civilservice.gov.uk/policy-store/releasing-statistics-in-spreadsheets/).
]

.pull-right[
![Cartoon image of green round creature with a cowboy hat sat upon a bigger bean shaped creature.
The cowboy has a whip tied around grey, angry looking spreadsheets with data written above them](img/data_cowboy.png)
]

---
class: inverse
name: acknowledgement

# Acknowledgements

The professional look of this presentation, which uses NHS and Nottinghamshire Healthcare NHS Foundation Trust colour branding, exists because of the amazing work of Silvia Canelón (see the details of the workshops she ran at the [NHS-R Community conference](https://spcanelon.github.io/xaringan-basics-and-beyond/index.html)), Milan Wiedemann, who created the CDU Data Science logo with the help of the team, and Zoë Turner, who put together the slides.

[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @DataScienceNott](https://twitter.com/DataScienceNott)

[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2
2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> Clinical Development Unit Data Science Team](https://github.com/CDU-data-science-team)

[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M440 6.5L24 246.4c-34.4 19.9-31.1 70.8 5.7 85.9L144 379.6V464c0 46.4 59.2 65.5 86.6 28.6l43.8-59.1 111.9 46.2c5.9 2.4 12.1 3.6 18.3 3.6 8.2 0 16.3-2.1 23.6-6.2 12.8-7.2 21.6-20 23.9-34.5l59.4-387.2c6.1-40.1-36.9-68.8-71.5-48.9zM192 464v-64.6l36.6 15.1L192 464zm212.6-28.7l-153.8-63.5L391 169.5c10.7-15.5-9.5-33.5-23.7-21.2L155.8 332.6 48 288 464 48l-59.4 387.3z"></path></svg> cdudatascience@nottshc.nhs.uk](mailto:cdudatascience@nottshc.nhs.uk)

Images (in order of appearance):

Photo by Pixabay: https://www.pexels.com/photo/question-mark-on-chalk-board-356079/

Photo by Artem Podrez: https://www.pexels.com/photo/man-person-people-woman-6951915/

Allison Horst's [Data Cowboy](https://github.com/allisonhorst/stats-illustrations/blob/main/rstats-artwork/data_cowboy.png)