Whether you are working on your thesis, doing research for an organization, or trying to move your field forward, applying Open Science (OS) practices will benefit your research process and output. Implementing OS practices is not always easy, so this practical guide offers advice on how to get started or how to bring more OS practices into your research workflow.
Although there has been a fair amount of discussion about the importance of implementing Open Science (OS), only a few papers address how to integrate OS into your research routine in practice. This is especially difficult for students, who are relatively new to the academic world. Hence, this brief guide gives some useful tips on how to integrate OS practices into your research workflow. We hope these tips will increase the transparency and quality of your research, while requiring less time and effort. And the earlier you start, the easier it is to keep up the good routines later in your career.
The following guide gives practical advice on implementing OS at every stage of the research process, beginning with study planning and ending with the post-publication period. Important OS steps after the planning phase are preregistering the project and making sure that all information you produce is transparent and reproducible (including the code and analyses). This can be aided by progress tracking tools such as version control, backups and lab books. Finally, when everything is done, it is great to make your information and materials available, accessible, and visible.
1. Study Planning
Having a good understanding of the literature can help you put your finger on which gaps in the field to fill and how your research could fit into the context of the literature. Reviewing the literature can help you to evaluate what’s worth testing and give you ideas on how to do so in clever ways. Then, think carefully about your design and write those thoughts down. Such notes will help you later to write the preregistration but also the introduction, methods and eventually the discussion of your manuscript or coursework. It is generally better to write down too much than too little at first: you can then track your progress and retrace your line of thought. It doesn’t matter so much whether you make this part of your work public or not (I usually don’t), but it will likely help you and future collaborators later on as a memo of what has been done and why. Keeping track of things can happen in many different ways; you can write down your notes on a piece of paper, in Microsoft Word or another word processor, or use the notes tool in your reference manager. Examples of free reference management software are Zotero (open source) and Mendeley (free to use, but proprietary). It is helpful to use these from the very beginning of your reading process. Saving references there, and making use of features like tags and notes, makes citing literature much easier later on.
2. Preregistering your study
At the preregistration stage, you should clearly articulate your hypotheses and how you would like to test them. Preregistering a study involves specifying as much information as possible about what is going to be done in the study before executing it. The most important parts of the study to preregister are the hypotheses, study design and data analyses. Preregistrations are then uploaded to preregistration platforms to give them a timestamp and make them accessible. Examples of preregistration platforms are www.osf.io and www.aspredicted.org. There, you can find preregistration templates for different types of studies. It’s important to preregister prior to the data analysis, or, even better, before starting your data collection. If you want to explore the data and have no hypotheses, that’s okay. You can state that in the preregistration as well. The best way to go is to be transparent instead of doing things which cannot be retraced in the end. The more detail you include in your preregistration, the better, but if you are just starting out with research and/or OS practices, don’t let perfect be the enemy of good!
If you test hypotheses, the best way to set them up is by giving them directionality or even a specific range/interval (e.g., determining an effect of interest to be Cohen’s d > 0.3). In other words, instead of just hypothesising that there will be a difference between A and B, you could predict that A will be bigger or smaller than B and that the effect size will fall in the range of XX. Directional predictions allow for using one-sided tests, which, in turn, increase the chance of detecting a true effect (Lakens, 2016). If you are willing to define a certain effect size of interest, you can use equivalence testing (checking whether the effect is large enough to be interesting) to see whether an effect is surprisingly small given the true existence of your effect of interest (Lakens, 2018). See Figure 1 for an overview of different forms of null-hypothesis significance testing. On the other hand, not being sure about the directionality of an effect is okay, as long as you state in the preregistration which statistical tests you will use to test the hypotheses.
Figure 1. Different forms of Null-Hypothesis Significance Testing (from Lakens, 2018). The more specific you are in your prediction, the more surprising it is if the null hypothesis is rejected. That allows for stronger inference than unspecific testing, like in panel (a), in which the prediction merely is that the effect is ≠ 0. Alternatives are to test in different ways whether the observed effect is large enough to be deemed interesting or even different from 0.
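To make the difference between these forms of testing concrete, here is a minimal sketch in Python. It is not taken from Lakens (2016, 2018): it assumes a simple normal (z) approximation, whereas a real analysis would normally use a t-test in your statistics program, and all function names and numbers are made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def p_two_sided(z: float) -> float:
    """p-value for the unspecific prediction 'the effect differs from 0'."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def p_one_sided_greater(z: float) -> float:
    """p-value for the directional prediction 'the effect is larger than 0'."""
    return 1 - STD_NORMAL.cdf(z)


def p_equivalence(estimate: float, se: float, bound: float) -> float:
    """Two one-sided tests: is the effect inside (-bound, +bound)?

    `bound` is the smallest effect size of interest you preregistered.
    A small p-value means the effect is surprisingly small.
    """
    # Test 1: the effect is larger than the lower bound.
    p_lower = 1 - STD_NORMAL.cdf((estimate + bound) / se)
    # Test 2: the effect is smaller than the upper bound.
    p_upper = STD_NORMAL.cdf((estimate - bound) / se)
    # Both tests must be significant, so report the larger p-value.
    return max(p_lower, p_upper)


# Hypothetical test statistic in the predicted direction.
z_observed = 1.8
print(round(p_two_sided(z_observed), 3))          # 0.072
print(round(p_one_sided_greater(z_observed), 3))  # 0.036

# Hypothetical estimate of 0.05 (standard error 0.1), bounds of +/- 0.3.
print(p_equivalence(0.05, 0.1, 0.3) < 0.05)       # True
```

With the same hypothetical data, the directional test falls below the conventional 0.05 threshold while the two-sided test does not. This is the gain in sensitivity that a preregistered direction buys you, provided the direction was fixed before you saw the data.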
Another consideration is the sample size. Here it is often good to report a power or sensitivity analysis to show how you justify your planned sample size (statistical power is the probability of detecting a true effect, given a certain sample size and effect size). Alternatively, when you cannot calculate power because you lack the necessary information, you can describe how you plan to collect data and at what point you would stop, for example when your 3-month data collection period is over. A widely used, free and easy-to-use power calculation program is G*Power, but there are many other software packages, such as the pwr or Superpower packages for R (or the Shiny app that accompanies the latter). Don’t worry if you have a complicated study design and don’t feel comfortable coding. There are Shiny apps available for many different designs. Just search for ‘shinyapp + your specific analysis’ and you will most likely find a user-friendly option.
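For a rough sense of what such a power calculation does, the sketch below estimates the sample size per group for comparing two independent groups. It assumes the textbook normal approximation n = 2((z_alpha + z_power) / d)^2, so it is not a replacement for G*Power or the R packages above. Those tools work with the t distribution and will typically return slightly larger numbers. The function name and defaults are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80,
                one_sided: bool = False) -> int:
    """Approximate participants needed per group to detect Cohen's d.

    Normal approximation for two independent groups of equal size.
    """
    if d <= 0:
        raise ValueError("d must be a positive effect size")
    z = NormalDist()
    # Critical value: the whole alpha in one tail, or split over two tails.
    z_alpha = z.inv_cdf(1 - alpha) if one_sided else z.inv_cdf(1 - alpha / 2)
    # Quantile that corresponds to the desired power.
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


print(n_per_group(0.5))                  # 63
print(n_per_group(0.5, one_sided=True))  # 50
print(n_per_group(0.3))                  # 175
```

The second line shows why a directional hypothesis pays off: the same power is reached with fewer participants. The third line shows how quickly the required sample grows when the effect of interest is smaller.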
3. Generating Reproducible Code
If you write your own code, use sufficient descriptions (comments), so that you and others can intuitively understand what each line of code does. A rule of thumb is to write a short but decipherable description every 3 lines. Sorting the code into chunks and having an overview or table of contents for each chunk helps others understand the code afterwards. Tools such as R Markdown have some neat features which aid the reproducibility and replicability of findings by making code easier to understand. A nice introduction to a reproducible workflow with R Markdown is this workshop by Mine Çetinkaya-Rundel.
If you don’t write your own code but use point-and-click programs, consider using programs like JASP or jamovi, which make it very easy to export your analyses and thereby make them reproducible. Programs such as SPSS also allow you to export your syntax so that anyone can better understand your analyses.
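As an illustration of this commenting style, here is a short, hypothetical analysis script written in Python rather than R. The data are invented; the point is only the layout: an overview at the top, clearly marked chunks, and a brief comment every few lines.

```python
# ---------------------------------------------------------------
# Overview (hypothetical example script)
#   1. Data      : reaction times in ms for two made-up groups
#   2. Summaries : mean and standard deviation per group
#   3. Effect    : standardised mean difference (Cohen's d)
# ---------------------------------------------------------------
from math import sqrt
from statistics import mean, stdev

# --- 1. Data ----------------------------------------------------
# Invented values; in a real project these would be read from a file.
control = [512, 498, 530, 541, 505, 519]
treatment = [489, 476, 502, 495, 481, 470]


# --- 2. Summaries -----------------------------------------------
def describe(values):
    """Return (mean, sample standard deviation) for one group."""
    return mean(values), stdev(values)


m_control, sd_control = describe(control)
m_treatment, sd_treatment = describe(treatment)

# --- 3. Effect --------------------------------------------------
# Pooled SD for equally sized groups: root of the mean of both variances.
pooled_sd = sqrt((sd_control ** 2 + sd_treatment ** 2) / 2)
# Positive d means slower responses in the control group.
cohens_d = (m_control - m_treatment) / pooled_sd

print(f"Control mean: {m_control}, treatment mean: {m_treatment}")
print(f"Cohen's d: {cohens_d:.2f}")
```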
4. Ensuring a Replicable Analysis
No matter which analysis program you use, the most important thing to think about is making it possible for others to reproduce and replicate your analyses. Only when your analyses can be reproduced by others are they robust and credible. A recent, popular and often applied set of guidelines is the FAIR Guiding Principles for scientific data management and stewardship. FAIR stands for Findability, Accessibility, Interoperability, and Reuse of digital assets, and the principles give recommendations on how to improve these aspects (more on this here).
Reproducing your analyses requires others to have access to your data, information about the software (version), and, as mentioned above, your code/scripts. There are different ways of exporting your scripts; a simple text or Word file in addition to the original program’s syntax file is a safe bet. Additionally, using descriptive names in your data files and analyses can be very helpful; the names should be explained in an additional codebook, which can be a simple text file called README. Here, it is important to describe each variable’s meaning, how variables and missing values are coded (e.g., “99”, “NA” or “”), what has been done with missing values (whether you left them as they are or imputed them), and which data exclusion criteria were applied. If your analyses are not straightforward (and unless they are extremely simple, they are not), make sure to describe what you did in sufficient detail in the codebook (check out a more in-depth guide here).
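A codebook like this can be written by hand, but it can also be generated from the analysis script so that it never drifts out of date. The following is a minimal sketch of that idea in Python; the variable names, missing-value codes and exclusion rule are all hypothetical and should be replaced by your own.

```python
import platform

# Codes that mark a missing value in the raw data (assumed for this example).
# Only apply such a rule to columns where "99" cannot be a real value.
MISSING_CODES = {"99", "NA", ""}

# One entry per variable: name -> meaning and coding.
CODEBOOK = {
    "participant_id": "Anonymous participant number",
    "condition": "Experimental group: 0 = control, 1 = treatment",
    "rt_ms": "Mean reaction time in milliseconds",
}

EXCLUSIONS = ["Participants with more than 20% missing trials were excluded"]


def recode_missing(raw: str):
    """Return None for a missing-value code, otherwise the cleaned value."""
    cleaned = raw.strip()
    return None if cleaned in MISSING_CODES else cleaned


def build_readme(codebook: dict, exclusions: list) -> str:
    """Assemble the README text: software version, variables, exclusions."""
    lines = ["README / codebook", ""]
    # Record the software version used for the analysis.
    lines.append(f"Software: Python {platform.python_version()}")
    lines += ["", "Variables:"]
    lines += [f"  {name}: {meaning}" for name, meaning in codebook.items()]
    # Document how missing values are coded and what was done with them.
    codes = ", ".join(repr(code) for code in sorted(MISSING_CODES))
    lines += ["", f"Missing values are coded as: {codes}",
              "Missing values were left as they are (no imputation).",
              "", "Exclusion criteria:"]
    lines += [f"  - {rule}" for rule in exclusions]
    return "\n".join(lines)


print(build_readme(CODEBOOK, EXCLUSIONS))
```

Saving the printed text as README next to the data file gives readers the variable meanings, the missing-value coding and the software version in one place.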
5. Tracking your progress
Version control and backups can save your life. I know several people who frantically email documents to themselves to avoid data loss or to keep an overview of time-stamped document versions. Yes, I still do this sometimes too. You can also save documents with a prefix or suffix which specifies the date you worked on the document, e.g. 2021-01-01-thesis.docx. This way you will end up with many files which you can use to retrace what has been done if something goes wrong or if, no matter how hard you try, you cannot remember how you got to a specific conclusion.
Version control and backups clearly show the advantages of using online repositories or cloud-based services to store data. And once you are ready, you can make the stored files publicly available. An accessible introduction to GitHub, a hosting platform for the version control software Git, is this recent talk by Julia Haaf organised by SIOS.
When running experiments, a lab book (notes taken during or after each experimental session/day) can be a helpful tool. A recent international initiative promotes open lab books to “generate scientific ideas and discussions, avoid redundancy, foster collaborations, and accelerate progress”. Like the other ways of tracking progress mentioned above, lab books will help you keep large amounts of information readily available in a format that you choose. Sooner or later, you will be asked specific questions about your work. Having detailed information easily available will then save a lot of time and potential panic.
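If you like the date-prefix approach, a few lines of code can take over the renaming so that the format stays consistent. This is a small sketch in Python using only the standard library; the function names are made up for this example, and it is a convenience for manual snapshots, not a substitute for real version control or backups.

```python
import shutil
from datetime import date
from pathlib import Path


def dated_name(filename: str, day: date) -> str:
    """Prefix a file name with an ISO date, e.g. 2021-01-01-thesis.docx."""
    return f"{day.isoformat()}-{Path(filename).name}"


def snapshot(source, day=None) -> Path:
    """Copy `source` to a date-prefixed file in the same folder.

    Uses today's date unless another date is given; returns the new path.
    """
    source = Path(source)
    target = source.with_name(dated_name(source.name, day or date.today()))
    # copy2 also keeps the file's modification time where possible.
    shutil.copy2(source, target)
    return target


print(dated_name("thesis.docx", date(2021, 1, 1)))  # 2021-01-01-thesis.docx
```

Writing the date as year-month-day has a practical benefit: sorting the files by name also sorts them chronologically.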
6. Availability of your project
When completing the project, make as much information available as possible, but keep it in a format that allows even a stranger to understand your project. A great platform to upload all information around your project is the Open Science Framework (OSF, www.osf.io). Storage is free and you can have both private and public projects. This setting can also be changed: for example, after your manuscript is published, you can make all your project information publicly available. Finally, keep in mind that people will only read what they see or hear about. So, don’t be shy and share your findings on ResearchGate, Twitter, or other social media, and don’t necessarily wait until the paper has come through peer review: you can also think about preprinting it. That would mean publishing the manuscript before it gets peer-reviewed to make it quickly accessible and to give you the chance of getting timely feedback.
If you are a psychology student, consider submitting to student-friendly journals such as the Journal of Trial and Error or the Journal of European Psychology Students to experience the scientific publishing process yourself, receive valuable feedback on your work, and get your first peer-reviewed publication, which can grow out of coursework or a thesis!
Implementing OS practices in your research workflow will make it easier to understand, reproduce and replicate your work, not only for others but also for yourself. Additionally, you will save immense amounts of time post-project. The recommended steps mentioned here might appear like a lot to keep in mind. Yet, once they become a habit, they won’t feel like additional work but rather like logical steps in doing transparent and reproducible research.