• Daniel Adorno Gomes, Universidade de Trás-os-Montes e Alto Douro
  • Pedro Mestre
  • Carlos Serôdio



Infrastructure-as-Code, Reproducibility, Virtualization, Containerization, Open Science


Objective: This paper presents a case study on how a recently proposed reproducibility framework, Environment Code-First (ECF), based on the Infrastructure-as-Code approach, can improve the implementation and reproduction of computing environments by reducing complexity and manual intervention.
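The Infrastructure-as-Code idea underlying ECF can be illustrated with a container definition kept under version control: the environment is declared once as code and rebuilt identically on any machine. The base image, package names, and file paths below are illustrative assumptions, not taken from the paper:

```dockerfile
# Hypothetical environment definition. Every dependency is declared in
# code, so rebuilding the image recreates the same environment anywhere.
FROM ubuntu:22.04

# Install system tools in a single declared step rather than by hand.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip wget ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Copy the pipeline's declared dependency list and install it reproducibly.
COPY requirements.txt /opt/pipeline/requirements.txt
RUN pip3 install --no-cache-dir -r /opt/pipeline/requirements.txt

WORKDIR /opt/pipeline
```

Because this file lives alongside the research code, anyone can rebuild the exact environment with a single command instead of following a manual installation tutorial.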

Methodology: The study compares the manual implementation of a pipeline with the automated method proposed by the ECF framework, reporting metrics on time consumption, effort, manual intervention, and platform agnosticism. It details the steps needed to implement the computational environment of a bioinformatics pipeline named MetaWorks from the perspective of the scientist who owns the research work, and the steps taken to recreate that environment from the perspective of a researcher who wants to reproduce the published results.
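The contrast between the two approaches described above can be sketched with the following commands; the repository URL, environment name, and image tag are hypothetical placeholders, not the paper's actual instructions:

```shell
# Manual approach: each step is typed by hand on every new machine, and
# any deviation (tool versions, step order, OS) changes the environment.
git clone https://github.com/example/pipeline.git
cd pipeline
conda env create -f environment.yml   # dependencies resolved at install time
conda activate pipeline_env
snakemake --cores 4                   # run the pipeline

# Code-first approach: one command rebuilds the whole environment from a
# versioned definition, so every machine executes the same declared steps.
docker build -t pipeline-env .
docker run --rm pipeline-env snakemake --cores 4
```

The difference the study measures is exactly this: in the manual case the researcher performs and can mis-order each step, while in the code-first case the steps are executed identically on every rebuild.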

Findings and Conclusion: The results demonstrate considerable benefits in adopting the ECF framework, particularly in preserving the same application behavior across different machines. This empirical evidence underscores the importance of reducing manual intervention, as it allows the environment to be recreated consistently, as many times as needed, especially by non-original researchers.

Originality/Value: Verifying published findings in bioinformatics through independent validation is challenging, mainly because of differences in the software and hardware used to recreate computational environments. Reproducing a computational environment that closely mimics the original is intricate and demands a significant investment of time. This study helps educate and assist researchers in enhancing the reproducibility of their work by creating self-contained computational environments that are highly reproducible, isolated, portable, and platform-agnostic.








How to Cite

Adorno Gomes, D., Mestre, P., & Serôdio, C. (2024). INCREASING THE REPRODUCIBILITY OF SCIENTIFIC RESEARCH WORKS: A CASE STUDY USING THE ENVIRONMENT CODE-FIRST FRAMEWORK. International Journal of Professional Business Review, 9(5), e04662.