The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work
dc.contributor.author | Geiger, R. Stuart | |
dc.contributor.author | Varoquaux, Nelle | |
dc.contributor.author | Mazel-Cabasse, Charlotte | |
dc.contributor.author | Holdgraf, Chris | |
dc.date.accessioned | 2018-06-01T22:02:02Z | |
dc.date.available | 2018-06-01T22:02:02Z | |
dc.date.issued | 2018 | |
dc.description.abstract | Computational research and data analytics increasingly relies on complex ecosystems of open source software (OSS) “libraries” – curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers/analysts know what libraries are available and how to use them. Yet documentation for open source software libraries is widely considered low-quality. This article is a collaboration between CSCW researchers and contributors to data analytics OSS libraries, based on ethnographic fieldwork and qualitative interviews. We examine several issues around the formats, practices, and challenges around documentation in these largely volunteer-based projects. There are many different kinds and formats of documentation that exist around such libraries, which play a variety of educational, promotional, and organizational roles. The work behind documentation is similarly multifaceted, including writing, reviewing, maintaining, and organizing documentation. Different aspects of documentation work require contributors to have different sets of skills and overcome various social and technical barriers. Finally, most of our interviewees do not report high levels of intrinsic enjoyment for doing documentation work (compared to writing code). Their motivation is affected by personal and project-specific factors, such as the perceived level of credit for doing documentation work versus more ‘technical’ tasks like adding new features or fixing bugs. In studying documentation work for data analytics OSS libraries, we gain a new window into the changing practices of data-intensive research, as well as help practitioners better understand how to support this often invisible and infrastructural work in their projects. | en |
dc.identifier.doi | 10.1007/s10606-018-9333-1 | |
dc.identifier.pissn | ISSN 0925-9724 | |
dc.language.iso | en | |
dc.publisher | Springer, London | |
dc.relation.ispartof | Computer Supported Cooperative Work 27(3-4) - ECSCW 2018: Proceedings of the 16th European Conference on Computer Supported Cooperative Work | |
dc.relation.ispartofseries | ECSCW | |
dc.subject | documentation | |
dc.subject | standards | |
dc.subject | invisible work | |
dc.subject | motivations | |
dc.subject | peer production | |
dc.subject | collaboration | |
dc.subject | infrastructure | |
dc.subject | ethnography | |
dc.subject | open source | |
dc.title | The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work | en |
dc.type | Text/Journal Article | |
gi.citations.count | 23 | |
gi.citations.element | Taylor Reiter, Phillip T Brooks, Luiz Irber, Shannon E K Joslin, Charles M Reid, Camille Scott, C Titus Brown, N Tessa Pierce-Ward (2021): Streamlining data-intensive biology with workflow systems, In: GigaScience 1(10), doi:10.1093/gigascience/giaa140 | |
gi.citations.element | Oihane Cereceda, Danielle E.A. Quinn (2020): A graduate student perspective on overcoming barriers to interacting with open-source software, In: FACETS 1(5), doi:10.1139/facets-2019-0020 | |
gi.citations.element | Michel Muszynski, Sven Lugtigheid, Fernando Castor, Sjaak Brinkkemper (2022): A Study on the Software Architecture Documentation Practices and Maturity in Open-Source Software Development, In: 2022 IEEE 19th International Conference on Software Architecture (ICSA), doi:10.1109/icsa53651.2022.00013 | |
gi.citations.element | Nicholas A. Bokulich, Michal Ziemski, Michael S. Robeson, Benjamin D. Kaehler (2020): Measuring the microbiome: Best practices for developing and benchmarking microbiomics methods, In: Computational and Structural Biotechnology Journal, doi:10.1016/j.csbj.2020.11.049 | |
gi.citations.element | April Yi Wang, Dakuo Wang, Jaimie Drozdal, Michael Muller, Soya Park, Justin D. Weisz, Xuye Liu, Lingfei Wu, Casey Dugan (2022): Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks, In: ACM Transactions on Computer-Human Interaction 2(29), doi:10.1145/3489465 | |
gi.citations.element | Li Feng, Ryan Yen, Yuzhe You, Mingming Fan, Jian Zhao, Zhicong Lu (2024): CoPrompt: Supporting Prompt Sharing and Referring in Collaborative Natural Language Programming, In: Proceedings of the CHI Conference on Human Factors in Computing Systems, doi:10.1145/3613904.3642212 | |
gi.citations.element | Alex Cummaudo, Rajesh Vasa, John Grundy, Mohamed Abdelrazek (2022): Requirements of API Documentation: A Case Study into Computer Vision Services, In: IEEE Transactions on Software Engineering 6(48), doi:10.1109/tse.2020.3047088 | |
gi.citations.element | Hannes Hauswedell (2021): The Design of SeqAn3, In: Computational Biology, doi:10.1007/978-3-030-90990-1_4 | |
gi.citations.element | Sofia Migliorini, Roberto Verdecchia, Ivano Malavolta, Patricia Lago, Enrico Vicario (2024): Architectural Views: The State of Practice in Open-Source Software Projects, In: Lecture Notes in Computer Science, doi:10.1007/978-3-031-70797-1_27 | |
gi.citations.element | Junran Yang, Andrew M. McNutt, Leilani Battle (2024): Considering Visualization Example Galleries, In: 2024 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), doi:10.1109/vl/hcc60511.2024.00043 | |
gi.citations.element | Xin Tan, Minghui Zhou (2019): How to Communicate when Submitting Patches, In: Proceedings of the ACM on Human-Computer Interaction CSCW(3), doi:10.1145/3359210 | |
gi.citations.element | Jumana Almahmoud, Robert DeLine, Steven M. Drucker (2021): How Teams Communicate about the Quality of ML Models: A Case Study at an International Technology Company, In: Proceedings of the ACM on Human-Computer Interaction GROUP(5), doi:10.1145/3463934 | |
gi.citations.element | April Yi Wang, Dakuo Wang, Jaimie Drozdal, Xuye Liu, Soya Park, Steve Oney, Christopher Brooks (2021): What Makes a Well-Documented Notebook? A Case Study of Data Scientists’ Documentation Practices in Kaggle, In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, doi:10.1145/3411763.3451617 | |
gi.citations.element | Ellen Balka, Ina Wagner (2020): A Historical View of Studies of Women’s Work, In: Computer Supported Cooperative Work (CSCW) 2(30), doi:10.1007/s10606-020-09387-9 | |
gi.citations.element | David Ojimaojo Ebiloma, Clinton Ohis Aigbavboa, Chimay Anumba (2023): Towards Digital Twin Maintenance Management of Health Facilities in Nigeria: The Need for Maintenance Documentation, In: Buildings 5(13), doi:10.3390/buildings13051339 | |
gi.citations.element | Sam Lau, Justin Eldridge, Shannon Ellis, Aaron Fraenkel, Marina Langlois, Suraj Rampure, Janine Tiefenbruck, Philip J. Guo (2022): The Challenges of Evolving Technical Courses at Scale: Four Case Studies of Updating Large Data Science Courses, In: Proceedings of the Ninth ACM Conference on Learning @ Scale, doi:10.1145/3491140.3528278 | |
gi.citations.element | Karthik Ram, Carl Boettiger, Scott Chamberlain, Noam Ross, Maelle Salmon, Stefanie Butland (2019): A Community of Practice Around Peer Review for Long-Term Research Software Sustainability, In: Computing in Science & Engineering 2(21), doi:10.1109/mcse.2018.2882753 | |
gi.citations.element | Jane Hsieh, Joselyn Kim, Laura Dabbish, Haiyi Zhu (2023): "Nip it in the Bud": Moderation Strategies in Open Source Software Projects and the Role of Bots, In: Proceedings of the ACM on Human-Computer Interaction CSCW2(7), doi:10.1145/3610092 | |
gi.citations.element | Ei Pa Pa Pe-Than, Laura Dabbish, James Herbsleb (2021): Open Collaborative Writing, In: Proceedings of the ACM on Human-Computer Interaction CSCW1(5), doi:10.1145/3449211 | |
gi.citations.element | Hannes Hauswedell (2021): The SeqAn Library (Versions 1 and 2), In: Computational Biology, doi:10.1007/978-3-030-90990-1_2 | |
gi.citations.element | Nicholas Gorman, Iain MacGill, Anna Bruce (2024): How to support the adoption of open-source energy system modelling software? Insights from interviews with users and developers, In: Energy Research & Social Science, doi:10.1016/j.erss.2024.103479 | |
gi.citations.element | Taylor Reiter, Phillip T. Brooks, Luiz Irber, Shannon E.K. Joslin, Charles M. Reid, Camille Scott, C. Titus Brown, N. Tessa Pierce (2020): Streamlining Data-Intensive Biology With Workflow Systems, doi:10.1101/2020.06.30.178673 | |
gi.citations.element | Mathieu Nassif, Martin P. Robillard (2025): Non-Linear Software Documentation with Interactive Code Examples, In: ACM Transactions on Software Engineering and Methodology 2(34), doi:10.1145/3702976 | |
gi.conference.date | 4-8 June 2018 | |
gi.conference.location | Nancy, France | |
gi.conference.sessiontitle | Long Papers | |
mci.conference.review | full |