Chapter 5 Education & Training

Researchers often view software development skills as separate from, but complementary to, their disciplinary training. This perspective introduces a dilemma for researchers, particularly early-career researchers, who need to master their discipline and simultaneously acquire sufficiently good software engineering skills. When forced to choose, they typically make sacrifices or take shortcuts in software engineering, as they view their disciplinary skills and knowledge as more important to their careers, and more rewarded by current incentive systems. Further, many researchers do not aspire to become software engineering experts, but approach the acquisition of software skills as a means to a research end. This approach is implicitly encouraged by the fact that the educational content of most disciplines that require software skills does not actually include sufficient discussion of software development or engineering. As a result, valuable skills such as packaging, testing, releasing, archiving, and even designing research software are dramatically underdeveloped in the overall research ecosystem.

Indeed, previous studies demonstrate that researchers are rarely purposely trained to develop software. A 2017 survey of US National Postdoctoral Association members regarding postdocs’ use of software in research and their training regarding software development found that 95% of respondents use research software. However, 54% (n = 104) of the respondents to this survey reported that they had not received any training in software development (Nangia and Katz 2017). A similar survey of software use and training from the UK repeats this finding: in 2014, Hettrick et al. asked randomly selected researchers at 15 Russell Group universities about their software use and training. Of the 417 responses, 45% reported having no training in software development. Of the 55% of respondents who reported having received training in software development, 73% had received some form of formal training in a software development course; the remaining 27% were self-taught. In further analyzing these results, Hettrick et al. report that 21% of respondents who develop their own software had no training in software development (Hettrick 2018, 2014). A 2016 survey of 704 PIs from the Biological Sciences Directorate of the US NSF found that the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC (Barone, Williams, and Micklos 2017). And a 2018 survey of almost 1600 scientist-developers found that 82% of respondents felt that they were spending “more time” or “much more time” developing software than they did 10 years ago (Pinto, Wiese, and Dias 2018).
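As a rough worked example of how the UK percentages above combine, the sketch below converts the two conditional figures (73% formal training and 27% self-taught, both among the 55% who reported any training) into shares of all respondents. It uses only the rounded percentages quoted in the text, so the results are approximate; the variable and function names are illustrative and do not come from the survey itself.

```python
# Rounded percentages quoted in the text for the 2014 UK survey.
TRAINED_PCT = 55                 # respondents reporting some training
FORMAL_PCT_OF_TRAINED = 73       # of those, formally trained
SELF_TAUGHT_PCT_OF_TRAINED = 27  # of those, self-taught


def share_of_all_respondents(pct_of_trained: int) -> float:
    """Convert a percentage of trained respondents into a percentage of all respondents."""
    return TRAINED_PCT * pct_of_trained / 100


formal_overall = share_of_all_respondents(FORMAL_PCT_OF_TRAINED)
self_taught_overall = share_of_all_respondents(SELF_TAUGHT_PCT_OF_TRAINED)

print(f"Formal training, share of all respondents: about {formal_overall:.0f}%")
print(f"Self-taught, share of all respondents: about {self_taught_overall:.0f}%")
```

Under these rounded inputs, roughly 40% of all respondents had formal training and roughly 15% were self-taught, alongside the 45% who reported no training.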

Not only is there a gap in software development training in general, there is also an additional gap across gender. When analyzed by gender (a self-reported binary of men and women), the first two surveys show that only 39% of female respondents in the UK and 32% in the USA report having received some form of software development training, compared to 63% of male respondents in the UK and 64% in the USA.

This lack of training is not new, nor is it newly discovered. Greg Wilson has written (Wilson 2016; Software Carpentry n. d.) about the history of Software Carpentry:

In 1995–96, [Wilson] organized a series of articles in IEEE Computational Science & Engineering titled, “What Should Computer Scientists Teach to Physical Scientists and Engineers?” These grew out of the frustration he had working with scientists who wanted to run before they could walk, i.e., to parallelize complex programs that were not broken down into self-contained functions, that did not have any automated tests, and that were not under version control.

In response, John Reynders (then director of the Advanced Computing Laboratory at Los Alamos National Laboratory) invited [Wilson] and Brent Gorda (now at Intel) to teach a week-long course to LANL staff. This course ran for the first time in July 1998, and was repeated nine times over the next four years. It eventually wound down as Gorda and [Wilson] moved on to other projects, but two valuable lessons were learned:

  1. There is tremendous pent-up demand for training in basic skills.

  2. Textbook software engineering is not the right thing to teach most scientists.

This need led to the Software Carpentry organization, which built a successful train-the-trainer model for collaborative lesson development and delivery, and which was used as a model to build Data Carpentry, which then merged with Software Carpentry into The Carpentries, and now also includes Library Carpentry. The Carpentries has developed highly impactful training content and events that meet an intermediate need for training researchers to better develop software. Many computing centers, including those funded by NSF and DOE, have also long had user training in some types of development (similar to that of Los Alamos), for example, joint activities run by TeraGrid and now XSEDE, and Argonne’s ATPESC. Given the success of the Carpentries, these computing center training events often either include or contribute to Carpentries lessons, or complement them. In addition, other NSF-funded software institutes (e.g., MolSSI, SGCI, and IRIS-HEP) are also developing training resources for their target communities, similarly making use of, contributing to, and complementing the Carpentries.

While many researchers may learn how to program on their own, opportunities for them to learn software engineering and software design are less available. Therefore, we believe that a US-based research software institute can play an important role in expanding the breadth and depth of the training available, and in further coordinating and facilitating the acquisition of software engineering expertise by researchers engaged in various development activities. Rather than reinventing content, we anticipate collaborating with and pointing to relevant resources developed through other related efforts.

The Education & Training thrust of URSSI complements and builds upon existing efforts through the following planned activities (more detail appears below):

  1. Develop and annually run a summer school on practices for sustainable software development for research software.

  2. Develop a Research Project Carpentry lesson program that teaches people how to turn code into a formal project.

  3. Plan a review service for software project plans (and note that the implementation of this plan is likely out of scope of URSSI by itself).

  4. Evaluate and document successes and failures of industrial software development practices in research software.

  5. Compile and maintain a body of knowledge of best practices for research software development. (This URSSI activity will seek both to aggregate existing practices and to serve as an outlet for emerging practices from other domain-specific software institutes, e.g., MolSSI, SGCI, IRIS-HEP.)

5.1 Education & Training resources

This area requires the following staff:

  • Training Lead: coordinates all training activities and assessment of those activities (1 FTE).

  • Training Officer: assists Training Lead in coordination of training activities, develops and delivers training activities (1 FTE).

  • Assessment Coordinator: leads the assessment efforts, in conjunction with the Training Lead (this position could be shared with other parts of the project) (0.5 FTE).

  • Research Coordinator: leads the effort to gather experiences from research software projects (0.25-0.5 FTE).

5.2 Education & Training methods

This section provides more details on how we plan to design and execute each of these activities within URSSI.

5.2.1 Develop and annually run a summer school on practices for sustainable software development for research software

Overview: During the conceptualization phase, we identified a need for additional, focused training opportunities on sustainable software development practices for researchers who were not in a domain covered by an existing institute. One of the primary reasons for this need is that it is generally difficult for degree programs to make room for this type of content. We plan to meet this need with a summer school, which is a proven method for providing hands-on training to transfer these skills to students, in this case on sustainable software development practices for research software development. We conducted a Pilot Winter School as part of the conceptualization process. The details of that activity appear in Chapter 2. The plan described here builds on the key lessons learned, including: (1) the school needs to be longer to allow for both discussion time and focused time for students to work on their own code, applying the lessons from the lectures, and (2) the presence of additional helpers outside of the primary instructors was quite beneficial to help answer specific questions from the students.

Content: Based on the needs identified during our survey, workshops, and other discussions, we identified the following topics as candidates for inclusion in such a school: (1) software design (including modularity), (2) software testing, (3) packaging and distributing code, (4) collaborative software development with modern version control (i.e., git/GitHub), (5) peer code review, (6) open science principles, and (7) software citation.

Format: The summer school will be a 1-week in-person event. We plan to follow the successful hack week model described in PNAS (Huppenkothen et al. 2018), in which we allocate 50% of the time to lectures and 50% of the time to putting the concepts into practice. We will invite instructors with expertise in the different topic areas (see above) to participate and deliver the lectures, and will take diversity into consideration when selecting instructors. One of the criteria used to select participants will be a diversity statement that potential participants will be required to submit, and at least 25% of the selected participants will be members of underrepresented groups. To facilitate the “hacking” time, we will ask participants to bring a code project they wish to work on during the school. The “hacking sessions” will provide ample opportunities for the participants to have hands-on time applying the concepts from the lectures to their own code projects. These sessions also provide time for code reviews and other ad hoc topics that arise during the school. The instructors will be available during the hands-on time to help participants as needed. By the end of the school, participants should have substantial intellectual and experiential knowledge. Because researchers often do not have sufficient time for training or for attending events like this one, we also plan to package the content into Carpentries-style lessons as an alternative to the face-to-face summer school. These lessons can be used in shorter training sessions, in webinars, or asynchronously.

The URSSI summer school is primarily aimed at the people and careers portion of URSSI’s impact goals, and secondarily at the research software sustainability and research software ecosystem portions.

5.2.2 Develop a Research Project Carpentry lesson program that teaches people how to turn code into a formal project

Overview: One challenge faced by many research software developers is how to convert a small (individual) coding effort into a more formal project. This situation occurs frequently as researchers develop small units of code to accomplish tasks important in their own work. When other researchers are exposed to these code units, they often want to use the code, add to the code, or both. This is the point when the researcher needs to convert that code into a more formal project, both to ensure quality as others use it and to allow the community to contribute to it. Therefore, we propose to develop a Research Project Carpentry lesson program, along lines similar to Software Carpentry, that will help researchers make this transition from an individual coding effort to a community-focused project.

Content: The goal of this effort will be to help projects develop a good project plan. As such, the Research Project Carpentry lesson program will focus on helping researchers understand what content a good plan needs to contain. It will also provide the information researchers need to choose which of the various practices are most relevant to their particular project. The topics to be covered in Research Project Carpentry include: (1) requirements elicitation, (2) issue tracking, (3) source and version control, (4) testing, (5) documentation, (6) packaging and distribution, (7) pair programming, (8) code review, and (9) metrics. Note that there is some overlap between the focus of Research Project Carpentry and the Summer School described above. We anticipate reusing relevant content between these efforts.

Format: We will develop modules on the topics above as Carpentries-style modules. This approach will allow the Research Project Carpentry materials to be used either in a face-to-face training course, like the Summer School, or as stand-alone units. This flexibility will allow researchers to access the content through the mechanism that is most relevant to them. When this material is tested, we will use a diversity statement submitted by potential participants as an input to the decision process, and at least 25% of the selected participants will be members of underrepresented groups.

Note: While this activity is relevant to the URSSI effort, we believe that it will have appeal beyond URSSI. Therefore, even though we include it here in the plan, we will also seek funding to support it from organizations outside of NSF. We also plan to solicit other organizations and individuals to contribute to this activity. URSSI will lead this work, organizing a community that will include external participants.

The Research Project Carpentry lesson program is primarily aimed at the research software sustainability portion of URSSI’s impact goals, and secondarily at the research software ecosystem portion.

5.2.3 Plan a review service for software project plans

Building on the work done for Research Project Carpentry, URSSI has identified the need for a service where experts review and comment on proposed project plans. Researchers who are interested in this service would submit a project plan in a specified format. Then, experts would review the plan to ensure that (1) it is complete with respect to the content necessary for the type of project and its current stage, and (2) the choices made for each aspect of the project plan are consistent with best practices. Once the review is completed, the reviewer would work with the project team to help resolve any deficiencies identified in the original project plan. This model is akin to that used by many “incubators” for start-up companies.

This activity is likely beyond the scope of URSSI, so we will create an initial plan for it, then seek to partner with other organizations that are pursuing similar efforts and explore external funding sources. In addition to operating the service, URSSI and partners will need to develop and make available the following resources: (1) templates for project plans, (2) checklists for researchers to follow when developing the plan, and (3) guidance on how to choose from various alternatives for each section of the plan.
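The completeness portion of such a review lends itself to a simple checklist. The following is a minimal sketch only, not a description of a planned URSSI tool: it assumes a project plan is represented as a mapping from section name to section text, and it borrows the section names from the Research Project Carpentry topic list as a hypothetical default checklist. The names DEFAULT_SECTIONS and missing_sections are illustrative. Because the sections needed depend on the type of project and its current stage, the checklist is passed in as a parameter.

```python
# Hypothetical default checklist, taken from the Research Project
# Carpentry topic list; a real service would tailor this per project.
DEFAULT_SECTIONS = (
    "requirements elicitation",
    "issue tracking",
    "source and version control",
    "testing",
    "documentation",
    "packaging and distribution",
    "pair programming",
    "code review",
    "metrics",
)


def missing_sections(plan: dict[str, str],
                     required: tuple[str, ...] = DEFAULT_SECTIONS) -> list[str]:
    """Return the required sections that are absent from the plan or left blank."""
    return [name for name in required if not plan.get(name, "").strip()]


# Example: a draft plan that covers testing but leaves documentation empty.
draft_plan = {"testing": "Unit tests run on every commit.", "documentation": "   "}
early_stage_checklist = ("testing", "documentation", "metrics")
print(missing_sections(draft_plan, early_stage_checklist))
```

A check like this only flags gaps; judging whether the choices in each section are consistent with best practices would remain the work of the expert reviewers.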

The software project plan review service is aimed at the research software sustainability portion of URSSI’s impact goals.

5.2.4 Evaluate and document successes and failures of the use of industrial software development practices in research software

Industrial software engineering has developed and refined a number of best practices for software engineering and software development, resulting in a large body of related literature. However, it is not always clear which of these practices are relevant for use in developing research software, most notably because of the different incentive structures for developers in the research arena. It is also not always clear which aspects of the research software domain motivate the need to develop new software engineering or software development practices to handle the unique contexts within which research software exists. To accomplish this activity, URSSI will need to perform the following steps.

First, URSSI will gather experience reports describing the successful and unsuccessful use of industrial software engineering and software development practices in research software projects. Gathering these experiences will involve conducting surveys and interviews of developers of research software to document important information. In addition to the successes and failures with specific practices, these experience reports will also document cases where developers of research software encountered a situation or a problem for which they were unable to find software engineering or software development practices that were relevant. Cases where there is a lack of relevant practices suggest opportunities for creation of new practices.

Second, URSSI will analyze the experience reports to systematize the successes and failures and develop a series of evidence-based lessons learned. Because these lessons will be based upon real experiences from research projects, they will have great value to other research software projects.

The results of these efforts will feed into the next activity by providing concrete experiences that we can document as best practices. These experiences will be beneficial because they come from real projects and carry knowledge about the context within which each practice works. This context will help describe the limitations of a practice, which are equally important to document.

This activity is primarily aimed at the research software sustainability portion of URSSI’s impact goals, and secondarily at the research software ecosystem portion.

5.2.5 Compile and maintain a body of knowledge of best practices for research software development

Based on the outcomes of the above activities, URSSI will develop a portal to provide living information about best practices for research software development. In addition to including content developed from our own activities, URSSI will actively find (through various means, including workshops and surveys) and curate information from other sites related to the development of research or scientific software. We will specifically ensure that information produced by the Carpentries is included, as relevant.

We will curate the following types of information:

  1. Checklists for sustainable software - much of the content here will overlap with ideas discussed in earlier activities.

  2. Templates for use in developing software, including templates for project plans, templates for documenting requirements, templates for test planning, and templates for community guidelines and policies.

  3. Badging efforts that provide recognition for various qualities of research software and research software projects.

In addition to best practices, the URSSI clearinghouse will also curate information about training resources. Besides providing links to the URSSI-developed training (described above), we will link to other relevant training resources developed through other efforts such as MolSSI, SGCI, and IRIS-HEP. To help individual researchers and teams better understand how to make use of the various training options, we will develop roadmaps that guide people to the most relevant training for their current situation.

This activity is also primarily aimed at the research software sustainability portion of URSSI’s impact goals, and secondarily at the research software ecosystem portion.

5.3 Cross-cutting activities

To support the five activities described above, URSSI will also need to perform a number of cross-cutting activities as follows:

  1. Curriculum Development and Coordination: To achieve the goals stated above, URSSI will need to support the development of new training materials. We anticipate providing support to experts in the research software community who can take their existing knowledge and package it in a Carpentries-style format that is amenable to use in the various types of training activities URSSI will support. Importantly, the training proposed above is meant to build upon, and provide a complement to, existing educational outreach efforts. This training will provide follow-on, more advanced content than the Carpentries, and will be longer in duration. We can provide this advanced content because our target audience is a subset of the audience targeted by the Carpentries.

  2. Instructor Training: To help deliver all of the developed content to a wide range of audiences in various locations, URSSI will need to provide support for training instructors. These instructors will be integral to the success of the Summer School and Research Project Carpentry.

  3. Piloting Workshops: As we develop content, we will conduct smaller workshops to pilot the content and ensure its sufficiency before deploying it to a larger audience. Because we will be asking people to participate in content development by providing feedback, we will need resources to support travel for instructors and participants.

  4. Gathering Experiences on Successes and Failures of Software Practices: URSSI will support a team of researchers who are experienced in survey and interview methods to gather experiences, summarize the results, and document the findings. In addition, URSSI will need to provide travel support for URSSI personnel to meet with members of research projects.

  5. Conducting Workshops: URSSI will provide support for the Summer School and Research Project Carpentry instructors to travel to the workshops. URSSI will also provide small fellowships for the instructors to compensate for the time required to participate. In addition, to increase participation from a diverse community, URSSI will provide travel support for a small number of workshop participants, determined based on need in individual cases.

  6. Packaging and Sharing: As URSSI develops content related to the educational activities described in this chapter, we will package that content in a shareable format. We will make the content freely available on GitHub. We will also archive each release of content on Zenodo under a license that allows anyone to use and/or modify the content as needed. In addition, we will strive to work with the Carpentries to enable the content to reach more people.

5.4 Education & Training metrics/milestones

To ensure that the training activities are providing sufficient content, URSSI will conduct numerous assessment activities. These activities will occur both concurrently with the training activities, i.e., as entry and exit surveys, as well as longitudinally after training. The benefit of conducting assessment longitudinally is that it provides insight into whether the content learned during training is viewed as useful over time as real projects evolve.

Some of the specific measures that we will gather include:

  • Post-event evaluations: For workshops and training courses, we will conduct surveys to evaluate the value of the content. We will conduct evaluations immediately at the end of each event while the information is fresh in the attendees’ minds. We will then conduct follow-up surveys 6 months, 1 year, and 3 years later to evaluate the applicability of the content to the attendees’ work. In conducting these evaluations, we will build on lessons learned and guidance from The Carpentries Assessment Network.

  • Demand: We will measure demand for the content by the number of people who apply to attend various training events and workshops.

  • Use of material: We will measure downloads and use of the information that we produce.
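The longitudinal follow-up schedule described in the post-event evaluations item can be sketched in a few lines. This is an illustrative sketch only: the helper names add_months and follow_up_schedule are hypothetical, and clamping to the last day of a shorter month is an assumption about how such dates would be handled, not something specified in this plan.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month, so
    31 August plus 6 months falls on the last day of February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Follow-up offsets named in the text: 6 months, 1 year, and 3 years.
FOLLOW_UP_MONTHS = {"6 months": 6, "1 year": 12, "3 years": 36}


def follow_up_schedule(event_end: date) -> dict[str, date]:
    """Map each follow-up label to the date on which its survey would be due."""
    return {label: add_months(event_end, offset)
            for label, offset in FOLLOW_UP_MONTHS.items()}


# Example with a hypothetical event that ends on 31 August 2024.
for label, due in follow_up_schedule(date(2024, 8, 31)).items():
    print(f"{label}: {due.isoformat()}")
```

Keeping the offsets in one table makes it straightforward to adjust the schedule if the assessment plan changes.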

Impacts:

This table maps URSSI activities from this chapter to the three portions of URSSI’s intended impact. (A complete table of impacts from all URSSI activities can be found in Chapter 10 (Metrics and Evaluation).) In the impact cells, X indicates a designed primary impact of an activity, and y indicates a designed secondary impact.

Table 5.1: Impact table
Activity | Impact on research software | Impact on people and careers | Impact on research software ecosystem
Summer school | y | X | y
Research Project Carpentry | X | | y
Review service for software project plans | X | |
Examine use of industrial software development practices in research software | X | | y
Body of knowledge of software development best practices | X | | y

References

Barone, Lindsay, Jason Williams, and David Micklos. 2017. “Unmet Needs for Analyzing Biological Big Data: A Survey of 704 NSF Principal Investigators.” PLoS Computational Biology 13 (10). https://doi.org/10.1371/journal.pcbi.1005755.
Hettrick, Simon. 2014. “It’s Impossible to Conduct Research Without Software, Say 7 Out of 10 UK Researchers.” https://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers.
———. 2018. “softwaresaved/software_in_research_survey_2014: Software in research survey.” Zenodo. https://doi.org/10.5281/zenodo.1183562.
Huppenkothen, Daniela, Anthony Arendt, David W. Hogg, Karthik Ram, Jacob T. VanderPlas, and Ariel Rokem. 2018. “Hack Weeks as a Model for Data Science Education and Collaboration.” Proceedings of the National Academy of Sciences 115 (36): 8872–77. https://doi.org/10.1073/pnas.1717196115.
Nangia, Udit, and Daniel S. Katz. 2017. “Track 1 Paper: Surveying the U.S. National Postdoctoral Association Regarding Software Use and Training in Research.” figshare. https://doi.org/10.6084/m9.figshare.5328442.v3.
Pinto, Gustavo, Igor Wiese, and Luiz Felipe Dias. 2018. “How Do Scientists Develop Scientific Software? An External Replication.” In 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), 582–91. https://doi.org/10.1109/SANER.2018.8330263.
Software Carpentry. n. d. “The History of Software Carpentry.” https://software-carpentry.org/scf/history/.
Wilson, Greg. 2016. “Software Carpentry: Lessons Learned [Version 2; Peer Review: 3 Approved].” F1000Research 3 (62). https://doi.org/10.12688/f1000research.3-62.v2.