‘MaDiH: Research Software Engineering Training’

An Open Access synthesis of the RSE Training workshop for the MaDiH project, King’s Digital Lab, London, 2-5 July 2019

The purpose of the workshop was to introduce best practices in Research Software Engineering and the Software Development Lifecycle (SDLC) adopted at King’s Digital Lab. The target audience was the team of the AHRC-funded project “MaDiH: Mapping Digital Cultural Heritage in Jordan”.

King's Digital Lab: Overview and Context

Welcoming remarks

Professor Graeme Earl

The participants were welcomed by the Vice-Dean for External Relations at King’s College London, Professor Graeme Earl from the Department of Digital Humanities.

Abstract

King’s Digital Lab: Overview and Context

The session gave an overview of the institutional context of the workshop, mentioning over 30 years of history of Humanities Computing and Digital Humanities (DH) at King’s College London and presented King’s Digital Lab (KDL), founded in 2015; KDL’s place and role within the university, its team, infrastructure, activities, projects, funding strategy, partners and collaborators were briefly discussed. The session dwelled on emerging as well as core research themes and directions for KDL, such as immersive experiences, software development, archiving and sustainability, machine learning and big data analysis, design, visualisation and indigenous DH. The last part of the session focused on the contexts in which KDL operates that necessitate a balance between a drive for innovation and focus on continuity and value, between experimentation and institutional responsibility. In order to address the scale and complexity of contemporary DH projects and infrastructures, KDL has adopted and adapted industry standards, broadly accepted research data management workflows and clearly defined processes for documentation and software development lifecycle (SDLC).

Speaker for this session

  • James Smithies

    Dr James Smithies is Director of King’s Digital Lab. He was previously Senior Lecturer in Digital Humanities and Associate Director of the UC CEISMIC Digital Archive at the University of Canterbury, New Zealand. He has worked in the government and commercial IT sectors as a technical writer and editor, business analyst, and project manager. His monograph The Digital Humanities and the Digital Modern was published by Palgrave Macmillan in 2017.

KDL Practices: Team, Systems, Data and Models (Part One)

Abstract

The first part of the session began with the description of team roles at KDL, aligned with a holistic vision of Research Software Engineering (RSE) career paths plotted in a continuum from research active to research support profiles, mappable to best practices in industry standards such as development and management frameworks (in particular the Agile Dynamic Systems Development Method: DSDM) and competence skills (e.g. SFIA). These roles and promotion processes were compared and contrasted with traditional roles and promotion processes in academia on the one hand, and industry on the other. Focusing on the example of the analyst role, the range of tasks and responsibilities inherent to the role was demonstrated.

In what followed, infrastructures and frameworks used at KDL, the lifecycle of a typical project and its SDLC were outlined, beginning with initial contact by the partner (i.e. an academic, a research institution or a business) through to funding application, release and post-project hosting and maintenance. Sources of funding and classification of projects according to size were also discussed.

KDL members of staff work on several projects in different stages of development at the same time. To manage the workflow effectively, the team has implemented flexible “timebox planning”, whereby tasks are planned for each “timebox” (a two-week period) and are then revised according to priority and resources. To manage and document these processes, KDL has developed a set of templates for project governance documentation corresponding to each stage of the project SDLC (such as “Terms of Reference”, “Feasibility”, “Product Quote”, “Project Review Record” and “Service Level Agreement (SLA)”). Simplified templates with contextual information on use can be consulted in KDL’s repository on GitHub.
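The timebox planning described above can be illustrated with a minimal Python sketch. The task names, priority scale, effort figures and capacity are hypothetical, and the greedy allocation is an assumption made for illustration, not a description of KDL's actual planning procedure.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int       # 1 = highest priority (hypothetical scale)
    effort_days: float  # estimated effort in person-days


def plan_timeboxes(tasks, capacity_days):
    """Greedily assign tasks to consecutive timeboxes, highest priority first.

    Each timebox holds at most `capacity_days` of effort. A task that does
    not fit in the current timebox opens the next one.
    """
    timeboxes = [[]]
    remaining = capacity_days
    for task in sorted(tasks, key=lambda t: t.priority):
        if task.effort_days > capacity_days:
            raise ValueError(f"{task.name} does not fit in a single timebox")
        if task.effort_days > remaining:
            timeboxes.append([])
            remaining = capacity_days
        timeboxes[-1].append(task.name)
        remaining -= task.effort_days
    return timeboxes


# Hypothetical tasks; one timebox is assumed to offer 10 person-days.
tasks = [
    Task("Draft data model", priority=2, effort_days=4),
    Task("Requirements interviews", priority=1, effort_days=5),
    Task("Set up test server", priority=3, effort_days=3),
]
plan = plan_timeboxes(tasks, capacity_days=10)
```

Re-running the function with updated priorities or capacity at the start of each timebox mirrors the revision step mentioned in the abstract.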

Speaker for this session

  • Arianna Ciula

    Dr Arianna Ciula is Senior Software Analyst and Deputy Director of King's Digital Lab. Dr Ciula has broad experience in digital humanities research and teaching, research management, and digital research infrastructures. She holds a PhD in Manuscript and Book Studies (digital palaeography, University of Siena), an MA in Applied Computing in the Humanities (King’s College London) and a BA Hons in Communication sciences (computational linguistics, University of Siena).

KDL Practices: Team, Systems, Data and Models (Part Two)

Abstract

The second part of the session introduced the concept of “Double Diamond” – a visual map of the design process (developed and described by the Design Council). The Double Diamond model posits the necessity, for any creative process, to spend time defining the problem before setting out to solve it. Therefore, the process is divided into four distinct phases – Discover (a problem), Define (the problem), Develop (a solution) and Deliver (the solution).

These four phases can be mapped onto two of the three stages of KDL’s workflow, which comprises:

  1. Pre-project (Discover/Define)
  2. Project development from start to launch (Define, Design and Develop)
  3. Post project (maintenance, archiving and decommission)

The second stage, evolutionary project development, is an iterative cycle that is revised as development progresses.

The rest of the session was dedicated to detailed analysis of workflows by RSE role (Project Manager; Analyst; Software Engineer; UI/UX Designer; Systems Manager) with the tasks mapped onto the three stages of KDL project workflow, from Pre-project through Project development to Post-project.

Speaker for this session

  • Arianna Ciula

Project Requirements

Abstract

At the beginning of the session, the elements to be taken into consideration while planning any DH project were named and discussed: a core idea typically evolves into a set of research questions, then methods to address them are identified along with management approaches, required budget to support the project and any sustainability issues that need to be planned for in advance.

When it comes to development, in contrast to traditional project management approaches that fix the features to be implemented in the course of the project, Agile project management approaches, adopted by KDL, treat time and cost as fixed variables. Instead of committing to implementing a set of features, the team commits to delivering value within defined constraints of cost and time, whereas features may change as the project progresses.

At pre-project stage, feasibility analysis needs to be conducted before a project is considered viable and the rest of the life-cycle initiated. In order to prioritise requirements, KDL uses a prioritisation technique known as the MoSCoW principles, whereby, following consultation with the partners, each required feature is assigned a status: M for “must”, S for “should”, C for “could” and W for “will not have this time”. W requirements, although recorded by the analyst, are considered to be outside the scope of the project, and C requirements are low priority. In the iterative process of project development, this prioritisation technique is used throughout the project at review meetings.
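As a rough illustration of how MoSCoW prioritisation combines with fixed time and cost, the following Python sketch selects requirements in priority order until a fixed effort budget is used up. The requirement titles, effort estimates and budget are hypothetical, and the selection rule is one plausible reading of the technique, not KDL's documented process.

```python
from dataclasses import dataclass

# Rank of each MoSCoW status: a lower rank is considered first.
MOSCOW_RANK = {"M": 0, "S": 1, "C": 2, "W": 3}


@dataclass
class Requirement:
    title: str
    status: str         # one of "M", "S", "C", "W"
    effort_days: float  # hypothetical estimate


def select_scope(requirements, budget_days):
    """Pick requirements in MoSCoW order within a fixed effort budget.

    "W" items are recorded but never scheduled. An error is raised if a
    "must" item cannot be covered, since the project would then not be
    feasible as scoped.
    """
    in_scope, deferred = [], []
    remaining = budget_days
    for req in sorted(requirements, key=lambda r: MOSCOW_RANK[r.status]):
        if req.status == "W":
            deferred.append(req.title)
        elif req.effort_days <= remaining:
            in_scope.append(req.title)
            remaining -= req.effort_days
        elif req.status == "M":
            raise ValueError(f"Budget cannot cover must-have: {req.title}")
        else:
            deferred.append(req.title)
    return in_scope, deferred


# Hypothetical requirements for illustration only.
requirements = [
    Requirement("Map view", "C", 6),
    Requirement("Dataset catalogue", "M", 10),
    Requirement("Bilingual interface", "S", 5),
    Requirement("Mobile app", "W", 20),
]
in_scope, deferred = select_scope(requirements, budget_days=18)
```

Time and cost stay fixed (the budget), while the feature list flexes: lower-priority items drop out of scope first and can be reconsidered at the next review meeting.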

Since this training workshop was part of a specific project (“MaDiH: Mapping Digital Cultural Heritage in Jordan”) with a defined core idea and a set of high level requirements, the rest of the session focused on the project deliverables. Producing a Proof of Concept (PoC) is one of the objectives of the project; the notion of prototype, proving that a scaled-up version of the project can be executed, was therefore discussed and critiqued as RSE best practice to encourage review of and reflection on the project’s feasibility and gap analysis.

The session further provided some examples taken from the MaDiH pre-project stage and demonstrated KDL’s template for requirement analysis.

In addition, the application of the “Comprehensive Knowledge Archive Network” (CKAN) system, widely used across the public sector, as a proposed solution for capturing metadata in MaDiH was discussed as an example of how an RSE solution evolves through the KDL software development lifecycle.
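To give a sense of what capturing metadata in CKAN involves, the sketch below builds (without sending) a request to CKAN's Action API endpoint package_create using only the Python standard library. The server URL, token and dataset values are placeholders, and the choice of fields is an assumption; it does not reproduce the MaDiH metadata schema.

```python
import json
import urllib.request


def build_package_create_request(base_url, api_token, dataset):
    """Build (but do not send) a POST request to CKAN's package_create action."""
    url = base_url.rstrip("/") + "/api/3/action/package_create"
    body = json.dumps(dataset).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": api_token,
        },
        method="POST",
    )


# Placeholder dataset description; every value here is hypothetical.
dataset = {
    "name": "example-site-survey",  # URL-safe identifier
    "title": "Example Site Survey",
    "notes": "Placeholder description of a heritage dataset.",
    "tags": [{"name": "archaeology"}],
    "extras": [{"key": "spatial_coverage", "value": "Jordan"}],
}
request = build_package_create_request(
    "https://ckan.example.org", "HYPOTHETICAL-TOKEN", dataset
)
```

Passing the request to urllib.request.urlopen would submit it to a real CKAN instance; that step is left out here because it needs a live server and a valid token.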

Speaker for this session

  • Arianna Ciula

Data Modelling

Abstract

This session started with a discussion of the concepts of Data, Information and Knowledge. A data model, i.e. a formalised description of how to organise data in an information system, is the backbone of most evolving RSE solutions. One way of modelling data is by creating an ontology, that is, an explicit set of relational statements connecting entities (e.g. artefacts, places, people, events). Ontologies not only help organise data for the purposes of a particular project; they also make future integration and reuse of structured data possible.
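The idea of an ontology as a set of relational statements connecting entities can be sketched in a few lines of Python. The entities and relation names below are invented for illustration and do not come from any project discussed in the session.

```python
# Each statement is a (subject, predicate, object) triple; all hypothetical.
statements = [
    ("mosaic_01", "is_a", "Artefact"),
    ("site_A", "is_a", "Place"),
    ("workshop_X", "is_a", "Group"),
    ("mosaic_01", "found_at", "site_A"),
    ("mosaic_01", "made_by", "workshop_X"),
]


def related(triples, entity):
    """Return the entities directly linked to `entity`, in either direction.

    Typing statements ("is_a") are skipped so that only links between
    individual entities are reported.
    """
    linked = set()
    for subject, predicate, obj in triples:
        if predicate == "is_a":
            continue
        if subject == entity:
            linked.add(obj)
        if obj == entity:
            linked.add(subject)
    return linked


neighbours = related(statements, "mosaic_01")
```

Because the statements are explicit and uniform, data from another project that uses the same entity and relation names could be appended to the list and queried with the same function, which is the reuse benefit the abstract points to.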

The session traced the meaning of the term “ontology” from philosophy to data modelling, where it is used to designate conceptualisation and agreed upon representation of data about a knowledge domain.

There are two approaches to building ontologies. One can be described as realist (that strives to render full complexity), the other as pragmatic (that focuses on representation of data best suited for the project and purpose in question). Due to the often complex and ambiguous nature of data in the Humanities and Cultural Heritage sectors, DH scholars often favour the pragmatic approach. In order to understand the concept of ontology, metaphors such as “agreement”, “contract” or “compromise” were introduced as useful semantic devices in the data modelling process. Access to expert knowledge about the domain and reflection on the future uses of the data were discussed as essential constituent parts of this process.

After analysing a set of examples from the “Henry III Fine Rolls” project, the session concluded with a detailed discussion of the CIDOC Conceptual Reference Model (CRM) as a guide for good practice in conceptual modelling in the Humanities and Cultural Heritage.
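A distinctive feature of the CIDOC CRM is that it is event-centric: an object is linked to its maker and place of production through a production event, not directly. The following Python sketch illustrates the pattern with CRM class and property labels as found in recent versions of the standard (E22 was labelled “Man-Made Object” in earlier versions). The instance identifiers are hypothetical and the sketch is not drawn from the session materials.

```python
# Class of each hypothetical instance, using CIDOC CRM class labels.
classes = {
    "ex:mosaic_01": "E22 Human-Made Object",
    "ex:production_01": "E12 Production",
    "ex:workshop_X": "E39 Actor",
    "ex:site_A": "E53 Place",
}

# The object reaches its maker and place only via the production event.
triples = [
    ("ex:mosaic_01", "P108i was produced by", "ex:production_01"),
    ("ex:production_01", "P14 carried out by", "ex:workshop_X"),
    ("ex:production_01", "P7 took place at", "ex:site_A"),
]


def objects_of(subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def production_places(artefact):
    """Follow artefact -> production event -> place."""
    places = []
    for event in objects_of(artefact, "P108i was produced by"):
        places.extend(objects_of(event, "P7 took place at"))
    return places


places = production_places("ex:mosaic_01")
```

Modelling the event explicitly makes room for further statements about it, such as a date or a technique, without changing the description of the object itself.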

The afternoon of Day Two was dedicated to a visit to the British Museum led by Dominic Oldman and Diana Tanase.

Speaker for this session

  • Arianna Ciula

MaDiH Example

Abstract: Requirement Analysis and Data Modelling for MaDiH

During this project-specific session, the information presented in the first two days of the workshop was applied to the concrete example of the Mapping Digital Cultural Heritage in Jordan (MaDiH) project. The session discussed the requirements elicitation phase of the project and presented the template for requirements elicitation developed at KDL, the project’s CKAN instance, and the workflow for the identification and entry of datasets and resources, with reference to supporting techniques, such as interview template and diary/log, and standard SDLC operational methods, such as the review process.

RSE in Action

The session was followed by two short presentations by partners showcasing other projects:

  1. Arts of Making (Will Wootton, Department of Classics, King’s College London)
  2. Getty Digital Itineraries of Art History (Stuart Dunn, Dept of Digital Humanities, King’s College London)

Speakers for this session

  • James Smithies

Practical Methods in Digital Heritage: 3D Modelling, Photogrammetry, XR

Abstract

Moving on to specific examples and RSE methods, the session started with a brief introduction to digital 3D-modelling and its applications for DH research. 3D-models help preserve fragile artefacts by allowing close examination without endangering the objects themselves, defy distance by making study of objects in dispersed collections possible, and reconstruct no longer existing environments in order to provoke emotional and intellectual responses and facilitate research. A useful way of thinking about 3D modelling consists in differentiating between “acquisition” (i.e. direct scanning or sensing of a physical surface) and “creation” (i.e. manual, interpretative (re)construction).

The second part of the session was dedicated to photogrammetry, a set of techniques and tools for extracting three-dimensional data from images. The necessary equipment and available software tools were discussed, as were the frictions between an “ideal set-up” and the more realistic contexts of noisy environments for capturing images. The stages of the photogrammetry process (the “photogrammetry pipeline”) were outlined: Natural Feature Extraction, Image Matching, Features Matching, Structure from Motion, Depth maps estimation, Meshing, and Texturing. Limitations of the method and ways of counteracting them were mentioned. The session concluded with a practical exercise, equipment demonstrations, and a taste of immersive experiences.

Speaker for this session

  • Neil Jakeman

    Neil Jakeman is Senior Software Analyst at King's Digital Lab. Neil has a background in environmental analysis and spatial statistics, and experience in commercial development. At KDL, Neil has led the development of a number of important projects. His particular research interests lie in the fields of geospatial humanities and virtual reality.

Best Practices in Design and Development of Research Infrastructures

Abstract

This session covered a number of topics in best practices of design and development of Research Infrastructures with a particular focus on Research Data Management (RDM).

It started with an outline of the “Open Access Chain” (Open standards – Open data – Open source – Open licenses – Open access), drawing on opportunities and challenges that an open approach to data presents in an RSE context. One of the key challenges is a necessity for a rigorous SDLC (software development lifecycle) process and adoption of common standards.

Responding to the needs of data-intensive research and machine-actionability, the FORCE11 coalition developed the FAIR data principles, which specify a set of requirements for data and metadata: data should be findable, accessible, interoperable and reusable. To facilitate the transition to FAIR data, the GO FAIR initiative developed a workflow (the “FAIRification process”) that guides researchers from retrieval of non-FAIR data to deployment of FAIR data resources.
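A very rough way to make the four FAIR requirements tangible is to check a metadata record for fields that support each one. The Python sketch below does this with invented field names and one simplistic check per principle; it is an illustrative assumption and not an implementation of the official FAIR principles or of the FAIRification process.

```python
# One illustrative check per FAIR principle; the field names are assumptions.
FAIR_CHECKS = {
    "findable": lambda r: bool(r.get("identifier")) and bool(r.get("title")),
    "accessible": lambda r: bool(r.get("access_url")),
    "interoperable": lambda r: bool(r.get("format")) and bool(r.get("vocabulary")),
    "reusable": lambda r: bool(r.get("license")) and bool(r.get("provenance")),
}


def fair_report(record):
    """Return a pass/fail flag for each principle for one metadata record."""
    return {name: check(record) for name, check in FAIR_CHECKS.items()}


# Placeholder record; the identifier and URL are not real.
record = {
    "identifier": "doi:10.0000/placeholder",
    "title": "Example Site Survey",
    "access_url": "https://data.example.org/survey.csv",
    "format": "text/csv",
    "license": "CC-BY-4.0",
}
report = fair_report(record)
```

In this example the record lacks a controlled vocabulary and a provenance statement, so the report flags interoperability and reusability as the gaps to address.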

In the session, the place of RDM in the software development lifecycle at KDL was discussed. The nature of the Agile projects that KDL often works with, however, means that in many cases the project begins before any data is collected, or data collection continues throughout evolutionary development, so monitoring methods and forward planning need to be in place and exercised throughout the SDLC.

With the proliferation of data, questions of privacy are more and more pressing. The General Data Protection Regulation (GDPR) became directly applicable in all EU states on 25 May 2018 and requires a lawful basis, such as informed consent, to process personal data. In this context, the history of the regulation and its implications for DH projects, both legal and ethical, were discussed.

The final topic of the session was the challenges related to software sustainability and citation, including the tension between the need for openness and reusability and the requirement for appropriate citation that gives credit to contributors at every stage of the project. To ensure adequate monitoring of the project at every stage of its development, issues of Versioning, Sharing, Documentation, Licensing and Publishing (e.g. via Zenodo) need to be addressed.

Speakers for this session

  • Arianna Ciula

  • James Smithies

Data Sovereignty

Abstract

The session focused on the concept of indigenous data sovereignty and its application to DH projects that deal with artefacts, data, images, and concepts that may be considered sacred, secret, or culturally sensitive by certain groups of people. Data sovereignty typically refers to the understanding that data is subject to the laws of the nation in which it is stored, whereas indigenous data sovereignty reflects the conviction that data is subject to the laws of the nation from which it was collected (i.e. of the source community). These issues become most acute when dealing with cultural artefacts from non-Western cultures (such as Māori culture) or looted objects dispersed in Western collections, and with the dissemination of information about them. The tensions between Open Access principles and the requirements of Intellectual Property Rights/Management were highlighted. The session warned against presenting data without context, commercialisation without permission, and the use of data to the detriment of communities, and advocated respect for, and consultation with, the communities concerned. The session formulated a set of questions to be considered when envisaging any DH project that may deal with data that is potentially sensitive for a group or groups of people, stressing the requirement for well-thought-through infrastructures for holding datasets as well as for consistent policies for data management and access.

This session by Samantha Callaghan was presented by Dr James Smithies.

Speakers for this session

  • Samantha Callaghan

    Samantha Callaghan is a Research Software Engineering Project Manager as well as the Georgian Papers Programme Metadata Analyst. As an RSE Project Manager, Samantha guides projects through KDL’s Software Development Life Cycle. She also undertakes analysis, decommissioning documentation and outreach work, particularly in relation to Indigenous DH and decolonising practice in the GLAM sector.

  • James Smithies

Community Pointers

Abstract

This final session of the workshop started with the presentation of UK and European DH infrastructures and organisations, a list of links to internationally recognised policies and best practices in DH, important peer-reviewed journals and mailing lists dedicated to diverse aspects of human and technical research infrastructures that may be of relevance to the MaDiH project and of interest to its members.

The second part of the session was dedicated to discussion and requirements elicitation for the MaDiH project.

Speakers for this session

  • Arianna Ciula

  • James Smithies
