Development and validation of the educational leadership scale for mental health providers: A network analysis approach


Design

A descriptive cross-sectional study was conducted to develop and validate the Educational Leadership Scale for Mental Health Providers (ELSM). The study comprised three sequential phases: item development, scale development, and evaluation of the scale's psychometric properties. Established guidelines and recommendations for scale development were followed throughout (Boateng et al., 2018; Clark & Watson, 2019; Teresi et al., 2022). Two of the three phases did not involve work with participants and focused instead on a literature review and sessions with colleagues. After the study proposal was submitted, the Department of Education and Psychology (the author's affiliation) approved the main procedures. Phase 1 (literature review and item development) was conducted from October 2023 to December 2023, and Phase 2 (scale item selection) from December 2023 to February 2024. Phase 3 (data collection) began only after ethical approval was obtained. The research was reviewed and approved by the Research Ethics Committee at Najran University, Kingdom of Saudi Arabia (reference number 202403-076-019341-043956). Ethical approval was granted in March 2024, and data were collected between March 2024 and May 2024.

Phase 1: Item Development

To initiate item development, the authors reviewed established educational leadership theories and the related literature and conducted qualitative interviews. The resulting items were then synthesized, and the initial version of the scale underwent a Delphi study. Each step of Phase 1 (Item Development) is described below.

Literature review for item development

First, published papers indexed in Web of Science, CINAHL, Scopus, and PubMed were reviewed. Search strategies combined the terms “Educational Leadership,” “Leadership in Mental Health,” “Mental Health Providers,” “Leadership Styles,” “Leadership Practices,” “Mental Health Education,” “Leadership Competencies,” and their variants. No existing scale of educational leadership for mental health providers was found; the Principal Instructional Leadership Scale and the Teacher Leadership Style Scale were the closest available instruments (Lai & Lien, 2023; Tsai, 2017). Concurrently, a qualitative study was carried out with 12 mental health providers from various regions of Saudi Arabia, selected through convenience sampling, to explore the dimensions and themes of educational leadership and identify associated factors. The sample size was determined by data saturation. Individual semi-structured interviews were held in a private setting. The interviews were conducted by a team of three clinical psychologists, each holding a Ph.D. in Clinical Psychology and having more than ten years of experience conducting qualitative research, which supported the depth and consistency of data collection.

Interview for item development

The interview guide was designed to elicit discussion of leadership experiences, perceived challenges, and perspectives. Interviews lasted 60 to 90 minutes and were recorded and transcribed verbatim. Each interview was administered by a researcher experienced in conducting in-depth interviews. In line with established guidelines, data collection ceased once saturation was reached (Green & Thorogood, 2018). The data were analyzed using Braun and Clarke’s six-phase thematic analysis framework (Braun & Clarke, 2006), and rigor was addressed through the criteria of credibility, transferability, dependability, and confirmability (Lincoln & Guba, 1985). The findings were consistent with the elements of leadership outlined in the theoretical framework and yielded seven themes: commitment to professionalism and ethical conduct, support for the learning environment, advocacy for continuous learning and development, adherence to educational standards and best practices, collaboration and networking, customization and adaptation of education, and evaluation and assessment.

Item Composition

The first version of the Educational Leadership Scale for Mental Health Providers (ELSM) was developed after synthesizing findings from the literature review and qualitative interviews. Thirty-five initial items were drafted to encompass the various dimensions of educational leadership relevant to mental health providers. These dimensions included ethical leadership, advocacy for continuous learning, support for learning environments, and evidence-based practices, as identified by analyzing interview themes and existing educational leadership frameworks. The process ensured that the items were both theoretically grounded and reflective of the practical experiences of mental health providers. Each item was designed to align with specific leadership dimensions to create a comprehensive tool for assessing leadership in mental health education. The resulting items formed the basis for further refinement and validation through expert consultation and psychometric evaluation.

Delphi Study

The initial version of the ELSM was evaluated in a two-round Delphi study, a structured method for reaching expert consensus. Ten specialists from fields including educational psychology, industrial and organizational psychology, psychiatry, and human resources took part. The experts, who had an average of 7.2 years of professional experience, completed an online questionnaire in which each item was rated on a 4-point Likert scale (1 = disagreement, 4 = complete agreement) for relevance, pertinence, clarity, and completeness, and they were invited to suggest improvements. A rating of 3 or 4 was counted as agreement, and the minimum consensus threshold was set at 80%. Items below this threshold, that is, items rated 3 or 4 by fewer than eight of the ten experts, were reviewed and either discarded or revised. After the second round, 23 of the initial 35 items were retained, with a consensus rate of 90.17% as quantified by the Content Validity Index (CVI), supporting the relevance of the retained items.
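As a minimal illustration of the consensus arithmetic, the Python sketch below computes item-level CVIs (the share of experts rating an item 3 or 4) and a scale-level CVI as the mean of the item values (S-CVI/Ave). The ratings are hypothetical, and the text does not state which scale-level variant produced the 90.17% figure, so the averaging approach is an assumption.

```python
import numpy as np

def item_cvi(ratings: np.ndarray) -> np.ndarray:
    """I-CVI: proportion of experts rating each item 3 or 4 on the 1-4 scale.

    ratings has shape (n_experts, n_items).
    """
    return (ratings >= 3).mean(axis=0)

def scale_cvi_ave(ratings: np.ndarray) -> float:
    """S-CVI/Ave: mean of the item-level CVIs (one common scale-level variant)."""
    return float(item_cvi(ratings).mean())

# Hypothetical ratings from 10 experts on 3 items (not study data).
ratings = np.array([
    [4, 3, 2],
    [4, 4, 2],
    [3, 4, 3],
    [4, 3, 2],
    [4, 4, 1],
    [3, 3, 3],
    [4, 4, 2],
    [4, 2, 3],
    [3, 4, 2],
    [4, 3, 2],
])

icvi = item_cvi(ratings)
keep = icvi >= 0.80  # the 80% consensus threshold described in the text
print("I-CVI per item:", icvi)
print("retained:", keep)
print("S-CVI/Ave of retained items:", scale_cvi_ave(ratings[:, keep]))
```

Here the first two items are retained (I-CVI of 1.0 and 0.9) and the third is flagged for revision or removal.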

Phase 2: Scale Development

In Phase 2, the 23 items retained after the two Delphi rounds (CVI = 90.17%) formed the draft scale carried forward to testing. Sample sizes were determined by data saturation for the qualitative interviews and by statistical requirements for the pilot testing and validation phases. Forty-six participants took part in the pilot test, which provided sufficient data to evaluate the scale’s reliability and factor structure. The final validation phase included a larger sample to support robust statistical analysis and generalizability of the results.

Phase 3: Scale Evaluation

Sample and Setting

Participants were mental health professionals recruited in 2024, including psychologists, psychotherapists, psychiatric nurses, counselors, social workers, and other allied health professionals specializing in mental health care. The inclusion criteria were (1) residency in Saudi Arabia, (2) current employment as a mental health provider, (3) at least two years of experience in mental health provision, (4) willingness to provide informed consent for participation in the study, and (5) active involvement in direct patient care or clinical practice within mental health settings.

Data Collection

The first author, as principal investigator, contacted participating mental health centers across Saudi Arabia and worked with them to identify and recruit eligible mental health providers who met the inclusion criteria. Once institutional approval and ethical clearance were obtained, recruitment materials, including study information and consent forms, were distributed to the identified centers. Prospective participants received detailed information about the study objectives, procedures, and their rights as participants, and were assured of the privacy and confidentiality of their data. Participation was voluntary, and informed consent was obtained before data collection. Participants first completed a demographic checklist and then the Educational Leadership Scale for Mental Health Providers (ELSM). Data were collected online or in person (electronic surveys, virtual interviews, or face-to-face interviews), depending on participants’ preferences and availability. From January 2024 to April 2024, lists of potential participants were received and the pilot study began; primary data collection started in March 2024. Participants’ responses were recorded and stored securely for subsequent analysis.

Measures

The Educational Leadership Scale for Mental Health Providers (ELSM)

The ELSM is the 23-item self-report questionnaire developed in this study; its development was the study’s main aim. Figure S1 presents the Pearson correlations between items.

Data Analysis

All analyses were conducted in R (version 4.2.1) using RStudio. All statistical tests were two-tailed, with the significance level set at α = 0.05. Missing data were handled by listwise deletion, so degrees of freedom vary with the sample size available for each analysis. The dataset was randomly divided into two subsamples using the random.org tool: subsample S1 was used for Exploratory Factor Analysis (EFA) and subsample S2 for Confirmatory Factor Analysis (CFA).
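A reproducible random split can be sketched as follows. The study used random.org; this example instead assumes NumPy's seeded generator, a hypothetical seed, and a hypothetical 101 complete cases remaining after listwise deletion.

```python
import numpy as np

def split_half(n_participants: int, seed: int = 2024):
    """Randomly assign participant row indices to two disjoint subsamples."""
    rng = np.random.default_rng(seed)   # hypothetical seed for reproducibility
    order = rng.permutation(n_participants)
    cut = n_participants // 2
    s1 = np.sort(order[:cut])   # subsample for EFA
    s2 = np.sort(order[cut:])   # subsample for CFA
    return s1, s2

s1, s2 = split_half(101)
print(len(s1), len(s2))
```

Recording the seed (or the random.org output) lets the same S1/S2 assignment be regenerated later.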

The EFA on subsample S1 was conducted with the “EFAtools” package in R (Steiner & Grieder, 2020). Data adequacy was evaluated with the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity; a KMO value greater than 0.70 together with a significant Bartlett’s test indicated that the data were suitable for factor analysis. Extraction used principal component analysis with promax (oblique) rotation. Factors with eigenvalues greater than one were retained, without pre-specifying the number of factors. Factor loadings above 0.30 were considered salient. Each item was expected to load on its target factor; loadings below 0.30 on all factors, or cross-loadings above 0.30 on non-target factors, indicated poor fit (Shrestha, 2021).
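The adequacy checks and the eigenvalue-greater-than-one rule can be written out directly from their textbook formulas. The sketch below is not the EFAtools implementation; it assumes NumPy and SciPy and runs on simulated two-factor data rather than study data. It computes the overall KMO from zero-order and partial correlations, Bartlett’s statistic χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and the number of eigenvalues of the correlation matrix R above one.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    """Bartlett's test that the item correlation matrix is an identity matrix."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)          # ln|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations (each pair controlling for all other items).
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)
    off = ~np.eye(corr.shape[0], dtype=bool)     # off-diagonal mask
    r2 = np.sum(corr[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

def kaiser_n_factors(data):
    """Number of correlation-matrix eigenvalues greater than one."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))
    return int(np.sum(eigvals > 1))

# Simulated data: two uncorrelated factors, five items each, loadings 0.8.
rng = np.random.default_rng(1)
n = 500
factors = rng.standard_normal((n, 2))
loadings = np.zeros((10, 2))
loadings[:5, 0] = 0.8
loadings[5:, 1] = 0.8
data = factors @ loadings.T + 0.6 * rng.standard_normal((n, 10))

chi2, df, p = bartlett_sphericity(data)
print(f"KMO = {kmo(data):.2f}, Bartlett chi2({df:.0f}) = {chi2:.1f}, p = {p:.3g}")
print("factors with eigenvalue > 1:", kaiser_n_factors(data))
```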

Next, a CFA was conducted on subsample S2 in R to test the factor structure suggested by the EFA and assess construct validity. Model fit was evaluated with the standardized root mean square residual (SRMR), goodness-of-fit index (GFI), ratio of chi-square to degrees of freedom (χ²/df), comparative fit index (CFI), and root mean square error of approximation (RMSEA). Acceptable fit was defined as GFI and CFI above 0.90, RMSEA below 0.08, χ²/df below 3, and SRMR below 0.10 (Kordbagheri et al., 2024a).
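The chi-square-based indices among these can be computed from the fitted model’s and the baseline (independence) model’s chi-square statistics. The sketch below uses one common formulation, RMSEA = √(max(χ² − df, 0) / (df(N − 1))) (some programs use N instead of N − 1), and CFI relative to the baseline model; GFI and SRMR require the model-implied covariance matrix and are omitted. All input values are hypothetical.

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Chi-square-based fit indices for a fitted model (m) and a baseline model (b)."""
    excess_m = max(chi2_m - df_m, 0.0)   # model misfit beyond its degrees of freedom
    excess_b = max(chi2_b - df_b, 0.0)   # baseline misfit beyond its degrees of freedom
    rmsea = math.sqrt(excess_m / (df_m * (n - 1)))
    denom = max(excess_m, excess_b)
    cfi = 1.0 if denom == 0 else 1.0 - excess_m / denom
    return {"chi2/df": chi2_m / df_m, "RMSEA": rmsea, "CFI": cfi}

def meets_cutoffs(ix):
    """Apply the chi2/df, RMSEA, and CFI cutoffs stated in the text."""
    return ix["chi2/df"] < 3 and ix["RMSEA"] < 0.08 and ix["CFI"] > 0.90

# Hypothetical chi-square values, for illustration only.
ix = fit_indices(chi2_m=250.0, df_m=200, chi2_b=3000.0, df_b=253, n=301)
print({k: round(v, 3) for k, v in ix.items()})
print("acceptable fit:", meets_cutoffs(ix))
```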

Exploratory Graph Analysis (EGA) was then performed with the “EGAnet” package in R (Golino, Christensen, et al., 2020). Redundancy analysis was conducted first to identify local dependencies, particularly between items that are highly correlated because of similar wording. Unlike traditional factor models, which assume that questionnaire items reflect a shared latent variable, network models treat each item as a causally autonomous component (Christensen et al., 2023). Identifying and resolving redundancies is therefore crucial: a pair of near-duplicate items can form its own spurious community and distort the estimated dimensionality.
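To show why local dependence matters, the sketch below applies a deliberately crude screen that flags item pairs whose absolute zero-order correlation exceeds a hypothetical cutoff of 0.70. This is not the EGAnet redundancy procedure, which relies on network-based statistics; it only illustrates how a near-duplicate item stands out, using simulated data.

```python
import numpy as np

def flag_redundant_pairs(data, threshold=0.70):
    """Return (i, j, r) for item pairs whose |r| exceeds a hypothetical threshold.

    A crude correlation screen only, not EGAnet's network-based method.
    """
    corr = np.corrcoef(data, rowvar=False)
    p = corr.shape[0]
    pairs = []
    for i in range(p):
        for j in range(i + 1, p):
            if abs(corr[i, j]) > threshold:
                pairs.append((i, j, round(float(corr[i, j]), 2)))
    return pairs

rng = np.random.default_rng(7)
base = rng.standard_normal((300, 4))
# Hypothetical item 4: a near-duplicate wording of item 0.
dup = base[:, [0]] + 0.2 * rng.standard_normal((300, 1))
data = np.hstack([base, dup])
print(flag_redundant_pairs(data))
```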

The network was estimated with the Graphical Least Absolute Shrinkage and Selection Operator (GLASSO) (Moriana et al., 2022; Abdelrahman et al., 2024). GLASSO yields a regularized partial correlation matrix, interpreted as a Gaussian Graphical Model in which nodes represent items and edges represent partial correlations between them. The walktrap algorithm (Golino & Epskamp, 2017) was then applied to identify communities of densely connected items. To assess the robustness of the EGA results, non-parametric bootstrapping with 1000 iterations was performed, as recommended by Christensen et al. (2023).
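The GLASSO-then-community-detection pipeline can be approximated in Python with two plainly named substitutions: scikit-learn’s `GraphicalLasso` with a fixed, hypothetical penalty `alpha=0.1` (EGAnet selects the penalty automatically), and networkx’s greedy modularity maximization in place of walktrap, which networkx does not provide. Partial correlations are obtained from the precision matrix Θ as −θᵢⱼ/√(θᵢᵢθⱼⱼ). The data are simulated, and the bootstrapping step is not shown.

```python
import numpy as np
import networkx as nx
from sklearn.covariance import GraphicalLasso

def glasso_partial_correlations(data, alpha=0.1):
    """Sparse partial correlation matrix from a graphical lasso fit."""
    # Standardize so the penalty acts on the correlation scale.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    precision = GraphicalLasso(alpha=alpha).fit(z).precision_
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)
    return pcor

def detect_communities(pcor):
    """Communities via greedy modularity (a stand-in for walktrap)."""
    g = nx.Graph()
    p = pcor.shape[0]
    g.add_nodes_from(range(p))
    for i in range(p):
        for j in range(i + 1, p):
            if pcor[i, j] != 0:
                # Absolute weights: modularity expects non-negative edge weights.
                g.add_edge(i, j, weight=abs(pcor[i, j]))
    comms = nx.community.greedy_modularity_communities(g, weight="weight")
    return sorted(sorted(c) for c in comms)

# Simulated data: two blocks of five items driven by independent factors.
rng = np.random.default_rng(3)
n = 500
f = rng.standard_normal((n, 2))
data = np.hstack([
    0.8 * f[:, [0]] + 0.6 * rng.standard_normal((n, 5)),
    0.8 * f[:, [1]] + 0.6 * rng.standard_normal((n, 5)),
])
pcor = glasso_partial_correlations(data)
print(detect_communities(pcor))
```

On these simulated data the two generating item blocks are expected to appear as two communities; with real items, the penalty choice can noticeably change which edges survive.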

Reliability analyses were also carried out in R. Internal consistency was evaluated with Cronbach’s alpha, computed using the “psych” package (Revelle, 2015); alpha depends on the number of items and on their covariances relative to the variance of the total score. Test-retest reliability was assessed by administering the same measure to the same group of participants on two occasions, and the correlation between scores at the two time points was used as an index of the instrument’s stability over time (Kordbagheri et al., 2024b).

Additionally, centrality metrics were computed to determine the relative importance of individual nodes in the network. Centrality measures such as strength (the sum of a node’s absolute edge weights), closeness (the inverse of a node’s total shortest-path distance to all other nodes), and betweenness (how often a node lies on the shortest paths between other nodes) are frequently used in the behavioral sciences. Depending on how a node relates to the others, these measures indicate its structural influence, visibility, or status within the network (Bringmann et al., 2019; Davoudi et al., 2024; Yan & Ding, 2009).
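The three indices can be computed on a weighted network as sketched below with networkx. The sketch assumes a common convention in psychological network analysis: strength uses absolute edge weights, while closeness and betweenness treat each edge’s length as 1/|w|, so strong edges count as short. The 4-node partial correlation matrix is hypothetical, and networkx normalizes betweenness by default, so values are not directly comparable with unnormalized output from other software.

```python
import numpy as np
import networkx as nx

def centrality_table(pcor):
    """Strength, closeness, and betweenness for each node of a weighted network."""
    g = nx.Graph()
    p = pcor.shape[0]
    g.add_nodes_from(range(p))
    for i in range(p):
        for j in range(i + 1, p):
            w = abs(pcor[i, j])
            if w > 0:
                # Strong edges are treated as short: length = 1 / |weight|.
                g.add_edge(i, j, weight=w, length=1.0 / w)
    strength = dict(g.degree(weight="weight"))
    closeness = nx.closeness_centrality(g, distance="length")
    betweenness = nx.betweenness_centrality(g, weight="length")
    return {v: {"strength": strength[v], "closeness": closeness[v],
                "betweenness": betweenness[v]} for v in g.nodes}

# Hypothetical 4-node network in which node 1 bridges node 0 to nodes 2 and 3.
pcor = np.array([
    [0.0, 0.4, 0.0, 0.0],
    [0.4, 0.0, 0.3, 0.2],
    [0.0, 0.3, 0.0, 0.1],
    [0.0, 0.2, 0.1, 0.0],
])
for node, scores in centrality_table(pcor).items():
    print(node, {k: round(val, 3) for k, val in scores.items()})
```

In this toy network node 1 scores highest on all three indices because every shortest path between the other nodes passes through it.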
