Thematic atlas of Italian oncological research: the analysis of public IRCCS


Introduction
This paper was developed within the research project "V:ALERE 2019", which focuses on Italian public-owned Academic Medical Centers (AMCs): 16 public AMCs classified as "Aziende Ospedaliere Universitarie", 9 public AMCs classified as "Ex Policlinici Universitari a gestione diretta", and 21 public-owned "Istituti di Ricovero e Cura a Carattere Scientifico" (IRCCS) (Ministry of Health, http://www.salute.gov.it/, 2018). These institutions have a triple mission of research, teaching, and care, and so have an enormous impact on society and the nation's health.
The main aim of the project is to provide new evidence and proposals to support and advise Italian public AMCs in addressing their challenges. In recent years, research evidence has been increasingly recognized as one of the many factors considered by policymakers and practitioners. In medical science in particular, the analysis of research and its impact is indispensable because of its implications for public health.
The starting point for mapping a research area is to review the related scientific literature, synthesizing past research findings in order to use the existing knowledge base effectively and to advance lines of future research. Bibliometrics is useful here because it introduces a systematic, transparent, and reproducible review process based on the statistical measurement of science, scientists, or scientific activity (Cuccurullo et al., 2016). Many research areas use bibliometric methods to explore the impact of their field, of a set of researchers, or of a particular paper, as well as the journals taken as a reference by researchers, the input knowledge, research gaps, trends, and future opportunities (Zaho, 2010). Performance analysis and science mapping (Noyons et al., 1999) are the two main bibliometric approaches for investigating a research area.
In this work, we focus on science mapping, as it allows themes and trends to be identified and displayed from a synchronic (Callon et al., 1983) or a diachronic perspective (Cobo et al., 2011). By means of science mapping techniques, namely term co-occurrence networks and strategic/thematic maps, we aim to visualize the strategic research positioning of the different Italian public AMCs.
In particular, we identify the research front of each AMC and then visualize the fronts in a joint representation, which is useful for comparing their main research themes and their different specializations, also taking into account their evolution over the years.
Mapping the dynamic positioning of Italian medical research at various levels (i.e. national, regional, AMC type, single AMC) will provide a conceptual framework for policymakers and managers to understand and manage the problems of the AMCs (e.g. appropriate funding mechanisms for financing the triple mission). Moreover, this tool could help the institutions themselves to direct their research efforts towards increasingly innovative fronts, taking the general landscape into account, and to establish collaborations with other AMCs working on the same research topics.
Here, we show the effectiveness of our strategy by considering the scientific production over the last 20 years of the IRCCSs specialized in oncology research.

Data and methodology
IRCCSs are Italian healthcare organizations of national interest in which clinical care is closely tied to research activity. Their mission is the continuous improvement of healthcare. The IRCCS title is granted by the Italian Ministry of Health to a very limited number of institutes throughout the nation, and their activities are regulated at the national level by Legislative Decree 288/2003. They are committed to being a benchmark for the whole public health system in both the quality of patient care and the capacity for organizational innovation. The activity of each IRCCS relates to well-defined research areas, depending on whether it received recognition for a single subject (monothematic IRCCS) or for multiple integrated biomedical areas (polythematic IRCCS).
Among the 21 public IRCCSs in Italy, we considered the nine institutions specialized in the oncology research area (6 monothematic and 3 polythematic IRCCSs).
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was used for the selection of the publications (Liberati et al., 2009). From the Web of Science (WoS) indexing database, launched by the Institute for Scientific Information (ISI) and now maintained by Clarivate Analytics, we retrieved all the publications from January 2000 to December 2019. To identify the publications related to each IRCCS, we searched by full name, by part of the organization's name, or by its commonly known abbreviation from the Organizations-Enhanced list available on WoS (e.g. "IRCCS FND MILANO" for the Fondazione IRCCS Istituto Nazionale Tumori Milano). We limited our search by document type, selecting only Articles, Proceedings Papers, Review Articles, and Book Chapters in the English language. The records were exported in plain-text format.
Starting from our final collection, we loaded the records and converted them into an R data frame using bibliometrix, an open-source tool for quantitative research in scientometrics and bibliometrics that includes all the main methods for performance analysis and science mapping (Aria and Cuccurullo, 2017).
In this preprocessing phase, for the polythematic IRCCSs (Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Istituto Nazionale Tumori Regina Elena (IRE), IRCCS Ospedale Policlinico San Martino) we considered only the publications dealing with oncological topics, by filtering the records with respect to the metadata "Research Areas" (SC) included in WoS.
To focus on the publications with a major impact in oncological research, we calculated the normalized citation score (NCS), one of the most frequently used field-normalized indicators (Bornmann and Haunschild, 2016). The NCS of a focal paper is its citation count divided by the average citation count of the papers published in the same field and in the same publication year. The normalization is therefore based on all the articles published within one year and must be repeated for each publication year.
The overall normalized citation impact of each IRCCS can be analyzed through the mean NCS (MNCS) over its publication set. Finally, following the percentile approach, we performed our analysis only on the publications with an NCS above the 75th percentile (the top 25% of publications).
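The normalization and percentile filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure implemented in the study: the record keys (`id`, `field`, `year`, `citations`) and both function names are hypothetical, the reference set is taken to be the papers in the collection itself, and the 75th percentile is computed with the default (exclusive) method of the standard library.

```python
from collections import defaultdict
from statistics import mean, quantiles


def normalized_citation_scores(papers):
    """Return {paper id: NCS}.

    Each paper is a dict with the hypothetical keys
    'id', 'field', 'year' and 'citations'.
    """
    # Collect the citation counts of every (field, publication year) set.
    reference_sets = defaultdict(list)
    for paper in papers:
        reference_sets[(paper["field"], paper["year"])].append(paper["citations"])
    baselines = {key: mean(counts) for key, counts in reference_sets.items()}

    scores = {}
    for paper in papers:
        baseline = baselines[(paper["field"], paper["year"])]
        # Assumption: a reference set with no citations yields NCS 0
        # rather than a division by zero.
        scores[paper["id"]] = paper["citations"] / baseline if baseline > 0 else 0.0
    return scores


def top_quartile(scores):
    """Return the ids whose NCS lies strictly above the 75th percentile."""
    cutoff = quantiles(list(scores.values()), n=4)[2]
    return {paper_id for paper_id, ncs in scores.items() if ncs > cutoff}
```

Calling `top_quartile(normalized_citation_scores(papers))` on the records of one IRCCS would give the subset of publications retained for the mapping step; averaging the values returned by `normalized_citation_scores` gives the MNCS of the set.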
To map the conceptual structure of each IRCCS we conducted two related analyses: a term co-occurrence network analysis and a strategic or thematic map. The combined use of these techniques allows us to illustrate: how terms relate to each other, the main research themes within each institution, and how they develop.
The basic idea behind term co-occurrence network analysis (Wang et al., 2018) is that each research field or topic can be represented as a set of terms (e.g. keywords, or terms extracted from titles or abstracts). The network representation is used to understand the themes covered by a research field and to define which are the most important and the most recent ones, i.e. the research front. Following the network approach, we built a term co-occurrence matrix, in which each cell outside the principal diagonal contains the number of times two terms appear together in the articles (co-occur). The co-occurrences among terms were then normalized by the association index proposed by Van Eck and Waltman (2009). This measure takes values in the interval [0,1] and reflects the strength of the association between terms. Co-occurrence matrices can be seen as undirected weighted graphs; we can therefore build a network in which each term is a node and the association between linked terms is expressed as an edge, visualizing both single terms and subsets of terms that frequently co-occur. To detect subgroups of strongly linked terms, where each subgroup corresponds to a center of interest or to a theme of the analyzed collection, we refer to community detection algorithms (Fortunato, 2010). Here, we carried out community detection using the Louvain algorithm (Blondel et al., 2008).
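As an illustration of the first two steps above, the following Python sketch counts term co-occurrences and normalizes them. It assumes the association index takes the form c_ij / (s_i s_j), where c_ij is the number of documents containing both terms and s_i, s_j are the numbers of documents containing each term; other scalings of the index exist, and the function names are hypothetical. Community detection (e.g. Louvain) would then be run on the resulting weighted graph and is not shown here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count term occurrences and pairwise co-occurrences.

    `documents` is a list of term lists (e.g. the keywords of each paper).
    Pairs are stored as alphabetically ordered tuples, so the graph is undirected.
    """
    occurrences = Counter()
    pairs = Counter()
    for terms in documents:
        unique_terms = sorted(set(terms))  # a term counts once per document
        occurrences.update(unique_terms)
        pairs.update(combinations(unique_terms, 2))
    return occurrences, pairs


def association_strength(occurrences, pairs):
    """Normalize co-occurrences as c_ij / (s_i * s_j) (assumed form of the index)."""
    return {
        (a, b): count / (occurrences[a] * occurrences[b])
        for (a, b), count in pairs.items()
    }
```

Because two terms cannot co-occur more often than either of them occurs, every value produced this way lies in [0,1], in line with the range stated above.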
The strategic or thematic map (Cobo et al., 2011) plots the themes identified through community detection in a two-dimensional space whose axes are functions of Callon centrality and Callon density, respectively (Callon et al., 1983). Centrality can be read as the importance of a theme in the research field, while density can be read as a measure of the theme's development.
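A minimal sketch of how the two coordinates could be computed is given below. It assumes the formulation commonly reported following Cobo et al. (2011): centrality is 10 times the sum of the link weights between a theme and the other themes, and density is 100 times the sum of the link weights inside the theme divided by the number of terms in the theme. The function name and input layout are hypothetical, and the scaling constants affect only the axis units.

```python
from collections import Counter, defaultdict


def callon_measures(weights, communities):
    """Return {theme: {'centrality': ..., 'density': ...}}.

    `weights` maps (term_a, term_b) to a normalized co-occurrence weight;
    `communities` maps each term to the label of its theme.
    """
    internal = defaultdict(float)  # sum of link weights inside each theme
    external = defaultdict(float)  # sum of link weights leaving each theme
    sizes = Counter(communities.values())  # number of terms per theme

    for (term_a, term_b), weight in weights.items():
        theme_a, theme_b = communities[term_a], communities[term_b]
        if theme_a == theme_b:
            internal[theme_a] += weight
        else:
            # A link between two themes contributes to the centrality of both.
            external[theme_a] += weight
            external[theme_b] += weight

    return {
        theme: {
            "centrality": 10 * external[theme],
            "density": 100 * internal[theme] / sizes[theme],
        }
        for theme in sizes
    }
```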
In this way, we identified the conceptual structure of each IRCCS in the three time slices considered. We then standardized the centrality and density values in order to compare the research fronts of the different institutions by plotting their themes in a joint map. As in the classical analysis, the resulting strategic map defines four types of themes (Cahlik, 2000) according to the quadrant in which they are placed. Themes in the upper-right quadrant are known as motor themes. They are characterized by both high centrality and high density, which means that they are both well developed and important for the research field. Themes in the upper-left quadrant are known as isolated or niche themes. They have well-developed internal links (high density) but unimportant external links, and so are of only limited importance for the field (low centrality). Themes in the lower-left quadrant are known as emerging or declining themes. They have both low centrality and low density, meaning that they are weakly developed or marginal. Themes in the lower-right quadrant are known as basic and transversal themes. They are characterized by high centrality and low density. These themes are important for a research field and concern general topics that are transversal to its different research areas. In each time interval, we considered the KeyWords Plus (ID) associated with the documents. The ID are words or phrases that frequently appear in the titles of an article's references but do not appear in the title of the article itself. They are generated by a special algorithm (Garfield, 1990) that is unique to Clarivate Analytics databases.
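The standardization and quadrant reading described above can be sketched as follows. The text does not specify the standardization used, so this example assumes z-scores computed over all the themes plotted together, with the quadrant boundaries at zero on both axes; the function names and theme labels are illustrative only.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores (assumed standardization); constant input maps to zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [(value - mu) / sigma if sigma > 0 else 0.0 for value in values]


def quadrant(z_centrality, z_density):
    """Classify a theme by the quadrant of the standardized strategic map."""
    if z_centrality >= 0 and z_density >= 0:
        return "motor"                   # upper-right
    if z_centrality < 0 and z_density >= 0:
        return "niche"                   # upper-left
    if z_centrality < 0 and z_density < 0:
        return "emerging or declining"   # lower-left
    return "basic"                       # lower-right


def classify_themes(themes):
    """`themes` maps a label to a (centrality, density) pair of raw values."""
    labels = list(themes)
    z_cent = standardize([themes[label][0] for label in labels])
    z_dens = standardize([themes[label][1] for label in labels])
    return {label: quadrant(c, d) for label, c, d in zip(labels, z_cent, z_dens)}
```

Standardizing over the pooled themes of all institutions, rather than within each institution, is what places the themes on a common scale in the joint map.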

Main results
To highlight the main research themes of the oncological IRCCSs and evaluate their evolution over time, we divided our timespan (2000-2019) into three time slices.
Table 1 reports the distribution of the selected publications per IRCCS in the three periods. Overall, the scientific production of the institutions increased over time. It is constant across the three time slices for two IRCCSs (i.e. IRCCS Ospedale San Martino and Istituto Nazionale Tumori Regina Elena (IRE) IRCCS), whereas some IRCCSs produced far more publications in the third period than in the previous ones (e.g. Istituto Tumori Bari "Giovanni Paolo II" IRCCS and IRCCS Centro di Riferimento Oncologico della Basilicata (CROB)). Figure 1 shows the thematic atlas of the IRCCSs' oncological research. Each theme identified by community detection is labelled with its most frequent ID.
Across the three time slices, the production of the IRCCSs is varied, but three main themes are shared: expression, survival, and chemotherapy. In the first time slice (2000-2006), expression was a basic theme for many IRCCSs and a motor theme only for IRE RO. The position of this theme changes over the years. In the second time slice (2007-2013), expression becomes a motor theme (high density and high centrality) for many IRCCSs, and in the third slice (2014-2019) it starts to shift from the upper-right to the lower-right quadrant, consolidating its role as a traditional (basic) theme with low density and high centrality. From 2007, studies focus on survival, which appears as an emerging theme in the lower-left quadrant (low density and low centrality). In the third period, survival becomes a traditional (basic) theme, indicating great interest in patient health care among many IRCCSs.
Chemotherapy is also a theme addressed by many IRCCSs over time, always positioned on the right of the map (high centrality) in all three time slices. From the second to the third period, the chemotherapy theme shifts from the upper-right to the lower-right quadrant, becoming a basic theme. In the upper-left quadrant, we observed that niche themes (low centrality and high density) increased over time. This suggests that the oncological research of the IRCCSs became increasingly specialized between 2000 and 2019.

Conclusion and future developments
In this paper, we propose a joint representation of the dynamic research positioning of the Italian public IRCCSs specialized in oncology. These graphical representations summarize many aspects of the cancer research landscape in Italy, and the results presented here are only a small part of what can be observed from the thematic maps. The maps can therefore serve as decision support tools for the different agents involved in the health system. Moreover, the approach could be used for other purposes within a more general bibliometric framework (e.g. comparing the topics covered by different sources, by different countries or, as in this case, by different institutions).
Future developments will be devoted, on the one hand, to extending our analysis to the other Italian AMCs in order to map their research positioning completely and, on the other hand, to improving the graphical representations so that the results are easier to read.