Predicting your future h-index
Study aims to forecast success.
23 September 2020
Collaborating with leading researchers and having co-authors from diverse academic backgrounds are key factors in building an individual’s h-index, a metric used to gauge a researcher’s academic success, a new study has found.
Analyzing a researcher’s scientific impact through the lens of their h-index is common practice for assessing funding applications and identifying mentors and collaborators. The study authors set out to find which factors most influence a scientist’s future scholarly achievements, judged by their h-index over five, eight and ten years.
The h-index metric is widely used to measure a researcher’s productivity and impact, based on their article output and citations over time. A researcher has an h-index of h if h of their papers have each been cited at least h times. It was proposed in 2005 by University of California, San Diego physicist Jorge Hirsch as an alternative to raw citation counts, to recognize the cumulative achievements of authors with good citation levels across a range of papers, rather than “one-hit wonders.”
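The definition can be sketched in a few lines of Python. This is a minimal illustration of the standard h-index calculation, not code from the study; the function name `h_index` and the sample citation counts are hypothetical.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    # The paper at rank i (1-based) must have at least i citations to count.
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical "one-hit wonder": one heavily cited paper, little else.
print(h_index([1000, 2, 1, 0]))         # 2
# Hypothetical steady author: fewer total citations, spread across papers.
print(h_index([12, 10, 9, 8, 7, 3]))    # 5
```

The first author has 1,003 citations but an h-index of 2, while the second has only 49 citations but an h-index of 5, which is the contrast with raw citation counts that Hirsch intended.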
Including highly cited references, publishing in journals with a high impact factor, and joining a renowned institution were some of the other factors the study identified as influencing an author’s future h-index.
The six study authors, all data scientists based in China, the United States and Australia, used big data and machine learning techniques to analyze two data sets: one covering nearly 80,000 computer scientists and the 105,000 articles they published, and the other covering just over 80,000 physicists and their 98,000 articles.
“We tried to cover as many as possible causal factors to predict a scholar’s scientific success measured in his/her future h-index,” says the lead author Kong Xiangjie, a computer scientist at Zhejiang University of Technology in Hangzhou, China.
Predicting future h-index
The study, which was published in May 2020 in ACM (Association for Computing Machinery) Transactions on Knowledge Discovery from Data (TKDD), identified 35 causal factors in five broad categories associated with a scientist's future h-index:
author-centered, based on factors such as the h-index of a scientist’s co-authors, the maximum h-index among those co-authors, and the difference between the highest and lowest co-author h-index values;
article-centered, with factors covering a scientist’s citation counts and publication numbers, the frequency with which their topic is covered in the literature, and the highest and the lowest citation numbers of the references in their publications;
venue-centered, referring to factors concerning the journals in which a scientist publishes;
institution-centered, looking at a scientist’s colleagues in terms of their h-index, their number of publications and their citation counts, as well as the distribution of these indicators among all scientists at the same institution; and
temporal factors, including a scientist's academic age and the change in their h-index over time.
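To make a few of these factors concrete, here is a minimal Python sketch computing some author-centered and temporal features for one scholar. It is a hypothetical illustration, not the study's feature pipeline: the names `AuthorFeatures` and `author_features`, the use of the mean to summarize "the h-index of a scientist's co-authors," and the choice of first publication year for academic age are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class AuthorFeatures:
    coauthor_h_mean: float   # ASSUMED aggregation: average co-author h-index
    coauthor_h_max: int      # highest co-author h-index
    coauthor_h_spread: int   # highest minus lowest co-author h-index
    academic_age: int        # ASSUMED: years since first publication
    h_change: int            # change in the scholar's own h-index over a window


def author_features(coauthor_h: list[int], first_pub_year: int,
                    current_year: int, h_then: int, h_now: int) -> AuthorFeatures:
    """Build a few illustrative author-centered and temporal features."""
    if not coauthor_h:
        raise ValueError("need at least one co-author h-index")
    return AuthorFeatures(
        coauthor_h_mean=sum(coauthor_h) / len(coauthor_h),
        coauthor_h_max=max(coauthor_h),
        coauthor_h_spread=max(coauthor_h) - min(coauthor_h),
        academic_age=current_year - first_pub_year,
        h_change=h_now - h_then,
    )


# Hypothetical scholar with one senior and two junior co-authors.
f = author_features([40, 12, 3], first_pub_year=2012, current_year=2020,
                    h_then=4, h_now=9)
print(f.coauthor_h_max, f.coauthor_h_spread, f.academic_age, f.h_change)  # 40 37 8 5
```

A large spread between the highest and lowest co-author h-index captures the mix of senior and junior collaborators that the study associates with future success; a real model would feed many such features into a learned predictor.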
The study found that author-centered and article-centered factors had the strongest bearing on a scholar’s future h-index. To confirm their empirical results, the study authors used machine learning to model how a scholar’s career might evolve over time. Starting from given initial conditions, they introduced variations by adding or removing conditions.
The aim was to compare predictive results with actual values in scholars’ careers and to verify causal relationships.
According to the study, in the computer scientists data set, article-centered factors have 41.47% importance, author-centered factors 25%, temporal factors 16.67%, and venue-centered and institution-centered factors 8.33% each.
In the physicists data set, article-centered factors have 33.42% importance, author-centered factors 42.33%, temporal factors 6.39%, venue-centered factors 8.5%, and institution-centered factors 9.36%.
They also found that the h-indices of scholars at the same institution tend to be very close to one another.
Success factor warning
Liu Yuxian, a librarian at Tongji University in Shanghai, welcomed the study but questioned the characterization of some factors as both causes and indicators of success.
For example, an author's citation and publication numbers, identified as article-centered causal factors, are also the very quantities from which the h-index, the study's indicator of scientific success, is calculated. She also warned against using the h-index in isolation to gauge success.
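Liu's point can be made concrete. Because the h-index is computed from a scholar's own publication and citation counts, those counts cap it: h can never exceed the number of papers, and since the top h papers each have at least h citations, h × h can never exceed total citations. A minimal Python sketch (the function names `h_index` and `h_index_bounds` and the sample data are hypothetical) shows both ceilings.

```python
import math


def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # Ranked counts fall while ranks rise, so "cites >= rank" holds only
    # for a leading run of papers; the length of that run is h.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def h_index_bounds(citations: list[int]) -> tuple[int, int, int]:
    """Return (h, publication count, ceiling implied by total citations)."""
    h = h_index(citations)
    papers = len(citations)                   # h <= number of papers
    cite_cap = math.isqrt(sum(citations))     # h * h <= total citations
    return h, papers, cite_cap


print(h_index_bounds([12, 10, 9, 8, 7, 3]))  # (5, 6, 7)
```

Any factor built from the same counts is therefore partly measuring the outcome it is meant to predict, which is the circularity Liu describes.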
Zhao Rongying, an informatics researcher at the Research Center for Chinese Science Evaluation at Wuhan University, praised the study's originality but questioned the role it assigned to venue-centered (journal) factors as contributors to an individual’s future success.
Kong acknowledged that article- and author-centered factors serve as both cause and effect, but argued that the machine learning analysis confirmed the early role of citation and publication numbers in a scholar’s eventual h-index, with other factors coming into play along the way.
"As for h-index, we know its limitations (in measuring one's academic success), but currently, there is no ideal, widely-accepted alternative," he told Nature Index.
Hao Yufeng, a physicist at Nanjing University, said that some factors the TKDD study revealed are instrumental in guiding young scientists' career development.
“Collaborating with both top scientists and junior ones such as doctoral students, pursuing hot topics, and co-authoring with others with multidisciplinary backgrounds, these are very reasonable strategies for future success, particularly to those young PIs (principal investigators).”