Tokyo Tech, Tohoku University, Fujitsu, and RIKEN collaborate to develop LLMs
Tokyo Institute of Technology (Tokyo Tech), Tohoku University, Fujitsu Limited, and RIKEN have announced that in May 2023 they will embark on research and development of distributed training of Large Language Models (LLMs) on the supercomputer Fugaku, within the scope of the initiatives for use of Fugaku defined by Japanese policy.
LLMs are deep learning AI models that serve as the core of generative AI, including ChatGPT. By disclosing the results of this R&D in the future, the four organisations aim to improve the environment for creating LLMs that can be widely used by academia and companies, contribute to improving the research capabilities of AI in Japan, and increase the value of utilising Fugaku in both academic and industrial fields.
While many anticipate that LLMs and generative AI will play a fundamental role in the research and development of technologies for security, the economy, and society overall, the advancement and refinement of these models will require high-performance computing resources that can efficiently process large amounts of data.
Tokyo Tech, Tohoku University, Fujitsu, and RIKEN are undertaking an initiative to this end that will focus on research and development toward distributed training of LLMs.
The initiative runs from 24th May 2023 to 31st March 2024, the period of the initiative for use of Fugaku defined by Japanese policy.
Roles of each organisation and company
The technology used in this initiative will allow the organisations to efficiently perform large-scale language model training on the large-scale parallel computing environment of the supercomputer Fugaku. The roles of each organisation and company are as follows:
- Tokyo Institute of Technology: Oversight of overall processes, parallelisation, and acceleration of LLMs
- Tohoku University: Collection of learning data, selection of models
- Fujitsu: Acceleration of LLMs
- RIKEN: Distributed parallelisation and communication acceleration of LLMs, acceleration of LLMs
To support Japanese researchers and engineers in developing LLMs in the future, the four organisations plan to publish the research results obtained within the scope of the initiatives for use of Fugaku defined by Japanese policy on GitHub and Hugging Face in fiscal 2024. It is also anticipated that many researchers and engineers will participate in the improvement of the basic model and new applied research to create efficient methods that lead to the next generation of innovative research and business results.
The four organisations will additionally consider collaborations with Nagoya University, which develops data generation and learning methods for multimodal applications in industrial fields such as manufacturing, and CyberAgent, which provides data and technology for building LLMs.
Comment from Toshio Endo, Professor, Global Scientific Information and Computing Centre, Tokyo Institute of Technology: "The collaboration will integrate parallelisation and acceleration of large-scale language models using the supercomputer ‘Fugaku’ by Tokyo Tech and RIKEN, Fujitsu's development of high-performance computing infrastructure software for Fugaku and performance tuning of AI models, and Tohoku University's natural language processing technology. In collaboration with Fujitsu, we will also utilise the small research lab we established under the name of ‘Fujitsu Collaborative Research Centre for Next Generation Computing Infrastructure’ in 202X. We look forward to working together with our colleagues to contribute to the improvement of Japan's AI research capabilities, taking advantage of the large-scale distributed deep learning capabilities offered by ‘Fugaku.’"
Comment from Kentaro Inui, Professor, Graduate School of Information Sciences, Tohoku University: "We aim to build a large-scale language model that is open-source, available for commercial use, and primarily based on Japanese data, with transparency in its training data. By enabling traceability of the learning data, we anticipate that this will facilitate research robust enough to scientifically verify issues related to the black box problem, bias, misinformation, and so-called "hallucination" phenomena common to AI. Leveraging the insights gained from deep learning from Japanese natural language processing developed at Tohoku University, we will construct large-scale models. We look forward to contributing to the enhancement of AI research capabilities in our country and beyond, sharing the results of the research we obtain through the initiative with researchers and developers."
Comment from Seishi Okamoto, EVP, Head of Fujitsu Research, Fujitsu Limited: "We are excited for the chance to leverage the powerful, parallel computing resources of the supercomputer Fugaku to supercharge research into AI and advance research and development of LLMs. Going forward, we aim to incorporate the fruits of this research into Fujitsu's new AI Platform, codenamed "Kozuchi," to deliver paradigm-shifting applications that contribute to the realisation of a sustainable society."
Comment from Satoshi Matsuoka, Director, RIKEN Center for Computational Science: "The A64FX CPU is equipped with an AI acceleration function known as SVE. However, software development and optimisation are essential to maximise its capabilities and to utilise it for AI applications. We feel that this joint research will play an important role in bringing together experts of LLMs and computer science in Japan, including RIKEN R-CCS researchers and engineers, to advance techniques for building LLMs on the supercomputer "Fugaku". Together with our collaborators, we will contribute to the realisation of Society 5.0."