Stanford Research Center Studies Impacts of Popular Pretrained Models

Stanford University recently announced a new research center, the Center for Research on Foundation Models (CRFM), devoted to studying the effects of large pretrained deep networks (e.g. BERT, GPT-3, CLIP) that a growing number of machine-learning research institutions and startups now use.

As a multi-disciplinary research center, it includes 32 faculty members from computer science, law, psychology, and political science departments. The main goal of CRFM is to initiate studies of such foundation models and to develop new strategies for the future of responsible machine learning.

Along with the announcement, the CRFM team also published an in-depth report describing the pros and cons of using foundation models as backbone deep networks for large-scale applications such as image and natural language understanding. These downstream applications are created by fine-tuning the base network’s weights. Foundation models are trained by self-supervision at a massive scale, mostly on open data from different sources, and can be deployed as few-shot learners.
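As a rough illustration of what fine-tuning means here, the sketch below starts from a set of "pretrained" weights and takes gradient steps on a small downstream dataset. It is a toy linear model in plain Python, not code from the CRFM report: the weight values, the synthetic task, and the helper names (`predict`, `mse`, `fine_tune`) are all assumptions made for the example.

```python
import random

def predict(weights, x):
    """Linear model: dot product of weights and features."""
    return sum(w * xi for w, xi in zip(weights, x))

def mse(weights, data):
    """Mean squared error over (features, target) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(pretrained, data, lr=0.05, steps=200):
    """Start from pretrained weights and take gradient steps on downstream data."""
    weights = list(pretrained)  # copy so the base weights are left untouched
    n = len(data)
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for x, y in data:
            err = predict(weights, x) - y
            for i, xi in enumerate(x):
                grads[i] += 2 * err * xi / n
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

# Hypothetical "pretrained" weights standing in for a foundation model.
pretrained = [1.0, -0.5, 0.25]

# Hypothetical downstream task whose ideal weights differ slightly from the base.
random.seed(0)
target = [1.2, -0.4, 0.0]
data = []
for _ in range(20):
    x = [random.uniform(-1, 1) for _ in range(3)]
    data.append((x, predict(target, x)))

tuned = fine_tune(pretrained, data)
loss_before = mse(pretrained, data)
loss_after = mse(tuned, data)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.6f}")
```

Every downstream model produced this way inherits its starting point from the same base weights, which is the mechanism behind the homogeneity the report discusses.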

The paper states that this situation creates homogeneity, as applications employ the same base models. Although relying on the same high-capacity networks simplifies fine-tuning, the homogeneity risks propagating problems such as ethical and social inequalities to all downstream tasks. The paper emphasizes that fairness studies of such models deserve a special multi-disciplinary effort.

Another issue the report covers is the loss of accessibility. In the last decade, the deep-learning research community has favored open source, as it leads to improved reproducibility and fast-paced development while propagating novel ideas. Open-source deep-network development frameworks such as Caffe, TensorFlow, PyTorch, and MXNet have had a major impact in popularizing and democratizing deep learning. However, as deep-network size goes well beyond a billion parameters, industry-led research code repositories and datasets are kept private (e.g. GPT-2) or commercialized through API endpoints (e.g. GPT-3). CRFM researchers underline the dangers of this barrier and point to government funding as a possible resolution.

As applications of deep networks increase, research on the theory and understanding of deep learning is gaining attention. Direct usage of deep networks without proper analysis has previously triggered discussions in machine learning conferences. Deep neural networks consist of cascaded nonlinear functions that limit their interpretability. The main problem is the mathematical difficulty of analyzing such cascaded functions; hence most research has focused on the analysis of simpler fully connected models. CRFM aims to go beyond simplified models and proposes practical ideas for the commonly used pretrained networks.
