Meta Takes On OpenAI, and Domestic "Small Models" Go Open Source: Where Is the "Hundred-Model War" Heading?

  Since the beginning of this year, global Internet giants have set off a "hundred-model war", with Microsoft, Google, Baidu and Alibaba entering the fray one after another. After more than half a year of competition, the tech giants now face a new fork in the road around the large-model ecosystem: up against a parameter "ceiling", will the future of large models be closed source or open source?

  An open-source model that can run on a home computer.

  On August 3rd, two open-source models, Qwen-7B and Qwen-7B-Chat, were released on the domestic AI developer community ModelScope. They are, respectively, the 7-billion-parameter general-purpose model and dialogue model of Alibaba Cloud's Tongyi Qianwen. Both are open source, free, and licensed for commercial use.

  According to reports, Qwen-7B is a foundation model that supports Chinese, English and other languages, trained on a data set of more than 2 trillion tokens (text units), while Qwen-7B-Chat is a Chinese-English dialogue model built on that foundation and aligned with human cognition. In short, the former is like a "foundation" and the latter a "house" built on top of it.

  Benchmark tests show that the Qwen-7B model performs well across the board. On the English-language benchmark MMLU, its score is generally higher than that of mainstream models of the same parameter scale, even surpassing some models with 12 billion or 13 billion parameters. On the validation set of the Chinese benchmark C-Eval, it also achieved the highest score among models of its size. Qwen-7B likewise ranks among the best on GSM8K, which measures mathematical problem solving, and HumanEval, which measures coding ability.

  In other words, in tests of Chinese and English writing, solving math problems and writing code, Qwen-7B is a genuine "straight-A student", with scores that even exceed international mainstream models of the same parameter scale.

  Beyond benchmarks, the industry is more concerned with the usability of Qwen-7B. As is well known, training and running mainstream large models requires specialized AI chips (such as the NVIDIA A100), which are expensive, roughly $10,000 to $15,000 apiece, with supply dominated by the United States and Europe and almost impossible to obtain in China. The Qwen-7B model, by contrast, supports deployment on consumer graphics cards, which means a high-performance home computer can run it.
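To see why a 7-billion-parameter model can fit on consumer hardware, a back-of-the-envelope estimate of the memory needed for its weights at different numeric precisions is instructive. This is a rough sketch only: the precision options are common industry assumptions rather than details from the article, and real usage also includes activations, the KV cache, and runtime overhead.

```python
# Back-of-the-envelope VRAM estimate for a ~7B-parameter model's weights.
# Actual memory usage is higher (activations, KV cache, runtime overhead).

PARAMS = 7_000_000_000  # roughly 7 billion weights

def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gibibytes."""
    return params * bytes_per_param / 1024**3

# Common storage precisions: fp32 (4 bytes), fp16 (2), int8 (1), int4 (0.5).
for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

At half precision the weights alone come to about 13 GB, and 8-bit or 4-bit quantization brings that down to roughly 7 GB or 3 GB, which is within reach of a single consumer graphics card.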

  Thanks to its free commercial license and low barrier to entry, the release of Qwen-7B quickly attracted the attention of AI developers. Within a single day, the model was starred by more than a thousand developers on the code-hosting platform GitHub, and most of those raising questions were Chinese developers. As Alibaba Cloud said in its statement: "Compared with the lively AI open-source ecosystem in the English-speaking world, the Chinese community lacks an excellent foundation model. The addition of Tongyi Qianwen is expected to offer the open-source community more choices and promote the construction of China's AI open-source ecosystem."

  Open source or closed source?

  In fact, Qwen-7B is not the first open-source large model. GPT-2, the predecessor of ChatGPT, was also fully open source: its code and framework could be used for free on the Internet, and the related papers were publicly available. After ChatGPT took the world by storm, however, OpenAI switched to closed-source development, and the code of models such as GPT-3 and GPT-4 became OpenAI trade secrets.

  "Open source" means opening the source code. Once a large model is declared open source, anyone can publicly obtain its source code and modify or even redevelop it within the limits of its license. As a simple analogy, the source code is like the manuscript of a painting: anyone can fill in the colors according to the manuscript to create their own artwork.

  Closed source is the opposite. Only the owner of the source code (usually the software developer) has the right to modify it; others cannot obtain the "manuscript" and can only buy the finished product from the developer.

  The pros and cons of open and closed source are clear. After going open source, a large model will undoubtedly attract more developers and spawn richer applications, but governance and commercialization become difficult, and the vendor risks the awkward position of "making wedding dresses for others", doing the work while others reap the benefit. After all, open source is about ecosystem co-prosperity, and at this stage it is hard to pin down how much money it can earn; those very problems are precisely where closed source finds its opportunity.

  Open source or closed source: this is a life-or-death choice for large models, and the international giants have already given their answers.

  Meta, the parent company of Facebook, released the large model Llama 2 last month, open source and free for developers and commercial partners, while OpenAI firmly chose closed-source development for GPT-4, which both preserves OpenAI's lead in the generative AI industry and brings in more revenue. According to the magazine Fast Company, OpenAI's revenue in 2023 is expected to reach 200 million US dollars, including fees for its API data-interface services and chatbot subscriptions.

  Domestic large models have also begun to "go their separate ways". Alibaba Cloud's Tongyi model was opened to enterprises as early as April this year, and open-sourcing Qwen-7B goes a step further. Baidu's ERNIE Bot has also recently announced that it will gradually open its plug-in ecosystem to third-party developers, helping them build their own applications on top of the Wenxin model.

  Huawei, by contrast, has taken an unconventional path. When releasing Pangu Model 3.0, Huawei Cloud publicly stated that the Pangu model's full technology stack was independently developed by Huawei, with no open-source technology involved. Moreover, since the Pangu model aggregates large amounts of industry data (some involving trade secrets), it will not be open-sourced in the future.

  Big parameters, or small and beautiful?

  Beyond that, the open-sourcing of Qwen-7B raises another question: how many parameters does a large model really need?

  There is no denying that the parameter scale of large models keeps expanding. Take OpenAI's GPT series: GPT-1 contained only 117 million parameters, while GPT-3 reached 175 billion, an increase of more than 1,000-fold in a few years, and GPT-4's parameters are reported to exceed the trillion level.
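The growth factor cited above is easy to check from the parameter counts given in the text:

```python
# Parameter counts as reported in the article.
gpt1 = 117_000_000       # GPT-1: 117 million parameters
gpt3 = 175_000_000_000   # GPT-3: 175 billion parameters

growth = gpt3 / gpt1     # roughly 1,500-fold, i.e. "more than 1,000 times"
print(f"GPT-1 -> GPT-3: ~{growth:.0f}x")
```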

  The same is true of domestic large models. Baidu's Wenxin model has 260 billion parameters, Tencent's Hunyuan model has reached 100 billion, Huawei's Pangu model is estimated to be close to GPT-3.5, and Alibaba's Tongyi model has officially announced 10 trillion parameters... According to incomplete statistics, China has at least 79 large models with more than 1 billion parameters.

  However, a larger parameter count does not necessarily mean a more capable model. At the World Artificial Intelligence Conference, Wu Yunsheng, vice president of Tencent Cloud, offered an apt metaphor: "It is like athletes doing strength training: weightlifters need to lift a 200-kilogram barbell, swimmers perhaps 100 kilograms. Different types of athletes do not all need to train with 200 kilograms."

  As is well known, the higher a large model's parameter count, the more resources and money it consumes. Vertical models that go deep into an industry therefore need not blindly pursue "large scale" or "high parameters"; instead, parameter counts should be set according to customer needs. For example, the BioGPT-Large model has only 1.5 billion parameters, yet its accuracy on biomedical professional tests beats general-purpose models with 100 billion parameters.

  Sam Altman, co-founder of OpenAI, has also publicly stated that OpenAI is approaching the limits of LLM (large language model) scale: bigger is not necessarily better, and parameter scale is no longer an important measure of model quality.

  Wu Di, head of intelligent algorithms at Volcano Engine, holds a similar view: in the long run, cost reduction will be a key factor in deploying large models. "A well-tuned small or medium-sized model may perform as well as a general-purpose large model on a specific task, at perhaps one tenth of the cost."

  At present, almost every domestic technology vendor has secured a ticket to the large-model race, but the real choice of road has only just begun.