GPT-4 stuns everyone as Chinese entrepreneurs turn their fight to “small models”

Author: Zhou Xinyu

Image source: Generated by Unbounded Layout AI

Just over three months after the release of ChatGPT, OpenAI itself has added fuel to the large-model craze.

In the early morning of March 15, Beijing time, OpenAI announced the multimodal large model GPT-4 on its official website. Besides improving the input modes and the text length the model can handle, OpenAI upgraded ChatGPT on top of GPT-4 and opened the API at the same stroke. The speed of iteration is staggering.

In this large-model frenzy, people have grown numb to numbers. The first is parameter count: OpenAI's GPT-3, with 175 billion parameters, pushed large models into the hundreds of billions, but the multimodal large model PaLM-E, launched by Google on March 6, soon claimed the title of “the largest visual language model in history” with 562 billion parameters.

The second is soaring company valuations. According to a report by Dealroom, a global data provider covering early-stage projects, the combined valuation of generative AI companies worldwide has reached about 48 billion US dollars, having grown 6 times in 2 years.

China's AI track heated up later, but corporate valuations there are climbing even faster. Wang Huiwen's AI company “Light Years Away” has an angel-round valuation of 200 million US dollars. A large-model company recently founded by a certain technology giant has yet to show a model demo, yet its angel-round valuation has already put it in the billion-dollar club. During the Metaverse boom, by contrast, a million dollars already seemed to be the valuation ceiling for angel rounds of domestic start-ups.

Some conflicted and pessimistic voices have also emerged amid the boom.

On the evening of March 2, a post titled “Why do you think AI in Europe and the United States is stronger than ours” stirred up considerable controversy. The poster compared the AI development environments in China and the United States, describing AI development in Europe and the United States as an “elite education” built on long, painstaking effort and AI development in China as a “utilitarian education” focused on commercialization. The post ended on a slightly despairing note: human fate is sealed in the womb, and robots are not immune.

A post titled “Why do you think AI in Europe and the United States is stronger than ours?” Source: Weibo @陈怡RAN-Duke University, who reposted it

For now, the brute-force aesthetics of large models may not be the best bet for most companies to go all in on. Computing power, high-quality data, and a dense pool of algorithm talent are all expensive tickets needed to sit at the large-model table. Most domestic players cannot build reserves like OpenAI's overnight.

However, rich data dimensions and broad application scenarios are the rich mines that the last Internet wave, which lasted more than 10 years, left to Chinese entrepreneurs. In the past month, many small companies that own scenarios and user data have trained small models suited to their own businesses on top of large-model bases from China and abroad. One company with a large model of tens of billions of parameters in reserve has also “slimmed down”, launching lightweight models for fields such as finance and advertising to build up a new round of data reserves.

For now, using small models to sharpen their algorithms and build technical reserves for large-model development may be how Chinese entrepreneurs overtake on the bend in the future.

“Generalist” large model vs “specialist” small model

How to make AI smarter and more human-like is essentially an educational issue.

For a long time, people were keen to send AI to “technical colleges” to learn how to solve specific problems, which produced small models whose parameter counts were often below one million. For example, DeepMind, an AI company under Google, had AlphaGo learn from millions of moves played by human professionals, and in 2016 it defeated the famous Go player Lee Sedol 4:1.

However, the disadvantages of specialist education are also obvious. Most small models are lopsided: a small model that is good at image generation, for example, hits a wall when asked to write marketing copy. The educational resources of specialists are also fragmented, since each small model has to be trained from scratch.

Like most parents, humans hope to raise all-round talents. In 2017, Google invented a new way of education: the Transformer model.

In the earlier “technical college” education, AI learning relied heavily on humans labeling and selecting the study materials. AlphaGo's study materials, for example, came from professional players, not from children attending Go hobby classes. The essence of the Transformer training method is to let AI learn where to “focus” in study materials from different subjects through a large amount of advance study, that is, pre-training.

The more data used for training, the better this advance study works; the more parameters, the more precisely the model can focus. This self-directed way of focusing frees up human hands and lets AI study across subjects, accumulating cross-domain knowledge.
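The “focus” described above corresponds to what Transformer models call attention. As a rough illustration only, the sketch below implements scaled dot-product attention in plain Python on toy vectors. The function names and sample numbers are invented for this example and are not taken from any model discussed in the article.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights): one output vector and one list of
    attention weights per query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy data (hypothetical): three "tokens" with 2-dimensional vectors.
toy_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
toy_queries = [[1.0, 0.0]]

toy_outputs, toy_weights = attention(toy_queries, toy_keys, toy_values)
```

In this toy case the query points in the same direction as the first key, so the first and third tokens receive more weight than the second. That is the sense in which the model “focuses” on the more relevant material.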

In 2018, Google released BERT, the first Transformer-based foundation model with more than 100 million parameters. In translation, it performed far better than models built on earlier neural network architectures such as CNNs and RNNs.

Since then, the Transformer has swept the world of model education, and many companies have raced to make their large models ever “bigger”. At present, the industry regards 10 billion parameters as the inflection point at which model capabilities leap.

The most obvious advantage of large models is that they can reason and infer in ways small models can hardly match, and can understand more complex and broader scenarios.

Beyond content production, where else can large models be used? Li Tao, the founder of APUS, a mobile Internet service provider, gave an example: 80% of the traffic congestion in first-tier cities is caused not by too many vehicles but by poorly coordinated intelligent transportation systems. How many seconds should the lights at each intersection be set to? How should the lights on different road sections be coordinated? These problems are hard to solve with people or small models alone.

The emergence of large models has made huge amounts of traffic data useful. “People can only make decisions based on the traffic conditions of one road section at most, while large models can see more comprehensively.”

The greater potential of large models lies in the ability to reduce the cost of training small models. The big model is like a child who has gone through compulsory education. On this basis, it is a relatively low-cost and natural thing to go to university to choose a major and then become a higher-level professional.

This also means that, with a large model as a base, a lightweight model for a specific application scenario can be trained from it, skipping the process of building basic understanding from scratch. Of course, the risk of this approach is that the ability of the large model directly affects the quality of the models trained from it.

How artificial intelligence is deployed and applied: the AI 2.0 era of large models/foundation models vs. the previous AI 1.0 era. Source: Innovation Works

Generative AI, represented by ChatGPT, is the first batch of outstanding graduates to leave the ivory tower for broad application in the era of large models. GPT-3.5 is the large-model base behind ChatGPT, with excellent language generation ability. It keeps a low profile but has a huge effect, and its educational resources have now been upgraded to GPT-4.

However, the arrival of the large-model era does not mean that high-quality small and medium-sized models will be eliminated. In specific applications, enterprises have to consider cost-effectiveness, which makes it especially important to “slim down” expensive large models. “Specific application scenarios will still be dominated by small and medium-sized models in the future,” Li Tao concluded.

Where is the difficulty in developing a large model?

In the past month, many chat applications called “ChatGPT-like” have flooded into the market.

Judging by everyday conversation alone, the products do not seem to differ much. Bluffing or flattering the questioner and stale information are still common problems. But compared with intelligent customer service bots confined to specific scenarios and answer templates, the new dialogue robots have at least begun to make people want to “keep chatting”.

But when we dig into details such as model parameters and tokens, the picture becomes less optimistic. Very few start-ups have self-developed models at the scale of tens of billions of parameters, and many companies that claim a considerable parameter scale are taking shortcuts to some degree.

To test the ability of large models, a strategic analyst at an Internet company showed 36Kr the 300 to 400 sets of prompts he had designed for creative writing, news retrieval, logical reasoning, and other tasks. He spent two to three months testing, one by one, more than a dozen “ChatGPT-like” applications whose parameter counts exceed 1 billion.

After the tests, he found that the answer patterns of most products were too similar to ChatGPT's: “It's hard not to doubt how much of the ‘self-developed’ model is padding.”

Why is there still no ChatGPT in China? Most practitioners find the answer obvious but frustrating: building a large model requires not only grinding away with a great deal of money and time, but also a social environment willing to invest regardless of cost.

Computing power, algorithms, data, and scenarios are the four key elements needed to get a large model running. The first two are the predictable difficulties that float above the waterline, especially for small companies.

The article “ChatGPT China Metamorphosis” has already raised these hard questions: running a model with more than 10 billion parameters takes at least 1,000 GPU cards for one month of training, and the talent that largely determines algorithm capability is mostly concentrated in Silicon Valley or at large, well-funded companies.
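The scale of that requirement is easier to grasp when converted into GPU-hours. The sketch below does only that arithmetic. The 30-day month and the 2 US dollars per GPU-hour rental rate are assumptions chosen for illustration and do not come from the article or from any vendor's price list.

```python
HOURS_PER_DAY = 24


def gpu_hours(num_gpus, days):
    """Total GPU-hours for num_gpus cards running around the clock."""
    return num_gpus * days * HOURS_PER_DAY


def training_cost_usd(num_gpus, days, usd_per_gpu_hour):
    """Rental cost of a training run at a flat hourly rate per GPU."""
    return gpu_hours(num_gpus, days) * usd_per_gpu_hour


# Figures from the article: at least 1,000 GPU cards for one month.
# Assumption: one month is taken as 30 days.
budget_hours = gpu_hours(num_gpus=1000, days=30)

# Assumption: a hypothetical flat rate of 2 USD per GPU-hour.
budget_usd = training_cost_usd(num_gpus=1000, days=30, usd_per_gpu_hour=2.0)
```

Under these assumptions a single training run consumes 720,000 GPU-hours, before counting failed runs, experiments, or the engineers' salaries.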

The difficulty hidden below the waterline is an industry value system that has long been constrained by commercial returns.

“Since the reform and opening up, China's economy has sustained rapid growth for more than 30 years and quickly joined the world's front ranks. This has a lot to do with the Internet driving the rapid commercialization of more industries,” a practitioner who has worked on AI teams at Internet companies in China and abroad for nearly 20 years told 36Kr. However, past experience has also become a shackle of inertia. “Facing the new opportunities brought by ChatGPT, we inevitably still evaluate them from the old perspective of business returns.”

Many investors also find that money no longer comes easily. With Chinese concept stocks under pressure and US listings harder to achieve, many technology companies have become cautious about US dollar funds. And now that government-guided funds make up a larger share of RMB LPs, funds face greater challenges in raising RMB.

Dual-currency funds, caught in the middle, find themselves welcome on neither side. “Except for a few top funds that are not short of money, most investment institutions are waiting and watching,” said an investor at a dual-currency fund.

Even if a large model is successfully trained, no one dares to promise that the return on capital will arrive after the “5+2” investment cycle.

On March 2, OpenAI released the ChatGPT API at a rock-bottom “cabbage price” of $0.002 per 1,000 tokens (roughly 18 RMB per 1 million words), dropping a bomb of uncertainty on the industry. Only half a month later, GPT-4 landed on the track like a terminator. This makes many domestic companies feel: “The transaction volume is not enough.”
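The conversion from the per-token price to the per-word figure can be checked with a few lines of arithmetic. In the sketch below, only the price of $0.002 per 1,000 tokens comes from the article. The ratio of about 0.75 words per token and the exchange rate of 6.9 RMB per US dollar are assumptions made for this example; real ratios vary with language and tokenizer.

```python
USD_PER_1K_TOKENS = 0.002  # price quoted in the article
WORDS_PER_TOKEN = 0.75     # assumption: rough average, varies by language
RMB_PER_USD = 6.9          # assumption: approximate exchange rate


def api_cost_usd(num_tokens, usd_per_1k=USD_PER_1K_TOKENS):
    """Cost in US dollars of processing num_tokens tokens."""
    return num_tokens / 1000 * usd_per_1k


def words_to_tokens(num_words, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many tokens a given number of words occupies."""
    return num_words / words_per_token


def cost_rmb_for_words(num_words, rmb_per_usd=RMB_PER_USD):
    """Estimated cost in RMB of processing num_words words."""
    return api_cost_usd(words_to_tokens(num_words)) * rmb_per_usd


million_tokens_usd = api_cost_usd(1_000_000)
million_words_rmb = cost_rmb_for_words(1_000_000)
```

With these assumptions, one million tokens cost about 2 US dollars and one million words come to a little over 18 RMB, which is consistent with the figure in the article.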

The first to be hit are companies at the model layer. Before their models have been polished enough to compete with ChatGPT, they have already lost their pricing power.

Upheaval in content-related businesses such as search, design, and copywriting is also inevitable. An employee in an Internet search business described the helplessness of responding to the new wave of technological change: “Take advertisements, which are directly tied to revenue. Once generative AI is connected, users may be able to choose not to see ads, and the cost of ads and search has also doubled after connecting to large models.”

The path to commercial monetization seems as simple as prefixing “AI+” to existing applications, but in practice it is far from clear.

“Hazy beauty” is how many investors describe the investment targets on the AI track over the past two months. “In the technology industry, many new technologies start out as thematic investments, where you are investing in a kind of imagination economy,” an investor who has lived through many hype cycles such as the Metaverse and Web3 told 36Kr. “We tend to believe that today's ‘AI+’ can become reality, but precisely because of this, a company's vision and business model will carry more weight when it seeks financing.”

When I met an investor at a dual-currency fund a month ago, she was turning down a company that had pledged to “train a large model within one year”. When I saw her again recently, she was using the same two questions to dissuade many companies chasing the trend:

“Where is the necessity for you to make a large model?”

“Is there any clear business model?”

Scenarios and data, opportunities for domestic small models

Fortunately, China has no shortage of deployment scenarios for AI models or of rich user data. This allows domestic companies to cultivate the “watermelon” of large models while also harvesting the “sesame seeds” sown by lightweight models.

Back to the essence of model training: quantitative change leads to qualitative change. Brute force works miracles only on top of massive data, and China's more than 1 billion Internet users provide enough fuel for large-model research and development. The wave of digitization that has swept through the past decade allows AI to land quickly in enough mature industries while injecting new blood into fledgling ones.

Many funds that once vowed to go “all in on large models” chose to cool down after nearly three months of enthusiasm. An investor at a dual-currency fund told 36Kr that the team has adjusted its investment strategy: “Instead of investing in a model-layer company, it is better to discuss with our existing portfolio companies how to plug in models to optimize their business.”

However, in specific application scenarios, what ultimately does the work is often not a large model but a lightweight small or medium model. Large models cover a wide range, but their reasoning in a specific scenario is often inferior to that of “expert” small and medium models. From the more practical standpoint of cost, small and medium models can also cut the computing cost of running a large model to 1/10 or even 1/100.

Li Tao believes that what domestic enterprises can pursue at this stage is a pragmatic “borrow and adapt” approach: build on overseas open-source large models and polish small and medium models to a top level:

“Domestic companies can now follow this path: use overseas large models to validate deployment scenarios, then train small and medium models on our rich data resources, and finally deploy them in specific scenarios. Of the four elements of large models, computing power is a long-distance race, but the other three can be held in our own hands.”

This also means that domestic model-layer companies with scenarios and data can still seize many opportunities under the competitive pressure from OpenAI. Once small and medium-sized models are deployed, the data accumulated across industries can become the “flywheel” for self-developed large models.

Now that OpenAI has blazed a clear path, more people are willing to rush into the “no man's land” without counting the cost too closely.

For example, building on the idea of “using AI to operate AI”, some overseas companies that use large models to build “next-generation RPA (robotic process automation) platforms” have won favor with investors.

The most typical case is Adept, an American AI start-up born with a “golden spoon” thanks to its roots in Google's core AI R&D team, which quickly closed a $65 million Series A round in April last year. Companies working in a similar direction include Replicate, backed by a16z, and Germany's Deepset.

The breakthrough in the “RPA+AI” direction is that the large model is deployed as a middle platform for calling and controlling intelligent tools, allowing enterprises to invoke the right digital tools intelligently with less coding. A domestic entrepreneur working in a related direction predicted: “In the next ten years, the RPA industry may no longer exist on its own, and digital tools will be connected directly to individuals without code.”

During the period from 2019 to 2021, overseas capital flows to generative AI businesses increased by about 130%, and the growth was mainly driven by areas such as machine learning operations (MLOps), text writing, and data. Source: Base10

Some middle-layer businesses serving model training, management, and operations have also begun to take shape. For example, some companies have developed methods that make model training cheaper and more efficient, allowing people to partially reproduce ChatGPT with a single consumer-grade GPU.

Whether they stay conservative and calm or embrace uncertainty, the first thing investors have to face is the rising tide of corporate valuations. How much is real capability, and how much is froth in the bubble? It will take time before the AI dream stirred up by ChatGPT truly lands, and before the track sorts the fake from the real.

Further reading:

“ChatGPT China Metamorphosis | Deep Krypton”

Source of information: compiled from 8BTC by 0x Information. Copyright belongs to the author and may not be reproduced without permission.
