
Revealing DeepSeek: A more extreme story of Chinese technological idealism

 文 | 于丽丽  Text | Yu Lili

编辑 | 刘旌  Editor | Liu Jing

中国的7家大模型创业公司中,DeepSeek(深度求索)最不声不响,但它又总能以出其不意的方式被人记住。
Of China's seven large-model startups, DeepSeek is the quietest, yet it always manages to be remembered in unexpected ways.

一年前,这种出其不意源自它背后的量化私募巨头幻方,是大厂外唯一一家储备万张A100芯片的公司,一年后,则来自它才是引发中国大模型价格战的源头。
A year ago, the surprise came from its backer, the quantitative fund giant High-Flyer, the only company outside the big tech firms to have stockpiled 10,000 A100 chips. A year later, it came from the fact that DeepSeek was the one that set off China's large-model price war.

在被AI连续轰炸的5月,DeepSeek一跃成名。起因是他们发布的一款名为DeepSeek V2的开源模型,提供了一种史无前例的性价比:推理成本被降到每百万token仅 1块钱,约等于Llama3 70B的七分之一,GPT-4 Turbo的七十分之一。
In a May bombarded by AI news, DeepSeek shot to fame. The trigger was an open-source model it released, DeepSeek V2, which offered unprecedented cost-effectiveness: inference cost fell to just 1 yuan per million tokens, roughly one-seventh that of Llama3 70B and one-seventieth that of GPT-4 Turbo.
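The ratios above can be turned into rough numbers. The following Python sketch back-calculates the per-million-token prices implied by the "one-seventh" and "one-seventieth" comparisons. It then prices a hypothetical 500-million-token workload.

Only the 1-yuan price and the two ratios come from the article. The derived competitor prices are approximations, not quoted list prices, and the workload size is invented for illustration.

```python
# Back-calculate implied prices from the ratios quoted in the article.
# DeepSeek V2: 1 yuan per million tokens (from the text).
# Competitor prices are DERIVED from the "1/7" and "1/70" ratios,
# not taken from any official price list.
DEEPSEEK_V2_YUAN_PER_M = 1.0
RATIOS = {
    "DeepSeek V2": 1,
    "Llama3 70B": 7,     # DeepSeek V2 is "about one-seventh" of this
    "GPT-4 Turbo": 70,   # DeepSeek V2 is "about one-seventieth" of this
}


def implied_price_per_million(model: str) -> float:
    """Implied cost in yuan per million tokens, derived from the quoted ratios."""
    return DEEPSEEK_V2_YUAN_PER_M * RATIOS[model]


def workload_cost(model: str, tokens: int) -> float:
    """Cost in yuan for a workload of `tokens` tokens."""
    return implied_price_per_million(model) * tokens / 1_000_000


if __name__ == "__main__":
    tokens = 500_000_000  # hypothetical workload: 500 million tokens
    for name in RATIOS:
        print(f"{name}: {workload_cost(name, tokens):,.0f} yuan")
```

Under these assumptions the sketch prints 500, 3,500 and 35,000 yuan for the three models. The point is the order-of-magnitude spread, not the exact figures.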

DeepSeek被迅速冠以“AI界拼多多”之称的同时,字节、腾讯、百度、阿里等大厂也按耐不住,纷纷降价。中国大模型价格战由此一触即发。
As DeepSeek was quickly dubbed "the Pinduoduo of AI", big companies such as ByteDance, Tencent, Baidu and Alibaba could not hold back and cut their prices one after another. China's large-model price war was on.

弥漫的硝烟其实掩盖了一个事实:与很多大厂烧钱补贴不同,DeepSeek是有利润的。
The haze of battle obscured one fact: unlike many big companies burning cash on subsidies, DeepSeek is profitable.

这背后,是DeepSeek对模型架构进行了全方位创新。它提出的一种崭新的MLA(一种新的多头潜在注意力机制)架构,把显存占用降到了过去最常用的MHA架构的5%-13%,同时,它独创的DeepSeekMoESparse结构,也把计算量降到极致,所有这些最终促成了成本的下降。
Behind this is DeepSeek's across-the-board innovation in model architecture. It proposed a new MLA (Multi-head Latent Attention) architecture that cuts GPU memory usage to 5%-13% of that of MHA, previously the most common architecture. Its original DeepSeekMoESparse structure also pushes computation down to a minimum. Together, these drove the cost down.
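The two ideas in this paragraph can be sketched in a few lines of Python.

- **Part 1: KV-cache size.** It counts cached elements per token per layer. Standard MHA stores a key and a value for every head. An MLA-style cache, as described in DeepSeek's V2 paper, stores one compressed latent vector plus a small decoupled positional key, and re-derives per-head keys and values from the latent.
- **Part 2: sparse routing.** It is a generic top-k mixture-of-experts router, in which only the selected experts are evaluated.

All dimensions, gate scores and toy experts below are hypothetical. The resulting ratio is not DeepSeek's 5%-13% figure. DeepSeekMoE's published design also adds refinements, such as shared experts, that this sketch omits.

```python
import math

# --- Part 1: KV-cache size, MHA vs. a latent (MLA-style) cache ---
def mha_cache_elems(n_heads: int, d_head: int) -> int:
    """Cached elements per token per layer for standard MHA: one key + one value per head."""
    return 2 * n_heads * d_head


def latent_cache_elems(d_latent: int, d_rope: int) -> int:
    """Cached elements per token per layer for a latent cache: one compressed
    vector plus a small decoupled positional key (keys/values are rebuilt
    from the latent by up-projections at attention time)."""
    return d_latent + d_rope


# --- Part 2: generic top-k expert routing (sparse MoE) ---
def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_logits, experts, k=2):
    """Run only the top-k experts and mix their outputs.

    gate_logits: one router score per expert (hypothetical here; in a real
    model they come from a learned gating layer applied to x).
    This sketch renormalizes the selected weights; implementations differ.
    Returns (output vector, indices of the experts actually run).
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:  # unselected experts are never evaluated: the compute saving
        y = experts[i](x)
        w = probs[i] / weight_sum
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    mha = mha_cache_elems(n_heads=32, d_head=128)      # hypothetical dims
    mla = latent_cache_elems(d_latent=512, d_rope=64)  # hypothetical dims
    print(f"latent cache is {mla / mha:.1%} of MHA")
    # Four toy experts that just scale their input; only two will run.
    experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
    out, used = moe_forward([1.0, 1.0], [0.1, 2.0, 0.3, 1.5], experts, k=2)
    print(used)
```

With the hypothetical dimensions shown, the latent cache holds 576 elements against 8,192 for MHA (about 7%). The router runs only experts 1 and 3 out of four. In both cases the saving comes from not storing, or not computing, something.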

在硅谷,DeepSeek被称作“来自东方的神秘力量”。SemiAnalysis首席分析师认为,DeepSeek V2论文“可能是今年最好的一篇”。OpenAI前员工Andrew Carr认为论文“充满惊人智慧”,并将其训练设置应用于自己的模型。而OpenAI前政策主管、Anthropic联合创始人Jack Clark认为,DeepSeek“雇佣了一批高深莫测的奇才”,还认为中国制造的大模型,“将和无人机、电动汽车一样,成为不容忽视的力量。”
In Silicon Valley, DeepSeek has been called "the mysterious force from the East". SemiAnalysis's chief analyst believes the DeepSeek V2 paper "may be the best one this year". Former OpenAI employee Andrew Carr found the paper "full of amazing wisdom" and applied its training setup to his own model. Jack Clark, former policy director at OpenAI and co-founder of Anthropic, believes DeepSeek "hired a group of inscrutable wizards". He also believes Chinese-made large models "will, like drones and electric vehicles, become a force that cannot be ignored."

在基本由硅谷牵动故事进展的AI浪潮里,这是罕有的情形。多位行业人士告诉我们,这种强烈的反响源自架构层面的创新,是国产大模型公司乃至全球开源基座大模型都很罕见的尝试。一位AI研究者表示,Attention架构提出多年来,几乎未被成功改过,更遑论大规模验证。“这甚至是一个做决策时就会被掐断的念头,因为大部分人都缺乏信心。”
This is rare in an AI wave whose storyline has been driven almost entirely by Silicon Valley. Several industry insiders told us that the strong reaction stems from innovation at the architecture level. Such an attempt is rare among Chinese large-model companies, and even among open-source foundation models worldwide. One AI researcher said that in the years since the Attention architecture was proposed, it has almost never been successfully modified, let alone validated at scale. "It's an idea that would get killed at the decision stage, because most people lack the confidence."

而另一方面,国产大模型之前很少涉足架构层面的创新,也是因为很少有人主动去击破那样一种成见:美国更擅长从0-1的技术创新,而中国更擅长从1-10的应用创新。何况这种行为非常不划算——新一代模型,过几个月自然有人做出来,中国公司只要跟随、做好应用即可。对模型结构进行创新,意味着没有路径可依,要经历很多失败,时间、经济成本都耗费巨大。
On the other hand, Chinese large-model companies have rarely ventured into architectural innovation partly because few have been willing to challenge a prevailing assumption. That assumption holds that the US is better at 0-to-1 technological innovation, while China is better at 1-to-10 application innovation. Besides, such innovation is a poor bargain: someone will naturally build the next-generation model within a few months, and Chinese companies need only follow and build good applications. Innovating on model structure means there is no path to follow. It means going through many failures, at enormous cost in time and money.

DeepSeek显然是逆行者。在一片认为大模型技术必然趋同,follow是更聪明捷径的喧哗声中,DeepSeek看重“弯路”中积累的价值,并认为中国的大模型创业者除应用创新外,也可以加入到全球技术创新的洪流中。
DeepSeek is clearly swimming against the current. Amid a chorus holding that large-model technology will inevitably converge and that following is the smarter shortcut, DeepSeek values what is accumulated on the "detours". It believes that Chinese large-model entrepreneurs can join the torrent of global technological innovation, not just pursue application innovation.

DeepSeek的很多抉择都与众不同。截至目前,7家中国大模型创业公司中,它是唯一一家放弃“既要又要”路线,至今专注在研究和技术,未做toC应用的公司,也是唯一一家未全面考虑商业化,坚定选择开源路线甚至都没融过资的公司。这些使得它经常被遗忘在牌桌之外,但在另一端,它又经常在社区被用户“自来水”式传播。
Many of DeepSeek's choices set it apart. Among China's seven large-model startups, it is so far the only one that has abandoned the "have it both ways" approach. It remains focused on research and technology and has built no consumer (toC) applications. It is also the only one that has not fully committed to commercialization, has firmly chosen the open-source route, and has never even raised funding. This often leaves it out of the game at the table. At the other end, however, users in the community frequently spread the word about it organically.

DeepSeek究竟是如何炼成的?我们为此访谈了甚少露面的DeepSeek创始人梁文锋。
How was DeepSeek built? To find out, we interviewed its rarely seen founder, Liang Wenfeng.

这位从幻方时代,就在幕后潜心研究技术的80后创始人,在DeepSeek时代,依旧延续着他的低调作风,和所有研究员一样,每天“看论文,写代码,参与小组讨论”。
Born in the 1980s, Liang has quietly researched technology behind the scenes since the High-Flyer days, and he has kept his low profile in the DeepSeek era. Like every researcher there, he spends his days "reading papers, writing code, and joining group discussions."

和很多量化基金创始人都有过海外对冲基金履历,多出身物理、数学等专业不同的是,梁文锋一直是本土背景,早年就读的也是浙江大学电子工程系人工智能方向。
Many quant fund founders have worked at overseas hedge funds and mostly trained in physics or mathematics. Liang Wenfeng, by contrast, has a purely domestic background: he studied artificial intelligence in the Department of Electronic Engineering at Zhejiang University.

多位行业人士和DeepSeek研究员告诉我们,梁文锋是当下中国AI界非常罕见的“兼具强大的infra工程能力和模型研究能力,又能调动资源”、“既可以从高处做精准判断,又可以在细节上强过一线研究员”的人,他拥有“令人恐怖的学习能力”,同时又“完全不像一个老板,而更像一个极客”。
Several industry insiders and DeepSeek researchers told us that Liang Wenfeng is extremely rare in China's AI field today. He "combines strong infra engineering skills with model research ability, and can also mobilize resources", and he "can make precise judgments from a high level while outdoing front-line researchers on the details." He has "a terrifying ability to learn", yet he "doesn't act like a boss at all; he's more like a geek."

这是一次尤为难得的访谈。访谈里,这位技术理想主义者,提供了目前中国科技界特别稀缺的一种声音:他是少有的把“是非观”置于“利害观”之前,并提醒我们看到时代惯性,把“原创式创新”提上日程的人。
This was an especially rare interview. In it, this technical idealist offers a voice that is particularly scarce in China's tech world today. He is one of the few who put "right and wrong" before "gain and loss", who remind us to see the inertia of the times, and who put "original innovation" on the agenda.

一年前,DeepSeek刚下场时,我们初次访谈了梁文锋 :《疯狂的幻方:一家隐形AI巨头的大模型之路》 。如果说当时那句「务必要疯狂地怀抱雄心,且还要疯狂地真诚」还是一句美丽的口号,一年过去,它已经在成为一种行动。
A year ago, when DeepSeek had just entered the field, we interviewed Liang Wenfeng for the first time: "The Madness of High-Flyer: The Large-Model Road of an Invisible AI Giant". Back then, the line "be insanely ambitious, and be insanely sincere" may have been only a beautiful slogan. A year later, it is becoming action.

以下为对话部分:  The following is the conversation:

价格战第一枪是怎么打响的?  How was the first shot of the price war fired?

「暗涌」:DeepSeek V2模型发布后,迅速引发一场血雨腥风的大模型价格战,有人说你们是行业的一条鲶鱼。
"Undercurrent": After the DeepSeek V2 model was released, it quickly set off a brutal large-model price war. Some people say you are the catfish of the industry.

梁文锋:我们不是有意成为一条鲶鱼,只是不小心成了一条鲶鱼。
Liang Wenfeng: We didn't set out to be a catfish; we just became one by accident.

「暗涌」:这个结果让你们意外吗?  "Undercurrent": Did this outcome surprise you?

梁文锋:非常意外。没想到价格让大家这么敏感。我们只是按照自己的步调来做事,然后核算成本定价。我们的原则是不贴钱,也不赚取暴利。这个价格也是在成本之上稍微有点利润。
Liang Wenfeng: Very much so. We didn't expect everyone to be so sensitive to price. We just did things at our own pace and then set prices based on our costs. Our principle is neither to subsidize nor to make exorbitant profits. This price leaves a small margin above cost.

「暗涌」:5天后智谱AI就跟进了,之后是字节、阿里、百度、腾讯等大厂。
"Undercurrent": Five days later Zhipu AI followed, then big companies such as ByteDance, Alibaba, Baidu and Tencent.

梁文锋:智谱AI降的是一个入门级产品,和我们同级别的模型仍然收费很贵。字节是真正第一个跟进的。旗舰模型降到和我们一样的价格,然后触发了其它大厂纷纷降价。因为大厂的模型成本比我们高很多,所以我们没想到会有人亏钱做这件事,最后就变成了互联网时代的烧钱补贴的逻辑。
Liang Wenfeng: What Zhipu AI cut was an entry-level product; its models at our level are still expensive. ByteDance was really the first to follow. It cut its flagship model to our price, which triggered price cuts at the other big companies. The big companies' model costs are much higher than ours, so we didn't expect anyone to do this at a loss. In the end it turned into the internet-era logic of burning cash on subsidies.

「暗涌」:外部看来,降价很像在抢用户,互联网时代的价格战通常如此。
"Undercurrent": From the outside, the price cut looks a lot like a grab for users, which is usually what price wars were about in the internet era.

梁文锋:抢用户并不是我们的主要目的。我们降价一方面是因为我们在探索下一代模型的结构中,成本先降下来了,另一方面也觉得无论API,还是AI,都应该是普惠的、人人可以用得起的东西。
Liang Wenfeng: Grabbing users is not our main goal. We cut prices partly because our costs came down first while we were exploring the structure of our next-generation model. We also feel that both APIs and AI should be affordable and accessible to everyone.

「暗涌」:在这之前,大部分中国公司都会直接copy这一代的 Llama结构去做应用,为什么你们会从模型结构切入?
"Undercurrent": Before this, most Chinese companies simply copied the current generation's Llama structure to build applications. Why did you start from the model structure?

梁文锋:如果目标是做应用,那沿用 Llama结构,短平快上产品也是合理选择。但我们目的地是AGI,这意味着我们需要研究新的模型结构,在有限资源下,实现更强的模型能力。这是scale up到更大模型所需要做的基础研究之一。除了模型结构,我们还做了大量其他的研究,包括怎么构造数据,如何让模型更像人类等,这都体现在我们发布的模型里。另外,Llama的结构,在训练效率和推理成本上,和国外先进水平估计也已有两代差距。
Liang Wenfeng: If the goal is to build applications, then reusing the Llama structure and shipping products quickly is a reasonable choice. But our destination is AGI, which means we need to research new model structures that achieve stronger model capability with limited resources. This is one piece of the basic research needed to scale up to larger models.

Besides model structure, we have done a lot of other research, including how to construct data and how to make models more human-like, and all of it is reflected in the models we have released. Also, we estimate that in training efficiency and inference cost, the Llama structure is already two generations behind the most advanced work abroad.

「暗涌」:这种代差主要来自哪里?  "Undercurrent": Where does this generational gap mainly come from?

梁文锋:首先训练效率有差距。我们估计,国内最好的水平和国外最好的相比,模型结构和训练动力学上可能有一倍的差距,光这一点我们要消耗两倍的算力才能达到同样效果。另外数据效率上可能也有一倍差距,也就是我们要消耗两倍的训练数据和算力,才能达到同样的效果。合起来就要多消耗4倍算力。我们要做的,正是不停地去缩小这些差距。
Liang Wenfeng: First, there is a gap in training efficiency. We estimate that in model structure and training dynamics, the best in China may be 2x behind the best abroad. That alone means we have to spend twice the compute to get the same result. There may also be a 2x gap in data efficiency, meaning we have to spend twice the training data and compute to get the same result. Combined, that is 4x more compute. What we need to do is keep narrowing these gaps.
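Liang's estimate treats the two gaps as independent factors that compound by multiplication, not addition. Here is a minimal Python sketch of that assumption. The function name is illustrative, and the second call's 1.5x figure is hypothetical.

```python
from math import prod


def compute_multiplier(*gaps: float) -> float:
    """Extra compute needed to match a reference result when independent
    efficiency gaps compound multiplicatively (an assumption, per the text)."""
    return prod(gaps)


# The two 2x gaps cited: model structure/training dynamics, and data efficiency.
print(compute_multiplier(2.0, 2.0))  # 4.0
# Hypothetical: if the data-efficiency gap shrank to 1.5x.
print(compute_multiplier(2.0, 1.5))  # 3.0
```

Because the factors multiply, narrowing either gap reduces the total disproportionately compared with treating them as additive.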

「暗涌」:大部分中国公司都选择既要模型又要应用,为什么DeepSeek目前选择只做研究探索?
"Undercurrent": Most Chinese companies choose to do both models and applications. Why has DeepSeek chosen to do only research and exploration for now?

梁文锋:因为我们觉得现在最重要的是参与到全球创新的浪潮里去。过去很多年,中国公司习惯了别人做技术创新,我们拿过来做应用变现,但这并非是一种理所当然。这一波浪潮里,我们的出发点,就不是趁机赚一笔,而是走到技术的前沿,去推动整个生态发展。
Liang Wenfeng: Because we think the most important thing right now is to take part in the global wave of innovation. For many years, Chinese companies have been used to others doing the technological innovation while we take it and monetize it through applications. But that should not be taken for granted. In this wave, our starting point is not to cash in, but to get to the technological frontier and drive the whole ecosystem forward.

「暗涌」:互联网和移动互联网时代留给大部分人的惯性认知是,美国擅长搞技术创新,中国更擅长做应用。
"Undercurrent": The ingrained perception left over from the internet and mobile-internet eras is that the US is good at technological innovation, while China is better at applications.

梁文锋:我们认为随着经济发展,中国也要逐步成为贡献者,而不是一直搭便车。过去三十多年IT浪潮里,我们基本没有参与到真正的技术创新里。我们已经习惯摩尔定律从天而降,躺在家里18个月就会出来更好的硬件和软件。Scaling Law也在被如此对待。
Liang Wenfeng: We believe that as the economy develops, China should gradually become a contributor instead of always free-riding. In the IT waves of the past thirty-plus years, we have basically not taken part in real technological innovation. We are used to Moore's Law falling from the sky: lie at home for 18 months and better hardware and software will appear. Scaling Law is being treated the same way.

但其实,这是西方主导的技术社区一代代孜孜不倦创造出来的,只因为之前我们没有参与这个过程,以至于忽视了它的存在。
But in fact, these were created through the tireless work of generation after generation in a Western-led technical community. Because we did not take part in that process before, we overlooked its existence.

真正的差距不是一年或两年,而是原创和模仿之差  The real gap is not one or two years; it is the gap between originality and imitation

「暗涌」:为什么DeepSeek V2会让硅谷的很多人惊讶?
"Undercurrent": Why did DeepSeek V2 surprise so many people in Silicon Valley?

梁文锋:在美国每天发生的大量创新里,这是非常普通的一个。他们之所以惊讶,是因为这是一个中国公司,在以创新贡献者的身份,加入到他们游戏里去。毕竟大部分中国公司习惯follow,而不是创新。
Liang Wenfeng: Among the many innovations that happen in the US every day, this is a very ordinary one. They were surprised because it was a Chinese company joining their game as an innovation contributor. After all, most Chinese companies are used to following rather than innovating.

「暗涌」:但这种选择放在中国语境里,也过于奢侈。大模型是一个重投入游戏,不是所有公司都有资本只去研究创新,而不是先考虑商业化。
"Undercurrent": But in the Chinese context, that choice is a luxury. Large models are a capital-intensive game, and not every company can afford to focus only on research and innovation instead of putting commercialization first.

梁文锋:创新的成本肯定不低,过去那种拿来主义的惯性也和过去的国情有关。但现在,你看无论中国的经济体量,还是字节、腾讯这些大厂的利润,放在全球都不低。我们创新缺的肯定不是资本,而是缺乏信心以及不知道怎么组织高密度的人才实现有效的创新。
Liang Wenfeng: Innovation is certainly not cheap, and the old habit of simply borrowing from others was tied to the country's circumstances at the time. But look at the size of China's economy now, or at the profits of big companies like ByteDance and Tencent: none of it is small by global standards. What we lack for innovation is certainly not capital. It is confidence, and knowing how to organize a high density of talent to innovate effectively.

「暗涌」:为什么中国公司——包括不缺钱的大厂,这么容易把快速商业化当第一要义?
"Undercurrent": Why are Chinese companies, including big companies that are not short of money, so quick to make rapid commercialization their top priority?

梁文锋:过去三十年,我们都只强调赚钱,对创新是忽视的。创新不完全是商业驱动的,还需要好奇心和创造欲。我们只是被过去那种惯性束缚了,但它也是阶段性的。
Liang Wenfeng: For the past thirty years we have emphasized only making money and neglected innovation. Innovation is not purely commercially driven; it also needs curiosity and the desire to create. We are just bound by past inertia, but that too is a passing phase.

「暗涌」:但你们究竟是一个商业组织,而非一个公益科研机构,选择创新,又通过开源分享出去,那要在哪里形成护城河?像5月这次MLA架构的创新,也会很快被其他家copy吧?
"Undercurrent": But you are a business after all, not a nonprofit research institute. If you choose to innovate and then share it through open source, where does your moat come from? Won't an innovation like the MLA architecture you released in May be copied quickly by others?

梁文锋:在颠覆性的技术面前,闭源形成的护城河是短暂的。即使OpenAI闭源,也无法阻止被别人赶超。所以我们把价值沉淀在团队上,我们的同事在这个过程中得到成长,积累很多know-how,形成可以创新的组织和文化,就是我们的护城河。
Liang Wenfeng: In the face of disruptive technology, the moat created by closed source is temporary. Even OpenAI's closed source can't stop others from catching up. So we anchor our value in the team. Our colleagues grow through this process, accumulate a lot of know-how, and form an organization and culture capable of innovation. That is our moat.

开源,发论文,其实并没有失去什么。对于技术人员来说,被follow是很有成就感的事。其实,开源更像一个文化行为,而非商业行为。给予其实是一种额外的荣誉。一个公司这么做也会有文化的吸引力。
Open-sourcing and publishing papers don't actually cost us anything. For technical people, having others follow your work is very rewarding. Open source is really more of a cultural act than a commercial one. Giving is an extra honor, and a company that does it also gains cultural appeal.

「暗涌」:你怎么看类似朱啸虎的这种市场信仰派观点?
"Undercurrent": What do you think of market-first views like Zhu Xiaohu's?

梁文锋:朱啸虎是自洽的,但他的打法更适合快速赚钱的公司,而你看美国最赚钱的公司,都是厚积薄发的高科技公司。
Liang Wenfeng: Zhu Xiaohu is internally consistent, but his playbook suits companies that aim to make money quickly. Look at the most profitable companies in the US: they are all high-tech companies that built up their strength over a long time before breaking out.

「暗涌」:但做大模型,单纯的技术领先也很难形成绝对优势,你们赌的那个更大的东西是什么?
"Undercurrent": But with large models, technical leadership alone can hardly create an absolute advantage. What is the bigger thing you are betting on?

梁文锋:我们看到的是中国AI不可能永远处在跟随的位置。我们经常说中国AI和美国有一两年差距,但真实的gap是原创和模仿之差。如果这个不改变,中国永远只能是追随者,所以有些探索也是逃不掉的。
Liang Wenfeng: What we see is that Chinese AI cannot stay in a follower position forever. We often say Chinese AI is a year or two behind the US, but the real gap is between originality and imitation. If that doesn't change, China will always be a follower, so some exploration is unavoidable.

英伟达的领先,不只是一个公司的努力,而是整个西方技术社区和产业共同努力的结果。他们能看到下一代的技术趋势,手里有路线图。中国AI的发展,同样需要这样的生态。很多国产芯片发展不起来,也是因为缺乏配套的技术社区,只有第二手消息,所以中国必然需要有人站到技术的前沿。
Nvidia's lead is not the effort of one company alone. It is the result of the combined efforts of the entire Western technical community and industry. They can see the next generation of technology trends and have a roadmap in hand. The development of Chinese AI needs the same kind of ecosystem. Many domestic chips have failed to take off partly because they lack a supporting technical community and have only second-hand information. So China inevitably needs someone to stand at the technological frontier.

更多的投入并不一定产生更多的创新  More investment does not necessarily produce more innovation

「暗涌」:现在的DeepSeek有一种OpenAI早期的理想主义气质,也是开源的。后边你们会选择闭源吗?OpenAI和Mistral都有过从开源到闭源的过程。
"Undercurrent": DeepSeek now has something of the early OpenAI's idealism, and it is open source too. Will you go closed source later? Both OpenAI and Mistral moved from open source to closed source.

梁文锋:我们不会闭源。我们认为先有一个强大的技术生态更重要。
Liang Wenfeng: We will not go closed source. We think it is more important to first have a strong technical ecosystem.

「暗涌」:你们有融资计划吗?看有媒体报道,幻方对DeepSeek有独立拆分上市的计划,硅谷的AI创业公司,最终也都难免要和大厂绑定。
"Undercurrent": Do you plan to raise funding? Media reports say High-Flyer plans to spin off DeepSeek for a separate listing. AI startups in Silicon Valley also inevitably end up tied to big companies.

梁文锋:短期内没有融资计划,我们面临的问题从来不是钱,而是高端芯片被禁运。
Liang Wenfeng: We have no plans to raise funding in the short term. Our problem has never been money; it is the embargo on high-end chips.

「暗涌」:很多人认为,做AGI和做量化是完全不同的两件事,量化可以闷声去做,但AGI可能更需要高举高打,需要结盟,这样可以让你的投入变大。
"Undercurrent": Many people think AGI and quant trading are completely different things. Quant trading can be done quietly, but AGI may call for a high-profile approach and alliances, which would let you scale up your investment.

梁文锋:更多的投入并不一定产生更多的创新。否则大厂可以把所有的创新包揽了。
Liang Wenfeng: More investment does not necessarily produce more innovation. Otherwise, the big companies would have claimed all the innovation for themselves.

「暗涌」:你们现在不做应用,是因为你们没有运营的基因吗?
"Undercurrent": Is the reason you don't build applications now that operations are not in your DNA?

梁文锋:我们认为当前阶段是技术创新的爆发期,而不是应用的爆发期。长远来说,我们希望形成一种生态,就是业界直接使用我们的技术和产出,我们只负责基础模型和前沿的创新,然后其它公司在DeepSeek 的基础上构建toB、toC的业务。如果能形成完整的产业上下游,我们就没必要自己做应用。当然,如果需要,我们做应用也没障碍,但研究和技术创新永远是我们第一优先级。
Liang Wenfeng: We believe the current stage is a period of explosive growth in technological innovation, not in applications. In the long run, we hope to build an ecosystem in which the industry directly uses our technology and outputs. We would be responsible only for foundation models and frontier innovation, and other companies would build toB and toC businesses on top of DeepSeek. If a complete industry chain, upstream and downstream, takes shape, there is no need for us to build applications ourselves. Of course, if needed, nothing stops us from building applications, but research and technological innovation will always be our first priority.

「暗涌」:但选择API的话,为什么选择DeepSeek,而不是大厂?
"Undercurrent": But if customers are choosing an API, why choose DeepSeek rather than a big company?

梁文锋:未来的世界很可能是专业化分工的,基础大模型需要持续创新,大厂有它的能力边界,并不一定适合。
Liang Wenfeng: The future world is likely to be one of specialized division of labor. Foundation models need continuous innovation. Big companies have limits to what they can do well, so they are not necessarily the right fit.

「暗涌」:但技术真的可以拉开差距吗?你也说过并不存在绝对的技术秘密。
"Undercurrent": But can technology really widen the gap? You also said that there is no absolute technical secret.

梁文锋:技术没有秘密,但重置需要时间和成本。英伟达的显卡,理论上没有任何技术秘密,很容易复制,但重新组织团队以及追赶下一代技术都需要时间,所以实际的护城河还是很宽。
Liang Wenfeng: There are no secrets in technology, but replicating it takes time and money. In theory, Nvidia's GPUs contain no technical secrets and are easy to copy. But rebuilding a team and catching up with the next generation of technology both take time, so the actual moat is still very wide.

「暗涌」:你们降价后,字节率先跟进,说明他们还是感受到某种威胁。你怎么看创业公司与大厂竞争的新解法?
"Undercurrent": After your price cut, ByteDance was the first to follow, which shows they still felt some kind of threat. What do you think of new ways for startups to compete with big companies?

梁文锋:说实话我们不太care这件事,只是顺便做了这件事。提供云服务不是我们的主要目标。我们的目标还是去实现AGI。
Liang Wenfeng: To be honest, we don't care much about this; we just did it along the way. Providing cloud services is not our main goal. Our goal is still to achieve AGI.

目前没有看到什么新解法,但大厂也没有明显占优。大厂有现成的用户,但它的现金流业务也是它的包袱,也会让它成为随时被颠覆的对象。
So far I haven't seen any new approach, but the big companies don't have a clear advantage either. They have ready-made users, but their cash-cow businesses are also a burden, and that makes them liable to be disrupted at any time.

「暗涌」:你怎么看DeepSeek之外的6家大模型创业公司的终局?
"Undercurrent": What do you think of the outcome of the six large-model startups besides DeepSeek?

梁文锋:可能活下来2到3家。现在都还处在烧钱阶段,所以那些自我定位清晰、更能精细化运营的,更有机会活下来。其它公司可能会脱胎换骨。有价值的东西不会烟消云散,但会换一种方式。
Liang Wenfeng: Maybe two or three will survive. They are all still in the cash-burning stage, so those with clear self-positioning and more refined operations have a better chance of surviving. Others may transform themselves completely. Valuable things won't vanish; they will just take a different form.

「暗涌」:幻方时代,面对竞争的姿态就被评价为“我行我素”,很少在意横向比较。关于竞争,你思考的原点是什么?
"Undercurrent": In the High-Flyer era, your stance toward competition was described as "going your own way", with little interest in comparing yourselves with peers. When it comes to competition, what is your starting point?

梁文锋:我经常思考的是,一个东西能不能让社会的运行效率变高,以及你能否在它的产业分工链条上找到擅长的位置。只要终局是让社会效率更高,就是成立的。中间很多都是阶段性的,过度关注必然眼花缭乱。
Liang Wenfeng: What I often think about is whether something can make society run more efficiently, and whether you can find a position you are good at in its industrial division of labor. As long as the end result makes society more efficient, it holds up. Much of what happens in between is a passing phase, and paying too much attention to it will only dazzle you.

一群做“高深莫测”事的年轻人  A group of young people who do "unfathomable" things

「暗涌」:OpenAI前政策主管、Anthropic联合创始人Jack Clark认为DeepSeek雇佣了“一批高深莫测的奇才”,做出DeepSeek v2的是怎样一群人?
"Undercurrent": Jack Clark, former policy director at OpenAI and co-founder of Anthropic, believes DeepSeek hired "a group of inscrutable wizards". What kind of people built DeepSeek V2?

梁文锋:并没有什么高深莫测的奇才,都是一些Top高校的应届毕业生、没毕业的博四、博五实习生,还有一些毕业才几年的年轻人。
Liang Wenfeng: There are no inscrutable wizards. They are fresh graduates from top universities, fourth- and fifth-year PhD students interning with us before graduating, and some young people who graduated only a few years ago.

「暗涌」:很多大模型公司都执着地去海外挖人,很多人觉得这个领域前50名的顶尖人才可能都不在中国的公司,你们的人都来自哪里?
"Undercurrent": Many large-model companies are intent on poaching talent from overseas, and many people think the top 50 people in this field may not be at Chinese companies at all. Where do your people come from?

梁文锋:V2模型没有海外回来的人,都是本土的。前50名顶尖人才可能不在中国,但也许我们能自己打造这样的人。
Liang Wenfeng: Nobody who worked on the V2 model came back from overseas; they are all homegrown. The top 50 talents may not be in China, but perhaps we can develop such people ourselves.

「暗涌」:这次MLA创新是如何发生的?听说idea最早来自一个年轻研究员的个人兴趣?
"Undercurrent": How did this MLA innovation happen? I heard that the idea first came from the personal interest of a young researcher?

梁文锋:在总结出Attention架构的一些主流变迁规律后,他突发奇想去设计一个替代方案。不过从想法到落地,中间是一个漫长的过程。我们为此组了一个team,花了几个月时间才跑通。
Liang Wenfeng: After summarizing some of the main patterns in how the Attention architecture had evolved, he had a sudden idea to design an alternative. But getting from idea to implementation was a long process. We formed a team for it, and it took several months to get it working.

「暗涌」:这种发散性灵感的诞生和你们完全创新型组织的架构很有关系。幻方时代,你们就很少自上而下地指派目标或任务。但AGI这种充满不确定性的前沿探索,是否多了管理动作?
"Undercurrent": This kind of divergent inspiration is closely tied to your fully innovation-oriented organizational structure. In the High-Flyer era, you rarely assigned goals or tasks top-down. But does AGI, a frontier exploration full of uncertainty, call for more management intervention?

梁文锋:DeepSeek也全是自下而上。而且我们一般不前置分工,而是自然分工。每个人有自己独特的成长经历,都是自带想法的,不需要push他。探索过程中,他遇到问题,自己就会拉人讨论。不过当一个idea显示出潜力,我们也会自上而下地去调配资源。
Liang Wenfeng: DeepSeek is also entirely bottom-up. And we generally don't assign roles in advance; the division of labor emerges naturally. Everyone has their own unique background and brings their own ideas, so there's no need to push them. When someone runs into a problem while exploring, they pull in others to discuss it. But when an idea shows potential, we also allocate resources top-down.

「暗涌」:听说DeepSeek对于卡和人的调集非常灵活。
"Undercurrent": I heard DeepSeek is very flexible in allocating GPUs and people.

梁文锋:我们每个人对于卡和人的调动是不设上限的。如果有想法,每个人随时可以调用训练集群的卡无需审批。同时因为不存在层级和跨部门,也可以灵活调用所有人,只要对方也有兴趣。
Liang Wenfeng: There is no cap on how many GPUs or people any of us can mobilize. If you have an idea, you can use GPUs on the training cluster at any time without approval. And because there are no hierarchies or separate departments, you can also flexibly bring in anyone, as long as they are interested too.

「暗涌」:一种松散的管理方式也取决于你们筛选到了一批强热爱驱动的人。听说你们很擅长从细节招人, 可以让一些非传统评价指标里优秀的人被选出来。
"Undercurrent": Such loose management also depends on your having selected a group of people driven by strong passion. I heard you are very good at recruiting based on details, so that people who excel by non-traditional criteria get picked.

梁文锋:我们选人的标准一直都是热爱和好奇心,所以很多人会有一些奇特的经历,很有意思。很多人对做研究的渴望,远超对钱的在意。
Liang Wenfeng: Our hiring criteria have always been passion and curiosity, so many of our people have unusual and interesting backgrounds. For many of them, the desire to do research far outweighs their concern for money.

「暗涌」: transformer诞生在谷歌的AI Lab,ChatGPT诞生在OpenAI,你觉得大公司的AILab 和一个创业公司对于创新产生的价值有什么不同?
"Undercurrent": The Transformer was born in Google's AI lab, and ChatGPT at OpenAI. How do you think a big company's AI lab and a startup differ in the value they create for innovation?

梁文锋:不管是Google实验室,还是OpenAI,甚至中国大厂的AI Lab,都很有价值的。最后是OpenAI做出来,也有历史的偶然性。
Liang Wenfeng: Google's labs, OpenAI, and even the AI labs of China's big companies are all valuable. That OpenAI was the one to pull it off was partly a historical accident.

「暗涌」:创新很大程度也是一种偶然吗?我看你们办公区中间那排会议室左右两侧都设置了可以随意推开的门。你们同事说,这就是给偶然留出空隙。transformer诞生中就发生过那种偶然经过的人听到后加入,最终把它变成一个通用框架的故事。
"Undercurrent": Is innovation also largely a matter of chance? I noticed that the row of meeting rooms in the middle of your office has doors on both sides that anyone can push open. Your colleagues said this is to leave room for chance. The Transformer's own history includes such a story: someone passing by overheard the discussion, joined in, and eventually helped turn it into a general-purpose framework.

梁文锋:我觉得创新首先是一个信念问题。为什么硅谷那么有创新精神?首先是敢。Chatgpt出来时,整个国内对做前沿创新都缺乏信心,从投资人到大厂,都觉得差距太大了,还是做应用吧。但创新首先需要自信。这种信心通常在年轻人身上更明显。
Liang Wenfeng: I think innovation is first of all a matter of belief. Why is Silicon Valley so innovative? First, because people there dare. When ChatGPT came out, the whole country lacked confidence in frontier innovation. From investors to big companies, everyone felt the gap was too big and that it was better to just build applications. But innovation first requires self-confidence, and that confidence is usually more evident in young people.

「暗涌」:但你们不参与融资,很少对外发声,社会声量上肯定不如那些融资活跃的公司,怎么确保DeepSeek就是做大模型的人的首选?
"Undercurrent": But you don't raise funding and rarely speak publicly, so your public profile is surely lower than that of companies active in fundraising. How do you make sure DeepSeek is the first choice for people who want to work on large models?

梁文锋:因为我们在做最难的事。对顶级人才吸引最大的,肯定是去解决世界上最难的问题。其实,顶尖人才在中国是被低估的。因为整个社会层面的硬核创新太少了,使得他们没有机会被识别出来。我们在做最难的事,对他们就是有吸引力的。
Liang Wenfeng: Because we are doing the hardest thing. What attracts top talent most is surely the chance to solve the world's hardest problems. In fact, top talent is underestimated in China. There is so little hard-core innovation across society that they never get the chance to be recognized. We are doing the hardest thing, and that is what attracts them.

「暗涌」:前一段OpenAI的发布并没有等来GPT5,很多人觉得这是技术曲线明显在放缓,也很多人开始质疑Scaling Law,你们怎么看?
"Undercurrent": OpenAI's recent release did not bring GPT-5. Many people think the technology curve is clearly slowing, and many have begun to question the Scaling Law. What do you think?

梁文锋:我们偏乐观,整个行业看起来都符合预期。OpenAI也不是神,不可能一直冲在前面。
Liang Wenfeng: We are fairly optimistic; the industry as a whole seems to be on track with expectations. OpenAI is not a god, and it can't always stay out in front.

「暗涌」:你觉得AGI还要多久实现,发布DeepSeek V2前,你们发布过代码生成和数学的模型,也从dense模型切换到了MOE,所以你们的AGI路线图有哪些坐标?
"Undercurrent": How long do you think AGI will take? Before DeepSeek V2, you released code-generation and math models, and you also switched from dense models to MoE. So what are the milestones on your AGI roadmap?

梁文锋:可能是2年、5年或者10年,总之会在我们有生之年实现。至于路线图,即使在我们公司内部,也没有统一意见。但我们确实押注了三个方向。一是数学和代码,二是多模态,三是自然语言本身。数学和代码是AGI天然的试验场,有点像围棋,是一个封闭的、可验证的系统,有可能通过自我学习就能实现很高的智能。另一方面,可能多模态、参与到人类的真实世界里学习,对AGI也是必要的。我们对一切可能性都保持开放。
Liang Wenfeng: It may be 2, 5 or 10 years; in any case, it will happen in our lifetime. As for the roadmap, even within our company there is no consensus. But we have placed bets on three directions: first, math and code; second, multimodality; third, natural language itself. Math and code are a natural testing ground for AGI. A bit like Go, they form a closed, verifiable system in which high intelligence might be reached through self-learning. On the other hand, multimodality, and learning by engaging with the real human world, may also be necessary for AGI. We remain open to all possibilities.
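Math and code form a "closed, verifiable system" because an answer can be checked mechanically. The toy Python sketch below scores candidate programs against test cases with no human grader. The task, test cases and candidates are all hypothetical. The sketch illustrates automatic verification only, not how DeepSeek trains its models.

```python
from typing import Callable, List, Tuple


def verify(candidate: Callable[[int], int], cases: List[Tuple[int, int]]) -> float:
    """Fraction of test cases a candidate function passes; no human grader needed."""
    passed = 0
    for arg, expected in cases:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            pass  # a crash simply counts as a failure
    return passed / len(cases)


# Hypothetical task: "square a number". Two candidate outputs from a model.
cases = [(0, 0), (3, 9), (-4, 16)]


def good(n: int) -> int:
    return n * n


def bad(n: int) -> int:
    return n * 2  # passes only the n == 0 case


print(verify(good, cases), verify(bad, cases))
```

Because the score is computed rather than judged, it can be produced at scale. This is the property Liang points to when he compares these domains to Go.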

「暗涌」:你觉得大模型终局是什么样态?  "Undercurrent": What do you think the endgame for large models will look like?

梁文锋:会有专门公司提供基础模型和基础服务,会有很长链条的专业分工。更多人在之上去满足整个社会多样化的需求。
Liang Wenfeng: There will be specialized companies providing foundation models and foundational services, with a long chain of specialized division of labor. More people will build on top of that to meet society's diverse needs.

所有的套路都是上一代的产物  All playbooks are products of the previous generation

「暗涌」:过去这一年,中国的大模型创业还是有很多变化的,比如去年开头还很活跃的王慧文中场退出了,后来加入的公司也开始呈现出差异化。
"Undercurrent": Over the past year, China's large-model startup scene has changed a lot. For example, Wang Huiwen, who was very active at the start of last year, exited midway, and the companies that entered later have begun to differentiate themselves.

梁文锋:王慧文自己承担了所有的损失,让其他人全身而退。他做了一个对自己最不利,但对大家都好的选择,所以他做人是很厚道的,这点我很佩服。
Liang Wenfeng : Wang Huiwen took on all the losses himself and let everyone else walk away unscathed. He made the choice that was worst for himself but best for everyone else. He is a truly decent person, and I admire that very much.

「暗涌」:现在你的精力最多放在哪里?  "Undercurrent": Where do you focus most of your energy now?

梁文锋:主要的精力在研究下一代的大模型。还有很多未解决的问题。
Liang Wenfeng : Mainly on researching the next generation of large models. There are still many unsolved problems.

「暗涌」:其他几家大模型创业公司都是坚持既要又要,毕竟技术不会带来永久领先,抓住时间窗口把技术优势落到产品也很重要,DeepSeek敢于专注在模型研究上是因为模型能力还不够吗?
"Undercurrent": The other large-model startups all insist on pursuing both models and products. After all, technology won't give you a permanent lead, so it also matters to seize the window and turn technical advantages into products. Is DeepSeek willing to focus solely on model research because model capabilities still aren't good enough?

梁文锋:所有的套路都是上一代的产物,未来不一定成立。拿互联网的商业逻辑去讨论未来AI的盈利模式,就像马化腾创业时,你去讨论通用电气和可口可乐一样。很可能是一种刻舟求剑。
Liang Wenfeng : All playbooks are products of the previous generation and may not hold in the future. Using the business logic of the internet to discuss AI's future profit model is like discussing General Electric and Coca-Cola back when Ma Huateng was starting his company. It is very likely a case of marking the boat to find a lost sword.

「暗涌」:过去幻方就有很强的技术和创新基因,成长也比较顺利,这是你偏乐观的原因吗?
"Undercurrent": High-Flyer has long had strong technology and innovation in its DNA, and its growth has been relatively smooth. Is that why you are optimistic?

梁文锋:幻方某种程度上增强了我们对技术驱动型创新的信心,但也不都是坦途。我们经历了一个漫长的积累过程。外部看到的是幻方2015年后的部分,但其实我们做了16年。
Liang Wenfeng : To some extent, High-Flyer strengthened our confidence in technology-driven innovation, but it hasn't all been smooth sailing. We went through a long process of accumulation. What outsiders see is High-Flyer after 2015, but in fact we have been at this for 16 years.

「暗涌」:回到关于原创式创新的话题。现在经济开始进入下行,资本也进入冷周期,所以它对原创式创新是否会带来更多抑制?
"Undercurrent": Back to the topic of original innovation. Now that the economy has entered a downturn and capital is in a cold cycle, will that further suppress original innovation?

梁文锋:我倒觉得未必。中国产业结构的调整,会更依赖硬核技术的创新。当很多人发现过去赚快钱很可能来自时代运气,就会更愿意俯身去做真正的创新。
Liang Wenfeng : I don't necessarily think so. As China restructures its industries, it will rely more on hard-core technological innovation. When many people realize that the quick money they made in the past probably came from the luck of the times, they will be more willing to buckle down and do real innovation.

「暗涌」:所以你对这件事也是乐观的?  "Undercurrent": So you're optimistic about this too?

梁文锋:我是八十年代在广东一个五线城市长大的。我的父亲是小学老师,九十年代,广东赚钱机会很多,当时有不少家长到我家里来,基本就是家长觉得读书没用。但现在回去看,观念都变了。因为钱不好赚了,连开出租车的机会可能都没了。一代人的时间就变了。
Liang Wenfeng : I grew up in a fifth-tier city in Guangdong in the 1980s. My father was a primary school teacher. In the 1990s, there were plenty of opportunities to make money in Guangdong, and many parents came to our home; basically, they felt that studying was useless. But looking back now, attitudes have completely changed. Money isn't easy to make anymore; even the chance to drive a taxi may be gone. It took just one generation for things to change.

以后硬核创新会越来越多。现在可能还不容易被理解,是因为整个社会群体需要被事实教育。当这个社会让硬核创新的人功成名就,群体性想法就会改变。我们只是还需要一堆事实和一个过程。
Hard-core innovation will become more and more common. It may not be easy to understand yet because society as a whole needs to be educated by facts. Once this society lets hard-core innovators achieve fame and success, collective thinking will change. We just need a body of facts and a process.


不要通过后视镜进行投资

 经济学人: 在一个更可预测的世界里,股票定价将易如反掌。股票赋予持有者获取一系列现金流(如股息和盈利)的权利。投资者只需预测各项现金流的未来价值,再根据现行利率、现金流风险及自身风险偏好将其折现为现值。加总所有现值,便是股票的理论价格。 然而在充满根本性不确定的现实世界中,事情要复杂得多。例如,几乎没有股票分析师会尝试预测三年后的盈利数据。但"现金流折现"模型仍具参考价值——用股价除以当前盈利,就能看出市场对未来现金流适用的折现率。历史证明,这个折现率虽不完美,却能合理指引股市长期回报:较低的折现率(即较高的市盈率)预示较低回报,反之亦然。这对投资者而言至关重要,无论是规划养老储蓄规模,还是确定股票相对于其他资产的配置比例。 如此简易的指标竟能预测未来,或许令人惊讶。更令人诧异的是,竟有如此多投资者对其视若无睹。这种前瞻性预期回报指标被学界和大型机构投资者广泛采用,事实上正是众多投资公司资本市场长期预测的基石。但散户投资者的逻辑却往往截然相反——多项调查显示,这个群体习惯以史为鉴,总是根据历史回报推演未来收益。 这种"后视镜投资法"的核心理念是:若股价近期飙升,涨势必将延续。必须承认,2009年以来的大多数时间里,这种判断确实比所谓的前瞻指标更准确。尽管2010年代美股估值持续攀升,牛市却始终未改。若因估值走高、学术模型预期回报下降而减仓,只会错失盈利机会。即便经历2022年熊市后,美股又在高于平均估值的起点重拾升势,继而一飞冲天。难怪今年每逢市场回调,散户投资者便蜂拥入场。 这种惯性思维绝非散户专利。股票分析师虽需精准预测所覆盖公司的盈利增长,却普遍采用历史数据推演法——尽管历史增长与未来增长的实际相关性实为负值。期权定价理论本应以交易者预期的未来波动率为基础,但外汇期权的隐含波动率往往与历史波幅如影随形。高盛分析师发现,过去一年这导致外汇期权交易者持续低估未来波动率,最终因经济环境剧变和地缘政治不确定性而判断失误。 "后视镜投资法"的真正隐患在于:风平浪静时无懈可击,意外来袭时溃不成军。1990年代末互联网泡沫破裂前,以及2021年股市暴跌前,押注牛市延续都显得无比英明。但这两个时期的前瞻指标均显示估值畸高、回报预期低迷,本应警示投资者控制股票仓位。当市场狂热时,这种预警会被视为扫兴的悲观论调——直到...