DeepSeek AI is Free, the Comparable Version of ChatGPT Costs $200 a Month


by Mish Shedlock, Mish Talk:

Wall Street is stunned, and rightfully so.

DeepSeek’s AI Model Is the Top-Rated App in the U.S.

Scientific American comments Why DeepSeek’s AI Model Just Became the Top-Rated App in the U.S.

DeepSeek’s artificial intelligence assistant made big waves Monday, becoming the top-rated app in Apple’s App Store and sending tech stocks into a downward tumble. What’s all the fuss about?


The Chinese start-up, DeepSeek, surprised the tech industry with a new model that rivals the abilities of OpenAI’s most recent model—with far less investment and using reduced-capacity chips. The U.S. bans exports of state-of-the-art computer chips to China and limits sales of chipmaking equipment. DeepSeek, based in the eastern Chinese city of Hangzhou, reportedly had a stockpile of high-performance Nvidia A100 chips from times prior to the ban—so its engineers could have used those to develop the model. But in a key breakthrough, the start-up says it instead used much lower-powered Nvidia H800 chips to train the new model, dubbed DeepSeek-R1.

On common AI tests in mathematics and coding, DeepSeek-R1 matched the scores of OpenAI’s o1 model, according to VentureBeat.

DeepSeek-R1 is free for users to download, while the comparable version of ChatGPT costs $200 a month.

Because it requires less computational power, running DeepSeek-R1 costs a tenth as much as similar competitors, says Hanchang Cao, an incoming assistant professor in Information Systems and Operations Management at Emory University. “For academic researchers or start-ups, this difference in the cost really means a lot,” Cao says.

DeepSeek achieved its efficiency in several ways, says Anil Ananthaswamy, author of Why Machines Learn: The Elegant Math Behind Modern AI. The model has 671 billion parameters, or variables it learns from during training, making it the largest open-source large language model yet, Ananthaswamy explains. But the model uses an architecture called “mixture of experts” so that only a relevant fraction of these parameters—tens of billions instead of hundreds of billions—are activated for any given query. This cuts down on computing costs. The DeepSeek LLM also uses a method called multi-head latent attention to boost inference efficiency; and instead of predicting an answer word-by-word, it generates multiple words at once.
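The mixture-of-experts idea described above can be illustrated with a toy sketch: a gating network scores many expert sub-networks, but only the top few actually run for a given input, so compute scales with the number of experts selected rather than the total parameter count. This is a minimal illustration, not DeepSeek’s actual implementation; all function names, dimensions, and the choice of top-2 routing here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Toy mixture-of-experts layer (illustrative, not DeepSeek's code).

    A gate scores every expert, but only the top_k highest-scoring
    experts are executed; their outputs are combined with softmax
    weights over the selected scores. Compute cost scales with
    top_k, not with the total number of experts."""
    scores = x @ gate_weights                  # one score per expert
    top = np.argsort(scores)[-top_k:]          # indices of chosen experts
    probs = np.exp(scores[top] - scores[top].max())
    probs /= probs.sum()                       # softmax over chosen experts only
    out = np.zeros(expert_weights.shape[2])
    for weight, idx in zip(probs, top):
        out += weight * (x @ expert_weights[idx])  # run only selected experts
    return out, top

# Hypothetical sizes: 16 experts, but only 2 activate per input.
d_in, d_out, num_experts = 8, 4, 16
experts = rng.normal(size=(num_experts, d_in, d_out))
gate = rng.normal(size=(d_in, num_experts))
x = rng.normal(size=d_in)

y, chosen = moe_forward(x, experts, gate, top_k=2)
print(f"{len(chosen)} of {num_experts} experts activated")
```

In a real model each “expert” is a full feed-forward network and the router is trained jointly with the experts, but the routing principle — score all, run few — is the same.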

Another important aspect of DeepSeek-R1 is that the company has made the code behind the product open-source, Ananthaswamy says. (The training data remains proprietary.) This means that the company’s claims can be checked. If the model is as computationally efficient as DeepSeek claims, he says, it will probably open up new avenues for researchers who use AI in their work to do so more quickly and cheaply. It will also enable more research into the inner workings of LLMs themselves.

“One of the big things has been this divide that has opened up between academia and industry because academia has been unable to work with these really large models or do research in any meaningful way,” Ananthaswamy says. “But something like this, it’s within the reach of academia now, because you have the code.”

DeepSeek Stuns Wall Street

The Wall Street Journal reports DeepSeek Stuns Wall Street With Capability and Cost

Who saw that coming? Not Wall Street, which sold off tech stocks on Monday after the weekend news that a highly sophisticated Chinese AI model, DeepSeek, rivals Big Tech-built systems but cost a fraction to develop. The implications are likely to be far-reaching, and not merely in equities.

Enter DeepSeek, which last week released a new R1 model that claims to be as advanced as OpenAI’s on math, code and reasoning tasks. Tech gurus who inspected the model agreed. One economist asked R1 how much Donald Trump’s proposed 25% tariffs will affect Canada’s GDP, and in 12 seconds it spit back an answer close to a major bank’s estimate, along with the detailed steps R1 used to get there.

More startling, DeepSeek required far fewer chips to train than other advanced AI models and thus cost only an estimated $5.6 million to develop. Other advanced models cost in the neighborhood of $1 billion. Venture capitalist Marc Andreessen called it “AI’s Sputnik moment,” and he may be right.

DeepSeek is challenging assumptions about the computing power and spending needed for AI advances. OpenAI, Oracle and SoftBank last week made headlines when they announced a joint venture, Stargate, to invest up to $500 billion in building out AI infrastructure. Microsoft plans to spend $80 billion on AI data centers this year.

CEO Mark Zuckerberg on Friday said Meta would spend about $65 billion on AI projects this year and build a data center “so large that it would cover a significant part of Manhattan.” Meta expects to have 1.3 million advanced chips by the end of this year. DeepSeek’s model reportedly required as few as 10,000 to develop.

DeepSeek’s breakthrough means these tech giants may not have to spend as much to train their AI models. But it also means these firms, notably Google’s DeepMind, might lose their first-mover technological edge.

DeepSeek is vindicating President Trump’s decision to rescind a Biden executive order that gave government far too much control over AI. Companies developing AI models that pose a “serious risk” to national security, economic security, or public health and safety would have had to notify regulators when training their models and share the results of “red-team safety tests.”

DeepSeek should also cause Republicans in Washington to rethink their antitrust obsessions with big tech. Bureaucrats aren’t capable of overseeing thousands of AI models, and more regulation would slow innovation and make it harder for U.S. companies to compete with China. As DeepSeek shows, it’s possible for a David to compete with the Goliaths. Let a thousand American AI flowers bloom.

Ignoring AI’s Potential Is Ignorant

Nate Silver says It’s Time to Come to Grips with AI

Ignoring AI’s potential is, well, ignorant

For the real leaders of the left, the issue simply isn’t on the radar. Bernie Sanders has only tweeted about “AI” once in passing, and AOC’s concerns have been limited to one tweet about “deepfakes.”

Meanwhile, the vibe from lefty public intellectuals has been smug dismissiveness. Take this seven-word tweet from Ken Klippenstein, a left-leaning journalist formerly of The Intercept who now writes a popular Substack.

Read More @ MishTalk.com