The art of distillation

forumaster posted on 2025-3-28 18:11:18

As we are all aware, DeepSeek, a rising Chinese tech start-up, has been gaining momentum and stirring up an epic internet buzz in recent days. The release of the DeepSeek R1 model sent shock waves across the AI industry and rattled Silicon Valley and Wall Street. As we witness its reputation take a meteoric rise across China and the world, most Chinese people pride themselves on the tremendous breakthrough DeepSeek has made so far. I just can’t help parodying a line by Li Yu of the Southern Tang dynasty: “问君能有几多酬，恰似一江春水向东流” (“How much reward can one have? As much as a river of spring water flowing east”; the original line reads 愁, “sorrow”, where I have put 酬, “reward”). Many of us fancy that “gone are the days of US domination and monopoly, and it is high time we reclaimed what we have been cheated out of.”

Meanwhile, OpenAI said it has evidence that DeepSeek used “distillation” of its GPT models to train the open-source V3 and R1 models at a fraction of what Western tech giants spend on their own models. This reinforces my suspicion that DeepSeek, or a group linked to DeepSeek, may well have exfiltrated large amounts of data from OpenAI through an application programming interface (API).

Anyway, that is not the main point for today. I think what really matters to us is how we can better employ the reasoning process of AI models to our advantage, and how we can further understand and appreciate the art of distillation.

Now, without further ado, I’d like to share the idea of using the distillation method to develop a super brain, so that we can handle both input and output in a more energy-focused and result-oriented fashion when it comes to advancing our English level and pursuing language proficiency.

According to the NLM (National Library of Medicine), “The human brain is able to handle more than 100tn parameters, or pieces of data,” which is a level of computing power that hasn’t been matched by any silicon computer.
That is to say, our capacity to store data and handle parameters is practically limitless. Now, the key question is: how can we maximize study efficiency and sharpen the agility of our kinesthetic response?

The answer lies in “distill” and “train”. If DeepSeek took advantage of OpenAI’s API, it did so by training its smaller models to mimic the behaviour of larger, more sophisticated models. It’s just like shooting fish in other people’s barrel. In addition, with the joint effort of a dozen super-talented Chinese minds, sooner or later they will knock Silicon Valley out of the park, provided that DeepSeek or companies like it have a sustainable means of finding a concrete object to distill from or, in other words, steal from.

In fact, the distillation process is a simple concept that can be summed up in two words: “filter” and “extract”. From an English learner’s perspective, and to my way of thinking, we need an empty-cup mindset and need to learn the ropes of immersing ourselves in solitude. Only then can we cultivate the sheer focus through which we single out (filter) the best farmland available to plough, especially at the preliminary stage.

Now, let’s move on to “extract”. Judging from my hands-on experience working in the IT industry, a state-of-the-art AI company thinks like this: it collects oceans of data from every retrievable corner of the Internet, then sets up various parameters to locate, identify and sort out building blocks for later use in deduction. On top of its Generative Pre-trained Transformer and floating-point operations, with hundreds of billions of parameters at its disposal, it is now able to accommodate the ever more sophisticated needs of the world’s population. All we need, by contrast, is a good command of approximately 20 thousand words and 5 thousand phrases, so that we can stand the test of the survival mode of language independence.
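
For readers curious what "training a smaller model to mimic a larger one" looks like in practice, here is a minimal sketch of the usual idea, written by way of illustration only. It is not DeepSeek's or OpenAI's actual method, and the function names and the example numbers are my own assumptions. The "teacher" and the "student" each give raw scores for the same candidate answers; the scores are softened with a temperature, and the student is penalized by how far its softened distribution sits from the teacher's (a KL divergence).

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer spread."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    The smaller the value, the more closely the student mimics the teacher.
    """
    p = softmax(teacher_logits, temperature)  # what the teacher believes
    q = softmax(student_logits, temperature)  # what the student believes
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical scores for three candidate answers (illustrative numbers only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # already ranks the answers like the teacher
far_student = [0.5, 1.0, 4.0]    # ranks them almost the opposite way

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print("close student loss:", loss_close)
print("far student loss:", loss_far)
```

In real training this loss would be minimized over a great many examples by adjusting the student's parameters; the sketch only shows how "mimicking" is measured. The parallel for a language learner is the same: pick a good teacher, compare your output with theirs, and keep shrinking the gap.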
Roughly speaking, our input capacity depends heavily on the amount of information we consume through listening and reading, while our output capability is determined by the number of rounds we have practiced through speaking and writing. But this is not the whole story. We must nurture a distinctive and effective way to process data and develop a singular, popular approach to generating information. In the computer world, we call it logical reasoning, and it is powered by algorithms; in the human mind, it is intellectual exploration, and it is driven by mindset.

How do we construct a mindset that would allow us to elbow our way up to the top?

“Amidst the ocean of waters, only indulging in a sip from the ladle of humility. Among the mountain of books, only savoring a few from the eyes of serenity.” 【弱水三千，只取一瓢小饮；书山高矗，只挑经典细品。】

From here on out, embrace solitude to savor the power of concentration and dedication. Repel the distractions that those at the top of the pyramid go the whole nine yards to impose on you.
