Pinned
Everything You Need to Know About the GPT Series Models - From GPT-1 to O1 (Detailed Long-Form…
1.1 GPT-1: The Founder of the Generative Pretraining Paradigm
Mar 22

Published in Data Science Collective
Long Text Processing Method — Yarn
Previously, we introduced the RoPE algorithm for handling long texts, as shown in the essay below. However, during inference, if the text…
4d ago

Published in Towards AI
The Comparison between the Encoder and the Decoder
This article primarily discusses the advantages and disadvantages of large language models based on encoder and decoder architectures. Both…
May 14

Published in Towards AI
The Evolution of GRPO: DAPO
Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) is a reinforcement learning optimization algorithm. To thoroughly understand…
May 12

Published in Data Science Collective
Understanding DeepSeek R1: A Personal Perspective
For the best reading experience, it is recommended that you read this article after the DeepSeek-V1, DeepSeek-V2, and DeepSeek-V3 articles.
Apr 30

Published in Data Science Collective
DeepSeek v3: My Take on What Matters
For the best reading experience, it is recommended that you read this article after the DeepSeek-V1 and DeepSeek-V2 articles.
Apr 27

Published in Data Science Collective
Understanding DeepSeek v1 & MoE: My Personal Take
DeepSeek v1 mainly includes two major versions: one is the conventional version with a model structure similar to that of Llama, and the…
Apr 24

Published in Data Science Collective
Understanding DeepSeek v2: My Personal Take
The main improvements of DeepSeek-V2 compared to DeepSeek-V1 include Multi-head Latent Attention (MLA) and Group Relative Policy…
Apr 24

Published in Towards Explainable AI
Understanding Llama2 and Llama3: A Personal Perspective
Llama 2
Apr 21

Published in AI Mind
A Personal Explanation of O1
The O1 model is quite different from the previous GPT-1 to GPT-4 models. In my opinion, it is a model constructed based on the chain-of-thought…
Apr 16