Pinned
Everything You Need to Know About the GPT Series Models - From GPT-1 to O1 (Detailed Long-Form…
1.1 GPT-1: The Founder of the Generative Pretraining Paradigm
Mar 22

Published in Data Science Collective
Long Text Processing Method — Yarn
Previously, we introduced the RoPE algorithm for handling long texts, as shown in the essay below. However, during inference, if the text…
4d ago

Published in Towards AI
The Comparison between the Encoder and the Decoder
This article primarily discusses the advantages and disadvantages of large language models based on encoder and decoder architectures. Both…
May 14

Published in Towards AI
The Evolution of GRPO: DAPO
Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) is a reinforcement learning optimization algorithm. To thoroughly understand…
May 12

Published in Data Science Collective
Understanding DeepSeek R1: A Personal Perspective
For the best reading experience, it is recommended that you read this article after the DeepSeek-V1, DeepSeek-V2, and DeepSeek-V3 articles.
Apr 30

Published in Data Science Collective
DeepSeek v3: My Take on What Matters
For the best reading experience, it is recommended that you read this article after the DeepSeek-V1 and DeepSeek-V2 articles.
Apr 27

Published in Data Science Collective
Understanding DeepSeek v1 & MoE: My Personal Take
DeepSeek v1 mainly includes two major versions: one is the conventional version with a model structure similar to that of Llama, and the…
Apr 24

Published in Data Science Collective
Understanding DeepSeek v2: My Personal Take
The main improvements of DeepSeek-V2 compared to DeepSeek-V1 include Multi-head Latent Attention (MLA) and Group Relative Policy…
Apr 24

Published in Towards Explainable AI
Understanding Llama2 and Llama3: A Personal Perspective
Llama 2
Apr 21

Published in AI Mind
A Personal Explanation of O1
The O1 model is quite different from the previous GPT-1 to GPT-4 models. In my opinion, it is a model constructed based on the chain-of-thought…
Apr 16