📖 Tutorial
June 9, 2026 · 8 min read

Getting Started with Chinese LLMs: A Developer's Guide

Everything you need to know about Chinese large language models, from what they are to how to use them in your projects through a single OpenAI-compatible API.

📑 Table of Contents

1. What Are Chinese LLMs?
2. Why Developers Should Care
   2.1 Benchmark Performance
   2.2 Pricing Advantage
3. Getting Started with tokencnn
   3.1 Python Example
   3.2 Node.js Example
   3.3 cURL Example
4. Recommended Models
5. Best Practices
6. Next Steps

1. What Are Chinese LLMs?

Chinese Large Language Models are AI models developed by Chinese technology companies. They're trained on massive datasets that include both English and Chinese text, giving them unique capabilities in understanding and generating content across languages and cultural contexts.

The major Chinese LLM families covered in this guide include DeepSeek, Qwen (Alibaba), ERNIE (Baidu), and GLM (Zhipu).

2. Why Developers Should Care

Chinese LLMs have rapidly become some of the best-performing and most cost-effective models available globally. Here's why you should pay attention.

2.1 Benchmark Performance

Chinese models now compete head-to-head with GPT-4, Claude, and Gemini on key benchmarks:

| Model | MMLU | HumanEval | GSM8K | Chinese-language rating |
| --- | --- | --- | --- | --- |
| DeepSeek-V4 Pro | 91.2% | 92.7% | 96.3% | 🌟🌟🌟🌟🌟 |
| Qwen-Max | 88.4% | 86.5% | 93.1% | 🌟🌟🌟🌟🌟 |
| GLM-4 | 86.3% | 82.1% | 91.8% | 🌟🌟🌟🌟🌟 |
| ERNIE 4.0 | 87.1% | 79.4% | 92.5% | 🌟🌟🌟🌟🌟 |

Many Chinese models rival or surpass leading Western models on standard benchmarks, while offering significantly better performance on Chinese-language tasks and Chinese cultural contexts.

2.2 Pricing Advantage

Chinese LLMs offer exceptional value. Here's a cost comparison with Western equivalents (per million tokens):

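| Task | Best Chinese Model | Price (USD per 1M tokens) | Comparable Western Model | Price (USD per 1M tokens) |
| --- | --- | --- | --- | --- |
| Reasoning | DeepSeek-R1 | $0.83 | o1 | $15.00 |
| General Chat | DeepSeek-V4 Pro | $0.21 | GPT-4o | $2.50 |
| General Chat (Lite) | Qwen-Plus | $0.12 | GPT-4o-mini | $0.15 |
| Free Tier | GLM-4-Flash | $0.00 | n/a | n/a |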

💡 DeepSeek-R1 costs $0.83/1M tokens through tokencnn vs $15.00/1M for o1. That's about 94% cheaper for comparable reasoning performance.
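To check a savings figure yourself, or to estimate a bill before you send traffic, the arithmetic is short. The sketch below uses the listed prices for DeepSeek-R1 and o1; the helper names `savings_pct` and `cost_usd` are made up for this example, and real bills may price input and output tokens differently.

```python
def savings_pct(cheaper: float, pricier: float) -> float:
    """Percentage saved by paying `cheaper` instead of `pricier`."""
    return (1 - cheaper / pricier) * 100


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of `tokens` tokens at a flat price per one million tokens."""
    return tokens / 1_000_000 * price_per_million


# DeepSeek-R1 ($0.83) vs o1 ($15.00), both per 1M tokens
print(f"{savings_pct(0.83, 15.00):.1f}% cheaper")  # 94.5% cheaper

# 2M tokens on a $0.21 per 1M model
print(f"${cost_usd(2_000_000, 0.21):.2f}")  # $0.42
```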

3. Getting Started with tokencnn

The easiest way to access Chinese LLMs is through the tokencnn API gateway (powered by AI Nexus). Simply swap in the tokencnn base URL and API key in your existing OpenAI client; no other code changes are needed.

Our standard base URL is:

https://www.tokencnn.com/v1

⚠️ Prerequisites: Sign up at tokencnn.com to get your API key (sk-nex-...). Free credits are included, and no payment method is required.

3.1 Python Example

Using the OpenAI Python SDK (v1.0+):

# pip install openai
from openai import OpenAI

client = OpenAI(
  api_key="sk-nex-your-api-key-here",
  base_url="https://www.tokencnn.com/v1"
)

# Simple chat completion
response = client.chat.completions.create(
  model="deepseek-chat",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Chinese LLMs in 3 bullet points."}
  ],
  temperature=0.7,
  max_tokens=500
)

print(response.choices[0].message.content)

Streaming example:

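# Reuses the `client` created in the example above
stream = client.chat.completions.create(
  model="deepseek-chat",
  messages=[{"role": "user", "content": "Write a short poem."}],
  stream=True
)
for chunk in stream:
  # Skip chunks that carry no choices or no text delta
  if chunk.choices and chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="", flush=True)
print()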
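If you need the full reply as one string (for logging or post-processing) rather than printing it piece by piece, you can accumulate the deltas. This is a minimal sketch: `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only mimic the `choices[0].delta.content` shape used in the streaming loop so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content is not None:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for an SDK chunk, used only for this demo."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk("Hello"),
    fake_chunk(None),             # delta without text
    SimpleNamespace(choices=[]),  # chunk without choices
    fake_chunk(", world"),
]
print(collect_stream(demo))  # Hello, world
```

With a real client, pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `demo`.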

3.2 Node.js Example

Using the OpenAI Node.js SDK:

// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-nex-your-api-key-here",
  baseURL: "https://www.tokencnn.com/v1",
});

async function main() {
  const response = await client.chat.completions.create({
    model: "qwen-max",
    messages: [{ role: "user", content: "What's the capital of China?" }],
  });
  console.log(response.choices[0].message.content);
}
main().catch(console.error); // surface API or network errors instead of an unhandled rejection

3.3 cURL Example

For quick testing in the terminal:

curl https://www.tokencnn.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-nex-your-api-key-here" \
  -d '{
  "model": "deepseek-chat",
  "messages": [{"role": "user", "content": "Hello!"}]
}'

5. Best Practices

Start with Free Models

Use GLM-4-Flash for development and testing. It's free and supports all standard OpenAI-compatible features. Only switch to paid models when you need the extra performance.
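One way to apply this is to choose the model from an environment setting, so development runs default to the free model and only production uses a paid one. The sketch below rests on assumptions: the `APP_ENV` variable is an invented convention, and `"glm-4-flash"` is a guessed model ID for GLM-4-Flash, so check the gateway's model list for the exact string.

```python
import os

# Assumed model IDs: "deepseek-chat" appears in the examples above,
# "glm-4-flash" is a guess at the ID for GLM-4-Flash.
DEV_MODEL = "glm-4-flash"
PROD_MODEL = "deepseek-chat"


def pick_model(env=None):
    """Return the paid model only when APP_ENV is 'production'."""
    env = os.environ if env is None else env
    if env.get("APP_ENV") == "production":
        return PROD_MODEL
    return DEV_MODEL


print(pick_model({"APP_ENV": "production"}))  # deepseek-chat
print(pick_model({}))                         # glm-4-flash
```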

Use a Single API Key

With tokencnn, one API key gives you access to 30+ Chinese models. No need to manage separate keys for DeepSeek, Alibaba, Baidu, and Zhipu. Keep your key secure using environment variables:

# .env file
OPENAI_API_KEY=sk-nex-your-api-key-here
OPENAI_BASE_URL=https://www.tokencnn.com/v1
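A small loader keeps the key out of your source code and fails early when it is missing. This is a sketch, not the only way to do it: `load_settings` is a hypothetical helper, and it assumes the two variables are already in the process environment (a `.env` file is not read by Python on its own; your shell or a tool such as python-dotenv has to load it).

```python
import os

DEFAULT_BASE_URL = "https://www.tokencnn.com/v1"


def load_settings(env=None):
    """Read the API key and base URL from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    base_url = env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL)
    return {"api_key": api_key, "base_url": base_url}


# Demo with an explicit mapping instead of the real environment
settings = load_settings({"OPENAI_API_KEY": "sk-nex-your-api-key-here"})
print(settings["base_url"])  # https://www.tokencnn.com/v1
```

The returned dictionary matches the keyword arguments used in the Python example, so `client = OpenAI(**load_settings())` builds the client from the environment.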

Model Selection Strategy

Error Handling

Our API returns standard HTTP status codes. Common ones to handle:
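As a rough sketch of how a client might branch on the status code, the function below follows general HTTP conventions: 401/403 for key or permission problems, 429 for rate limiting, 5xx for server-side failures. The exact codes tokencnn returns are not confirmed here, and `classify_status` and its category names are invented for illustration.

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to a coarse handling category."""
    if 200 <= code < 300:
        return "ok"
    if code in (401, 403):
        return "auth"          # check the API key; retrying will not help
    if code == 429 or 500 <= code < 600:
        return "retry"         # transient: back off, then try again
    if 400 <= code < 500:
        return "client_error"  # fix the request before resending
    return "unexpected"


for code in (200, 401, 429, 503, 400):
    print(code, classify_status(code))
```

Keeping "retry" separate from "auth" and "client_error" matters because only the first group benefits from trying again.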

Rate Limiting

Start with conservative request rates and increase gradually. Use streaming for real-time responses. Implement retries with exponential backoff for production applications.

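import random
import time

import openai

# Transient failures worth retrying; auth and bad-request errors are not
RETRYABLE = (openai.RateLimitError, openai.APIConnectionError, openai.InternalServerError)

def call_with_retry(client, model, messages, max_retries=3):
  """Call the chat endpoint, retrying transient errors with exponential backoff."""
  for attempt in range(max_retries):
    try:
      return client.chat.completions.create(
        model=model, messages=messages
      )
    except RETRYABLE:
      if attempt == max_retries - 1:
        raise  # out of attempts: surface the last error
      # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter
      time.sleep(2 ** attempt + random.uniform(0, 1))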
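Retries handle failures after the fact; a conservative request rate can also be enforced up front. The sketch below spaces calls by a minimum interval. `Throttle` is a hypothetical helper, not part of tokencnn or the OpenAI SDK, and the rate in the demo is arbitrary because the gateway's actual limits are not stated here.

```python
import time


class Throttle:
    """Enforce a minimum gap between calls (a simple client-side rate limit)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_allowed = now + self.min_interval
        return max(delay, 0.0)


# Demo: three calls at no more than 50 per second (20 ms apart)
throttle = Throttle(max_per_second=50)
for i in range(3):
    waited = throttle.wait()
    print(f"call {i} waited {waited:.3f}s")
```

Call `throttle.wait()` right before each API request, start with a low `max_per_second`, and raise it gradually once you stop seeing rate-limit errors.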

Leverage Free Credits

New tokencnn accounts receive free credits on signup. Use them to experiment with different models before committing to a pricing plan. No payment method is required to get started.

6. Next Steps

You now have everything you need to start building with Chinese LLMs. Here's what to do next:

  1. Sign up at tokencnn.com: get your API key instantly
  2. Try GLM-4-Flash: it's free, no strings attached
  3. Experiment with DeepSeek-V4 Pro: use your free credits
  4. Test multiple models: switch model names to find your best fit
  5. Go to production: one API key, one base URL, 30+ models