We provide a worldwide collection of speech datasets that are diverse, scalable, and meticulously transcribed, well suited to training machines to accurately recognize and understand a wide range of languages, dialects, and accents.
Step 1: Machine annotation
Step 2: Human transcription and validation
Step 3: Three rounds of QA by humans and machines
Accuracy: 95% to 98%
Surfing Tech applies its own algorithm during speech annotation to ensure high efficiency and accuracy. After three rounds of quality inspection, we achieve an accuracy rate above 95%, which makes the datasets more valuable for speech recognition, semantic understanding, and human-computer interaction.
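How the accuracy figure is measured is not stated here. One common convention for transcription quality is word accuracy, defined as 1 minus the word error rate against a trusted reference transcript. The sketch below is a minimal illustration under that assumption only; the function names `edit_distance` and `word_accuracy` are hypothetical and do not represent Surfing Tech's own algorithm.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of an extra token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference, transcript):
    """Assumed metric: 1 - (word-level edit distance / reference length)."""
    ref = reference.split()
    hyp = transcript.split()
    if not ref:
        # An empty reference is only matched by an empty transcript.
        return 1.0 if not hyp else 0.0
    # Clamp at 0 because many insertions can push the error rate above 1.
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    transcript = "the cat sat on a mat"
    print(f"word accuracy: {word_accuracy(reference, transcript):.2%}")
```

For languages written without spaces, such as Mandarin, the same `edit_distance` function could be applied to sequences of characters rather than whitespace-separated words.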
Chinese Mandarin: 10,000 speakers
Chinese Conversation: 500 speakers
Children Mandarin: 10,000 speakers
Senior Mandarin: 800 speakers
Hakka Dialect: 2,000 speakers
Southwest China: 1,000 speakers
Central China: 1,000 speakers
Mandarin-English Mixed: 9,000 speakers
American English: 1,500 speakers
Australian English: 1,000 speakers
Singaporean English: 300 speakers
French Conversation: 500 speakers
Contact us for more information.