
Language detection using NLP

Text classification is the process of assigning sentences or documents to pre-defined categories. Language detection uses the same method: we classify text into categories, and those categories are our pre-defined languages.

Most modern NLP frameworks use the so-called "text embedding" method, where the text is transformed into a numerical vector and evaluated within a multi-dimensional space. Others use maximum entropy (maxent) or naive Bayes algorithms to evaluate the data.
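To make the naive Bayes approach mentioned above more concrete, here is a minimal sketch of a language detector built on character trigrams. It is only an illustration and not how the TEXT2DATA model is implemented: the class name, the function names and the tiny training sentences are all made up for this example, and a real model would be trained on far more text.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageDetector:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # language -> n-gram counts
        self.doc_counts = Counter()               # language -> number of training texts
        self.vocabulary = set()                   # all n-grams seen in training

    def train(self, samples):
        """samples: iterable of (text, language_code) pairs."""
        for text, language in samples:
            grams = char_ngrams(text, self.n)
            self.ngram_counts[language].update(grams)
            self.doc_counts[language] += 1
            self.vocabulary.update(grams)

    def scores(self, text):
        """Return {language: probability}, normalised so the values sum to 1."""
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for language, counts in self.ngram_counts.items():
            total = sum(counts.values())
            # Prior: share of training texts written in this language.
            log_p = math.log(self.doc_counts[language] / total_docs)
            # Likelihood: smoothed probability of each n-gram in this language.
            for gram in grams:
                log_p += math.log((counts[gram] + 1) / (total + vocab_size))
            log_scores[language] = log_p
        # Turn log scores into probabilities in a numerically stable way.
        peak = max(log_scores.values())
        exp_scores = {lang: math.exp(s - peak) for lang, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {lang: s / norm for lang, s in exp_scores.items()}

    def detect(self, text):
        """Return the most probable language code for the text."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for this example only.
TRAINING = [
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("this is a short sentence written in english", "en"),
    ("der schnelle braune fuchs springt über den faulen hund", "de"),
    ("dies ist ein kurzer satz auf deutsch", "de"),
    ("el rápido zorro marrón salta sobre el perro perezoso", "es"),
    ("esta es una frase corta escrita en español", "es"),
]

detector = NaiveBayesLanguageDetector()
detector.train(TRAINING)
print(detector.detect("the lazy dog"))    # en
print(detector.detect("der faule hund"))  # de
```

The probabilities returned by `scores` play a role similar to a per-language confidence value: when two languages share many character sequences, the probability mass is split between them instead of landing on a single language.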

No need to be an expert

You no longer need to be an NLP expert or a programmer to perform language detection on your own! You can use the TEXT2DATA service with the Excel Add-In or the Google Sheets add-on to do it in about 5 minutes!

Detecting over 50 languages in simple steps

Later in this article you will learn how to detect over 50 languages using the TEXT2DATA service. The following languages are currently supported:

Amharic-am Arabic-ar Azerbaijani-az Belarusian-be Bulgarian-bg Bengali-bn Bosnian-bs Catalan-ca Chechen-ce Czech-cs Welsh-cy Danish-da German-de Greek-el English-en Spanish-es Estonian-et Basque-eu Persian-fa Finnish-fi French-fr Irish-ga Hebrew-he Croatian-hr Hungarian-hu Indonesian-id Italian-it Japanese-ja Georgian-ka Korean-ko Kazakh-kaz Latin-la Macedonian-mk Mongolian-mn Malay-ms Dutch-nl Norwegian-no Polish-pl Portuguese-pt Romanian-ro Russian-ru Slovak-sk Slovenian-sl Serbian-sr Swedish-sv Thai-th Turkish-tr Ukrainian-uk Uzbek-uz Vietnamese-vi Chinese-zh
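If you want to turn the codes back into readable names in your own scripts, the list above can be parsed into a lookup table. The sketch below assumes the "Name-code" format shown in the list and uses only a short excerpt of it; the function names are hypothetical helpers, not part of any TEXT2DATA tool.

```python
def parse_language_list(raw):
    """Turn 'Name-code Name-code ...' into a {code: name} dictionary."""
    languages = {}
    for token in raw.split():
        # Split on the last hyphen so the code is always the final part.
        name, code = token.rsplit("-", 1)
        languages.setdefault(code, name)  # keep the first entry if a code repeats
    return languages


# Short excerpt of the supported-language list, in the same format.
EXCERPT = "Amharic-am Arabic-ar English-en Kazakh-kaz Chinese-zh"
CODE_TO_NAME = parse_language_list(EXCERPT)


def language_name(code):
    """Look up a language name, falling back to a marker for unknown codes."""
    return CODE_TO_NAME.get(code.lower(), f"unknown ({code})")


print(language_name("en"))   # English
print(language_name("kaz"))  # Kazakh
```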


Simple steps

1. Once you register at text2data.com, simply go to your admin panel.

2. Under "Api classification models", find the drop-down list of pre-trained models.

[Screenshot: selecting the model]

3. Select "language_detection" and copy it to your classification models.

4. Install our Excel Add-In or Google Sheets Add-on.

5. If you have already created other custom models, set "language_detection" as the default model in the TEXT2DATA admin panel, or set the model name in the service settings of the Excel Add-In or Google Sheets add-on.

[Screenshot: Excel model settings]

6. Finally, right-click or select the text and click "Categorize".

[Screenshot: running "Categorize" in Excel]

Once the analysis is done, open the report from the right-hand side task pane. You should see the list of assigned languages next to the analyzed documents. If the model is not entirely sure which language a document is written in, you will see multiple results for that document, each with its probability score, called "category strength" (ranging from 0 to 1).

[Screenshot: language detection results]
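When a document gets several candidate languages, you may want to keep only the strongest one and flag the uncertain cases. The sketch below shows one way to do that, assuming you have copied the results into rows of (document, language code, category strength). That row layout, the function name and the 0.5 threshold are assumptions made for this example, not the actual TEXT2DATA export format.

```python
from collections import defaultdict


def best_language_per_document(rows, min_strength=0.5):
    """Pick the strongest language for each document.

    rows: iterable of (document_id, language_code, strength),
          with strength between 0 and 1.
    Returns {document_id: (language_code, strength, is_confident)}.
    """
    candidates = defaultdict(list)
    for document_id, language, strength in rows:
        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"strength out of range for {document_id}: {strength}")
        candidates[document_id].append((strength, language))

    result = {}
    for document_id, scored in candidates.items():
        strength, language = max(scored)  # highest category strength wins
        result[document_id] = (language, strength, strength >= min_strength)
    return result


# Made-up example rows: doc2 has three competing languages.
rows = [
    ("doc1", "en", 0.97),
    ("doc2", "hr", 0.48),
    ("doc2", "bs", 0.41),
    ("doc2", "sr", 0.11),
]

for doc, (lang, strength, confident) in best_language_per_document(rows).items():
    note = "" if confident else " (low confidence, check manually)"
    print(f"{doc}: {lang} {strength:.2f}{note}")
```

Closely related languages tend to split the score between them, so a threshold like this is mainly useful for deciding which documents deserve a manual check.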


Using other custom models

Using other custom models at TEXT2DATA works the same way: just follow the steps above with a different model selected.


Do not forget to sign up at text2data.com and start testing out our sentiment analysis and text analytics tool.