I’ve recorded an Introduction to Cocoa segment about NSLinguisticTagger in NSBrief Podcast episode #72. On the surface it is a four-minute audio clip, but the process of conceptualizing the idea, writing the script, and finally recording (with several takes) took about a day to complete.
This is the software that I used in the process:
- BombingBrain Interactive’s Teleprompt+ for Mac – for displaying the script to read and timing it properly.
- Ittiam Systems’ ClearRecord Premium for iPhone – so that the audio comes out relatively noise-free even though I recorded it in a fairly noisy apartment near an intersection of two main streets.
For your benefit, I’ve included the script here and you can find my recording at the end of this post.
Hi! I’m Sasmito Adibowo and in the next four minutes you’re going to learn about NSLinguisticTagger – what it is, when you need it, and how to use it.
NSLinguisticTagger enables your OS X or iOS app to determine parts of speech in a body of text, such as:
- Tokens – word, punctuation, or whitespace
- Script – the writing system that makes up the text: Latin (standard keyboard characters plus a few accented letters), Cyrillic (used in Russia and neighboring countries that were members of the former Soviet Union), Han traditional or simplified (used in Chinese languages such as Mandarin and other dialects), and so on.
- Lemma – the root form of a word, for example the singular form of a plural word.
- Lexical Class – whether a word is a noun, verb, adjective, adverb, et cetera.
- Name Type – can be a place name, person name, or organization name.
Those are called “tag schemes” in NSLinguisticTagger's parlance and you can find out more about them in the API documentation.
NSLinguisticTagger is useful when you need to process natural language text. As far as I know, both OS X and iOS have good support for English whereas OS X has support for some other languages. To see which parts of speech NSLinguisticTagger can identify for your favorite language, call the class method availableTagSchemesForLanguage: and pass it the two-letter language code. It will return an array of strings naming the tag schemes it can parse for that language. Be sure to run the test separately on OS X and iOS.
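As a rough illustration of that check, here is a minimal command-line sketch. The three language codes are only examples I picked, and the program assumes it is compiled with ARC against Foundation (for instance `clang -fobjc-arc -framework Foundation`); the result depends on the OS and version you run it on, so no output is shown.

```objc
#import <Foundation/Foundation.h>

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        // Example language codes only; substitute the ones you care about.
        for (NSString *language in @[@"en", @"de", @"fr"]) {
            // Returns the tag schemes the tagger can handle for this language.
            NSArray *schemes =
                [NSLinguisticTagger availableTagSchemesForLanguage:language];
            NSLog(@"%@: %@", language, [schemes componentsJoinedByString:@", "]);
        }
    }
    return 0;
}
```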
One example that may call for NSLinguisticTagger is when you’re writing a note-taking app. You could try to recognize people’s names and company names in the text and then automatically tag the entry with the names found in the body text. You could also use NSLinguisticTagger to normalize your user’s tags and use lemmas as the canonical form for each tag – so that “people” and “person” aren’t considered two different tags.
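The note-taking idea could be sketched roughly as follows. Both helper functions, `NamesInText` and `CanonicalTag`, are hypothetical names made up for this illustration and are not part of any Apple API; the code assumes ARC. Whether a given word actually gets a lemma depends on the language support available on the device, so the second helper falls back to the lowercased word.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects person and organization names found in a note.
static NSSet *NamesInText(NSString *text) {
    NSMutableSet *names = [NSMutableSet set];
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeNameType]
                                               options:0];
    tagger.string = text;

    // JoinNames keeps multi-word names such as "Tim Cook" as a single token.
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace
                                      | NSLinguisticTaggerOmitPunctuation
                                      | NSLinguisticTaggerJoinNames;
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeNameType
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        if ([tag isEqualToString:NSLinguisticTagPersonalName] ||
            [tag isEqualToString:NSLinguisticTagOrganizationName]) {
            [names addObject:[text substringWithRange:tokenRange]];
        }
    }];
    return names;
}

// Hypothetical helper: returns the lemma of a one-word tag if the tagger
// provides one, otherwise the lowercased word.
static NSString *CanonicalTag(NSString *word) {
    if (word.length == 0) {
        return word;
    }
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLemma]
                                               options:0];
    tagger.string = word;
    NSString *lemma = [tagger tagAtIndex:0
                                  scheme:NSLinguisticTagSchemeLemma
                              tokenRange:NULL
                           sentenceRange:NULL];
    return lemma.length > 0 ? lemma : word.lowercaseString;
}
```

An app might call `NamesInText` on the note body when it is saved and run each user-entered tag through `CanonicalTag` before storing it.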
Another example is if you’re writing an e-mail client or a feed-reading app – Twitter or app.net apps come to mind. Your app can try to detect the language of the text being displayed and offer a machine translation if the text is not in the user’s preferred language. Of course NSLinguisticTagger won’t help you translate text between languages, but it can help determine which language the text belongs to.
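A minimal sketch of that language check, assuming ARC: `LanguageOfText` and `ShouldOfferTranslation` are hypothetical helper names, and comparing by prefix against the first entry of `+[NSLocale preferredLanguages]` is a simplification chosen for this example, not a recommended matching rule.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: asks the tagger for the language of the text at its
// first character. May return nil if the language cannot be determined.
static NSString *LanguageOfText(NSString *text) {
    if (text.length == 0) {
        return nil;
    }
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLanguage]
                                               options:0];
    tagger.string = text;
    return [tagger tagAtIndex:0
                       scheme:NSLinguisticTagSchemeLanguage
                   tokenRange:NULL
                sentenceRange:NULL];
}

// Hypothetical helper: YES when the detected language differs from the
// user's first preferred language.
static BOOL ShouldOfferTranslation(NSString *text) {
    NSString *detected = LanguageOfText(text);
    NSString *preferred = [[NSLocale preferredLanguages] firstObject];
    if (detected == nil || preferred == nil) {
        return NO; // Not enough information to decide.
    }
    // Preferred languages can carry a region suffix (e.g. "en-US"),
    // so only the leading language code is compared here.
    return ![preferred hasPrefix:detected];
}
```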
NSLinguisticTagger also works great for making word clouds. You can use it to filter out noise like text from other languages, remove uninteresting words, and keep people’s names together so that each name displays as one word in the resulting word cloud.
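One way the word-cloud filtering could look is sketched below, assuming ARC. `WordCloudCounts` is a hypothetical helper, and treating only nouns and names as "interesting" is my own choice for the example; filtering out text in other languages is not covered here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts nouns and names in the text, keeping
// multi-word names together as a single entry.
static NSCountedSet *WordCloudCounts(NSString *text) {
    NSCountedSet *counts = [NSCountedSet set];
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc]
            initWithTagSchemes:@[NSLinguisticTagSchemeNameTypeOrLexicalClass]
                       options:0];
    tagger.string = text;

    // Tags considered interesting for this example; everything else is dropped.
    NSSet *interesting = [NSSet setWithObjects:NSLinguisticTagNoun,
                                               NSLinguisticTagPersonalName,
                                               NSLinguisticTagPlaceName,
                                               NSLinguisticTagOrganizationName,
                                               nil];

    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace
                                      | NSLinguisticTaggerOmitPunctuation
                                      | NSLinguisticTaggerJoinNames;
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        if (tag != nil && [interesting containsObject:tag]) {
            [counts addObject:[text substringWithRange:tokenRange]];
        }
    }];
    return counts;
}
```

The word's weight in the cloud can then be read with `-[NSCountedSet countForObject:]`.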
To use NSLinguisticTagger, you create an instance of it and specify which tag schemes you want the instance to parse. Then call setString: on it to provide the instance with the text that you want to process. Finally, to start tagging the text, you call enumerateTagsInRange:scheme:options:usingBlock:. It will call your block repeatedly and pass it the range of each token that it finds along with its corresponding tag.
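Those three steps might look like this in a small command-line program. The sample sentence is made up, the lexical-class scheme is just one possible choice, and the code assumes ARC; the tags you get back depend on the OS, so no output is shown.

```objc
#import <Foundation/Foundation.h>

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        // Sample text invented for this illustration.
        NSString *text = @"Apple makes computers in California.";

        // 1. Create a tagger for the schemes you want it to parse.
        NSLinguisticTagger *tagger =
            [[NSLinguisticTagger alloc]
                initWithTagSchemes:@[NSLinguisticTagSchemeLexicalClass]
                           options:0];

        // 2. Hand it the text to process.
        [tagger setString:text];

        // 3. Enumerate the tags, skipping whitespace and punctuation tokens.
        NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace
                                          | NSLinguisticTaggerOmitPunctuation;
        [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                              scheme:NSLinguisticTagSchemeLexicalClass
                             options:options
                          usingBlock:^(NSString *tag, NSRange tokenRange,
                                       NSRange sentenceRange, BOOL *stop) {
            // tokenRange locates the token inside the original string.
            NSLog(@"%@ -> %@", [text substringWithRange:tokenRange], tag);
        }];
    }
    return 0;
}
```

Setting `*stop = YES` inside the block ends the enumeration early.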
I’ve posted an example project that shows you how you can use NSLinguisticTagger. In short, the app is a syntax highlighter for natural language. It takes English-language text and colors the nouns and names in the text. Search for ColorizeWords on GitHub or take a look at the show notes.
That’s all I have now for NSLinguisticTagger. Happy tagging!
Here is my recording of the podcast – you can find the complete episode on NSBrief’s podcast page.