On Utilizing Parallel Sentence Corpora

Most of us have relied on a dictionary when working on an essay or writing assignment. In my Chinese studies, I have often used the dictionary app Pleco to complete my assignments. But sooner or later we run into the limits of what a dictionary can do.

For example, the word “lost” can have a range of meanings in English:
“I lost my wallet.” vs “I lost money in the stock market.” vs “I lost the competition.”

If you look up the word “lost” in a dictionary, you might find:
失去的 or 丢失的

If you hastily pick one of these words, you may not realize that it does not cover the same range of meanings in Chinese that “lost” does in English. For example, neither of the words above can be used to translate the sentence “I lost the competition.” This is where parallel sentence corpora can be very helpful.

What are parallel sentence corpora?
Parallel sentence corpora, or parallel text corpora, are large databases that store original sentences alongside their translations, drawn from various sources (e.g. textbooks, popular books, and songs). By searching these databases, you can see how words are used in authentic contexts. For example:

http://jukuu.com/search.php?q=lost+wallet
http://jukuu.com/search.php?q=lost+stock+market
http://jukuu.com/search.php?q=lost+competition
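
If you look up a lot of phrases, you can generate these search links instead of typing them by hand. Here is a minimal sketch in Python. It assumes only the URL pattern visible in the links above (a `q` parameter with words joined by `+`); the function name `jukuu_search_url` is my own, and Jukuu may change how its search page works.

```python
from urllib.parse import urlencode

# Base address taken from the example links above.
JUKUU_SEARCH = "http://jukuu.com/search.php"


def jukuu_search_url(phrase: str) -> str:
    """Build a Jukuu search link for a phrase (hypothetical helper).

    urlencode escapes the phrase and turns spaces into '+',
    which matches the pattern of the links shown above.
    """
    return f"{JUKUU_SEARCH}?{urlencode({'q': phrase})}"


if __name__ == "__main__":
    for phrase in ["lost wallet", "lost stock market", "lost competition"]:
        print(jukuu_search_url(phrase))
```

Searching for the word together with its context (“lost competition” rather than just “lost”) is what brings up sentences with the sense you actually need.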

Jukuu is primarily for English/Chinese and English/Japanese sentences. Tatoeba is another website that has accumulated millions of sentences (and their translations) across 200+ languages.
