Country Name Data Set

From Data Quality Wiki
Type: Data Set
Version: 26 (2014-09-18)
Author: David Fichtmueller (BGBM)
Status: Beta
Format: XML
License: Mozilla Public License 2.0 (MPL-2)


The Country Name Data Set provides country names in various languages and is based on the Unicode CLDR data set, from which only the data relevant to country names was extracted and condensed. Both the CLDR and the Country Name Data Set are copyrighted by Unicode Inc. (Copyright © 1991-2014 Unicode, Inc. All rights reserved. See Copyright Notice for details.)

The Country Name Data Set contains over 26,000 different country name strings from over 700 different languages, together with their corresponding ISO 3166-1 alpha-2 country codes.

Structure

The XML file has a fairly simple structure:

<countries>
	<country code="AC">
		<name lang="af">Ascensioneiland</name>
		<name lang="am">አሴንሽን ደሴት</name>
		<name lang="ar">جزيرة أسينشيون</name>
		<name lang="ast">Islla Ascensión</name>
		<name lang="az">Yüksəliş Adası</name>
		<name lang="bg">остров Възнесение</name>
		<name lang="bn">অ্যাসসেনশন আইল্যান্ড</name>
		<name lang="br">Enez Ascension</name>
		<name lang="bs">Ostrvo Asension</name>
		<name lang="bs_Cyrl">Острво Асенсион</name>
		<!-- ... -->
		<name lang="zu">i-Ascension Island</name>
	</country>
	<country code="AD">
		<name lang="af">Andorra</name>
		<name lang="agq">Àndolà</name>
		<!-- ... -->
	</country>
	<!-- ... -->
</countries>
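
As an illustration of how this structure can be read, the following is a minimal sketch using Python's standard xml.etree.ElementTree module. The inline sample only mirrors the excerpt above, and the helper name load_names is hypothetical; a real application would parse the downloaded file instead.

```python
import xml.etree.ElementTree as ET

# Small inline sample mirroring the structure shown above. To read the
# actual data set, parse the downloaded file with ET.parse(...) instead.
SAMPLE = """<countries>
	<country code="AC">
		<name lang="af">Ascensioneiland</name>
		<name lang="ast">Islla Ascensión</name>
	</country>
	<country code="AD">
		<name lang="af">Andorra</name>
		<name lang="agq">Àndolà</name>
	</country>
</countries>"""


def load_names(xml_text):
    """Return {country_code: {lang: name}} for a <countries> document.

    load_names is a hypothetical helper, not part of the data set.
    """
    root = ET.fromstring(xml_text)
    names = {}
    for country in root.findall("country"):
        code = country.get("code")
        # Each <name> element carries the language in its "lang" attribute.
        names[code] = {n.get("lang"): n.text for n in country.findall("name")}
    return names


names_by_code = load_names(SAMPLE)
print(names_by_code["AD"]["af"])  # prints "Andorra"
```

Keying the outer dictionary by country code keeps the mapping unambiguous, since each code appears once while a name string may not be unique.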

Duplicate Country Names

There are a few cases where the same name refers to different countries in different languages. Though this is quite rare, any application handling this data set should be aware of it and handle such cases accordingly. In the current version the following names are affected:

  • 'Guyane' refers to
    • 'GF' (Guyane) in 'de' (German) and 'en' (English)
    • 'GY' (Guyana) in 'rn' (Rundi)
  • 'Nigeri' refers to
    • 'NE' (Niger) in 'naq' (Nama)
    • 'NG' (Nigeria) in 'sq' (Albanian)
  • 'Sint Martin' refers to
    • 'MF' (Saint Martin) in 'af' (Afrikaans)
    • 'SX' (Sint Maarten) in 'sv_FI' (Swedish [Finland])
  • 'Sveti Martin' refers to
    • 'MF' (Saint Martin) in 'hr' (Croatian)
    • 'SX' (Sint Maarten) in 'sr_Latn' (Serbian [Latin])
  • 'Конго' refers to
    • 'CD' (Congo [DRC]) in 'kk' (Kazakh)
    • 'CG' (Congo [Republic]) in 'bg' (Bulgarian) and 'ru' (Russian)
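
One possible way to detect such cases is to group all name strings by country code and report every name that maps to more than one code. The following is a sketch only: the function name find_ambiguous_names is hypothetical, and the inline sample contains just the 'Конго' entries listed above plus one unambiguous country.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Inline sample built from the 'Конго' case listed above; the real data
# set would be parsed from the downloaded XML file instead.
SAMPLE = """<countries>
	<country code="AD">
		<name lang="af">Andorra</name>
		<name lang="agq">Àndolà</name>
	</country>
	<country code="CD">
		<name lang="kk">Конго</name>
	</country>
	<country code="CG">
		<name lang="bg">Конго</name>
		<name lang="ru">Конго</name>
	</country>
</countries>"""


def find_ambiguous_names(xml_text):
    """Return {name: {country_code: [langs]}} for names with several codes.

    find_ambiguous_names is a hypothetical helper, not part of the data set.
    """
    root = ET.fromstring(xml_text)
    # name string -> country code -> set of languages using that name
    codes_by_name = defaultdict(lambda: defaultdict(set))
    for country in root.findall("country"):
        code = country.get("code")
        for name in country.findall("name"):
            codes_by_name[name.text][code].add(name.get("lang"))
    # Keep only the names that are attached to more than one country code.
    return {
        name: {code: sorted(langs) for code, langs in by_code.items()}
        for name, by_code in codes_by_name.items()
        if len(by_code) > 1
    }


ambiguous = find_ambiguous_names(SAMPLE)
print(ambiguous)
```

An application that resolves names to codes could use such a result to require a language hint whenever the requested name is ambiguous, rather than silently picking one of the candidate codes.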


Other Information

  • The version number of the Country Name Data Set corresponds to the version number of the CLDR data set it is based on.