Closed

Unicode Character Support in Attribute Data

description

Unicode characters (UTF-8 text in Persian and other languages) are not displayed correctly when attribute data fields are loaded from a .dbf file. Each such character is shown as a "?" character.
Please provide help on this issue.

comments

mudnug wrote Apr 20, 2012 at 5:20 PM

Can you attach a sample dbf and shape file?

rozaraja wrote Apr 20, 2012 at 6:25 PM

Thanks for your answer.
I attached the states sample data. I added a new data column, "new_label", to the .dbf file to store the Persian names of the states. For example, I saved "داکوتای شمالی" in the 4th row (North Dakota). But when I close and reload the map and labels, it shows "??????? ?????" instead of "داکوتای شمالی".

mudnug wrote May 10, 2012 at 5:21 PM

This error seems to be related to the way the data is saved.

rozaraja wrote May 10, 2012 at 7:07 PM

Thanks for your answer.
But when I open the attribute file in software such as ArcGIS, it shows the content correctly. Is there any way to display these Unicode characters correctly using the DotSpatial library?

mudnug wrote May 14, 2012 at 7:21 PM

We tried opening this in ArcGIS and other products, with the same results. Most likely the .dbf part of the shapefile, which holds the attributes, is not written in Unicode: http://forums.arcgis.com/threads/35454-Cannot-display-arabic-font

mudnug wrote May 14, 2012 at 7:21 PM

Same results meaning: ???

mudnug wrote May 29, 2012 at 8:17 PM

FObermaier wrote May 30, 2012 at 9:44 AM

This is how ESRI handles the codepage issue internally
http://support.esri.com/en/knowledgebase/techarticles/detail/21106
The essence:
There is a LanguageDriver Id (LDID) (byte 29) that refers to some codepage.
(see http://code.google.com/p/sharpmapv2/source/browse/trunk/SharpMap.Data.Providers/ShapeFileProvider/DbaseEncodingRegistry.cs)
Alternatively, there is an ASCII *.cpg file that holds that value.
If neither is present/set Windows ANSI/MultiByte is assumed.
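
As a rough illustration of the lookup order described above, here is a minimal Python sketch (not DotSpatial or SharpMap code). The function names, the small LDID table, and the `cp1252` fallback are assumptions made for the example; the table is only a partial subset and should be checked against the registry linked above, and the real "Windows ANSI" fallback depends on the system locale.

```python
import codecs
import os

# Partial, illustrative LDID -> Python codec table (an assumption for this
# sketch; see the DbaseEncodingRegistry linked above for a fuller list).
LDID_TO_CODEC = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
    0x57: "cp1252",  # ANSI
    0x4D: "gbk",     # Chinese GBK (code page 936)
    0x7E: "cp1256",  # Arabic Windows
    0xC8: "cp1250",  # Eastern European Windows
    0xC9: "cp1251",  # Russian Windows
}


def read_ldid(dbf_path):
    """Return the Language Driver ID stored at byte 29 of the .dbf header."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("header shorter than 32 bytes, not a dBASE file")
    return header[29]


def detect_dbf_encoding(dbf_path, fallback="cp1252"):
    """Guess the text encoding: LDID first, then the .cpg file, then a fallback."""
    ldid = read_ldid(dbf_path)
    if ldid in LDID_TO_CODEC:
        return LDID_TO_CODEC[ldid]

    # No usable LDID (for example 0x00): look for a sibling .cpg file.
    cpg_path = os.path.splitext(dbf_path)[0] + ".cpg"
    if os.path.exists(cpg_path):
        with open(cpg_path, "rb") as f:
            name = f.read().decode("ascii", errors="ignore").strip()
        try:
            return codecs.lookup(name).name
        except (LookupError, ValueError):
            pass  # unknown or malformed code page name, fall through

    # Neither is present/set: assume an ANSI code page (hypothetical default).
    return fallback
```

For example, `detect_dbf_encoding("states.dbf")` would return the codec name to pass to `bytes.decode` when reading character fields.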

FObermaier wrote May 30, 2012 at 9:45 AM

Btw, for SharpMap, we've found it helpful to set the Encoding manually, since there seem to be (a lot of) shapefile/dBase combinations where that does not help either.
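
To show why a manual override matters, here is a small Python sketch (again, not SharpMap or DotSpatial code; `decode_field` is a hypothetical helper). The same raw field bytes give the intended text only when decoded with the code page they were written in:

```python
def decode_field(raw: bytes, encoding: str) -> str:
    """Decode one fixed-width .dbf character field with a caller-chosen encoding."""
    # Character fields are padded to their declared width, so trim the padding.
    return raw.decode(encoding, errors="replace").rstrip(" \x00")


# A 20-byte field holding an Arabic-script word written as Windows-1256.
raw_field = "سلام".encode("cp1256").ljust(20, b" ")

right = decode_field(raw_field, "cp1256")  # the original word
wrong = decode_field(raw_field, "cp1252")  # Latin mojibake, not the original
```

Letting the caller pass `encoding` explicitly covers files whose header and .cpg file are missing or wrong.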

mudnug wrote Jul 3, 2012 at 3:12 PM

arnolde wrote Jan 5 at 12:50 AM

The attached shapefile (states_UnicodeIssue.shp) will work (in ArcMap) on a system that is localized for Arabic, but not on other systems. This is why it works for rozaraja and not mudnug.

You can set the ANSI code page for your system (assuming Windows 7) at Control Panel->Region and Language->Administrative, under "Language for non-Unicode programs". ESRI apparently honors this setting when the DBF file does not have its Language Driver ID (LDID) correctly set. The attached shapefile has the LDID set to 0x00, which is not valid. This is basically what FObermaier states above ("If neither is present/set Windows ANSI/MultiByte is assumed.").

arnolde wrote Jan 18 at 6:28 PM

...sorry, I meant Persian. :)

arnolde wrote Jan 22 at 11:36 PM

I've attached source code to support the LanguageDriverID in shapefiles along with their associated text encoding. There are two files: A modified DotSpatial.Data.AttributeTable, and a new support class: DotSpatial.Data.DbaseLocaleRegistry taken and modified from SharpMap v.2.

I will also upload a sample file with Chinese. Note that now that I've examined the Persian example previously attached (states_UnicodeIssue.shp) at the byte level, I see that Persian text is already lost in this file. My guess is it was converted to question marks (which is what is actually in the file) when the file was written. Therefore, it is not a valid file for testing.
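
The kind of loss described here can be reproduced with a short Python sketch. It assumes, purely for illustration, that the writer encoded the text with a code page that has no Persian letters (`cp1252` here; which code page was actually used is not known), and both function names are hypothetical:

```python
def write_lossy(text: str, encoding: str = "cp1252") -> bytes:
    """Encode text the way a writer with the wrong code page might:
    every character the code page cannot represent becomes b'?'."""
    return text.encode(encoding, errors="replace")


def only_question_marks(raw: bytes) -> bool:
    """True if a field holds nothing but literal '?' (0x3F) and spaces,
    i.e. the original characters are gone from the file itself."""
    stripped = raw.strip(b" ")
    return len(stripped) > 0 and set(stripped) <= {0x3F, 0x20}


stored = write_lossy("داکوتای شمالی")

# Once the bytes on disk are 0x3F, no choice of decoder brings the text back.
recovered = stored.decode("cp1256")
```

A check like `only_question_marks` on the raw field bytes is one way to tell a file that was damaged at write time from one that is merely being decoded with the wrong code page.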

arnolde wrote Jan 22 at 11:37 PM

DbaseLocaleRegistry attached.

arnolde wrote Jan 22 at 11:46 PM

This Chinese file is compressed using 7z (http://www.7-zip.org/) as it supports Unicode filenames.

arnolde wrote Jan 23 at 6:51 PM

I found a bug in the Write portion which corrupts files written by the new code. I've therefore deleted the AttributeTable.cs attachment. I will update and test again before posting.

Also: there are unit tests for DotSpatial, correct? Do they require VS2012? I have VS2010, and do not see them when loading the VS2010 solution.

arnolde wrote Jan 23 at 10:04 PM

AttributeTable.cs corrected and attached.