Feb 9, 2011 at 5:13 PM

I noticed that DotSpatial uses a new data provider: OgrVectorProvider. By default, OgrVectorProvider is used for opening shapefiles.

There is also a ShapefileDataProvider in the DotSpatial.Data source code. I have a couple of questions:

  1. How does DotSpatial decide which VectorProvider to use?
  2. How do OgrVectorProvider and ShapefileDataProvider compare in terms of speed and memory usage?



Feb 9, 2011 at 5:38 PM

1) First come, first served. So if you want to change the priority, you need to change the order in the DataProviders list on the DataManager.
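To illustrate the "first come, first served" rule, here is a minimal, language-neutral sketch in Python. It is not DotSpatial code: the `Provider` class, `can_open`, and `pick_provider` are hypothetical names, and the extension lists are assumptions made for the example. It only shows how list order decides which provider wins when several accept the same file.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical stand-in for a vector data provider (not the DotSpatial type)."""
    name: str
    extensions: tuple  # assumed list of supported file extensions

    def can_open(self, filename: str) -> bool:
        return filename.lower().endswith(self.extensions)


def pick_provider(providers, filename):
    """Return the first provider, in list order, that accepts the file, else None."""
    for provider in providers:
        if provider.can_open(filename):
            return provider
    return None


# Assumed registration order: the OGR provider happens to come first.
providers = [
    Provider("OgrVectorProvider", (".shp", ".kml")),
    Provider("ShapefileDataProvider", (".shp",)),
]
first = pick_provider(providers, "roads.shp")

# Moving the shapefile provider to the front changes which one is picked.
reordered = sorted(providers, key=lambda p: p.name != "ShapefileDataProvider")
second = pick_provider(reordered, "roads.shp")
```

With the assumed order, `first` is the OGR provider; after the reorder, `second` is the shapefile provider. Removing a provider from the list has the same effect as moving it behind every other provider that accepts the extension.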

2) I have no idea. I believe this was an experimental implementation by some folks, and I don't know how good the OgrVectorProvider is when working with shapefiles. I know our default implementation likes to load the vectors into RAM, so once loaded it should be fairly quick. I suspect that the OgrVectorProvider does the same thing, though I haven't looked at it. If it loads the shapes into memory and also loads the attributes into memory, some medium-sized shapefiles (100k records or so) will likely trip your memory limits. If it behaves like our default implementation (which I doubt), it will avoid loading attributes when there are more than 50k records, in which case memory is likely no worse with OGR. I wouldn't count on performance, though. Personally, I think doing anything through a proxy like GDAL ends up being slower, but if it is written well, you shouldn't experience a major change. If it is poorly written, it could be a disaster.

It might be worth doing some tests, Jiri. It is only one line of code to change the priority in the DataManager, so you could get a test shapefile and switch back and forth between enabling and disabling OGR. You could even remove the OGR data provider from the list completely just to be sure. Do some tests and keep whichever one works best.



Feb 9, 2011 at 6:28 PM

I recommend using System.Diagnostics.Stopwatch (http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx) for any such tests.



Feb 9, 2011 at 6:35 PM

Yes, that is how I usually do it. I'm afraid I have never been so well-off that I have had an edition of Visual Studio with actual performance profiling tools. I think Java has it built in for everyone to use, but I have not tried it yet. That type of tool is more about finding the bottlenecks in code that should be running faster but isn't. For something like this, you are not concerned with WHY OGR is slow so much as with a simple comparison between the two, so a stopwatch is precise enough to give you results. The worst performance is expected for polygons, so you can probably prioritize the polygon testing first and see how that goes. Lines and points may be simpler, and it might be harder to find discrepancies.
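A stopwatch-style comparison along these lines might look like the following sketch. It is written in Python with `time.perf_counter` standing in for the .NET Stopwatch, and the two loader functions are hypothetical placeholders: in a real test each would open the same shapefile through one of the two providers.

```python
import time


def time_call(func, repeats=3):
    """Run func `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def compare(loaders, repeats=3):
    """Time each named loader and return a dict of {name: best_seconds}."""
    return {name: time_call(loader, repeats) for name, loader in loaders.items()}


# Hypothetical placeholders; replace the bodies with the real load calls.
def load_with_default():
    return sum(range(10_000))


def load_with_ogr():
    return sum(range(20_000))


results = compare({"default": load_with_default, "ogr": load_with_ogr})
for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} s")
```

Taking the best of several runs reduces noise from disk caching and other processes. Note that the first load of a file may still be slower than later ones, so it helps to use the same file and the same number of repeats for both providers.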



Feb 10, 2011 at 10:24 AM

While a stopwatch tells you whether it is fast or not, you still might not know where the bottleneck is.

Why don't we apply for an open-source license for JetBrains' dotTrace profiler (performance/memory)? dotCover might be worth it, too.


Feb 16, 2011 at 1:01 AM

Thanks for your comments.

I'm not saying that OgrVectorProvider is slow by any means, but I'd be interested in comparing it with ShapefileDataProvider in terms of speed and memory usage.

Is there a way to specify in code which vector provider to use? I tried DataManager.DefaultDataManager.OpenVector, but this always seems to pick the default vector provider based on the file extension.


Feb 20, 2011 at 6:45 PM

FYI, I added the OgrVectorProvider. I actually added it to get KML support. Shapefile support just happens to come for free.

When the project was added, I turned off compilation for this project in the solution build configuration so it would not replace the standard shape provider. If it is getting in the way, just remove it from the projects that are being built.

As for speed: I did very unscientific comparisons on relatively small data sets (100K linear features with a large number of vertices per feature) and found it perhaps 10%-15% slower than the standard DotSpatial shp provider.

The OgrProvider simply loads the features into a FeatureSet, so after the initial load it should perform the same as the standard provider. The exception is that it also loads all attribute data on the initial load, so in cases where you need the attributes it may actually be faster.

I'm on to another project right now but I hope I'll get back to this before long.