PostGIS: ST_Equals false when ST_Intersection=100% of geometry?

المشرف العام

Administrator
طاقم الإدارة
I have 2 datasets consisting of cadastral parcel data - roughly 125,000 rows each. The geometry column is WKB polygons representing parcel boundaries; all data are geometrically valid (the polygons are closed etc).

Some recent data arrived in a different projection to the base data being used for a comparison job - so I reprojected the newer one (base was 4326; the other was WGA94 that got brought into PostGIS as 900914... I reprojected it to 4326).

The first stage of the analysis was to find and store non-matching parcels; part of that is to identify and store parcels with identical geometry.

So I ran a very standard query (the code block below abstracts away schema details etc):

create table matchdata as select a.* from gg2014 a, gg2013 b where ST_Equals(a.g1,b.g1)ZERO results.

"Odd..." I thought. "Perhaps there have been tiny vertex shifts caused by the reprojection: that would be annoying, and really shouldn't happen."

Fortunately there is abundant aspatial data (5 identifier columns) that enable me to establish parcels that should be spatially identical: those with the same identifier, whose change-date in the 2014 table was before the max change-date in the 2013 data. That amounted to 120,086 distinct rows.

I stored the identifiers and geometries in a separate table (match_id), and ran the following query:

select apid, bpid, ST_Area(ag::geometry) as aa, ST_Area(bg::geometry) as ab, ST_Area(ST_Intersection(ag,bg)::geometry)/ST_Area(ag::geometry) as inta, ST_Area(ST_Intersection(ag,bg)::geometry)/ST_Area(ag::geometry) as intbfrom match_idorder by intaThe first 16 values for inta and intb were identically zero, the next 456 were 0.99999999-ish (min 0.99999999999994, max 0.999999999999999), and rows 473 onwards were 1 - until row 120050, when the area of the intersection was greater than either geometry (the greatest value for inta and intb was 1.00000000000029, but still).

So here's my conundrum: if two geometries intersect spatially by between 99.999999999994% and 100.000000000029% of their respective areas, I would like "ST_Equals" to say "Yep.... I'll give you that one. Close enough".

After all, it's equivalent to being out by about 1 part in 16 trillion... i.e., as if the US national debt was off by less than 93 cents.

In the context of the circumference of the Earth (at ~40,000km), it's like being off by 0.0000000025km, tops (since to result in an area difference that small, any vertex shift must be even smaller).

According to TFD (which I have R'd) the tolerance for ST_Intersects() is notionally 0.00001m (1mm), so the implied changes in the vertices (which I confess I have not checked: I will ST_Dump() them and do so) would seem to be smaller than the tolerance. (I realise that ST_Intersects !== ST_Intersection(), but it's the only tolerance mentioned).

I have not been able to find out the corresponding tolerance for the vertex comparison undertaken by ST_Equals()... but it seems really odd that at least 120,000 of my rows ought to pass any sensible assessment of spatial identity, but don't.

(Note: I also did the same exercise using ::geography - with results that had more variability, but still more than 110,000 entries with a nice clean '1').

Is there a way to loosen the tolerance of ST_Equals, that doesn't require digging into the interstices of the code? I am not keen on doing that.

If not, is there a kludge that anyone is aware of?

Note: it would be good if the 'kludge' wasn't doing a bilateral comparison like

where ST_within(g1, ST_Buffer(g2, 0.0000001)) and ST_within(g2, ST_Buffer(g1, 0.0000001)) - I've done that: sure, it works... but it's a gigantic documentation PITA).I can work around this, but writing the 20 pages to document the workaround - which will only ever come up again if we get dodgy data - is a PITA that I would rather not have to do given that it's likely to be a one-off.

(Versions: Postgresql 9.3.5; PostGIS 2.1.3)



أكثر...
 
أعلى