Add WrappedProcessor to help reading MVTs (#222)
My ultimate aim is to read an mbtiles file with a single zoom level and
extract features from it. It contains compressed MVT data. To that end,
I wrote
https://github.com/acteng/will-it-fit/blob/main/data_prep/fix_osmm/src/main.rs
using geozero and some other crates. It works like this:

1) Use the `mbtiles` crate to open the file
2) Calculate all tiles in the file
3) For each tile, read the raw data
4) Use `flate2` to gunzip it
5) Decode into the `geozero::mvt::Tile` proto
6) Use the existing mvt geozero reader to process all layers.
7) But because the geometry in each tile is scaled to fit that tile, I
need to transform the coordinates back into WGS84. If I understand the
geozero approach correctly, doing that translation as I go is more
performant than first collecting into something like an FGB writer and
then going back to transform everything. So I wrote `WrappedProcessor` to
delegate to another `FeatureProcessor` (an `FgbWriter` in this example),
but call a function on every coordinate first.
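The transform in step 7 can be sketched as a closure with the same `Fn(&mut f64, &mut f64)` shape that the wrapper accepts. This is a minimal sketch under several assumptions: the tiles use standard spherical Web Mercator (XYZ) tiling; MVT coordinates run over `0..extent` with y pointing down; and tile rows read from an MBTiles file are in TMS order (row 0 at the south), as the MBTiles spec stores them, so they may need flipping first. The helpers `tms_to_xyz_row` and `tile_to_wgs84` are hypothetical names and are not part of geozero or the `mbtiles` crate.

```rust
use std::f64::consts::PI;

/// Hypothetical helper: converts a TMS row (row 0 = southernmost, as in the
/// MBTiles `tile_row` column) to an XYZ row (row 0 = northernmost).
fn tms_to_xyz_row(z: u32, tms_row: u32) -> u32 {
    (1u32 << z) - 1 - tms_row
}

/// Hypothetical helper: builds a closure that rewrites tile-local MVT
/// coordinates of XYZ tile (z, tile_x, tile_y) into WGS84 lon/lat in place.
/// Assumes spherical Web Mercator tiling and y pointing down within the tile.
fn tile_to_wgs84(z: u32, tile_x: u32, tile_y: u32, extent: f64) -> impl Fn(&mut f64, &mut f64) {
    // Number of tiles along each axis at this zoom level.
    let n = f64::from(1u32 << z);
    move |x: &mut f64, y: &mut f64| {
        // Fractional position across the whole world, 0.0 at the west/north edge.
        let world_x = (f64::from(tile_x) + *x / extent) / n;
        let world_y = (f64::from(tile_y) + *y / extent) / n;
        // Longitude is linear in x; latitude uses the inverse Mercator formula.
        *x = world_x * 360.0 - 180.0;
        *y = (PI * (1.0 - 2.0 * world_y)).sinh().atan().to_degrees();
    }
}
```

For example, `tile_to_wgs84(0, 0, 0, 4096.0)` maps the centre of the single zoom-0 tile, `(2048, 2048)`, to `(0, 0)` and its top-left corner `(0, 0)` to roughly `(-180, 85.0511)`.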

This PR adds the `WrappedProcessor` in case it is helpful for other use
cases. But I'm not sure this approach is the nicest one. Maybe the
[mvt::process](https://github.com/georust/geozero/blob/c8a5f9103fc5ecc0ae9c7fcd2663b094e620da38/geozero/src/mvt/mvt_reader.rs#L17)
function should instead take optional information about the current tile
(the tile x and y, the zoom level, and the extent), plumb it through the
private methods in that file, and apply it in
[process_coord](https://github.com/georust/geozero/blob/c8a5f9103fc5ecc0ae9c7fcd2663b094e620da38/geozero/src/mvt/mvt_reader.rs#L104)?
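The alternative floated above could look roughly like this: an optional tile context is passed into the reader and applied at the point where each coordinate is decoded. Everything here is hypothetical. `TileInfo`, its fields, and this `process_coord` signature are illustrations, not geozero's actual private functions. For brevity, the sketch also targets EPSG:3857 Web Mercator metres rather than WGS84, which would additionally need the inverse Mercator formula for latitude.

```rust
/// Hypothetical tile context that a reader could accept and pass down to the
/// per-coordinate step. Field names are illustrative, not geozero's API.
#[derive(Clone, Copy, Debug)]
struct TileInfo {
    z: u32,
    x: u32,
    y: u32,
    extent: u32,
}

/// Half the circumference of the spherical Web Mercator world in metres
/// (pi * 6378137).
const ORIGIN_SHIFT: f64 = 20_037_508.342_789_244;

impl TileInfo {
    /// Maps tile-local coordinates (y pointing down) to EPSG:3857 metres.
    fn project(&self, px: f64, py: f64) -> (f64, f64) {
        let n = f64::from(1u32 << self.z); // tiles per axis at this zoom
        let extent = f64::from(self.extent);
        let world = 2.0 * ORIGIN_SHIFT; // width of the world in metres
        // Fractional position across the world, 0.0 at the west/north edge.
        let fx = (f64::from(self.x) + px / extent) / n;
        let fy = (f64::from(self.y) + py / extent) / n;
        (fx * world - ORIGIN_SHIFT, ORIGIN_SHIFT - fy * world)
    }
}

/// Hypothetical per-coordinate step: without tile context, the raw tile
/// coordinates pass through unchanged.
fn process_coord(px: i32, py: i32, tile: Option<&TileInfo>) -> (f64, f64) {
    let (x, y) = (f64::from(px), f64::from(py));
    match tile {
        Some(info) => info.project(x, y),
        None => (x, y),
    }
}
```

Passing `None` keeps the current behaviour of emitting raw tile coordinates, so existing callers would be unaffected. The cost is threading one extra parameter through every private helper in the reader.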

I'm not opinionated about the best way to read an MVT tile while doing this
coordinate transformation, so I'm happy to implement the other approach or
something else entirely. Thanks!

---------

Co-authored-by: Michael Kirk <[email protected]>
dabreegster and michaelkirk committed Sep 4, 2024
1 parent 73902dd commit c30f80e
Showing 4 changed files with 312 additions and 0 deletions.
geozero/CHANGELOG.md: 5 additions, 0 deletions
@@ -1,3 +1,8 @@
## Unreleased

* Added `WrappedXYProcessor` for pre-processing XY coordinates.
  * <https://github.com/georust/geozero/pull/222>

## 0.13.0 - (2024-05-17)

* Fixed converting 2D geos::Geometry to ewkt
geozero/src/geometry_processor.rs: 37 additions, 0 deletions
@@ -1,4 +1,5 @@
use crate::error::{GeozeroError, Result};
use crate::WrappedXYProcessor;

/// Dimensions requested for processing
#[derive(Default, Clone, Copy)]
@@ -380,6 +381,42 @@ pub trait GeomProcessor {
    fn tin_end(&mut self, idx: usize) -> Result<()> {
        Ok(())
    }

    /// Combinator which inserts a call to `transform_xy` during processing, before [GeomProcessor::xy]
    /// or [GeomProcessor::coordinate] is called.
    ///
    /// Useful for pipelining multiple processors, e.g. to project your coordinates before outputting
    /// to a particular format.
    ///
    /// ```
    /// # #[cfg(all(feature = "with-wkt", feature = "with-geojson"))]
    /// # {
    /// use geozero::geojson::GeoJson;
    /// use geozero::wkt::WktWriter;
    /// use geozero::{GeomProcessor, GeozeroGeometry};
    /// let input = GeoJson(r#"{ "type": "Point", "coordinates": [1.1, 1.2] }"#);
    ///
    /// let mut output = vec![];
    /// let mut wkt_writer = WktWriter::new(&mut output).pre_process_xy(|x: &mut f64, y: &mut f64| {
    ///     // likely you would do something more interesting here, like project your coordinates
    ///     *x += 1.0;
    ///     *y += 1.0;
    /// });
    ///
    /// input.process_geom(&mut wkt_writer).unwrap();
    /// assert_eq!(String::from_utf8(output).unwrap(), "POINT(2.1 2.2)");
    /// # }
    /// ```
    fn pre_process_xy<F: Fn(&mut f64, &mut f64)>(
        self,
        transform_xy: F,
    ) -> WrappedXYProcessor<Self, F>
    where
        Self: Sized,
    {
        WrappedXYProcessor::new(self, transform_xy)
    }
}

#[test]
geozero/src/lib.rs: 2 additions, 0 deletions
@@ -54,12 +54,14 @@ mod feature_processor;
mod geometry_processor;
mod multiplex;
mod property_processor;
mod wrap;

pub use api::*;
pub use feature_processor::*;
pub use geometry_processor::*;
pub use multiplex::*;
pub use property_processor::*;
pub use wrap::*;

#[cfg(feature = "with-csv")]
pub mod csv;
geozero/src/wrap.rs: 268 additions, 0 deletions
@@ -0,0 +1,268 @@
use crate::{
    error::Result, ColumnValue, CoordDimensions, FeatureProcessor, GeomProcessor, PropertyProcessor,
};

/// Wraps another [`FeatureProcessor`], first transforming coordinates.
pub struct WrappedXYProcessor<T, F: Fn(&mut f64, &mut f64)> {
    /// The underlying FeatureProcessor
    pub inner: T,
    pre_process_xy: F,
}

impl<T, F: Fn(&mut f64, &mut f64)> WrappedXYProcessor<T, F> {
    /// Wraps an inner [`FeatureProcessor`], calling `pre_process_xy` on each coordinate before
    /// delegating to [GeomProcessor::xy] or [GeomProcessor::coordinate]. The function receives
    /// mutable references to `x` and `y` and modifies them in place.
    pub fn new(inner: T, pre_process_xy: F) -> Self {
        Self {
            inner,
            pre_process_xy,
        }
    }

    /// Consumes the wrapper and returns the inner processor.
    pub fn into_inner(self) -> T {
        self.inner
    }
}

// The trait has many default implementations, but every single call must be specified here to
// delegate
impl<T: GeomProcessor, F: Fn(&mut f64, &mut f64)> GeomProcessor for WrappedXYProcessor<T, F> {
    fn dimensions(&self) -> CoordDimensions {
        self.inner.dimensions()
    }
    fn multi_dim(&self) -> bool {
        self.inner.multi_dim()
    }
    fn srid(&mut self, srid: Option<i32>) -> Result<()> {
        self.inner.srid(srid)
    }
    fn xy(&mut self, mut x: f64, mut y: f64, idx: usize) -> Result<()> {
        (self.pre_process_xy)(&mut x, &mut y);
        self.inner.xy(x, y, idx)
    }
    fn coordinate(
        &mut self,
        mut x: f64,
        mut y: f64,
        z: Option<f64>,
        m: Option<f64>,
        t: Option<f64>,
        tm: Option<u64>,
        idx: usize,
    ) -> Result<()> {
        (self.pre_process_xy)(&mut x, &mut y);
        self.inner.coordinate(x, y, z, m, t, tm, idx)
    }
    fn empty_point(&mut self, idx: usize) -> Result<()> {
        self.inner.empty_point(idx)
    }
    fn point_begin(&mut self, idx: usize) -> Result<()> {
        self.inner.point_begin(idx)
    }
    fn point_end(&mut self, idx: usize) -> Result<()> {
        self.inner.point_end(idx)
    }
    fn multipoint_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.multipoint_begin(size, idx)
    }
    fn multipoint_end(&mut self, idx: usize) -> Result<()> {
        self.inner.multipoint_end(idx)
    }
    fn linestring_begin(&mut self, tagged: bool, size: usize, idx: usize) -> Result<()> {
        self.inner.linestring_begin(tagged, size, idx)
    }
    fn linestring_end(&mut self, tagged: bool, idx: usize) -> Result<()> {
        self.inner.linestring_end(tagged, idx)
    }
    fn multilinestring_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.multilinestring_begin(size, idx)
    }
    fn multilinestring_end(&mut self, idx: usize) -> Result<()> {
        self.inner.multilinestring_end(idx)
    }
    fn polygon_begin(&mut self, tagged: bool, size: usize, idx: usize) -> Result<()> {
        self.inner.polygon_begin(tagged, size, idx)
    }
    fn polygon_end(&mut self, tagged: bool, idx: usize) -> Result<()> {
        self.inner.polygon_end(tagged, idx)
    }
    fn multipolygon_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.multipolygon_begin(size, idx)
    }
    fn multipolygon_end(&mut self, idx: usize) -> Result<()> {
        self.inner.multipolygon_end(idx)
    }
    fn geometrycollection_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.geometrycollection_begin(size, idx)
    }
    fn geometrycollection_end(&mut self, idx: usize) -> Result<()> {
        self.inner.geometrycollection_end(idx)
    }
    fn circularstring_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.circularstring_begin(size, idx)
    }
    fn circularstring_end(&mut self, idx: usize) -> Result<()> {
        self.inner.circularstring_end(idx)
    }
    fn compoundcurve_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.compoundcurve_begin(size, idx)
    }
    fn compoundcurve_end(&mut self, idx: usize) -> Result<()> {
        self.inner.compoundcurve_end(idx)
    }
    fn curvepolygon_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.curvepolygon_begin(size, idx)
    }
    fn curvepolygon_end(&mut self, idx: usize) -> Result<()> {
        self.inner.curvepolygon_end(idx)
    }
    fn multicurve_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.multicurve_begin(size, idx)
    }
    fn multicurve_end(&mut self, idx: usize) -> Result<()> {
        self.inner.multicurve_end(idx)
    }
    fn multisurface_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.multisurface_begin(size, idx)
    }
    fn multisurface_end(&mut self, idx: usize) -> Result<()> {
        self.inner.multisurface_end(idx)
    }
    fn triangle_begin(&mut self, tagged: bool, size: usize, idx: usize) -> Result<()> {
        self.inner.triangle_begin(tagged, size, idx)
    }
    fn triangle_end(&mut self, tagged: bool, idx: usize) -> Result<()> {
        self.inner.triangle_end(tagged, idx)
    }
    fn polyhedralsurface_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.polyhedralsurface_begin(size, idx)
    }
    fn polyhedralsurface_end(&mut self, idx: usize) -> Result<()> {
        self.inner.polyhedralsurface_end(idx)
    }
    fn tin_begin(&mut self, size: usize, idx: usize) -> Result<()> {
        self.inner.tin_begin(size, idx)
    }
    fn tin_end(&mut self, idx: usize) -> Result<()> {
        self.inner.tin_end(idx)
    }
}
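The comment above explains why every method is forwarded explicitly. If a wrapper leaves a defaulted trait method unimplemented, calls to it run the trait's default instead of the inner processor's override, so the inner processor never sees them. Below is a minimal sketch of that pitfall. It uses a hypothetical one-method trait; `Processor`, `Inner`, `Forgetful`, and `Forwarding` are illustrative names, not geozero types.

```rust
/// Hypothetical trait with a defaulted hook, similar in spirit to the many
/// defaulted `*_begin`/`*_end` methods on `GeomProcessor`.
trait Processor {
    fn point_begin(&mut self) -> &'static str {
        "default"
    }
}

/// Inner processor that overrides the hook.
struct Inner;
impl Processor for Inner {
    fn point_begin(&mut self) -> &'static str {
        "inner"
    }
}

/// Wrapper that does not forward: it silently gets the trait default,
/// and the wrapped `Inner` is never called.
struct Forgetful(Inner);
impl Processor for Forgetful {}

/// Wrapper that forwards explicitly, the pattern `WrappedXYProcessor` follows.
struct Forwarding(Inner);
impl Processor for Forwarding {
    fn point_begin(&mut self) -> &'static str {
        self.0.point_begin()
    }
}
```

`Forgetful(Inner).point_begin()` returns `"default"`, while `Forwarding(Inner).point_begin()` returns `"inner"`.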

impl<T: PropertyProcessor, F: Fn(&mut f64, &mut f64)> PropertyProcessor
    for WrappedXYProcessor<T, F>
{
    fn property(&mut self, idx: usize, name: &str, value: &ColumnValue<'_>) -> Result<bool> {
        self.inner.property(idx, name, value)
    }
}

impl<T: FeatureProcessor, F: Fn(&mut f64, &mut f64)> FeatureProcessor for WrappedXYProcessor<T, F> {
    fn dataset_begin(&mut self, name: Option<&str>) -> Result<()> {
        self.inner.dataset_begin(name)
    }
    fn dataset_end(&mut self) -> Result<()> {
        self.inner.dataset_end()
    }
    fn feature_begin(&mut self, idx: u64) -> Result<()> {
        self.inner.feature_begin(idx)
    }
    fn feature_end(&mut self, idx: u64) -> Result<()> {
        self.inner.feature_end(idx)
    }
    fn properties_begin(&mut self) -> Result<()> {
        self.inner.properties_begin()
    }
    fn properties_end(&mut self) -> Result<()> {
        self.inner.properties_end()
    }
    fn geometry_begin(&mut self) -> Result<()> {
        self.inner.geometry_begin()
    }
    fn geometry_end(&mut self) -> Result<()> {
        self.inner.geometry_end()
    }
}

#[cfg(all(feature = "with-csv", feature = "with-geojson"))]
#[cfg(test)]
mod test {
    use crate::csv::CsvWriter;
    use crate::geojson::GeoJsonString;
    use crate::{GeomProcessor, GeozeroDatasource};
    use serde_json::json;

    fn geojson_fixture_data() -> GeoJsonString {
        GeoJsonString(
            json!({
                "type": "FeatureCollection",
                "features": [
                    {
                        "type": "Feature",
                        "properties": {
                            "population": 100
                        },
                        "geometry": {
                            "type": "Point",
                            "coordinates": [1.0, 2.0]
                        }
                    },
                    {
                        "type": "Feature",
                        "properties": {
                            "population": 200
                        },
                        "geometry": {
                            "type": "Point",
                            "coordinates": [3.0, 4.0]
                        }
                    }
                ]
            })
            .to_string(),
        )
    }

    #[test]
    fn test_pre_process() {
        let mut geojson = geojson_fixture_data();
        let mut out = Vec::new();
        {
            let mut transforming_csv_writer =
                CsvWriter::new(&mut out).pre_process_xy(|x: &mut f64, y: &mut f64| {
                    *x += 1.0;
                    *y += 2.0;
                });
            geojson.process(&mut transforming_csv_writer).unwrap();
        }
        assert_eq!(
            String::from_utf8(out).unwrap(),
            "geometry,population\nPOINT(2 4),100\nPOINT(4 6),200\n"
        );
    }

    #[test]
    fn multiple_transforms() {
        let mut geojson = geojson_fixture_data();
        let mut out = Vec::new();
        {
            let mut transforming_csv_writer = CsvWriter::new(&mut out)
                .pre_process_xy(|x: &mut f64, y: &mut f64| {
                    *x += 1.0;
                    *y += 2.0;
                })
                .pre_process_xy(|x: &mut f64, y: &mut f64| {
                    // It might be surprising that this second transformation is applied before the first.
                    // It makes sense if you think about each subsequent call as an "insert first", but
                    // it's admittedly potentially confusing.
                    *x *= 2.0;
                    *y *= 2.0;
                });
            geojson.process(&mut transforming_csv_writer).unwrap();
        }

        assert_eq!(
            String::from_utf8(out).unwrap(),
            "geometry,population\nPOINT(3 6),100\nPOINT(7 10),200\n"
        );
    }
}
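The ordering noted in `multiple_transforms` follows from each `pre_process_xy` call wrapping the processor built so far. The outermost (last-added) wrapper runs its closure first and then delegates inward. The sketch below reproduces that with a hypothetical one-method trait standing in for `GeomProcessor`; `XyProcessor`, `Wrapped`, and `Collect` are illustrative names, not geozero types.

```rust
/// Hypothetical stand-in for `GeomProcessor`, reduced to a single method.
trait XyProcessor {
    fn xy(&mut self, x: f64, y: f64);

    /// Mirrors the shape of `pre_process_xy`: wrap `self`, transform first.
    fn pre_process_xy<F: Fn(&mut f64, &mut f64)>(self, f: F) -> Wrapped<Self, F>
    where
        Self: Sized,
    {
        Wrapped { inner: self, f }
    }
}

struct Wrapped<T, F> {
    inner: T,
    f: F,
}

impl<T, F> Wrapped<T, F> {
    fn into_inner(self) -> T {
        self.inner
    }
}

impl<T: XyProcessor, F: Fn(&mut f64, &mut f64)> XyProcessor for Wrapped<T, F> {
    fn xy(&mut self, mut x: f64, mut y: f64) {
        (self.f)(&mut x, &mut y); // this wrapper's transform runs first...
        self.inner.xy(x, y); // ...then control passes to the wrapped processor
    }
}

/// Innermost processor: records every coordinate it receives.
#[derive(Default)]
struct Collect(Vec<(f64, f64)>);

impl XyProcessor for Collect {
    fn xy(&mut self, x: f64, y: f64) {
        self.0.push((x, y));
    }
}
```

Building `Collect::default()` with the same two closures as the test (first `+1.0/+2.0`, then `*2.0`) and feeding it `(1.0, 2.0)` records `(3.0, 6.0)`. The doubling runs first, which matches the `POINT(3 6)` that `multiple_transforms` expects.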
