Using osmium tags-filter (https://docs.osmcode.org/osmium/latest/osmium-tags-filter.html) on my local machine, I was able to extract every node, way, and relation on the globe that has the aeroway tag by running the following on the complete planet file:
(base) jarvis@MacBook-Pro-4 data % gtime -v osmium tags-filter planet-231002.osm.pbf aeroway -o planet-aeroways-231002-5.osm
[======================================================================] 100%
Command being timed: "osmium tags-filter planet-231002.osm.pbf aeroway -o planet-aeroways-231002-5.osm"
User time (seconds): 1967.62
System time (seconds): 191.60
Percent of CPU this job got: 796%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:31.04
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3297744
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 7
Minor (reclaiming a frame) page faults: 956777
Voluntary context switches: 156
Involuntary context switches: 2594349
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 16384
Exit status: 0
This ran on a 2023 MacBook Pro (macOS Sonoma 14.0) with an Apple M2 Pro chip and 32 GB of RAM. The tool clearly leverages the multiple cores on the machine (796% CPU). Great stuff: the whole job takes about 4.5 minutes of wall-clock time.
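As a rough sanity check on these numbers, the sketch below recomputes the CPU percentage from the gtime figures and estimates a best-case wall-clock time on fewer cores. The helper names cpu_pct and min_minutes are hypothetical, and the estimate assumes that a c7g.large exposes 2 vCPUs and that each one is as fast as an M2 Pro core, which is optimistic.

```shell
#!/bin/sh
# Figures copied from the gtime output above.
user=1967.62   # user time in seconds
sys=191.60     # system time in seconds
wall=271.04    # elapsed time 4:31.04 expressed in seconds

# CPU utilisation in percent: (user + system) / elapsed, truncated to an integer.
cpu_pct() {
    awk -v u="$1" -v s="$2" -v w="$3" 'BEGIN { printf "%d\n", (u + s) / w * 100 }'
}

# Best-case wall clock in minutes if the same CPU work is spread evenly
# over n cores of equal speed (an idealised assumption, ignoring I/O).
min_minutes() {
    awk -v u="$1" -v s="$2" -v n="$3" 'BEGIN { printf "%.1f\n", (u + s) / n / 60 }'
}

cpu_pct "$user" "$sys" "$wall"     # prints 796
min_minutes "$user" "$sys" 2       # prints 18.0
```

Under those assumptions, even with ideal I/O the same job would need roughly 18 minutes on two cores rather than 4.5, so some slowdown on a small instance is expected regardless of storage.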
I want to run the same job on EC2. I have tried multiple machine setups, but so far none have worked.
My latest attempt is on a debian-12-arm64-20230711-1438 AMI running on a c7g.large instance (which has enough RAM according to the gtime output above). I started it over an hour ago, but according to top -i it has only been allocated about 10 minutes of CPU time:
Tasks: 112 total, 1 running, 111 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.7 us, 0.2 sy, 0.0 ni, 44.4 id, 49.7 wa, 0.0 hi, 0.0 si, 0.0
MiB Mem : 3830.5 total, 3575.7 free, 274.2 used, 116.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 3556.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
535 admin 20 0 180484 32704 4844 S 12.0 0.8 10:29.54
What I also find fishy is that %CPU never goes above 8% when I look at the monitor on EC2, even though top sometimes reports 20%.
All of this raises the questions:
EDIT:
Below is the osmium version on the EC2 Debian instance:
admin@ip-10-0-147-70:~$ osmium --version
osmium version 1.15.0
libosmium version 2.18.0
Supported PBF compression types: none zlib lz4
Aha!
The culprit was my choice of a legacy volume type for storage! In an attempt to be frugal, I had picked a "standard" magnetic volume. I ran the same osmium tags-filter command again with all specs identical except for a gp3 volume, and the top output is much healthier:
top - 15:06:51 up 52 min, 3 users, load average: 1.45, 0.77, 0.62
Tasks: 112 total, 1 running, 111 sleeping, 0 stopped, 0 zombie
%Cpu(s): 50.4 us, 2.2 sy, 0.0 ni, 33.0 id, 14.4 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3830.5 total, 287.8 free, 375.9 used, 3354.7 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 3454.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2732 admin 20 0 180884 61780 4448 S 105.7 1.6 3:43.10 osmium
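The contrast between the two top snapshots is easiest to see in the wa (I/O wait) column. Here is a minimal sketch, using a hypothetical helper named iowait, that pulls that value out of the two %Cpu(s) lines quoted in this post:

```shell
#!/bin/sh
# Print the I/O-wait percentage from a top "%Cpu(s)" summary line.
# It looks for the "wa" label and prints the number just before it.
iowait() {
    printf '%s\n' "$1" | awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^wa,?$/) print $(i - 1)
    }'
}

# Summary lines copied verbatim from the two runs above.
magnetic='%Cpu(s): 5.7 us, 0.2 sy, 0.0 ni, 44.4 id, 49.7 wa, 0.0 hi, 0.0 si, 0.0'
gp3='%Cpu(s): 50.4 us, 2.2 sy, 0.0 ni, 33.0 id, 14.4 wa, 0.0 hi, 0.0 si, 0.0 st'

iowait "$magnetic"   # prints 49.7
iowait "$gp3"        # prints 14.4
```

With the magnetic volume, roughly half of the CPU time was spent waiting on disk, versus about 14% with gp3, which matches the jump in osmium's own %CPU from 12.0 to 105.7.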