How do we store data? EAM/CMMS, GIS, IoT, Blobs

Created: 2024.07.19

Some companies seem to think that everything should be stored the same way in the same database. We respectfully disagree, regardless of which database you pick, and if you run performance tests and look at database sizes, we submit you will see why. We currently split our thinking into several optimizations:

  1. EAM/CMMS maintenance records and related data (names, dates, change history, transactional data): we think SQL.

  2. Geospatial data is a distinct kind of data and requires optimized indexing and storage methods to be efficient. At the same time, if you store a lot of non-geospatial data alongside it, you are going to hurt the performance of both your geospatial data and your non-geospatial data. We think ArcGIS or a SQL database optimized for geospatial. (Back around 1990, our president wrote a geospatial-index-based GIS database called MapEasy that competed with the likes of ArcInfo.)

  3. PDF files, images, videos, audio recordings: blob storage or a file system, with metadata in a SQL database.

  4. IoT with millions of readings per second: a database optimized to run literally more than 100 times faster than alternatives such as standard SQL, even with sharding and similar setup. It is heavily optimized for ingesting millions of readings per second and for processing queries (for reports, gauges, charts, big data requests) over hundreds of millions to multiple billions of records. The focus is on processing billions of records and returning summary results rather than on returning billions of records, though thanks to compression it does that faster too. Part of the performance comes from intelligent, data-aware compression, which means the database is typically 91 to 96% smaller than the same data in a non-optimized database. So 100 terabytes of IoT data in a traditional SQL database (often even more in a geospatially optimized database) takes only 4 to 9 terabytes, and reading it into memory is 10 or more times faster simply because there is so much less data to move from the database to the processing engine. Of course, if you stored regular data, or worse geospatial data, or likely even worse blob data, in this database, it would perform worse there.

  5. Engineering data: AutoCAD files, with metadata stored in SQL.

  6. Accounting data. While CMMS works with and generates lots of accounting data, that accounting data (which is also CMMS data) should not be stored in the GIS or have the 'single source of truth' being the GIS.

  7. Audit data: a long-term record of every change, with older audit data moved to slower, cheaper storage for access when needed.

  8. Logs: very large append-only files, routinely deleted after a short time.
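
The size figures in item 4 can be checked with simple arithmetic. The sketch below is only an illustration: it assumes transfer time scales linearly with the number of bytes read, which is a simplification, and the function names are made up for this example.

```python
def compressed_size_tb(raw_tb: float, reduction: float) -> float:
    """Size after compression, given the fraction of space saved (0.91 = 91% smaller)."""
    return raw_tb * (1.0 - reduction)


def transfer_speedup(reduction: float) -> float:
    """How many times less data must be moved.

    Assumption: transfer time is proportional to bytes read.
    """
    return 1.0 / (1.0 - reduction)


# The two ends of the 91 to 96% range quoted in item 4, applied to 100 TB.
for reduction in (0.91, 0.96):
    size = compressed_size_tb(100.0, reduction)
    speedup = transfer_speedup(reduction)
    print(f"{reduction:.0%} smaller: 100 TB -> {size:.0f} TB, "
          f"about {speedup:.0f}x less data to move")
```

Under that assumption, 91% compression leaves about 9 TB (roughly 11 times less data to move) and 96% leaves about 4 TB (roughly 25 times less), which is where the "10 times or more" figure comes from.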

The point is that each storage method is optimized for what it stores, giving the best overall fit for performance, limits, requirements, access, and data rules (such as double-entry requirements in accounting). Having UI systems work with different data sources is a problem that was solved decades ago; it is minutes of setup for orders of magnitude better results for decades.
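
As a small illustration of one application working across two stores, here is a minimal sketch of the pattern from item 3: file bytes go to blob storage and only the searchable metadata goes into SQL. Everything in it is an assumption made for the example. A temporary directory stands in for blob storage, an in-memory SQLite database stands in for the SQL database, and the `attachment` table, its columns, and the function names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

# Assumption: a temp directory stands in for blob storage and an
# in-memory SQLite database stands in for the production SQL database.
blob_root = Path(tempfile.mkdtemp())
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE attachment ("
    " id INTEGER PRIMARY KEY,"
    " work_order TEXT NOT NULL,"
    " filename TEXT NOT NULL,"
    " size_bytes INTEGER NOT NULL,"
    " blob_path TEXT NOT NULL)"
)


def save_attachment(work_order: str, filename: str, data: bytes) -> int:
    """Write the bytes to the blob store, then record the metadata in SQL."""
    path = blob_root / f"{work_order}_{filename}"
    path.write_bytes(data)
    cur = db.execute(
        "INSERT INTO attachment (work_order, filename, size_bytes, blob_path)"
        " VALUES (?, ?, ?, ?)",
        (work_order, filename, len(data), str(path)),
    )
    db.commit()
    return cur.lastrowid


def load_attachment(attachment_id: int) -> tuple[str, bytes]:
    """Look up the metadata in SQL, then fetch the bytes from the blob store."""
    row = db.execute(
        "SELECT filename, blob_path FROM attachment WHERE id = ?",
        (attachment_id,),
    ).fetchone()
    if row is None:
        raise KeyError(attachment_id)
    filename, blob_path = row
    return filename, Path(blob_path).read_bytes()


new_id = save_attachment("WO-1001", "pump_photo.jpg", b"\xff\xd8 fake image bytes")
name, data = load_attachment(new_id)
print(name, len(data))
```

The SQL table stays small and fast to query because it never holds the file contents, while the blob store never needs to answer a search.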

If someone tells you they are storing your EAM/CMMS data in a GIS-optimized database, check their performance and data sizes very carefully. It means they either think your non-GIS data is trivial or they didn't design their solution with EAM/CMMS in mind.