Data Profiles and handling for small Internet-of-Things devices
(Part of WP1 Research, D15 "Datasets and data processing approach": to identify what data needs to be captured and in what format it should be transmitted. This is generic information, not specific to the selected sensors. ED: a research exercise with some of ED's customers and partners on indoor monitoring, probably focusing on healthy living and working environments. ED: a research exercise with some of ED's customers and partners on outdoor monitoring, probably focusing on the energy-related outdoor environment.)
Data to be processed potentially includes:
Aspects of processing potentially include:
Note that it is a useful abstraction to separate the place and thing being measured (eg a "meter point") from the sensor cluster that happens to be doing the measuring, so that data can arrive via various routes, manually and automatically, and so that replacements and upgrades are possible in the field. In the comms model as of 2014/05/29 such abstraction happens downstream of the concentrator/redistributor, and might be done by logically associating a combination of (sensor and) leaf-node ID, concentrator ID and time window with a particular "meter point" or equivalent. Redeployment or replacement of a sensor/node should create a new association, and forcing the sensor/node to change its ID (eg to a new random one) may help with this, ensuring that a "meter point" maps to one or more sensor/node IDs but that usually one sensor/node ID (with all its readings) is unique to one meter point.
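The association logic described above could be sketched as a small in-memory model along these lines (illustrative only: the class names, fields and registry shape are invented for this sketch, not part of any existing implementation):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Association:
    """One (concentrator ID, sensor/node ID) -> "meter point" mapping,
    valid for a time window; end is None while the association is active."""
    meter_point: str
    concentrator_id: str
    node_id: str
    start: datetime
    end: Optional[datetime] = None

class MeterPointRegistry:
    """Illustrative downstream registry associating node IDs with meter points."""

    def __init__(self) -> None:
        self.associations: List[Association] = []

    def associate(self, meter_point: str, concentrator_id: str,
                  node_id: str, when: datetime) -> None:
        # Redeployment or replacement closes any active association for this
        # node and creates a new one, rather than reusing the old mapping.
        for a in self.associations:
            if a.node_id == node_id and a.end is None:
                a.end = when
        self.associations.append(
            Association(meter_point, concentrator_id, node_id, when))

    def resolve(self, concentrator_id: str, node_id: str,
                when: datetime) -> Optional[str]:
        # A reading is attributed by matching both IDs plus the time window.
        for a in self.associations:
            if (a.concentrator_id == concentrator_id and a.node_id == node_id
                    and a.start <= when and (a.end is None or when < a.end)):
                return a.meter_point
        return None
```

The time-window component means that historical readings keep resolving to the meter point that was current when they were taken, even after the node is redeployed elsewhere.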
See also some previous discussions around protocols and formats for OpenTRV:
See the 2015/05/29 Data Sets and Processing meeting notes for an illustrative cursory cut of the sorts of inputs (sensor and other data sets) and outputs (metrics/KPIs) for a building health use case.
Note that for the mobile sensors, eg parked on people, a chat with Paul Tanner (2015/05/29) suggests that some variant of the BuggyAir technology, with data backhaul through staff phones, WiFi or Bluetooth Smart (and/or providing location via beacons), might be suitable; it measures NO2, CO2 and PM2.5.
Bruno's (EnergyDeck CTO) D15 note 2015/06/07...
# D15: Datasets and data processing #

## Sensor data sets ##

- Time stamp in ISO 8601 format with time zone
- Globally unique ID for the sensor (this can be a combination of a concentrator ID + sensor number, or any other value that is globally unique)
- Value
- Unit (this may be sent in the initial frame only, or following a request from the data platform)
- Frame number (used to identify missing frames; a sequential number that can potentially loop, in which case we need to identify the looping logic to ensure missing-frame detection works at the loop point)

## Data processing on the platform ##

1. Check that the frame can be parsed; if not, return an error (HTTP ???).
2. Check the sensor ID: if known, fetch the sensor meta-data, otherwise assume a new sensor. The platform may auto-create the new sensor or send an error code back depending on internal logic. Note that this internal logic can also depend on how the sensors have been commissioned.
3. If it is an existing sensor, check that the frame number received = last frame + 1.
4. If a unit is specified, check that it matches the unit known for the sensor. On a unit mismatch, send back a hard error code.
5. Store the value and send a return code:
   - All OK: HTTP 200 or equivalent.
   - Step 3 shows a missing frame: non-critical error code asking for the missing frames, specifying the last received frame + last received time stamp.
   - Step 4 finds no unit stored against the sensor and none provided: non-critical error asking for the unit to be sent in the next frame.

Note that we need to have a unit value for unit-less numbers. See SenML for unit values. We may want to extend what SenML provides, but we should also be compatible with it.

## Data processing of return codes on the device ##

1. If all OK, stop processing.
2. If missing-frame code, send the missing frames in one or multiple messages. If some of the frames are no longer available, send a frame specifying "unknown" for those.
3. If unit-required code, mark the sensor as needing to send its unit in the next frame.
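The platform-side steps above could be sketched as follows (illustrative only, not the actual platform code: the return codes, frame field names and sensor-registry shape are hypothetical placeholders):

```python
import json

OK = "ok"                          # HTTP 200 or equivalent
PARSE_ERROR = "parse_error"        # step 1 failed
UNIT_MISMATCH = "unit_mismatch"    # step 4 hard error
MISSING_FRAMES = "missing_frames"  # step 3 non-critical error
UNIT_REQUIRED = "unit_required"    # step 5 non-critical error

def process_frame(raw, sensors, frame_max=None):
    """sensors maps sensor ID -> {"unit", "last_frame", "last_ts"}."""
    # 1. Check that the frame can be parsed.
    try:
        frame = json.loads(raw)
    except ValueError:
        return PARSE_ERROR
    # 2. Known sensor? This sketch simply auto-creates unknown ones;
    #    the platform's internal logic may instead return an error.
    sid = frame["id"]
    meta = sensors.setdefault(
        sid, {"unit": None, "last_frame": None, "last_ts": None})
    # 3. Existing sensor: check frame number = last frame + 1
    #    (modulo frame_max if the sequence number loops).
    missing = False
    if meta["last_frame"] is not None:
        expected = meta["last_frame"] + 1
        if frame_max is not None:
            expected %= frame_max
        missing = frame["frame"] != expected
    # 4. If a unit is specified, it must match the stored unit.
    unit = frame.get("unit")
    if unit is not None:
        if meta["unit"] is not None and unit != meta["unit"]:
            return UNIT_MISMATCH
        meta["unit"] = unit
    # 5. Store the value and pick a return code.
    meta["last_frame"] = frame["frame"]
    meta["last_ts"] = frame["ts"]
    if missing:
        return MISSING_FRAMES
    if meta["unit"] is None:
        return UNIT_REQUIRED
    return OK
```

Note the ordering consequence of step 4 being a hard error: on a unit mismatch the frame is rejected before its value or frame number is stored.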
## Commissioning ##

To be done in D38.
Interoperability and discoverability are important for large IoT deployments, where there is no time to hand-craft solutions for integrating sensor data sources.
Bruno's (EnergyDeck CTO) D15 note 2015/06/16: SenML and HyperCat.
# D15: HyperCat / SenML representation #

In order for the data manipulated by the system to be distributed over HyperCat[1], it needs to be serialisable as SenML[2].

## SenML representation ##

SenML is a simple representation that can carry a number of data points for multiple data series in a single data frame. It can be serialised to XML or JSON. A SenML frame has the ability to specify base values for a number of attributes to avoid having them repeated in individual data point entries. This can be particularly useful to convey concentrator- and device-level attributes.

JSON generated from an OpenTRV device for a single sensor:

```json
[ "2015-06-16T00:01:17Z", "", {"@":"4b62","+":7,"vac|h":74,"v|%":0,"tT|C":5} ]
```

Possible SenML representation (assuming 12345678 is the ID of the concentrator):

```json
{
  "bn": "urn:dev:id:12345678/4b62/",
  "bt": 1433894477,
  "e": [
    { "n": "+", "v": 7 },
    { "n": "vac", "v": 74, "u": "h" },
    { "n": "v", "v": 0, "u": "%" },
    { "n": "tT", "v": 5, "u": "Cel" }
  ]
}
```

When dealing with multiple entries with different time stamps, such as:

```json
[ "2015-06-16T00:01:17Z", "", {"@":"4b62","+":7,"vac|h":74,"v|%":0,"tT|C":5} ]
[ "2015-06-16T00:01:39Z", "", {"@":"6363","+":3,"vac|h":26,"v|%":0,"tT|C":7} ]
[ "2015-06-16T00:02:39Z", "", {"@":"6363","+":4,"vC|%":328,"T|C16":329,"O":1} ]
```

we could move the device ID to the "n" attribute:

```json
{
  "bn": "urn:dev:id:12345678/",
  "bt": 1433894477,
  "e": [
    { "n": "4b62/+", "v": 7, "t": 0 },
    { "n": "4b62/vac", "v": 74, "t": 0, "u": "h" },
    { "n": "4b62/v", "v": 0, "t": 0, "u": "%" },
    { "n": "4b62/tT", "v": 5, "t": 0, "u": "Cel" },
    { "n": "6363/+", "v": 3, "t": 22 },
    { "n": "6363/vac", "v": 26, "t": 22, "u": "h" },
    { "n": "6363/v", "v": 0, "t": 22, "u": "%" },
    { "n": "6363/tT", "v": 7, "t": 22, "u": "Cel" },
    { "n": "6363/+", "v": 4, "t": 22 },
    { "n": "6363/vC", "v": 328, "t": 22, "u": "%" },
    { "n": "6363/T", "v": 20.5625, "t": 22, "u": "Cel" },
    { "n": "6363/O", "v": 1, "t": 22 }
  ]
}
```

Or when dealing with multiple entries for a single sensor:

```json
{
  "bn": "urn:dev:id:12345678/4b62/tT",
  "bt": 1433894477,
  "bu": "Cel",
  "e": [
    { "v": 5, "t": 0 },
    { "v": 6, "t": 5 }
  ]
}
```

Note on units: SenML supports a limited number of units as standard. However, those units can be extended by using any unit in the UCUM standard[3], prefixing the name with "UCUM:". This has an implication for non-standard units such as C16, which should be transformed into a standard unit.

Note on my notes: I didn't convert the "h" unit as I can't remember what it is.

Note on timestamps: they are formatted as an integer that is the number of seconds since the UNIX epoch. If using this format, we should ensure that those values are always in the UTC time zone.

## HyperCat catalogue ##

HyperCat adds a catalogue capability on top of the data, at a well-known URL for a particular service. That URL has a top-level catalogue that points to other catalogues. So, for example, the following catalogue has one sub-catalogue for devices:

```json
{
  "item-metadata": [
    { "rel": "urn:X-tsbiot:rels:isContentType", "val": "application/vnd.tsbiot.catalogue+json" },
    { "rel": "urn:X-tsbiot:rels:hasDescription:en", "val": "all catalogues" }
  ],
  "items": [
    {
      "href": "/cats/devices",
      "i-object-metadata": [
        { "rel": "urn:X-tsbiot:rels:isContentType", "val": "application/vnd.tsbiot.catalogue+json" },
        { "rel": "urn:X-tsbiot:rels:hasDescription:en", "val": "Devices" }
      ]
    }
  ]
}
```

And that sub-catalogue lists the sensors:

```json
{
  "item-metadata": [
    { "rel": "urn:X-tsbiot:rels:isContentType", "val": "application/vnd.tsbiot.catalogue+json" },
    { "rel": "urn:X-tsbiot:rels:hasDescription:en", "val": "Devices" }
  ],
  "items": [
    {
      "href": "https://config28.flexeye.com/v1/iot_Default/dms/Eseye_DM/devices/Device_1131",
      "i-object-metadata": [
        { "rel": "urn:X-tsbiot:rels:hasDescription:en", "val": "Funky sensor" },
        { "rel": "http://purl.oclc.org/NET/ssnx/ssn#SensingDevice", "val": "Sensor" },
        { "rel": "urn:X-tsbiot:rels:isContentType", "val": "application/json" },
        { "rel": "urn:X-senml:u", "val": "https://config28.flexeye.com/v1/iot_Default/dms/Eseye_DM/devices/Device_1131/senML/json" },
        { "rel": "http://www.loa-cnr.it/ontologies/DUL.owl#hasLocation", "val": "https://config28.flexeye.com/v1/iot_Default/applications/eyeHack" }
      ]
    }
  ]
}
```

Note that there may be several levels of catalogues and that the leaf catalogue tends to list individual sensors on a single leaf node. The hierarchy of catalogues could be something like:

    /cats/concentrator/XXX/device/YYY/sensors/

or:

    /cats/device/XXX/YYY/sensors/

We should also include in the "i-object-metadata" structure important information such as unit, metric, name, etc. Some of those may be repeated in the SenML data but are useful in the catalogue to enable filtering. One option at the catalogue level, rather than specifying the unit, would be to specify the metric (e.g. "temperature" rather than "°C"), as this is enough for a platform to understand how to handle the sensor, assuming it can handle all units in that metric.

EnergyDeck will probably have a catalogue URL that follows the following pattern (this will be confirmed during implementation):

    /cats                               | root catalogue of catalogues
    /cats/assets                        | catalogue of assets
    /cats/metering-points               | catalogue of metering points across all assets
    /cats/asset/x                       | catalogue of catalogues for asset x
    /cats/asset/x/assets                | sub-catalogue of assets for asset x
    /cats/asset/x/metering-points       | catalogue of metering points (~devices) attached to an asset
    /cats/asset/x/metering-point/y      | catalogue of catalogues for MP y associated with asset x
    /cats/metering-point/y              | ... with direct access shortcut
    /cats/metering-point/y/linked       | catalogue of MPs related to MP y
    /cats/metering-point/y/series       | catalogue of data series for MP y
    /cats/metering-point/y/series/z/    | catalogue of catalogues for series z in MP y
    /cats/metering-point/y/series/z/raw | raw data points for series z in MP y
    /cats/metering-point/y/series/z/1m  | data points at 1 minute granularity
    /cats/metering-point/y/series/z/30m | data points at 30 minute granularity
    /cats/metering-point/y/series/z/1Y  | data points at 1 year granularity

## References ##

[1] http://www.hypercat.io/
[2] https://tools.ietf.org/id/draft-jennings-senml-10.txt
[3] http://unitsofmeasure.org/ucum.html
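The mechanical translation from an OpenTRV JSON line to a SenML structure, as illustrated in the note above, could be sketched roughly as follows (illustrative only: the unit table is partial, "C16" is rescaled to plain Celsius as the units note suggests, and unknown units such as "h" are passed through unchanged):

```python
import json
from datetime import datetime, timezone

def to_senml(line, concentrator_id):
    """Convert one OpenTRV JSON line to a SenML-style dict (sketch)."""
    ts, _, fields = json.loads(line)
    # SenML base time as integer seconds since the UNIX epoch, in UTC.
    bt = int(datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
             .replace(tzinfo=timezone.utc).timestamp())
    node = fields.pop("@")  # device ID, moved into the "n" attribute
    entries = []
    for key, value in fields.items():
        name, _, unit = key.partition("|")
        e = {"n": "%s/%s" % (node, name), "v": value}
        if unit == "C":
            e["u"] = "Cel"
        elif unit == "C16":
            # "Celsius times 16" rescaled to standard Celsius (exact).
            e["v"] = value / 16.0
            e["u"] = "Cel"
        elif unit:
            e["u"] = unit  # pass through units we don't know how to map
        entries.append(e)
    return {"bn": "urn:dev:id:%s/" % concentrator_id, "bt": bt, "e": entries}
```

A fuller version would also batch multiple lines into one frame, emitting per-entry "t" offsets relative to the base time.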
2015/06/22:
Bruno and Damon discussed the desirability of completely mechanical conversions
from JSON sensor units to UCUM/SenML units, to minimise or eliminate magic
'mappings' that require sophisticated developer time (in line with IBM
suggestions). One particular issue that came up is being able to represent
values as integers (for brevity and to keep code small on the sensors),
scaled to integers for transit, when the natural scaling is a power of two:
eg temperatures from common sensors with four significant bits after the
binary point, ie those currently being sent with units |C16 for
"Celsius times 16". Bruno was going to investigate. One possible escape
hatch is the Ki/Mi/Gi/Ti "special prefix symbols for powers of 2".
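The power-of-two scaling being discussed is benign numerically: dividing by 16 is exact in binary floating point, so |C16 integers survive a round trip losslessly. A minimal sketch (function names are illustrative):

```python
def c16_to_celsius(raw):
    """Convert a raw |C16 reading ("Celsius times 16") to degrees Celsius.
    Division by 16 is exact in binary floating point, so the four
    significant bits after the binary point are preserved without loss."""
    return raw / 16.0

def celsius_to_c16(temp):
    """Scale back to the integer representation used on the wire."""
    return round(temp * 16)
```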
2015/06/27: Note: mechanical translatability from any binary formats used (such as OpenThings, TinyHAN profiles or application-specific hand-crafted) is highly desirable for the same reasons, eg so that the concentrator/redistributor can convert them mechanically for downstream fan-out and make the data discoverable. That may imply a plug-in at the concentrator per upstream (binary) format to convert to a common presentation and processing format such as JSON and/or SenML.
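The per-format plug-in idea might look like this in outline (a sketch only: the decorator registry, the format name and the byte layout are all invented for illustration, not any existing OpenThings or TinyHAN profile):

```python
# Registry of decoders, one per upstream (binary) format, each yielding the
# common presentation format (here a dict matching the OpenTRV JSON fields).
DECODERS = {}

def decoder(format_name):
    """Register a decode function for a named upstream format."""
    def register(fn):
        DECODERS[format_name] = fn
        return fn
    return register

@decoder("example-binary")
def decode_example(payload: bytes):
    # Hypothetical fixed layout: 2-byte node ID, 2-byte signed temp in C16.
    node = payload[0:2].hex()
    temp_c16 = int.from_bytes(payload[2:4], "big", signed=True)
    return {"@": node, "T|C16": temp_c16}

def to_common(format_name, payload):
    """Concentrator-side fan-out entry point: dispatch to the plug-in."""
    return DECODERS[format_name](payload)
```

Downstream consumers then see one presentation format regardless of which radio protocol the leaf node spoke.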
Bruno's (EnergyDeck CTO) D15 note 2015/06/19: config file format.
# Disk based configuration format #

The current `-dhd` option in the code automatically creates a number of stats handlers. All the command-line-driven code does the same, so the core of the configuration should be a list of handlers with options. Using a JSON format, we could have something like:

```json
{
  "handlers": [
    {
      "name": "handler name",
      "type": "uk.org.opentrv.comms.statshandlers.builtin.DummyStatsHandler",
      "options": { "option1": "value1" }
    }
  ]
}
```

The list of options is then specific to a particular handler type.

Questions:

- Is it sensible to have a fully qualified Java class name as the type?
- Should the name be mandatory? We need an anonymous handler option for wrapped handlers anyway, so we could also rely on the index in the handlers array.

Example with a RKDAP handler:

```json
{
  "handlers": [
    {
      "name": "EnergyDeck stats handler",
      "type": "uk.org.comms.http.RkdapHandler",
      "options": {
        "dadID": "ED256",
        "url": "https://energydeck.com"
      }
    }
  ]
}
```

Example with a wrapped handler:

```json
{
  "handlers": [
    {
      "name": "My async handler",
      "type": "uk.org.opentrv.comms.statshandlers.filter.AsyncStatsHandlerWrapper",
      "options": {
        "handler": {
          "type": "uk.org.opentrv.comms.statshandlers.builtin.SimpleFileLoggingStatsHandler",
          "options": { "statsDirName": "stats" }
        },
        "maxQueueSize": 32
      }
    }
  ]
}
```

Full `-dhd` flag example:

```json
{
  "handlers": [
    {
      "name": "File log",
      "type": "uk.org.opentrv.comms.statshandlers.builtin.SimpleFileLoggingStatsHandler",
      "options": { "file": "out_test/stats" }
    },
    {
      "name": "Twitter Temp b39a",
      "type": "uk.org.opentrv.comms.statshandlers.builtin.twitter.SingleTwitterChannelTemperature",
      "options": { "hexID": "b39a" }
    },
    {
      "name": "Twitter Temp 819c",
      "type": "uk.org.opentrv.comms.statshandlers.builtin.twitter.SingleTwitterChannelTemperature",
      "options": { "hexID": "819c" }
    },
    {
      "name": "Recent stats file",
      "type": "uk.org.opentrv.comms.statshandlers.filter.SimpleStaticFilterStatsHandlerWrapper",
      "options": {
        "handler": {
          "type": "uk.org.opentrv.comms.statshandlers.builtin.RecentStatsWindowFileWriter",
          "options": { "targetFile": "out_test/edx.json" }
        },
        "allowedIDs": [ "b39a", "819c" ]
      }
    },
    {
      "name": "EMON CMS",
      "type": "uk.org.opentrv.comms.statshandlers.builtin.openemon.OpenEnergyMonitorPostSimple",
      "options": {
        "credentials": "emonserver1",
        "sourceIDIn": "819c",
        "statsTypeIn": "{",
        "mapping": { "T|C16": "Temp16", "B|cV": "BattcV", "L": "L" },
        "emonNodeOut": "819c"
      }
    }
  ]
}
```

The implication of this configuration file is that the existing stats handlers will need to be refactored so that all handlers use a one-argument constructor that takes a configuration object.

A possible extension to this format would be to have a mapping between short names and fully qualified Java classes at the beginning of the file, to simplify the handler definitions.
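How such a configuration could be instantiated can be sketched as follows (in Python standing in for the Java code: each "type" is looked up in a registry where the Java version would use Class.forName, every handler takes a single configuration object, and wrapped handlers are built recursively; the two handler classes here are illustrative stand-ins, not the real implementations):

```python
import json

class DummyStatsHandler:
    """Stand-in for a leaf handler: one-argument constructor, config object."""
    def __init__(self, options):
        self.options = options

class AsyncStatsHandlerWrapper:
    """Stand-in for a wrapping handler whose options embed another handler."""
    def __init__(self, options):
        self.inner = build_handler(options["handler"])  # recursive build
        self.max_queue_size = options.get("maxQueueSize", 16)

# Maps the configuration "type" string to a constructor; the Java version
# would resolve the fully qualified class name by reflection instead.
REGISTRY = {
    "uk.org.opentrv.comms.statshandlers.builtin.DummyStatsHandler":
        DummyStatsHandler,
    "uk.org.opentrv.comms.statshandlers.filter.AsyncStatsHandlerWrapper":
        AsyncStatsHandlerWrapper,
}

def build_handler(defn):
    return REGISTRY[defn["type"]](defn.get("options", {}))

def load_config(text):
    """Parse the JSON config and build every handler in the list."""
    return [build_handler(d) for d in json.loads(text)["handlers"]]
```

The recursion in the wrapper constructor is what makes anonymous nested handlers work without needing a "name" field.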
DHD note: it should eventually also be possible, optionally, to inline credentials in the config rather than have them out of line as now. In part that should make remote management saner and simpler.