GSOC 2019 : Improve smbcmp - Best format to use for diffs

Tue Jul 2 14:35:52 UTC 2019

"P Mairo via samba-technical" <samba-technical at lists.samba.org> writes:
> Hello,
> Yesterday I discovered the json output of Tshark and as I have to implement
> better diffs, I thought why not?
> So looking at it more closely today, I realized that it would be more
> appropriate because all the irrelevant fields we discussed previously
> are automatically removed and the data structure is convenient for
> navigation.
> To sum up, I want to deviate a little bit from the initial plan (use XML)
> to reach the same goal (better diffs).
> What are your insights on this?

I've had a quick look and I see a couple of problems with the JSON
output(s):

* Using -T json (sample)

        "smb2": {
          "SMB2 Header": {
            "smb2.server_component_smb2": "",
            "smb2.header_len": "64",
            "smb2.credit.charge": "0",
            "smb2.channel_sequence": "0",
            "smb2.reserved": "00:00",
            "smb2.cmd": "0",
            "smb2.credits.requested": "2",
            "smb2.flags": "0x00000000",
            "smb2.flags_tree": {
              "smb2.flags.response": "0",
              "smb2.flags.async": "0",
              "smb2.flags.chained": "0",
              "smb2.flags.signature": "0",
              "smb2.flags.priority_mask": "0",
              "smb2.flags.dfs": "0",
              "smb2.flags.replay": "0"
            },

- No summary lines

You will need to use some other output for them. PDML doesn't have them
either, which is why we also need PSML.

- No human-readable field names or descriptions

It uses the abbreviated field names, but we want the human-readable
ones, e.g.

    "smb2.negotiate_context.hash_algorithm": "0x00000001",

vs

    <field name="smb2.negotiate_context.hash_algorithm"
           showname="HashAlgorithm: SHA-512 (0x0001)" <----- this
           size="2" pos="186" show="0x00000001" value="0100"/>
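
As a rough illustration of how the showname can be picked up from PDML,
here is a minimal Python sketch using the standard library XML parser.
The field is the one quoted above; the enclosing <proto> element, the
PDML_SAMPLE constant and the shownames() helper are only assumptions
made for the example, not smbcmp code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the field quoted above, wrapped in a <proto>
# element so that it parses as a standalone document.
PDML_SAMPLE = """
<proto name="smb2">
  <field name="smb2.negotiate_context.hash_algorithm"
         showname="HashAlgorithm: SHA-512 (0x0001)"
         size="2" pos="186" show="0x00000001" value="0100"/>
</proto>
"""

def shownames(pdml_text):
    """Return (abbreviated name, human-readable showname) pairs,
    in document order."""
    root = ET.fromstring(pdml_text)
    return [(f.get("name"), f.get("showname")) for f in root.iter("field")]

for name, label in shownames(PDML_SAMPLE):
    print(f"{name}: {label}")
```

The JSON sample only carries the left-hand side of each pair, so the
right-hand side would have to come from somewhere else.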

- JSON dictionary entries are not ordered

e.g. if you parse

{"foo":"0", "bar":"1"}

and try to dump it again, then depending on the implementation you cannot
guarantee that "foo" will be printed before "bar". In Python it seems [1]
you can tell the parser to use OrderedDict as the underlying storage, so
this might be doable.
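
A minimal sketch of that approach, assuming the standard json module
and its object_pairs_hook parameter as described in [1]. The
load_ordered() helper is a name made up for the example.

```python
import json
from collections import OrderedDict

def load_ordered(text):
    """Parse JSON text, storing every object as an OrderedDict so the
    keys keep the order in which they appear in the input."""
    return json.loads(text, object_pairs_hook=OrderedDict)

doc = load_ordered('{"foo":"0", "bar":"1"}')
print(list(doc.keys()))   # ['foo', 'bar']
print(json.dumps(doc))    # {"foo": "0", "bar": "1"}
```

Dumping the OrderedDict again walks the keys in insertion order, so
"foo" comes out before "bar".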

I've also looked at the other JSON output formats, "ek" and "jsonraw",
but unfortunately the same limitations apply.

1: https://stackoverflow.com/questions/6921699/can-i-get-json-to-load-into-an-ordereddict

Cheers,
-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)