Convenient Representations of XML/JSON

General APL language issues
Phil Last
Posts: 628
Joined: Thu Jun 18, 2009 6:29 pm
Location: Wessex

Re: Convenient Representations of XML/JSON

Post by Phil Last »

A good summary. No disagreement but I came away with a different emphasis.

I think we all came away agreeing that the tabular form of ⎕XML was not necessarily the easiest thing to work with and that something a bit closer to - if not an exact copy of - APL data would be more appropriate.

JSON to APL.
Agreed, JSON object keys ("strings") don't necessarily map to valid APL names, so in general converting JSON objects to APL spaces is out. We need one or other of the array formats enumerated above.

APL to JSON.
Wouldn't it be nice if we could serialize absolutely anything? Without some pre-agreed protocol - Stephen suggested a distinguished first element (perhaps "Dyalog 8.0") - we can't serialize:

    empty arrays other than '' (and if we can agree that that is what JSON null should represent)
    multidimensional arrays
    scalars or 1-item vectors (take your pick - we can have one or other but not both)
But spaces, albeit without functions or methods, would seem ideal candidates to be represented as JSON objects. The corollary would be that JSON objects should map to APL spaces.

So, a contradiction. Stephen pointed out that a "round trip" doesn't necessarily imply a pair of mutual inverses. We can start with a space:
      s←{⍵⊣⍵.(a b c)←0 1 2}⎕ns'' ⍝ (i)
produce intermediate JSON
      '{a: 0, b: 1, c: 2}' ⍝ (ii)
and end up back in APL with array
      s←((,'a')(,'b')(,'c'))(0 1 2) ⍝ (iii)
But if we also start with array
      a←((,'a')(,'b')(,'c'))(0 1 2) ⍝ (iv)
two things that started out different are now the same.
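The collision is easy to model outside APL too. Here is a minimal Python sketch (a dict stands in for the space, a tuple of lists for the associative array; the decode policy via `json`'s `object_pairs_hook` is my illustration of the proposal above, not a Dyalog feature):

```python
import json

# Decode policy (iii): a JSON object becomes a (keys, values) pair rather
# than a native mapping. object_pairs_hook is how Python's json module
# lets us intercept objects; the policy itself is the one proposed above.
def decode(text):
    return json.loads(text, object_pairs_hook=lambda pairs: (
        [k for k, _ in pairs], [v for _, v in pairs]))

# (i) start from a "space" - a Python dict stands in for the namespace s
s = {"a": 0, "b": 1, "c": 2}
text = json.dumps(s)                  # (ii) intermediate JSON
arr = decode(text)                    # (iii) back as an associative array

# (iv) an array that was keys-and-values from the start
a = (["a", "b", "c"], [0, 1, 2])
print(arr == a)                       # True: two origins, one result
```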

It was unclear to me whether Morten's original question implied an eventual one-size-fits-all solution or perhaps an option or two. Given that objects originating from sources other than APL are quite unlikely to have string keys that are valid APL names, perhaps the default should be an associative array as in (iii), with an option to render objects as spaces as in (i), to be used when we are sure of the source.

Chris pointed out that, as most of the contention surrounds the representation of dictionaries, perhaps an APL-enhanced protocol for JSON should be called JOHNSON. We were, after all, meeting within three or four furlongs of the great man's house.
Phil Last
Posts: 628
Joined: Thu Jun 18, 2009 6:29 pm
Location: Wessex

Re: Convenient Representations of XML/JSON

Post by Phil Last »

Of course there's another aspect to this that I don't think any of us noticed.

If JSON objects are decoded as APL arrays such that JSON string
      '{"one": 1, "two": "two", "three": [0,1,2]}'    ⍝ (a)
becomes APL array
      .→--------------------------------------------.
      | .→--------------------. .→----------------. |
      | | .→--. .→--. .→----. | |   .→--. .→----. | |
      | | |one| |two| |three| | | 1 |two| |0 1 2| | |
      | | '---' '---' '-----' | |   '---' '~----' | |
      | '∊--------------------' '∊----------------' |
      '∊--------------------------------------------'
what is APL array
      ('one' 'two' 'three')(1 'two'(0 1 2))       ⍝ (b)
going to look like in JSON? Given that it is an array one would expect it to map to a JSON array rather than an object:
      '[["one", "two", "three"],[1, "two",[0,1,2]]]'    ⍝ (c)
After all, there can be no justification to convert (b) into an object merely because it happens to have two items. It might have three next time around.
So from APL we have a round trip that starts out as an APL space, is encoded into a JSON object, which is decoded to an APL array.
From JSON we have a round trip that starts out as a JSON object, is decoded to an APL array, which is encoded to a JSON array.
It appears that our consensus has led us into a situation where everything, no matter how or where it starts, decays to an array after only one or two transfers.
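For what it's worth, the decay is easy to reproduce with any JSON library. A Python sketch of the two policies described above (illustrative only, not Dyalog code):

```python
import json

# Consensus so far: JSON objects decode to a (keys, values) array, and any
# array encodes to a JSON array - nothing marks (b) as a dictionary.
decode = lambda text: json.loads(
    text,
    object_pairs_hook=lambda p: [[k for k, _ in p], [v for _, v in p]])

obj = '{"one": 1, "two": "two", "three": [0, 1, 2]}'    # (a)
arr = decode(obj)                                       # (b)-style array
back = json.dumps(arr)                                  # (c): a JSON array
print(back)   # [["one", "two", "three"], [1, "two", [0, 1, 2]]]
```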
sjt
Posts: 21
Joined: Fri Nov 05, 2010 6:26 am

Re: Convenient Representations of XML/JSON

Post by sjt »

On further reflection I find two flaws in my earlier proposal; one annoying, the other fatal. That proposal was to represent both dictionaries and tables as CSV-style matrices and to extend ⊃ to treat the strings in the first row as the keys.

The other representations of a dictionary – two n-length vectors (2n); or an (n+1)-length vector in which the first element is an n-length vector of keys – can both easily provide a default value for an invalid key: simply append the default value to the list of values. A matrix representation of a dictionary does not lend itself to this convenience.

The values in a table are its columns. But the matrix representation does not distinguish between a dictionary and a 1-row table. So it is unclear whether ⊃ should return elements from the second row, or those same elements enclosed and ravelled to length-1 vectors. As far as I can see, this flaw is fatal.

That leaves the 2n and n+1 structures. Our app uses n+1 with the help of two syntax sweeteners:

      pop←{(⊃⍵)(1↓⍵)}
      push←{⎕IO←1 ⋄ (1⊃⍵),2⊃⍵}

But a dictionary is also a mapping. We use another syntax sweetener

      map1←{k v←pop ⍺ ⋄ v[k⍳⍵]}                ⍝ n+1 structure
      map2←{k v←⍺ ⋄ v[k⍳⍵]}                    ⍝ 2n structure
      (push 'alias' 'name'⊃table) map1 aliases ⍝ n+1 structure
      ('alias' 'name'⊃table) map2 aliases      ⍝ 2n structure

Now we seem to be getting somewhere. The utility of a dictionary as a map favours the 2n representation.
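For readers who don't think in APL, here is a rough Python analogue of map2 (the names are mine): the 2n dictionary is a (keys, values) pair, and v[k⍳⍵] becomes "find each query key's position among the keys, pick the corresponding value". Like the APL version without a default, it assumes every query key is present.

```python
# Rough analogue of map2←{k v←⍺ ⋄ v[k⍳⍵]} for the 2n representation.
def map2(dictionary, queries):
    keys, values = dictionary                    # the 2n form
    return [values[keys.index(q)] for q in queries]

table = (["ned", "bob"], ["Edward", "Robert"])   # hypothetical alias table
print(map2(table, ["bob", "ned"]))               # ['Robert', 'Edward']
```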

Does the with-default usage help us?

      ((push 'alias' 'name'⊃table),⊂'unknown') map1 aliases ⍝ n+1 structure
      (('alias' 'name'⊃table),¨⍬ (⊂'unknown')) map2 aliases ⍝ 2n structure

Attaching the default to an n+1 is slicker than to the 2n, but not enough to outweigh the push required to convert the result of indexing table into a dictionary ⍺ for map1.

In any case, it calls for a further syntax sweetener to clarify the writer's intent. For 2n dictionaries:

      def←{⍺,¨⍬ (⊂⍵)} ⍝ default

Now we can chain multiple lookups. Suppose people and postcodes are two CSV matrices.

      ⊃map/('' def⍨'postcode' 'city'⊃postcodes)('name' 'postcode'⊃people) names
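A Python sketch of the def-plus-lookup idea for the 2n form (function names here are mine, not Dyalog primitives): appending the default to the values only means a failed search falls through to the extra slot, mirroring k⍳⍵ returning n in the APL version.

```python
# def←{⍺,¨⍬ (⊂⍵)} modelled: append the default to the values vector only.
def with_default(dictionary, default):
    keys, values = dictionary
    return (keys, values + [default])

# v[k⍳⍵]: a key that is not found selects the appended default.
def lookup(dictionary, queries):
    keys, values = dictionary
    n = len(keys)                      # index of the default slot
    return [values[keys.index(q)] if q in keys else values[n]
            for q in queries]

postcodes = (["SW1", "BA1"], ["London", "Bath"])
d = with_default(postcodes, "unknown")
print(lookup(d, ["BA1", "XX9"]))       # ['Bath', 'unknown']
```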

Conclusion

Extend the ⍺ of ⊃ to depth-1 or depth-2 character vectors where ⍵ is a CSV matrix or a 2n array. From a matrix ⍵ return the 1↓ of columns. From a 2n array, return corresponding elements from the second vector.

Now we can dispense with map:

      names ⊃ 'name' 'postcode'⊃people

and if we allow ⊃ to return an n+1th value from a dictionary:

      'goldfish' ⊃ ('cow' 'horse')('food' 'transport')
INDEX ERROR
      'goldfish' ⊃ ('cow' 'horse')('food' 'transport' 'pet')
pet

Then we can chain our lookups thus:

      ⊃(⊃⍨)/('' def⍨'postcode' 'city'⊃postcodes)('name' 'postcode'⊃people) names
Tomas Gustafsson
Posts: 101
Joined: Mon Sep 19, 2011 6:43 pm

Re: Convenient Representations of XML/JSON

Post by Tomas Gustafsson »

(Edit: as the sign-off below says, just my seven cents - I am aware of ⎕XML, so this is more me thinking out loud.)

Without really trying to understand all the complexities mentioned in the earlier posts... I kind of intuitively feel that there should be a native, direct mapping between APL namespaces (optionally with data) <--> XML.

In both directions:

- reading an XML string should create a namespace structure (starting from any namespace I specify, and onwards).
- exporting data from that (or any other?) namespace structure should create an XML string.

One thing that immediately comes to mind is that XML can have multiple same-named keys at the same level, but we cannot have a namespace holding identically named sub-namespaces. Or perhaps we could? After all, in this one case the name conflict would be intentional. It would be sufficient to use, index and pass the same-named namespaces, with no need to point at any one of them explicitly by name. They would behave like unnamed namespaces, except named with the XML key instead of unnamed.
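To make the duplicate-key idea concrete, here is a rough sketch in Python using the standard xml.etree module (just a model of one possible convention, not an existing Dyalog feature): repeated same-named children at a level collapse into a list, standing in for same-named sub-namespaces.

```python
import xml.etree.ElementTree as ET

# Model of "XML in, namespace-like structure out". A leaf element becomes
# key = tag, value = text; repeated same-named children at one level are
# collected into a list, standing in for same-named sub-namespaces.
def to_ns(elem):
    if len(elem) == 0:          # no children: this branch holds the data
        return elem.text
    ns = {}
    for child in elem:
        value = to_ns(child)
        if child.tag in ns:     # intentional name conflict: keep them all
            prev = ns[child.tag]
            ns[child.tag] = (prev + [value]) if isinstance(prev, list) \
                            else [prev, value]
        else:
            ns[child.tag] = value
    return ns

doc = '<building><wall>brick</wall><wall>glass</wall><floors>3</floors></building>'
print(to_ns(ET.fromstring(doc)))   # {'wall': ['brick', 'glass'], 'floors': '3'}
```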

I didn't consider the deep thoughts in the earlier posts here. I was only thinking of my practical work, where for example I read, experiment with and analyze the National Land Survey's terrain databases (those that hold, among other things, the 1 million buildings I process in the simulator). Here we are talking about GML. Getting that directly in as namespaces + data - THAT sounds helpful. To control the manipulation, I'd look for some kind of directory object that I could either read from an XML string or create from an APL namespace structure. Perhaps I could then control the XML import and export using that directory object - for example, import only a part of the XML string by making markings in the directory object?

Or, perhaps, what if I had the option to do the XML -> APL import without getting the actual big data? Just get the namespace (XML keys) structure, and then somehow use that to control the data import/export? Marking 0/1 (active/passive), whatever..

Maybe that's possible already with various toolsets, and of course with one's own (slowish) custom code anything is possible.

We know how convenient it is to manipulate APL data residing in a namespace. Just mod it, increase it, decrease it. That sounds like a rather cool way to manipulate XML: import it (or part of it, using the "directory control"), modify it, and export all or part of it...

The last level of each XML branch apparently holds the data. Surely that could be mapped to APL somehow? Key = variable name. Multiple same-named keys? Apparently something special is needed there, but I still argue for a native, direct mapping. It's always possible to signal DOMAIN ERROR if the data doesn't fit.

7 cents :-).
Dick Bowman
Posts: 235
Joined: Thu Jun 18, 2009 4:55 pm

Re: Convenient Representations of XML/JSON

Post by Dick Bowman »

Coming late to this and wondering...

There seem to be two issues here: a PROCESS that turns the (alien-to-APL) JSON format into a REPRESENTATION that we can manipulate in the workspace.

My sense (and I could easily be wrong) is that the REPRESENTATION is almost immaterial - it ought to be relatively easy for the APL programmer to mutate data between the various forms being discussed above, and I can envisage that different application scenarios might be amenable to different representations. And the language extensions sketched above could prove useful for any data, not just JSON-originated.

But what might be harder for an APL programmer is to deal (sensibly/efficiently) with reading and writing JSON files. I seem to recall doing something with them a year or so back, settling on a minimal case that satisfied my immediate needs. Either I'm still using it and have now forgotten where (because it's trouble-free), or else I just stopped bothering. But I think there are some special cases and edge conditions (mostly to do with missing/void data), and this is where Dyalog can bring special skills to the table.

So, my punchline is...

Don't much mind how the data gets represented, but I would like a quick/easy/reliable way to read/write JSON files.
Visit http://apl.dickbowman.com to read more from Dick Bowman