
Convenient Representations of XML/JSON

Posted: Sat Nov 15, 2014 2:23 am
by Morten|Dyalog
The Dyalog development team is considering adding a system function to convert the popular JSON format to and from one or more APL representations. Several alternative representations are being considered. Very briefly, the most likely candidates are:

1. The format used by ⎕XML (more or less).
2. A similar format, but instead of indicating the depth of nesting with an integer column, use APL array nesting and enclose lower levels.
3. Vectors of nested name/value pairs.
4. Use namespaces to represent JSON objects.

In particular, given that ⎕XML has been out for some years now, we are interested to hear from people who have used this tool "in anger", to hear whether the matrix format has been convenient, or a nested format would have been more useful.

If you do not wish to post responses in public, please send responses to support@dyalog.com or to me personally.

Thanks in advance!

Morten

Re: Convenient Representations of XML/JSON

Posted: Sat Nov 15, 2014 11:01 am
by Budgie
As someone who has used A+ in anger, I would like to recommend to you the concept of the "slotfiller". Obviously I'm not going to suggest it be added to APL exactly as it is in A+, but it is a useful concept which would make some APL processing much simpler. This would require an enhancement to be made to the dyadic mixed function Pick, to allow it to take character strings in its left argument.

For anyone unfamiliar with the concept of a slotfiller, it is basically a nested array of name/value pairs which in APL would be represented as a vector of unique character vectors representing the slot names, followed by a vector of the data (which can contain more slotfillers); for example

      sf←('abc' 'def' 'ghij' 'klmnop')((2 2⍴⍳4)('Mary had a little lamb')1.234(0 0 0 0 0))

In A+ Pick is used to extract items from the slotfiller, similar to
      'def'⊃sf
Mary had a little lamb


JSON objects are similar to these slotfillers, but they are "inverted" in that the data occur in name/value pairs rather than all the names coming first followed by all the values, but the concept is the same.
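
Until Pick accepts names, the lookup can be approximated in current Dyalog APL with a small dfn. This is only a sketch: `get` and `fromPairs` are hypothetical names, not existing functions, ⎕IO←1 is assumed, and an unknown slot name is not handled.

```apl
⍝ sf is the slotfiller from the example above: (names)(values)
sf←('abc' 'def' 'ghij' 'klmnop')((2 2⍴⍳4)('Mary had a little lamb')1.234(0 0 0 0 0))

⍝ get (hypothetical): look the name up in the first item, pick from the second
get←{(names values)←⍵ ⋄ (names⍳⊂,⍺)⊃values}

⍝ fromPairs (hypothetical): JSON-style (name value) pairs → slotfiller
fromPairs←{(⊃¨⍵)(2⊃¨⍵)}

'def' get sf                                          ⍝ Mary had a little lamb
'age' get fromPairs('firstName' 'John')('age' 25)     ⍝ 25
```

`fromPairs` shows the "inversion" mentioned above: the same data, regrouped as all names followed by all values.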

So if one has the following JSON object (taken from the Wikipedia article, with some additions) in variable JSmith
      {
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 25,
"height_cm": 167.6,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
}
],
"children": ["John","Paul","George","Ringo"],
"spouse": "Mary"
}

one could use an enhanced ⊃ (Pick) to extract the data, as follows:

      'spouse'⊃JSmith
Mary
      'address' 'streetAddress'⊃JSmith
21 2nd Street
      'children' 3⊃JSmith
George


So, to sum up, I am recommending your No.3 (name/value pairs), together with an enhancement to dyadic mixed function Pick.
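
A rough sketch of how such a path lookup could behave, written as ordinary dfns rather than as a change to ⊃. The names `step` and `path` are hypothetical, ⎕IO←1 is assumed, JSON objects are modelled as slotfillers and JSON arrays as plain vectors, and only part of the JSmith data is reproduced.

```apl
⍝ step: ⍺ is a container, ⍵ one path element (a slot name or a numeric index)
step←{' '=⊃0⍴⍵:((⊃⍺)⍳⊂,⍵)⊃2⊃⍺ ⋄ ⍵⊃⍺}
⍝ path: ⍺ is a list of path elements, ⍵ the outermost container
path←{⊃step⍨/(⌽⍺),⊂⍵}

address←('streetAddress' 'city')('21 2nd Street' 'New York')
JSmith←('address' 'children' 'spouse')(address('John' 'Paul' 'George' 'Ringo')'Mary')

(,⊂'spouse')path JSmith                ⍝ Mary
'address' 'streetAddress' path JSmith  ⍝ 21 2nd Street
'children' 3 path JSmith               ⍝ George
```

A single name has to be enclosed here, otherwise its characters would be read as separate path elements; a real extension of Pick would have to settle that question too.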

HTH.

Re: Convenient Representations of XML/JSON

Posted: Sat Nov 15, 2014 7:32 pm
by Phil Last
I agree with Jane that native support for dictionaries, associative arrays, or whatever you want to call them, is essential, but I find it difficult to imagine a better way than dotted notation (using Jane's example):
      sf←⎕ns''
      sf.(abc def ghij klmnop)←(2 2⍴⍳4)('Mary had a little lamb')1.234(0 0 0 0)
      sf.def ⍝ or sf.⍎'def'
Mary had a little lamb

Apart from JSON's use of brackets, commas and doublequotes
      ["zero","one","two",[3,4,5]]
where APL would use parentheses, blanks and quotes,
      ('zero' 'one' 'two' (3 4 5))
and its restriction to a single dimension, I see so little difference between JSON and APL notation that the entire thing could be rather a transliteration than a conversion.

APL only needs a better (function-free) notation for dataspaces and such a transliteration would produce an executable expression to return the data.

And as to that
      {"zero": 0, "one": 1, "two": 2}
is so suggestive that APLSharp's proposal of
      [[zero ← 0 ⋄ one ← 1 ⋄ two ← 2]]
looks as if it were designed with this purpose in mind.

Oh yes, I'd go for an enhanced no. 4.

Re: Convenient Representations of XML/JSON

Posted: Sat Nov 15, 2014 10:50 pm
by Phil Last
Something like this maybe?
      ⎕cr'decode'
decode←{⍺←⊢
arg←1⌽'][',⍵/⍨1+⍵=q←''''
e←{⍵>¯1↓0,⍵}¯1↓0,arg='\'
d←arg='"'
((d>e)/arg)←q
arg←arg/⍨1⌽d⍲e
q←≠\arg=q
((q<arg='[')/arg)←'('
((q<arg=']')/arg)←')'
((q<arg='-')/arg)←'¯'
(q c arg)/⍨←⊂1+c←q<arg=','
(c/arg)←(+/c)⍴')('
⍎arg
⍝ Phil Last 2014-11-15
}
      js
[1, "\"phil's\" decoder",2,3,["one","two",3]],[-7,8]
      ]display decode js
┌→──────────────────────────────────────────────────────┐
│ ┌→───────────────────────────────────────────┐ ┌→───┐ │
│ │   ┌→───────────────┐     ┌→──────────────┐ │ │¯7 8│ │
│ │ 1 │"phil's" decoder│ 2 3 │ ┌→──┐ ┌→──┐   │ │ └~───┘ │
│ │   └────────────────┘     │ │one│ │two│ 3 │ │        │
│ │                          │ └───┘ └───┘   │ │        │
│ │                          └∊──────────────┘ │        │
│ └∊───────────────────────────────────────────┘        │
└∊──────────────────────────────────────────────────────┘

Re: Convenient Representations of XML/JSON

Posted: Sun Nov 16, 2014 10:32 am
by Phil Last
Or this?
      ⎕cr'json2apl.decode'
decode←{⍺←⊢ ⋄ ⎕ML←0
ps←⍺⊣# ⍝ parent of spaces - dflt #
arg←⊃,/' ',' ',¨⍵ ⍝ maybe nested
arg←(' ',arg)[(⎕TC,arg)⍳arg] ⍝ maybe CR LF
arg←1⌽'][',arg/⍨1+arg=q←'''' ⍝ double the quotes and bracket
e←{⍵>¯1↓0,⍵}¯1↓0,arg='\' ⍝ escapes
d←arg='"' ⍝ doublequote
((d>e)/arg)←q ⍝ replace unescaped " → '
arg←arg/⍨1⌽d⍲e ⍝ remove escapes
q←≠\arg=q ⍝ quoted

repl←{(q a)(o n)←⍵ ⍺
(q c a)/⍨←⊂1⌈(≢n)×c←q<a=o
(c/a)←(+/c)⍴n
q a
} ⍝ transliterations
swap←⌽'[(' '])' '-¯'(',' ')(')('}' '))')('{' '(f(')(':' '{⍺⍵}')
q arg←⊃repl/swap,⊂q arg

c←{⍵∨¯1↓0,⍵}q<'()'⍷arg
(c/arg)←(+/c)⍴'⍬ ' ⍝ replace () → ⍬
f←{
s←ps.⎕NS'' ⍝ space
arg←↓⍉↑,∘⊂⍣(1=≡,⊃⍵)⊢⍵ ⍝ invert pairs
0∊⍴arg:s ⍝ none?
s⊣s.{⍎⍕⍺,'←⍵'}/arg ⍝ associate data
}
(true false null)←1 0 ⎕NULL
⍎arg ⍝ evaluate
⍝ Phil Last 2014-11-16
}
      ⍝ Jane's example:
      ⍪jsmith
{
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 25,
"height_cm": 167.6,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
}
],
"children": ["John","Paul","George","Ringo"],
"spouse": "Mary"
}
      #.json2apl.decode jsmith
#.[Namespace]
      {⍵.⎕nl¨⍳10} #.json2apl.decode jsmith
age address
children
firstName
height_cm
isAlive
lastName
phoneNumbers
spouse
      {⍵.({⍵,⍪⍎¨⍵}↓⎕nl 2)}json2apl.decode jsmith
age 25
children John Paul George Ringo
firstName John
height_cm 167.6
isAlive 1
lastName Smith
phoneNumbers #.[Namespace] #.[Namespace]
spouse Mary
      {⍵.address.({⍵,⍪⍎¨⍵}↓⎕nl 2)}json2apl.decode jsmith
city New York
postalCode 10021-3100
state NY
streetAddress 21 2nd Street
      {⍵.phoneNumbers.({⍵,⍪⍎¨⍵}↓⎕nl 2)}json2apl.decode jsmith
number 212 555-1234 number 646 555-4567
type home type office

Re: Convenient Representations of XML/JSON

Posted: Sun Nov 16, 2014 6:31 pm
by gil
I vote for options 3 and 4.

Nested vectors, inverted like the slotfiller format Jane mentions, are a familiar model for many APLers and would allow existing tools to be used to work with them.

A namespace is (in my opinion) a natural way to represent objects. One drawback is that JSON allows keys that are not valid APL names, but I still think the option should be there. If I know the data I will be dealing with uses valid APL names I prefer the ns approach to the nested vectors. Of course, using namespaces you also need to deal with potential circular references (as Phil can testify).
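
Whether a set of JSON keys would survive as namespace member names can be checked up front with ⎕NC, which reports ¯1 for a string that cannot be an APL name. A small sketch; the keys are made up for illustration.

```apl
keys←'firstName' 'height_cm' 'postal-code' '2ndLine'
valid←¯1≠⎕NC keys      ⍝ ¯1 from ⎕NC marks an invalid name
valid                   ⍝ 1 1 0 0
```

A decoder could use such a mask to choose between the namespace form and the nested-vector form, or to reject the input.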

The JSON parser included in the latest version of MiServer has a clever way of dealing with boolean values (setting/using the display format on an empty ns). How would that be dealt with in a system function?

I've used the matrix format for XML but found that I ended up partitioning and nesting it more often than not. Typically I will pass branches of the tree to different subroutines. If it was already nested that would save me the effort.

Re: Convenient Representations of XML/JSON

Posted: Mon Nov 17, 2014 11:01 am
by sjt
Worth reflecting on what Arthur has done in K, where dictionaries and tables are first-order objects.

His dictionaries are like Jane’s slotfillers – unsurprising, since he wrote A. They are keyed by ‘symbols’ – immutable strings. A list of three symbols: `a`b`c. Dictionary elements are accessed by the indexing function @, e.g. d@`b`a`c. Unlike the names of Dyalog namespace members, symbols can contain spaces, but I think they rarely do.

A table is a special case of a dictionary, in which the members are same-length vectors. The @ function returns vectors.

Lacking namespaces, we’ve emulated K in APL+Win, getting much clearer application code. Our at function works on dictionaries and tables. Keys don’t have to be valid APL names, but we usually make them so.

Because we handle dictionaries with large and volatile lists of keys we return a default result for an invalid key rather than an error. But for tables an invalid key produces an error.
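
The default-on-missing-key behaviour can be sketched in Dyalog notation as follows. This is not the APL+Win function described above, only an illustration: `at` is redefined here as a hypothetical dfn, the dictionary is a (keys)(values) pair, ⎕NULL is an arbitrary choice of default, and ⎕IO←1 is assumed.

```apl
⍝ at (hypothetical): ⍺ is a dictionary (keys)(values), ⍵ a list of keys
at←{(k v)←⍺ ⋄ (v,⎕NULL)[k⍳⍵]}   ⍝ unknown keys index the extra last item

d←('cow' 'sheep' 'cat')('food' 'food' 'pet')
d at 'cat' 'emu' 'cow'           ⍝ 'pet' ⎕NULL 'food'
```

Appending the default to the values means ⍳ needs no special case: any key that is not found maps to one past the end of the keys.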

Although I would have used namespaces at the start to represent dictionaries and tables, I now have some reservations. Namespaces are a good deal more general than I need for a dictionary. I would need to stay alert for functions and for executable expressions showing up in the keys. We often manipulate lists of keys. These would get resolved through ⍎, recursing the interpreter. Bit heavy for indexing. Extending ⊃ appeals to me.

If going that way, consider extending it to CSV-style tables, e.g.

      t←'one' 'two' 'three'⍪ 10 100 1000 ∘.×1 2 3
      'two'⊃t
20 200 2000
      'two' 'three'⊃t ⍝ is ¨ necessary?
20 200 2000 30 300 3000
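
For comparison, the same column selection can be written today without extending ⊃. A sketch only: `cols` is a hypothetical name, ⎕IO←1 is assumed, and an unknown column name gives an INDEX ERROR rather than a default.

```apl
t←'one' 'two' 'three'⍪10 100 1000∘.×1 2 3

⍝ cols (hypothetical): ⍺ is a list of column names, ⍵ a table with names in row 1
cols←{↓⍉(1↓⍵)[;⍵[1;]⍳⍺]}

'two' 'three' cols t     ⍝ (20 200 2000)(30 300 3000)
⊃(,⊂'two')cols t         ⍝ 20 200 2000
```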

Stephen

Re: Convenient Representations of XML/JSON

Posted: Mon Nov 17, 2014 1:38 pm
by Phil Last
Remembering that this has to be a two way process that, as far as possible, constitutes a pair of mutual inverses, and looking at the reverse process of encoding APL into JSON might shed some light on which way to go if one is to be chosen.
It would be very nice if all APL data could be represented in as simple a manner as possible in JSON. We have a problem with multidimensional arrays that can be resolved in a number of ways that almost certainly do not involve mapping to objects. But it seems that if not directly then in some indirect way spaces must be mapped to objects if they are to be represented at all.
Say we have a space defined as:
      space←{⍵⊣⍵.(zero one two three)←⍬ 1 'two' (3 4 5)}⎕NS''
Or verbosely as:
      space←⎕NS''
      space.zero←⍬
      space.one←1
      space.two←'two'
      space.three←3 4 5
If APL spaces map directly to JSON objects then the encoding might be:
      {"one": 1, "three": [3,4,5], "two": "two", "zero": []}
If not then we need some way to convert a space into something that does.
Once we do that we have made an equivalence between the space and that (let's call it a) "dictionary" that can never be resolved on decoding the JSON without some other external meta knowledge. Did it start as a space with named members, or was it a dictionary to start with?
If instead, in encoding the above space we gather the identifiers as (say) "ids" and the values as "values" we can encode:
      {"ids": ["one","three","two","zero"], "values": [1,[3,4,5],"two",[]]}
But is that the definition of a space with two named members: "ids" and "values", or is it another dictionary? Are we in an endless loop?
But if associative data is to be represented in APL as an array rather than a space then there is no need to invoke the JSON object to represent it as JSON is perfectly capable of representing all APL scalars or vectors of length greater than one as JSON arrays. In the above we're always going to have two lists so there is no benefit in naming them; we can merely adopt the protocol of ids first and define:
      [["one","three","two","zero"], [1,[3,4,5],"two",[]]]
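
The APL side of that "ids first" protocol might look like the sketch below. `pair` is a hypothetical name; it gathers only the variables of a space and relies on ⎕NL returning the names in sorted order, which is why the ids come out as one, three, two, zero.

```apl
space←⎕NS''
space.(zero one two three)←⍬ 1 'two'(3 4 5)

⍝ pair (hypothetical): a space → (ids)(values), variables only
pair←{⍵.((⎕NL ¯2)(⍎¨⎕NL ¯2))}

(ids values)←pair space
ids≡'one' 'three' 'two' 'zero'     ⍝ 1
values≡1(3 4 5)'two' ⍬             ⍝ 1
```

Encoding the two lists as JSON arrays would then give the last form shown above.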

Re: Convenient Representations of XML/JSON

Posted: Mon Nov 17, 2014 2:23 pm
by crishog
Not heard them called "slotfillers" before but Ziggi & I have used the 2 element names-in-first / values-in-second for some years - and before that Phil & I had all the names in element one & the values in subsequent items (I think that originated with Phil), also as "associative arrays" (values with named indices) as described by Jane are a feature of PHP and other web-based scripting languages.

I've been known to generate XML via a matrix then convert through ⎕XML, but I think I always convert to a namespace structure to read/use within APL.

I seem to be falling into the 3/4 camp too. If you know the structure well then 4 would appear to be easier; if you are exploring (or branches/values can be missing) then 3 might give better results.

Re: Convenient Representations of XML/JSON

Posted: Fri Nov 21, 2014 8:39 pm
by sjt
Further thoughts following a stimulating discussion at today’s BAA meeting – thanks Phil, Jane, Chris & Jake!

JSON keys don’t need to be APL names. We considered and rejected any transformation of keys to APL names. That rules out a namespace as the target for importing JSON. Has to be an array.

We reviewed three array structures for representing name/value pairs.

  1. A 2-element vector: keys and values
  2. An n+1 vector: (⊂n keys) followed by n values
  3. A 2-row matrix
Although (2) is what I use in my day job, (3) has three virtues the others lack.

  • The structure ensures a 1:1 mapping of keys to values. You can’t form a dictionary with more keys than values or vice versa.
  • The default format is legible and compact. With many keys, ⍉ will improve it.
  • The notation extends naturally to more rows, emulating CSV tables.
The central role of dictionaries and tables in kdb, and the use of the @ function to index both, suggests seeing what we can borrow.

Call a dictionary a matrix in which the first row consists of character vectors. Call the first row the keys. Call a CSV a dictionary with more than 2 rows. The corresponding values are:

  • for a 2-row dictionary: ,1 0↓dict
  • for a >2-row dictionary: ↓⍉1 0↓dict
Intuitively, the keys of a CSV are its top row, and its values are its columns.
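
These definitions translate directly into two small dfns. A sketch assuming ⎕IO←1; `keys` and `values` are hypothetical names and the sample data is made up.

```apl
keys←{⍵[1;]}                             ⍝ the first row
values←{2=≢⍵:,1 0↓⍵ ⋄ ↓⍉1 0↓⍵}          ⍝ second row, or the columns of a CSV

d←↑('cow' 'sheep' 'cat' 'dog')('food' 'food' 'pet' 'pet')   ⍝ 2-row dictionary
t←'one' 'two' 'three'⍪10 100 1000∘.×1 2 3                    ⍝ 4-row CSV

values d     ⍝ 'food' 'food' 'pet' 'pet'
values t     ⍝ (10 100 1000)(20 200 2000)(30 300 3000)
```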

Pick: Extend the left and right domains of ⊃: on the left simple or nested character vectors; requiring on the right a dictionary.

Execute: Phil pointed out that dot notation has made dyadic ⍎ redundant. Extend the right domain of ⍎ to dictionaries where all the keys are valid APL names. The result is a namespace. Make the left domain be nested character vectors: a list of keys. The result is a namespace of variables corresponding to the intersection of ⍺ and keys of ⍵.

Format: Extend the right domain of ⍕ to namespaces; the result is a dictionary of its variables. (Serialisations of objects capture only properties.) In the domain of dictionaries and namespaces of variables, ⍎ and ⍕ become inverses of each other.

Index: Extend the right domain of monadic ⌷ to dictionaries. Return all the keys.

In terms of importing JSON to APL, suppose IJ takes a character vector string of well-formed JSON. Then

      ≢d←IJ '{"cow":"food","sheep":"food","cat":"pet","dog":"pet"}' ⍝ dictionary
2
      d
 cow   sheep  cat  dog
 food  food   pet  pet
      d ≡ ⍕ns←⍎d ⍝ all keys in string are APL names
1
      (≢⍉d) = ≢ns.⎕NL 2
1
      ⌷d ⍝ the keys of d
 cow  sheep  cat  dog
      ns.cow ←→ 'cow' ⊃ d
      ns.(cow cat) ←→ 'cow' 'cat' ⊃ d
      d ←→ d⊃⍨⌷d


The ⊃ function extended to CSVs would give me the at function I presently define in APL+Win for handling tables.
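
Some of the proposed extensions can be tried out today as ordinary dfns. The following is a sketch under assumptions, not the proposal itself: `keys`, `pick` and `fromNS` are hypothetical stand-ins for the extended ⌷, ⊃ and ⍕, only 2-row dictionaries are handled, and ⎕IO←1 is assumed.

```apl
keys←{⍵[1;]}                          ⍝ stands in for monadic ⌷ on a dictionary
pick←{(,1 0↓⍵)[(keys ⍵)⍳⍺]}           ⍝ stands in for ⊃; ⍺ is a list of keys
fromNS←{↑⍵.((⎕NL ¯2)(⍎¨⎕NL ¯2))}      ⍝ stands in for ⍕ on a namespace of variables

d←↑('cow' 'sheep' 'cat' 'dog')('food' 'food' 'pet' 'pet')
'cow' 'cat' pick d                    ⍝ 'food' 'pet'

ns←⎕NS''
ns.(cat cow)←'pet' 'food'
(fromNS ns)≡↑('cat' 'cow')('pet' 'food')    ⍝ 1
```

Note that `fromNS` returns the keys in the sorted order that ⎕NL produces, so a round trip through a namespace would not necessarily preserve the key order of the original dictionary.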