The Meaning of Unique
Posted: Tue Sep 14, 2021 3:20 pm
This might be a post more for English language mavens than APL language mavens...
So in V18, we can easily flag the first occurrence of every item in a vector using the "unique mask" function:
a←1 4 1 1 2 4 3
≠a
1 1 0 0 1 0 1
All well and good. However, I also want to flag the unique items of the vector. In the above case, 1 and 4 are not unique, as they obviously occur multiple times. So I can do:
{~⍵∊⍵/⍨~≠⍵}a
0 0 0 0 1 0 1
Let's define these as:
UniqueMask1←≠
UniqueMask2←{~⍵∊⍵/⍨~≠⍵}
(Bonus points: is there a better expression for UniqueMask2? I didn't even attempt the key operator, as I assume it will be longer. There is {1=+/⍵∘.=⍵}, but it is of course slow and prone to WS FULL.)
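One possible answer to the bonus question, offered as a sketch rather than a benchmarked recommendation: an item occurs exactly once precisely when it is both the first and the last occurrence of its value, so the mask can be built from two applications of ≠. The name UniqueMask3 is made up here for illustration.

```apl
      a←1 4 1 1 2 4 3
      ⍝ first-occurrence mask AND last-occurrence mask
      ⍝ (last occurrence = ≠ applied to the reversed vector, reversed back)
      UniqueMask3←{(≠⍵)∧⌽≠⌽⍵}
      UniqueMask3 a
0 0 0 0 1 0 1
```

This avoids the outer product, so it should not need space quadratic in the length of the vector. How it compares in speed with the ∊ version is not measured here.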
The unique (and unique mask) functions in APL do not return the unique values of the array, but rather an array whose values are unique. I guess the mathematicians might say they return a set from a bag.
I have a reporting requirement where both concepts (and their complements) are required. The term "unique" could be applied to either concept, and then a name or phrase must be coined for the other concept. Or two new names entirely.
The context is a database table. Given one or more key columns, show me the unique rows (or non-unique rows). Show me the rows that occur only once (or show me the rows that occur multiple times).
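For the table case, here is a small sketch, assuming the key columns are held as a numeric matrix with one row per record (the names keys and OnceMask are hypothetical). Monadic ≠ works on major cells, so on a matrix it flags the first occurrence of each row. A row occurs only once exactly when it is both the first and the last occurrence of itself, and using ⊖ to reverse along the first axis lets that test apply to rows:

```apl
      keys←4 2⍴1 2 3 4 1 2 5 6     ⍝ rows: (1 2) (3 4) (1 2) (5 6)
      OnceMask←{(≠⍵)∧⊖≠⊖⍵}          ⍝ row is both first and last occurrence
      ≠keys                          ⍝ first occurrence of each row
1 1 0 1
      ~≠keys                         ⍝ subsequent occurrences
0 0 1 0
      OnceMask keys                  ⍝ rows that occur only once
0 1 0 1
      ~OnceMask keys                 ⍝ rows that occur multiple times
1 0 1 0
```

These four masks correspond to the four result sets the report needs, whatever they end up being called.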
My feeling is to reserve the term unique for UniqueMask2:
UniqueMask1: ? (?)
UniqueMask2: Unique Rows (Non-Unique Rows)
(Or "duplicate" for non-unique)
But this leaves the question of what to call UniqueMask1. I don't think the word "distinct" helps, as I think it is synonymous with "unique" in this case. Everything gets a bit wordy. The best I can do is:
UniqueMask1: First Occurrence Rows (Subsequent Occurrence Rows)
Any thoughts on the terminology? The UI will have to have something that explains things in more detail, but the short names for all the results should not be confusing.