Dyadic grade up (X ⍋ Y)

Posted: Tue Sep 25, 2018 2:29 pm
by Yves
Dear All,
I need to sort a word list containing diacritics (e.g. äéàèùç ...),
with a collation array like this:
      ]display tabTri3D[ 1 ; ; ]
┌→──────────────────────────┐
↓ abcdefghijklmnopqrstuvwxyz│
│ ã                         │
│ à                         │
│ á                         │
│ â                         │
│ ä                         │
│ å                         │
└───────────────────────────┘
      ]display tabTri3D[ 2 ; ; ]
┌→──────────────────────────┐
↓ ABCDEFGHIJKLMNOPQRSTUVWXYZ│
│ Á                         │
│ Â                         │
│ Ã                         │
│ À                         │
│ Ä                         │
│ Å                         │
└───────────────────────────┘

To help you experiment with it:
      ⎕ucs 'aãàáâäå'
97 227 224 225 226 228 229
      ⎕ucs 'AÁÂÃÀÄÅ'
65 193 194 195 192 196 197


I am not sure: is it better (more efficient, more elegant) to build the array with
      Plan1 ,[1] Plan2

or with
      tabTri3D←2 7 27⍴''
      tabTri3D[ 1 ; ; ] ← Plan1
      tabTri3D[ 2 ; ; ] ← Plan2
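
A note on the two constructions, as a minimal sketch. It assumes Plan1 and Plan2 are 7×27 character matrices (blank stand-ins here) and ⎕IO←1, as the indexing above suggests. Catenating along the first axis gives a 14×27 matrix rather than a 2×7×27 array, whereas laminating with a fractional axis, or mixing, gives the same rank-3 shape as the indexed assignment:

```apl
      ⎕IO←1
      Plan1←7 27⍴' '          ⍝ stand-in for the lower-case plane
      Plan2←7 27⍴' '          ⍝ stand-in for the upper-case plane
      ⍴Plan1,[1]Plan2         ⍝ catenate along the first axis: 14 27
      ⍴Plan1,[0.5]Plan2       ⍝ laminate, adding a new first axis: 2 7 27
      ⍴↑Plan1 Plan2           ⍝ mix gives the same shape: 2 7 27
```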

The local help (F1) for dyadic grade up shows the kind of result I need. I just need to extend it to handle diacritical marks.

Thank you for your help,
Yves

Re: Dyadic grade up (X ⍋ Y)

Posted: Wed Sep 26, 2018 7:06 pm
by Roger|Dyalog
I am not sure what you are asking exactly (e.g. does "A" precede "a"?), but there is a recent post on the Dyalog blog on dyadic grade which includes an example involving diacritical marks.

Re: Dyadic grade up (X ⍋ Y)

Posted: Wed Sep 26, 2018 9:58 pm
by Yves
Hi Roger,
Nice to hear from you.
Your link is very interesting; I am studying it.
I will come back with my next question, especially for you :)

Regards,
Yves

Re: Dyadic grade up (X ⍋ Y)

Posted: Wed Sep 26, 2018 10:58 pm
by Yves
Dear Roger & All,
For languages that do not use Latin characters, we systematically use Latin letters with diacritical marks. This is transliteration.

First example (exists in Unicode):
ṭ (⎕ucs 7789) is a single code point.
But the same glyph can also be written as t followed by ̣ (⎕ucs 116 803).
Same glyph, one or two code points.

Second example (does not exist in Unicode):
the same glyph with an accent, for which Unicode has no single code point.
Do I write t with underdot, followed by the accent?
Or t with accent, followed by the underdot?
Or t followed by the accent, followed by the underdot?
Or t followed by the underdot, followed by the accent?
In this case we have two or three code points, and more combinations.

How can I put ṭ in the array with all of its combinations, so that every combination has the same weight for ⍋?
All combinations should give a single weight for the same glyph, and return the preferred combination, the same one each time for that glyph.
The transliteration of Sanskrit needs 15 letters with various diacritical marks.
I hope ⍺⍋⍵ can handle this as simply as it handles letters with diacritics.

Regards,
Yves

Re: Dyadic grade up (X ⍋ Y)

Posted: Thu Sep 27, 2018 4:25 am
by Roger|Dyalog
You have described an interesting problem. If I understand correctly, the problem you described is not one for dyadic ⍋. One way to solve it is as follows:

0. Identify the "symbols". From your description a symbol can be denoted by multiple characters, for example "t" or ⎕ucs 116 803.

1. Transform each symbol to an integer value, or pair of integers if that makes things easier. Beforehand, you can make a table of symbols and corresponding numeric values. For example,

      Symbol          Value
   A                   97 0
   À                   97 1
   a                   97 0 
   à                   97 1   
       ...       
   t                  116 0
   ⎕ucs 116 803       116 1
       ...

(The values depend on how you want to order the symbols.)

2. Grade the array of integer values.

Putting it all together: {⎕io←0 ⋄ ⍋0 2 1⍉Value[Symbol⍳⍪symbolize ⍵;]} (Based on the "Alternatives" section of the Dyadic Grade blog post.) Of these steps, by far the trickiest will be step 0, the "symbolize" step.
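
The three steps above can be sketched as follows. This is only an illustration under stated assumptions, not the definitive "symbolize": it treats a symbol as a base character plus any following characters from the Combining Diacritical Marks block (U+0300 to U+036F), and the names isMark, key, words and the small Symbol/Value table are hypothetical. Both encodings of ṭ are given the same value, and the values are integers starting at 1 so that the 0 padding added by ↑ sorts shorter words first.

```apl
      ⎕IO←0
      isMark←{(⎕UCS ⍵)∊768+⍳112}            ⍝ 1 for combining marks U+0300-U+036F
      symbolize←{(~isMark ⍵)⊂⍵}             ⍝ start a new symbol at each non-mark
      ⍝ Hypothetical table: a, a+macron, t, t+dot below (2 code points), precomposed ṭ
      Symbol←(,'a')(⎕UCS 97 772)(,'t')(⎕UCS 116 803)(,⎕UCS 7789)
      Value←1 2 3 4 4 5                     ⍝ last value is for symbols not in the table
      key←{Value[Symbol⍳symbolize ⍵]}       ⍝ word → vector of integer weights
      words←'ta' (⎕UCS 7789 97) 'at' (⎕UCS 116 803 97) (⎕UCS 97 772 116)
      ⍋↑key¨words                           ⍝ grade the padded integer matrix
⍝ 2 4 0 1 3
```

The two spellings of ṭa receive identical keys, so they stay in their original relative order. Mark order within a symbol (accent before or after the underdot) is not handled here; each variant would need its own row in Symbol, or the marks would have to be put into a fixed order first.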

Re: Dyadic grade up (X ⍋ Y)

Posted: Thu Sep 27, 2018 7:13 pm
by Yves
Hi Roger,
You have understood the difficulty well.

To help, here are all the vowels of the Sanskrit alphabet in transliteration, in the official order:
      chn ← (97) (97 772) (105) (105 772) (117) (117 772) (114 803) (114 803 772) (108 803) (108 803 772) (101) (97 105) (111) (97 117) (109 775) (58)
The parentheses are just group delimiters.

For the t example:
      ⎕ucs¨ (116 803) (116 803 104) (116 803 769)
┌→─────────────────┐
│ ┌→─┐ ┌→──┐ ┌→──┐ │
│ │ṭ │ │ṭh │ │ṭ́ │ │
│ └──┘ └───┘ └───┘ │
└∊─────────────────┘
This h is not independent: it is glued to the t to indicate "hard breath". I prefer the third option, where the modification is indicated by a diacritical mark, not a letter.
Confusion and difficulty increase when the letter h itself also comes into play.

I suggest looking at https://unicode-table.com/fr/blocks/combining-diacritical-marks/ and also https://unicode-table.com/fr/blocks/combining-diacritical-marks-supplement/.

I will try your suggestions and come back.

Regards,
Yves

Re: Dyadic grade up (X ⍋ Y)

Posted: Fri Sep 28, 2018 5:11 am
by Roger|Dyalog
Good luck.

Dyadic grade ⍺⍋⍵ works on individual characters in ⍵, but you want to compare (for example) 't' vs. ⎕ucs 116 803. Therefore you have to use something other than dyadic grade.

Re: Dyadic grade up (X ⍋ Y)

Posted: Fri Sep 28, 2018 12:44 pm
by PGilbert
Hello Yves, in case it helps, I have contributed a function to 'normalize' some text using .Net at the APL Wiki: https://aplwiki.com/netUpperLowerCase#R ... diacritics)

The idea would be to apply the sort index of the 'normalized' text to the 'non-normalized' text.

Good luck,
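
A minimal sketch of that idea in plain APL. The function strip is a hypothetical stand-in for the .Net function (it only drops combining marks in U+0300 to U+036F, so it does nothing for precomposed characters), and words is made-up sample data:

```apl
      ⎕IO←0
      strip←{⍵/⍨~(⎕UCS ⍵)∊768+⍳112}        ⍝ drop combining marks U+0300-U+036F
      words←(⎕UCS 116 803 97) 'sa' 'ra'     ⍝ ṭa (decomposed), sa, ra
      idx←⍋↑strip¨words                     ⍝ grade the stripped text: 2 1 0
      words[idx]                            ⍝ reorder the original, unstripped words
```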

Pierre Gilbert

Re: Dyadic grade up (X ⍋ Y)

Posted: Thu Oct 04, 2018 9:56 pm
by Adam|Dyalog
You may be able to do some preprocessing which replaces the appropriate character sequences with single placeholder characters. Then use dyadic ⍋. Look e.g. here.
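
One way this preprocessing might look, as a hedged sketch only. It assumes the sequences of interest have single-character stand-ins (here the precomposed ā and ṭ; a private-use code point could serve where Unicode has none), and the names pats, reps, fold, alphabet and words are hypothetical. The space comes first in the collation string so that the padding added by ↑ sorts before any letter:

```apl
      ⎕IO←0
      pats←(⎕UCS 97 772)(⎕UCS 116 803)      ⍝ a+macron, t+dot below (2 code points each)
      reps←(,⎕UCS 257)(,⎕UCS 7789)          ⍝ single-character stand-ins: ā, ṭ
      fold←pats ⎕R reps                     ⍝ replace each sequence by its stand-in
      alphabet←' a',(⎕UCS 257),'t',⎕UCS 7789  ⍝ collation order for dyadic ⍋
      words←'ta' (⎕UCS 116 803 97) 'at' (⎕UCS 97 772 116)
      alphabet⍋↑fold¨words                  ⍝ intended order: at, āt, ta, ṭa
```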