Dyadic grade up (X ⍋ Y)

Yves
Posts: 39
Joined: Mon Nov 30, 2015 11:33 am

Dyadic grade up (X ⍋ Y)

Post by Yves »

Dear All,
I need to sort a word list that contains diacritics (e.g. äéàèùç ...),
using something like this:
      ]display tabTri3D[ 1 ; ; ]
┌→──────────────────────────┐
↓ abcdefghijklmnopqrstuvwxyz│
│ ã                         │
│ à                         │
│ á                         │
│ â                         │
│ ä                         │
│ å                         │
└───────────────────────────┘
      ]display tabTri3D[ 2 ; ; ]
┌→──────────────────────────┐
↓ ABCDEFGHIJKLMNOPQRSTUVWXYZ│
│ Á                         │
│ Â                         │
│ Ã                         │
│ À                         │
│ Ä                         │
│ Å                         │
└───────────────────────────┘

To help you experiment:
      ⎕ucs 'aãàáâäå'
97 227 224 225 226 228 229
      ⎕ucs 'AÁÂÃÀÄÅ'
65 193 194 195 192 196 197


I am not sure which is better (more efficient, more elegant): building the array with
      Plan1 ,[1] Plan2

or with
      tabTri3D←2 7 27⍴''
      tabTri3D[ 1 ; ; ] ← Plan1
      tabTri3D[ 2 ; ; ] ← Plan2

The local help (F1) for dyadic grade up shows the kind of result I need. I just need to extend it to handle diacritical marks.

Thank you for your help,
Yves
Roger|Dyalog
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am

Re: Dyadic grade up (X ⍋ Y)

Post by Roger|Dyalog »

I am not sure what you are asking exactly (e.g. does "A" precede "a"?), but there is a recent post on the Dyalog blog on dyadic grade which includes an example involving diacritical marks.
Yves
Posts: 39
Joined: Mon Nov 30, 2015 11:33 am

Re: Dyadic grade up (X ⍋ Y)

Post by Yves »

Hi Roger,
Nice to hear from you.
Your link is very interesting; I am studying it.
I will come back with my next question, especially for you :)

Regards,
Yves
Yves
Posts: 39
Joined: Mon Nov 30, 2015 11:33 am

Re: Dyadic grade up (X ⍋ Y)

Post by Yves »

Dear Roger & All,
For languages that do not use Latin characters, we systematically use Latin letters with diacritical marks. This is transliteration.

First example (exists in Unicode):
ṭ (⎕ucs 7789) is a single code point,
but the same glyph can also be written as t followed by ̣ (⎕ucs 116 803).
Same glyph, one or two code points.

Second example (does not exist in Unicode as a single code point):
the same glyph with an accent. How do I write it?
t with underpoint, followed by accent?
or t with accent, followed by underpoint?
or t followed by accent, followed by underpoint?
or t followed by underpoint, followed by accent?
In this case we have 2 or 3 code points, and more combinations.

How is it possible to put ṭ in an array with all of its combinations, so that every combination has the same weight for ⍋?
All combinations should give one weight for the same glyph, and the result should return the best combination, the same one each time for that glyph.
The transliteration of Sanskrit needs 15 letters with different marks.
I hope ⍺⍋⍵ can handle this as simply as it handles letters with diacritics.

Regards,
Yves
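
Editor's note: the "same glyph, several code sequences" situation described above is what Unicode normalization addresses. Below is a minimal sketch in Python rather than APL, using only the standard `unicodedata` module; the code points are the ones quoted in the post (7789, 116, 803) plus 769 for the combining acute accent. It is an illustration of the idea, not a Dyalog feature.

```python
import unicodedata

# Spellings of "t with dot below and acute accent", using code points from the post.
spellings = [
    "\u1e6d\u0301",   # precomposed ṭ (7789), then combining acute (769)
    "t\u0323\u0301",  # t (116), combining dot below (803), combining acute (769)
    "t\u0301\u0323",  # t (116), combining acute (769), combining dot below (803)
]

# NFD decomposes fully and puts combining marks into canonical order.
nfd_forms = {unicodedata.normalize("NFD", s) for s in spellings}
# NFC decomposes the same way, then recomposes where a precomposed character exists.
nfc_forms = {unicodedata.normalize("NFC", s) for s in spellings}

# Show the single NFC form and the single NFD form as code points.
for form in sorted(nfd_forms | nfc_forms, key=len):
    print([ord(c) for c in form])
```

All three spellings collapse to one NFD form (116 803 769) and one NFC form (7789 769), so normalizing first gives "the same combination each time" for a glyph.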
Roger|Dyalog
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am

Re: Dyadic grade up (X ⍋ Y)

Post by Roger|Dyalog »

You have described an interesting problem. If I understand correctly, the problem you described is not one for dyadic ⍋. One way to solve it is as follows:

0. Identify the "symbols". From your description a symbol can be denoted by multiple characters, for example "t" or ⎕ucs 116 803.

1. Transform each symbol to an integer value, or a pair of integers if that makes things easier. Beforehand, you can make a table of symbols and their corresponding numeric values. For example,

      Symbol          Value
   A                   97 0
   À                   97 1
   a                   97 0 
   à                   97 1   
       ...       
   t                  116 0
   ⎕ucs 116 803       116 1
       ...

(The values depend on how you want to order the symbols.)

2. Grade the array of integer values.

Putting it all together:

      {⎕io←0 ⋄ ⍋0 2 1⍉Value[Symbol⍳⍪symbolize ⍵;]}

(Based on the "Alternatives" section of the Dyadic Grade blog post.) Of these steps, by far the trickiest will be step 0, the "symbolize" step.
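
Editor's note: steps 0 to 2 can be sketched as follows, in Python rather than APL. The `symbolize` function here is only one possible interpretation (a base character plus the combining marks that follow it, after NFD normalization), and the `VALUE` table is a hypothetical excerpt modelled on the table above. The sort key compares all first components before any second components, which is my reading of what the `0 2 1⍉` transpose achieves.

```python
import unicodedata

def symbolize(word):
    """Step 0: split a word into symbols (base character + following combining marks)."""
    symbols = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and symbols:
            symbols[-1] += ch          # attach the mark to the preceding base character
        else:
            symbols.append(ch)         # start a new symbol
    return symbols

# Step 1: hypothetical symbol -> (primary, secondary) table, keyed by decomposed symbols.
VALUE = {
    "a": (97, 0), "A": (97, 0),
    "a\u0300": (97, 1), "A\u0300": (97, 1),   # à and À
    "t": (116, 0),
    "t\u0323": (116, 1),                      # ṭ, i.e. ⎕ucs 116 803
}

def sort_key(word):
    """Step 2: all primary values first, then all secondary values."""
    pairs = [VALUE[s] for s in symbolize(word)]
    return ([p for p, _ in pairs], [s for _, s in pairs])

words = ["\u1e6da", "ta", "t\u0323a", "\u00e0t", "at", "At"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

The precomposed ṭ and the two-code-point ṭ receive identical keys, so they sort together regardless of how they were typed.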
Yves
Posts: 39
Joined: Mon Nov 30, 2015 11:33 am

Re: Dyadic grade up (X ⍋ Y)

Post by Yves »

Hi Roger,
You understand the difficulty well.

To help, here are all the vowels of the Sanskrit alphabet in transliteration, in their official order:
      chn ← (97) (97 772) (105) (105 772) (117) (117 772) (114 803) (114 803 772) (108 803) (108 803 772) (101) (97 105) (111) (97 117) (109 775) (58)
The parentheses are just group delimiters.

For the t example:
      ⎕ucs¨ (116 803) (116 803 104) (116 803 769)
┌→─────────────────┐
│ ┌→─┐ ┌→──┐ ┌→──┐ │
│ │ṭ │ │ṭh │ │ṭ́ │ │
│ └──┘ └───┘ └───┘ │
└∊─────────────────┘
This h is not independent: it is glued to the t to indicate a "hard breath". I prefer the third option, where the inflection is indicated by a diacritical mark rather than by a letter.
Confusion and difficulty increase when the letter h itself also comes into play.

I suggest looking at https://unicode-table.com/fr/blocks/combining-diacritical-marks/ and also https://unicode-table.com/fr/blocks/combining-diacritical-marks-supplement/.

I will try your suggestions and come back.

Regards,
Yves
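
Editor's note: one way to turn the vowel list above into sort weights is greedy longest-match tokenization, sketched here in Python rather than APL. The code points are copied from `chn`; the names `CHN`, `RANK` and `ranks` are hypothetical. The sketch assumes that two-letter groups such as (97 105) should always be read as one symbol, which may not hold for every word.

```python
import unicodedata

# Vowel order copied from the post, as tuples of code points.
CHN = [(97,), (97, 772), (105,), (105, 772), (117,), (117, 772),
       (114, 803), (114, 803, 772), (108, 803), (108, 803, 772),
       (101,), (97, 105), (111,), (97, 117), (109, 775), (58,)]

SYMBOLS = ["".join(map(chr, cps)) for cps in CHN]
RANK = {s: i for i, s in enumerate(SYMBOLS)}     # symbol -> position in the official order
LONGEST = max(len(s) for s in SYMBOLS)

def ranks(text):
    """Return the list of ranks for text, matching the longest known symbol first."""
    text = unicodedata.normalize("NFD", text)    # precomposed input becomes base + marks
    out, i = [], 0
    while i < len(text):
        for n in range(LONGEST, 0, -1):
            piece = text[i:i + n]
            if piece in RANK:
                out.append(RANK[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"no symbol matches at position {i}")
    return out

vowels = ["\u016b", "ai", "a", "\u1e5d", "e", "\u0101"]   # ū ai a ṝ e ā
ordered = sorted(vowels, key=ranks)
print(ordered)
```

Because the input is normalized to NFD first, a precomposed ṝ and the sequence 114 803 772 receive the same rank.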
Roger|Dyalog
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am

Re: Dyadic grade up (X ⍋ Y)

Post by Roger|Dyalog »

Good luck.

Dyadic grade ⍺⍋⍵ works on individual characters in ⍵, but you want to compare (for example) 't' vs. ⎕ucs 116 803. Therefore you have to use something other than dyadic grade.
PGilbert
Posts: 440
Joined: Sun Dec 13, 2009 8:46 pm
Location: Montréal, Québec, Canada

Re: Dyadic grade up (X ⍋ Y)

Post by PGilbert »

Bonjour Yves, in case it helps, I have contributed a function to 'normalize' text using .NET to the APL Wiki: https://aplwiki.com/netUpperLowerCase#R ... diacritics)

The goal would be to apply the sorting index of the 'normalized' text to the 'non-normalized' text.

Bonne chance,

Pierre Gilbert
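
Editor's note: the idea of grading the normalized text and applying that index to the original text can be sketched as below. This is Python with the standard `unicodedata` module, not the .NET function from the wiki; it assumes that "normalize" means removing diacritics and folding case, and `strip_diacritics` is a hypothetical helper name.

```python
import unicodedata

def strip_diacritics(text):
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

words = ["\u00e9t\u00e9", "eta", "\u00c9cu", "ebb"]        # été, eta, Écu, ebb

# Normalized copies, used only for comparison.
plain = [strip_diacritics(w).lower() for w in words]

# Sorting index of the normalized text (the analogue of a grade vector).
order = sorted(range(len(words)), key=plain.__getitem__)

# Apply that index to the original, non-normalized words.
ordered = [words[i] for i in order]
print(ordered)
```

The original spellings are preserved in the result; only the comparison uses the stripped forms.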
Adam|Dyalog
Posts: 143
Joined: Thu Jun 25, 2015 1:13 pm

Re: Dyadic grade up (X ⍋ Y)

Post by Adam|Dyalog »

You may be able to do some preprocessing which replaces the appropriate character sequences with single placeholder characters. Then use dyadic ⍋. Look e.g. here.
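
Editor's note: a minimal sketch of this placeholder preprocessing, in Python rather than APL. The placeholder characters are taken from the Unicode Private Use Area (an assumption, any otherwise unused characters would do), and `ALPHABET` plays the role that the left argument plays in dyadic grade. The mapping and alphabet are hypothetical and cover only the t examples from this thread.

```python
import unicodedata

# Hypothetical mapping: each multi-code-point symbol gets one placeholder character.
PLACEHOLDER = {
    "t\u0323\u0301": "\ue000",   # t + dot below + acute
    "t\u0323": "\ue001",         # t + dot below (ṭ)
}

def to_placeholders(text):
    """Normalize to NFD, then replace known sequences, longest first."""
    text = unicodedata.normalize("NFD", text)
    for seq in sorted(PLACEHOLDER, key=len, reverse=True):
        text = text.replace(seq, PLACEHOLDER[seq])
    return text

# Collating sequence over single characters, placeholders included.
ALPHABET = "at\ue001\ue000"
WEIGHT = {ch: i for i, ch in enumerate(ALPHABET)}

def sort_key(word):
    return [WEIGHT[ch] for ch in to_placeholders(word)]

words = ["\u1e6da", "ta", "t\u0323\u0301a", "at"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

Replacing the longer sequence first matters: otherwise the three-code-point symbol would be consumed as ṭ followed by a stray accent.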