APL FILES - can I reduce the bytes used?

I have an APL component file. I collect stock market data and APPEND each new array as the next component. To save some space I NULL OUT the earlier components, so that I keep a rolling 10 days of stock data.

⎕FSIZE shows me the bytes used. I REPLACE one of the current components (about 250K in that component) with a null value, but when I check ⎕FSIZE the used file space does not get smaller. I have replaced 7 of the older components, each about 250K bytes, with a null value, and yet the APL file space remains the same.

Can I compress the APL FILE to gain back the disk space? Or must I create a NEW APL FILE, reconstruct the data (writing NULL in the older components), and then RENAME the new file to the old file name, in order to recover the bytes?

Thanks in advance for your thoughts on this.
//W
- Morten|Dyalog
- Posts: 460
- Joined: Tue Sep 09, 2008 3:52 pm
Re: APL FILES - can I reduce the bytes used?
Hi Woodley! It would be incredibly inefficient if the file system always tried to reclaim space, so this requires an action on your part. You can use monadic (⎕FRESIZE TN) to reclaim unused space from a component file.
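For example, something along these lines should shrink the file in place (a minimal sketch; the file name is illustrative):

Code:
      tn←'c:\tmp\stockdata' ⎕FTIE 0      ⍝ tie the existing component file
      3⊃⎕FSIZE tn                        ⍝ bytes currently used
      ⎕FRESIZE tn                        ⍝ monadic ⎕FRESIZE: compact the file, reclaiming freed space
      3⊃⎕FSIZE tn                        ⍝ smaller now, if there were gaps to reclaim
      ⎕FUNTIE tn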
You might also want to experiment with creating your files with the 'Z' property, so that components will be compressed. This costs a little bit of extra CPU but, depending on the speed of your network, can sometimes actually speed things up due to significantly reduced network traffic:
Code:
      data←(↓⎕A[?100000 3⍴26]),⍪100+0.01×?100000⍴10000
      4↑data
ZVB 112.07
VNV 142.62
MFT 114.26
OKL 195.82
      ⎕SIZE 'data'          ⍝ commas inserted manually in output for readability
8,800,040
      'c:\tmp\compressed' (⎕FCREATE⍠'Z' 1) 1
      data ⎕FAPPEND 1
      ⎕FSIZE 1
1 2 1,331,056 1.844674407E19
Of course, in the above example, much of the compression comes from having a relatively inefficient representation: a nested array in which each item is an enclosed array. If you use a more compact form, the compression gain is smaller (this example is a bit of a "worst case", in that the data is completely random - but you still get about 25%):
Code:
      data←(⎕A[?100000 3⍴26]) (100+0.01×?100000⍴10000)
      4↑¨data
WDH 190.71 120.67 194.57 165.09
GAW
TMQ
KDS
      ⎕SIZE 'data'
1,100,120
      'c:\tmp\compressed' ⎕FERASE 1      ⍝ start from an empty file so the sizes are comparable (assumed step; not shown in the original session)
      'c:\tmp\compressed' (⎕FCREATE⍠'Z' 1) 1
      data ⎕FAPPEND 1
      ⎕FSIZE 1
1 2 716,560 1.844674407E19
Finally, if your components are all exactly the same size (or you can easily pad them to ensure this), you might consider a "circular" set of components, where you keep a directory of which dates are in the file, and then always overwrite the oldest component with a new one of exactly the same size. Let me know if that doesn't make sense and I'll explain in more detail. And of course, if you enable compression, there is (probably) no way to guarantee that the components will have exactly the same size, so this strategy cannot be combined with compression.
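A minimal sketch of that circular scheme, for what it's worth (the file name, the 10-day window and the layout - the date directory in component 1, data in components 2-11 - are all assumptions for illustration, and as noted the space benefit relies on the data arrays being padded to a fixed size):

Code:
      ⍝ one-off setup: a date directory plus ten data slots
      tn←'c:\tmp\rolling' ⎕FCREATE 0
      (10⍴0) ⎕FAPPEND tn                 ⍝ component 1: the dates (0 = empty slot)
      (⊂⍬) ⎕FAPPEND¨10⍴tn                ⍝ components 2-11: empty placeholders

      ⍝ add one day's data, overwriting the slot that holds the oldest date
      AddDay←{                           ⍝ ⍵: a (date data) pair
          (date data)←⍵
          dates←⎕FREAD tn 1              ⍝ read the date directory
          slot←dates⍳⌊/dates             ⍝ slot holding the oldest (smallest) date
          data ⎕FREPLACE tn(1+slot)      ⍝ overwrite that data component
          dates[slot]←date
          dates ⎕FREPLACE tn 1           ⍝ write the directory back
      }
      AddDay 20091215 (?2500⍴1000)       ⍝ e.g. today's date and today's (dummy) array

Reading a given day back is then just a matter of looking its date up in component 1 and ⎕FREADing the corresponding slot.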
Re: APL FILES - can I reduce the bytes used?
Excellent!
Sincere thanks.
- Richard|Dyalog
- Posts: 44
- Joined: Thu Oct 02, 2008 11:11 am
Re: APL FILES - can I reduce the bytes used?
A couple more things to note:
When you replaced your large components with small ones, the areas of the file originally occupied by the large components were "freed", leaving large unused gaps in the file. That is why you saw no immediate reduction in file size. You could use ⎕FRESIZE to shuffle the file contents and remove the gaps, which would shrink the file - but even if you simply left the gaps there, they would be reclaimed as new components were added, so appending new components would not make the file any bigger until the gaps were used up.
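You can watch that happen in a session (a sketch - the file name and sizes are illustrative):

Code:
      tn←'c:\tmp\gaps' ⎕FCREATE 0
      big←?250000⍴100                    ⍝ roughly 250K of small integers
      big ⎕FAPPEND tn
      ⍬ ⎕FREPLACE tn 1                   ⍝ null out the component ...
      3⊃⎕FSIZE tn                        ⍝ ... but the bytes used do not shrink
      big ⎕FAPPEND tn                    ⍝ the new component can land in the gap ...
      3⊃⎕FSIZE tn                        ⍝ ... so the file barely grows, if at all
      'c:\tmp\gaps' ⎕FERASE tn           ⍝ tidy up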
So, if you keep to your policy of clearing out old components by replacing them with small ones as you add new ones, you should see the file grow only slowly over time.
Morten's suggestion of replacing a component with another of exactly the same size is tricky to get right - especially as a result of work that we've done to make the file handling more robust. However, that doesn't really matter because space eventually gets reused, as noted above. If you do use his technique of cycling through a circular set of components you would see some benefit because you would not be filling the file with small components, and the only size increases you would get over time would then be the result of (a) components actually being larger, and (b) wasted space due to fragmentation. You could occasionally ⎕FRESIZE the file to eliminate the fragmentation.