Discussion:
[Gambas-user] My quest for efficiency
Fernando Cabral
2017-07-15 14:08:17 UTC
Permalink
Hi

I've found a file whose text has been obfuscated by subtracting 11 from
every byte. Now I want to bring it back to regular text. To do this I have
to add 11 to each byte read from that file. Now, I have tried several ways
to do it, and they all seemed very inefficient to me. Two examples follow:

j = 0
For i = 0 To Len(RawText)
  str &= Chr(CByte(Asc(RawText, i) + 11)) ' either this or the following
  'Mid(Rawtext, i, 1) = Chr(CByte(Asc(RawText, i) + 11))
  Inc j
  If j = 100000 Then
    Print i; Now
    j = 0
  Endif
Next

In the first option (uncommented) I am building a new string byte by byte.
In the second option (commented) I am replacing each character in place.
I expected the second option to be way faster, especially because there is
no need for the string to be reallocated. Nevertheless, it turned out to be
as slow as a snail.
The first option, in spite of the fact that it grows slower and slower as
the string grows, is still way faster than the second option.

To me it does not make sense. Does it to you?
Also, is there a faster way to do this?
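
For comparison, here is a minimal C sketch of what "replacing each character in place" means at the byte level: one pass over an existing buffer, constant work per byte, no reallocation. The function name shift_bytes_in_place is made up for illustration, and nothing in this thread establishes whether the Gambas Mid assignment works this way internally.

```c
#include <stddef.h>

/* Add `delta` to every byte of `buf`, overwriting the buffer in place.
 * Unsigned char arithmetic wraps modulo 256, so 250 + 11 becomes 5. */
void shift_bytes_in_place(unsigned char *buf, size_t len, unsigned char delta)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)(buf[i] + delta);
    }
}
```

Called as shift_bytes_in_place(data, n, 11) on a buffer holding the obfuscated bytes, this would restore the text in a single linear pass.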
--
Fernando Cabral
Blog: http://fernandocabral.org
Twitter: http://twitter.com/fjcabral
e-mail: ***@gmail.com
Facebook: ***@fcabral.com.br
Telegram: +55 (37) 99988-8868
Wickr ID: fernandocabral
WhatsApp: +55 (37) 99988-8868
Skype: fernandojosecabral
Landline: +55 (37) 3521-2183
Mobile: +55 (37) 99988-8868

As long as there is a single person in the world without a home or without food,
no politician or scientist can boast of anything.
Fernando Cabral
2017-07-15 17:36:28 UTC
Permalink
Well, after 5 hours the more efficient version is still running. Only 1/5
of the file has been processed. The less efficient version has only
processed 1 MB, or 1/42 of the file.

So I decided to write a C program to do the same task. Since I have not
used C in the last 20 years, I did not try anything fancy. I know C
has to be more efficient, so I expected it to take, perhaps, 10 minutes or
5 minutes. Not so. To my surprise, the program below did the whole thing
in ONE SECOND!

I found this to be quite unexpected.

#include <stdio.h>

int main(void)
{
    FILE *fp;
    int c;

    fp = fopen("/home/fernando/temp/deah001.dhn", "r");
    while ((c = fgetc(fp)) != EOF) {
        putchar(c + 11);
    }
    fclose(fp);
    return 0;
}

I am sure there is a way to do this efficiently in Gambas. Certainly not in
1 second, as happened here, but perhaps in 5 or 10 minutes instead of
the several hours it is now taking.

- fernando
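
A slightly more defensive sketch of the same byte-at-a-time loop, in case it is useful. The function name add11_stream and its return convention are assumptions for illustration, not part of the program above. It takes already-opened streams so the caller can check fopen for NULL, and it casts to unsigned char so the addition wraps modulo 256.

```c
#include <stdio.h>

/* Copy `in` to `out`, adding 11 to every byte (modulo 256).
 * Returns the number of bytes written, or -1 on a write error. */
long add11_stream(FILE *in, FILE *out)
{
    long count = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (fputc((unsigned char)(c + 11), out) == EOF) {
            return -1;
        }
        count++;
    }
    return count;
}
```

A caller would open the input with fopen(path, "rb"), bail out if the result is NULL, and then call add11_stream(fp, stdout).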
Tony Morehen
2017-07-15 18:49:34 UTC
Permalink
Did you try Benoit's suggestion:

Public Sub Main()

  Dim sIn As String
  Dim sOut As String

  sIn = File.Load("/home/fernando/temp/deah001.dhn")
  sOut = Add11(sIn)
  File.Save("/home/fernando/temp/deah001.11Added.dhn", sOut)

End

Public Sub Add11(InputString As String) As String

  Dim bArray As Byte[]
  Dim String11 As String
  Dim i As Integer

  bArray = Byte[].FromString(InputString)
  For i = 0 To bArray.Max
    bArray[i] += 11
  Next
  Return bArray.ToString

End
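
One detail the loop above leaves implicit is what happens to bytes of 245 and above, where adding 11 goes past 255. The C sketch below shows the modulo-256 behaviour that makes the scheme exactly reversible. The function names obfuscate and deobfuscate are invented for illustration, and how the Gambas Byte type handles the overflow is not stated in this thread.

```c
#include <stdint.h>

/* Obfuscate: subtract 11 from a byte, wrapping modulo 256. */
uint8_t obfuscate(uint8_t b)
{
    return (uint8_t)(b - 11);
}

/* Restore: add 11 back, again wrapping modulo 256. */
uint8_t deobfuscate(uint8_t b)
{
    return (uint8_t)(b + 11);
}
```

With wrapping arithmetic, a byte of 5 is stored as 250, and adding 11 to 250 gives 5 again, so every one of the 256 byte values survives the round trip.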
Caveat
2017-07-15 20:28:38 UTC
Permalink
Something is horribly wrong, or you're running on a 286 :-)

I just tested here, and the program runs on a 51 MB test file in about 5
seconds.

Some reasonably well commented code for you...

Public Sub Main()

  Dim inFile, outFile As File
  Dim buff As New Byte[1024]
  Dim idx, remBytes, readSize As Integer

  ' CHANGE THIS to your input file
  inFile = Open "/home/caveat/Downloads/mytestfile" For Read

  ' CHANGE THIS to your output file
  outFile = Open "/home/caveat/Downloads/mytestfile.out2" For Create

  ' Remaining bytes starts as the total length of the file
  remBytes = Lof(inFile)

  ' Until we reach the end of the input file...guess you could instead check on remBytes...
  While Not Eof(inFile)
    If remBytes > buff.length Then
      ' Limit reading to the size of our buffer (the Byte[])
      readSize = buff.length
    Else
      ' Only read the bytes we have left into our buffer (the Byte[])
      readSize = remBytes
    Endif
    ' Read from the input file into our buffer, starting at offset 0 in the buffer
    buff.Read(inFile, 0, readSize)
    ' Update the number of bytes remaining...
    remBytes = remBytes - readSize
    ' Run round each byte in our buffer
    For idx = 0 To buff.length - 1
      ' Dunno if you need any conditions, I check for > 30 as I can put
      ' newlines in the file to make it more readable for testing
      If buff[idx] > 30 Then
        ' This is the 'trick' you need to apply... subtract 11 from every byte in the file
        ' Not sure how you deal with edge cases... if you have a byte of 5, is your result then 250?
        buff[idx] = buff[idx] - 11
      Endif
    Next
    ' Write the whole buffer out to the output file
    buff.Write(outFile, 0, readSize)
  Wend

  Close #inFile
  Close #outFile

End


Kind regards,
Caveat
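
For readers comparing with the C program earlier in the thread, the same chunked read, transform, write loop can be sketched in C roughly as follows. The function name shift_stream_chunked and the delta parameter are assumptions for illustration: pass 11 to add 11 to each byte, or 245 to subtract 11 as the Gambas code above does. The "> 30" test from the Gambas code is deliberately left out here.

```c
#include <stdio.h>

#define CHUNK_SIZE 1024  /* same buffer size as the Gambas code above */

/* Copy `in` to `out` in fixed-size chunks, adding `delta` to each byte
 * (modulo 256). Returns 0 on success, -1 on a read or write error. */
int shift_stream_chunked(FILE *in, FILE *out, unsigned char delta)
{
    unsigned char buf[CHUNK_SIZE];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        /* Only the n bytes actually read are transformed and written. */
        for (size_t i = 0; i < n; i++) {
            buf[i] = (unsigned char)(buf[i] + delta);
        }
        if (fwrite(buf, 1, n, out) != n) {
            return -1;
        }
    }
    return ferror(in) ? -1 : 0;
}
```

Because fread reports how many bytes it actually delivered, there is no need to track the remaining length separately; the last, shorter chunk is handled by the same loop.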
Quoted reply to Tony Morehen's code:
You don't have to use Byte[].FromString.
You can use the Byte[].Read() method instead, to load the file
directly into the array. You save an intermediate string that way.
Regards,
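
The idea of loading the file straight into a byte array, with no intermediate string, looks roughly like this in C. This is only a sketch under stated assumptions: read_all is a hypothetical helper name, the stream must be seekable and opened in binary mode, and the caller owns the returned buffer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire seekable stream into a freshly allocated byte array.
 * On success returns the buffer and stores its length in *len;
 * returns NULL on failure. The caller must free() the buffer. */
unsigned char *read_all(FILE *fp, size_t *len)
{
    if (fseek(fp, 0, SEEK_END) != 0) {
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        return NULL;
    }
    rewind(fp);

    /* Allocate at least one byte so an empty file still yields a buffer. */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) {
        return NULL;
    }
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        return NULL;
    }
    *len = (size_t)size;
    return buf;
}
```

The returned array can then be modified byte by byte and written back out with a single fwrite call.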
Fernando Cabral
2017-07-16 12:13:39 UTC
Permalink
Thank you, Caveat [emptor?].
The code you proposed worked very, very well.
In fact, I timed it against two versions of the C program and the result
was quite good.
In C, reading from the standard input and writing to the standard output
took a trifle beyond half a second (0.6?? real time).

Meanwhile, your version in Gambas ran in 2.5?? (real time). The question
marks stand for digits that varied slightly from trial to trial.

Out of curiosity, I wrote a Gambas version similar to the C version. Here
are the two versions:

Dim b As Byte

While Not Eof()
  b = Read As Byte
  Write (b + 11) As Byte
Wend

- - - - - - -

int c;

while ((c = getchar()) != EOF)
    putchar(c + 11);

The C version ran in 0.59 seconds, against 7.6 seconds (real time) for the
Gambas version. Not too bad in my opinion!

Then I tried to stretch it a little bit and wrote:

Write ((Read As Byte) + 11) As Byte

Alas! This is something Gambas does not understand.

These are just for the sake of experiment. I am happy with the solution
Caveat proposed.
Thank you.