Discussion:
[Gambas-user] any way to convert a Result to a Collection faster than copying?
PICCORO McKAY Lenz
2017-06-29 22:57:29 UTC
Permalink
Can I convert a Result from the database directly into a Collection or a Variant matrix, or at least faster than copying each row?

I'm talking about roughly 200,000 rows in a Result... the problem is that the ODBC db object only supports forward-only cursors,

so I want a matrix or a Collection in order to emulate scrollable cursor behaviour.
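
Right now the copy is row by row, more or less like this (a rough sketch only; it assumes an already open gb.db Connection called hConn, and "big_table" and the variable names are just for illustration):

  ' Rough sketch of the row-by-row copy into a Variant matrix.
  Dim rData As Result
  Dim hField As ResultField
  Dim aRow As Variant[]
  Dim aMatrix As New Variant[]    ' one Variant[] per row

  rData = hConn.Exec("SELECT * FROM big_table")

  While rData.Available
    aRow = New Variant[]
    For Each hField In rData.Fields
      aRow.Add(rData[hField.Name])
    Next
    aMatrix.Add(aRow)
    rData.MoveNext()
  Wend

  Print aMatrix.Count; " rows copied"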

Lenz McKAY Gerardo (PICCORO)
http://qgqlochekone.blogspot.com
a***@gmail.com
2017-06-30 08:09:54 UTC
Permalink
On Thu, 29 Jun 2017 18:57:29 -0400
Post by PICCORO McKAY Lenz
Can I convert a Result from the database directly into a Collection or a Variant matrix, or at least faster than copying each row?
I'm talking about roughly 200,000 rows in a Result... the problem is that the ODBC db object only supports forward-only cursors,
so I want a matrix or a Collection in order to emulate scrollable cursor behaviour.
Lenz McKAY Gerardo (PICCORO)
http://qgqlochekone.blogspot.com
Interesting.

Well, the row-by-row copy is how we do it here. I added some quick timer Prints to a program we run each day to verify that the
database updates done overnight were "clean".
The data loaded is a fairly complex join of several tables: the transactional table is 754,756 rows today, the master table is 733,723 rows, and the transactional data is compared to the master data to test for a set of possible inconsistencies. (The actual query returns the transaction and master records that were actioned overnight - this generally returns about 5,000 to 10,000 rows - so I jigged it to return the pairs that were not actioned overnight, thereby getting row counts of the sizes you are talking about.) So the jigged query just returned 556,000 rows. Here's the timing output.

17:05:59:706 Connecting to DB
17:06:00:202 Loading Data <---- so 406 mSec to establish the db connection
17:06:31:417 556502 rows <---- so 31,215 mSec to execute the query and return the result
17:06:31:417 Unmarshalling result started
17:06:44:758 Unmarshalling completed 556502 rows processed <--- so 13,341 mSec to unmarshall the result into an array of structs

So, it took roughly 31 seconds to execute the query and return the result of half a million rows.
To unmarshall that result into the array took just over 13 seconds. The unmarshalling is pretty much a straight field-by-field copy.
(Also, I must add, I ran this on a local db copy on my old steam-driven laptop: 32 bits and about 1 GB of memory.)

That's about 42 mSec unmarshalling time per row.
I don't think that is too bad. From my perspective it is the query that is eating up my life, not the unmarshalling.
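
For completeness, the timing hooks are nothing fancier than Prints around each stage. A simplified sketch of the shape of it (not the real program; the query string and the field names are placeholders, and I'm using Timer here rather than wall-clock stamps):

  ' Simplified sketch of the timing wrapper around query and copy.
  Dim fStart As Float
  Dim rData As Result
  Dim aRows As New Variant[]

  fStart = Timer
  rData = hConn.Exec(sBigJoinQuery)    ' sBigJoinQuery holds the real SQL
  Print "Query: "; Format(Timer - fStart, "0.000"); " s, "; rData.Count; " rows"

  fStart = Timer
  While rData.Available
    ' the real code copies field by field into an array of structs;
    ' a Variant[] per row stands in for the struct here
    aRows.Add([rData!trans_id, rData!master_id, rData!reconciled])
    rData.MoveNext()
  Wend
  Print "Unmarshalling: "; Format(Timer - fStart, "0.000"); " s, "; aRows.Count; " rows"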

What sort of times do you get?

b

(p.s. the query has been optimised until its eyes bled. )
--
B Bruen <***@gnail.com (sort of)>
PICCORO McKAY Lenz
2017-06-30 12:41:49 UTC
Permalink
I get more than 30 minutes, because I must parse it on a low-end machine, not on your 4-core, 16 GB RAM super-powered machine. I'm talking about 1 GB of RAM and a single-core 1.6 GHz Atom CPU.

I need to convert from the Result/cursor to something else because of the ODBC lack of cursor/count support.

I'm thinking about using an in-memory SQLite structure; how can I force it? The documentation says "If Name is null, then a memory database is opened." for SQLite.

So would using a memory structure be a good idea? I tested it yesterday and it took about 10 minutes, but I don't know whether I have a problem in my Gambas installation!
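
If I read the documentation right, forcing the memory database would go roughly like this (untested sketch; the Type string, the table layout and hOdbcConn are my own assumptions):

  ' Untested sketch: open an in-memory SQLite database and cache
  ' the ODBC Result in it (hOdbcConn is the existing ODBC connection).
  Dim hCache As New Connection
  Dim rData As Result

  hCache.Type = "sqlite3"
  hCache.Name = Null                 ' per the docs: null Name = memory database
  hCache.Open()

  hCache.Exec("CREATE TABLE cache (id INTEGER, name TEXT)")

  rData = hOdbcConn.Exec("SELECT id, name FROM big_table")

  hCache.Begin()                     ' one transaction, otherwise inserts crawl
  While rData.Available
    hCache.Exec("INSERT INTO cache VALUES (&1, &2)", rData!id, rData!name)
    rData.MoveNext()
  Wend
  hCache.Commit()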



Lenz McKAY Gerardo (PICCORO)
http://qgqlochekone.blogspot.com
a***@gmail.com
2017-07-01 07:36:21 UTC
Permalink
On Fri, 30 Jun 2017 08:41:49 -0400
Post by PICCORO McKAY Lenz
I get more than 30 minutes, because I must parse it on a low-end machine, not on your 4-core, 16 GB RAM super-powered machine. I'm talking about 1 GB of RAM and a single-core 1.6 GHz Atom CPU.
I need to convert from the Result/cursor to something else because of the ODBC lack of cursor/count support.
I'm thinking about using an in-memory SQLite structure; how can I force it? The documentation says "If Name is null, then a memory database is opened." for SQLite.
So would using a memory structure be a good idea? I tested it yesterday and it took about 10 minutes, but I don't know whether I have a problem in my Gambas installation!
Lenz McKAY Gerardo (PICCORO)
http://qgqlochekone.blogspot.com
(SNIP)
Post by a***@gmail.com
17:06:31:417 Unmarshalling result started
17:06:44:758 Unmarshalling completed 556502 rows processed <--- so 13,341 mSec to unmarshall the result into an array of structs
(CORRECTED)
That's about 42 rows per mSec unmarshalling time, or about 0.024 mSec per row.
Well, 30 minutes does sound very excessive. Are you certain that it's the "unmarshalling" that is taking the time and not the execution of the query itself? That is why I separated the timings in my figures above.
Regarding your machine capability, my laptop is very similar to what you described (single core, 1 GB memory). The only real difference I can see is a 1.7 GHz maximum clock speed.
So I don't think that's the cause of the difference. If I imagine your query on this PC, I would expect about 200,000 * 0.024 mSec to unmarshall it, say about 5 seconds.

Regarding using the memory based SQLite database approach, I wouldn't think that it would help. I don't know the actual "size" of the data returned by your query, but I would expect that you would get a major memory hit and a lot of paging by going that way. I have used the memory SQLite database several times for manipulating several hundred or so records and it is quite fast but wouldn't even consider it for a dataset that large (and I guess it would be just adding another layer of processing to handle your query Result).

By the way, where is your source database? Is it on your machine or on a networked machine? I had one of the lads in our office try the same thing that I did, but using the master database on our LAN. It took a bit longer, 38 seconds to execute the query rather than 31 so as I expected, network access to the database plays a fairly large part. ~20% for a query returning a set that large.

Query optimisation? We tend to use the Connection.Exec approach here for large queries, as it lets us optimise both the database and the SQL for maximum benefit rather than rely on the
Gambas driver generated queries. (That's not a criticism by the way; it's just that when dealing with large datasets our results have been better.) For example, in the query I have been
timing above, we create a temporary index on a boolean column that is one of the WHERE clause criteria, with the NULLS FIRST option set on the index. Since we
are looking to select all the rows from that table where a flag (the "reconciled" column) has not been set, they are all at the front of that index. As soon as the back-end query engine hits an index entry for a row that has been reconciled, it "knows" that it has finished. At the end of the query we just delete that index again. Before I did that, the query execution time was several minutes; now we are down to about 5 seconds (for the "real" query on the "real" database, which returns up to 10,000 rows).
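
In code it boils down to something like the following (a sketch only; the table, column and index names are invented here, and the NULLS FIRST syntax depends on your backend):

  ' Sketch of the temporary-index trick around the big query.
  Dim rData As Result
  Dim sSql As String

  hConn.Exec("CREATE INDEX tmp_recon_idx ON transactions (reconciled NULLS FIRST)")

  ' Only the front of that index is walked, because every row we
  ' want still has the reconciled flag unset.
  sSql = "SELECT t.id, m.id FROM transactions t"
  sSql &= " INNER JOIN master m ON m.id = t.master_id"
  sSql &= " WHERE t.reconciled IS NULL"
  rData = hConn.Exec(sSql)

  ' ... unmarshall rData as usual ...

  hConn.Exec("DROP INDEX tmp_recon_idx")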

So again, I would be looking for other causes of that massive time if I were you.

rgrds
b
--
B Bruen <***@gnail.com (sort of)>
PICCORO McKAY Lenz
2017-07-01 09:30:27 UTC
Permalink
Thanks in advance, adamnt42. I need to convert the Result because of missing important ODBC features...
Post by a***@gmail.com
Well, 30 minutes does sound very excessive. Are you certain that it's the
"unmarshalling" that is taking the time and not the execution of the query
itself? That is why I separated the timings in my figures above.
Yes, it's not the query... I hit pause and the data is already on the client side.

Post by a***@gmail.com
Regarding using the memory based SQLite database approach, I wouldn't think
that it would help. I don't know the actual "size" of the data returned by
your query, but I would expect that you would get a major memory hit and a
lot of paging by going that way. I have used the memory SQLite database
several times for manipulating several hundred or so records and it is
quite fast but wouldn't even consider it for a dataset that large (and I
guess it would be just adding another layer of processing to handle your
query Result).
I made the test, and you are partly right: I get a major memory hit. The only benefit I got is that I now have a valid cursor, which ODBC does not offer me.

Post by a***@gmail.com
By the way, where is your source database? Is it on your machine or on a
networked machine? I had one of the lads in our office try the same thing
that I did, but using the master database on our LAN. It took a bit longer,
38 seconds to execute the query rather than 31 so as I expected, network
access to the database plays a fairly large part. ~20% for a query
returning a set that large.
All of those questions are not relevant; the real problem is that Gambas cannot handle many DB features over the ODBC connection. The cursor is forward-only, so I cannot fill a GridView quickly or work with it the way I can with other DBMSs.
Post by a***@gmail.com
So again, I would be looking for other causes of that massive time if I were you.
The only cause of my problems is some missing ODBC features (in the driver module) and the inability of Gambas to connect natively to many DBMSs.
Fernando Cabral
2017-07-01 11:08:04 UTC
Permalink
Post by PICCORO McKAY Lenz
All of those questions are not relevant; the real problem is that Gambas cannot handle many DB features over the ODBC connection.
I think you should be more specific. Instead of saying "the real problem is the lack of Gambas to handle many DB features", let us know which those [lacking] features are.
I am sure that if you do so, the master professionals on this list will be able to tell you whether those features are really missing, whether there are good workarounds, or perhaps whether you should forget Gambas and try something different.

If Gambas cannot do what you have to do, then I see no point in insisting.
But if you want to get some help in clarifying this issue, then you'd better be more specific.

Regards

- fernando
--
Fernando Cabral
Blogue: http://fernandocabral.org
Twitter: http://twitter.com/fjcabral
e-mail: ***@gmail.com
Facebook: ***@fcabral.com.br
Telegram: +55 (37) 99988-8868
Wickr ID: fernandocabral
WhatsApp: +55 (37) 99988-8868
Skype: fernandojosecabral
Telefone fixo: +55 (37) 3521-2183
Telefone celular: +55 (37) 99988-8868

As long as there is a single person in the world without a home or without food, no politician or scientist will be able to boast about anything.
PICCORO McKAY Lenz
2017-07-01 11:43:07 UTC
Permalink
Post by Fernando Cabral
I think you should be more specific. Instead of saying "the real problem is the lack of Gambas to handle many DB features", let us know which those [lacking] features are.
I already explained it and filed a bug in the Gambas bug tracker... you read the mail without the whole picture.
Christof Thalhofer
2017-07-01 10:02:19 UTC
Permalink
Post by PICCORO McKAY Lenz
I'm talking about roughly 200,000 rows in a Result... the problem is that the ODBC db object only supports forward-only cursors.
Show us your query. What do you need 200,000 rows for? That's way too much if you want to visualize anything.


All the best

Christof Thalhofer
--
This is not a signature
PICCORO McKAY Lenz
2017-07-01 10:35:43 UTC
Permalink
Hi Christof, the query is just "select * from table", but "table" is a "cube" of the data warehouse, and I want to build something similar to BusinessObjects; getting 200,000 rows is therefore not a surprise on the desktop.

The other problem that forces me to fetch so many rows is the lack of features/connectivity for large-scale DBMSs such as DB2, Sybase ASE or Oracle, so I must get all the rows first and then operate on them on the client side. In any case this is better, because it avoids going back to the database on each "change" of filters (roughly as sketched below).

In any case, it seems the better approach is to use an in-memory SQLite database. As for the 30 minutes in my case, it seems to be something in my Gambas installation, but debugging and then reporting a bug is quite complicated for me right now; I'm focused on ODBC and handling the data for now.
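
The idea is: load once into the local matrix, then every filter change only re-reads the local copy. Roughly like this (a sketch; the column position and the function name are invented):

  ' Sketch: filter the cached rows locally instead of re-querying the
  ' server on every filter change (column 2 = region in this sketch).
  Public Function FilterRows(aCache As Variant[], sRegion As String) As Variant[]

    Dim aRow As Variant[]
    Dim aResult As New Variant[]

    For Each aRow In aCache
      If aRow[2] = sRegion Then aResult.Add(aRow)
    Next

    Return aResult

  End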

Lenz McKAY Gerardo (PICCORO)
http://qgqlochekone.blogspot.com
Christof Thalhofer
2017-07-01 15:29:16 UTC
Permalink
Post by PICCORO McKAY Lenz
Hi Christof, the query is just "select * from table", but "table" is a "cube" of the data warehouse, and I want to build something similar to BusinessObjects; getting 200,000 rows is therefore not a surprise on the desktop.
For a data warehouse, 200,000 rows is not very much. But in a data warehouse those jobs normally run at night. The next day you look at the results and you get them fast, because they are just a handful of tuples (records), or there is nothing to be extracted.
Post by PICCORO McKAY Lenz
The other problem that forces me to fetch so many rows is the lack of features/connectivity for large-scale DBMSs such as DB2, Sybase ASE or Oracle, so I must get all the rows first and then operate on them on the client side; in any case this is better, because it avoids going back to the database on each "change" of filters.
From what database do you query "select * from table" with ODBC?

If you pour such a lot of tuples into SQLite, you won't make it better, I think. A Collection also does not seem to be very fast.

Why not use a mature DB like Postgresql? You could try
https://wiki.postgresql.org/wiki/Foreign_data_wrappers

If the rows are in Postgres, you can do anything you want.


All the best

Christof Thalhofer
--
This is not a signature
PICCORO McKAY Lenz
2017-07-02 00:16:16 UTC
Permalink
Post by Christof Thalhofer
From what database do you query "select * from table" with ODBC?
Sybase and Oracle. A proprietary ODBC module does the whole job very well, but I need to use open source, and FreeTDS in combination with Gambas lacks a good cursor: the cursor is forward-only, and things like record counts from a SELECT are not available. On the Oracle side it is even more complicated.
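
So today the workaround looks roughly like this (a sketch; it costs an extra COUNT round trip, and hOdbcConn and the table name are placeholders):

  ' Sketch of the workaround: a separate COUNT(*) round trip because
  ' Result.Count is not usable over this ODBC driver, then one
  ' forward-only pass over the data.
  Dim rCount As Result
  Dim rData As Result
  Dim iTotal As Integer

  rCount = hOdbcConn.Exec("SELECT COUNT(*) AS n FROM big_table")
  iTotal = rCount!n                  ' size the grid / progress bar up front

  rData = hOdbcConn.Exec("SELECT * FROM big_table")
  While rData.Available              ' forward-only: a single pass, no going back
    ' copy the row into the local cache here
    rData.MoveNext()
  Wend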
Post by Christof Thalhofer
If you pour such a lot of tuples into SQLite, you won't make it better, I think. A Collection also does not seem to be very fast.
Tested, and it is much slower... you are right. In-memory SQLite is faster, but getting the data from the remote database into SQLite is still a slow process.
Post by Christof Thalhofer
Why not use a mature DB like Postgresql? You could try
It is mature and good, but not so "enterprise": the major vendors and software businesses work only with SAP Sybase and Oracle BI.

So if the software to be deployed does not connect to those databases, then for them it does not exist.
Christof Thalhofer
2017-07-02 07:42:14 UTC
Permalink
Post by PICCORO McKAY Lenz
Post by Christof Thalhofer
If you pour such a lot of tuples into SQLite, you won't make it better, I think. A Collection also does not seem to be very fast.
Tested, and it is much slower... you are right. In-memory SQLite is faster, but getting the data from the remote database into SQLite is still a slow process.
Post by Christof Thalhofer
Why not use a mature DB like Postgresql? You could try
It is mature and good, but not so "enterprise": the major vendors and software businesses work only with SAP Sybase and Oracle BI.
What? But SQLite is "enterprise"?

If you don't describe in a clear way what you want to do, nobody can help you.


All the best

Christof Thalhofer
--
This is not a signature