Discussion:
Problem with Conv() and XmlWrite
Luigi Carlotto
2008-12-14 12:08:36 UTC
In my application I save some data to an XML file.
Before saving, each string is converted with Conv(string, "UTF-8", "ASCII").
During my tests I noticed two errors:
1) if the string is longer than about 2000 characters,
XmlWrite.Attribute() crashes;
2) if the language is set to Chinese (UTF-8), the conversion with
Conv() crashes; the same happens if XmlWrite.Attribute() or
XmlWrite.Element() is used without any conversion.
The errors occur only when System.Language=zh_CN and
System.Charset=UTF-8; with other languages everything seems to work
fine.

Error message:

encoding error : output conversion failed due to conv error, bytes 0xE5
0x31 0x32 0xE5
I/O error : encoder error

The bytes reported in the message do not seem to correspond to, or be
contained in, any of the strings that are saved in the XML file.

Note that reading the XML file works; only writing fails with the error
described above, and only when the XML file is written with
System.Language=zh_CN (Chinese). With other languages (Italian, French,
English and Spanish) everything works.
Benoit Minisini
2008-12-18 01:09:00 UTC
Luigi Carlotto
2008-12-21 16:16:05 UTC
As you say, a conversion problem can certainly be caused by a
character that cannot be converted; however, in my last mails I
specified that the strings used in my tests contain only pure ASCII
characters (<128).
Is it possible that some control characters (e.g. Line Feed) could
cause a problem?
I ran some tests in the Gambas2 console, but I found no problems with
this kind of character.
Benoit Minisini
2008-12-21 16:29:17 UTC
If you are sure that your strings are ASCII, why are you calling
Conv(TheString, "UTF-8", "ASCII")?

And the error message printed by the underlying iconv library is clear: your
string is not ASCII.

According to you, the toString() method always returns an accurate string.

According to the source code, Conv() receives a non-ASCII string.

Can you check that by splitting your code?

FOR EACH oItem IN oGroup.Items
sType = oItem.toString()
sType = Conv(sType, "UTF-8", "ASCII")
oXml.Attribute(oItem.Name, sType)
NEXT

As for the crash with attribute values larger than 2000 characters, I will
look into it later. But maybe you could send me a little project that shows
the bug in that specific case? It would help a lot.

Regards,
--
Benoit Minisini
Luigi Carlotto
2008-12-22 19:38:45 UTC
Post by Benoit Minisini
If you are sure that your strings are ASCII, why doing
Conv(TheString, "UTF-8", "ASCII")?
Yes!
Post by Benoit Minisini
And the error message printed by the underlying iconv library is
clear: your string is not ASCII.
No (see result)!
Post by Benoit Minisini
According to you, The toString() methods always returns an accurate
string.
Yes!
Post by Benoit Minisini
According to the source code, Conv() receives a non-ascii string.
No!
Post by Benoit Minisini
Can you check that, by splitting your code?
FOR EACH oItem IN oGroup.Items
sType = oItem.toString()
sType = Conv(sType, "UTF-8", "ASCII")
oXml.Attribute(oItem.Name, sType)
NEXT
Result:

...
150 FOR EACH oItem IN oGroup.Items
151 sType = oItem.toString()
152 PRINT sType
153 sType = Conv(sType, "UTF-8", "ASCII")
154 oXml.Attribute(oItem.Name, sType)
155 NEXT
...

sType="ACTION,ADD,ALL,ALTER,ANALYZE,AND,AS,ASC,ASENSITIVE,BEFORE,BETWEEN,BIGINT,BINARY,BIT,BLOB,BOTH,BY,CALL,CASCADE,CASE,CHANGE,CHAR,CHARACTER,CHECK,COLLATE,COLUMN,CONDITION,CONNECTION,CONSTRAINT,CONTINUE,CONVERT,CREATE,CROSS,CURRENT_DATE,CURRENT_TIME,CURRENT_TIMESTAMP,CURRENT_USER,CURSOR,CURSOR DATABASE,DATABASES,DATE,DAY_HOUR,DAY_MICROSECOND,DAY_MINUTE,DAY_SECOND,DEC,DECIMAL,DECLARE,DEFAULT,DELAYED,DELETE,DESC,DESCRIBE,DETERMINISTIC,DISTINCT,DISTINCTROW,DIV,DOUBLE,DROP,DUAL,EACH,ELSE,ELSEIF,ENCLOSED,ENUM,ESCAPED,EXISTS,EXIT,EXPLAIN,FALSE,FETCH,FLOAT,FLOAT4,FLOAT8,FOR,FORCE,FOREIGN,FROM,FULLTEXT,GOTO,GRANT,GROUP,HAVING,HIGH_PRIORITY,HOUR_MICROSECOND,HOUR_MINUTE,HOUR_SECOND,IF,IGNORE,IN,INDEX,INFILE,INNER,INOUT,INSENSITIVE,INSERT,INT,INT1,INT2,INT3,INT4,INT8,INTEGER,INTERVAL,INTO,IS,ITERATE,JOIN,KEY,KEYS,KILL,LABEL,LEADING,LEAVE,LEFT,LIKE,LIMIT,LINES,LOAD,LOCALTIME,LOCALTIMESTAMP,LOCK,LONG,LONGBLOB,LONGTEXT,LOOP,LOW_PRIORITY,MATCH,MEDIUMBLOB,MEDIUMINT,MEDIUMTEXT,MIDDLEINT,MINUTE_MICROSECOND,MINUTE_SECOND,MOD,MODIFIES,NATURALNOT,NO,NO_WRITE_TO_BINLOG,NULL NUMERIC,ON,OPTIMIZE,OPTION,OPTIONALLY,OR,ORDER,OUT,OUTER,OUTFILE,PRECISION,PRIMARY,PROCEDURE,PURGE,READ,READS,REAL,REFERENCES,REGEXP,RELEASE,RENAME,REPEAT,REPLACE,REQUIRE,RESTRICT,RETURN,REVOKE,RIGHT,RLIKE,SCHEMA,SCHEMAS,SECOND_MICROSECOND,SELECT,SENSITIVE,SEPARATOR,SET,SHOW,SMALLINT,SONAME,SPATIAL,SPECIFIC,SQL,SQLEXCEPTION,SQLSTATE,SQLWARNING,SQL_BIG_RESULT,SQL_CALC_FOUND_ROWS,SQL_SMALL_RESULT,SSL STARTING,STRAIGHT_JOIN,TABLE TERMINATED,TEXT,THEN,TIME,TIMESTAMP,TINYBLOB,TINYINT,TINYTEXT,TO,TRAILING,TRIGGER,TRUE,UNDO,UNION,UNIQUE,UNLOCK,UNSIGNED,UPDATE,UPGRADE,USAGE,USE,USING,UTC_DATE,UTC_TIME,UTC_TIMESTAMP,VALUES,VARBINARY,VARCHAR,VARCHARACTER,VARYING,WHEN,WHERE,WHILE,WITH,WRITE,XOR,YEAR_MONTH,ZEROFILL"
Error: Error writing XML data
Code: -1
Class: pgConfig
Where: pgConfig.Save.154

As you can see, the error does not occur in the conversion but in the
writing of the XML, and the string "sType" (too long...) contains only
ASCII characters.
Post by Benoit Minisini
As for the crash with attributes values larger than 2000 characters, I
will look later. But maybe you could send me a little project that shows
the bug in that specific case? It will help a lot.
Ok! (see result).

Thanks
Benoit Minisini
2008-12-22 19:47:34 UTC
Post by Luigi Carlotto
Post by Benoit Minisini
If you are sure that your strings are ASCII, why doing
Conv(TheString, "UTF-8", "ASCII")?
Yes!
The question was "why?"
Why don't you have the same error as in your first post?

Does the sType string have newlines inside?
--
Benoit Minisini
Luigi Carlotto
2008-12-22 20:27:17 UTC
I forgot to add the rest of the error message, but it is identical to
the one in my first post.

encoding error : output conversion failed due to conv error, bytes 0xE5
0x31 0x32 0xE5
I/O error : encoder error
Error: Error writing XML data
Code: -1

The string does not contain control characters (e.g. Line Feed).
Doriano Blengino
2008-12-23 07:24:36 UTC
Post by Luigi Carlotto
I forgot to add the rest of the error message, but it is identical to
the one in my first post.
encoding error : output conversion failed due to conv error, bytes 0xE5
0x31 0x32 0xE5
I/O error : encoder error
Error: Error writing XML data
Code: -1
The string does not contain control characters (e.g. Line Feed).
The hexadecimal bytes are clear enough to me: 0x31 is a "1", 0x32 a "2",
and 0xE5 is *not* an ASCII character but, in Unicode, a lowercase "a"
with a small circle above it (å, "a with ring above").

This could be a date in Italian/European format, like 25-12-2008 or
25/12/2008 (Christmas). I have seen that browsers, which can display
different character sets, sometimes display a strange character instead
of the normal one for the apostrophe. This can happen to dashes or
hyphens and, who knows, to other characters too. I remember that this was
caused by an incorrect implementation of ISO-8859-xx.

One quick way to verify this is to change the date, for example to
02-01-2009, and run the test again: if the hex bytes come out like 0xE5
0x30 0x31 0xE5, then this is the problem. Or, using a more scientific
test like the one Benoit suggested, do a hexdump and see if there are
strange characters: chars less than 0x20 are not good, and neither are
those greater than 0x7f.
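
The hexdump check described above can be sketched in a few lines. This is only an illustration in Python, not Gambas code; the function name `suspicious_bytes` and the sample bytes are invented for the example, and the thresholds are the ones given in the post (below 0x20, above 0x7f).

```python
def suspicious_bytes(data: bytes):
    """Return (offset, byte) pairs for bytes below 0x20 or above 0x7F."""
    return [(i, b) for i, b in enumerate(data) if b < 0x20 or b > 0x7F]


# Hypothetical sample: a date-like string with two stray 0xE5 bytes around "12".
sample = b"25-12-2008 \xe5" + b"12" + b"\xe5"

for offset, value in suspicious_bytes(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

Running the same scan on the exact bytes passed to the XML writer, rather than on what is printed in a console, would show whether any non-ASCII byte really reaches it.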

Supposing that this is the problem, I don't know what to say anymore - a
bad encoding implementation (mysql or gambas fault) or bad
pairing/coupling of encodings (programmer's fault) has to be found
somewhere...

Cheers,
--
Doriano Blengino

"Listen twice before you speak.
This is why we have two ears, but only one mouth."
Doriano Blengino
2008-12-23 07:48:35 UTC
I was not sure whether to send these wishes to the list but, after a few
milliseconds of intense thinking, I realized we all are, in some way, a
'community'. So I send this to you all, hoping you have good holidays
and a wonderful 2009.

Best regards to all,
Doriano.
--
Doriano Blengino

"Listen twice before you speak.
This is why we have two ears, but only one mouth."
Luigi Carlotto
2008-12-27 18:58:54 UTC
The strings do not contain "Carriage Return" or "Line Feed"
characters, only spaces; if you noticed line breaks, they come from the
mail program or from cut and paste in the editor. The editor's
"wrapping" option probably breaks the line at spaces, but in the code
everything is in one single string.

SQL commands sometimes consist of more than one keyword, and they make
no sense if the words are handled separately; an example is the
definition of a PostgreSQL field of type "timestamp with time zone".
For the needs of the program I had to list exactly all the keywords of
the SQL language used by the database engine.

I have already tested the individual characters of the string and found
nothing anomalous.
As I already wrote, the problem occurs when the string is passed to the
XML function; the conversion performed by Conv() works fine.
If I execute PRINT Conv(sType, "UTF-8", "ASCII") from the console, the
string is displayed correctly with any LANG.

You are perfectly right about the anomalous use of attributes, but I
had to use this logic because of another problem that I found while
using the gb.xml library: there seems to be no way to read XML files
made of multi-level tags.
In my tests, a hierarchical structure with more than 2 levels is not
read completely: only the first- and second-level tags are read; if a
third level is present, it is ignored, and so on.

As in other languages, the XML document begins with a "root" tag (level
1), to which elements are attached (level 2); every element in turn
has (apart from attributes) a series of sub-elements (level 3), and so
on, hierarchically.
The methods of the library do not allow these further elements to be
read, so I was forced to use a structure with only 2 levels, with
attributes holding the values of each element.
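
For comparison, reading a hierarchy of arbitrary depth usually comes down to a recursive walk over child elements. The sketch below uses Python's standard xml.etree.ElementTree, not gb.xml, so it says nothing about what the Gambas library can or cannot do; the sample document and the `walk` helper are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with four levels of nesting.
DOC = """<root>
  <group name="keywords">
    <item name="sql">
      <value>ACTION</value>
      <value>ADD</value>
    </item>
  </group>
</root>"""


def walk(element, level=1):
    """Yield (level, tag, text) for the element and all of its descendants."""
    text = (element.text or "").strip()
    yield level, element.tag, text
    for child in element:
        yield from walk(child, level + 1)


rows = list(walk(ET.fromstring(DOC)))
for level, tag, text in rows:
    print("  " * (level - 1) + f"{tag} (level {level}) {text}")
```

The recursion does not care how deep the nesting goes, which is the property the two-level workaround described above gives up.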

But apart from this, as I had supposed, there is probably some problem
in writing very large attributes; however, the anomaly only occurs with
an Asian language, while with European languages everything works
well.

Pardon my bad English.
Ron_1st
2008-12-27 20:33:42 UTC
OK, I understand the nasty problem with two levels.
It also explains why I had some problems too in the past.
I switched over to PHP for that reason.

Just a question: when using XML there is also a *.dtd describing the elements
and their hierarchy.
Is it allowed to have such a long string with spaces inside?
Does a space have a special meaning in the attribute value declaration?

In HTML, class="first second" means the tag has two classes, 'first' and 'second'.
Maybe something like this also applies to your XML usage.

No problem with the English, mine is also bad :)


Best regards,

Ron_1st

--
Luigi Carlotto
2008-12-28 00:20:51 UTC
I have modified the procedures for reading/writing the XML files:
following your suggestion, I now use elements (eliminating the
attributes).
In spite of this modification, the problem still remains.

I am sending the new file as an attachment, so that it can be read with
a browser or a text editor, to rule out problems with invisible
characters.
The error is always the same:

encoding error : output conversion failed due to conv error, bytes 0xE5
0x31 0x32 0xE5
I/O error : encoder error
Error: Error writing XML data
Code: -1

I have split the code to check whether the problem is caused by the
conversion (Conv) or by the XML writing; I can confirm that the error
occurs when the XML is written, while the conversion is performed
correctly.

Apart from the bytes 0x31 (character "1") and 0x32 (character "2"),
the byte 0xE5 (229 decimal) cannot be interpreted; in fact, PRINT
Chr(224) in the Gambas console prints two white rectangles. This
character, however, is not present in the string on which the program
stops; the string is: "--,/*, *" (without the quotes). The same string
is saved several times in the same function, but the program stops
only on the last one.
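
To see why an encoder would reject the four bytes in the message, they can be inspected outside Gambas. The Python sketch below is only an illustration of how those bytes behave under UTF-8 and Latin-1; it does not reproduce the gb.xml code path, and the variable names are invented. In UTF-8, 0xE5 starts a three-byte sequence and must be followed by two continuation bytes (0x80 to 0xBF), which "1" (0x31) is not.

```python
raw = bytes([0xE5, 0x31, 0x32, 0xE5])

# 0xE5 is 229 in decimal.
first_byte_decimal = raw[0]

# As UTF-8 the sequence is invalid: 0xE5 announces a three-byte character,
# but the next byte (0x31) is not a continuation byte.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# As Latin-1 every byte maps to one character, so 0xE5 becomes U+00E5.
as_latin1 = raw.decode("latin-1")

# For comparison, a well-formed sequence starting with 0xE5: U+5E74,
# the Chinese character for "year", chosen only as an example.
year_bytes = "\u5e74".encode("utf-8")

print(first_byte_decimal, valid_utf8, as_latin1, year_bytes.hex())
```

Many common Chinese characters encode to sequences that begin with 0xE5, so a text produced under the zh_CN locale and then cut in the middle of a character would leave exactly this kind of stray byte. Whether that is what happens here is not established by the thread.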
Luigi Carlotto
2008-12-28 00:56:33 UTC
Attached are 3 classes that I built on top of the objects of the gb.xml
library (XmlReader and XmlWriter).
These classes work directly in memory, loading the XML file into a
structure that can be read inside a program.
Writing is carried out from the same in-memory structure.
This approach seems slightly faster to me, and it quickly releases the
connection to the file.

The only problem, perhaps, could be the handling of large files; I
tried with a 2 MB XML file, and it is fast.
Perhaps it could be useful in Gambas.

The structure follows a logic also used in other languages: it starts
from a Document object (pgXmlDocument), which contains a Root element;
this, like all the other elements (pgXmlElement), contains an array of
elements and an array of attributes (pgXmlAttribute).
The link to the parent object is managed through the Parent property
(Root has no Parent, obviously).
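
The structure described above can be pictured roughly as follows. This is a Python sketch written from the description only; the attached Gambas classes are not shown in the thread, so the class names, fields and the `add` and `depth` helpers here are assumptions, not the actual pgXml* code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class XmlAttribute:
    name: str
    value: str


@dataclass
class XmlElement:
    name: str
    text: str = ""
    # Back-reference to the parent; excluded from repr/eq to avoid cycles.
    parent: Optional["XmlElement"] = field(default=None, repr=False, compare=False)
    attributes: List[XmlAttribute] = field(default_factory=list)
    elements: List["XmlElement"] = field(default_factory=list)

    def add(self, child: "XmlElement") -> "XmlElement":
        """Attach a child element and set its parent link."""
        child.parent = self
        self.elements.append(child)
        return child


@dataclass
class XmlDocument:
    root: XmlElement


def depth(element: XmlElement) -> int:
    """Level of an element in the tree; the root is level 1."""
    level = 1
    while element.parent is not None:
        element = element.parent
        level += 1
    return level


doc = XmlDocument(root=XmlElement("root"))
group = doc.root.add(XmlElement("group"))
item = group.add(XmlElement("item", text="ACTION"))
print(depth(item))
```

Because every element carries its own list of children, the tree can be nested to any depth, and the parent link allows walking back up to the root.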

Let me know if this is of interest.

Ron_1st
2008-12-23 01:48:44 UTC
Post by Luigi Carlotto
sType="ACTION,ADD,ALL,ALTER,ANALYZE,AND,AS,ASC,ASENSITIVE,BEFORE,BETWEEN,BIGINT,BINARY,BIT,BLOB,BOTH,BY,CALL,CASCADE,CASE,CHANGE,CHAR,CHARACTER,CHECK,COLLATE,COLUMN,CONDITION,CONNECTION,CONSTRAINT,CONTINUE,CONVERT,CREATE,CROSS,CURRENT_DATE,CURRENT_TIME,CURRENT_TIMESTAMP,CURRENT_USER,CURSOR,CURSOR DATABASE,DATABASES,DATE,DAY_HOUR,DAY_MICROSECOND,DAY_MINUTE,DAY_SECOND,DEC,DECIMAL,DECLARE,DEFAULT,DELAYED,DELETE,DESC,DESCRIBE,DETERMINISTIC,DISTINCT,DISTINCTROW,DIV,DOUBLE,DROP,DUAL,EACH,ELSE,ELSEIF,ENCLOSED,ENUM,ESCAPED,EXISTS,EXIT,EXPLAIN,FALSE,FETCH,FLOAT,FLOAT4,FLOAT8,FOR,FORCE,FOREIGN,FROM,FULLTEXT,GOTO,GRANT,GROUP,HAVING,HIGH_PRIORITY,HOUR_MICROSECOND,HOUR_MINUTE,HOUR_SECOND,IF,IGNORE,IN,INDEX,INFILE,INNER,INOUT,INSENSITIVE,INSERT,INT,INT1,INT2,INT3,INT4,INT8,INTEGER,INTERVAL,INTO,IS,ITERATE,JOIN,KEY,KEYS,KILL,LABEL,LEADING,LEAVE,LEFT,LIKE,LIMIT,LINES,LOAD,LOCALTIME,LOCALTIMESTAMP,LOCK,LONG,LONGBLOB,LONGTEXT,LOOP,LOW_PRIORITY,MATCH,MEDIUMBLOB,MEDIUMINT,MEDIUMTEXT,MIDDLEINT,MINUTE_MICROSECOND,MINUTE_SECOND,MOD,MODIFIES,NATURALNOT,NO,NO_WRITE_TO_BINLOG,NULL NUMERIC,ON,OPTIMIZE,OPTION,OPTIONALLY,OR,ORDER,OUT,OUTER,OUTFILE,PRECISION,PRIMARY,PROCEDURE,PURGE,READ,READS,REAL,REFERENCES,REGEXP,RELEASE,RENAME,REPEAT,REPLACE,REQUIRE,RESTRICT,RETURN,REVOKE,RIGHT,RLIKE,SCHEMA,SCHEMAS,SECOND_MICROSECOND,SELECT,SENSITIVE,SEPARATOR,SET,SHOW,SMALLINT,SONAME,SPATIAL,SPECIFIC,SQL,SQLEXCEPTION,SQLSTATE,SQLWARNING,SQL_BIG_RESULT,SQL_CALC_FOUND_ROWS,SQL_SMALL_RESULT,SSL STARTING,STRAIGHT_JOIN,TABLE TERMINATED,TEXT,THEN,TIME,TIMESTAMP,TINYBLOB,TINYINT,TINYTEXT,TO,TRAILING,TRIGGER,TRUE,UNDO,UNION,UNIQUE,UNLOCK,UNSIGNED,UPDATE,UPGRADE,USAGE,USE,USING,UTC_DATE,UTC_TIME,UTC_TIMESTAMP,VALUES,VARBINARY,VARCHAR,VARCHARACTER,VARYING,WHEN,WHERE,WHILE,WITH,WRITE,XOR,YEAR_MONTH,ZEROFILL"
Error: Error writing XML data
Code: -1
Class: pgConfig
Where: pgConfig.Save.154
I see something: the wrong characters may be at the end of the following lines.
First line:
...CURSOR,CURSOR
DATABASE,

Second line:
...,NO,NO_WRITE_TO_BINLOG,NULL
NUMERIC,ON,OPTIMIZE

Third line:
...,SQL_SMALL_RESULT,SSL
STARTING,STRAIGHT_JOIN,

Fourth line:
...STRAIGHT_JOIN,TABLE
TERMINATED,TEXT,

In fact all items are comma separated, but a few have a space in them; that is the reason they
are spread over several lines.
I said a _space_ because it looks like one, given the wrapping on it.
It does not make sense to me to use spaces inside these items, as the others are continuous strings of
several word parts, like TINYINT and TIMESTAMP; why should 'TABLETERMINATED' be split into 'TABLE TERMINATED'?

I suggest writing a routine that shows the ASCII number of every character in the sType string, to
spot discrepancies in it, looking carefully at the positions of the space-like characters.
This is how I would try to find out why it goes wrong.
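
Such a routine could look like the sketch below. It is written in Python rather than Gambas, the function name `items_with_whitespace` is made up, and the excerpt is a shortened piece of the sType string quoted above.

```python
def items_with_whitespace(keyword_list: str):
    """Return (item, character codes) for each comma-separated item that contains whitespace."""
    result = []
    for item in keyword_list.split(","):
        if any(ch.isspace() for ch in item):
            result.append((item, [ord(ch) for ch in item]))
    return result


# Shortened excerpt of the sType string from the thread.
excerpt = "CURSOR,CURSOR DATABASE,DATABASES,NULL NUMERIC,ON,SSL STARTING,TABLE TERMINATED,TEXT"

for item, codes in items_with_whitespace(excerpt):
    print(item, codes)
```

A plain space shows up as code 32; a Line Feed would show up as 10 and a Carriage Return as 13, so the output makes it obvious which kind of "space" is really in the string.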


Besides this problem, I think you are doing something wrong here (just a personal feeling).

It looks strange to me to have attributes with such long values.
The words in the sType string are data and should be inside elements, not
in the element declaration itself as an attribute; attributes describe the
embedded elements and/or the properties of the element itself.


<mytag stype="ACTION,ADD,..."> is IMHO wrong

It should be declared as an element; attributes tell something about how to
see/handle/interpret the information inside the tag and should not be the data itself, as sType is here.

<mytag charset="ascii">
<stypes>
<stype>ACTION</stype>
<stype>ADD</stype>
....
</stypes>
</mytag>
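
As a rough illustration of building that layout programmatically, here is a short Python sketch using the standard xml.etree.ElementTree module. It is not gb.xml code; the tag names follow the example above and the keyword list is shortened to three entries.

```python
import xml.etree.ElementTree as ET

keywords = ["ACTION", "ADD", "ALL"]

# The attribute describes the content; the keywords themselves become elements.
mytag = ET.Element("mytag", {"charset": "ascii"})
stypes = ET.SubElement(mytag, "stypes")
for word in keywords:
    ET.SubElement(stypes, "stype").text = word

xml_text = ET.tostring(mytag, encoding="unicode")
print(xml_text)
```

With one element per keyword, no single attribute value grows to thousands of characters, and multi-word entries such as "CURSOR DATABASE" stay unambiguous.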

Just to make it clear: in HTML you are doing this

<ul charset="ascii" type="ACTION,ADD,...">
<li>Keywords available</li>
</ul>

but you mean

<ul charset="ascii">
<li>ACTION</li>
<li>ADD</li>
....
</ul>

I know this is not the best example; for the INPUT tag you can have value="ACTION,ADD",
but those elements are part of the FORM tag and have special uses.
I should have used the DT and DD (definition) tags instead of the list tags.
As said before, it is just a personal feeling; your circumstances can be completely different.

Best regards,

Ron_1st

--