Discussion:
[Gambas-user] Isn't bracket regular expression compatible with UTF8?
Fernando Cabral
2017-07-05 01:16:32 UTC
I have been trying something like *poder[^[:alpha:]* so I could find the
word "poder " ("poder" followed by a space) but not "poderão" ("ã" being
an alpha character in Portuguese).

In English it could be like finding "power" but not "powerless".

The problem is that [^[alpha]] seems to match accented characters like "á",
"é", "ã".

That is, accented characters are treated not as alpha, but as non-alpha.

Please note that I compiled it with the UTF8 flag:
*re.Compile("poder[^[:alpha]]", RegExp.utf8)*

Any hints?

- fernando
--
Fernando Cabral


Blog: http://fernandocabral.org
Twitter: http://twitter.com/fjcabral
e-mail: ***@gmail.com
Facebook: ***@fcabral.com.br
Telegram: +55 (37) 99988-8868
Wickr ID: fernandocabral
WhatsApp: +55 (37) 99988-8868
Skype: fernandojosecabral
Landline: +55 (37) 3521-2183
Mobile: +55 (37) 99988-8868

As long as there is a single person in the world without a home or food,
no politician or scientist will be able to boast of anything.
Tobias Boege
2017-07-05 09:37:59 UTC
Post by Fernando Cabral
I have been trying something like *poder[^[:alpha:]* so I could find the
word "poder " ("poder" followed by a space) but not "poderão" ("ã" being
an alpha character in Portuguese).
In English it could be like finding "power" but not "powerless".
The problem is that [^[alpha]] seems to match accented characters like "á",
"é", "ã".
That is, accented characters are treated not as alpha, but as non-alpha.
*re.Compile("poder[^[:alpha]]", RegExp.utf8)*
Any hints?
In your mail I can see three distinct attempts at writing down a
negative character class: [^[:alpha:], [^[alpha]], and [^[:alpha]],
but the correct syntax is

[[:^alpha:]]

You want to check this first.
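
For readers who want to try the idea outside Gambas, here is a minimal
Python sketch of the intended match. It is an illustration under
assumptions, not the gb.pcre code from this thread: Python's re module has
no POSIX bracket classes, so the set [\W\d_] stands in (approximately) for
[[:^alpha:]]. In PCRE, [^[:alpha:]] with both closing brackets present is
an equivalent spelling. The names NON_LETTER, PODER_THEN_NON_LETTER and
contains_poder_word are made up for this example.

```python
import re

# Assumption: Python's re module does not understand [[:alpha:]] or
# [[:^alpha:]], so "not a letter" is approximated as: a non-word character,
# a decimal digit, or an underscore. For str patterns, \w and \d are
# Unicode-aware by default in Python 3, so "ã" counts as a word character.
NON_LETTER = r"[\W\d_]"

# "poder" immediately followed by one non-letter character.
# Like the pattern in the thread, this needs a following character,
# so "poder" at the very end of the text does not match.
PODER_THEN_NON_LETTER = re.compile("poder" + NON_LETTER)


def contains_poder_word(text: str) -> bool:
    """Return True if "poder" is directly followed by a non-letter."""
    return PODER_THEN_NON_LETTER.search(text) is not None


if __name__ == "__main__":
    for sample in ["eu quero poder sair", "eles poderão sair"]:
        print(sample, "->", contains_poder_word(sample))
```

If "poder" at the end of the text should also count, a negative lookahead
such as poder(?![^\W\d_]) is one possible alternative.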

Regards,
Tobi
--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk
Fernando Cabral
2017-07-05 09:56:32 UTC
Post by Tobias Boege
In your mail I can see three distinct attempts at writing down a
negative character class: [^[:alpha:], [^[alpha]], and [^[:alpha]],
but the correct syntax is
[[:^alpha:]]
You want to check this first.
Right again, Tobi. I can't understand how I missed this. Thank you.

- fernando