mysql regex utf-8 character-encoding cyrillic

MySQL: select or delete rows that contain a single Cyrillic word


mysql> show full columns from bpsw;
+--------+------------------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| Field  | Type             | Collation | Null | Key | Default | Extra          | Privileges                      | Comment |
+--------+------------------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| bpswid | int(10) unsigned | NULL      | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| badpsw | varchar(128)     | utf8_bin  | NO   | UNI | NULL    |                | select,insert,update,references |         |
+--------+------------------+-----------+------+-----+---------+----------------+---------------------------------+---------+

Please don't tell me about NOT NULL and DEFAULT NULL :)

mysql> SELECT USER(), CHARSET(USER()), COLLATION(USER());
+----------------+-----------------+-------------------+
| USER()         | CHARSET(USER()) | COLLATION(USER()) |
+----------------+-----------------+-------------------+
| root@localhost | utf8            | utf8_general_ci   |
+----------------+-----------------+-------------------+
1 row in set (0.00 sec)

Table contents:

mysql> select * from bpsw limit X offset XXX ;
+--------+------------------------+
| bpswid | badpsw                 |
+--------+------------------------+
| 495883 | by all manner of means |
| 495884 | by all means           |
| 495885 | by all odds            |
| 495886 | by an ace              |
| 495887 | by an iota             |
| 495888 | by and by              |
| 495889 | by and large           |
| 495890 | by any chance          |
| 495891 | by any manner of means |
| 495892 | by any means           |
+--------+------------------------+
...
|   94950 | яростных                                                    |
|    1599 | ярь-медянка                                                 |
|    1600 | ястреб-перепелятник                                         |
|    1601 | ястреб-тетеревятник                                         |
|   94999 | яфетический                                                 |
|    1603 | яхт-клуб                                                    |
|    1604 | яхт-клуба                                                   |
...
|    1938 | яванский желоб                                              |
|    1939 | яванское море                                               |
|   94690 | еще какое-то слово                                          |
|    1940 | яде-бузен залив                                             |
|   94751 | ядерного                                                    |
|   94755 | раз два-три                                                 | 

Goal: select or remove the Cyrillic words from the table.

Specifically, I need to delete the rows that contain exactly ONE Cyrillic word, with no digits, no special characters, and no punctuation.

Condition for removal: '^[а-я]+[а-я]$+'

select * from bpsw where badpsw regexp '^[a-z]+[a-z]$+';

With English words there are no problems, but I cannot work out how to do the same with the Cyrillic alphabet.

Do I need to specify a collation?

UPD: possibly related: MySQL regex and UTF-8 characters?

Or do I need to look at the byte representation of the Cyrillic characters?


Solution

  • It's easy: list the letters explicitly instead of using a range:

    select * from bpsw where badpsw regexp '^[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]+$'; 
    

    https://linux.org.ua/index.php?topic=11272.msg201662#msg201662
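
The question also asks for deletion, so here is a minimal sketch of the same pattern used in a DELETE. It assumes the table and column from the question (bpsw, badpsw) and a utf8 connection character set. The explicit letter list is kept because, according to the MySQL manual, REGEXP in versions before 8.0.4 works byte-wise and is not multibyte safe, which is presumably why the range [а-я] misbehaves. The HEX() query is only there to let you look at the bytes behind the range endpoints; run the COUNT(*) first and compare it with the row count the DELETE reports.

```sql
-- Inspect the UTF-8 bytes behind the range endpoints and ё
-- (assumes the connection character set is utf8).
SELECT HEX('а'), HEX('я'), HEX('ё');

-- Preview how many rows would be removed.
SELECT COUNT(*)
FROM bpsw
WHERE badpsw REGEXP '^[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]+$';

-- Delete rows consisting of a single lowercase Cyrillic word
-- (no spaces, hyphens, digits or punctuation).
DELETE FROM bpsw
WHERE badpsw REGEXP '^[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]+$';
```

Because badpsw uses the utf8_bin collation, the match is case-sensitive; uppercase letters would have to be added to the bracket list if such rows should be removed as well.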