I use the SQL Import functionality outside the Magento 2 bootstrap. It works perfectly, but performance degrades over time: after about 2M imported products, the daily import benchmark became 3 times slower. After investigating, I concluded that the problem is in the `_cl` (changelog) tables. Here is what I have in them:

Table | Record count |
---|---|
catalog_product_attribute_cl | 135M |
catalog_product_category_cl | 66M |
catalog_product_price_cl | 75M |
cataloginventory_stock_cl | 62M |
catalogrule_product_cl | 154M |
catalogsearch_fulltext_cl | 171M |
inventory_cl | 4M |

If I delete all data from the `_cl` tables after every X imported products, will incremental indexing through `bin/magento cron:run` still work properly? Will it synchronize with Elasticsearch only the data that appears in the `_cl` tables and leave already-synchronized data untouched?

Yes, it would. It syncs only the data that was recently added. In the end, however, because of the large number of products and the very heavy, inefficient SELECT queries in the Magento Index API, we decided to stop using the out-of-the-box indexing functionality and implemented our own custom methods for Elasticsearch outside the Magento API.
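
A minimal sketch of the incremental idea, using Python's built-in `sqlite3` as a stand-in for MySQL. It assumes Magento's `mview_state` table (`view_id`, `version_id`) and a changelog table named `{view_id}_cl` with `version_id`/`entity_id` columns; the helper functions are hypothetical, not Magento API. The sync step reads only entity IDs whose `version_id` is newer than the last processed version, and the cleanup step deletes only rows at or below that version, so changes written mid-import are kept. Deleting processed rows instead of using `TRUNCATE` also matters on MySQL: `TRUNCATE` resets the `AUTO_INCREMENT` counter, so new `version_id` values could fall below the stored `mview_state.version_id` and be skipped.

```python
import sqlite3


def setup_demo(conn):
    # Hypothetical in-memory stand-in for mview_state and one changelog table.
    conn.executescript("""
        CREATE TABLE mview_state (view_id TEXT PRIMARY KEY, version_id INTEGER NOT NULL);
        CREATE TABLE catalogsearch_fulltext_cl (
            version_id INTEGER PRIMARY KEY AUTOINCREMENT,
            entity_id INTEGER NOT NULL
        );
    """)
    conn.execute("INSERT INTO mview_state VALUES ('catalogsearch_fulltext', 0)")
    # The same product (10) changed twice; the indexer only needs it once.
    conn.executemany("INSERT INTO catalogsearch_fulltext_cl (entity_id) VALUES (?)",
                     [(10,), (11,), (10,), (12,)])


def last_version(conn, view_id):
    (version,) = conn.execute(
        "SELECT version_id FROM mview_state WHERE view_id = ?", (view_id,)).fetchone()
    return version


def pending_entity_ids(conn, view_id):
    """Return distinct entity IDs changed since the last processed version, and the max version seen."""
    last = last_version(conn, view_id)
    table = f"{view_id}_cl"  # view_id must come from a trusted, fixed list (no user input)
    (current,) = conn.execute(f"SELECT COALESCE(MAX(version_id), 0) FROM {table}").fetchone()
    if current <= last:
        return [], last
    rows = conn.execute(
        f"SELECT DISTINCT entity_id FROM {table} "
        "WHERE version_id > ? AND version_id <= ? ORDER BY entity_id",
        (last, current)).fetchall()
    return [row[0] for row in rows], current


def mark_processed(conn, view_id, version_id):
    # Record how far the external sync (e.g. to Elasticsearch) has progressed.
    conn.execute("UPDATE mview_state SET version_id = ? WHERE view_id = ?", (version_id, view_id))


def prune_changelog(conn, view_id):
    """Delete only rows already covered by mview_state.version_id; newer rows survive."""
    cur = conn.execute(f"DELETE FROM {view_id}_cl WHERE version_id <= ?",
                       (last_version(conn, view_id),))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    setup_demo(conn)
    ids, version = pending_entity_ids(conn, "catalogsearch_fulltext")
    print(ids, version)  # [10, 11, 12] 4
    # ... push `ids` to Elasticsearch here (not shown) ...
    mark_processed(conn, "catalogsearch_fulltext", version)
    # A change that arrives after the batch was processed:
    conn.execute("INSERT INTO catalogsearch_fulltext_cl (entity_id) VALUES (13)")
    print(prune_changelog(conn, "catalogsearch_fulltext"))  # 4
    print(pending_entity_ids(conn, "catalogsearch_fulltext"))  # ([13], 5)
```

The pruning condition is tied to the stored version rather than to "all rows", which is what keeps the incremental sync correct when the `_cl` tables are cleaned while imports are still writing to them.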