As my work involves viewing many items from a website, I need to know which items have been visited and which have not, so as to avoid repeated viewing.
The problem is that the URLs of these items include some garbage parameters that change dynamically. This means the browser's history is almost useless for identifying which items have already been viewed.
This is an example of the URL:
https://example.com/showitemdetail/?item_id=e6de72e&hitkey=true&index=234&cur_page=1&pageSize=30
Only the "item_id=e6de72e" part is useful in identifying each item. The other parameters are dynamic garbage.
My question is: how can I make Chrome mark only the "example.com/showitemdetail/?item_id=e6de72e" part as visited, and ignore the remaining parameters?
Please note that I do NOT want to modify the URLs, because that might lead the website's server to suspect that I am abusing their database. I want the garbage parameters to stay in place, but the browser's history mechanism to ignore them.
I know this is not easy. I am proposing a possible solution, but do not know whether it can be implemented. It's like this:
Step 1) An extension background script extracts the item_id from each page I open and stores it in a collection of strings. This collection of strings should be saved in a file somewhere.
Step 2) Each time I open a webpage with a list of various items, the background script checks whether each URL contains a string that matches any one in the above collection. If so, that URL is automatically added to the history. That item will then naturally be shown as visited.
Does the logic sound OK? And if so, how can I implement it as a simple extension?
Of course, if you have a neater solution, I'd be very interested to learn about it.
Assuming that the links to the items always include the item_id, that would work, yes.
You would need the following steps:
Recording an element
On accessing the product page:
i. Extract the current product id from the URL's query parameters (URLSearchParams is one way to do this).
ii. Use the storage API to retrieve a stored variable, say visited_products. Handle it as a Set in your code, since that is the best data type for unique elements. Note that chrome.storage only saves JSON-serializable values, so the Set has to be converted to an array before saving and rebuilt after loading.
iii. Check whether the current id is already in the collection with .has(). If it is, skip it. If it is not, use .add() to add the new product id. Since a Set never holds duplicates, you can also skip the check and call .add() directly. Either way, make sure you save the updated collection back to Chrome's storage.
Now you have registered a visit to a product.
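As a rough illustration of steps i to iii, here is a minimal sketch of what the content script for the item pages might look like. It assumes a Manifest V3 extension with the "storage" permission, where chrome.storage.local.get() and .set() return promises. The names STORAGE_KEY, extractItemId and recordVisit are made up for this example.

```javascript
// Key under which the visited ids are kept (the variable name suggested above).
const STORAGE_KEY = "visited_products";

// Read item_id from a full URL; returns null when the parameter is missing.
function extractItemId(url) {
  return new URL(url).searchParams.get("item_id");
}

// Load the stored array, add the id through a Set, and write the array back.
// `storage` is expected to behave like chrome.storage.local (promise-based).
async function recordVisit(storage, url) {
  const id = extractItemId(url);
  if (!id) return null;
  const stored = await storage.get(STORAGE_KEY);
  const visited = new Set(stored[STORAGE_KEY] || []);
  visited.add(id); // a Set ignores duplicates, so no .has() check is needed
  await storage.set({ [STORAGE_KEY]: [...visited] }); // store as a plain array
  return id;
}

// Inside the extension, record the page that is currently open.
if (typeof chrome !== "undefined" && chrome.storage) {
  recordVisit(chrome.storage.local, location.href);
}
```

Passing the storage object in as a parameter is only a convenience, so the function can be tried outside the browser with a stand-in object.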
Checking visited elements
Use a content_script again, this time injected on the pages that list the items, or on all pages if desired.
Get all the links on the page with document.querySelectorAll(). You could apply a CSS selector like a[href*="example.com/showitemdetail/?item_id="], which selects all the links whose href contains that URL portion.
Then iterate over the links with a for loop. On each iteration, extract the item_id. Probably the easiest way is the regular expression /(?:item_id=)(.*?)(?:&|$)/. It captures all the characters that follow item_id= (the prefix itself is not captured) up to the next & or the end of the string, whichever comes first (also not captured).
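To show how that regular expression behaves, here is a small sketch. The helper name itemIdFromHref is hypothetical; only the pattern itself comes from the text above.

```javascript
// Capture everything after "item_id=" up to the next "&" or the end of the string.
const ITEM_ID_PATTERN = /(?:item_id=)(.*?)(?:&|$)/;

// Return the captured id, or null when the href has no item_id parameter.
function itemIdFromHref(href) {
  const match = ITEM_ID_PATTERN.exec(href);
  return match ? match[1] : null;
}

// With the sample URL from the question this yields "e6de72e".
const sampleId = itemIdFromHref(
  "https://example.com/showitemdetail/?item_id=e6de72e&hitkey=true&index=234&cur_page=1&pageSize=30"
);
```

The same function returns the id when item_id is the last parameter, because the pattern also stops at the end of the string.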
With the id captured, you can check the Set from the first part with .has() to see whether it's on the list.
How you handle the links that are on the list is up to you. You could hide visited elements, or apply a different CSS class or style to them so you can tell them apart easily.
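Putting the checking part together, a content script for the listing pages might look roughly like the sketch below. It again assumes Manifest V3 with promise-based chrome.storage.local and ids stored as an array under visited_products. The function name markVisitedLinks and the class name already-visited are invented for this example, and you would still need your own CSS rule for that class. Note that the attribute selector matches the href as written in the page, so it would need adjusting if the site uses relative links.

```javascript
// Selector from the answer: links whose href contains the item-detail URL portion.
const ITEM_LINK_SELECTOR = 'a[href*="example.com/showitemdetail/?item_id="]';

// Add a CSS class to every matching link whose item_id is in the visited Set.
// `root` is anything with querySelectorAll (normally `document`).
// Returns the number of links that were marked.
function markVisitedLinks(root, visited) {
  let count = 0;
  for (const link of root.querySelectorAll(ITEM_LINK_SELECTOR)) {
    const match = /(?:item_id=)(.*?)(?:&|$)/.exec(link.href);
    if (match && visited.has(match[1])) {
      link.classList.add("already-visited"); // hypothetical class name
      count += 1;
    }
  }
  return count;
}

// Inside the extension, load the stored ids and mark the current page.
if (typeof chrome !== "undefined" && chrome.storage) {
  chrome.storage.local.get("visited_products").then((stored) => {
    markVisitedLinks(document, new Set(stored.visited_products || []));
  });
}
```

To hide visited items instead of restyling them, the classList.add() line could be swapped for something like setting link.style.display to "none".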
I hope this gives you a head start. Maybe you can give it a try and, if you cannot make it work, you can open a new question with where you got stuck.