I wrote a Ruby script that uses Mechanize. It goes to google.com, logs you in, and then does an image search for cats. Next, I want to select one of the result links from the page and save the image.
My problem is that the text of every result link is an empty string, so I'm not sure how to select and click one.
Here is the output of pp page so you can see the links I'm talking about. The first link is one of the suggested links; I can click those because they have text such as "Past 24 hours". The second link is an actual search result, which I cannot click.
#<Mechanize::Page::Link
"Past 24 hours"
"/search?q=cats&hl=en&gbv=1&ie=UTF8&tbm=isch&source=lnt&tbs=qdr:d&sa=X&ei=T8kDUu7aB4f8iwKZx4HoBg&ved=0CCQQpwUoAQ">
#<Mechanize::Page::Link
""
"http://www.google.com/imgres?imgurl=http://jasonlefkowitz.net/wp-content/uploads/2013/07/Cute-Cats-cats-33440930-1280-800.jpg&imgrefurl=http://jasonlefkowitz.net/2013/07/slideshow-20-cats-that-suck-at-reducing-tensions-in-the-israeli-palestinian-conflict/&usg=__1YEuvKE4A9r6IIRkcz9Pu6ahN8Q=&h=800&w=1280&sz=433&hl=en&start=1&sig2=ekqjELPNQsK-QQ2r-4TeeQ&zoom=1&tbnid=Xz9P1WD4o4TSlM:&tbnh=94&tbnw=150&ei=b8sDUq36Ge3figLCzoBY&itbs=1&sa=X&ved=0CCwQrQMwAA">
Here is a snippet of the output of:
page.links.each do |link|
  puts link.text
end
which prints the text of every link on the page.
More
Large
Face
Photo
Clip art
Line drawing
Animated
Past 24 hours
Past week
Reset tools
funny cats
cats and kittens
cats musical
cute cats
lots of cats
cats with guns
2
3
4
5
6
7
8
9
10
Next
Notice all the blank lines in that output? Those are the links that show up with the empty text "" in the pp page output. Does anyone have any ideas on how I can click one?
Here is the code for the script.
require 'mechanize'
agent = Mechanize.new
page = agent.get('https://google.com')
page = agent.page.link_with(:text => 'Sign in').click
# pp page
sign_in = page.form() ##leave empty = nil
sign_in.Email = '10halec'
sign_in.Passwd = 'password'
page = agent.submit(sign_in)
page = agent.page.link_with(:text => 'Images').click
search = page.form('f')
search.q = 'cats'
page = agent.submit(search)
# pp page
# agent.page.image_with(:src => /imgres?/).fetch.save
page = agent.page.link_with(:text => '').click
# pp page
# page.links.each do |link|
# puts link.text
# end
pp page
def save filename = nil
filename = find_free_name filename
save! filename
end
Notice all the blank lines in that output? Those are the links that show up with the empty text "" in the pp page output. Does anyone have any ideas on how I can click one?
page = agent.page.link_with(:text => '').click
That line works for me. I put both of the following HTML pages in my local Apache server's htdocs directory (a publicly accessible directory):
page1.html:
<!DOCTYPE html>
<html>
<head><title>Test</title></head>
<body>
<div><a href="/somesite.com/cat1.jpg">cat1</a></div>
<div><a href="/page2.html"></a></div>
<div><a href="/somesite.com/cat3.jpg"></a></div>
</body>
</html>
page2.html:
<!DOCTYPE html>
<html>
<head><title>Page2</title></head>
<body>
<div>hello</div>
</body>
</html>
Then I started up my server, which made page1.html accessible in my browser at the URL:
http://localhost:8080/page1.html
Then I ran this Ruby program:
require 'mechanize'
agent = Mechanize.new
agent.get('http://localhost:8080/page1.html')
pp agent.page
page = agent.page.link_with(:text => '').click
puts page.title
...and the output was:
#<Mechanize::Page
{url #<URI::HTTP:0x00000100c8dc18 URL:http://localhost:8080/page1.html>}
{meta_refresh}
{title "Test"}
{iframes}
{frames}
{links
#<Mechanize::Page::Link "cat1" "/somesite.com/cat1.jpg">
#<Mechanize::Page::Link "" "/page2.html">
#<Mechanize::Page::Link "" "/somesite.com/cat3.jpg">}
{forms}>
Page2
The pp page output looks the same as your output, and I was successfully able to click on a link that has no text--as evidenced by the output Page2.
The only problem with that code is that link_with() returns only the first match. If I use links_with(), I get all the matching links:
require 'mechanize'
agent = Mechanize.new
agent.get('http://localhost:8080/page1.html')
links = agent.page.links_with(:text => '')
p links
--output:--
[#<Mechanize::Page::Link "" "/page2.html">
, #<Mechanize::Page::Link "" "/somesite.com/cat3.jpg">
]
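Because several links share the same empty text, text alone cannot tell them apart, so it may help to filter on the href as well. The following is only a sketch of that idea: Link is a hypothetical stand-in struct for Mechanize::Page::Link (which also responds to text and href), and empty_text_links is a helper name I made up, so the example runs without a server.

```ruby
# Hypothetical stand-in for Mechanize::Page::Link, which also exposes #text and #href.
Link = Struct.new(:text, :href)

# The three links from page1.html above.
links = [
  Link.new('cat1', '/somesite.com/cat1.jpg'),
  Link.new('', '/page2.html'),
  Link.new('', '/somesite.com/cat3.jpg')
]

# Keep only the links that have no text and whose href matches the pattern.
def empty_text_links(links, href_pattern)
  links.select { |link| link.text.empty? && link.href =~ href_pattern }
end

jpg_links = empty_text_links(links, /\.jpg\z/)
jpg_links.each { |link| puts link.href }
```

With a real page, links_with() accepts several criteria at once, so something like agent.page.links_with(:text => '', :href => /\.jpg\z/) should do the same filtering without the helper.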
I would like to see the actual HTML of the links you are having problems with.
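In the meantime, judging only from the href in your pp output, the address of the full-size image appears to be carried in the imgurl query parameter of each /imgres link. If that holds, you may not need to click the link at all. Here is a rough sketch using only Ruby's standard library; the href is a shortened copy of the one in your output, and image_url_from is a name I invented for illustration.

```ruby
require 'uri'

# Shortened copy of the result href from the question's pp output
# (most of the tracking parameters are left out here).
href = 'http://www.google.com/imgres?imgurl=http://jasonlefkowitz.net/wp-content/uploads/2013/07/Cute-Cats-cats-33440930-1280-800.jpg&h=800&w=1280&hl=en&start=1'

# Return the value of the imgurl query parameter, or nil if it is absent.
def image_url_from(href)
  query = URI.parse(href).query
  return nil if query.nil?
  URI.decode_www_form(query).to_h['imgurl']
end

image_url = image_url_from(href)
puts image_url
```

Assuming the parameter really is the image address, the hrefs could come from agent.page.links_with(:href => /imgres/), and agent.get(image_url).save would then write the file to disk.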