
Extract only .mp4 links with jsoup


I have an EditText for entering the URL, a button to launch the HTML parsing, and another EditText to show the results. How do I use jsoup to extract, from a website's source, only the links that end with .mp4?

This is the Instagram link I am testing with: https://www.instagram.com/p/BEZcgC8/

The page source contains the mp4 link twice, in these meta tags:

<meta property="og:video" content="http://igcdn-videos-h-8-a.akamaihd.net/hphotos-ak-xal1/t50.2886-16/13053343_16890256565548_842608422_n.mp4" />
<meta property="og:video:secure_url" content="https://igcdn-videos-h-8-a.akamaihd.net/hphotos-ak-xal1/t50.2886-16/13053343_16890911255689848_842608422_n.mp4" />
<meta property="og:video:type" content="video/mp4" />

I want the result shown in the EditText to look like this: https://example.com/ringuser.mp4

xml layout : activity_main.xml

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:paddingLeft="@dimen/activity_horizontal_margin"
    android:paddingRight="@dimen/activity_horizontal_margin"
    android:paddingTop="@dimen/activity_vertical_margin"
    android:paddingBottom="@dimen/activity_vertical_margin"
    tools:context="com.survivingwithandroid.jsoup.MainActivity">

<TextView android:id="@+id/txt1"
    android:text="@string/app_name"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:layout_centerHorizontal="true"
    style="@android:style/TextAppearance.Large"/>

<TextView
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:layout_below="@id/txt1"
    android:layout_marginTop="20dp"
    android:text="Website URL"
    android:id="@+id/txt2"/>
<EditText
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:layout_below="@id/txt2"
    android:ems="15"
    android:id="@+id/edtURL"/>
<Button
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:layout_below="@id/edtURL"
    android:layout_centerHorizontal="true"
    android:text="Get data!"
    android:layout_marginTop="15dp"
    android:id="@+id/btnGo"/>

<TextView
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:layout_below="@id/btnGo"
    android:layout_marginTop="10dp"
    android:text="Result data"
    android:id="@+id/txt3"/>

<EditText
    android:id="@+id/edtResp"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:layout_below="@+id/txt3"
    android:inputType="textMultiLine"
    android:lines="6"
    android:editable="false"
    android:layout_marginTop="10dp"/>

</RelativeLayout>

logcat:

09-12 19:38:35.148: D/NativeCrypto(22237): ssl=0x52f3ceb0 sslRead buf=0x41837fd0 len=174,timeo=3000
09-12 19:38:35.149: D/NativeCrypto(22237): Doing SSL_Read() ssl=0x52f3ceb0, appData=0x52f0eec0
09-12 19:38:35.149: D/NativeCrypto(22237): Returned from SSL_Read() with result 174, error code 0 ssl=0x52f3ceb0, appData=0x52f0eec0
09-12 19:38:35.233: D/dalvikvm(22237): GC_FOR_ALLOC freed 849K, 25% free 3278K/4324K, paused 12ms, total 12ms
09-12 19:38:35.309: D/NativeCrypto(22237): NativeCrypto_EVP_VerifyInit ctx=0x52f2ec48
09-12 19:38:35.309: D/NativeCrypto(22237): NativeCrypto_EVP_VerifyInit algorithmChars=RSA-SHA1
09-12 19:38:35.393: D/dalvikvm(22237): GC_FOR_ALLOC freed 856K, 21% free 3960K/5012K, paused 15ms, total 15ms
09-12 19:38:35.551: D/MyTag(22237): Final links
09-12 19:38:36.477: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:2.68, dur:1491.40, max:498.01, min:102.51
09-12 19:38:37.487: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.98, dur:1009.32, max:512.27, min:497.04
09-12 19:38:38.978: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:2.01, dur:1491.22, max:497.44, min:496.74
09-12 19:38:39.987: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.98, dur:1009.04, max:512.26, min:496.77
09-12 19:38:41.494: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.99, dur:1507.38, max:513.62, min:495.68
09-12 19:38:42.984: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:2.01, dur:1489.89, max:497.03, min:496.14
09-12 19:38:43.994: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.98, dur:1009.40, max:513.22, min:496.18
09-12 19:38:45.500: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.99, dur:1506.13, max:512.60, min:496.73
09-12 19:38:46.990: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:2.01, dur:1490.53, max:497.36, min:496.37
09-12 19:38:48.000: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.98, dur:1009.87, max:513.21, min:496.66
09-12 19:38:49.009: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.98, dur:1008.96, max:512.33, min:496.63
09-12 19:38:50.500: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:2.01, dur:1491.21, max:497.98, min:495.79
09-12 19:38:51.510: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.98, dur:1009.38, max:512.70, min:496.68
09-12 19:38:53.016: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.99, dur:1505.97, max:512.45, min:496.70
09-12 19:38:55.516: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.98, dur:1009.17, max:512.00, min:497.17
09-12 19:38:34.789: D/NativeCrypto(22237): Doing SSL_Read() ssl=0x52f3ceb0, appData=0x52f0eec0
09-12 19:38:34.789: D/NativeCrypto(22237): Returned from SSL_Read() with result 1, error code 0 ssl=0x52f3ceb0, appData=0x52f0eec0
09-12 19:38:57.022: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.99, dur:1505.99, max:509.86, min:497.09
09-12 19:38:58.512: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:2.01, dur:1490.55, max:498.20, min:495.64
09-12 19:38:59.523: I/SurfaceTextureClient(22237): [STC::queueBuffer] (this:0x504d5528) fps:1.98, dur:1010.26, max:513.02, min:497.24
09-12 19:39:00.440: D/OpenGLRenderer(22237): Flushing caches (mode 0)
09-12 19:39:00.471: D/InputMethodManager(22237): deactivate the inputconnection in ControlledInputConnectionWrapper.
09-12 19:39:00.497: D/OpenGLRenderer(22237): Flushing caches (mode 0)
09-12 19:39:00.682: D/dalvikvm(22237): GC_FOR_ALLOC freed 1317K, 27% free 4178K/5692K, paused 21ms, total 21ms
09-12 19:39:00.716: V/PhoneWindow(22237): DecorView setVisiblity: visibility = 4
09-12 19:39:00.720: V/PhoneWindow(22237): DecorView setVisiblity: visibility = 0
09-12 19:39:00.721: W/IInputConnectionWrapper(22237): showStatusIcon on inactive InputConnection
09-12 19:39:00.724: V/InputMethodManager(22237): Not IME target window, ignoring
09-12 19:39:00.783: V/InputMethodManager(22237): onWindowFocus: android.widget.EditText{41a17428 VFED..CL .F....ID 24,127-504,218 #7f09003e app:id/edtURL} softInputMode=288 first=true flags=#1810100

New Java, based on Davide Pastore's answer, but it doesn't show any result when I press the button:

public class MainActivity extends ActionBarActivity {

private EditText respText;

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    final EditText edtUrl = (EditText) findViewById(R.id.edtURL);
    Button btnGo = (Button) findViewById(R.id.btnGo);
    respText = (EditText) findViewById(R.id.edtResp);
    btnGo.setOnClickListener(new View.OnClickListener() {
        @Override
        public void onClick(View view) {
            String siteUrl = edtUrl.getText().toString();
            ( new ParseURL() ).execute(new String[]{siteUrl});
        }
    });
}


@Override
public boolean onCreateOptionsMenu(Menu menu) {
    // Inflate the menu; this adds items to the action bar if it is present.
    //getMenuInflater().inflate(R.menu.main, menu);
    return true;
}

@Override
public boolean onOptionsItemSelected(MenuItem item) {
    // Handle action bar item clicks here. The action bar will
    // automatically handle clicks on the Home/Up button, so long
    // as you specify a parent activity in AndroidManifest.xml.
    int id = item.getItemId();
    if (id == R.id.action_settings) {
        return true;
    }
    return super.onOptionsItemSelected(item);
}


private class ParseURL extends AsyncTask<String, Void, String> {

    private String finalLinks;

    @Override
    protected String doInBackground(String... strings) {
        StringBuffer buffer = new StringBuffer();
        try {
            Document doc = Jsoup.connect(strings[0]).get();
            Elements mp4Links = doc.select("a[href$=.mp4]");
            List<String> links = new ArrayList<String>();
            for (Element mp4Link : mp4Links) {
                String absHref = mp4Link.attr("abs:href");
                links.add(absHref);
            }
            finalLinks = "";
            for (String link : links) {
                finalLinks += link + "\n";
            }

            Log.d("MyTag", "Final links " + finalLinks);

        }
        catch(Throwable t) {
            t.printStackTrace();
        }

        return buffer.toString();
    }

    @Override
    protected void onPreExecute() {
        super.onPreExecute();
    }

    @Override
    protected void onPostExecute(String s) {
        super.onPostExecute(s);
        respText.setText(finalLinks);
    }
}
}

Old Java:

public class MainActivity extends Activity {

// URL Address
String url = "http://www.androidbegin.com";
ProgressDialog mProgressDialog;

@Override
public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    // Locate the Buttons in activity_main.xml
    Button titlebutton = (Button) findViewById(R.id.titlebutton);
    Button descbutton = (Button) findViewById(R.id.descbutton);
    Button logobutton = (Button) findViewById(R.id.logobutton);

    // Capture button click
    titlebutton.setOnClickListener(new OnClickListener() {
        public void onClick(View arg0) {
            // Execute Title AsyncTask
            new Title().execute();
        }
    });

    // Capture button click
    descbutton.setOnClickListener(new OnClickListener() {
        public void onClick(View arg0) {
            // Execute Description AsyncTask
            new Description().execute();
        }
    });

    // Capture button click
    logobutton.setOnClickListener(new OnClickListener() {
        public void onClick(View arg0) {
            // Execute Logo AsyncTask
            new Logo().execute();
        }
    });

}

// Title AsyncTask
private class Title extends AsyncTask<Void, Void, Void> {
    String title;

    @Override
    protected void onPreExecute() {
        super.onPreExecute();
        mProgressDialog = new ProgressDialog(MainActivity.this);
        mProgressDialog.setTitle("Android Basic JSoup Tutorial");
        mProgressDialog.setMessage("Loading...");
        mProgressDialog.setIndeterminate(false);
        mProgressDialog.show();
    }

    @Override
    protected Void doInBackground(Void... params) {
        try {
            // Connect to the web site
            Document document = Jsoup.connect(url).get();
            // Get the html document title
            title = document.title();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(Void result) {
        // Set title into TextView
        TextView txttitle = (TextView) findViewById(R.id.titletxt);
        txttitle.setText(title);
        mProgressDialog.dismiss();
    }
    }

// Description AsyncTask
private class Description extends AsyncTask<Void, Void, Void> {
    String desc;

    @Override
    protected void onPreExecute() {
        super.onPreExecute();
        mProgressDialog = new ProgressDialog(MainActivity.this);
        mProgressDialog.setTitle("Android Basic JSoup Tutorial");
        mProgressDialog.setMessage("Loading...");
        mProgressDialog.setIndeterminate(false);
        mProgressDialog.show();
    }

    @Override
    protected Void doInBackground(Void... params) {
        try {
            // Connect to the web site
            Document document = Jsoup.connect(url).get();
            // Using Elements to get the Meta data
            Elements description = document
                    .select("meta[name=description]");
            // Locate the content attribute
            desc = description.attr("content");
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(Void result) {
        // Set description into TextView
        TextView txtdesc = (TextView) findViewById(R.id.desctxt);
        txtdesc.setText(desc);
        mProgressDialog.dismiss();
    }
    }

// Logo AsyncTask
private class Logo extends AsyncTask<Void, Void, Void> {
    Bitmap bitmap;

    @Override
    protected void onPreExecute() {
        super.onPreExecute();
        mProgressDialog = new ProgressDialog(MainActivity.this);
        mProgressDialog.setTitle("Android Basic JSoup Tutorial");
        mProgressDialog.setMessage("Loading...");
        mProgressDialog.setIndeterminate(false);
        mProgressDialog.show();
    }

    @Override
    protected Void doInBackground(Void... params) {

        try {
            // Connect to the web site
            Document document = Jsoup.connect(url).get();
            // Using Elements to get the class data
            Elements img = document.select("a[class=brand brand-image] img[src]");
            // Locate the src attribute
            String imgSrc = img.attr("src");
            // Download image from URL
            InputStream input = new java.net.URL(imgSrc).openStream();
            // Decode Bitmap
            bitmap = BitmapFactory.decodeStream(input);

        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(Void result) {
        // Set downloaded image into ImageView
        ImageView logoimg = (ImageView) findViewById(R.id.logo);
        logoimg.setImageBitmap(bitmap);
        mProgressDialog.dismiss();
    }
}
}

Solution

  • Update 2

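    On that page the video URLs are not in <a href> links but in the content attribute of <meta> tags, so the a[href$=.mp4] query matches nothing. Change the CSS query to meta[content$=.mp4] and read the content attribute instead: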

    public class MainActivity extends AppCompatActivity {
    
        private EditText respText;
    
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);
    
            final EditText edtUrl = (EditText) findViewById(R.id.edtURL);
            Button btnGo = (Button) findViewById(R.id.btnGo);
            respText = (EditText) findViewById(R.id.edtResp);
            btnGo.setOnClickListener(new View.OnClickListener() {
                @Override
                public void onClick(View view) {
                    String siteUrl = edtUrl.getText().toString();
                    ( new ParseURL() ).execute(new String[]{siteUrl});
                }
            });
        }
    
    
        @Override
        public boolean onCreateOptionsMenu(Menu menu) {
            // Inflate the menu; this adds items to the action bar if it is present.
            //getMenuInflater().inflate(R.menu.main, menu);
            return true;
        }
    
        @Override
        public boolean onOptionsItemSelected(MenuItem item) {
            // Handle action bar item clicks here. The action bar will
            // automatically handle clicks on the Home/Up button, so long
            // as you specify a parent activity in AndroidManifest.xml.
            int id = item.getItemId();
            if (id == R.id.action_settings) {
                return true;
            }
            return super.onOptionsItemSelected(item);
        }
    
    
        private class ParseURL extends AsyncTask<String, Void, String> {
    
            @Override
            protected String doInBackground(String... strings) {
                StringBuilder links = new StringBuilder();
                try {
                    Document doc = Jsoup.connect(strings[0]).get();
                    // The video URLs sit in <meta content="..."> tags, not in <a href>.
                    Elements mp4Links = doc.select("meta[content$=.mp4]");
                    for (Element mp4Link : mp4Links) {
                        links.append(mp4Link.attr("content")).append("\n");
                    }
    
                    Log.d("MyTag", "Final links " + links);
    
                }
                catch (Exception e) {
                    // Bad URL or network failure: log it and return whatever was collected.
                    Log.e("MyTag", "Could not parse " + strings[0], e);
                }
    
                // Hand the result to onPostExecute instead of going through a field.
                return links.toString();
            }
    
            @Override
            protected void onPostExecute(String result) {
                super.onPostExecute(result);
                // Runs on the UI thread, so the EditText can be updated here.
                respText.setText(result);
            }
        }
    }
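
The question mentions that the page exposes the same video under both og:video and og:video:secure_url, so the result list can contain repeated entries. A minimal sketch in plain Java (no jsoup or Android needed) of collapsing such a list before showing it. The class name LinkJoiner and the method joinUnique are hypothetical helpers introduced here for illustration, not part of jsoup or of the answer's code:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of jsoup or the original answer.
class LinkJoiner {

    // Returns the distinct, non-blank links in first-seen order, one per line.
    static String joinUnique(List<String> links) {
        // LinkedHashSet drops exact duplicates while keeping insertion order.
        Set<String> unique = new LinkedHashSet<>();
        for (String link : links) {
            if (link != null && !link.trim().isEmpty()) {
                unique.add(link.trim());
            }
        }
        // StringBuilder avoids the repeated copying of += in a loop.
        StringBuilder joined = new StringBuilder();
        for (String link : unique) {
            if (joined.length() > 0) {
                joined.append('\n');
            }
            joined.append(link);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "https://example.com/ringuser.mp4",
                "https://example.com/ringuser.mp4",
                "",
                "https://example.com/ringuser_2.mp4");
        System.out.println(joinUnique(links));
    }
}
```

In the activity above, something like this could be called on the collected list just before the text is set. Note that it only removes exact duplicates: an http and an https variant of the same file would both be kept.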
    

    Update

    A complete example could be:

    public class MainActivity extends AppCompatActivity {
    
        private EditText respText;
    
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);
    
            final EditText edtUrl = (EditText) findViewById(R.id.edtURL);
            Button btnGo = (Button) findViewById(R.id.btnGo);
            respText = (EditText) findViewById(R.id.edtResp);
            btnGo.setOnClickListener(new View.OnClickListener() {
                @Override
                public void onClick(View view) {
                    String siteUrl = edtUrl.getText().toString();
                    ( new ParseURL() ).execute(new String[]{siteUrl});
                }
            });
        }
    
    
        @Override
        public boolean onCreateOptionsMenu(Menu menu) {
            // Inflate the menu; this adds items to the action bar if it is present.
            //getMenuInflater().inflate(R.menu.main, menu);
            return true;
        }
    
        @Override
        public boolean onOptionsItemSelected(MenuItem item) {
            // Handle action bar item clicks here. The action bar will
            // automatically handle clicks on the Home/Up button, so long
            // as you specify a parent activity in AndroidManifest.xml.
            int id = item.getItemId();
            if (id == R.id.action_settings) {
                return true;
            }
            return super.onOptionsItemSelected(item);
        }
    
    
        private class ParseURL extends AsyncTask<String, Void, String> {
    
            private String finalLinks;
    
            @Override
            protected String doInBackground(String... strings) {
                StringBuffer buffer = new StringBuffer();
                try {
                    Document doc = Jsoup.connect(strings[0]).get();
                    Elements mp4Links = doc.select("a[href$=.mp4],meta[property=og:video],meta[property=og:video:secure_url]");
                    List<String> links = new ArrayList<String>();
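                    for (Element mp4Link : mp4Links) {
                        // <meta> tags carry the URL in "content"; <a> tags carry it in "href".
                        String absHref = mp4Link.hasAttr("content")
                                ? mp4Link.attr("content")
                                : mp4Link.attr("abs:href");
                        links.add(absHref);
                    }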
                    finalLinks = "";
                    for (String link : links) {
                        finalLinks += link + "\n";
                    }
    
                    Log.d("MyTag", "Final links " + finalLinks);
    
                }
                catch(Throwable t) {
                    t.printStackTrace();
                }
    
                return buffer.toString();
            }
    
            @Override
            protected void onPreExecute() {
                super.onPreExecute();
            }
    
            @Override
            protected void onPostExecute(String s) {
                super.onPostExecute(s);
                respText.setText(finalLinks);
            }
        }
    }
    

    Old

    Let's say you have an HTML document like this:

    <html>
    <head>
    <title>Try jsoup</title>
    </head>
    <body>
    <p>This is <a href="http://jsoup.org/">jsoup</a>.</p>
    <a href="https://example.com/ringuser.mp4">mp4 1</a>
    <a href="https://example.com/ringuser_2.mp4">mp4 2</a>
    <a href="https://example.com/ringuser_3.mp4">mp4 3</a>
    <a href="https://example.com/ringuser_4.mp4">mp4 4</a>
    <a href="other.html">ciao</a>
    </body>
    </html>
    

    You can retrieve all the links that end with .mp4 using this code:

    Elements mp4Links = doc.select("a[href$=.mp4]");
    List<String> links = new ArrayList<String>();
    for (Element mp4Link : mp4Links) {
      String absHref = mp4Link.attr("abs:href");
      links.add(absHref);
    }
    
    //Do your magic with the links List...
    

    links will contain:

    https://example.com/ringuser.mp4
    https://example.com/ringuser_2.mp4
    https://example.com/ringuser_3.mp4
    https://example.com/ringuser_4.mp4
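
One caveat with the $= selector used above: it matches only when the attribute value itself ends in .mp4, so a link such as video.mp4?token=abc would be skipped. A possible workaround, sketched here under the assumption that you select more broadly (for example every a[href]) and then filter in Java, is to test the path part of each URL. The class Mp4Urls and the method hasMp4Path are hypothetical names used only for this illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper for illustration; it uses only the Java standard library.
class Mp4Urls {

    // True if the URL's path (ignoring query string and fragment) ends in ".mp4".
    static boolean hasMp4Path(String url) {
        try {
            String path = new URI(url).getPath();
            // getPath() is null for opaque URIs such as mailto: addresses.
            return path != null && path.toLowerCase(Locale.ROOT).endsWith(".mp4");
        } catch (URISyntaxException e) {
            // Not a syntactically valid URI, so it cannot be an mp4 link.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasMp4Path("https://example.com/ringuser.mp4"));
        System.out.println(hasMp4Path("https://example.com/ringuser.mp4?token=abc"));
        System.out.println(hasMp4Path("https://example.com/other.html"));
    }
}
```

Whether this is needed depends on the site: for the example HTML above, where every link ends in .mp4 with no query string, the plain selector is enough.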