I'm trying to get data from a website using Jsoup. The website expects a JSON payload in a POST request:
{"SEARCH_VALUE":"ab","STARTS_WITH_YN":false,"ACTIVE_ONLY_YN":false,"ELECTRONIC_NOTARY_ONLY_YN":false,"REMOTE_NOTARY_ONLY_YN":false}
but my request gets a 500 error. I also tried to get cookies from the home page, but that returns an empty map ({}).
Scala version: "2.13.1"
sbt version: "1.2.8"
jsoup version: "1.15.3"
Here is my code in Scala:
import org.jsoup.Jsoup

val homePageUrl = "https://firststop.sos.nd.gov/search/notary"
val searchPageUrl = "https://firststop.sos.nd.gov/api/Records/notarysearch"
val jsoup = Jsoup.connect(searchPageUrl)
val response = jsoup.data("ACTIVE_ONLY_YN", "false")
  .data("SEARCH_VALUE", "ab")
  .data("ELECTRONIC_NOTARY_ONLY_YN", "0")
  .data("REMOTE_NOTARY_ONLY_YN", "false")
  .data("STARTS_WITH_YN", "false")
  .post()
println(response)
Error:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500, URL=[https://firststop.sos.nd.gov/api/Records/notarysearch]
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:890)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:829)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:366)
at org.jsoup.helper.HttpConnection.post(HttpConnection.java:360)
What I tried:
I set a timeout and a user agent, like this:
val jsoup = Jsoup.connect(searchPageUrl)
  .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36")
  .timeout(0)
  .ignoreHttpErrors(true)
  .ignoreContentType(true)
  .followRedirects(true)
val response = jsoup.data("ACTIVE_ONLY_YN", "false")
  .data("SEARCH_VALUE", "ab")
  .data("ELECTRONIC_NOTARY_ONLY_YN", "0")
  .data("REMOTE_NOTARY_ONLY_YN", "false")
  .data("STARTS_WITH_YN", "false")
  .post()
println(response)
I got the same error, this time returned as a document:
<html>
<head></head>
<body>
{"code":"api/Records","message":"Internal Error Occurred","internalerror":null,"title":""}
</body>
</html>
Then I tried setting the Content-Type header and sending the payload as the request body:
val payload = """{SEARCH_VALUE:"ab",STARTS_WITH_YN:false,ACTIVE_ONLY_YN:false,ELECTRONIC_NOTARY_ONLY_YN:false,REMOTE_NOTARY_ONLY_YN:false}""".stripMargin
val request: Connection.Response = Jsoup.connect(searchPageUrl)
  .header("Content-Type", "application/json")
  .method(Connection.Method.POST)
  .ignoreContentType(true)
  .requestBody(payload)
  .execute()
val responseBody = request.body()
println(responseBody)
but it still returns a 500 error.
I have also tried it with scalaj.http:
val payload = """{SEARCH_VALUE:"ab",STARTS_WITH_YN:false,ACTIVE_ONLY_YN:false,ELECTRONIC_NOTARY_ONLY_YN:false}""".stripMargin
val response = Http(searchPageUrl).postData(payload).header("Content-Type", "application/json").asString
println(response)
val responseBody = response.body
val json = responseBody.parseJson
println(json)
I got this error:
HttpResponse({"code":"api/Records","message":"Internal Error Occurred","internalerror":null,"title":""},500,TreeMap(Access-Control-Allow-Headers -> Vector(Origin, Content-Type, Accept, Content-Encoding, Authorization), Access-Control-Allow-Methods -> Vector(*), Access-Control-Allow-Origin -> Vector(*), Access-Control-Expose-Headers -> Vector(session-timeout, Request-Context), Cache-Control -> Vector(no-cache), Connection -> Vector(close), Content-Length -> Vector(90), Content-Type -> Vector(application/json; charset=utf-8), Date -> Vector(Wed, 30 Aug 2023 07:29:59 GMT), Expires -> Vector(-1), Pragma -> Vector(no-cache), Request-Context -> Vector(appId=cid-v1:df24017c-37e9-4e1c-afab-260d80eaaeea), Server -> Vector(State of North Dakota), session-timeout -> Vector(0), Set-Cookie -> Vector(ASP.NET_SessionId=j3ioe0kyrrkaqrwlymi0dq0r; path=/; HttpOnly; SameSite=Lax), Status -> Vector(HTTP/1.1 500 Internal Server Error), X-AspNet-Version -> Vector(4.0.30319), X-Content-Type-Options -> Vector(nosniff), X-XSS-Protection -> Vector(1; mode=block)))
{"code":"api/Records","internalerror":null,"message":"Internal Error Occurred","title":""}
What am I doing wrong? Is there any other way to get data from this website?
There are several issues that have to be handled here:
1. To get the cookie, open a connection to https://firststop.sos.nd.gov/api/GroupItems/Auth and store the cookie for later use. Also add ignoreContentType(true): the response is JSON, and without it Jsoup refuses to handle a non-HTML content type (you don't need the content anyway, only the cookie).
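If you would rather not depend on Jsoup's cookies() map, the "store the cookie for later use" step can also be done with the JDK's java.net.CookieManager. The sketch below runs offline: it feeds the manager a Set-Cookie header shaped like the one in the question's response log (the session ID is just the value from that log, not something the Auth endpoint is guaranteed to return) and asks which Cookie header a later request to the search endpoint should carry. It assumes the session cookie is the only one the server cares about; the class and method names are my own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

final class CookieReplaySketch {
    static final URI AUTH_URI = URI.create("https://firststop.sos.nd.gov/api/GroupItems/Auth");
    static final URI SEARCH_URI = URI.create("https://firststop.sos.nd.gov/api/Records/notarysearch");

    /** Stores the Set-Cookie values of a response that came back from AUTH_URI. */
    static CookieManager storeFromAuthResponse(List<String> setCookieHeaders) {
        // The default policy accepts cookies only from the server that set them.
        CookieManager manager = new CookieManager();
        try {
            manager.put(AUTH_URI, Map.of("Set-Cookie", setCookieHeaders));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return manager;
    }

    /** Returns the Cookie header values the manager would attach to a request to SEARCH_URI. */
    static List<String> cookiesForSearch(CookieManager manager) {
        try {
            return manager.get(SEARCH_URI, Map.of()).getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Header value taken from the response log in the question.
        CookieManager manager = storeFromAuthResponse(List.of(
                "ASP.NET_SessionId=j3ioe0kyrrkaqrwlymi0dq0r; path=/; HttpOnly; SameSite=Lax"));
        System.out.println(cookiesForSearch(manager)); // [ASP.NET_SessionId=j3ioe0kyrrkaqrwlymi0dq0r]
    }
}
```

Because the cookie has path=/, it is sent to /api/Records/notarysearch even though it was received from /api/GroupItems/Auth.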
As for the remaining two issues, you can see how I handled them in the following (Java) code:
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NotarySearch {
    public static void main(String[] args) {
        String search_url = "https://firststop.sos.nd.gov/api/Records/notarysearch";
        String auth_url = "https://firststop.sos.nd.gov/api/GroupItems/Auth";
        try {
            // Step 1: call the Auth endpoint only to get the session cookie.
            // ignoreContentType(true) because the response is JSON, not HTML.
            Connection.Response con = Jsoup.connect(auth_url)
                    .ignoreContentType(true)
                    .method(Connection.Method.GET)
                    .execute();
            System.out.println(con.cookies());

            // Step 2: POST the JSON payload as the raw request body, together with
            // the cookie and the headers copied from the browser's request.
            Document doc = Jsoup.connect(search_url)
                    .requestBody("{\"SEARCH_VALUE\":\"ab\",\"STARTS_WITH_YN\":false,\"ACTIVE_ONLY_YN\":false,\"ELECTRONIC_NOTARY_ONLY_YN\":false,\"REMOTE_NOTARY_ONLY_YN\":false}")
                    .cookies(con.cookies())
                    .ignoreContentType(true)
                    .header("Host", "firststop.sos.nd.gov")
                    .header("Accept", "*/*")
                    .header("Accept-Language", "en-US,en;q=0.5")
                    // The browser also sends "br", but Jsoup only decodes gzip and deflate
                    .header("Accept-Encoding", "gzip, deflate")
                    .header("Referer", "https://firststop.sos.nd.gov/search/notary")
                    .header("authorization", "undefined")
                    .header("content-type", "application/json")
                    .header("Content-Length", "131") // byte length of the body above
                    .header("Origin", "https://firststop.sos.nd.gov")
                    .header("DNT", "1")
                    .header("Connection", "keep-alive")
                    .header("Sec-Fetch-Dest", "empty")
                    .header("Sec-Fetch-Mode", "cors")
                    .header("Sec-Fetch-Site", "same-origin")
                    .header("Pragma", "no-cache")
                    .header("Cache-Control", "no-cache")
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0")
                    .post();
            System.out.println(doc);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
There is also the Content-Length header: I copied its value (131) from the browser, but you will have to write a method that calculates it for your actual request body.
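A minimal helper for that, assuming the body is sent as UTF-8 (the usual encoding for application/json): Content-Length counts bytes, so the helper measures the encoded body instead of calling String.length(), which undercounts as soon as the body contains non-ASCII characters. For the payload above it gives 131, the value copied from the browser. One caveat, as far as I know: Jsoup 1.15.x sends requests through HttpURLConnection, which by default treats Content-Length as a restricted header, silently ignores a value you set yourself and computes it from the body; so this mainly matters with a client that actually uses your value. The name contentLength is my own.

```java
import java.nio.charset.StandardCharsets;

final class ContentLengthSketch {
    /** Number of bytes the body occupies when encoded as UTF-8, which is what Content-Length counts. */
    static int contentLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String payload = "{\"SEARCH_VALUE\":\"ab\",\"STARTS_WITH_YN\":false,\"ACTIVE_ONLY_YN\":false,"
                + "\"ELECTRONIC_NOTARY_ONLY_YN\":false,\"REMOTE_NOTARY_ONLY_YN\":false}";
        System.out.println(contentLength(payload)); // 131
        // A non-ASCII character is one char but two UTF-8 bytes.
        System.out.println("\u00e9".length() + " vs " + contentLength("\u00e9")); // 1 vs 2
    }
}
```

You would then pass String.valueOf(contentLength(body)) instead of the hard-coded "131".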
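Now all you have to do is parse the output for your needs. Keep in mind that post() returns a parsed Document, so the JSON ends up wrapped in <html><body> tags (like the output shown in the question); if you want the raw JSON string instead, call .method(Connection.Method.POST).execute().body().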
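As for the other way you asked about: the same two requests can be sketched with the JDK's built-in java.net.http.HttpClient (Java 11+) instead of Jsoup. A CookieManager installed on the client stores the cookie from the Auth call and sends it with the search call, and response.body() gives the raw JSON without the HTML wrapper. This is a sketch I have not run against the live site: it assumes the session cookie, the JSON content type and the Referer header are enough (some of the other browser headers from the Jsoup version may matter too), and it leaves out Host, Connection and Content-Length because HttpClient rejects those as restricted headers and sets them itself. The class and method names are my own.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class NotarySearchSketch {
    static final URI AUTH_URI = URI.create("https://firststop.sos.nd.gov/api/GroupItems/Auth");
    static final URI SEARCH_URI = URI.create("https://firststop.sos.nd.gov/api/Records/notarysearch");
    static final String PAYLOAD = "{\"SEARCH_VALUE\":\"ab\",\"STARTS_WITH_YN\":false,\"ACTIVE_ONLY_YN\":false,"
            + "\"ELECTRONIC_NOTARY_ONLY_YN\":false,\"REMOTE_NOTARY_ONLY_YN\":false}";

    /** GET request whose only purpose is to obtain the session cookie. */
    static HttpRequest authRequest() {
        return HttpRequest.newBuilder(AUTH_URI).GET().build();
    }

    /** POST request carrying the JSON body; HttpClient computes Content-Length itself. */
    static HttpRequest searchRequest(String jsonBody) {
        return HttpRequest.newBuilder(SEARCH_URI)
                .header("Content-Type", "application/json")
                .header("Accept", "*/*")
                .header("Referer", "https://firststop.sos.nd.gov/search/notary")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // The CookieManager keeps Set-Cookie from the first response and replays it on the second.
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .build();
        client.send(authRequest(), HttpResponse.BodyHandlers.discarding());
        HttpResponse<String> response = client.send(searchRequest(PAYLOAD), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // raw JSON, not wrapped in <html><body>
    }
}
```

If the server still answers with a 500, the next thing I would try is adding the remaining non-restricted browser headers (Origin, authorization, the Sec-Fetch-* ones) one at a time.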