I am starting a web-scraping project. I already have the tools set up for the different formats: for JSON I use SwiftyJSON, and for raw HTML I use hpple. My problem is that I am trying to set up a generic class for the content and a generic class for the fetcher of that content, since every operation goes like this:
1. Log in: if there is a username or password, supply it.
2. If there is a captcha, display it and use the result.
3. Fetch the data using Alamofire.
4. Scrape the data, using either JSON or HTML.
5. Populate the content class.
I am wondering whether there is a way to define some kind of protocol, enum, or generic template so that each class can define those functions in its own way. If I don't get this right, I think I will end up writing the same code over and over again. This is what I have come up with; I would appreciate help setting it up properly.
enum Company: Int {
    case CNN
    case BBC
    case HN
    case SO
    var captcha: Bool {
        switch self {
        case CNN:
            return false
        case BBC:
            return true
        case HN:
            return true
        case SO:
            return false
        }
    }
    var description: String {
        get {
            switch self {
            case CNN:
                return "CNN"
            case BBC:
                return "BBC"
            case HN:
                return "Hacker News"
            case SO:
                return "Stack Overflow"
            }
        }
    }
}
class Fetcher {
    var username: String?
    var password: String?
    var url: String
    var company: Company
    init(company: Company, url: String) {
        self.url = url
        self.company = company
    }
    init(company: Company, url: String, username: String, password: String) {
        self.url = url
        self.company = company
        self.username = username
        self.password = password
    }
    func login() {
        if username != nil {
            // login
        }
        if company.captcha {
            // show captcha
        }
    }
    func fetch() {
    }
    func populate() {
    }
}
class CNN: Fetcher {
}
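The five steps listed in the question map naturally onto a protocol whose extension fixes the order once, while each site supplies only its own steps (a protocol-based take on the template-method pattern). This is a minimal sketch in current Swift, not the answer's approach: `ScrapingPipeline`, `Credentials`, `Headline`, and `FakeNewsSite` are hypothetical names, and `fetchRaw()` returns canned text where a real site would make an Alamofire request and parse the result with SwiftyJSON or hpple.

```swift
// Hypothetical types for illustration; not part of SwiftyJSON, hpple, or Alamofire.
struct Credentials {
    let username: String
    let password: String
}

// Each site supplies its own steps; the extension below fixes their order once.
protocol ScrapingPipeline {
    associatedtype Content
    var credentials: Credentials? { get }
    var needsCaptcha: Bool { get }
    func login(with credentials: Credentials) -> Bool
    func solveCaptcha() -> String?
    func fetchRaw() -> String?                 // stand-in for the network request
    func scrape(_ raw: String) -> [Content]    // JSON or HTML parsing goes here
}

extension ScrapingPipeline {
    // Defaults, so sites without a captcha need not implement these.
    var needsCaptcha: Bool { return false }
    func solveCaptcha() -> String? { return nil }

    // Log in -> captcha -> fetch -> scrape; the returned array is the populated content.
    func run() -> [Content] {
        if let credentials = credentials, !login(with: credentials) { return [] }
        if needsCaptcha && solveCaptcha() == nil { return [] }
        guard let raw = fetchRaw() else { return [] }
        return scrape(raw)
    }
}

struct Headline {
    let title: String
}

// A fake site: no login, no captcha, and canned text with one title per line.
struct FakeNewsSite: ScrapingPipeline {
    let credentials: Credentials? = nil
    func login(with credentials: Credentials) -> Bool { return true }
    func fetchRaw() -> String? { return "First story\nSecond story" }
    func scrape(_ raw: String) -> [Headline] {
        return raw.split(separator: "\n").map { Headline(title: String($0)) }
    }
}
```

Real requests are asynchronous, so a production version would pass completion handlers instead of returning values, as the answer's `Fetcher` does; the protocol shape stays the same.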
Okay, this was a fun exercise...
You really just need to build out your Company enumeration further to make your Fetcher more abstract. Here's an approach that only slightly modifies yours and should get you much closer to what you are trying to achieve. It is based on a previous reply of mine to a different question of yours.
Company
import Foundation
import Alamofire
enum Company: Printable, URLRequestConvertible {
    case CNN, BBC, HN, SO
    var captcha: Bool {
        switch self {
        case CNN:
            return false
        case BBC:
            return true
        case HN:
            return true
        case SO:
            return false
        }
    }
    var credentials: (username: String, password: String)? {
        switch self {
        case CNN:
            return ("cnn_username", "cnn_password")
        case BBC:
            return nil
        case HN:
            return ("hn_username", "hn_password")
        default:
            return nil
        }
    }
    var description: String {
        switch self {
        case CNN:
            return "CNN"
        case BBC:
            return "BBC"
        case HN:
            return "Hacker News"
        case SO:
            return "Stack Overflow"
        }
    }
    // Placeholder strings: replace them with each site's real URLs.
    var loginURLRequest: NSURLRequest {
        let URLString: String
        switch self {
        case CNN:
            URLString = "cnn_login_url"
        case BBC:
            URLString = "bbc_login_url"
        case HN:
            URLString = "hn_login_url"
        case SO:
            URLString = "so_login_url"
        }
        return NSURLRequest(URL: NSURL(string: URLString)!)
    }
    var URLRequest: NSURLRequest {
        let URLString: String
        switch self {
        case CNN:
            URLString = "cnn_url"
        case BBC:
            URLString = "bbc_url"
        case HN:
            URLString = "hn_url"
        case SO:
            URLString = "so_url"
        }
        return NSURLRequest(URL: NSURL(string: URLString)!)
    }
}
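Every property in this enum repeats the same four-way switch, so adding a site means touching each of them. One alternative, sketched here in current Swift without Foundation or Alamofire, is to collect each company's settings in a single value built by one switch. `CompanyConfig` and `config` are hypothetical names, and the URL strings are the same placeholders used above.

```swift
// Hypothetical refactor: one switch builds all settings for a company.
struct CompanyConfig {
    let name: String
    let captcha: Bool
    let credentials: (username: String, password: String)?
    let loginURLString: String   // placeholder strings, as in the answer
    let URLString: String
}

enum Company: CaseIterable, CustomStringConvertible {
    case CNN, BBC, HN, SO

    var config: CompanyConfig {
        switch self {
        case .CNN:
            return CompanyConfig(name: "CNN", captcha: false,
                                 credentials: (username: "cnn_username", password: "cnn_password"),
                                 loginURLString: "cnn_login_url", URLString: "cnn_url")
        case .BBC:
            return CompanyConfig(name: "BBC", captcha: true, credentials: nil,
                                 loginURLString: "bbc_login_url", URLString: "bbc_url")
        case .HN:
            return CompanyConfig(name: "Hacker News", captcha: true,
                                 credentials: (username: "hn_username", password: "hn_password"),
                                 loginURLString: "hn_login_url", URLString: "hn_url")
        case .SO:
            return CompanyConfig(name: "Stack Overflow", captcha: false, credentials: nil,
                                 loginURLString: "so_login_url", URLString: "so_url")
        }
    }

    // Thin forwarding properties keep existing call sites working.
    var captcha: Bool { return config.captcha }
    var credentials: (username: String, password: String)? { return config.credentials }
    var description: String { return config.name }
}
```

Because the switch is exhaustive, adding a new case makes the compiler point at exactly one place that needs the new site's settings.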
News
import Foundation
struct News {
    let title: String
    let content: String
    let date: NSDate
    let author: String
}
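The answer leaves parsing as a NOTE and suggests SwiftyJSON. For JSON sources, Foundation's `Codable` (Swift 4 and later) can also fill `News` directly. This sketch assumes the site returns an array of objects whose keys match the property names and whose dates are Unix timestamps in seconds; `parseNews(from:)` is a hypothetical helper. HTML pages would still need an HTML parser such as hpple.

```swift
import Foundation

// Assumed JSON shape: [{"title": ..., "content": ..., "date": <seconds>, "author": ...}]
struct News: Decodable {
    let title: String
    let content: String
    let date: Date
    let author: String
}

func parseNews(from data: Data) throws -> [News] {
    let decoder = JSONDecoder()
    // Assumes dates arrive as Unix timestamps (seconds since 1970).
    decoder.dateDecodingStrategy = .secondsSince1970
    return try decoder.decode([News].self, from: data)
}
```

If the real keys differ from the property names, a `CodingKeys` enum inside `News` maps between them.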
Fetcher
import Foundation
import Alamofire
class Fetcher {
    typealias FetchNewsSuccessHandler = [News] -> Void
    typealias FetchNewsFailureHandler = (NSHTTPURLResponse?, AnyObject?, NSError?) -> Void
    // MARK: - Fetch News Methods
    class func fetchNewsFromCompany(company: Company, success: FetchNewsSuccessHandler, failure: FetchNewsFailureHandler) {
        login(
            company: company,
            success: { apiKey in
                // Logged in: fetch with the key, passing the caller's handlers straight through.
                Fetcher.fetch(company: company, apiKey: apiKey, success: success, failure: failure)
            },
            failure: failure
        )
    }
    // MARK: - Private - Helper Methods
    private class func login(
        #company: Company,
        success: (String) -> Void,
        failure: (NSHTTPURLResponse?, AnyObject?, NSError?) -> Void)
    {
        if company.captcha {
            // You'll need to figure this part out on your own. First off, I'm not really sure how you
            // would do it, and secondly, I think there may be legal implications of doing this.
        }
        let request = Alamofire.request(company.loginURLRequest)
        if let credentials = company.credentials {
            request.authenticate(username: credentials.username, password: credentials.password)
        }
        request.responseJSON { _, response, json, error in
            if let error = error {
                failure(response, json, error)
            } else {
                // NOTE: You'll need to parse here...I would suggest using SwiftyJSON
                let apiKey = "12345678"
                success(apiKey)
            }
        }
    }
    private class func fetch(
        #company: Company,
        apiKey: String,
        success: FetchNewsSuccessHandler,
        failure: FetchNewsFailureHandler)
    {
        // apiKey is unused here; attach it to the request however the site expects.
        let request = Alamofire.request(company.URLRequest)
        // Bind the response so the failure handler can pass it on.
        request.responseJSON { _, response, json, error in
            if let error = error {
                failure(response, json, error)
            } else {
                // NOTE: You'll need to parse here...I would suggest using SwiftyJSON
                let news = [News]()
                success(news)
            }
        }
    }
}
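`fetchNewsFromCompany` chains two asynchronous steps: log in, then hand the API key to the fetch, with the first failure going straight to the failure handler. In Swift 5 the paired success/failure closures can collapse into a single completion that takes a `Result`. This sketch injects the two steps as closures (hypothetical `LoginStep` and `FetchStep` types, plus a stand-in `FetchError`) so the chaining can be shown without Alamofire; in real code they would wrap the login and data requests.

```swift
// Stand-in error type; a real one might carry the HTTP response and underlying error.
enum FetchError: Error, Equatable {
    case loginFailed
    case fetchFailed
}

struct News {
    let title: String
}

// Hypothetical signatures for the two network steps, injected so the chaining stands alone.
typealias LoginStep = (@escaping (Result<String, FetchError>) -> Void) -> Void
typealias FetchStep = (String, @escaping (Result<[News], FetchError>) -> Void) -> Void

// Log in, then pass the API key to the fetch step; the first failure short-circuits.
func fetchNews(login: LoginStep, fetch: @escaping FetchStep,
               completion: @escaping (Result<[News], FetchError>) -> Void) {
    login { loginResult in
        switch loginResult {
        case .failure(let error):
            completion(.failure(error))
        case .success(let apiKey):
            fetch(apiKey, completion)
        }
    }
}
```

Injecting the steps also lets you exercise the chaining with synchronous stubs, without touching the network.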
Example ViewController Calling Fetcher
import UIKit
class SomeViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        Fetcher.fetchNewsFromCompany(
            Company.CNN,
            success: { newsList in
                for news in newsList {
                    println("\(news.title) - \(news.date)")
                }
            },
            failure: { response, data, error in
                println("\(response) \(error)")
            }
        )
    }
}
By allowing the Company object to flow through your Fetcher, you should never have to track state for a company in your Fetcher. It can all be stored directly inside the enum.
Hope that helps. Cheers.