
Not able to load CSV file in weka


I am not able to load a CSV file in Weka. I have removed every special symbol, even using a text editor, but still no luck. I am attaching the file and would be obliged if someone could solve this problem.

It shows "Wrong number of values, Read 31, expected 27, read token[EOL], line 3"

link : https://drive.google.com/open?id=0By7zyIPDD6HJMmthWnZLSUk5aFE


Solution

  • You have plenty of empty fields in your file, and if you download it as .csv even the header gets three commas at its end. For example, your 6th line:

    ,Doug Walker,,,131,,Rob Walker,131,,Documentary,Doug Walker,Star Wars: Episode VII The Force Awakens  ,8,143,,0,,,,,,,,,12,7.1,,0,,,

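    Similar to the suggestion in the post below, you could use something like Notepad++ or another text editor to replace ",," with ",?," to fill the gaps. Repeat the replacement until no ",," is left, because a single pass leaves every second gap in a run of empty fields unfilled.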

    Convert NA values to ? automatically while loading

    I did this, and then the first row ends up with two question marks as column names, which obviously doesn't work, so change the first row to look like this:

    color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,movie_title,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,?,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes,additionalColName1,additionalColName2,additionalColName3

    If you now try to import your data, Weka starts telling you which lines it doesn't like and why. By the way, you did not "remove each and every special symbol"! After removing a few lines containing, e.g., the Ç character, it worked.

    That's just an ugly workaround. Try filling the empty values properly, and find a regular expression or a better way to save your file so that the last three commas of every line are removed; I was just too lazy for now. But I could load it into Weka, and that's what you wanted (:
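
The manual steps above (fill the gaps with "?", deal with the trailing commas, give every column a name) could also be scripted. Below is a minimal Python sketch of that idea, not a tested fix for the linked file. The function names and the `unnamed_<n>` column names are made up for illustration, and it assumes the trailing columns are empty in every row and that the file is UTF-8 encoded.

```python
import csv
import io

MISSING = "?"  # the placeholder the answer uses for empty fields


def clean_rows(rows):
    """Return rows with equal width, no all-empty trailing columns,
    a name for every header cell, and '?' in every empty data cell."""
    rows = [list(r) for r in rows if r]  # skip completely blank lines
    if not rows:
        return []
    width = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of fields.
    rows = [r + [""] * (width - len(r)) for r in rows]
    # Drop trailing columns that are empty in every row, header included.
    while width > 0 and all(r[width - 1].strip() == "" for r in rows):
        width -= 1
    rows = [r[:width] for r in rows]
    header, data = rows[0], rows[1:]
    # Give blank header cells a placeholder name (hypothetical naming scheme).
    header = [h.strip() or f"unnamed_{i}" for i, h in enumerate(header, start=1)]
    # Replace empty data cells with the missing-value marker.
    data = [[c if c.strip() else MISSING for c in r] for r in data]
    return [header] + data


def clean_csv_text(text):
    """Clean CSV content given as a string and return it as a string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(clean_rows(csv.reader(io.StringIO(text))))
    return out.getvalue()


def clean_csv_file(src, dst, encoding="utf-8"):
    """Clean the file at src and write the result to dst.
    The utf-8 default is an assumption about the input file."""
    with open(src, newline="", encoding=encoding) as f:
        rows = clean_rows(csv.reader(f))
    with open(dst, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)
```

For example, `clean_csv_file("movies.csv", "movies_clean.csv")` would write a cleaned copy (both file names are placeholders). Because the `csv` module parses the fields instead of doing a text replace, runs of several empty fields are filled in one pass. Lines with characters Weka still rejects, such as the Ç mentioned above, would need separate handling.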