
Received MethodError when cleaning String


I have data in a .txt file that looks like this:

04:31 Yuri Kane feat Jeza – Love Comes (Original Mix) [PREMIER]
25:31 Heatbeat & Quilla – Secret (Original Mix) [ARMADA CAPTIVATING]

All of them have this pattern:

00:00 artist - title [studio]

I want to remove the time stamp and the studio, so the output looks like this:

1. Yuri Kane feat Jeza – Love Comes (Original Mix)

Here is what I tried:

function remove_time_from(str::String)
  return last(split(str,"0 "))
end 

function remove_url(str::String)
  return first(rsplit(str,"["))
end

function main()
     
     tracks = String[]
     local number = 0
    
     for line in eachline("track-list.txt")
        number += 1
        removed_time = remove_time_from(line)
        cleaned = remove_url(removed_time)
        push!(tracks,"$number.$cleaned")
     end

     open("track-list-cleaned.txt", "w") do io
        for line in tracks
            write(io, "$line\n")
        end 
     end
end 

main()

but it returns:

MethodError: no method matching remove_url(::SubString{String})

Solution

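  • remove_time_from() calls split(), which returns a vector of SubString{String}, so last() hands back a SubString{String} rather than a String. remove_url() only accepts a String, hence the MethodError: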

    track = "04:31 Yuri Kane feat Jeza – Love Comes (Original Mix) [PREMIER]"    
    println(typeof(remove_time_from(track))) # Output: SubString{String}
    

    You have 2 ways to fix it:

    1. Have both remove_time_from() and remove_url() convert the SubString to String before returning it. This way, no matter which function you use first, you'll get a String:
    return convert(String,last(split(str,"0 ")))
    
    2. Use AbstractString instead of String as the function parameter, because SubString is a subtype of AbstractString:
    println(SubString <: AbstractString) # Output: true
    

    This way, whichever function you call first, both accept a String (the type of line) as well as a SubString (the type you get back from either function).
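
    The two fixes can be sketched side by side. The names strict and generic are hypothetical and exist only for this illustration; the sketch assumes the studio always starts with " [".

```julia
track = "04:31 Yuri Kane feat Jeza – Love Comes (Original Mix) [PREMIER]"

# split() returns SubStrings, so `piece` is a SubString{String}, not a String.
piece = last(split(track, " "; limit = 2))

# Fix 1 (hypothetical name): keep the ::String signature, return a String.
strict(str::String) = String(first(split(str, " [")))

# Fix 2 (hypothetical name): accept any AbstractString.
generic(str::AbstractString) = first(split(str, " ["))

# strict(piece) would throw a MethodError, so convert explicitly first.
result_strict = strict(String(piece))

# generic() takes the SubString as it is.
result_generic = generic(piece)

println(typeof(piece))
println(typeof(result_strict))
println(result_generic)
```

    The AbstractString version avoids an extra copy, since it keeps passing SubStrings (views into the original line) along instead of allocating a new String at every step.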

    Suggestions:

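    1. Using split(str,"0 ") only removes the time stamp when it happens to end in 0. In this sample line "0 " never occurs, so split() returns the whole string unchanged: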
    last(split("04:31 Yuri Kane feat Jeza – Love Comes (Original Mix) [PREMIER]", "0 "))
    Output: 04:31 Yuri Kane feat Jeza – Love Comes (Original Mix) [PREMIER] 
    

    What you need is chop(), which lets you specify how many characters to drop from the head: here 6, the five-character time stamp plus the space after it. Also pass tail = 0, because chop() removes the last character by default.

    chop(str, head = 6, tail = 0)
    
    2. You don't need to read in the lines, clean them, and store them in a Vector to write later. You can clean each line (in one expression) and write it straight to the output file:
    open("track-list-cleaned.txt", "w") do io
        for line in eachline("track-list.txt")
            number += 1  # number must be initialized to 0 before the open() call
            cleaned = remove_url(remove_time_from(line))
            write(io, "$number.$cleaned\n")
        end
    end
    
    3. Use enumerate() to number the lines as you're reading them in:
    for (number,line) in enumerate(eachline("track-list.txt"))
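
    As a quick check of how chop() and enumerate() behave, here is a small sketch. It assumes every time stamp is exactly five characters followed by one space; the two-element array is made-up sample data.

```julia
track = "04:31 Yuri Kane feat Jeza – Love Comes (Original Mix) [PREMIER]"

# chop() drops one trailing character by default (tail = 1),
# so the closing bracket is lost unless tail = 0 is passed.
with_default_tail = chop(track, head = 6)
without_time = chop(track, head = 6, tail = 0)

println(with_default_tail)
println(without_time)

# enumerate() yields (index, item) pairs, counting from 1.
for (number, line) in enumerate(["first", "second"])
    println("$number. $line")
end
```

    chop() counts characters rather than bytes, so the en dash in the title doesn't affect it.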
    

    Code:

    # Using the assignment form because each function has only one line.
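    # Drop the time stamp and the space after it; tail = 0 keeps the last character.
    remove_time_from(str::AbstractString) = chop(str, head = 6, tail = 0)
    # Keep everything before the last "[", which is where the studio starts.
    remove_url(str::AbstractString) = first(rsplit(str, "["; limit = 2))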
    
    function main()
        open("track-list-cleaned.txt", "w") do io
            for (number,line) in enumerate(eachline("track-list.txt"))
                cleaned = strip(remove_url(remove_time_from(line)))
                write(io, "$number. $cleaned\n")
            end 
        end
    end 
    
    main()
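
    To try the same pipeline without touching any files, it can be run against in-memory buffers. This is only a sketch: drop_time, drop_studio and clean_tracks are hypothetical names, and it assumes every line starts with a five-character time stamp plus a space and ends with " [studio]".

```julia
# Hypothetical helpers for this sketch.
drop_time(str::AbstractString) = chop(str, head = 6, tail = 0)
drop_studio(str::AbstractString) = first(rsplit(str, " ["; limit = 2))

# Read lines from `input`, write numbered and cleaned lines to `output`.
function clean_tracks(input::IO, output::IO)
    for (number, line) in enumerate(eachline(input))
        cleaned = strip(drop_studio(drop_time(line)))
        write(output, "$number. $cleaned\n")
    end
end

# The two sample lines from the question, held in memory instead of a file.
input = IOBuffer("""
04:31 Yuri Kane feat Jeza – Love Comes (Original Mix) [PREMIER]
25:31 Heatbeat & Quilla – Secret (Original Mix) [ARMADA CAPTIVATING]
""")
output = IOBuffer()

clean_tracks(input, output)
result = String(take!(output))
print(result)
```

    Because clean_tracks() takes any IO, the same function would also work with the handles returned by open() for the real files.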