Search code examples
mysqlpowershellfilenamesnon-ascii-charactersmysql.data

Powershell and filenames with non-ASCII characters (e.g. Æ)


I am attempting to index my movie collection and in doing so have run across an issue where at least one title is skipped in the import phase due to special characters. The code skips over "Æon Flux" due to it starting with Æ. Would anyone know how to correct this, please?

Clear-Host

# Variables:
$movie_dir = "K:\Movies"

# Because reasons...
$PSDefaultParameterValues['*:Encoding'] = 'utf8'

# Connect to the library MySQL.Data.dll
Add-Type -Path 'C:\Program Files (x86)\MySQL\Connector NET 8.0\Assemblies\v4.8\MySql.Data.dll'
 
# Create a MySQL Database connection variable that qualifies:
$Connection = [MySql.Data.MySqlClient.MySqlConnection]@{ConnectionString='server=127.0.0.1;uid=username;pwd=password;database=media'}
$Connection.Open()

# Drop the table to clear all entries.
$sql_drop_table = New-Object MySql.Data.MySqlClient.MySqlCommand
$sql_drop_table.Connection = $Connection
$sql_drop_table.CommandText = 'DROP TABLE Movies'
$sql_drop_table.ExecuteNonQuery() | Out-Null

# (Re)create the table.
$sql_create_table = New-Object MySql.Data.MySqlClient.MySqlCommand
$sql_create_table.Connection = $Connection
$sql_create_table.CommandText = 'create table Movies(movie_id INT NOT NULL AUTO_INCREMENT, movie_title VARCHAR(255) NOT NULL, movie_file_date INT, movie_IMDB_id INT, PRIMARY KEY (movie_id))'
$sql_create_table.ExecuteNonQuery() | Out-Null

$movies = Get-ChildItem $movie_dir -File -include *.mp4 -Recurse -Depth 1 |
    Select-Object -ExpandProperty FullName |
    Sort-Object |
    Get-Unique |
    where{$_ -ne ""}

foreach ($movie in $movies)
{
    # .net function to get just the filename (movie title).
    $title = [System.IO.Path]::GetFileNameWithoutExtension($movie)
    # Get the creation date of the movie and reformat it to yearmonthday.
    $add_date = (Get-ChildItem $movie).CreationTime.toString("yyyyMMdd")

    $query = "INSERT INTO Movies(movie_id, movie_title, movie_file_date) VALUES(NULL, @title, $add_date)"
    $command = $connection.CreateCommand()
    $command.CommandText = $query
    # Sanatize single quotes in filenames for input.
    $command.Parameters.AddWithValue("@title", $title) | Out-Null
    $command.ExecuteNonQuery() | Out-Null
}

# Close the MySQL connection.
$Connection.Close()

Write-Host
Write-Host("Added") $movies.Count ("movies.")

Solution

  • I don't think it is the Get-ChildItem that skips the file with that special character. More likely, you need to tell your MySql to use UTF-8.
    For that, have a look at How to make MySQL handle UTF-8 properly

    As for your code, I would change this:

    $movies = Get-ChildItem $movie_dir -File -include *.mp4 -Recurse -Depth 1 |
        Select-Object -ExpandProperty FullName |
        Sort-Object |
        Get-Unique |
        where{$_ -ne ""}
    

    into

    $movies = Get-ChildItem -Path $movie_dir -File -Filter '*.mp4' -Recurse -Depth 1 | Sort-Object -Property FullName
    

    and work with the FileInfo objects from there on:

    foreach ($movie in $movies) {
        $title = $movie.BaseName
        # Get the creation date of the movie and reformat it to yearmonthday.
        $add_date = '{0}:yyyyMMdd}' -f $movie.CreationTime
        . . .
    }