Does Python Polars have a function similar to Pandas with read_fwf ( for reading fixed-width formatted files)

Does Python Polars have a function similar to Pandas read_fwf ( for reading fixed-width formatted files)? Need to read txt files with fixed width data of the type below, and then apply a schema with columns based on positions in the file.

Sample Data: A1G01409xxx xxxx T 30499759910.001099101000199105100105430021 47.570952 -53.32332502xxxxxx WW19921220190000010AAA11 A31124xxxxxx No. 1, Subd. U SNO21499649910.001099101000199105100107410182 47.166651 -52.88563505MOBILE WW19921220190000011NNN51 A3B01339Division No. 1, Subd. G SNO35799759910.001003701000299105100105000122 47.945009 -53.06603613 WW19921220190000011NNN51 A3P01379 T 35799759910.001099101000101364100105250252 47.678897 -53.25694413RIVERHEAD WW19921220190000011NNN52

Illustrative schema: schema_dict = { "Postal_codeOM": (1, 6), "FSA": (7, 3), # ... and so on for other columns ... }

Solution

@jqurious is right here there are some solutions but not directly a function read_fwf inside polars.

You can generate a list of polars expressions with a for loop inside with_columns and make your own purpose read_fwf.

See here for the first time it appears on GitHub issues:

https://github.com/pola-rs/polars/issues/3151#issuecomment-1397354684

quoting @ghuls example (str.strip() becomes str.strip_chars()):

a sample fwf file:

$ cat fwf_test.txt
NAME                STATE     TELEPHONE  
John Smith          WA        418-Y11-4111
Mary Hartford       CA        319-Z19-4341
Evan Nolan          IL        219-532-c301

you have column names, column widths and calculate "slice tuples" for substring full string column.

df = pl.read_csv(
    "fwf_test.txt",
    has_header=False,
    skip_rows=1,
    new_columns=["full_str"]
)

column_names = [ "NAME", "STATE", "TELEPHONE" ]
widths = [20, 10, 12]


# Calculate slice values from widths.
slice_tuples = []
offset = 0

for i in widths:
    slice_tuples.append((offset, i))
    offset += i

df.with_columns(
    [
       pl.col("full_str").str.slice(slice_tuple[0], slice_tuple[1]).str.strip_chars().alias(col)
       for slice_tuple, col in zip(slice_tuples, column_names)
    ]
).drop("full_str")

shape: (3, 3)
┌───────────────┬───────┬──────────────┐
│ NAME          ┆ STATE ┆ TELEPHONE    │
│ ---           ┆ ---   ┆ ---          │
│ str           ┆ str   ┆ str          │
╞═══════════════╪═══════╪══════════════╡
│ John Smith    ┆ WA    ┆ 418-Y11-4111 │
│ Mary Hartford ┆ CA    ┆ 319-Z19-4341 │
│ Evan Nolan    ┆ IL    ┆ 219-532-c301 │
└───────────────┴───────┴──────────────┘

Then you can read see @jqurious link which is much complete with types casting, but iteration over zip(slice tuples, names, ...) remains the same logic.

It's faster than many read_fwf implementations without polars.