Search code examples
pythonsqlssms

Is there ways to add 0s in a specific columns of dataframe to match specific number of digits?


I have two tables with a column that contains specific IDs.

For example,

TABLE A

cust_id 
1234
32145
1235692
9293159

TABLE B

cust_id 
0001234
0032145
1235692
9293159 

I'm trying to apply inner join using sql, but since they do not exactly match, the first two ids 1234 (0001234), 32145 (0032145) are missing from the results using inner join clause.

So here is my question.

  1. Is there any way to add additional 0s in front of cust_id of TABLE A to match that of TABLE B's cust_id?

For example, 1234 is only 4 digits, but after transformation, it would be in 7 digits with additional three 0s in front of it --> 0001234.

  1. Are these kinds of tasks easier to handle with Python? or SQL? My past experiences with data cleansing tells me it's Python, but are there any other suggestions?

If possible, for this specific case, I am hoping I can get help using Python or SQL.

Thanks in advance.


Solution

  • For python, this can be easily handled using pandas.

    Assuming cust_id is already a string column:

    >>> df.cust_id.apply(lambda x: x.zfill(7))
    0    0001234
    1    0032145
    2    1235692
    3    9293159
    

    For SQL, you state you're using SSMS, so that implies SQL Server. SQL Server unfortunately doesn't have an LPAD function out of the box, but you can replicate it using:

    REPLACE(STR(<column_name>, <desired_length>),' ','0')
    

    For example:

    with cust_ids as (
        select * from (
            values 
                ('1234'),
                ('32145'),
                ('1235692'),
                ('9293159')
            ) a (cust_id)
    )
    
    select 
        cust_id,
        REPLACE(STR(cust_id, 7),' ','0') as padded_cust_id
    from cust_ids