1. vat.loc[(vat['Sum of VAT'].isin([np.nan, 0])) &
2. (vat['Comment'] == "Transactions 0DKK") &
3. (vat['Type'].isin(['Bill', 'Bill Credit'])) &
4. (vat['Maximum of Linked Invoice'].notnull()), 'Comment'] = 'Linked invoice'
5. vat[vat["Comment"] == "Linked invoice"]
Hi all,
I have a problem with the line:
(vat['Maximum of Linked Invoice'].notnull()
It seems not to be working properly when I'm trying to exclude all of the null values in the rows. In fact, it does not exclude the null values and instead, it is included in the output from the data frame. The rest of the syntax works perfectly. I have tried using different syntax but the null values are still included in the column 'Maximum of Linked Invoice'. I don't understand why it doesn't work?
Hi again,
I've done some more research and it seems that the csv file, when imported, had 62107 non-null values for the column 'Maximum of Linked Invoice', but this incorrect, when opening the csv_file and checking, it did have thousands of blanks in the rows, but why has it not been read as null values when imported? Have you seen anything like this before?
Please see the info below
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62108 entries, 0 to 62107
Data columns (total 35 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 External ID 62108 non-null object
1 Document Number 62107 non-null object
2 Transaction Number 62107 non-null object
3 Maximum of Linked Invoice 62107 non-null object
4 Type 62107 non-null object
5 Date 62107 non-null object
6 Period 62107 non-null object
7 Terms 62107 non-null object
8 Maximum of Due Date/Receive By 50885 non-null object
9 Company Name 62107 non-null object
10 Customer VAT Registration Number 62107 non-null object
11 Bill to City 62107 non-null object
12 Bill to State 62107 non-null object
13 Bill to Country 62107 non-null object
14 Bill to Zip 62107 non-null object
15 Source System 62107 non-null object
16 Source System Identifier 62107 non-null object
17 City 62107 non-null object
18 State/Province 62107 non-null object
19 Country 62107 non-null object
20 Zip 62107 non-null object
21 Currency 62107 non-null object
22 Memo (Main) 62107 non-null object
23 Maximum of GMAX Tax Code 24189 non-null object
24 Maximum of NetSuite Tax Item 59815 non-null object
25 Maximum of Coupa Tax Code 0 non-null float64
26 Maximum of External System Tax Code 0 non-null float64
27 Maximum of Tax Code (Consolidated) 59815 non-null object
28 FOP Type 62107 non-null object
29 Sum of Assets 60680 non-null float64
30 Sum of Accounts Payable 3741 non-null float64
31 Sum of Other Liabilities 57066 non-null float64
32 Sum of Income 60290 non-null float64
33 Sum of Expense 300 non-null float64
34 Sum of VAT 56269 non-null float64
dtypes: float64(8), object(27)
memory usage: 16.6+ MB
If anyone is reading this then I have found an answer. There is nothing wrong with my syntax, but the problem lies with the CSV file itself. The reason why pandas read the column 'Maximum of Linked Invoice' as 62107 non-null, is because there was a space embedded within each row in that column. The only thing I saw at first were blank rows, but this was inaccurate. So, I urge you to check the CSV file to avoid any time-consuming efforts to solve these types of tricky problems.
And this is the solution for code line 4:
(~vat['Maximum of Linked Invoice'].isin([np.nan, ' '])