Combine two files if the values from the first column of each file are the same

I want to combine two files if the values from the first column of each file are the same. In case that they are not the same, add values NaN. The output should have the number of lines from the file that has the maximum number of lines from both files - so, in the case that the first file has 200 line and the second files has 180 lines then the output should have 200 lines, but also if the first file has 180 lines and the second files has 200 lines, the output should still have 200 lines. Because I don't know which of the files has more lines compared to the other file, the script should verify which of the files has more line for a specific value from column 1 - for example if file 1 has 13 entries regarding value 140 and in the second file I have 12 entries regarding value 140 then the output should have a total of 13 entries in which 12 entries contains values and one entry is with NaN. This should work also if the file two has 13 entries and file one has only 12 entries.
In the case that one of the files is empty, the output should contain the elements from the file that have data and NaN for the other 2 columns from the empty file. In the case that the first column does not match, it should contain the values from the file that have data and NaN for the other 2 column from the file that don't match. The data from the two input files should be sorted numerically based on the first column.

For example, if in the first file I have:

And on the second files I have:

I want the results to be:

140 12.49   140 12.70
140 12.33   140 12.64
140 10.62   140 10.52
140 8.57    140 8.61
140 10.09   140 10.11
140 9.82    140 9.88
140 11.48   140 11.48
140 13.07   140 13.12
140 14.09   140 14.16
140 15.02   140 15.07
140 14.20   140 14.25
140 15.05   140 15.16
140 16.15   140 16.25
141 14.91   141 14.84
141 13.62   141 NaN
141 13.09   141 NaN
141 11.54   141 NaN
141 9.86    141 NaN
141 8.37    141 NaN
141 7.95    141 NaN
141 7.55    141 NaN
141 7.86    141 NaN
141 7.92    141 NaN
141 9.99    141 NaN
141 10.82   141 NaN
141 9.83    141 NaN
142 9.31    142 9.54
142 9.36    142 8.74
142 8.77    142 9.68
142 9.52    142 10.00
142 9.92    142 9.64
142 9.56    142 10.64
142 10.50   142 12.69
142 12.53   142 12.15
142 12.05   142 11.51
142 11.30   142 11.69
142 11.52   142 12.77
142 12.61   142 15.25
142 15.05   142 NaN

In the case that the file 2 had the entries from files 1 that the output should be similar but the column 2 from the output should have NaN - for example:

140 15.16 140 15.05 140 16.25 140 16.15
141 14.84 141 14.91
141 NaN 141 13.62
141 NaN 141 13.09
..................... 142 15.25 142 12.61
142 NaN 142 15.05

awk 'NR==FNR{a[$1]=$2; next} {print $1, ($1 in a) ? a[$1] : "NaN", $1, $2}' file1.txt file2.txt

Thank you kindly for all the recommendations.

Solution

I would approach this as a merge join operation:

both files must be sorted by the join column (in this case both files are sorted by the first column)
read one row from each file
compare the 1st columns
if 1st columns match then print both rows to stdout and go to step 2
if 1st column from file1 is lower then print file1's row + file1's 1st column + "NaN", read a new row from file1, go to step 3
if 1st column from file2 is lower then print file2's row + file2's 1st column + "NaN", read a new row from file2, go to step 3
continue until there are no more rows to read from both files

There are a few ways to do this in awk

read both files into memory (aka arrays) then process the arrays (max memory usage)
read first file into memory and feed 2nd file to awk (half the max memory)
read first file into memory and use getline to read the 2nd file (half the max memory usage)
use getline to process both files (minimal memory usage)

One awk idea using getline to process both files:

$ cat merge_join.awk
BEGIN {
delete col1                                                              # occasionally I run into cases where awk
delete col2                                                              # confuses arrays for scalars; the "delete"
delete rc                                                                # serves to type these variables as arrays
while (1) {
   if (! (1 in col1) )       { rc[1]=getline < f1                        # if no entries in array for index "1" then read new line from file1 and ...
                               col1[1]=$1; col2[1]=$2                    # store our columns in our arrays
                             }
   if (! (2 in col1) )       { rc[2]=getline < f2                        # if no entries in array for index "2" then read new line from file2 and ...
                               col1[2]=$1; col2[2]=$2                    # store our columns in our arrays
                             }

   if ( ! rc[1] && ! rc[2] ) break                                       # if no more rows in both files then break out of the while loop
   if ( ! rc[1]            ) { col1[1]=col1[2]; col2[1]="NaN" }          # if no more rows in file1 then default the col1[1] and col2[1] entries
   if (            ! rc[2] ) { col1[2]=col1[1]; col2[2]="NaN" }          # if no more rows in file2 then default the col1[2] and col2[2] entries

   if ( col1[1] == col1[2] ) { print col1[1], col2[1], col1[2], col2[2]
                               delete col1; delete col2                  # delete both col1[] and col2[] entries
                             }
   else
   if ( col1[1] <  col1[2] ) { print col1[1], col2[1], col1[1],   "NaN"
                               delete col1[1]                            # delete just the f1/file1 entry
                             }
   else                      { print col1[2], "NaN"  , col1[2], col2[2]
                               delete col1[2]                            # delete just the f2/file2 entry
                             }
  }  # while
}    # BEGIN

NOTES: as currently coded the script will continue to run getline < fN even after all rows have been read from fN, though the return code (stored in rc[N] will flag the operation as a failure); I've opted to continue operations based on the return codes (rc[N]) instead of halting the associated getline < fN

Test data:

$ cat file1
130 130.130
130 131.132
140 140.140
140 141.141
140 142.142

$ cat file2
140 149.149
140 148.148
150 150.159
150 150.158

Running the script against the two test files:

$ awk -f merge_join.awk -v f1=file1 -v f2=file2
130 130.130 130 NaN
130 131.132 130 NaN
140 140.140 140 149.149
140 141.141 140 148.148
140 142.142 140 NaN
150 NaN 150 150.159
150 NaN 150 150.158