The error “invalidindexerror: reindexing only valid with uniquely valued index objects” occurs when you try to contact two or more pandas dataframes like data = pd.concat([dataframe1,dataframe2], axis=1)
if you remove duplicate contents that conflict or additional columns, it will still throw an invalid-index-error.
Example 1: lets try to join below two data-frames and see the result we will get.
import pandas as pd
firstDF = pd.DataFrame({'col1': [1,2,3],
'col2': [4,5,6]
})
secondDF = pd.DataFrame({'col1': [7,8,9],
'col2': ['10','11','12'],
'col3': ['13','14','15']
})
# Concat and keep only cols from firstDF
thirdDF = pd.concat([firstDF, secondDF], ignore_index=True).reindex(firstDF.columns, axis='columns')
Below is the expected result
thirdDF
col1 col2
1 4
2 5
3 6
7 10
8 11
9 12
If we run the application this is the full stack-trace we get…
/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/index.py in get_indexer(self, target, method, limit, tolerance)
2435 index = index.intersection(other)
2436 return index
-> 2437 union = _union_indexes(indexes)
2438 return _ensure_index(union)
2439
2440
2441 if not self._index_as_unique:
-> 2442 raise InvalidIndexError(self._requires_unique_msg)
2443
2444 if not self._should_compare(target) and not is_interval_dtype(self.dtype):
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Solution: If the data that you’re working with is correct, then the reason must be related to duplicate column names. The solution to the raise error is to concat first the columns with the duplicate entry before joining it to other data frames…
firstDF = pd.DataFrame({'col1': [1,2,3],
'col2': [4,5,6]
}).rename(columns={'col2':'col1'})
print (firstDF)
col1 col1 <- col1 is duplicated
0 1 4
1 2 5
2 3 6
secondDF = pd.DataFrame({'col1': [7,8,9],
'col2': ['10','11','12'],
'col3': ['13','14','15']
})
# Concat and keep only columns from firstDF
thirdDF = pd.concat([firstDF, secondDF], ignore_index=True).reindex(firstDF.columns, axis='columns')
print (thirdDF)
To see why the InvalidIndexError: Reindexing only valid with uniquely valued Index objects error was thrown, print them the following…
print (df1.columns[firstDF.columns.duplicated(keep=False)])
Index(['col1', 'col1'], dtype='object')
print (secondDF.columns[secondDF.columns.duplicated(keep=False)])
Index([], dtype='object')
Alternative fix is to
deduplicate it…
print (pd.io.parsers.ParserBase({'names': firstDF.columns})._maybe_dedup_names(firstDF.columns))
['col1', 'col1.1']
Example 2: Suppose we still have below 100+ dataframes with each having a unique index and some with empty columns:
import pandas as pd
df1 = pd.DataFrame({'a':[3,7,2,0], 'b':[], 'c':[7,3,3,6]}, index=['a1','a2','a3','a4'])
df2 = pd.DataFrame({'a':[9,7,3,1], 'b':[5,67,39,92], 'c':[42,27,56,66]}, index=['b1','b2','b3','b4'])
#I have ignored the rest of the dataframes with empty columns
# let concate/join the two as
df = pd.concat([df1,df2])
If we run the above python application this is the error we get…
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Solution: If the column ‘b’ in the ‘df1’ above isn’t empty, then the above application would have compiled successfully without any error. But because there is some dataframes with empty columns, we get an error and the solution is to reset or drop the empty columns first before calling the concat() method with df.reset_index(inplace=True, drop=True).