
Insert Multiple Elements Into Pandas Series Where Similarities Exist

Here I'd like to insert the row 'None' between any two consecutive rows that have 'href' in the tag. Note that the rows with href are NOT identical to each other.

Solution 1:

That can be easily achieved if you drop down to NumPy.

In your example:

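import numpy as np

# True where a row and the row directly above it both contain 'href'
# (na=False keeps the NaN introduced by shift from leaking into the mask)
dups = table.str.contains('href') & table.shift(1).str.contains('href', na=False)

# np.insert places the new row before each flagged position
array = np.insert(table.values, dups[dups].index, "<td class='test'>None</td>")

pd.Series(array)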
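To see the idea run end to end, here is a minimal sketch on a small made-up Series. The `rows` data and the `insert_between_href` helper name are illustrative assumptions, not part of the question. It uses positional offsets from `np.flatnonzero` rather than index labels, so it should not depend on the Series having a default 0..n-1 index.

```python
import numpy as np
import pandas as pd


def insert_between_href(table, filler="<td class='test'>None</td>"):
    """Insert `filler` between consecutive rows that both contain 'href'."""
    has_href = table.str.contains("href", na=False, regex=False)
    # True at the second row of each consecutive 'href' pair.
    dups = has_href & has_href.shift(1, fill_value=False)
    # np.insert puts the filler *before* each flagged original position.
    positions = np.flatnonzero(dups.to_numpy())
    return pd.Series(np.insert(table.to_numpy(dtype=object), positions, filler))


# Hypothetical sample data, shortened for illustration.
rows = pd.Series([
    "<a href='x'>",   # 0
    "<td>A</td>",     # 1
    "<a href='y'>",   # 2
    "<a href='z'>",   # 3  <- filler goes before this row
    "<td>B</td>",     # 4
])
result = insert_between_href(rows)
print(result.tolist())
```

Note that the positions passed to `np.insert` refer to the original array, so several insertions can be requested in one call without adjusting the offsets by hand.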

Solution 2:

Ecotrazar's solution above is both faster and more elegant. Here is my version, which builds the flags with a for loop and then uses his NumPy insert method.

import pandas as pd

table = pd.Series(
        ["<td class='test'><a class='test' href=...",  # 0
         "<td class='test'>A</td>",                    # 1
         "<td class='test'><a class='test' href=...",  # 2
         "<td class='test'>B</td>",                    # 3
         "<td class='test'><a class='test' href=...",  # 4
         "<td class='test'><a class='test' href=...",  # 5
         "<td class='test'>C</td>",                    # 6
         "<td class='test'><a class='test' href=...",  # 7
         "<td class='test'>F</td>",                    # 8
         "<td class='test'><a class='test' href=...",  # 9
         "<td class='test'><a class='test' href=...",  # 10
         "<td class='test'>X</td>"])                   # 11

# insertAt[k] is True when a 'None' row should go before row k.
# Nothing can be inserted before the first row, so start with False.
insertAt = [False]
for i in range(len(table) - 1):  # stop one short so table[i + 1] always exists
  if 'href' in table[i] and 'href' in table[i + 1]:
    print(i + 1, ' is duplicated')
    insertAt.append(True)
  else:
    insertAt.append(False)

insertAt = pd.Series(insertAt)
print(insertAt)

import numpy as np
array = np.insert(table.values, insertAt[insertAt].index, "<td class='test'>None</td>")
pd.Series(array) # back to series if necessary
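
As a sanity check, the loop-based flags can be compared with the vectorized flags from Solution 1. The sketch below uses a shortened, made-up `sample` Series (an assumption for illustration) that includes a run of three 'href' rows, to show that a filler row lands between each adjacent pair.

```python
import numpy as np
import pandas as pd

FILLER = "<td class='test'>None</td>"

# Hypothetical sample data with three 'href' rows in a row.
sample = pd.Series([
    "<a href='1'>",  # 0
    "<a href='2'>",  # 1  second row of pair (0, 1)
    "<a href='3'>",  # 2  second row of pair (1, 2)
    "<td>A</td>",    # 3
])

# Loop version: flag position i + 1 when rows i and i + 1 both contain 'href'.
loop_flags = [False]
for i in range(len(sample) - 1):
    loop_flags.append("href" in sample[i] and "href" in sample[i + 1])

# Vectorized version of the same test.
has_href = sample.str.contains("href", na=False, regex=False)
vec_flags = (has_href & has_href.shift(1, fill_value=False)).tolist()

print(loop_flags == vec_flags)

# Insert the filler before every flagged position of the original array.
out = pd.Series(
    np.insert(sample.to_numpy(dtype=object), np.flatnonzero(loop_flags), FILLER)
)
print(out.tolist())
```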
