Graph-tool - Reading Edge Lists From Pandas Dataframe

I'm starting to work with graph-tool, importing a list of edges from a pandas DataFrame df like: node1 node2 0 1 2 1 2 3 2 1 4 3 3 1 4

Solution 1:

That's really odd behavior. I've never used graph-tool (always networkx), so I can't reproduce it right now, but this might help.

According to the docs, edge_list can be an iterator. That means you could try building a generator expression over df.values.tolist() and passing that as edge_list. I don't know whether it will speed things up on your graph (~4*10^6 nodes).

It'd look like this:

g.add_edge_list((item for item in df.values.tolist()))

Example of size difference

import sys

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1000, 2))  # example "large" dataframe
print(sys.getsizeof(df.values.tolist()))
print(sys.getsizeof((item for item in df.values.tolist())))

Output (exact sizes vary with the Python version and platform):

8072  # type list
80  # type generator

Just an idea
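
One caveat on the size comparison above: sys.getsizeof on the generator measures only the generator object itself, and df.values.tolist() still builds the full list before the generator yields anything. Here is a minimal sketch of a lazier variant, assuming pandas' DataFrame.itertuples and a small illustrative frame that mirrors the CSV in Solution 2. The graph-tool call is left as a comment because I have not timed or tested it.

```python
import pandas as pd

# Illustrative data only; the column names follow the question.
df = pd.DataFrame({"node1": [1, 2, 1, 3, 4, 1],
                   "node2": [2, 3, 4, 1, 3, 5]})

# itertuples(index=False, name=None) yields one plain (node1, node2) tuple
# per row on demand, without first materialising a list of all rows.
edge_iter = df.itertuples(index=False, name=None)

first_edge = next(edge_iter)
remaining = list(edge_iter)  # consumed here only to show the contents
print(first_edge, remaining)

# With graph-tool installed, a fresh iterator could be passed directly
# (untested assumption, based on edge_list accepting an iterator):
# g.add_edge_list(df.itertuples(index=False, name=None))
```

Whether this is actually faster than passing the array is something you would have to measure on your own data.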

Solution 2:

I can't reproduce this. If I load the data frame from the csv file:

  node1,node2
  1,2
  2,3
  1,4
  3,1
  4,3
  1,5

I get your second figure after calling g.add_edge_list(df.values).
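
For reference, a rough sketch of that check, assuming the CSV content is held in a string rather than a file. The csv_text and edges names are only illustrative, and the graph-tool part runs only if the library is installed.

```python
import io

import pandas as pd

# The same CSV content as above, held in a string for the example.
csv_text = """node1,node2
1,2
2,3
1,4
3,1
4,3
1,5
"""

df = pd.read_csv(io.StringIO(csv_text))
edges = df[["node1", "node2"]].to_numpy()  # shape (6, 2), one edge per row
print(edges)

try:
    from graph_tool import Graph
except ImportError:
    Graph = None  # graph-tool is not installed; skip the graph part

if Graph is not None:
    g = Graph(directed=True)
    g.add_edge_list(edges)  # same call as g.add_edge_list(df.values)
    print(g.num_vertices(), g.num_edges())
```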

Solution 3:

This is an old question, but I noticed that the first graph is what you would get if you read off pairs of vertices from the dataframe in column-major order. I imagine this is the source of the strange behavior.
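
To illustrate what column-major reading does to an edge list, here is a small NumPy-only sketch using the rows from the CSV in Solution 2. It only shows the effect of the two read orders; it does not confirm that this is what happened inside graph-tool.

```python
import numpy as np

# Edge list from the CSV in Solution 2, one (source, target) pair per row.
edges = np.array([[1, 2], [2, 3], [1, 4], [3, 1], [4, 3], [1, 5]])

# Row-major (C order): values are read across each row, so the pairs
# come out exactly as written.
row_major_pairs = edges.ravel(order="C").reshape(-1, 2)

# Column-major (Fortran order): all of node1 is read first, then all of
# node2, and consecutive values are paired up.
col_major_pairs = edges.ravel(order="F").reshape(-1, 2)

print(row_major_pairs.tolist())
print(col_major_pairs.tolist())
```

If the memory layout really is the culprit, passing a C-contiguous copy such as np.ascontiguousarray(df.values) might be worth trying, though I have not verified that against graph-tool.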
