Graph-tool - Reading Edge Lists From Pandas Dataframe
Solution 1:
That's really odd behavior. I've never used graph-tool (I always use networkx), so I can't reproduce it right now, but this might help.
According to the docs, edge_list can be an iterator, which means you could try using a generator expression over df.values.tolist() and passing that as edge_list. I don't know if it will speed things up on your graph (~4*10^6 nodes).
It'd look like this:
g.add_edge_list((item for item in df.values.tolist()))
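One caveat, as a hedged sketch: in a generator expression the outermost iterable is evaluated immediately, so (item for item in df.values.tolist()) still builds the full list before the first edge is yielded. If the goal is to avoid materializing all rows up front, pandas' DataFrame.itertuples(index=False, name=None) yields plain tuples one row at a time. The column names node1/node2 and the six edges below are taken from the CSV in Solution 2; whether graph-tool consumes this iterator any faster on a graph of this size is untested here.

```python
import pandas as pd

# Small example edge list (same data as the CSV in Solution 2)
df = pd.DataFrame({"node1": [1, 2, 1, 3, 4, 1],
                   "node2": [2, 3, 4, 1, 3, 5]})

# itertuples(index=False, name=None) returns a lazy iterator of plain
# (node1, node2) tuples; no list of all rows is built first
edges = df.itertuples(index=False, name=None)

# With graph-tool this iterator would be passed as: g.add_edge_list(edges)
first = next(edges)
```

Because the iterator is consumed as it goes, it can only be passed to add_edge_list once; create a new one if the edges need to be read again.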
Example of the size difference:
import sys
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(1000, 2))  # example "large" dataframe
print(sys.getsizeof(df.values.tolist()))  # list
print(sys.getsizeof((item for item in df.values.tolist())))  # generator
Output:
8072  # type list
80  # type generator
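A caveat on these numbers, sketched below: sys.getsizeof reports only the memory of the object itself, not of the objects it refers to. For the list that is the header plus an array of pointers, not the 1000 inner lists and 2000 floats; for the generator it is a small object whose reported size does not depend on how much data it iterates over. The sketch compares a shallow and a one-level-deeper measurement. Exact byte counts depend on the Python version and platform, so none are claimed here.

```python
import sys

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1000, 2))
rows = df.values.tolist()

# Shallow size: just the outer list object (header and pointer array)
shallow = sys.getsizeof(rows)

# Deeper size: also count each inner list and the floats it holds
deep = shallow + sum(
    sys.getsizeof(row) + sum(sys.getsizeof(x) for x in row) for row in rows
)

# A generator's reported size does not grow with the data behind it
tiny_gen = (item for item in [[0.0, 0.0]])
large_gen = (item for item in rows)

print(shallow, deep)
print(sys.getsizeof(tiny_gen), sys.getsizeof(large_gen))
```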
Just an idea
Solution 2:
I can't reproduce this. If I load the DataFrame from this CSV file:
node1,node2
1,2
2,3
1,4
3,1
4,3
1,5
I get your second figure after calling g.add_edge_list(df.values).
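For reference, a minimal sketch of the setup described above, with the CSV inlined through io.StringIO instead of read from disk (the original file name isn't given). Read row by row, each row of df.values is one (source, target) edge, which is the edge list the second figure should show. graph-tool is left out so the snippet runs with pandas alone; the graph-tool call appears only as a comment.

```python
import io

import pandas as pd

csv_text = """node1,node2
1,2
2,3
1,4
3,1
4,3
1,5
"""

df = pd.read_csv(io.StringIO(csv_text))
edge_array = df.values  # shape (6, 2): one row per edge

# Iterating a 2-D array yields its rows; each row is one edge
edges = [tuple(int(v) for v in row) for row in edge_array]
print(edges)  # [(1, 2), (2, 3), (1, 4), (3, 1), (4, 3), (1, 5)]

# With graph-tool: g.add_edge_list(df.values)
```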
Solution 3:
This is old, but I noticed that the first graph is what you would get if you read off pairs of vertices from the DataFrame in column-major order. I imagine this is the source of the strange behavior.
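The column-major hypothesis can be checked with numpy alone. The sketch regroups the same six edges after flattening them in C (row-major) and in Fortran (column-major) order, and the two readings give different edge lists. One plausible mechanism, which is an assumption and not confirmed for graph-tool, is code that walks an array's raw memory buffer without honoring its strides. A DataFrame's .values can be a Fortran-ordered (column-major) array, so passing np.ascontiguousarray(df.values) instead may be worth trying.

```python
import numpy as np

# The six edges from the CSV in Solution 2, one row per edge
rows = np.array([[1, 2], [2, 3], [1, 4], [3, 1], [4, 3], [1, 5]])

# Row-major (C order): consecutive values form (source, target) pairs
row_major_pairs = rows.ravel(order="C").reshape(-1, 2)

# Column-major (Fortran order): all of node1, then all of node2, regrouped
col_major_pairs = rows.ravel(order="F").reshape(-1, 2)

print(row_major_pairs.tolist())
print(col_major_pairs.tolist())

# Same logical rows, but stored column by column in memory
fortran_rows = np.asfortranarray(rows)

# Reading the raw buffer while ignoring strides reproduces the column-major pairs
raw_pairs = np.frombuffer(
    fortran_rows.tobytes(order="A"), dtype=fortran_rows.dtype
).reshape(-1, 2)

# Hypothetical workaround: force a C-ordered copy before handing the array on
c_rows = np.ascontiguousarray(fortran_rows)
```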