Using Search Terms With Biopython To Return Accession Numbers
Solution 1:
Looking through the docs for esearch on NCBI's website, there are only two rettypes available - uilist, which is the default XML format that you're getting currently (it's parsed into a dict by Entrez.read()), and count, which just displays the Count value (look at the complete contents of result, it's there). I'm unclear on the exact meaning of Count, as it doesn't represent the total number of items in IdList...
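For what it's worth, my reading of the E-utilities docs is that Count is the total number of records matching the query, while IdList only holds up to retmax of them (20 by default), which would explain the mismatch. Below is a minimal sketch of paging through all hits under that assumption. collect_all_ids and search_page are hypothetical names of mine, not part of Biopython, and the real Entrez call is left as a commented usage example because it needs network access.

```python
def collect_all_ids(search_page, page_size=100):
    """Gather every UID for a query by paging with retstart/retmax.

    search_page(retstart, retmax) is a hypothetical callable that must return
    a mapping with "Count" and "IdList" keys, like a parsed esearch result.
    """
    ids = []
    retstart = 0
    while True:
        page = search_page(retstart, page_size)
        total = int(page["Count"])  # Entrez.read() gives Count as a string
        ids.extend(page["IdList"])
        retstart += page_size
        # Stop once we have walked past the total or got an empty page back
        if retstart >= total or not page["IdList"]:
            break
    return ids


# Usage with Biopython (assumed setup, needs network access):
#
# from Bio import Entrez
# Entrez.email = "you@example.com"  # placeholder address, use your own
#
# def search_page(retstart, retmax):
#     handle = Entrez.esearch(db="nuccore", term=search_phrase,
#                             retstart=retstart, retmax=retmax)
#     try:
#         return Entrez.read(handle)
#     finally:
#         handle.close()
#
# all_ids = collect_all_ids(search_page)
```

Passing the search function in as an argument keeps the paging logic separate from the network call, so it can be tried out with a stub before hitting NCBI.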
At any rate, Entrez.esearch() will take any value of rettype and retmode you like, but it only returns the uilist or count in xml or json mode - no accession IDs, no nothin'.
Entrez.efetch() will pass you back all sorts of cool stuff, depending on which DB you're querying. The downside, of course, is that you need to query by one or more IDs, not by a search string, so in order to get your accession IDs you'd need to run two queries:
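from Bio import Entrez
# NCBI asks for a contact address with E-utilities requests
Entrez.email = "you@example.com"  # placeholder, use your own address
# Query 1: search nuccore and collect the matching UIDs
search_phrase = "(Escherichia coli[organism]) AND (complete genome[keyword])"
handle = Entrez.esearch(db="nuccore", term=search_phrase, retmax=100)
result = Entrez.read(handle)
handle.close()
# Query 2: fetch the accession for each UID, one per line
fetch_handle = Entrez.efetch(db="nuccore", id=result["IdList"], rettype="acc", retmode="text")
acc_ids = [line.strip() for line in fetch_handle]
fetch_handle.close()
print(acc_ids)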
gives
['HF572917.2', 'NZ_HF572917.1', 'NC_010558.1', 'NZ_HG941720.1', 'NZ_HG941719.1', 'NZ_HG941718.1', 'NC_017633.1', 'NC_022371.1', 'NC_022370.1', 'NC_011601.1', 'NZ_HG738867.1', 'NC_012892.2', 'NC_017626.1', 'HG941719.1', 'HG941718.1', 'HG941720.1', 'HG738867.1', 'AM946981.2', 'FN649414.1', 'FN554766.1', 'FM180568.1', 'HG428756.1', 'HG428755.1', 'M37402.1', 'AJ304858.2', 'FM206294.1', 'FM206293.1', 'AM886293.1']
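If the search returns a lot of IDs, it may be worth sending them to efetch in batches rather than in one request. Here is a rough sketch, assuming rettype="acc" keeps returning one accession per line as in the output above. chunked, fetch_accessions, fetch_text and entrez_fetch_text are illustrative names of mine, and the batch size of 200 is an arbitrary choice, not an NCBI requirement.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_accessions(ids, fetch_text, batch_size=200):
    """Turn UIDs into accessions, one efetch-style request per batch.

    fetch_text(batch) is a hypothetical callable that returns the raw
    response for a list of UIDs, with one accession per line.
    """
    accessions = []
    for batch in chunked(ids, batch_size):
        text = fetch_text(batch)
        if isinstance(text, bytes):  # tolerate handles that return bytes
            text = text.decode("utf-8")
        # Skip blank lines so a trailing newline does not add an empty entry
        accessions.extend(line.strip() for line in text.splitlines() if line.strip())
    return accessions


def entrez_fetch_text(batch):
    """Real fetcher built on Biopython; needs network access when called."""
    from Bio import Entrez  # imported here so the helpers above work without it

    handle = Entrez.efetch(db="nuccore", id=",".join(batch),
                           rettype="acc", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

With the result from the search above, this would be called as fetch_accessions(result["IdList"], entrez_fetch_text).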
So, I'm not terribly sure if I answered your question satisfactorily, but unfortunately I think the answer is "There is no magic."