Using NCBI E-utilities
Using Entrez from Biopython
Step 1: import Entrez
from Bio import Entrez
Step 2: enter your e-mail
The NCBI server may block anonymous requests, especially large ones, so always set a contact address.
Entrez.email = "[email protected]"
Step 3: call esearch to find IDs
handle = Entrez.esearch(db="value", term="keywords", retmax=100)
Parameters include:
parameter | examples |
---|---|
db | nucleotide, protein, pubmed |
term | human[Organism], hemoglobin, hemoglobin AND alpha |
retmax | 10 (maximum number of identifiers returned) |
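The term examples in the table can be combined into one query string. The following is a minimal sketch of one way to do that; `build_term` is a hypothetical helper written for this illustration and is not part of Biopython.

```python
def build_term(*keywords, **fields):
    """Join free-text keywords and field-tagged values with AND.

    Hypothetical helper for illustration; not part of Biopython.
    """
    parts = list(keywords)
    # A field-tagged value is written as value[Field], e.g. human[Organism].
    parts += [f"{value}[{field}]" for field, value in fields.items()]
    return " AND ".join(parts)


term = build_term("hemoglobin", "alpha", Organism="human")
print(term)  # hemoglobin AND alpha AND human[Organism]
```

The resulting string can be passed as the `term` argument of `Entrez.esearch`.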
Step 4: get a list of IDs out of esearch
records = Entrez.read(handle)
identifiers = records['IdList']
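To see what `records` looks like, it can help to examine the XML that esearch sends back. The sketch below parses a trimmed, made-up reply with the standard library only; the IDs are placeholders, and `Entrez.read` does this parsing for you in real code.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up esearch reply; the IDs are placeholders, not real records.
SAMPLE_ESEARCH_XML = """\
<eSearchResult>
  <Count>3</Count>
  <RetMax>3</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>1001</Id>
    <Id>1002</Id>
    <Id>1003</Id>
  </IdList>
</eSearchResult>
"""

root = ET.fromstring(SAMPLE_ESEARCH_XML)

# Counterpart of records['IdList']: a list of identifier strings.
identifiers = [id_element.text for id_element in root.findall("./IdList/Id")]

# Counterpart of records['Count']: total number of hits, which can exceed retmax.
total_hits = int(root.findtext("Count"))

print(identifiers)  # ['1001', '1002', '1003']
print(total_hits)   # 3
```

Note that the identifiers are strings, not integers.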
Step 5: use efetch to retrieve entries
We use the list of identifiers from step 4:
handle = Entrez.efetch(db="value", id=identifiers, retmax=200,
                       rettype="fasta", retmode="text")
To read data from text entries as a string:
text = handle.read()
To read records from XML entries (retrieved with retmode="xml"):
records = Entrez.read(handle)
In addition to the above, parameters include:
parameter | examples |
---|---|
id | a single ID or a list of IDs |
rettype | fasta, gb |
retmode | text, xml |
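When a search returns many identifiers, one option is to fetch them in smaller batches. The sketch below shows only the batching step; the helper `batches`, the batch size of 200 (borrowed from the retmax value in step 5), and the placeholder IDs are assumptions made for illustration.

```python
def batches(identifiers, size=200):
    """Yield consecutive slices of at most `size` identifiers."""
    for start in range(0, len(identifiers), size):
        yield identifiers[start:start + size]


# Placeholder IDs for illustration only.
example_ids = [str(number) for number in range(1, 451)]

for batch in batches(example_ids):
    # Comma-separated form, as used for the id parameter in E-utilities URLs.
    id_parameter = ",".join(batch)
    print(len(batch), id_parameter[:20])
    # With Biopython, each batch could be passed on like this:
    # handle = Entrez.efetch(db="value", id=batch, rettype="fasta", retmode="text")
```

With 450 placeholder IDs this produces three batches of 200, 200, and 50 identifiers.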
Documentation:
You can find a full list of available options at http://www.ncbi.nlm.nih.gov/books/NBK25500/
Example URLs
1. Searching for papers in PubMed
2. Retrieving publication records in Medline format
3. Searching for protein database entries by keywords
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=cancer+AND+human
4. Retrieving protein database entries in FASTA format
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=1234567&rettype=fasta
5. Retrieving protein database entries in GenBank format
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=1234567&rettype=gb
6. Retrieving nucleotide database entries
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=9790228&rettype=gb
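URLs like the ones above can also be assembled in Python with the standard library. This is a minimal sketch: `eutils_url` is a hypothetical helper, and the base address is simply copied from the example URLs in this section.

```python
from urllib.parse import urlencode

# Base address copied from the example URLs above.
BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool, **params):
    """Build an E-utilities URL for a tool such as 'esearch' or 'efetch'.

    Hypothetical helper for illustration; urlencode turns spaces into '+'.
    """
    return f"{BASE}{tool}.fcgi?{urlencode(params)}"


print(eutils_url("esearch", db="protein", term="cancer AND human"))
# http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=cancer+AND+human

print(eutils_url("efetch", db="protein", id=1234567, rettype="fasta"))
# http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=1234567&rettype=fasta
```

The two printed URLs match examples 3 and 4 above.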