Kristjan Kannike

ArXiv Search in Python

The arXiv API gives access to arXiv, the e-print archive of scientific articles hosted at Cornell. A query is specified by keywords, a comma-separated list of article IDs, a start position, and the number of articles to retrieve (10 by default); the answer is returned in Atom format.
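For orientation, here is a minimal sketch of how such a query maps onto a single HTTP request. It assumes the public endpoint http://export.arxiv.org/api/query and the parameter names search_query, id_list, start and max_results from the arXiv API user manual; build_query_url is a hypothetical helper for illustration, not part of search_arxiv, and the script may build its URLs differently.

```python
from urllib.parse import urlencode

# Query endpoint given in the arXiv API user manual.
API_URL = "http://export.arxiv.org/api/query"

def build_query_url(search_query="", id_list="", start=0, max_results=10):
    """Return the URL for one arXiv API request.

    Hypothetical helper for illustration; not part of search_arxiv.
    id_list is a comma-separated string of article IDs, as in the API.
    """
    params = {
        "search_query": search_query,
        "id_list": id_list,
        "start": start,
        "max_results": max_results,
    }
    # urlencode percent-escapes characters such as ':' and ','.
    return API_URL + "?" + urlencode(params)

print(build_query_url("all:unparticle", max_results=100))
# → http://export.arxiv.org/api/query?search_query=all%3Aunparticle&id_list=&start=0&max_results=100
```

The resulting URL can be opened with urllib.request.urlopen to obtain the Atom text of the response.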

After copying the script to a directory on our Python path, let us import it with from search_arxiv import *.

For a relatively small number of articles on a given topic, e.g. unparticles, a single call such as single_query(search_query='all:unparticle', id_list='', start=0, max_res=100) suffices. (The defaults are search_query='', id_list='', start=0, max_res=10.)
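The all: prefix in 'all:unparticle' searches every field. The arXiv API user manual also lists field prefixes such as ti (title), au (author), abs (abstract) and cat (subject category), which can be joined with the Boolean operators AND, OR and ANDNOT. The helper below, combine_terms, is hypothetical and not part of search_arxiv; it only sketches how such a search_query string could be assembled before being passed to single_query.

```python
# Field prefixes listed in the arXiv API user manual.
FIELDS = {
    "ti",   # title
    "au",   # author
    "abs",  # abstract
    "co",   # comment
    "jr",   # journal reference
    "cat",  # subject category
    "rn",   # report number
    "all",  # all of the above
}
OPERATORS = {"AND", "OR", "ANDNOT"}

def combine_terms(terms, operator="AND"):
    """Join (field, value) pairs into one search_query string.

    Hypothetical helper for illustration; not part of search_arxiv.
    """
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    parts = []
    for field, value in terms:
        if field not in FIELDS:
            raise ValueError(f"unknown field prefix: {field}")
        # Wrap multi-word values in double quotes so they are searched as a phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    return f" {operator} ".join(parts)

print(combine_terms([("all", "unparticle")]))
# → all:unparticle
print(combine_terms([("ti", "unparticle"), ("cat", "hep-ph")]))
# → ti:unparticle AND cat:hep-ph
```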

For a larger number of articles, saved in separate Atom files, use arxiv_query(search_query='', id_list='', start=0, total_res=None, max_res=100), shown here with its default arguments. With total_res=None, all articles matching the query are retrieved in chunks of max_res. (The wait time between chunks is 3 seconds, as recommended in the arXiv API user manual.)
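The chunked retrieval can be pictured as follows. This is a sketch, not the code of arxiv_query: chunk_bounds and fetch_in_chunks are hypothetical names, the fetch argument stands for any function that performs one API request, and total_res is taken as known. With total_res=None, the total would first have to be read from the opensearch:totalResults element of an initial response.

```python
import time

def chunk_bounds(start, total_res, max_res):
    """Yield (chunk_start, chunk_size) pairs covering total_res results."""
    end = start + total_res
    pos = start
    while pos < end:
        size = min(max_res, end - pos)
        yield pos, size
        pos += size

def fetch_in_chunks(fetch, search_query, start, total_res, max_res=100, delay=3.0):
    """Call fetch(search_query, chunk_start, chunk_size) once per chunk.

    Sleeps `delay` seconds between consecutive requests, not before the first one.
    """
    results = []
    for i, (chunk_start, size) in enumerate(chunk_bounds(start, total_res, max_res)):
        if i > 0:
            time.sleep(delay)
        results.append(fetch(search_query, chunk_start, size))
    return results

print(list(chunk_bounds(0, 250, 100)))
# → [(0, 100), (100, 100), (200, 50)]
```

Passing delay=0 only makes sense for offline testing with a fake fetch function; real requests should keep the 3-second pause.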

The results can be conveniently parsed with Feedparser (the Python feedparser package).