make_bib is a package that tries to make this task as automatic as possible by taking advantage of existing databases and character-string manipulation software. Given a list of people's names and a range of years, it produces an HTML bibliographic list for these people, using data available from the NASA ADS abstract service. Why HTML? Because HTML is easy to generate; because once it is posted, everyone in your lab can check their references (see the author breakdown option); because Microsoft Word can import HTML, and secretaries use Microsoft Word rather than LaTeX; and because HTML-to-LaTeX converters should exist (if you are interested, you can find a variety of such converters at the World Wide Web Consortium (W3C) repository).
make_bib would not have seen the light of day if Gary Mamon (IAP) had not decided to write code that interacts with ADS to get citation information or to download papers automatically. If you are interested in these features, see his software page.
If you meet these requirements, you can download make_bib.
Once this is done, bib2html lets you convert this structure into HTML pages with references sorted by author or by year, and by type (refereed journals, refereed conferences, conferences, non-refereed journals and books). You can then view the file with any HTML 4.0 compliant browser.
If you have chosen to write the HTML files as forms, then once their owners have validated them, you will receive coded update information by email, and you can process it with bib_update to update your bibliographic structure.
These three operations are IDL routines, so you have to run them from an IDL session; this is what I assume in the following examples.
For all routines (including the shell scripts), calling them with no arguments prints the on-line help.
Since we are making a reference list, we will have many of these cards. This is no problem, as IDL allows you to create arrays of structures (i.e. collections of structures), which can be considered as the filing cabinet for our reference list.
To populate this cabinet, we first use the function make_bclist. This first step only gives us bibcodes: vital information, but not really what you are looking for; we will see later how to get the rest of each reference. If you call it without arguments, this is what you get:
IDL> result = make_bclist()
correct call is:
bib_str = make_bclist([name=name,personnel=personnel, [,year=year,/since,upto=upto,/verb,/debug loserver=loserver,timeout=timeout]
where
  bib_str    will be a structure where the bibcodes and related data are be stored
  name       to make the bibliography of 1 person
             this should be a string of the form: surname initial (e.g. sauvage m) if year is set
             or surname initial year where year is the starting date for that person's bibliography
  personnel  can be used to provide the file with the personnel names.
             note that either name or personnel has to be set
  year       to build only one year's bibliography
             year is an integer
  upto       can be used to build the bibliography between year and upto
  /since     indicates that the bibliography should be made from the date given in year up to now
  /verb      prints information for debugging
  loserver   is a string array that contains the choice of servers to use in order to avoid being banned from ADS.
             choose between cl, eso, fr, in, jp, kr and us
             default is all
  timeout    in seconds is a period when we wait before making the call to ADS, again not to be banned
             default is 30s
Thus, if you want to get the bibcodes of your complete bibliography (as known by ADS), and assuming that your surname is einstein and your initial is b, type:
IDL> my_bibcodes = make_bclist(name='einstein b')
Adding your initial to your name is not required; it simply avoids retrieving papers by people with the same surname as yours, which you would then have to delete.
Sometimes you just want your bibliography for a given year. This is easy: for instance, if the year is 1990, type:
IDL> my_bibcodes90 = make_bclist(name='einstein b',year=1990)
If what you want is your bibliography since 1990, just add the keyword /since to the previous command line:
IDL> my_bibcodes90_on = make_bclist(name='einstein b',year=1990,/since)
If what you want is your bibliography between 1990 and 2000, use the keyword upto:
IDL> my_bibcodes90_on = make_bclist(name='einstein b',year=1990,upto=2000)
That works well for one person, but if you want to do it for a list of people, you will have to use another method (you could call make_bclist several times, but that is not very elegant). The simplest way is to create a personnel list, i.e. a file containing the names of the people in your group. Since you are editing a file anyway, you can take the opportunity to include other information, such as the fact that not all your collaborators joined your group in the same year, and not all of them are still in it. Thus for each person you can define an entrance date and an exit date, which are used to select the papers they published while in your group. You should then create a text file whose lines look like this:
newton b i1989 o2000
Here too the initial is optional, and the formatting is free (i.e. the number of spaces does not matter). The only mandatory element is the entrance date, which must be present and flagged with the letter i. The exit date is optional, but when present it must be flagged with the letter o. You can insert comments in this file (e.g. to identify the groups of people who are post-docs, PhD students...) by placing the character ; at the beginning of the line. You can also restrict the databases searched for some authors. Imagine that one of your team members has a very common name, e.g. Martin, and you know he has not published in instrumentation journals. It then makes sense to exclude instrumentation journals from the search, as they are bound to generate a large number of false references. You can do this by adding -INST after his entrance date (or exit date, if present), as in the following example:
martin b i1989 o2000 -INST
Similarly, the astrophysics database is excluded with -AST and the physics/geophysics one with -PHY.
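To make the personnel-file rules concrete, here is a minimal Python sketch of how one such line could be parsed. It is not make_bib's code (make_bib is written in IDL), and the names Person and parse_personnel_line are hypothetical. It assumes four-digit years and the three exclusion flags described above, and it applies the rules from the text: free spacing, a mandatory i date, an optional o date, and ; comment lines.

```python
import re
from typing import NamedTuple, Optional


# Hypothetical representation of one personnel entry (not make_bib's structure).
class Person(NamedTuple):
    name: str                  # surname, optionally followed by an initial
    entrance: int              # year flagged with 'i' (mandatory)
    exit_year: Optional[int]   # year flagged with 'o' (optional)
    excluded_dbs: frozenset    # databases removed from the search, e.g. {"INST"}


# Database-exclusion flags described in the text.
EXCLUSION_FLAGS = ("-AST", "-PHY", "-INST")


def parse_personnel_line(line: str) -> Optional[Person]:
    """Parse one personnel-file line; return None for comments and blank lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith(";"):
        return None
    name_parts, entrance, exit_year, excluded = [], None, None, set()
    for token in stripped.split():          # free formatting: any run of whitespace
        if re.fullmatch(r"i\d{4}", token):  # assumption: four-digit years
            entrance = int(token[1:])
        elif re.fullmatch(r"o\d{4}", token):
            exit_year = int(token[1:])
        elif token in EXCLUSION_FLAGS:
            excluded.add(token[1:])         # "-INST" -> "INST"
        else:
            name_parts.append(token)
    if entrance is None:
        raise ValueError(f"missing entrance date (i<year>) in {line!r}")
    return Person(" ".join(name_parts), entrance, exit_year, frozenset(excluded))
```

For instance, the line "martin b i1989 o2000 -INST" becomes a Person with name "martin b", entrance 1989, exit year 2000, and INST excluded.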
To illustrate the various ways of writing a personnel list, I have included a fictitious one in the distribution, and you can also view it here.
So, to get the bibcodes of a group of people listed in the file my_group.txt, just type:
IDL> group_bib = make_bclist(personnel='my_group.txt')
Note that the year and /since keywords can still be used with the personnel keyword. When they are, make_bib selects only papers that satisfy both the entrance and exit date criteria (when present) and the year and /since criteria.
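The combined selection can be sketched in Python as two independent tests that must both pass. This is an illustration only. The function names are hypothetical, and the bounds are assumed to be inclusive, which make_bib may or may not do.

```python
from typing import Optional


def in_group_window(pub_year: int, entrance: int, exit_year: Optional[int]) -> bool:
    """True if the paper appeared while the person was in the group.
    Inclusive bounds are an assumption of this sketch."""
    return pub_year >= entrance and (exit_year is None or pub_year <= exit_year)


def in_requested_range(pub_year: int, year: Optional[int] = None,
                       since: bool = False, upto: Optional[int] = None) -> bool:
    """Mimic the year, /since and upto keywords of make_bclist (sketch only)."""
    if year is None:          # no year keyword: whole bibliography
        return True
    if since:                 # from `year` up to now
        return pub_year >= year
    if upto is not None:      # between `year` and `upto`
        return year <= pub_year <= upto
    return pub_year == year   # a single year


def keep_paper(pub_year: int, entrance: int, exit_year: Optional[int], **range_kw) -> bool:
    # A paper is kept only if it satisfies BOTH the personnel-date criteria
    # and the year / since / upto criteria.
    return (in_group_window(pub_year, entrance, exit_year)
            and in_requested_range(pub_year, **range_kw))
```

With entrance 1989, exit 2000 and year=1990 with /since, a 1995 paper is kept, but a 2003 paper is rejected because it falls after the exit date.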
Finally, you will notice two extra keywords that you can use in the call to make_bclist: loserver and timeout. These keywords are there to protect you from being banned by the ADS. Indeed, this package makes heavy use of the ADS: for each person, getting the complete list of bibcodes requires 2 queries to the ADS (to be able to separate the refereed from the non-refereed papers), and then, for each bibcode, getting the complete reference information requires 2 to 3 queries (because not all the information is present, or easily processable, in the BibTeX or tagged output of the ADS). Compiling the bibliography of a reasonable list of people can therefore easily generate a thousand requests to the ADS. If these requests come too quickly, your IP address will be categorized as an attacker and banned from further access. The package therefore distributes the queries over 7 different mirrors (if you find that one is down or very slow, you can use loserver to restrict the list of servers to use). Furthermore, it waits for timeout seconds before making a call. In my experience, using all 7 servers with a timeout of 1 second is viable.
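Here is a rough Python sketch of the query budget and of spreading calls over mirrors. The cost model (2 queries per person, at most 3 per bibcode) is taken from the paragraph above. The round-robin order and the send callback are assumptions made for illustration, since the text does not say exactly how make_bib chooses a mirror. For example, 20 people with 300 bibcodes in total already need up to 2×20 + 3×300 = 940 queries.

```python
import itertools
import time

# Mirror codes accepted by the loserver keyword.
ALL_SERVERS = ["cl", "eso", "fr", "in", "jp", "kr", "us"]


def estimate_queries(n_people: int, n_bibcodes: int, per_bibcode: int = 3) -> int:
    """Upper estimate: 2 queries per person, then 2 to 3 per bibcode (worst case 3)."""
    return 2 * n_people + per_bibcode * n_bibcodes


def polite_dispatch(queries, servers=ALL_SERVERS, timeout=30.0, send=print):
    """Send each query to the next mirror in turn (round-robin, an assumption),
    waiting `timeout` seconds before each call. `send` is a hypothetical
    callback standing in for the actual ADS request."""
    for query, server in zip(queries, itertools.cycle(servers)):
        time.sleep(timeout)
        send(server, query)
```

Rotating over the mirrors divides the load on each server by the number of mirrors in use, and the timeout spaces out consecutive requests.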
At this stage, the output variable of make_bclist contains the list of bibcodes corresponding to the bibliography we are trying to build (there are other variables in that output, but these are meant for the inner workings of the package). Each bibcode in this variable is unique, but quite a number of them will not make it to the final bibliography (those belonging to homonyms, for instance). Among them, preprints hold a special place, since the ADS now lists the arXiv preprints. get_bibcodes could be modified to avoid returning them, but (1) it is not straightforward and (2) in some cases you may want to keep some of them (those corresponding to papers in press). Therefore all the arXiv preprints found are kept in the output variable of make_bclist. If you want to get rid of all of them, simply type:
IDL> filtered = filter_arxiv(group_bib,/verb)
The /verb option simply shows you which bibcodes are discarded and how many there were.
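The filtering idea can be sketched in Python on plain bibcode strings. ADS bibcodes are 19 characters long, and characters 5 to 9 hold the journal abbreviation. This sketch treats a bibcode as a preprint when that field reads "arXiv". That test is an assumption: the real filter_arxiv may use other criteria, and older preprint bibcodes may carry a different journal code. The name filter_arxiv_sketch is hypothetical.

```python
def filter_arxiv_sketch(bibcodes, verbose=False):
    """Split bibcodes into (kept, discarded), treating those whose journal
    field (characters 5-9 of the 19-character ADS bibcode) is 'arXiv' as
    preprints. Not the package's filter_arxiv; an illustration only."""
    kept, discarded = [], []
    for code in bibcodes:
        (discarded if code[4:9] == "arXiv" else kept).append(code)
    if verbose:  # analogous to /verb: list the discarded codes and count them
        for code in discarded:
            print("discarding", code)
        print(len(discarded), "arXiv preprint(s) discarded")
    return kept, discarded
```

With ["2008arXiv0801.1234S", "2008A&A...477..123S"], the first bibcode is discarded and the second is kept.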
We are now ready for the second part of the information retrieval: getting the bibliographic information for all the bibcodes. This is the task of the function make_reflist. Calling it without arguments prints:
IDL> result = make_reflist()
correct call is:
result = make_reflist(bibcodes[,/debug, ,/verb,affil=affil,loserver=loserver, /timeout=timeout])
where
  bibcodes  is the output structure of make_bclist
  affil     an array of strings to search in the affilitation field to declare that a paper really belongs to your group
  /verb     prints information for debugging
  /debug    prints more information
  loserver  is a string array containing the codes for the servers to use in order not to be banned by ADS.
            choose between cl, eso, fr, in, jp, kr, and us.
            default is all
  timeout   is a wait time before making the call to ADS.
            default is 30s
Most of the parameters are self-explanatory. The only potentially intriguing one is affil. It provides a first automated check that the papers retrieved really belong to your group, by searching the affiliations listed in ADS (note, however, that this field is not complete). You provide the key strings to search for with the keyword affil. For instance, assuming your team members sign their papers with IAP or UMR 245, the previous call could be:
IDL> reflist = make_reflist(group_bib,affil=['IAP','UMR 245'])
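The affiliation check amounts to a substring search. Here is a Python sketch of it, assuming plain case-sensitive matching of each affil string against each author affiliation. The actual comparison inside make_reflist may differ, and affiliation_flag is a hypothetical name. Because the ADS affiliation field is incomplete, an 'n' here can also mean "unknown".

```python
def affiliation_flag(affiliations, affil_keys):
    """Return 'y' if any key string (e.g. 'IAP', 'UMR 245') appears in any
    affiliation of the paper, else 'n'. Case-sensitive substring matching
    is an assumption of this sketch."""
    for text in affiliations:
        if any(key in text for key in affil_keys):
            return "y"
    return "n"
```

An empty affiliation list always yields 'n', which is why references with missing ADS affiliations show up as failing the check.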
If affil is not set, make_reflist searches for Saclay affiliations; this default is easy to change at the beginning of the source code. At the end of this process, you have a potentially large and very precious array of structures, and it would be wise to save it (for instance with IDL's SAVE procedure).
IDL> bib2html
Correct call is:
bib2html,bib_str,file[,/authors,/verb,/debug, personnel=personnel,/separated,title=title /url,/form,/single,address=address
where
  bib_str     is the bibliographical structure produced by make_reflist
  file        is the filename for the HTML source
  /authors    list references for each authors (for checking purposes mainly) and therefore leads to repetitions.
              This option requires a personnel list.
  /verb       prints information on the execution.
  /debug      adds information in the html output
  personnel   to provide the name of the personnel file (used in the /authors option, default is SAp_personnel.txt)
  /separated  in the /author option, makes a separate page per author
  title       to provide a title to the HTML page (default is tailored to SAp)
  /url        add a link to the ADS abstract after each reference
  /form       makes the HTML page a form so that users can check their references
  /single     combined with /form, creates a single form for the complete bibliography instead of one per authors.
              This is useful when a single person will check the complete bibliography.
  address     can be used to specify who should receive the forms (default is msauvage_at_cea_dot_fr)
bib2html takes two arguments: the bibliographic structure just created by make_reflist, and a file name, which will be the name of the HTML document. There are two sorting orders for the references: by increasing year and then alphabetically, or by author (following your personnel list) and then by year. Within each group, the references are further divided into refereed journal papers, refereed conference papers (a rather theoretical species, as this information is hard to get from ADS), conference papers, non-refereed journal papers, and books.
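Here is a Python sketch of the year-based ordering just described: years in increasing order, then publication types in the order listed above, then references sorted alphabetically. It assumes that "alphabetically" means by first author. The dictionary keys year, type and first_author are hypothetical stand-ins for the fields of the IDL structure.

```python
from collections import defaultdict

# Publication types, in the order in which the text lists them.
TYPE_ORDER = ["refereed journal", "refereed conference", "conference",
              "non-refereed journal", "book"]


def group_by_year_and_type(refs):
    """refs: list of dicts with hypothetical keys 'year', 'type', 'first_author'.
    Returns {year: {type: [refs sorted by first author]}}, years increasing,
    types in TYPE_ORDER, empty types omitted."""
    grouped = defaultdict(lambda: defaultdict(list))
    for ref in refs:
        grouped[ref["year"]][ref["type"]].append(ref)
    result = {}
    for year in sorted(grouped):
        by_type = grouped[year]
        result[year] = {
            t: sorted(by_type[t], key=lambda r: r["first_author"].lower())
            for t in TYPE_ORDER if t in by_type
        }
    return result
```

The author-based ordering would follow the same pattern, with the personnel list taking the place of the year as the outer key.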
Since the HTML document can be rather long, a number of anchors are placed in it, accessible from the top of the document. With the author sorting option, you can also make one HTML document per author (keyword /separated).
An important point about the author sorting option is that it repeats every reference that has more than one author from your list. This is because the option is meant to let authors check their publication list. If you know what HTML forms are, bib2html offers a very powerful checking option, set with the /form keyword. It transforms each author's HTML page into a form where each reference is followed by checkboxes. These let the author exclude references that belong to a homonym, change their refereeing status, and change their affiliation status. Once these forms are filled in, the send button sends the information to the recipient specified with the address keyword in the call to bib2html. Use the function bib_update to update the bibliographic structure with the results from the forms.
If the full bibliographic file is going to be checked by one person, you can combine /form with /single to have a single form containing all the references.
To ease the checking, you can also include links to the publications in the HTML form by adding /url to the bib2html command line. Note that /url does not require /form to be set.
To give you a feeling for the output produced, I have compiled the publication list of a number of people in the Service d'Astrophysique for the years 1998 to 2000, which is now in a variable called bib98. Following each call, a link will lead you to the resulting HTML code.
In these HTML pages, the references in red are those for which the affiliation check was negative (you can only see these colors if your browser correctly handles Cascading Style Sheets). You can force all references to appear in black (or red) by manually changing the affiliation flag in the bibliographic structure. This is easily done with:
IDL> bib98(*).affil = 'y'   ; all references shown in black
IDL> bib98(*).affil = 'n'   ; all references shown in red
Note that when /authors is used, bib2html requires a file listing the authors. This is the same file you provided to make_bclist. Here again, the default is an SAp-specific file (SAp_personnel.txt); for your own use, provide your file name with the keyword personnel.
Also note that setting /form without /single sets the /authors and /separated keywords as well, so you do not need to repeat them.
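The keyword interplay described in the two notes above can be summarised with a small Python sketch: /form without /single turns on /authors and /separated, and the author listing falls back to the SAp personnel file when none is given. This paraphrases the documented behaviour and is not bib2html's code. The function name is hypothetical.

```python
def resolve_bib2html_options(form=False, single=False, authors=False,
                             separated=False, personnel=None):
    """Sketch of how bib2html's keywords interact, per the text:
    /form without /single implies /authors and /separated, and the
    author listing needs a personnel file (default SAp_personnel.txt)."""
    if form and not single:
        authors = separated = True
    if authors and personnel is None:
        personnel = "SAp_personnel.txt"
    return {"form": form, "single": single, "authors": authors,
            "separated": separated, "personnel": personnel}
```

In practice this means that a call with /form alone needs personnel=... unless you really want the SAp default file.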
An important point here is that even if a user marks a reference for deletion (either because it belongs to a homonym or because it is not affiliated with the lab), bib_update will not delete this reference from the bibliographic structure. Instead, it flags the reference so that subsequent calls to bib2html ignore it. This is done with the tag exclude in the reference structure. This way, the reference is never lost.
Before showing you how this works, there is a small caveat to mention. Although HTML is standardized, the format in which you receive the form information varies with the browser. Let me dwell on this, as it matters. Given the way the forms are set up, we can define a unit of information as:
where, in our case, tag_name can have the following values (they are set by bib2html):
|Tag Name||Meaning and action|
|exclude||Exclude this reference from the list|
|is_ref||This reference was refereed, sets referee to y|
|not_ref||This reference was not refereed, sets referee to n|
|is_affil||This reference belongs to the lab, sets affil to y|
|not_affil||This reference does not belong to the lab, excludes it from the list|
bibcode represents a standardized bibliographic code. In a form sent by the HTML 4.0 compliant browser iCab (a very nice browser indeed), there is only one information unit per line, and the bibcodes are unaltered, which makes parsing very simple. In Netscape, however, the information units appear on the same line, separated by the character &. When this line gets too long, it is broken arbitrarily, generally inside an information unit. I have not yet checked what happens with Internet Explorer, but I suspect it will be different again.
bib_update is not yet able to handle line breaks inside information units. It is therefore your job to edit the emails you receive so that line breaks do not occur inside information units. Note also that Netscape appears to insert the character ! at the line break; you must remove this character.
Firefox appears to return all the units on a single line, separated by &. bib_update handles this gracefully (use its /single keyword), so in principle no editing is required. You should still check the form output, since the separator & also appears in the bibcodes of the journal Astronomy and Astrophysics (A&A).
Firefox and Netscape replace the & in bibcodes with %26, which bib_update also handles correctly; but if an & is still present in the journal code, the bibcode will be split. This should produce an error from bib_update.
Now let's assume you have done this little editing, and that the results from the form emails are stored in a file called email.txt (an example of such a file is found here; note that I have combined results from iCab and Netscape, so the & characters in the bibcodes sent by Netscape are replaced by %26).
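Here is a Python sketch of parsing such a single-line result. It splits on &, decodes %26 back to & (as in A&A bibcodes), and rejects anything that does not look like a valid unit. The exact textual form of an information unit is not reproduced on this page, so this sketch ASSUMES each unit is written tag=bibcode. The name parse_single_line is hypothetical. An & left unencoded inside a bibcode splits it into fragments that fail the 19-character bibcode check, and the sketch reports this as an error rather than guessing.

```python
from urllib.parse import unquote

# Tag names set by bib2html (see the table above).
VALID_TAGS = {"exclude", "is_ref", "not_ref", "is_affil", "not_affil"}


def parse_single_line(payload: str):
    """Split a single-line form result (units separated by '&') into
    (tag, bibcode) pairs. ASSUMPTION: each unit is written 'tag=bibcode'."""
    units = []
    for unit in payload.strip().split("&"):
        tag, sep, code = unit.partition("=")
        code = unquote(code)  # %26 -> '&', as in A&A bibcodes
        if not sep or tag not in VALID_TAGS or len(code) != 19:
            # An unencoded '&' in a bibcode splits it: report it, do not guess.
            raise ValueError(f"malformed unit {unit!r}")
        units.append((tag, code))
    return units
```

For example, "exclude=2000A%26A...353..123S&is_ref=1999ApJ...512..456L" yields two units, while "exclude=2000A&A...353..123S" raises an error.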
Calling the function bib_update with no parameters produces the on-line help:
IDL> result = bib_update()
Correct call is:
new_bib = bib_update(bib_str,email[,/single,/verb, parsed=parsed])
where
  bib_str   the bibliographic structure created by make_bib.
            Of particular importance are the existence of the tags referee and affil which are updated.
  email     name of the file where the results of form checkings are stored.
            This results are sent by email once the authors clik on the submit button.
            this file has to be less than 5000 lines long.
  /single   to be used when all the form codes are on a single line separated by &
  /verb     makes the software talkative
  parsed    an output structure containing the information in the email file parsed along the categories
            exclude, refereed, not refereed, affiliated, not affiliated
  new_bib   the updated bibliographic structure, where some references have been removed and some updated.
bib_update first parses the information in the email file, i.e. sorts the bibcodes according to the tags they were attached to. It then excludes or updates the corresponding references. Thus, assuming the bibliographic structure is in a variable called bib98, you would simply type:
IDL> new_bib = bib_update(bib98,'email.txt')
Since you will get input from different sources, it may contain contradictory information: for instance, two authors of the same paper may not give it the same refereeing status, or, more likely, the same affiliation status. bib_update checks for such contradictions. If it finds any, it does not update the structure. Instead, it returns the information obtained from the email file, parsed into a structure containing 7 arrays:
|Structure Tag Name||Content|
|exclude||bibcodes marked for deletion|
|is_ref||bibcodes marked as refereed|
|not_ref||bibcodes marked as not refereed|
|is_affil||bibcodes marked as affiliated to the lab|
|not_affil||bibcodes marked as not affiliated to the lab|
|ref_conflict||bibcodes with conflicting referee information|
|affil_conflict||bibcodes with conflicting affiliation information|
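Here is a Python sketch of how the parsed categories and the two conflict arrays could be derived from (tag, bibcode) pairs. A conflict is a bibcode that received both marks of a pair. Python sets stand in for the IDL arrays. This illustrates the idea and is not bib_update's implementation. The function names are hypothetical.

```python
CATEGORIES = ["exclude", "is_ref", "not_ref", "is_affil", "not_affil"]


def parse_units(units):
    """Group (tag, bibcode) pairs into the five categories and compute the
    two conflict sets. An unknown tag raises KeyError."""
    parsed = {name: set() for name in CATEGORIES}
    for tag, code in units:
        parsed[tag].add(code)
    # A conflict is a bibcode marked both ways by different users.
    parsed["ref_conflict"] = parsed["is_ref"] & parsed["not_ref"]
    parsed["affil_conflict"] = parsed["is_affil"] & parsed["not_affil"]
    return parsed


def has_conflicts(parsed) -> bool:
    """True if the structure must not be updated until conflicts are resolved."""
    return bool(parsed["ref_conflict"] or parsed["affil_conflict"])
```

Using sets makes each conflict check a simple intersection, and a bibcode reported several times with the same mark counts only once.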
With this structure and the bibliographic structure, you should be able to understand the reasons for the contradictions. You need to resolve them before bib_update can update the structure; this avoids losing information. Note that, in principle, if users check a common version of the bibliography there can be no conflict: e.g. you can flag that a refereed reference is in fact not refereed, but you cannot confirm that it is refereed (this is implicit). This feature of bib_update therefore serves mainly to make sure that everyone is using the same version of the bibliography.
For checking purposes, and even if no conflicts are found by bib_update, you can retrieve the parsed update information with the parsed keyword:
IDL> new_bib = bib_update(bib98,'email.txt',parsed=parsed)
If no conflict is detected, then new_bib is your updated bibliographic structure (better save it!) and you can use bib2html to produce your final HTML bibliographic list.
For your information, note that bib2html prints a reference as long as its exclude tag is 'n' (even if its affiliation tag, affil, is 'n'). You can update this tag manually to exclude some references. You can also select references based on this tag to extract a smaller array of structures containing only the references that belong to your bibliography. I have kept this step manual so that all the references returned by ADS are always kept. This way you can always go back in the process without having to query the ADS again, which is a rather long process.
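The two rules about the tags (print when exclude is 'n', show in red when affil is 'n') can be expressed in a short Python sketch over a list of dicts that stands in for the IDL array of structures. The function names are hypothetical.

```python
def printable(refs):
    """References bib2html would print: exclude == 'n', whatever affil says."""
    return [r for r in refs if r["exclude"] == "n"]


def display_colour(ref) -> str:
    """References failing the affiliation check (affil == 'n') are shown in red."""
    return "red" if ref["affil"] == "n" else "black"
```

In IDL itself, the equivalent selection would typically use the WHERE function on the exclude tag, e.g. keep = where(new_bib.exclude eq 'n', n), followed by new_bib[keep] when n is greater than 0.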
If you have questions, just email them to me.
Q: While running make_bclist, I get the following message:
lynx: can't access startfile http://cdsads.u-strasbg.fr/cgi-bin/nph-abs_connect?author=lada%2C%20&db_key=PHY&nr_to_return=1000&star_year=1995&sort=ODATE&jou_pick=NO
What does this mean?
A: This indicates that the lynx query failed, most likely because the server was unavailable at the time of the request. Although make_bclist is able to handle failed requests, it means you have lost information, so it is advisable to restart make_bclist. Note that in this case we have jou_pick=NO, which indicates that we are querying ADS for refereed journals only. This means that the error occurred when make_bclist was producing its second list, the list of refereed papers. This list is used to check whether the referee status attributed on the basis of journal names is consistent with ADS. In this particular case, therefore, only this information is lost, and all the papers will still be collected.
This type of error can also affect make_reflist, in which case you are also warned, but you lose information. If the script does not print the affected bibcode, you can find what has been lost by searching the output array of structures for a structure in which all the fields are 'void'.
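Here is a Python sketch of that search, with a list of dicts standing in for the IDL array of structures. Following the answer above, a failed entry is one in which every field holds the string 'void'. The function name is hypothetical.

```python
def failed_references(refs):
    """Return the entries in which every field is the string 'void', i.e.
    references that could not be filled because an ADS query failed.
    Empty entries are ignored."""
    return [r for r in refs if r and all(v == "void" for v in r.values())]
```

Once these entries are identified, re-running make_reflist (or restarting the whole process) is the way to recover the missing information.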