BibLaTeX references across time and cultures

I was writing a paper in English, so ideally there would have been solid English-language references, but for my topic there were not.

This is for LaTeX authors who need to include references with unusual requirements, especially non-latin scripts, non-English references, rare scripts and ancient documents. Some of my references combined several of these at once. In my case:

Category        Range in my references
Languages       Chinese, Sanskrit, Spanish, Arabic
Eras            ancient (3400 BCE), medieval (900 CE), old (1898)
Right-to-left   Arabic
Dates           precise, approximate, and ranges

I had no previous experience of publishing an extensive piece of research with this sort of variation. No computer system really handles it well, and as more content is created in non-latin scripts, latin-centric software is showing its flaws. LaTeX has responded with the modern LuaTeX engine, which demonstrates how robust this nearly 50-year-old software is.

The way LaTeX and BibLaTeX work is that you set up the environment in your .tex file, then adopt conventions in your .bib file that match what the environment is looking for.

The following is set out with .bib first and .tex second, so you can see what I achieved before you follow how I did it. Your paper may have different requirements.

Authors and titles

For authors and titles with non-latin characters, always use this form in your .bib file:

       author                = {张伟},
       shortauthor           = {Zhang Wei},
       nameaddon             = {Zhang Wei},

BibLaTeX uses the two identical English approximations differently: shortauthor is rendered in the main text, e.g.:

       Leith residents built a big wall[Zhang Wei 2025]

but for the same entry, nameaddon is rendered in the bibliography, e.g.:

       Zhang Wei 张伟. Building the Great Wall of Corstorphine.

This really matters when things get more complicated, as you’ll see.
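Put together, a complete entry and its citation might look like the sketch below. The entry key, title and date are hypothetical, reusing the fictional example above. One thing to watch: shortauthor is a name list, so biblatex may parse ‘Zhang Wei’ as given name Zhang and family name Wei and cite only ‘Wei’. Wrapping it in double braces (the same trick as in the Preservation section) should keep it as one unit, in the Chinese family-name-first order.

```latex
% --- In the .bib file (hypothetical key, title and date) ---
@book{zhang_wall_2025,
  author      = {张伟},
  shortauthor = {{Zhang Wei}},
  nameaddon   = {Zhang Wei},
  title       = {Building the Great Wall of Corstorphine},
  date        = {2025},
}

% --- In the .tex document body ---
Leith residents built a big wall\parencite{zhang_wall_2025}.
```

nameaddon is a plain literal field rather than a name list, so it needs no extra braces.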

Latinisation

Where there are latinised versions of Chinese/Arabic/Sanskrit author names, they must appear in the nameaddon field. This wasn’t a problem in the example above because the only latin name used is the English approximation. However, there is often a latinised version that retains features of the original which English cannot express, such as the tone marks below. For example:

       author                  = {鲁迅}, 
       shortauthor             = {Lu Xun},   <-- widely used English approximation
       nameaddon               = {Lǔ Xùn},   <-- latinised equivalent (in this case pinyin)

There are perhaps 100 or so latinisation systems for non-latin languages. Here are some examples:

Script/Language   Romanization system
Chinese           Pinyin
Arabic            Latin-i harakat
Japanese          Hepburn romanization (Hebon-shiki)
Sanskrit          IAST (International Alphabet of Sanskrit Transliteration)
Korean            Revised Romanization of Korean
Russian           BGN/PCGN romanization
Thai              RTGS (Royal Thai General System)
Serbian           Gaj’s Latin alphabet

These latinisations are for people who prefer to use latin script when writing (for example) Arabic or Chinese. Even though the script is latinised, these systems often still need a font with the right accented characters installed. The best (but not necessarily the most practical) answer is always to use the original script, as the sheer number of latinisation systems demonstrates. For example, Arabic has three main systems, DIN, ALA-LC and Hans Wehr, and the least offensive everyday term for all of these is Latin-i harakat. For everyday use, Arabic speakers who want a latinised script have yet another system, a kind of text-speak (often called Arabizi) which differs again from the other three.

Similarly, Japanese has both Hebon-shiki (Hepburn in English) and Kunrei-shiki, Korean has two latinisations, Manding has three N’Ko latinisations, and so on.

There is a similar but slightly different trick for handling latinisations in titles:

       title                 = {كتاب الحاوي في الطب},
       titleaddon            = {Kitāb al-Ḥāwī fī al-ṭibb},  <-- Latin-i harakat

Translations

There is a problem with the Arabic title given above: although the titleaddon contains the official latinisation, it still lacks an English translation. It’s great to have the latin script so you know what to search for if you don’t read Arabic characters (you can still copy and paste Arabic, which can be essential, but even in 2026 some computer systems don’t handle Arabic very well). So if there is an English translation of a title or an author, it is helpful to add it.

In this case the translated title should be in the note field, as follows:

       note                  = {Translated as: The Comprehensive Book of Medicine}

This isn’t just for Arabic; the same is true for Chinese, which is a great example of the difference: pinyin latin equivalents are often supplied, but pinyin is not a translation. Note also that there are often many ways to translate a given title into English, so (as in this bibliography) where translations are few, partial or obscure, an English translation of the title or author may be so misleading that you are better off using the original. Even if you have no knowledge of the language and don’t read the script, a search engine is more likely to find information about a rarely-translated author if you use their native Chinese/Arabic/etc. name.
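Following the same convention for a Chinese title, a sketch might look like this. The entry key is hypothetical and other bibliographic fields are omitted. 狂人日记 is Lu Xun’s 1918 short story: the pinyin titleaddon is only a latinisation, while the note carries one of several possible English translations (it also appears as ‘Diary of a Madman’). shortauthor is double-braced so biblatex treats it as a single name.

```latex
@misc{lu_madman_1918,
  author      = {鲁迅},
  shortauthor = {{Lu Xun}},
  nameaddon   = {Lǔ Xùn},
  title       = {狂人日记},
  titleaddon  = {Kuángrén Rìjì},
  note        = {Translated as: A Madman's Diary},
  date        = {1918},
}
```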

Right-to-left scripts

For left-to-right scripts (the default for CJK and latin languages) the conventions above work. They may seem to work for Arabic too, but there is still a problem because Arabic is written right-to-left. When Arabic text is detected, the reference is switched to right-to-left so that the Arabic script is correct; unfortunately the whole reference is switched, including latin characters, whether English or latinisation. So an Arabic reference containing a latin field, as almost all of them do, will have its latin fields rendered like this:

      'Medicine of Book Comprehensive The'     or

      'enicideM fo kooB eviseneherpmoC ehT'

depending on context. To fix this, set the default language in the preamble to a left-to-right language such as ‘british’ (as in this bibliography) or ‘chinese’. Then preserve the Arabic text exactly as written by wrapping it in polyglossia’s \textarabic command, inside the field’s own braces, so that only the Arabic is set right-to-left:

       title                 = {\textarabic{كتاب الحاوي في الطب}},
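To check the right-to-left fix in isolation, a minimal LuaLaTeX document might look like the sketch below. It assumes the Amiri and TeX Gyre Pagella fonts are installed (as in the preamble at the end of this post); the file name rtl-demo.bib and the key hawi_demo are hypothetical. The entry has no langid, so the reference stays left-to-right and only the text inside \textarabic is set right-to-left.

```latex
% Minimal sketch: hypothetical file rtl-demo.bib and key hawi_demo.
\begin{filecontents*}[overwrite]{rtl-demo.bib}
@book{hawi_demo,
  author     = {{al-Rāzī}},
  title      = {\textarabic{كتاب الحاوي في الطب}},
  titleaddon = {Kitāb al-Ḥāwī fī al-ṭibb},
  note       = {Translated as: The Comprehensive Book of Medicine},
}
\end{filecontents*}
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}   % left-to-right default for the whole document
\setotherlanguage{arabic}   % defines \textarabic
\setmainfont{TeX Gyre Pagella}
\newfontfamily\arabicfont[Script=Arabic]{Amiri}
\usepackage{csquotes}
\usepackage[backend=biber, style=authoryear]{biblatex}
\addbibresource{rtl-demo.bib}

\begin{document}
A well-known medical compendium \parencite{hawi_demo}.
\printbibliography
\end{document}
```

Compile with lualatex, then biber, then lualatex again. If the main font lacks some of the dotted latinisation characters, LuaLaTeX should report missing characters in the log rather than stop.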

Preservation

Preservation with double braces is useful elsewhere too. Many latinisations contain special characters such as ’ and ‘ (the hamza and ayn marks) which, in my setup, broke BibLaTeX unless the field was protected with double braces, as in this Arabic example:

      publisher   = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}}
                        ^          ^        ^
                        \-----------\--------\--- breaks biblatex without {{double}}

The double braces preserve the string exactly as written. Another example is:

      author = {Dundee Museum}

which BibLaTeX parses as a person (given name ‘Dundee’, family name ‘Museum’) and renders as ‘Museum, Dundee’, unless you say

      author = {{Dundee Museum}}

Dates

Dates use the EDTF (ISO 8601-2) standard. BibLaTeX handles BCE/CE dates correctly, but omits the era marker where it would be pointless or distracting (controlled by the dateeraauto option in the preamble below). The full syntax for dates is in the BibLaTeX user manual: https://ctan.org/macros/latex/contrib/biblatex/doc/biblatex.pdf
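As a sketch of the date forms in my references, the hypothetical entries below show a precise date, an approximate date, a range, and an approximate BCE date. They assume biblatex’s EDTF parsing with astronomical year numbering, where year 0 is 1 BCE, so -3399 is 3400 BCE (consistent with the Suśruta entry below, where -0599 stands for 600 BCE). With datecirca=true and dateera=secular, as in the preamble, the ~ dates should print with ‘circa’ and the early dates with an era marker.

```latex
% Hypothetical entries showing EDTF date forms (keys and titles invented).

% A precise date: year, month and day.
@misc{date_precise,
  title = {A Precisely Dated Letter},
  date  = {1898-03-14},
}

% An approximate date, circa 900 CE: four-digit year, trailing ~ for "approximate".
@misc{date_approximate,
  title = {An Approximately Dated Manuscript},
  date  = {0900~},
}

% A closed range of years.
@misc{date_range,
  title = {A Work Published in Parts},
  date  = {1898/1902},
}

% Circa 3400 BCE: astronomical numbering, so -3399 is 3400 BCE.
@misc{date_bce,
  title = {An Ancient Inscription},
  date  = {-3399~},
}
```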

Long lists of authors

For very long lists of authors (such as [Meisner2024] in this bibliography), include all authors separated by ‘and’ rather than truncating the list with ‘and others’. BibLaTeX has been set up in the preamble with maxcitenames/mincitenames and maxbibnames so that it prints ‘et al.’ in the text but includes all names in the bibliography.
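A minimal sketch of such an entry, with entirely hypothetical names, key and journal:

```latex
% Hypothetical entry: list every author, joined by "and".
@article{example_many_authors_2024,
  author       = {Adams, Ann and Brown, Ben and Chen, Li and Diaz, Dora
                  and Evans, Eve and Fischer, Finn},
  title        = {A Hypothetical Multi-Centre Study},
  journaltitle = {Journal of Illustrative Examples},
  date         = {2024},
}

% Avoid truncating the list yourself, e.g.
%   author = {Adams, Ann and others},
% which would make biblatex print "et al." in the bibliography as well.
```

With the preamble’s maxcitenames=2 and mincitenames=1, a citation of this entry should shrink to the first author plus ‘et al.’, while maxbibnames=99 keeps all six names in the bibliography.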

Full example

Here is a fictional example in full, handling both Chinese and right-to-left Arabic, with latinisations, an English translation, and correct use of the -addon and note fields.

       author      = {冷開泰},                                           name
       shortauthor = {Leng Kaitai},                                      English approximation
       nameaddon   = {Lěng Kǎitài},                                      correctly latinised
       title       = {\textarabic{كتاب الحاوي في الطب}},                 right-to-left preserved
       titleaddon  = {Kitāb al-Ḥāwī fī al-ṭibb},                         latinised
       note        = {Translated as: The Comprehensive Book of Medicine},English translation
       publisher   = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}}              latinised, with unsafe quotes

NB: ‘authoraddon’ is not a valid field name, although by analogy with ‘titleaddon’ it would seem a logical field to use in place of shortauthor.

A complete real-world reference looks like this:

@book{sushruta_samhita_1907,
  title         = {The {Suśruta-Saṃhitā}},
  titleaddon    = {\textsanskrit{सुश्रुतसंहिता}},
  author        = {{Suśruta (composite work)}},
  nameaddon     = {\textsanskrit{सुश्रुत}},
  translator    = {Bhishagratna, Kaviraj Kunja Lal},
  date          = {1907},
  origdate      = {-0599~/-0499~},
  location      = {Calcutta},
  url           = {https://wellcomecollection.org/works/vnqskk8w/items?canvas=98&manifest=2},
  note          = {English translation of the original Sanskrit text (circa 600 BCE--500
  BCE), including discussion on transmissibility. The
  \href{https://www.wisdomlib.org/hinduism/book/sushruta-samhita-volume-2-nidanasthana/d/doc142863.html}
  {Wisdom Library translation} appears to be similar.},
  keywords = {ancient},
}

LaTeX setup for this bibliography

%%
%%  Font setup for Latin and Chinese
%%

% The following Chinese font support requires an exact font name match. E.g. on my Linux
% system I install the package adobe-source-han-serif-cn-fonts, then check the name with
% 'fc-list | grep "Han Serif"'. For reference, on my system 'fc-list | grep Han' gives
% 136 lines, because I installed all the Adobe CJK fonts (Chinese, Japanese, Korean) as
% recommended by CJK experts.

% The order of package loading really matters because some of the references use bidi
% (bi-directional) text in order to display the relevant Arabic, for which there is no
% translation to a Western Language. bidi was a retrofit onto latex and is a bit sensitive.
% If bidi wasn't needed, packages could be loaded in any order. These problems are steadily
% reducing as lualatex is developed. Lualatex is really quite an impressive redevelopment.

% Maths comes first in a bidi world
\usepackage{amsmath}
\usepackage{amssymb}

% Fonts next for bidi ordering reasons
\usepackage{fontspec}
\usepackage{luatexja-fontspec} % CJK handling (not just ja). No equivalent needed for other languages.

% Not needed at all except for bidi. It "stabilises arrays for bidi" according to experts.
% I don't understand but it did make errors go away.
\usepackage{array}

% Polyglossia is needed for language-aware hyphenation, date formats, quote styles etc.
% in at least the csquotes and biblatex packages. An alternative to babel, designed for
% XeLaTeX and LuaLaTeX.
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{arabic}
\setotherlanguage{sanskrit} 

% No \setotherlanguage above for Chinese, because luatexja handles it and we don't want
% polyglossia and luatexja to fight over who captures the incoming CJK Unicode.
% This potential clash is a recurring theme in this preamble.

\newfontfamily\arabicfont[Script=Arabic]{Amiri}
\newfontfamily\arabicfonttt[Script=Arabic]{Amiri}
\newfontfamily\chinesefont{Source Han Serif SC}[
  Renderer=Harfbuzz,
  Script=CJK,
  AutoFakeSlant=0.2,  % CJK doesn't have italics, so this does a bit of judicious tilting
  AutoFakeBold=2 
]

\ltjsetparameter{jacharrange={-1}} % Prevents luatexja from being too aggressive and
% seizing all Unicode text that might be CJK, including parts of biblatex references
% that are merely adjacent to CJK text. 
\setmainjfont{Source Han Serif SC}[
  Index=2,
  Renderer=Harfbuzz,
  AutoFakeSlant=0.2,
  CharacterWidth=Full,  % Forces better mapping of CJK punctuation
  BoldFont={* Bold}     % Explicitly point to the bold weight so LaTeX can find it
]
% The Index=2 above is about mandating which version of a font to pick inside a TrueType collection.
% In this case, 2 is Simplified Chinese. Latex sometimes gets confused and picks (say)
% the Japanese version, so we are explicit. 0=Japanese, 1=Korean, 2=SC, 3=TC.

% Now repeat the above for the sans slot (called 'gothic' in Japanese typography),
% deliberately pointing it at the same serif font. This is a trick: if LaTeX wants a
% Han sans font, it will now use the serif font instead. Reduces errors and the result
% seems good. Need to check with a CJK expert.
\setsansjfont{Source Han Serif SC}[
  Index=2,
  Renderer=Harfbuzz,
  AutoFakeSlant=0.2,
  CharacterWidth=Full,
  BoldFont={* Bold}  
]
% The above three commands (\ltjsetparameter, \setmainjfont, \setsansjfont) collectively
% avoid hundreds of warnings about missing fonts, and emit a better quality result.

\newfontfamily\devanagarifont[
  Script=Devanagari,
  HyphenChar=None, % Explicitly disable hyphenation, otherwise LaTeX warns it can't load hyphenation rules
  ItalicFont={Noto Serif Devanagari},         % This script has no italics, so we map them to upright.
  BoldItalicFont={Noto Serif Devanagari Bold} % Unlike CJK, where we fake slant. Also stops font warnings.
]{Noto Serif Devanagari}

\setmainfont{TeX Gyre Pagella}
\newfontfamily\bigquotefont{TeX Gyre Cursor} % used for big block quotes

% The Gyre project (https://www.gust.org.pl/projects/e-foundry/tex-gyre/index_html)
% explains it all, but basically these are TeX and OpenType font families similar to
% the well-known commercial fonts with similar names, with significantly more functionality.
% On my Linux I installed the package tex-gyre-fonts .

% Disable small caps. CJK fonts don't have small caps, but biblatex (or the chosen style)
% may ask for them for author names, generating a warning each time it can't. This avoids
% large numbers of warnings. Note that these two lines disable small caps for the whole
% document, not just for CJK text.
\let\scshape\upshape
\let\textsc\textup

%%
%% Referencing and quoting setup
%%

\usepackage{authblk}

\usepackage[
  backend=biber,
  style=authoryear,  % alternatives include numeric, apa, etc.
  doi=true,
  url=true,
  isbn=false,
  datecirca=true,    % prints "circa" for dates marked approximate with the EDTF ~ (e.g. 0900~)
  dateera=secular,   % handles BCE and CE by printing "BCE/CE"
  dateeraauto=1600,  % adds CE/BCE to anything before this year CE
  backref=true,      % great idea, but sometimes gets confused with the preview feature
  dateabbrev=false,
  language=auto,     % will change language according to langid, if present in an entry
  autolang=other,    % wraps entries that have a langid in the polyglossia/babel otherlanguage environment
  maxcitenames=2,    % Keep citations short: (Zhang et al., 2024)
  mincitenames=1,
  maxbibnames=99,    % List all authors in the bibliography (who doesn't? rude!)
  uniquelist=false   % Prevents BibLaTeX from adding names to disambiguate
]{biblatex}


% let long DOIs and URLs in bibliography break, avoiding overfull and other errors,
% and looks nicer.
\setcounter{biburllcpenalty}{7000}
\setcounter{biburlucpenalty}{8000}
\setcounter{biburlnumpenalty}{9000}

% Forces a gap between bib entries that PDF viewers can recognise as a boundary when doing
% a mouseover preview in the main text. Also just makes a bibliography look nicer.
\setlength{\bibitemsep}{1.5\itemsep}

% These mappings make sure biblatex doesn't start translating locale specific things
% like date formats or 'Appendix', 'Bibliography' etc. Other languages are merely content.
% 'british' is equivalent to the modern en_GB locale standard. This also means that the 
% default is left-to-right even in an entry containing arabic text enclosed in \textarabic{}.
% This also suppresses error messages from biblatex about 'Language not supported'.
\DeclareLanguageMapping{arabic}{british} 
\DeclareLanguageMapping{chinese}{british}
\DeclareLanguageMapping{sanskrit}{british}

% Force printing 'nameaddon' before the author name. I use nameaddon exclusively for the
% latin versions of Chinese (etc.) names, so the effect is to print the latinised name
% followed by the real, untranslated name in the references. See the notes at the top of
% the .bib file for details of translations in the 'note' field, and the special case of
% right-to-left Arabic script in author names. This macro completely replaces the
% authoryear 'author' macro, which made dates vanish until I added them back here.
% Because nameaddon is printed here, use the nameaddon field (not shortauthor) for the
% bibliography form of the name. The trailing % signs stop line ends adding stray spaces.
\AtBeginDocument{%
  \renewbibmacro*{author}{%
    \printfield{nameaddon}%
    \setunit{\addspace}%
    \printnames{author}%
    \setunit{\addspace}%
    % This macro handles the label (derived from the 'date' field)
    % while respecting BCE/CE and circa formatting according to EDTF (ISO8601-2) dates.
    % Modern lualatex tries to be standards-compliant.
    \usebibmacro{date+extradate}%
  }%
}

% Make biblatex do quotation handling as expected for a paper
\usepackage{csquotes}

% The following macro seems to approximate the style I see in academic papers.
\DeclareCiteCommand{\parencite}
  [\mkbibbrackets]  % replaces parentheses with square brackets
  {\usebibmacro{prenote}}
  {\usebibmacro{citeindex}%
   \usebibmacro{cite}}
  {\multicitedelim}
  {\usebibmacro{postnote}}

\addbibresource{discovering-epidemiology.bibtex}
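The preamble ends at \addbibresource, so for completeness here is a sketch of a document body that uses it. It assumes a \documentclass line above the preamble, that discovering-epidemiology.bibtex contains the sushruta_samhita_1907 entry shown earlier, and that hyperref is loaded somewhere, since that entry’s note uses \href. The keyword filters pick up the keywords = {ancient} field in that entry, splitting the bibliography without listing any entry twice.

```latex
\begin{document}

Ancient texts discussed transmissibility \parencite{sushruta_samhita_1907}.

% Entries tagged keywords = {ancient} first, then everything else.
\printbibliography[keyword=ancient, heading=subbibliography, title={Ancient sources}]
\printbibliography[notkeyword=ancient, heading=subbibliography, title={Other sources}]

\end{document}
```

Compile with lualatex, biber, then lualatex (usually twice, so that the back-references settle).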