BibLaTeX references across time and cultures
I was writing a paper in English, so ideally there would have been solid English-language references, but I found
this was not so for my topic.
This is for LaTeX authors who need to include references with unusual requirements, especially non-latin
scripts, non-English references, rare scripts and ancient documents. Some of my references combined several of these at once.
In my case the range was:
| Category | Range in my references |
|---|---|
| Languages | Chinese, Sanskrit, Spanish, Arabic |
| Eras | ancient (3400 BCE), modern ancient (900 CE), old (1898) |
| Right-to-left | Arabic |
| Dates | precise, approximate, and ranges |
I had no previous experience of publishing an extensive research piece with this sort of variation. No
computer system really handles it well, and as more content is created in non-latin scripts,
latin-centric software is showing its flaws. The LaTeX world has responded with the modern LuaTeX
project (used via lualatex), which demonstrates how robust this nearly 50-year-old software is.
LaTeX and BibLaTeX work like this: you set up the environment in your .tex file, and then adopt
conventions in your .bib file that match what the environment is looking for.
The following sets out the .bib side first and the .tex side second, so you can see what I achieved before you
follow how I did it. Your paper may have different requirements.
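As a minimal sketch of that split (a hypothetical example, not the setup used for this bibliography), the file name demo.bib and its single entry are invented for illustration. The filecontents* environment writes the .bib file from inside the .tex file so the example is self-contained; normally the .bib file is kept separately. The comment at the top shows the usual build sequence with Biber.

```latex
% Minimal sketch: the .bib side (conventions) and the .tex side (environment).
% Build sequence:  lualatex demo  ->  biber demo  ->  lualatex demo
% demo.bib and the entry 'example2020' are hypothetical.
\begin{filecontents*}[overwrite]{demo.bib}
@book{example2020,
  author    = {Smith, Jane},
  title     = {An Example Book},
  date      = {2020},
  publisher = {Example Press},
}
\end{filecontents*}
\documentclass{article}
\usepackage{fontspec}      % Unicode (OpenType) fonts under lualatex
\usepackage{polyglossia}   % language-aware hyphenation, dates, quotes
\setmainlanguage{english}
\usepackage[backend=biber, style=authoryear]{biblatex}
\usepackage{csquotes}      % quotation handling used by biblatex
\addbibresource{demo.bib}  % the .tex environment points at the .bib conventions
\begin{document}
A claim that needs support \parencite{example2020}.
\printbibliography
\end{document}
```

Everything that varies between papers (fonts, languages, citation style) lives in the preamble; the .bib entries only have to use the fields that preamble expects.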
Authors and titles
For authors with non-latin characters, always use this form in your .bib file:
```bibtex
author      = {张伟},
shortauthor = {Zhang Wei},
nameaddon   = {Zhang Wei},
```
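BibLaTeX uses the two identical English approximations differently: shortauthor is rendered in citations in the main text, e.g.:
Leith residents built a big wall [Zhang Wei 2025]
while for the same entry, nameaddon is rendered in the bibliography (the standard BibLaTeX styles do not print nameaddon, so this relies on the redefined author macro in the preamble at the end of this post), e.g.: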
```
Zhang Wei 张伟. Building the Great Wall of Corstophine.
```
This really matters when things get more complicated, as you’ll see.
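For completeness, here is a sketch of a whole entry for the fictional Zhang Wei example. Only the three name fields, the title and the year come from the examples above; the entry type, the key zhang_wei_2025 and the file name are my assumptions. It is wrapped in filecontents* so it can be dropped into the preamble of a test document that uses the setup at the end of this post.

```latex
% Hypothetical entry for the fictional Zhang Wei example.
%   shortauthor -> used for the in-text label, e.g. [Zhang Wei 2025]
%   nameaddon   -> printed next to the author in the bibliography, but only
%                  because the preamble redefines the 'author' bibmacro
\begin{filecontents*}[overwrite]{zhang-example.bib}
@book{zhang_wei_2025,
  author      = {张伟},
  shortauthor = {Zhang Wei},
  nameaddon   = {Zhang Wei},
  title       = {Building the Great Wall of Corstophine},
  date        = {2025},
}
\end{filecontents*}
% In the body of the document:
%   Leith residents built a big wall \parencite{zhang_wei_2025}.
```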
Latinisation
Where there are latinised versions of Chinese/Arabic/Sanskrit author
names, they must appear in the nameaddon field. This wasn’t a problem in the
example above because the only latin-script name used was the English approximation. However,
there is often a latinised version that retains features of the original (such as tones) which
the English approximation cannot express. For example:
```bibtex
author      = {鲁迅},
shortauthor = {Lu Xun},    <-- widely used English approximation
nameaddon   = {Lǔ Xùn},    <-- latinised equivalent (in this case pinyin)
```
There are perhaps a hundred or so latinisation systems for writing non-latin languages. Here are some examples:
| Script/Language | Romanization system |
|---|---|
| Chinese | Pinyin |
| Arabic | Latin-i harakat |
| Japanese | Hepburn romanization (Hebon-shiki) |
| Sanskrit | IAST (International Alphabet of Sanskrit Transliteration) |
| Korean | Revised Romanization of Korean |
| Russian | BGN/PCGN romanization |
| Thai | RTGS (Royal Thai General System) |
| Serbian | Gaj’s Latin alphabet |
These latinisations are for people who prefer to use latin scripts when writing in
(for example) Arabic or Chinese. Even though the script is latinised, these systems
often still require a specific font to be installed because of their diacritics (accents). The best
(but not necessarily most practical) answer is always to use the original script: the sheer
number of different latinisation systems means no single latinised form is canonical. For example, Arabic has
three main systems, DIN, ALA-LC and Hans Wehr, and the least offensive everyday
term for all of these is Latin-i harakat. For everyday use, Arabic-language
speakers who wish to use a latinised script have yet another system, a kind of
text-speak (often called Arabizi) which is different again from the other three.
Similarly, Japanese has both Hebon-shiki (called Hepburn in English) and
Kunrei-shiki, Korean has two latinisations, Manding has three N’Ko
latinisations, and so on.
There is a similar but slightly different trick for handling latinisations in
titles:
```bibtex
title      = {كتاب الحاوي في الطب},
titleaddon = {Kitāb al-Ḥāwī fī al-ṭibb},    <-- Latin-i harakat
```
Translations
There is a problem with the Arabic title given above: while the titleaddon
contains the latinised form, the entry still lacks an English translation. It’s great
to have the latin script so you know what to search for if you don’t read Arabic
characters (you can still copy and paste Arabic, and that can be essential, but even in
2026 some computer systems still don’t handle Arabic very well). So if there is
an English translation of a title or an author, it is helpful to add it.
In this case the translated title should be in the note field, as follows:
```bibtex
note = {Translated as: The Comprehensive Book of Medicine},
```
This isn’t just for Arabic; the same is true for Chinese. Chinese is a good
example of the difference: pinyin latin equivalents are often supplied, but pinyin
is not a translation. Note also that there are often many ways to translate a given
title into English, so (as in the case of this bibliography) where
translations are few, partial or obscure, an English translation of the title or author
may be so misleading to readers that you are better off using the original. Even if you
have no knowledge of the language and don’t read the script, a search engine is
more likely to find information about a rarely translated author if you search for their
native Chinese/Arabic/etc. name.
Right-to-left scripts
For left-to-right scripts (the default for CJK and latin languages), the above
conventions work. They may seem to work for Arabic too, but there is still a problem
because Arabic is written right-to-left. The Arabic text is detected and rendering
switches to right-to-left so that the Arabic script is correct; unfortunately the switch
also applies to all the other text in the reference, including latin characters, whether
English or latinisation. So an Arabic reference containing a latin field, as they
normally all do, will have its latin fields rendered like this:
```
'Medicine of Book Comprehensive The'   or
'enicideM fo kooB evisneherpmoC ehT'
```
depending on context. To fix this, set the default language in the preamble to a left-to-right
language such as ‘british’ (as in this bibliography) or ‘chinese’. Then preserve
the Arabic text in the reference exactly as it is written by wrapping it in
\textarabic{...} inside the field's own braces, giving double braces like this:
```bibtex
title = {\textarabic{كتاب الحاوي في الطب}},
```
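A minimal sketch of the whole arrangement, assuming lualatex and biber and that the Amiri font is installed (as in the preamble at the end of this post). The file name, the key rtl_demo and the entry itself are hypothetical; the entry deliberately has no author or date, so authoryear falls back to the title and "n.d.". The main language is left-to-right English, and only the text inside \textarabic is set right-to-left.

```latex
% Sketch: LTR main language with an Arabic title inside a .bib entry.
% Assumes the Amiri font is installed; rtl-demo.bib and rtl_demo are hypothetical.
\begin{filecontents*}[overwrite]{rtl-demo.bib}
@book{rtl_demo,
  title = {\textarabic{كتاب الحاوي في الطب}},
  note  = {Translated as: The Comprehensive Book of Medicine},
}
\end{filecontents*}
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}   % left-to-right default for the whole document
\setotherlanguage{arabic}   % defines \textarabic for right-to-left fragments
\newfontfamily\arabicfont[Script=Arabic]{Amiri}
\usepackage[backend=biber, style=authoryear]{biblatex}
\usepackage{csquotes}
\DeclareLanguageMapping{arabic}{british} % keep biblatex's localisation strings British English
\addbibresource{rtl-demo.bib}
\begin{document}
\nocite{rtl_demo}           % list the entry without citing it in the text
\printbibliography
\end{document}
```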
Preservation
Preserving text with double braces is useful elsewhere too. Another common problem is that many latinised scripts
contain special characters that need protecting with double braces, as in this Arabic example:
```
publisher = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}}
                ^          ^        ^
                \----------\--------\--- breaks biblatex without {{double}}
```
The double braces preserve the string exactly as written. Another example is:
```bibtex
author = {Dundee Museum}
```
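which will render as ‘Museum, Dundee’, because it is parsed as a personal name with given name ‘Dundee’ and family name ‘Museum’, unless you write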
```bibtex
author = {{Dundee Museum}}
```
Dates
Dates use the EDTF (ISO 8601-2) standard. BibLaTeX handles BCE/CE dates correctly, and can also
omit era markers where they would be pointless or distracting. The full syntax for dates is
in the BibLaTeX user manual: https://ctan.org/macros/latex/contrib/biblatex/doc/biblatex.pdf
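A few date forms as I understand BibLaTeX's EDTF support (check the manual linked above before relying on them). The entry keys, titles and file name are hypothetical. Note that EDTF uses astronomical year numbering, so year 0000 is 1 BCE and -0599 is 600 BCE, which is why the Sushruta entry below uses -0599 for "circa 600 BCE".

```latex
% Hypothetical entries illustrating EDTF (ISO 8601-2) dates for biblatex/biber.
\begin{filecontents*}[overwrite]{dates-demo.bib}
@misc{precise,   title = {A precise date},       date = {1898-05-12}}
@misc{year_only, title = {A year only},          date = {1898}}
@misc{approx,    title = {An approximate year},  date = {0900~}}
@misc{range,     title = {A range of years},     date = {1898/1901}}
@misc{bce,       title = {A BCE year},           date = {-3399}}
@misc{bce_range, title = {An approximate range}, date = {-0599~/-0499~}}
\end{filecontents*}
% Notes on the entries above:
%   0900~          trailing ~ marks the date as approximate (circa 900 CE)
%   1898/1901      a slash separates the start and end of a range
%   -3399          astronomical numbering: 0000 is 1 BCE, so -3399 is 3400 BCE
%   -0599~/-0499~  a range with approximate ends: c. 600 BCE to c. 500 BCE
% Whether 'circa' and BCE/CE are actually printed depends on the datecirca,
% dateera and dateeraauto options passed to biblatex.
```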
Long lists of authors
For very long lists of authors (such as [Meisner2024] in this bibliography), include all authors
separated by ‘and’, rather than writing ‘and others’. BibLaTeX has been set up in the
preamble with maxcitenames/mincitenames and maxbibnames so that it will say ‘et al.’ in the
text but include all names in the bibliography.
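A hypothetical entry showing the convention; the names, title, journal and key are invented, and [Meisner2024] itself is not reproduced here. The comments describe what the preamble options should do with it.

```latex
% Hypothetical entry: list every author, joined by 'and', never 'and others'.
\begin{filecontents*}[overwrite]{authors-demo.bib}
@article{many_authors_2024,
  author  = {Alpha, Ann and Bravo, Ben and Charlie, Cat and Delta, Dan
             and Echo, Eve and Foxtrot, Fay},
  title   = {A Paper with Many Authors},
  journal = {Journal of Examples},
  date    = {2024},
}
\end{filecontents*}
% With maxcitenames=2 and mincitenames=1, six names exceed the limit, so the
% citation is shortened to the first author plus 'et al.'.
% With maxbibnames=99, all six names are printed in the bibliography.
```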
Full example
Here is a fictional example in full, handling both Chinese and right-to-left Arabic, with
translations in latin script and correct use of the -addon and note fields.
```bibtex
author      = {冷開泰},              <-- name in original script
shortauthor = {Leng Kaitai},        <-- English approximation
nameaddon   = {Lěng Kāitài},        <-- correctly latinised (pinyin)
title       = {\textarabic{كتاب الحاوي في الطب}},   <-- right-to-left preserved
titleaddon  = {Kitāb al-Ḥāwī fī al-ṭibb},           <-- latinised
note        = {Translated as: The Comprehensive Book of Medicine},   <-- English translation
publisher   = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}}   <-- latinised, with unsafe quotes
```
NB: ‘authoraddon’ is not a valid field name, although it would seem a logical choice
in place of shortauthor.
A complete real-world reference looks like this:
```bibtex
@book{sushruta_samhita_1907,
  title      = {The {Suśruta-Saṃhitā}},
  titleaddon = {\textsanskrit{सुश्रुतसंहिता}},
  author     = {{Suśruta (composite work)}},
  nameaddon  = {\textsanskrit{सुश्रुत}},
  translator = {Bhishagratna, Kaviraj Kunja Lal},
  date       = {1907},
  origdate   = {-0599~/-0499~},
  location   = {Calcutta},
  url        = {https://wellcomecollection.org/works/vnqskk8w/items?canvas=98&manifest=2},
  note       = {English translation of the original Sanskrit text (circa 600 BCE--500
                BCE), including discussion on transmissibility. The
                \href{https://www.wisdomlib.org/hinduism/book/sushruta-samhita-volume-2-nidanasthana/d/doc142863.html}
                {Wisdom Library translation} appears to be similar.},
  keywords   = {ancient},
}
```
LaTeX setup for this bibliography
```latex
%%
%% Font setup for Latin and Chinese
%%
% The following Chinese font support requires an exact font name match. E.g. on my Linux
% I install the package adobe-source-han-serif-cn-fonts, followed by 'fc-list | grep "Han Serif"'.
% For reference, on my system 'fc-list | grep Han' gives 136 lines, because
% I installed all the Adobe CJK fonts (Chinese, Japanese, Korean) as recommended by CJK
% experts.
% The order of package loading really matters because some of the references use bidi
% (bi-directional) text in order to display the relevant Arabic, for which there is no
% translation to a Western language. bidi was a retrofit onto LaTeX and is a bit sensitive.
% If bidi wasn't needed, packages could be loaded in any order. These problems are steadily
% reducing as lualatex is developed. Lualatex is really quite an impressive redevelopment.
% Maths comes first in a bidi world
\usepackage{amsmath}
\usepackage{amssymb}
% Fonts next for bidi ordering reasons
\usepackage{fontspec}
\usepackage{luatexja-fontspec} % CJK handling (not just ja). No equivalent needed for other languages.
% Not needed at all except for bidi. It "stabilises arrays for bidi" according to experts.
% I don't understand why, but it did make errors go away.
\usepackage{array}
% Polyglossia is needed for language-aware hyphenation, date formats, quote style etc.
% in at least the csquotes and biblatex packages. It replaces the older babel package.
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{arabic}
\setotherlanguage{sanskrit}
% No \setotherlanguage above for Chinese, because luatexja handles it and we don't want
% polyglossia and luatexja to get into a fight about who captures the incoming CJK Unicode.
% This potential clash is a recurring theme in this preamble.
\newfontfamily\arabicfont[Script=Arabic]{Amiri}
\newfontfamily\arabicfonttt[Script=Arabic]{Amiri}
\newfontfamily\chinesefont{Source Han Serif SC}[
    Renderer=Harfbuzz,
    Script=CJK,
    AutoFakeSlant=0.2, % CJK doesn't have italics, so this does a bit of judicious tilting
    AutoFakeBold=2
]
\ltjsetparameter{jacharrange={-1}} % Prevents luatexja from being too aggressive and
% seizing all Unicode text that might be CJK, including parts of biblatex references
% that are merely adjacent to CJK text.
\setmainjfont{Source Han Serif SC}[
    Index=2,
    Renderer=Harfbuzz,
    AutoFakeSlant=0.2,
    CharacterWidth=Full, % Forces better mapping of CJK punctuation
    BoldFont={* Bold}    % Explicitly point to the bold weight so the font loader can find it
]
% The Index=2 above selects which font to use inside a TrueType/OpenType collection.
% In this case, 2 is Simplified Chinese. LaTeX sometimes gets confused and picks (say)
% the Japanese version, so we are explicit. 0=Japanese, 1=Korean, 2=SC, 3=TC.
% Now repeat the above for the sans slot ("gothic" in Japanese typesetting), deliberately
% pointing it at the same serif font. The trick is that if LaTeX wants a Han sans font,
% it gets the serif font instead. Reduces errors and the result seems good. Need to check
% with a CJK expert.
\setsansjfont{Source Han Serif SC}[
    Index=2,
    Renderer=Harfbuzz,
    AutoFakeSlant=0.2,
    CharacterWidth=Full,
    BoldFont={* Bold}
]
% The above three commands (\ltjsetparameter, \setmainjfont, \setsansjfont) collectively
% avoid hundreds of warnings about missing fonts, and give a better quality result.
\newfontfamily\devanagarifont[
    Script=Devanagari,
    HyphenChar=None, % Explicitly disable hyphenation, otherwise there are warnings that hyphenation rules can't be loaded
    ItalicFont={Noto Serif Devanagari},          % This script has no italics, so italic maps to upright
    BoldItalicFont={Noto Serif Devanagari Bold}  % and bold italic to bold, unlike CJK where we fake slant. Also stops biblatex warnings.
]{Noto Serif Devanagari}
\setmainfont{TeX Gyre Pagella}
\newfontfamily\bigquotefont{TeX Gyre Cursor} % used for big block quotes
% The Gyre project (https://www.gust.org.pl/projects/e-foundry/tex-gyre/index_html)
% explains it all, but basically these are TeX and OpenType font families similar to
% the well-known commercial fonts with similar names, with significantly more functionality.
% On my Linux I installed the package tex-gyre-fonts.
% Disable small caps. CJK fonts don't have small caps, but some bibliography styles set
% author names in small caps, which generates a warning for each CJK name it can't render.
% Note that these two lines disable small caps for all text, not just CJK.
\let\scshape\upshape
\let\textsc\textup
%%
%% Referencing and quoting setup
%%
\usepackage{authblk}
\usepackage[
    backend=biber,
    style=authoryear,  % alternatives include numeric, apa, etc.
    doi=true,
    url=true,
    isbn=false,
    datecirca=true,    % prints "circa" for dates marked approximate with a trailing ~ (EDTF), e.g. 1898~
    dateera=secular,   % handles BCE and CE by printing "BCE/CE"
    dateeraauto=1600,  % adds CE/BCE to anything before this year CE
    backref=true,      % great idea, but sometimes gets confused with the preview feature
    dateabbrev=false,
    language=auto,     % will change language according to langid, if present in an entry
    autolang=other,    % Use polyglossia/babel environments (but I am unsure why I need to set it)
    maxcitenames=2,    % Keep citations short: (Zhang et al., 2024)
    mincitenames=1,
    maxbibnames=99,    % List all authors in the bibliography (who doesn't? rude!)
    uniquelist=false   % Prevents BibLaTeX from adding names to disambiguate
]{biblatex}
% Let long DOIs and URLs in the bibliography break, avoiding overfull boxes and other
% errors; it also looks nicer.
\setcounter{biburllcpenalty}{7000}
\setcounter{biburlucpenalty}{8000}
\setcounter{biburlnumpenalty}{9000}
% Forces a gap between bib entries that PDF viewers can recognise as a boundary when doing
% a mouseover preview in the main text. Also just makes a bibliography look nicer.
\setlength{\bibitemsep}{1.5\itemsep}
% These mappings make sure biblatex doesn't start translating locale-specific things
% like date formats or 'Appendix', 'Bibliography' etc. Other languages are merely content.
% 'british' is equivalent to the modern en_GB locale standard. This also means that the
% default is left-to-right even in an entry containing Arabic text enclosed in \textarabic{}.
% This also suppresses error messages from biblatex about 'Language not supported'.
\DeclareLanguageMapping{arabic}{british}
\DeclareLanguageMapping{chinese}{british}
\DeclareLanguageMapping{sanskrit}{british}
% Force printing 'nameaddon' before the author name. I use nameaddon exclusively for latin
% versions of Chinese (etc.) names, so the effect is to render the latin version alongside the
% real, untranslated name in the references. See notes at the top of the .bib file for details
% of translation in the 'note' field, and the special case of right-to-left Arabic script in
% author names. This macro completely replaces authoryear's own author macro, so it also made
% dates vanish until I added them back here.
% When forcing the printing of nameaddon here, we must use the nameaddon field, not shortauthor.
% The trailing % signs stop line ends from inserting stray spaces.
\AtBeginDocument{%
  \renewbibmacro*{author}{%
    \printfield{nameaddon}%
    \setunit{\addspace}%
    \printnames{author}%
    \setunit{\addspace}%
    % This macro handles the label (derived from the 'date' field)
    % while respecting BCE/CE and circa formatting according to EDTF (ISO 8601-2) dates.
    % Modern lualatex tries to be standards-compliant.
    \usebibmacro{date+extradate}%
  }%
}
% Make biblatex do quotation handling as expected for a paper
\usepackage{csquotes}
% The following macro seems to approximate the style I see in academic papers.
\DeclareCiteCommand{\parencite}
  [\mkbibbrackets] % replaces parentheses with square brackets
  {\usebibmacro{prenote}}
  {\usebibmacro{citeindex}%
   \usebibmacro{cite}}
  {\multicitedelim}
  {\usebibmacro{postnote}}
\addbibresource{discovering-epidemiology.bibtex}
```
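The preamble stops at \addbibresource. A minimal document body to go with it might look like the following sketch. The sentence of text is invented, and I assume the \documentclass line and hyperref are loaded somewhere not shown, since the \href in the Sushruta note needs hyperref.

```latex
\begin{document}
% \parencite gives the square-bracket form defined in the preamble.
Ancient texts already discussed transmissibility \parencite{sushruta_samhita_1907}.

% Full bibliography; maxbibnames=99 means every author is listed.
\printbibliography
% Alternatively, print only entries tagged keywords = {ancient}:
% \printbibliography[keyword=ancient, title={Ancient sources}]
\end{document}
```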