{"version": "https://jsonfeed.org/version/1.1", "title": "GitHub Issues: lukasschwab/arxiv.py", "expired": false, "home_page_url": "https://github.com/lukasschwab/arxiv.py", "feed_url": "http://us-central1-arxiv-feeds.cloudfunctions.net/?repository=arxiv.py&username=lukasschwab", "icon": "https://github.githubassets.com/favicons/favicon.svg", "items": [{"id": "206", "url": "https://github.com/lukasschwab/arxiv.py/pull/206", "title": "Replace feedparser with lxml-based Atom parser", "content_html": "<h1>Description</h1>\n<p>Replaces <code>feedparser</code> with a small, namespace-aware lxml parser tailored to the arXiv API (<a href=\"./arxiv/_feed.py\"><code>arxiv/_feed.py</code></a>).</p>\n<p><strong>Why:</strong> arXiv's Atom feed declares three custom namespaces (<code>arxiv:</code>, <code>opensearch:</code>, <code>dc:</code>), and feedparser's lenient, schema-flattening model silently drops or collapses extension elements \u2014 most painfully the per-author <code>&lt;arxiv:affiliation&gt;</code> elements that <code>&lt;author&gt;</code> can carry. This is the root cause of <a href=\"https://github.com/lukasschwab/arxiv.py/issues/62\">#62</a> and is reported upstream in <a href=\"https://github.com/kurtmckee/feedparser/issues/24\">kurtmckee/feedparser#24</a> and <a href=\"https://github.com/kurtmckee/feedparser/issues/145\">kurtmckee/feedparser#145</a>.</p>\n<p>The arXiv API always returns well-formed XML, so feedparser's tolerance buys us nothing here. 
Parsing directly with lxml is meaningfully faster and gives us full control over extension elements.</p>\n<h3>What's new</h3>\n<ul>\n<li>New internal <code>arxiv._feed</code> module exposing plain dataclasses (<code>FeedEntry</code>, <code>FeedAuthor</code>, <code>Link</code>, <code>Category</code>, <code>FeedHeader</code>, <code>ParsedFeed</code>) instead of feedparser's flattened dict-shape.</li>\n<li><code>Result.Author.affiliation: list[str]</code> \u2014 populated from <code>&lt;arxiv:affiliation&gt;</code> children. <strong>Closes #62.</strong></li>\n<li><code>arxiv/__init__.py</code> translates parsed entries into the existing public <code>Result</code> / <code>Result.Author</code> / <code>Result.Link</code> types; no public field is removed.</li>\n<li><code>UnexpectedEmptyPageError.raw_feed</code> is now a <code>ParsedFeed</code> rather than a <code>FeedParserDict</code>. <code>Result._raw</code> is now a <code>FeedEntry</code>.</li>\n</ul>\n<h3>Benchmarks</h3>\n<p><code>scripts/benchmark_parser.py</code> parses every recorded golden-file fixture under <code>tests/fixtures/</code> 50 times with each parser and reports the median.</p>\n<pre><code>Fixtures: 21 files, 506.6 KiB total\nIterations per fixture: 50\n\nfixture                                          size(KiB)  feedparser(ms)   lxml(ms)  speedup\n----------------------------------------------------------------------------------------------\nid_astro-ph_0601001_s0_m100_3150dca9dc76.json          3.5           1.456      0.163     8.9x\nid_hep-ex_0406020v1_s0_m100_84b15d66c0ce.json          1.8           1.075      0.129     8.3x\nid_1605.08386_s0_m100_199195aa31ad.json                1.9           1.148      0.135     8.5x\nq_testing_s0_m100_cfe7536492ca.json                  226.1          83.024     10.866     7.6x\nq_testing_s0_m10_bc5a6c384fe9.json                    24.4           8.676      1.128     7.7x\n... 
(full table in benchmark output)\n----------------------------------------------------------------------------------------------\nTOTAL                                                506.6         190.913     25.413     7.5x\n</code></pre>\n<p>A consistent ~7.5x speedup across the corpus, scaling cleanly with payload size.</p>\n<p>Of course, these speedups are dwarfed by network/API latency in practice.</p>\n<h3>Test coverage</h3>\n<ul>\n<li><code>test_author_affiliations</code> \u2014 uses <code>astro-ph/0601001</code> (four authors, each with an affiliation: <code>Ohio State</code>, <code>Ohio State</code>, <code>CfA</code>, <code>Warsaw University Observatory</code>). Regression test for #62.</li>\n<li><code>test_author_no_affiliation</code> \u2014 confirms <code>affiliation == []</code> on the common case using <code>1605.08386</code>.</li>\n<li><code>test_from_feed_entry</code> was previously calling <code>_parse_feed</code> with a bare <code>?search_query=testing</code> URL; switched to a fixed <code>id_list=[1605.08386]</code> so the test is deterministic and shares an existing golden file.</li>\n<li>New golden fixture recorded for <code>astro-ph/0601001</code>; the now-unused <code>q_testing_s0_m10_df962dbeea2e.json</code> fixture was removed.</li>\n<li>Full suite passes both offline (<code>pytest</code>) and live (<code>pytest --live</code>).</li>\n</ul>\n<h2>Breaking changes</h2>\n<ul>\n<li><code>feedparser</code> is no longer a dependency. Anything reaching into <code>Result._raw.*</code> and expecting <code>FeedParserDict</code> attribute names (<code>updated_parsed</code>, <code>arxiv_comment</code>, etc.) 
will need to migrate to the new <code>FeedEntry</code> field names (<code>updated</code> as <code>datetime</code>, <code>comment</code>, <code>journal_ref</code>, <code>doi</code>, <code>primary_category</code>, <code>categories</code>).</li>\n<li><code>UnexpectedEmptyPageError.raw_feed</code> is now <code>arxiv._feed.ParsedFeed</code> rather than <code>feedparser.FeedParserDict</code>.</li>\n<li>New runtime dep: <code>lxml&gt;=5.0,&lt;7.0</code>.</li>\n</ul>\n<h1>Relevant issues</h1>\n<ul>\n<li>Closes <a href=\"https://github.com/lukasschwab/arxiv.py/issues/62\">#62</a> \u2014 author affiliations are now exposed on <code>Result.Author.affiliation</code>.</li>\n<li>Conceptually addresses <a href=\"https://github.com/kurtmckee/feedparser/issues/24\">kurtmckee/feedparser#24</a> and <a href=\"https://github.com/kurtmckee/feedparser/issues/145\">kurtmckee/feedparser#145</a> by sidestepping feedparser entirely for arXiv responses.</li>\n</ul>\n<h1>Checklist</h1>\n<ul>\n<li>[x] (If appropriate) <code>README.md</code> example usage has been updated. <em>(No README changes needed \u2014 feedparser isn't referenced there, and the public <code>Result</code> API is unchanged except for the additive <code>Author.affiliation</code> field.)</em></li>\n</ul>\n<hr />\n<p><em>PR drafted by Shelley (Claude) on behalf of @lukasschwab.</em></p>", "content_text": "# Description\r\n\r\nReplaces `feedparser` with a small, namespace-aware lxml parser tailored to the arXiv API ([`arxiv/_feed.py`](./arxiv/_feed.py)).\r\n\r\n**Why:** arXiv's Atom feed declares three custom namespaces (`arxiv:`, `opensearch:`, `dc:`), and feedparser's lenient, schema-flattening model silently drops or collapses extension elements \u2014 most painfully the per-author `<arxiv:affiliation>` elements that `<author>` can carry. 
This is the root cause of [#62](https://github.com/lukasschwab/arxiv.py/issues/62) and is reported upstream in [kurtmckee/feedparser#24](https://github.com/kurtmckee/feedparser/issues/24) and [kurtmckee/feedparser#145](https://github.com/kurtmckee/feedparser/issues/145).\r\n\r\nThe arXiv API always returns well-formed XML, so feedparser's tolerance buys us nothing here. Parsing directly with lxml is meaningfully faster and gives us full control over extension elements.\r\n\r\n### What's new\r\n\r\n- New internal `arxiv._feed` module exposing plain dataclasses (`FeedEntry`, `FeedAuthor`, `Link`, `Category`, `FeedHeader`, `ParsedFeed`) instead of feedparser's flattened dict-shape.\r\n- `Result.Author.affiliation: list[str]` \u2014 populated from `<arxiv:affiliation>` children. **Closes #62.**\r\n- `arxiv/__init__.py` translates parsed entries into the existing public `Result` / `Result.Author` / `Result.Link` types; no public field is removed.\r\n- `UnexpectedEmptyPageError.raw_feed` is now a `ParsedFeed` rather than a `FeedParserDict`. 
`Result._raw` is now a `FeedEntry`.\r\n\r\n### Benchmarks\r\n\r\n`scripts/benchmark_parser.py` parses every recorded golden-file fixture under `tests/fixtures/` 50 times with each parser and reports the median.\r\n\r\n```\r\nFixtures: 21 files, 506.6 KiB total\r\nIterations per fixture: 50\r\n\r\nfixture                                          size(KiB)  feedparser(ms)   lxml(ms)  speedup\r\n----------------------------------------------------------------------------------------------\r\nid_astro-ph_0601001_s0_m100_3150dca9dc76.json          3.5           1.456      0.163     8.9x\r\nid_hep-ex_0406020v1_s0_m100_84b15d66c0ce.json          1.8           1.075      0.129     8.3x\r\nid_1605.08386_s0_m100_199195aa31ad.json                1.9           1.148      0.135     8.5x\r\nq_testing_s0_m100_cfe7536492ca.json                  226.1          83.024     10.866     7.6x\r\nq_testing_s0_m10_bc5a6c384fe9.json                    24.4           8.676      1.128     7.7x\r\n... (full table in benchmark output)\r\n----------------------------------------------------------------------------------------------\r\nTOTAL                                                506.6         190.913     25.413     7.5x\r\n```\r\n\r\nA consistent ~7.5x speedup across the corpus, scaling cleanly with payload size.\r\n\r\nOf course, these speedups are dwarfed by network/API latency in practice.\r\n\r\n### Test coverage\r\n\r\n- `test_author_affiliations` \u2014 uses `astro-ph/0601001` (four authors, each with an affiliation: `Ohio State`, `Ohio State`, `CfA`, `Warsaw University Observatory`). 
Regression test for #62.\r\n- `test_author_no_affiliation` \u2014 confirms `affiliation == []` on the common case using `1605.08386`.\r\n- `test_from_feed_entry` was previously calling `_parse_feed` with a bare `?search_query=testing` URL; switched to a fixed `id_list=[1605.08386]` so the test is deterministic and shares an existing golden file.\r\n- New golden fixture recorded for `astro-ph/0601001`; the now-unused `q_testing_s0_m10_df962dbeea2e.json` fixture was removed.\r\n- Full suite passes both offline (`pytest`) and live (`pytest --live`).\r\n\r\n## Breaking changes\r\n\r\n- `feedparser` is no longer a dependency. Anything reaching into `Result._raw.*` and expecting `FeedParserDict` attribute names (`updated_parsed`, `arxiv_comment`, etc.) will need to migrate to the new `FeedEntry` field names (`updated` as `datetime`, `comment`, `journal_ref`, `doi`, `primary_category`, `categories`).\r\n- `UnexpectedEmptyPageError.raw_feed` is now `arxiv._feed.ParsedFeed` rather than `feedparser.FeedParserDict`.\r\n- New runtime dep: `lxml>=5.0,<7.0`.\r\n\r\n# Relevant issues\r\n\r\n- Closes [#62](https://github.com/lukasschwab/arxiv.py/issues/62) \u2014 author affiliations are now exposed on `Result.Author.affiliation`.\r\n- Conceptually addresses [kurtmckee/feedparser#24](https://github.com/kurtmckee/feedparser/issues/24) and [kurtmckee/feedparser#145](https://github.com/kurtmckee/feedparser/issues/145) by sidestepping feedparser entirely for arXiv responses.\r\n\r\n# Checklist\r\n\r\n- [x] (If appropriate) `README.md` example usage has been updated. 
*(No README changes needed \u2014 feedparser isn't referenced there, and the public `Result` API is unchanged except for the additive `Author.affiliation` field.)*\r\n\r\n---\r\n\r\n*PR drafted by Shelley (Claude) on behalf of @lukasschwab.*\r\n", "date_published": "2026-05-11T23:55:51Z", "date_modified": "2026-05-14T02:12:57Z", "tags": ["4.0.0"], "authors": [{"name": "lukasschwab", "url": "https://github.com/lukasschwab", "avatar": "https://avatars.githubusercontent.com/u/4955943?v=4"}]}, {"id": "62", "url": "https://github.com/lukasschwab/arxiv.py/issues/62", "title": "Author affiliations missing from `Result.Author`s", "content_html": "<h1>Description</h1>\n<blockquote>\n<p>A clear and concise description of what the bug is.</p>\n</blockquote>\n<p>Author affiliations are available in raw arXiv API feeds, but are not exposed by this package's <code>Result</code> objects.</p>\n<h2>Steps to reproduce</h2>\n<blockquote>\n<p>Steps to reproduce the behavior; ideally, include a code snippet.</p>\n</blockquote>\n<p>Apparent for any result set.</p>\n<ul>\n<li>There's no mention of affiliations in <a href=\"http://lukasschwab.me/arxiv.py/index.html\">this package's documentation</a> or in the source code.</li>\n<li><code>(Result)._raw.arxiv_affiliation</code> is often defined, but it's a single string\u2013\u2013the affiliation of one author among several.</li>\n</ul>\n<h2>Expected behavior</h2>\n<blockquote>\n<p>A clear and concise description of what you expected to happen.</p>\n</blockquote>\n<p>Author affiliations should be exposed by the <code>Result.Author</code> class.</p>\n<h1>Versions</h1>\n<ul>\n<li><code>python</code> version: *</li>\n<li><code>arxiv.py</code> version: &gt;= 1.0.0</li>\n</ul>\n<h1>Additional context</h1>\n<blockquote>\n<p>Add any other context about the problem here.</p>\n</blockquote>\n<p>This is a long-open issue in <code>feedparser</code>, perhaps open since 2015: https://github.com/kurtmckee/feedparser/issues/24. 
There's a detailed breakdown of the interaction with arXiv results here: https://github.com/kurtmckee/feedparser/issues/145#issuecomment-821762233. I suspect arXiv will release their JSON API \u2013\u2013and this client library will be rewritten to use the JSON API\u2013\u2013before this <code>feedparser</code> bug is resolved.</p>\n<p>This client library <em>could</em> expose the single author affiliation extracted by <code>feedparser</code>, but this has negative impacts:</p>\n<ul>\n<li>It may misleadingly suggest that a certain author or institution led the publication in question, which sucks from an ethical perspective.</li>\n<li>Which affiliation is extracted may depend on the order of the authors, which arXiv may not guarantee. The extracted affiliation of a paper may vary.</li>\n<li>The affiliation may not apply to all of the authors for a paper; exposing it is misleading.</li>\n</ul>\n<p>If the single author affiliation is useful in your application, despite the noted downsides, access it with <code>(Result)._raw.get('arxiv_affiliation')</code>.</p>", "content_text": "# Description\r\n> A clear and concise description of what the bug is.\r\n\r\nAuthor affiliations are available in raw arXiv API feeds, but are not exposed by this package's `Result` objects.\r\n\r\n## Steps to reproduce\r\n> Steps to reproduce the behavior; ideally, include a code snippet.\r\n\r\nApparent for any result set.\r\n\r\n+ There's no mention of affiliations in [this package's documentation](http://lukasschwab.me/arxiv.py/index.html) or in the source code.\r\n+ `(Result)._raw.arxiv_affiliation` is often defined, but it's a single string\u2013\u2013the affiliation of one author among several.\r\n\r\n## Expected behavior\r\n> A clear and concise description of what you expected to happen.\r\n\r\nAuthor affiliations should be exposed by the `Result.Author` class.\r\n\r\n# Versions\r\n\r\n+ `python` version: *\r\n+ `arxiv.py` version: >= 1.0.0\r\n\r\n# Additional context\r\n> Add any 
other context about the problem here.\r\n\r\nThis is a long-open issue in `feedparser`, perhaps open since 2015: https://github.com/kurtmckee/feedparser/issues/24. There's a detailed breakdown of the interaction with arXiv results here: https://github.com/kurtmckee/feedparser/issues/145#issuecomment-821762233. I suspect arXiv will release their JSON API \u2013\u2013and this client library will be rewritten to use the JSON API\u2013\u2013before this `feedparser` bug is resolved.\r\n\r\nThis client library *could* expose the single author affiliation extracted by `feedparser`, but this has negative impacts:\r\n\r\n+ It may misleadingly suggest that a certain author or institution led the publication in question, which sucks from an ethical perspective.\r\n+ Which affiliation is extracted may depend on the order of the authors, which arXiv may not guarantee. The extracted affiliation of a paper may vary.\r\n+ The affiliation may not apply to all of the authors for a paper; exposing it is misleading.\r\n\r\nIf the single author affiliation is useful in your application, despite the noted downsides, access it with `(Result)._raw.get('arxiv_affiliation')`.", "date_published": "2021-04-18T20:57:33Z", "date_modified": "2021-05-02T17:08:48Z", "tags": ["wontfix"], "authors": [{"name": "lukasschwab", "url": "https://github.com/lukasschwab", "avatar": "https://avatars.githubusercontent.com/u/4955943?v=4"}]}, {"id": "15", "url": "https://github.com/lukasschwab/arxiv.py/issues/15", "title": "long id_list not allowed by the API", "content_html": "<p>when making a query with the length of id_list of 642 article names I get:</p>\n<p>File \"arxiv.py/arxiv/arxiv.py\", line 34, in query\n    raise Exception(\"HTTP Error \" + str(results.get('status', 'no status')) + \" in query\")\nException: HTTP Error 414 in query</p>\n<p>This is probably due to some limit in the API. 
Does anybody know more about this?\nShould the library deal with this issue or is it more appropriate to leave it to the user? </p>", "content_text": "when making a query with the length of id_list of 642 article names I get:\r\n\r\nFile \"arxiv.py/arxiv/arxiv.py\", line 34, in query\r\n    raise Exception(\"HTTP Error \" + str(results.get('status', 'no status')) + \" in query\")\r\nException: HTTP Error 414 in query\r\n\r\nThis is probably due to some limit in the API. Does anybody know more about this?\r\nShould the library deal with this issue or is it more appropriate to leave it to the user? ", "date_published": "2018-08-02T00:40:51Z", "date_modified": "2021-07-12T18:45:28Z", "tags": ["api"], "authors": [{"name": "luisberlioz", "url": "https://github.com/luisberlioz", "avatar": "https://avatars.githubusercontent.com/u/8170841?v=4"}]}]}