Search Results - "Suarez, Pedro Ortiz" :: Catalog Search

1
Automatic extraction of materials and properties from superconductors scientific literature by Foppiano, Luca, Castro, Pedro Baptista, Ortiz Suarez, Pedro, Terashima, Kensei, Takano, Yoshihiko, Ishii, Masashi

Published in Science and technology of advanced materials. Methods (31-12-2023)
“…The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials…”

Journal Article

2
Semi-automatic staging area for high-quality structured data extraction from scientific literature by Foppiano, Luca, Mato, Tomoya, Terashima, Kensei, Ortiz Suarez, Pedro, Tou, Taku, Sakai, Chikako, Wang, Wei-Sheng, Amagasa, Toshiyuki, Takano, Yoshihiko, Ishii, Masashi

Published in Science and technology of advanced materials. Methods (31-12-2023)
“…We propose a semi-automatic staging area for efficiently building an accurate database of experimental physical properties of superconductors from literature,…”

Journal Article

3
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets by Kreutzer, Julia, Caswell, Isaac, Wang, Lisa, Wahab, Ahsan, van Esch, Daan, Ulzii-Orshikh, Nasanbayar, Tapo, Allahsera, Subramani, Nishant, Sokolov, Artem, Sikasote, Claytone, Setyawan, Monang, Sarin, Supheakmungkol, Samb, Sokhar, Sagot, Benoît, Rivera, Clara, Rios, Annette, Papadimitriou, Isabel, Osei, Salomey, Suarez, Pedro Ortiz, Orife, Iroro, Ogueji, Kelechi, Rubungo, Andre Niyongabo, Nguyen, Toan Q., Müller, Mathias, Müller, André, Muhammad, Shamsuddeen Hassan, Muhammad, Nanda, Mnyakeni, Ayanda, Mirzakhalov, Jamshidbek, Matangira, Tapiwanashe, Leong, Colin, Lawson, Nze, Kudugunta, Sneha, Jernite, Yacine, Jenny, Mathias, Firat, Orhan, Dossou, Bonaventure F. P., Dlamini, Sakhile, de Silva, Nisansa, Çabuk Ballı, Sakine, Biderman, Stella, Battisti, Alessia, Baruwa, Ahmed, Bapna, Ankur, Baljekar, Pallavi, Azime, Israel Abebe, Awokoya, Ayodele, Ataman, Duygu, Ahia, Orevaoghene, Ahia, Oghenefego, Agrawal, Sweta, Adeyemi, Mofetoluwa

Published in Transactions of the Association for Computational Linguistics (31-01-2022)
“…With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large,…”

Journal Article

4
Semi-automatic staging area for high-quality structured data extraction from scientific literature by Foppiano, Luca, Mato, Tomoya, Terashima, Kensei, Suarez, Pedro Ortiz, Tou, Taku, Sakai, Chikako, Wang, Wei-Sheng, Amagasa, Toshiyuki, Takano, Yoshihiko, Ishii, Masashi

Published 16-11-2023
“…We propose a semi-automatic staging area for efficiently building an accurate database of experimental physical properties of superconductors from literature,…”

Journal Article

5
Automatic extraction of materials and properties from superconductors scientific literature by Foppiano, Luca, de Castro, Pedro Baptista, Suarez, Pedro Ortiz, Terashima, Kensei, Takano, Yoshihiko, Ishii, Masashi

Published 23-11-2022
“…STAM:M, 2023, VOL. 3, NO. 1, 2153633 The automatic extraction of materials and related properties from the scientific literature is gaining attention in…”

Journal Article

6
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets by Kreutzer, Julia, Caswell, Isaac, Wang, Lisa, Wahab, Ahsan, van Esch, Daan, Ulzii-Orshikh, Nasanbayar, Tapo, Allahsera, Subramani, Nishant, Sokolov, Artem, Sikasote, Claytone, Setyawan, Monang, Sarin, Supheakmungkol, Samb, Sokhar, Sagot, Benoît, Rivera, Clara, Rios, Annette, Papadimitriou, Isabel, Osei, Salomey, Suarez, Pedro Ortiz, Orife, Iroro, Ogueji, Kelechi, Rubungo, Andre Niyongabo, Nguyen, Toan Q, Müller, Mathias, Müller, André, Muhammad, Shamsuddeen Hassan, Muhammad, Nanda, Mnyakeni, Ayanda, Mirzakhalov, Jamshidbek, Matangira, Tapiwanashe, Leong, Colin, Lawson, Nze, Kudugunta, Sneha, Jernite, Yacine, Jenny, Mathias, Firat, Orhan, Dossou, Bonaventure F. P, Dlamini, Sakhile, de Silva, Nisansa, Ballı, Sakine Çabuk, Biderman, Stella, Battisti, Alessia, Baruwa, Ahmed, Bapna, Ankur, Baljekar, Pallavi, Azime, Israel Abebe, Awokoya, Ayodele, Ataman, Duygu, Ahia, Orevaoghene, Ahia, Oghenefego, Agrawal, Sweta, Adeyemi, Mofetoluwa

Published 21-02-2022
“…Transactions of the Association for Computational Linguistics (2022) 10: 50-72 With the success of large-scale pre-training and multilingual modeling in…”

Journal Article

7
Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data by Jansen, Tim, Tong, Yangling, Zevallos, Victoria, Suarez, Pedro Ortiz

Published 20-12-2022
“…As demand for large corpora increases with the size of current state-of-the-art language models, using web data as the main part of the pre-training corpus for…”

Journal Article

8
Molyé: A Corpus-based Approach to Language Contact in Colonial France by Dent, Rasul, Janès, Juliette, Clérice, Thibault, Suarez, Pedro Ortiz, Sagot, Benoît

Published 08-08-2024
“…Whether or not several Creole languages which developed during the early modern period can be considered genetic descendants of European languages has been the…”

Journal Article

9
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus by Abadji, Julien, Suarez, Pedro Ortiz, Romary, Laurent, Sagot, Benoît

Published 17-01-2022
“…The need for raw large raw corpora has dramatically increased in recent years with the introduction of transfer learning and semi-supervised learning methods…”

Journal Article

10
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus by Futeral, Matthieu, Zebaze, Armel, Suarez, Pedro Ortiz, Abadji, Julien, Lacroix, Rémi, Schmid, Cordelia, Bawden, Rachel, Sagot, Benoît

Published 12-06-2024
“…Multimodal Large Language Models (mLLMs) are trained on a large amount of text-image data. While most mLLMs are trained on caption-like data only, Alayrac et…”

Journal Article

11
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages by Suárez, Pedro Javier Ortiz, Romary, Laurent, Sagot, Benoît

Published 18-06-2020
“…Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020, Online We use the multilingual OSCAR corpus, extracted from…”

Journal Article

12
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French by Gabay, Simon, Suarez, Pedro Ortiz, Bartz, Alexandre, Chagué, Alix, Bawden, Rachel, Gambette, Philippe, Sagot, Benoît

Published 18-02-2022
“…Language models for historical states of language are becoming increasingly important to allow the optimal digitisation and analysis of old textual sources…”

Journal Article

13
Data Processing for the OpenGPT-X Model Family by Brandizzi, Nicolo', Abdelwahab, Hammam, Bhowmick, Anirban, Helmer, Lennard, Stein, Benny Jörg, Denisov, Pavel, Saleem, Qasid, Fromm, Michael, Ali, Mehdi, Rutmann, Richard, Naderi, Farzad, Agy, Mohamad Saif, Schwirjow, Alexander, Küch, Fabian, Hahn, Luzian, Ostendorff, Malte, Suarez, Pedro Ortiz, Rehm, Georg, Wegener, Dennis, Flores-Herr, Nicolas, Köhler, Joachim, Leveling, Johannes

Published 11-10-2024
“…This paper presents a comprehensive overview of the data preparation pipeline developed for the OpenGPT-X project, a large-scale initiative aimed at creating…”

Journal Article

14
CamemBERT: a Tasty French Language Model by Martin, Louis, Muller, Benjamin, Suárez, Pedro Javier Ortiz, Dupont, Yoann, Romary, Laurent, de la Clergerie, Éric Villemonte, Seddah, Djamé, Sagot, Benoît

Published 21-05-2020
“…Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020, Online Pretrained language models are now ubiquitous in…”

Journal Article

15
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs by Ali, Mehdi, Fromm, Michael, Thellmann, Klaudia, Ebert, Jan, Weber, Alexander Arno, Rutmann, Richard, Jain, Charvi, Lübbering, Max, Steinigen, Daniel, Leveling, Johannes, Klug, Katrin, Buschhoff, Jasper Schulze, Jurkschat, Lena, Abdelwahab, Hammam, Stein, Benny Jörg, Sylla, Karl-Heinz, Denisov, Pavel, Brandizzi, Nicolo', Saleem, Qasid, Bhowmick, Anirban, Helmer, Lennard, John, Chelsea, Suarez, Pedro Ortiz, Ostendorff, Malte, Jude, Alex, Manjunath, Lalith, Weinbach, Samuel, Penke, Carolin, Filatov, Oleg, Asaadi, Shima, Barth, Fabio, Sifa, Rafet, Küch, Fabian, Herten, Andreas, Jäkel, René, Rehm, Georg, Kesselheim, Stefan, Köhler, Joachim, Flores-Herr, Nicolas

Published 30-09-2024
“…We present two multilingual LLMs designed to embrace Europe's linguistic diversity by supporting all 24 official languages of the European Union. Trained on a…”

Journal Article

16
Tokenizer Choice For LLM Training: Negligible or Crucial? by Ali, Mehdi, Fromm, Michael, Thellmann, Klaudia, Rutmann, Richard, Lübbering, Max, Leveling, Johannes, Klug, Katrin, Ebert, Jan, Doll, Niclas, Buschhoff, Jasper Schulze, Jain, Charvi, Weber, Alexander Arno, Jurkschat, Lena, Abdelwahab, Hammam, John, Chelsea, Suarez, Pedro Ortiz, Ostendorff, Malte, Weinbach, Samuel, Sifa, Rafet, Kesselheim, Stefan, Flores-Herr, Nicolas

Published 12-10-2023
“…The recent success of Large Language Models (LLMs) has been predominantly driven by curating the training dataset composition, scaling of model architectures…”

Journal Article

17
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources by McMillan-Major, Angelina, Alyafeai, Zaid, Biderman, Stella, Chen, Kimbo, De Toni, Francesco, Dupont, Gérard, Elsahar, Hady, Emezue, Chris, Aji, Alham Fikri, Ilić, Suzana, Khamis, Nurulaqilla, Leong, Colin, Masoud, Maraim, Soroa, Aitor, Suarez, Pedro Ortiz, Talat, Zeerak, van Strien, Daniel, Jernite, Yacine

Published 24-01-2022
“…In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large…”

Journal Article

18
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset by Laurençon, Hugo, Saulnier, Lucile, Wang, Thomas, Akiki, Christopher, del Moral, Albert Villanova, Scao, Teven Le, Von Werra, Leandro, Mou, Chenghao, Ponferrada, Eduardo González, Nguyen, Huu, Frohberg, Jörg, Šaško, Mario, Lhoest, Quentin, McMillan-Major, Angelina, Dupont, Gerard, Biderman, Stella, Rogers, Anna, allal, Loubna Ben, De Toni, Francesco, Pistilli, Giada, Nguyen, Olivier, Nikpoor, Somaieh, Masoud, Maraim, Colombo, Pierre, de la Rosa, Javier, Villegas, Paulo, Thrush, Tristan, Longpre, Shayne, Nagel, Sebastian, Weber, Leon, Muñoz, Manuel, Zhu, Jian, Van Strien, Daniel, Alyafeai, Zaid, Almubarak, Khalid, Vu, Minh Chien, Gonzalez-Dios, Itziar, Soroa, Aitor, Lo, Kyle, Dey, Manan, Suarez, Pedro Ortiz, Gokaslan, Aaron, Bose, Shamik, Adelani, David, Phan, Long, Tran, Hieu, Yu, Ian, Pai, Suhas, Chim, Jenny, Lepercq, Violette, Ilic, Suzana, Mitchell, Margaret, Luccioni, Sasha Alexandra, Jernite, Yacine

Published 07-03-2023
“…As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The…”

Journal Article

19
Establishing a New State-of-the-Art for French Named Entity Recognition by Suárez, Pedro Javier Ortiz, Dupont, Yoann, Muller, Benjamin, Romary, Laurent, Sagot, Benoît

Published 27-05-2020
“…LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France The French TreeBank developed at the University Paris 7 is the main…”

Journal Article