How do dictionaries source attestation?


Some dictionaries give attestations for their entries and try to cite the earliest quotations they can find. How do they find them? Without electronic indexing, this must have been extremely difficult.



The reason I'm asking is that many answers on Stack Exchange cite the earliest attestation and imply that it marks the earliest use, instead of the logically sound "used at the latest in ...". That's a slightly different issue, which I mention to frame the question, not to make it too broad. To be fair, in many cases no inference is drawn from this.










  • Help, I don't know which answer to mark, they are both nearly identical.

    – vectory
    16 hours ago
















lexicography






asked 22 hours ago by vectory
edited 13 hours ago by Peter Mortensen













3 Answers
I agree with your assumption that the date of the earliest recorded usage of a word does not necessarily correspond to the earliest usage of a word, since words may have been in circulation in spoken language before they were first used in publications, and many old publications have simply not survived. I touched upon this issue in my answer to the question How many of Shakespeare's words in his plays were new? on Literature Stack Exchange: a number of words that are first attested in Shakespeare's plays may have circulated earlier, either in the spoken language or in writings that have not survived (or both).



Before there were digital corpora and digital texts, lexicographers had to read physical books to find usage examples. The editors of the first edition of the Oxford English Dictionary (who started their work in 1857 or 1858) asked thousands of volunteer readers to submit usage examples, in what was an early example of crowdsourcing. (See the crowdsourcing timeline, which says that 800 volunteers contributed to the first fascicle alone.)



See also the description of the OED's Reading Programme. James A. H. Murray set this up in 1879; its focus was not specifically on finding the earliest examples of words and phrases. In addition, the first edition "relied heavily on a small number of authors (notably, of course, Shakespeare) for its coverage of Early Modern English (1500-1700)". This selection was possibly too narrow to find the earliest surviving examples of words and phrases. For this reason, the current Historical Reading Programme looks at a broader spectrum of texts:




Today, readers systematically survey a much broader spectrum of texts from this and other periods. A separate Historical Reading Programme has been created to serve this function. Readers participating in this programme supply material specifically for the revision of the OED; usually these are earlier examples of words or phrases that are already included in the Dictionary. These readers check their findings against the Oxford English Dictionary to ensure that only significant items are filed, such as earlier examples of words and meanings or terms that have not yet been registered in the database.




In short, before the digital era, finding early usage examples required a lot of reading.






    Typically, you don't ever really know for certain that you have the earliest example. Or even the earliest written example. It's just the best so far.



    (As a person who frequently writes answers to etymology questions on ELU, I try to make this clear. "According to the OED", "according to my own research", "dates at least back to X" are all things I say, but I sometimes get sloppy and don't do this all the time.)



    There are some exceptions. We can be very certain that "cromulent", for example, was coined in 1996 (or '95, depending on when the episode of The Simpsons was written).





    How etymological research is done has varied through time. In the case of the "New English Dictionary" (the first edition of the Oxford English Dictionary), work started on it in 1857. Then:




    [I]n January 1859, the Society issued their 'Proposal for the publication of a New English Dictionary,' in which the characteristics of the proposed work were explained, and an appeal made to the English and American public to assist in collecting the raw materials for the work, these materials consisting of quotations illustrating the use of English words by all writers of all ages and in all senses, each quotation being made on a uniform plan on a half-sheet of notepaper, that they might in due course be arranged and classified alphabetically and by meanings. This Appeal met with a generous response: some hundreds of volunteers began to read books, make quotations, and send in their slips to 'sub-editors,' who volunteered each to take charge of a letter or part of one, and by whom the slips were in turn further arranged, classified, and to some extent used as the basis of definitions and skeleton schemes of the meanings of words in preparation for the Dictionary.
    An Appeal to the English-Speaking and English-Reading Public to Read Books and Make Extracts for The Philological Society's New English Dictionary




    The "Reading Programme" is still used by the OED, although the methodology is different. Books are still read as before, but here is what happens next, according to a freelance researcher for the OED:




    I then consult OED Online to determine whether the word or phrase is in the Dictionary: if it is not, I submit it as a ‘not-in’, and if it is, I decide whether its form or context is important enough to warrant its submission. If it does qualify, I enter the information into tagged fields in an electronic file that has been set up in a standard format. When I have finished the reading, I submit the file to Oxford or New York, where the records are incorporated into OED’s working database for consideration by the editors, along with thousands of paper citation slips, as they proceed through the current revision. Yes, some of my finds are still submitted as paper slips—a reminder of OED’s long heritage—but, electronic or paper, I can hardly imagine a better job.




    The quotations were collected in a machine-readable format for the first time in 1989. The 1990 UK Reading Programme captured material electronically. (Note that the second edition of the Oxford English Dictionary came out in 1989.)



    In addition to this, the OED now utilizes several online databases of texts, such as Early English Books Online, Eighteenth Century Collections Online, and some newspaper databases.



    If you do your own research with databases (many people use the free Google Books), it's often easy to beat pages that haven't been updated for the third edition of the OED. Updates to the OED3 started in 2000 and continue to this day: it's a huge dictionary and updating takes time.



    See also:




    • OED: Researching the Language






    • I'm asking specifically because Google Books is not satisfactory, not least because OCR fails for old books. I take it that "you don't ever really know for certain" applies to lost copies as well as to books that were never read or scanned. "Hundreds of volunteers" pales in comparison to the number of available books.

      – vectory
      16 hours ago



















    It is not uncommon to find word usage that predates the earliest attestation noted in the OED; for example, such earlier attestations were found in the Corpus of Early English Correspondence.



    Of course, once every written document from the past has been digitised, this process will come to an end, unless some old writings are newly discovered.






    • ... or if the digitization is faulty

      – vectory
      16 hours ago











    • Given enough time and workforce, there will be perfect or near-perfect digitisations of everything and not just mediocre OCR'ed scans. Of course, there is some leeway for different readings or reconstructions of partially destroyed texts.

      – jknappen
      16 hours ago











Answer beginning "I agree with your assumption": answered 20 hours ago by Christophe Strobbe (edited 17 hours ago)























            4














            Typically, you don't ever really know for certain that you have the earliest example. Or even the earliest written example. It's just the best so far.



            (As a person who frequently writes answers to etymology questions on ELU, I try to make this clear. "According to the OED", "according to my own research", "dates at least back to X" are all things I say, but I sometimes get sloppy and don't do this all the time.)



            There are some exceptions. We can be very certain that "cromulent", for example, was coined in 1996 (or '95 depending on when the episode of the Simpsons was written).





            How etymological research is done has varied through time. In the case of the "New English Dictionary" (the first edition of the Oxford English Dictionary), work started on it in 1857. Then:




            [I]n January 1859, the Society issued their 'Proposal for the publication of a New English Dictionary,' in which the characteristics of the proposed work were explained, and an appeal made to the English and American public to assist in collecting the raw materials for the work, these materials consisting of quotations illustrating the use of English words by all writers of all ages and in all senses, each quotation being made on a uniform plan on a half-sheet of notepaper, that they might in due course be arranged and classified alphabetically and by meanings. This Appeal met with a generous response: some hundreds of volunteers began to read books, make quotations, and send in their slips to 'sub-editors,' who volunteered each to take charge of a letter or part of one, and by whom the slips were in tum further arranged, classified, and to some extent used as the basis of definitions and skeleton schemes of the meanings of words in preparation for the Dictionary.
            An Appeal to the English-Speaking and English-Reading Public to Read Books and Make Extracts for The Philological Society's New English Dictionary




            The "Reading Programme" is still used by the OED, although the methodology is different. The books are still read all the same but here's what happens next according to a freelance researcher for the OED:




            I then consult OED Online to determine whether the word or phrase is in the Dictionary: if it is not, I submit it as a ‘not-in’, and if it is, I decide whether its form or context is important enough to warrant its submission. If it does qualify, I enter the information into tagged fields in an electronic file that has been set up in a standard format. When I have finished the reading, I submit the file to Oxford or New York, where the records are incorporated into OED‘s working database for consideration by the editors, along with thousands of paper citation slips, as they proceed through the current revision. Yes, some of my finds are still submitted as paper slips—a reminder of OED‘s long heritage—but, electronic or paper, I can hardly imagine a better job.




            The quotations were collected in a machine readable format for the first time in 1989. The 1990 UK Reading Programme captured material electronically. (Note that the second edition of the Oxford English Dictionary came out in 1989.)



            In addition to this, the OED now utilizes several online databases of texts, such as Early English Books Online, Eighteenth Century Collections Online, and some newspaper databases.



            If you do your own research with databases (many people use the free Google Books), it's often easy to beat pages that haven't been updated for the third edition of the OED. Updates to the OED3 started in 2000 and continue to this day: it's a huge dictionary and updating takes time.



            See also:




            • OED: Researching the Language






            share|improve this answer
























            • I'm asking specifically because gbooks is not satisfying, not the least because OCR fails for old books. I take it that "you don't ever really know for certain" counts for lost copies, as well as otherwise unread/not-scanned books. "hundreds of volunteers" pale in comparison to the amount of available books.

              – vectory
              16 hours ago
















            4














            Typically, you don't ever really know for certain that you have the earliest example. Or even the earliest written example. It's just the best so far.



            (As a person who frequently writes answers to etymology questions on ELU, I try to make this clear. "According to the OED", "according to my own research", "dates at least back to X" are all things I say, but I sometimes get sloppy and don't do this all the time.)



            There are some exceptions. We can be very certain that "cromulent", for example, was coined in 1996 (or '95 depending on when the episode of the Simpsons was written).





            How etymological research is done has varied through time. In the case of the "New English Dictionary" (the first edition of the Oxford English Dictionary), work started on it in 1857. Then:




            [I]n January 1859, the Society issued their 'Proposal for the publication of a New English Dictionary,' in which the characteristics of the proposed work were explained, and an appeal made to the English and American public to assist in collecting the raw materials for the work, these materials consisting of quotations illustrating the use of English words by all writers of all ages and in all senses, each quotation being made on a uniform plan on a half-sheet of notepaper, that they might in due course be arranged and classified alphabetically and by meanings. This Appeal met with a generous response: some hundreds of volunteers began to read books, make quotations, and send in their slips to 'sub-editors,' who volunteered each to take charge of a letter or part of one, and by whom the slips were in tum further arranged, classified, and to some extent used as the basis of definitions and skeleton schemes of the meanings of words in preparation for the Dictionary.
            An Appeal to the English-Speaking and English-Reading Public to Read Books and Make Extracts for The Philological Society's New English Dictionary




            The "Reading Programme" is still used by the OED, although the methodology is different. The books are still read all the same but here's what happens next according to a freelance researcher for the OED:




            I then consult OED Online to determine whether the word or phrase is in the Dictionary: if it is not, I submit it as a ‘not-in’, and if it is, I decide whether its form or context is important enough to warrant its submission. If it does qualify, I enter the information into tagged fields in an electronic file that has been set up in a standard format. When I have finished the reading, I submit the file to Oxford or New York, where the records are incorporated into OED‘s working database for consideration by the editors, along with thousands of paper citation slips, as they proceed through the current revision. Yes, some of my finds are still submitted as paper slips—a reminder of OED‘s long heritage—but, electronic or paper, I can hardly imagine a better job.




            The quotations were collected in a machine-readable format for the first time in 1989. The 1990 UK Reading Programme captured material electronically. (Note that the second edition of the Oxford English Dictionary came out in 1989.)



            In addition, the OED now uses several online databases of texts, such as Early English Books Online, Eighteenth Century Collections Online, and some newspaper databases.



            If you do your own research with databases (many people use the free Google Books), it's often easy to find earlier quotations than those in entries that haven't yet been updated for the third edition of the OED. Updates for the OED3 started in 2000 and continue to this day: it's a huge dictionary and updating takes time.
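
            As a rough sketch of the comparison involved: given the earliest year the dictionary currently records for a word and the dated hits from your own database search, any hit earlier than the recorded year is a candidate antedating. The function name and the `(year, source)` tuples are assumptions made for illustration, and each hit would still need checking against the page image, since OCR text and catalogue dates can be wrong.

```python
def find_antedatings(
    recorded_year: int, hits: list[tuple[int, str]]
) -> list[tuple[int, str]]:
    """Return hits dated before the dictionary's earliest recorded year, oldest first."""
    earlier = [hit for hit in hits if hit[0] < recorded_year]
    return sorted(earlier, key=lambda hit: hit[0])
```

            An empty result only means that this particular search found nothing earlier, not that the recorded date is the true first use.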



            See also:




            • OED: Researching the Language






























            • I'm asking specifically because Google Books is not satisfying, not least because OCR fails for old books. I take it that "you don't ever really know for certain" covers lost copies as well as unread or unscanned books. "Hundreds of volunteers" pales in comparison to the number of available books.

              – vectory
              16 hours ago












































            answered 19 hours ago by Laurel






































            It is not uncommon to find a usage that predates the earliest attestation recorded in the OED; such earlier attestations have been found, for example, in the Corpus of Early English Correspondence.



            Of course, once every surviving written document from the past has been digitised, this process comes to an end, unless some old writings are newly discovered.






























            • ... or if the digitization is faulty

              – vectory
              16 hours ago











            • Given enough time and workforce, there will be perfect or near-perfect digitisations of everything, and not just mediOCRe OCR'ed scans. Of course, there is some leeway for different readings or reconstructions of partially destroyed texts.

              – jknappen
              16 hours ago


























            answered 16 hours ago by jknappen






























