• mozz
    link
    fedilink
    47 months ago

    Simple and just works

        def fetch_html(self, url):
            domain = urllib.parse.urlparse(url).netloc
            if domain not in self.robot_parsers:
                rp = urllib.robotparser.RobotFileParser()
                rp.set_url(f'https://{domain}/robots.txt')
                rp.read()
                self.robot_parsers[domain] = rp
    
            rp = self.robot_parsers[domain]
            if not rp.can_fetch(self.user_agent, url):
                print(f"Fetching not allowed by robots.txt: {url}")
                return None
    
            if self.last_fetch_time:
                time_since_last_fetch = time.time() - self.last_fetch_time
                if time_since_last_fetch < self.delay:
                    time.sleep(self.delay - time_since_last_fetch)
    
            headers = {'User-Agent': self.user_agent}
            response = requests.get(url, headers=headers)
            self.last_fetch_time = time.time()
    
            if response.status_code == 200:
                return response.text
            else:
                print(f"Failed to fetch {url}: {response.status_code}")
                return None
    

    Randomly selected something from a project I’m working on that’s simple and just works. Show me less than 300 lines of .NET to do the same, and I would be somewhat surprised.

      • mozz
        link
        fedilink
        07 months ago

        I am familiar.

        Not saying don’t pay your bills with it; that part sounds great. I was just confused by this guy’s enthusiasm for it, that’s all.

        • @[email protected]
          link
          fedilink
          English
          17 months ago

          Well, for starters, it’s a greentext, so who knows how genuine it is, right? Most of the points listed are either subjective or citation needed fodder. However, maybe there’s one fact I can bring to the table:

          ASP.NET’s benchmark performance ranked 16th in Round 22 of the TechEmpower Web Framework Benchmarks, ranking below solutions written in Rust, C, Java, and JS. C# has advantages over each of those languages and frameworks in exchange for the relative loss in performance. Rust and C are much lower level. Java’s syntax is generally considered to lag behind C#'s at this point. JS’s disadvantages could fill a whole post of their own. C# and .NET have their own disadvantages (such as relatively fewer libraries available) as you’ve pointed out in this thread and another in this post, but when you take into consideration the relatively high performance while being a strongly-typed higher-level language with plenty of nice QoL features, you might be able to see why it could be attractive to a specific slice of professionals.

    • @ShortFuse
      link
      17 months ago

      You left out the hundred of lines from the library you’re importing. Where’s all the code for robotparser?

      You can import libraries with C# too. That says nothing about the differences between languages.

      • mozz
        link
        fedilink
        27 months ago

        That’s exactly my point though - part of my assertion of a big weakness in C# would be that more mainstream languages (python or node) have massive libraries you can draw on with existing code for simple stuff like parsing robots.txt, whereas C# has one that probably seems pretty luxurious if you’re comparing it to nothing, but is well short of what OSS programmers are accustomed to.

        So yeah it’s not a purely fair language-design comparison but it’s a perfectly fair “how easy is it to get stuff done in this language” comparison. And then at a certain point it starts to become not just a convenience but a whole new area of computation (something like numpy or pytorch) that’s simply impossible in C# without a whole research project devoted to it to implement. That said, I’m sure there are areas (esp in heavily business-oriented fields like airline or medical backend or whatnot) where it’s the other way around, of course, and you have C#-specific stuff for that domain that would be real difficult to replicate in some other environment. I’m not trying to say that side doesn’t exist, just saying what’s generally applicable to my experience.

        So I’m not like being critical of C# because of language features (it seems perfectly fine and functional; I get what the people are saying who say they get work done every day in it and it seems fine.) But also, I think it’s relevant that it’s missing some big advantages if you’re trying to go beyond the “it doesn’t actively punish you for using it” stage.