• Backup Your Home Server with Duplicati

    (Note: this is part of my ongoing series on cheaply selfhosting)

    Through some readily available Docker containers and OpenMediaVault, I have a cheap mini-PC that serves a number of purposes for me.

    But, over time, as the server has picked up more uses, it’s also become a vulnerability. If any of the drives on my machine ever fail, I’ll lose data that is personally (and sometimes economically) significant.

    I needed a home server backup plan.

    Duplicati

    Duplicati is open source software that helps you efficiently and securely back up specific partitions and folders to almost any destination. This could be another home server or a cloud storage provider (like Amazon S3 or Backblaze B2, or even a consumer service like Dropbox, Google Drive, or OneDrive). While there are many other tools that support backups, I went with Duplicati because I wanted:

    • Support for consumer storage services as a target: I am a customer of Google Drive (through Google One) and Microsoft 365 (which comes with a generous OneDrive allowance) and only intend to back up some of the files I’m currently storing (mainly some of the network storage I’m using to hold important files)
    • A web-based control interface so I could access this from any computer (and not just whichever machine had the software I wanted)
    • An active user forum so I could find how-to guides and potentially get help
    • Available as a Docker container on linuxserver.io: linuxserver.io is well-known for hosting and maintaining high quality and up-to-date Docker container images

    Installation

    To install Duplicati on OpenMediaVault:

    • If you haven’t already, make sure you have OMV Extras and Docker Compose installed (refer to the section Docker and OMV-Extras in my previous post; you’ll want to follow all 10 steps, as I refer to different parts of that process throughout this post) and have a static local IP address assigned to your server. If you want, you can verify these prerequisites from the command line as shown below.
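      A quick sanity check of my own (run via WeTTy or SSH), not part of the official setup; the exact output will vary with your install:
      # confirm Docker and the Compose plugin are installed
      docker --version
      docker compose version
      # confirm the server is using the static IP address you assigned
      ip addr show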
    • Login to your OpenMediaVault web admin panel, and then go to [Services > Compose > Files] in the sidebar. Press the button in the main interface to add a new Docker compose file.

      Under Name put down Duplicati and under File, adapt the following (making sure the indentation is consistent)
      ---
      services:
        duplicati:
          image: lscr.io/linuxserver/duplicati:latest
          container_name: duplicati
          ports:
            - <unused port number>:8200
          environment:
            - TZ=America/Los_Angeles
            - PUID=<UID of Docker User>
            - PGID=<GID of Docker User>
          volumes:
            - <absolute paths to folders to backup>:<names to use in Duplicati interface>
            - <absolute path to shared config folder>/Duplicati:/config
          restart: unless-stopped
      Under ports:, make sure to add an unused port number (I went with 8200).
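      If you’re not sure whether a port is free, you can check from the command line (via WeTTy or SSH) before you pick one; this is just a quick way to verify, not a required step:
      # list listening TCP ports and check whether 8200 is already taken
      ss -tlnp | grep 8200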

      Replace <absolute path to shared config folder> with the absolute path to the config folder where you want Docker-installed applications to store their configuration information (accessible by going to [Storage > Shared Folders] in the administrative panel).

      You’ll notice there are extra lines under volumes: for <absolute paths to folders to backup>. These should correspond to the folders you’re interested in backing up, mapped to names you’ll recognize when they show up in the Duplicati interface. For example, I mapped my <absolute path to shared config folder> to /containerconfigs, as one of the things I want to make sure I back up is my container configurations.

      Once you’re done, hit Save and you should be returned to your list of Docker compose files for the next step. Notice that the new Duplicati entry you created has a Down status, showing the container has yet to be initialized.
    • To start your Duplicati container, click on the new Duplicati entry and press the (up) button. This will create the container, download any files needed, and run it.

      To show it worked, go to your-servers-static-ip-address:8200 from a browser that’s on the same network as your server (replacing 8200 if you picked a different port in the configuration file above) and you should see the Duplicati web interface which should look something like below
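      If the page doesn’t load, you can also confirm the container is up from the command line (via WeTTy or SSH) using standard Docker commands:
      # check that the duplicati container is running
      docker ps --filter name=duplicati
      # if it isn't, the logs usually explain why
      docker logs duplicati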
    • You can skip this step if you didn’t set up Pihole and local DNS / Nginx proxy, or if you don’t care about having a user-readable domain name for Duplicati. But, assuming you do and you followed my instructions, open up WeTTy (either by going to wetty.home in your browser, or by going to [Services > WeTTY] in the OpenMediaVault administrative panel and pressing the Open UI button in the main panel) and log in as the root user. Run:
      cd /etc/nginx/conf.d
      ls
      Pick out the file you created before for your domains and run
      nano <your file name>.conf
      This opens up the text editor nano with the file you just listed. Use your cursor to go to the very bottom of the file and add the following lines (indenting consistently and making sure each directive ends with a semicolon)
      server {
          listen 80;
          server_name <duplicati.home or the domain you'd like to use>;
          location / {
              proxy_pass http://<your-server-static-ip>:<duplicati port no.>;
          }
      }
      And then hit Ctrl+X to exit, Y to save, and Enter to overwrite the existing file. Then in the command line run the following to restart Nginx with your new configuration loaded.
      systemctl restart nginx
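      If the restart fails, a likely culprit is a typo in the file; you can have Nginx validate the configuration to pinpoint it (this check is safe to run at any time):
      nginx -t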
      Now, if your server sees a request for duplicati.home (or whichever domain you picked), it will direct them to Duplicati.

      Login to your Pihole administrative console (you can just go to pi.hole in a browser) and click on [Local DNS > DNS Records] from the sidebar. Under the section called Add a new domain/IP combination, fill out under Domain: the domain you just added above (i.e. duplicati.home) and next to IP Address: you should add your server’s static IP address. Press the Add button and it will show up below.

      To make sure it all works, enter the domain you just added (duplicati.home if you went with my default) in a browser and you should see the Duplicati interface!
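      If the domain doesn’t resolve, you can check whether Pihole is answering for it from another machine on your network (assuming dig is installed; nslookup works similarly, and <your Pihole's IP address> is a placeholder):
      dig duplicati.home @<your Pihole's IP address>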

    Configuring your Backups

    Duplicati conceives of each “backup” as a “source” (the folder of files to back up), a “destination” (the place the files should be backed up to), a schedule (how often the backup runs), and some options to configure how the backup works.

    To configure a “backup”, click the +Add Backup button in the menu on the left-hand side. I’ll show you the screens I went through to back up my Docker container configurations:

    1. Add a name (I called it DockerConfigs) and enter a Passphrase (you can use the Generate link to create a strong one). You’ll need this passphrase to restore from the backup, so keep it somewhere safe. Then hit Next
    2. Enter a destination. Here, you can select another computer or folder connected to your network. You can also select an online storage service.

      I’m using Microsoft OneDrive — for a different service, a quick Google search or a search of the Duplicati how-to forum can give you more specific instructions, but the basic steps of generating an AuthID link appear to be similar across many services.

      I selected Microsoft OneDrive v2 and picked a path in my OneDrive for the backup to go to (Backup/dockerconfigs). I then clicked on the AuthID link and went through an authentication process to formally grant Duplicati access to OneDrive. Depending on the service, you may need to manually copy a long string of letters and numbers and colons into the text field. After all of that, to prove it all worked, press Test connection!

      Then hit Next
    3. Select the source. Use the folder browsing widget on the interface to select the folder you wish to backup.

      If you recall, in my configuration step I mapped the <absolute path to shared config folder> to /containerconfigs, which is why I could select this as a one-click way to back up all my Docker container configurations. If necessary, feel free to shut down and delete your current container and start over with a configuration that maps the folders in a more convenient way.

      Then hit Next
    4. Pick a schedule. Do you want to backup every day? Once a week? Twice a week? Since my docker container configurations don’t change that frequently, I decided to schedule weekly backups on Saturday early morning (so it wouldn’t interfere with something else I might be doing).

      Pick your option and then hit Next
    5. Select your backup options. Unless you have a strong reason to, I would not change the remote volume size from the default (50 MB). The backup retention, however, is something you may want to think about. Duplicati gives you the option to hold on to every backup (something I would not do unless you have a massive amount of storage relative to the amount of data you want to backup), to hold on to backups younger than a certain age, to hold on to a specific number of backups, or customized permutations of the above.

      The option you should choose depends on your circumstances, but to share what I did: for some of my most important files, I’m using Duplicati’s smart backup retention option (which gives me one backup from the last week, one for each of the last 4 weeks, and one for each of the last 12 months). For some of my less important files (for example, my docker container configurations), I’m holding on to just the last 2 weeks’ worth of backups.

      Then hit Save and you’re set!

    I hope this helps you on your self-hosted backup journey.

    If you’re interested in how to setup a home server on OpenMediaVault or how to self-host different services, check out all my posts on the subject!

  • The California home insurance conundrum

    As a California homeowner, I’ve watched with dismay as homeowner insurance provider after homeowner insurance provider has fled the state in the face of wildfire risk.

    It was quite the shock when I discovered recently (HT: Axios Markets newsletter) that, according to NerdWallet, California actually has some of the cheapest homeowners insurance rates in the country!

    It raises the Econ 101 question: is it really that the costs of wildfires are too high? Or that the price insurance companies can charge (something heavily regulated by state insurance commissions) is kept too low / not allowed to vary enough based on actual fire risk?

  • Why Intel has to make its foundry business work

    Historically, Intel has (1) designed and (2) manufactured the chips it sells (primarily into computer and server systems). It prided itself on having the most advanced (1) designs and (2) manufacturing technology, keeping both close to its chest.

    In the late 90s/00s, semiconductor companies increasingly embraced the “fabless model”, whereby they would only do the (1) design while outsourcing the manufacturing to foundries like TSMC. This made it much easier and less expensive to build up a burgeoning chip business and is the secret to the success of semiconductor giants like NVIDIA and Qualcomm.

    Companies like Intel scoffed at this, arguing that the combination of (1) design and (2) manufacturing gave their products an advantage, one that they used to achieve a dominant position in the computing chip segment. And, it’s an argument which underpins why they have never made a significant effort in becoming a contract manufacturer — after all, if part of your technological magic is the (2) manufacturing, why give it to anyone else?

    The success of TSMC has raised a lot of questions about Intel’s advantage in manufacturing and, given recent announcements by Intel and the US’s CHIPS Act, brought a renewed focus on actually becoming a contract manufacturer to the world’s leading chip designers.

    While much of the attention has been paid to the manufacturing prowess rivalry and the geopolitical reasons behind this, I think the real reason Intel has to make the foundry business work is simple: their biggest customers are all becoming chip designers.

    While a lot of laptops and desktops and servers are still sold in the traditional fashion, the reality is more and more of the server market is being dominated by a handful of hyperscale data center operators like Amazon, Google, Meta/Facebook, and Microsoft, companies that have historically been able to obtain the best prices from Intel because of their volume. But, in recent years, in the chase for better and better performance and cost and power consumption, they have begun designing their own chips adapted to their own systems (as this latest Google announcement for Google’s own ARM-based server chips shows).

    Are these chips as good as Intel’s across every dimension? Almost certainly not. It’s hard to match Intel’s decades of design prowess and market insight. But, they don’t have to be. They only have to be better for the specific use cases Google / Microsoft / Amazon / etc. need them for.

    And, in that regard, that leaves Intel with really only one option: it has to make the foundry business work, or it risks losing not just the revenue from (1) designing a data center chip, but from the (2) manufacturing as well.


  • Starlink in the wrong hands

    On one level, this shouldn’t be a surprise. Globally always available satellite constellation = everyone and anyone will try to access this. This was, like many technologies, always going to have positive impacts — i.e. people accessing the internet where they otherwise couldn’t due to lack of telecommunications infrastructure or repression — and negative — i.e. terrorists and criminal groups evading communications blackouts.

    The question is whether or not SpaceX had the foresight to realize this was a likely outcome and to institute security processes and checks to reduce the likelihood of the negative.

    That remains to be seen…


    Elon Musk’s Starlink Terminals Are Falling Into the Wrong Hands
    Bruce Einhorn, Loni Prinsloo, Marissa Newman, Simon Marks | Bloomberg

  • Why don’t we (still) have rapid viral diagnostics?

    One of the most disappointing outcomes in the US from the COVID pandemic was the rise of the antivaxxer / public health skeptic and the dramatic politicization of public health measures.

    But, not everything disappointing has stemmed from that. Our lack of cheap rapid tests for diseases like the flu and RSV is a sad reminder that our regulatory system has failed to learn from the COVID crisis about the value of cheap, rapid in-home testing, or to adapt to the new reality that many Americans now know how to do such testing.


  • Huggingface: security vulnerability?

    Anyone who’s done any AI work is familiar with Huggingface. They are a repository of trained AI models and a maintainer of AI libraries and services that have helped push forward AI research. It is now considered standard practice for research teams with something to boast about to publish their models to Huggingface for all to embrace. This culture of open sharing has helped the field make its impressive strides in recent years and helped make Huggingface a “center” of that community.

    However, this ease of use and availability of almost every publicly accessible model under the sun comes with a price. Because many AI models require additional assets as well as the execution of code to properly initialize, Huggingface’s own tooling could become a vulnerability. Aware of this, Huggingface has instituted their own security scanning procedures on models they host.

    But security researchers at JFrog have found that, even with such measures, a number of hosted models exploit gaps in Huggingface’s scanning that allow for remote code execution. One example they identified baked a “phone home” functionality into a Pytorch model, which would initiate a secure connection between the server running the AI model and another (potentially malicious) computer (seemingly based in Korea).

    The JFrog researchers were also able to demonstrate that they could upload models which would allow them to execute other arbitrary Python code which would not be flagged by Huggingface’s security scans.

    While I think it’s a long way from suggesting that Huggingface is some kind of security cesspool, the research reminds us that so long as a connected system is both popular and versatile, there will always be the chance for security risk, and it’s important to keep that in mind.


  • Nope, the Dunning-Kruger Effect is just bad statistics

    The Dunning-Kruger effect encapsulates something many of us feel familiar with: that the least intelligent oftentimes assume they know more than they actually do. Wrap that sentiment in an academic paper written by two professors at an Ivy League institution, throw in some charts and statistics, and you’ve got an easily citable piece of trivia to make yourself feel smarter than the person you just caught commenting on something they know nothing about.

    Well, according to this fascinating blog post (HT: Eric), we have it all wrong. The way Dunning and Kruger constructed their statistical test guarantees a relationship between measured skill and the “skill gap”, regardless of the underlying data.

    The whole thing is worth a read, but they showed that using completely randomly generated numbers (where there is no relationship between perceived ability and skill), you will always find a relationship between the “skill gap” (perceived ability – skill) and skill, or to put it more plainly,

     (y-x) \sim x

    With y being perceived ability and x being actual measured ability.

    What you should be looking for is a relationship between perceived ability and measured ability (or directly between y and x) and when you do this with data, you find that the evidence for such a claim generally isn’t there!
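    A quick way to see why this is guaranteed (my own sketch of the argument, using the same y and x): if perceived ability y is generated independently of measured skill x, then

     \mathrm{Cov}(y - x,\ x) = \mathrm{Cov}(y, x) - \mathrm{Var}(x) = -\mathrm{Var}(x) < 0

    so regressing the skill gap (y − x) on x always yields a negative slope, even for pure noise, which looks exactly like “the least skilled overestimate themselves the most”.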

    In other words:


    The Dunning-Kruger Effect is Autocorrelation
    Blair Fix | Economics from the Top Down

  • A Heart Atlas

    The human heart is an incredibly sophisticated organ that, in addition to being one of the first organs to develop in the embryo, is quite difficult to understand at a cellular level (where the cells are, how they first develop, etc.).

    Neil Chi’s group at UCSD (link to Nature paper) was able to use multiplex imaging of fluorescently tagged RNA molecules to profile the gene expression of different types of heart cells and see where they are located and how they develop!

    The result is an amazing visualization; check it out in the video:

  • Cat Bond Fortunes

    Until recently, I only knew of the existence of cat(astrophe) bonds — financial instruments used to raise money for insurance against catastrophic events where investors profit when no disaster happens.

    I had no idea, until reading this Bloomberg article about the success of Fermat Capital Management, how large the space had gotten ($45 billion!!) or how it was one of the most profitable hedge fund strategies of 2023!

    This is becoming an increasingly important intersection between climate change and finance as insurance companies and property owners struggle with the rising risk of damage from extreme climate events. Given how young much of the science of evaluating these types of risks is, it’s no surprise that quantitative minds and modelers are able to profit here.

    The entire piece reminded me of Richard Zeckhauser’s famous 2006 article Investing in the Unknown and Unknowable which covers how massive investment returns can be realized by tackling problems that seem too difficult for other investors to understand.


  • Shein and Temu now drive global cargo

    Maybe you have shopped on Shein or Temu. Maybe you only know someone (younger?) who has. Maybe you only know Temu because of their repeat Superbowl ads.

    But these Chinese eCommerce companies are now the main driver behind air and ship cargo rates with Temu and Shein combined accounting for 9,000 tons per day of shipments!

    This is scale.


    Rise of fast-fashion Shein, Temu roils global air cargo industry
    Arriana McLymore, Casey Hall, and Lisa Barrington | Reuters

  • Geothermal data centers

    The data centers that power AI and cloud services are limited by 3 things:

    • the server hardware (oftentimes limited by access to advanced semiconductors)
    • available space (their footprint is massive which makes it hard to put them close to where people live)
    • availability of cheap & reliable (and, generally, clean) power

    If you, as a data center operator, can tap a new source of cheap & reliable power, you will go very far as you alleviate one of the main constraints on the ability to add to your footprint.

    It’s no wonder, then, that Google is willing to explore partnerships with next-gen geothermal startups like Fervo in a meaningful, long-term fashion.


  • Dexcom non-prescription glucose monitor approved

    Cheap and accurate continuous glucose monitoring is a bit of a holy grail for consumer metabolic health as it allows people to understand how their diet and exercise impact their blood sugar levels, which can vary from person to person.

    It’s also a holy grail for diabetes care as making sure blood sugar levels are neither too high nor too low is critical for health (too low and you can pass out or risk seizure or coma; too high and you risk diabetic neuropathy, kidney disease, and cardiovascular problems). For Type I diabetics and severe Type II diabetics, it’s also vital for dosing insulin.

    Because insulin dosing needs to be done just right, I was always under the impression that one of two things would happen along the way to producing a cheap continuous glucose monitor, either:

    1. The FDA would be hesitant to approve a device that wasn’t highly accurate to avoid the risk of a consumer using the reading to mis-dose insulin OR
    2. The device makers (like Dexcom) would be hesitant to create an accurate enough glucose monitor that it might cannibalize their highly profitable prescription glucose monitoring business

    As a result, I was pleasantly surprised that Dexcom’s over-the-counter Stelo continuous glucose monitor was approved by the FDA. It remains to be seen what the price will be and what level of information the Stelo will share with the customer, but I view this as a positive development and (at least for now) tip my hat to both the FDA and Dexcom here.

    (Thanks to Erin Brodwin from Axios for sharing the news on X)


  • “Corporate” Design

    Read an introspective piece by famed ex-Frog Design leader Robert Fabricant about the state of the design industry and the unease that he says many of his peers are feeling. While I disagree with some of the concerns he lays out around AI / diversity being the drivers of this unease, he makes a strong case for how this is a natural pendulum swing after years of seeing “Chief Design Officers” and design innovation groups added to many corporate giants.

    I’ve had the privilege of working with very strong designers. This has helped me appreciate the value of design thinking as something that goes far beyond “making things pretty” and believe, wholeheartedly, that it’s something that should be more broadly adopted.

    At the same time, it’s also not a surprise to me that, during a time of layoffs and cost cutting, a design function that has become a little “spoiled” in recent years, and whose financial returns are hard to calculate, is going through a painful transition, especially for creative-minded designers who struggle with that ROI evolution.

    If Phase 1 was getting companies to recognize that design thinking is needed, Phase 2 will be the field learning how to measure, communicate, and optimize the value that a team of seasoned designers brings to the bottom line.


  • Costco Love

    Nice piece in the Economist about how Costco’s model of operational simplicity leads to a unique position in modern retail: beloved by customers, investors, AND workers:

    • sell fewer things ➡️
    • get better prices from suppliers & less inventory needed ➡️
    • lower costs for customers ➡️
    • more customers & more willing to pay recurring membership fee ➡️
    • strong, recurring profits ➡️
    • ability to pay well and promote from within 📈💪🏻

    Why Costco is so loved
    The Economist

  • How packaging tech is changing how we build & design chips

    Once upon a time, the hottest thing in chip design was the “system-on-a-chip” (SOC). The idea was that you’d get the best cost and performance out of a chip by combining more parts into one piece of silicon: smaller area (less silicon = less cost) and faster performance (closer parts = faster communication). The result was more and more chips integrating more and more functions.

    While the laws of physics haven’t reversed any of the above, the cost of designing chips that integrate more and more components has gone up sharply. Worse, different types of parts (like on-chip memory and physical/analog componentry) don’t scale down as well as pure logic transistors, making it very difficult to design chips that combine all these pieces.

    The rise of new types of packaging technologies, like Intel’s Foveros, Intel’s EMIB, TSMC’s InFO, new ways of separating power delivery from data delivery (backside power delivery), and more, has made it possible to more tightly integrate different pieces of silicon while improving their performance and size/cost.

    The result is that much of the most advanced silicon today is built as packages of chiplets rather than as massive SOC projects: a change that has happened over a fairly short period of time.

    This interview with the head of logic technologies at IMEC (a semiconductor industry research center) breaks this out…


    What is CMOS 2.0?
    Samuel K. Moore | IEEE Spectrum

  • Store all the things: clean electricity means thermal energy storage boom

    Thermal energy storage has been a difficult place for climatetech in years past. The low cost of fossil fuels (the source of the vast majority of high-temperature industrial heat to date) and the failure of large-scale solar thermal power plants to compete with the rapidly scaling solar photovoltaic industry made thermal storage feel like, at best, a market reserved for niche applications with unique fossil fuel price dynamics. This is despite some incredibly cool (dad-joke intended 🔥🥵🤓) technological ingenuity in the space.

    But, in a classic case of how cheap universal inputs change market dynamics, the plummeting cost and soaring availability of renewable electricity and the growing desire for industrial companies to get “clean” sources of industrial heat has resulted in almost a renaissance for the space as this Canary Media article (with a very nice table of thermal energy startups) points out.

    With cheap renewables (especially if the price varies), companies can buy electricity at low (sometimes near-zero if in the middle of a sunny and windy day) prices, convert that to high-temperature heat with an electric furnace, and store it for use later.

    While the devil’s in the details, in particular the round-trip energy efficiency (how much energy you can get out versus what you put in), the delivered heat temperature range and rate (how hot and how much power), and, of course, the cost of the system, technologies like this could be key to greening sectors of the economy whose carbon output would otherwise be extremely difficult to reduce.
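    To make the round-trip efficiency point concrete, here is a purely illustrative back-of-the-envelope calculation (the price and efficiency are my own assumed numbers, not figures from the article): if electricity is bought at $20 per MWh and the system’s round-trip efficiency is

     \eta_{\mathrm{RT}} = \frac{E_{\mathrm{heat\ delivered}}}{E_{\mathrm{electricity\ in}}} = 0.9,

    then the energy cost of the delivered heat is roughly $20 / 0.9 ≈ $22 per MWh of heat, before accounting for capital costs, standing losses in storage, and limits on delivery temperature and rate.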


  • The IE6 YouTube conspiracy

    An oldie but a goodie — the story of how the YouTube team, post-Google acquisition, put up a “we won’t support Internet Explorer 6 in the future” message without any permission from anyone. (HT: Eric S)


    A Conspiracy to Kill IE6
    Chris Zacharias

  • Using your ear to control devices

    Very cool that we’re still finding new things we can control that can be applied to making the lives of people better.


  • Intel’s focus on chip packaging technology

    Intel has been interested in entering the foundry (semiconductor contract manufacturing) space for a long time. For years, Intel proudly boasted of being at the forefront of semiconductor technology — being first to market with the FinFET and smaller and smaller process geometries.

    So it’s interesting that, with the exception of RibbonFET (the successor to the FinFET), almost all of the manufacturing technology announcements in the whitepaper Intel published to appeal to prospective foundry customers pertain to packaging / “back end” technologies.

    I think it’s both a recognition that they are no longer the furthest ahead in that race and a recognition that Moore’s Law scaling has diminishing returns for many applications. Technology that was once considered easy to outsource to low-cost assemblers in Asia is now front and center as a major driver of cost and performance.


    A Peek at Intel’s Future Foundry Tech
    Samuel K. Moore | IEEE Spectrum

  • Iovance brings cell therapy to solid tumors

    Immune cell therapy — the use of modified immune cells directly to control cancer and autoimmune disease — has shown incredible results in liquid tumors (cancers of the blood and bone marrow like lymphoma, leukemia, etc), but has stumbled in addressing solid tumors.

    Iovance, which recently had its drug lifileucel approved by the FDA to treat advanced melanoma, has demonstrated an interesting spin on cell therapy which may prove to be effective in solid tumors. They extract Tumor-Infiltrating Lymphocytes (TILs), immune cells that are already “trying” to attack a solid tumor directly. Iovance then treats those TILs with their own proprietary process to expand the number of those cells and “further activate” them (to resist a tumor’s efforts to inactivate immune cells that may come after them) before reintroducing them to the patient.

    This is logistically very challenging (not dissimilar to what patients awaiting other cell therapies or Vertex’s new sickle cell treatment need to go through) as it also requires chemotherapy for lymphocyte depletion in the patient prior to reintroduction of the activated TILs. But, the upshot is that you now have an expanded population of cells known to be predisposed to attacking a solid tumor that can now resist the tumor’s immune suppression efforts.

    And, they’ve presented some impressive 4-year followup data on a study of advanced melanoma in patients who have already failed immune checkpoint inhibitor therapy, enough to convince the FDA of their effectiveness!

    To me, the beauty of this method is that it can work across tumor types. Iovance’s process (from what I’ve gleaned from their posters & presentations) works by getting more and more activated immune cells. Because they’re derived from the patient, these cells are already predisposed to attack the particular molecular targets of their tumor.

    This is in contrast to most other immune cell therapy approaches (like CAR-T) where the process is inherently target-specific (i.e. get cells that go after this particular marker on this particular tumor) and each new target / tumor requires R&D work to validate. Couple this with the fact that TILs are already the body’s first line of defense against solid tumors and you may have an interesting platform for immune cell therapy in solid tumors.

    The devil’s in the details and requires more clinical study on more cancer types, but suffice to say, I think this is incredibly exciting!